Handling concurrent requests with Python on Azure App Service Linux using Gunicorn and Flask

Abhilash_Konnur · Aug 30 2023

We will learn how to configure Gunicorn to efficiently handle multiple concurrent requests for a Flask app on Azure App Service Linux. Below are a few experiments with different Gunicorn configurations.

Create a Python Flask app that sleeps for two seconds and then responds with the message 'Finished'. The sleep mimics an application making an outbound HTTP call or a database call.

from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    time.sleep(2)
    return "Finished"

if __name__ == '__main__':
    app.run()

Create a script with 5 threads that makes 5 simultaneous HTTP calls to this Flask API.

from datetime import datetime, timezone
import threading

import requests

def main():
    # Create 5 threads so that 5 requests are sent at roughly the same time.
    threads = []
    for _ in range(5):
        threads.append(threading.Thread(target=thread_execution))

    for thread in threads:
        thread.start()

    # Wait for every request to finish before exiting.
    for thread in threads:
        thread.join()

def thread_execution():
    # Print the UTC time before and after the call, tagged with the thread name.
    name = threading.current_thread().name
    print("Start Time:", datetime.now(timezone.utc).strftime("%H:%M:%S"), name)
    resp = requests.get("https://<abc>.azurewebsites.net/")
    print(resp.text, datetime.now(timezone.utc).strftime("%H:%M:%S"), name)

if __name__ == "__main__":
    main()

TEST 1:

By default, Gunicorn on App Service uses a single sync worker that handles one request at a time. We are not using a custom startup script in this test. Below are the results of running the above script.

Default startup script:

gunicorn --bind 0.0.0.0 --timeout 600 --access-logfile '-' --error-logfile '-' app:app

Result: We can see that the requests are queued and processed one after another.

TEST 2:
We will use one Gunicorn worker with 3 threads. When the sync worker type is selected and the threads setting is greater than 1, Gunicorn uses the gthread worker type instead.

How to create startup script for Flask?

How to configure custom startup script?

Custom startup script:

gunicorn --bind 0.0.0.0 --threads 3 --timeout 600 --access-logfile '-' --error-logfile '-' app:app

Result: We can see that 3 requests are executed simultaneously because the thread count is 3.
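
The difference between Test 1 and Test 2 can be reproduced locally without Gunicorn. The following is only a simulation, assuming the server can be modeled as a fixed number of handler slots: a thread pool with 1 slot stands in for the single sync worker and a pool with 3 slots stands in for one worker with 3 threads. The names `handle_request` and `run_batch` and the shortened delay are illustrative and not part of the original tests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(delay):
    """Stand-in for the Flask view: block for `delay` seconds, then respond."""
    time.sleep(delay)
    return "Finished"

def run_batch(slots, requests=5, delay=0.2):
    """Push `requests` simulated requests through `slots` concurrent handlers.

    Returns the list of responses and the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=slots) as pool:
        responses = list(pool.map(handle_request, [delay] * requests))
    return responses, time.perf_counter() - start

if __name__ == "__main__":
    for slots in (1, 3):
        _, elapsed = run_batch(slots)
        print(f"{slots} slot(s): {elapsed:.2f}s for 5 requests")
```

With one slot the five requests run back to back, so the elapsed time is roughly five times the delay. With three slots they complete in two waves.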

TEST 3:

We will use 3 Gunicorn workers.

Custom startup script:

gunicorn --bind 0.0.0.0 --workers 3 --timeout 600 --access-logfile '-' --error-logfile '-' app:app

Result: We can see that 3 requests are processed simultaneously because the worker count is 3. The recommended worker count for Gunicorn is (2 x $num_cores) + 1.
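
As a small sketch of the (2 x $num_cores) + 1 rule, the helper below computes the worker count from the number of cores. The function name `recommended_workers` is made up for illustration, and the fallback to 1 core when the count cannot be determined is an assumption.

```python
import os

def recommended_workers(num_cores=None):
    """Return (2 x num_cores) + 1 for the given or detected core count."""
    if num_cores is None:
        # os.cpu_count() may return None if the count cannot be determined;
        # falling back to 1 core is an assumption made for this sketch.
        num_cores = os.cpu_count() or 1
    return 2 * num_cores + 1

if __name__ == "__main__":
    print("1 core  ->", recommended_workers(1), "workers")
    print("4 cores ->", recommended_workers(4), "workers")
    print("this machine ->", recommended_workers(), "workers")
```

For a single core this gives 3 workers, which matches the worker count used in this test.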

TEST 4:

We will use 3 workers and 3 threads. This means that each of the three workers will have 3 threads. We will also increase the number of requests to 20 so that we can visualize the impact of this configuration.

Custom startup script:

gunicorn --bind 0.0.0.0 --workers 3 --threads 3 --timeout 600 --access-logfile '-' --error-logfile '-' app:app

Result: We can see that the app is able to process 9 requests simultaneously, which is workers * threads.
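
The four tests can be summarized with some back-of-the-envelope arithmetic. The sketch below is an idealized estimate, assuming every request takes exactly 2 seconds, requests are spread evenly across workers, and network latency is ignored. The function name `estimate_total_seconds` is hypothetical.

```python
import math

def estimate_total_seconds(requests, workers=1, threads=1, seconds_per_request=2.0):
    """Estimate wall-clock time for a burst of simultaneous requests.

    Capacity is workers * threads; requests beyond that wait for the next wave.
    """
    capacity = workers * threads
    waves = math.ceil(requests / capacity)
    return waves * seconds_per_request

if __name__ == "__main__":
    print("Test 1:", estimate_total_seconds(5))                         # 5 waves
    print("Test 2:", estimate_total_seconds(5, threads=3))              # 2 waves
    print("Test 3:", estimate_total_seconds(5, workers=3))              # 2 waves
    print("Test 4:", estimate_total_seconds(20, workers=3, threads=3))  # 3 waves
```

Under these assumptions, the 20 requests in Test 4 need three waves of at most 9 requests each, so roughly 6 seconds in total. Real timings will differ somewhat.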

Conclusion:

We saw that multiple simultaneous requests can be handled by increasing threads, workers, or a combination of both. Now we need to understand the difference between them.

Before that, we need to understand a few important concepts. A single CPU core can execute only one thing at a time. It might seem that it runs many applications at the same time, but in reality it switches to another program only when the current one is paused or waiting on something.

Concurrency vs. Parallelism

Concurrency is when 2 or more tasks are in progress during the same period, which might mean that only 1 of them is being worked on while the others are paused.
Parallelism is when 2 or more tasks are executing at the same instant. Parallelism cannot be achieved on a machine with a single CPU core.

In Python, threads are a means of concurrency but not parallelism, because the global interpreter lock (GIL) in standard CPython lets only one thread execute Python bytecode at a time. Workers are separate processes, so they are a means of both concurrency and parallelism.

If the application is CPU intensive, the only way to achieve parallelism is to increase the number of workers, up to the suggested (2 x $num_cores) + 1.
If the application is I/O bound, a combination of workers and threads will yield better results.
If memory overhead on the server is a concern, prefer threads over workers, because the application is loaded once per worker and every thread running in a worker shares that worker's memory.
Ex: If one idle worker uses 100 MB, then 2 workers will use 200 MB. Threads do not add memory overhead of this magnitude.
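
To make the memory trade-off concrete, the sketch below compares two configurations that both handle 9 simultaneous requests. It reuses the illustrative 100 MB per idle worker from the example above and treats per-thread overhead as negligible, which is a simplification. The function name `estimate_idle_memory_mb` is hypothetical.

```python
def estimate_idle_memory_mb(workers, mb_per_worker=100):
    """Rough idle memory estimate: the app is loaded once per worker process.

    Per-thread overhead is ignored here, which is a simplifying assumption.
    """
    return workers * mb_per_worker

if __name__ == "__main__":
    # (workers, threads) pairs that both give 9 concurrent request slots.
    configs = {"9 workers x 1 thread": (9, 1), "3 workers x 3 threads": (3, 3)}
    for label, (workers, threads) in configs.items():
        slots = workers * threads
        memory = estimate_idle_memory_mb(workers)
        print(f"{label}: {slots} slots, about {memory} MB idle")
```

Both configurations offer the same number of slots, but the thread-heavy one loads the application three times instead of nine.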