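We will learn how to configure Gunicorn to efficiently handle multiple concurrent requests for a Flask app on Azure App Service Linux. Below are a few experiments with different Gunicorn configurations. The sample Flask app takes 2 seconds to respond to each request, and the client script that follows it sends 5 requests at once from separate threads.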
from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    time.sleep(2)
    return "Finished"

if __name__ == '__main__':
    app.run()

# Client script: sends 5 concurrent requests to the app.
import threading
from datetime import datetime

import requests

def thread_execution():
    # Log the start time, send one request, then log the response and end time.
    name = threading.current_thread().name
    print("Start Time:", datetime.utcnow().strftime("%H:%M:%S"), name)
    resp = requests.get("https://<abc>.azurewebsites.net/")
    print(resp.text, datetime.utcnow().strftime("%H:%M:%S"), name)

def main():
    threads = []
    for x in range(5):
        threads.append(threading.Thread(target=thread_execution))
    # Start all threads first so the requests go out at the same time.
    for thread in threads:
        thread.start()
    # Wait for every request to finish before exiting.
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    main()

TEST 1:
By default, Gunicorn on App Service uses a single sync worker that handles one request at a time. We are not using a custom startup script in this test. Below are the results of running the client script above.
Default startup script:
gunicorn --bind 0.0.0.0 --timeout 600 --access-logfile '-' --error-logfile '-' app:app
Result: We can see that the requests are queued and processed one after another.
TEST 2:
We will use one Gunicorn worker with 3 threads. When we use the sync worker type and set the threads setting to more than 1, the gthread worker type will be used.
How to create a startup script for Flask?
How to configure a custom startup script?
Custom startup script:
gunicorn --bind 0.0.0.0 --threads 3 --timeout 600 --access-logfile '-' --error-logfile '-' app:app
Result: We can see that 3 requests are executed simultaneously because the thread count is 3.
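The queuing behaviour seen in Tests 1 and 2 can be approximated locally without Gunicorn. The sketch below is only a model, not how Gunicorn is implemented: it uses a standard-library thread pool whose size stands in for the number of request slots, and `fake_request` and `run_batch` are hypothetical names chosen for illustration. The delay is shortened so the demo runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay):
    """Stand-in for the Flask view: block for `delay` seconds, then reply."""
    time.sleep(delay)
    return "Finished"


def run_batch(num_requests, slots, delay=0.2):
    """Push `num_requests` through a pool with `slots` concurrent slots.

    Returns the wall-clock time the whole batch took, in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # Requests beyond the number of free slots wait in the pool's queue.
        results = list(pool.map(fake_request, [delay] * num_requests))
    assert all(r == "Finished" for r in results)
    return time.perf_counter() - start


if __name__ == "__main__":
    # 1 slot models the default single sync worker, 3 slots model --threads 3.
    for slots in (1, 3):
        print(f"slots={slots}: {run_batch(5, slots):.2f}s")
```

With one slot the five requests run back to back. With three slots they finish in two rounds, which mirrors the pattern in the test results above.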
TEST 3:
We will use 3 Gunicorn workers.
Custom startup script:
gunicorn --bind 0.0.0.0 --workers 3 --timeout 600 --access-logfile '-' --error-logfile '-' app:app
Result: We can see that 3 requests are processed simultaneously because the worker count is 3. The Gunicorn documentation recommends (2 x $num_cores) + 1 as a starting point for the worker count.
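As a rough sketch, the (2 x $num_cores) + 1 formula can be computed in a Gunicorn configuration file instead of being hard-coded on the command line. The file name `gunicorn.conf.py` and the helper `recommended_workers` are assumptions for illustration; the setting names (`bind`, `workers`, `timeout`, `accesslog`, `errorlog`) are Gunicorn's own. Note that the core count reported inside a container may not match the cores allotted by your App Service plan, so treat the result as a starting point to verify.

```python
# gunicorn.conf.py (assumed file name)
import multiprocessing


def recommended_workers(num_cores):
    """Suggested starting point for the worker count: (2 x num_cores) + 1."""
    return 2 * num_cores + 1


# Same options as the startup command above, expressed as config settings.
bind = "0.0.0.0"
workers = recommended_workers(multiprocessing.cpu_count())
timeout = 600
accesslog = "-"  # "-" sends the access log to stdout
errorlog = "-"   # "-" sends the error log to stderr
```

The startup command would then become something like `gunicorn --config gunicorn.conf.py app:app`.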
TEST 4:
We will use 3 workers and 3 threads. This means that each of the three workers will have 3 threads. We will also increase the number of requests to 20 so that we can visualize the impact of this configuration.
Custom startup script:
gunicorn --bind 0.0.0.0 --workers 3 --threads 3 --timeout 600 --access-logfile '-' --error-logfile '-' app:app
Result: We can see that the app is able to process 9 requests simultaneously, which is workers x threads (3 x 3).
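The arithmetic behind the four tests can be sketched as below. This is an idealized estimate, assuming all requests arrive at once, each takes exactly the same time, and they spread evenly across the available slots. Network latency and scheduling overhead are ignored, and the function names are hypothetical.

```python
import math


def max_concurrent(workers, threads):
    """Number of requests that can be in flight at once: workers x threads."""
    return workers * threads


def estimated_batch_seconds(num_requests, workers, threads, seconds_per_request):
    """Idealized time to drain a burst of requests that arrive together."""
    slots = max_concurrent(workers, threads)
    waves = math.ceil(num_requests / slots)  # rounds needed to serve everyone
    return waves * seconds_per_request


if __name__ == "__main__":
    print(estimated_batch_seconds(5, 1, 1, 2))   # Test 1: 10
    print(estimated_batch_seconds(5, 1, 3, 2))   # Test 2: 4
    print(estimated_batch_seconds(5, 3, 1, 2))   # Test 3: 4
    print(estimated_batch_seconds(20, 3, 3, 2))  # Test 4: 6
```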
Conclusion:
We saw that multiple simultaneous requests can be handled by increasing threads, workers, or a combination of both. Now we need to understand the difference between them.
Before that, we need to understand a few important concepts. A single CPU core can do only one thing at a time. It might seem that it runs many applications at the same time, but in reality it switches to another program only when the current one is paused or waiting on something.
Concurrency vs. Parallelism
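In Python, threads are a means of concurrency but not parallelism, because the Global Interpreter Lock (GIL) in CPython lets only one thread execute Python bytecode at a time. Workers are separate processes, so they are a means of both concurrency and parallelism.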
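The distinction can be observed with a small experiment. This is a sketch assuming a standard CPython build with the GIL, and `wait_task`, `cpu_task` and `timed_in_threads` are illustrative names. Waiting tasks overlap in threads because a sleeping thread releases the GIL, while pure-Python CPU work in threads typically takes about as long as running it serially. Exact timings vary by machine, so none are shown.

```python
import threading
import time


def wait_task(seconds):
    """Waiting work (like our sleeping Flask view); releases the GIL."""
    time.sleep(seconds)


def cpu_task(n):
    """Pure-Python CPU work; holds the GIL while it computes."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_in_threads(target, arg, count):
    """Run `target(arg)` in `count` threads and return the elapsed seconds."""
    threads = [threading.Thread(target=target, args=(arg,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Three 0.5 s waits overlap, so this takes roughly 0.5 s, not 1.5 s.
    print(f"3 waiting tasks in threads: {timed_in_threads(wait_task, 0.5, 3):.2f}s")

    # Compare one CPU task alone with three of them in threads.
    start = time.perf_counter()
    cpu_task(2_000_000)
    single = time.perf_counter() - start
    print(f"1 CPU task alone:           {single:.2f}s")
    print(f"3 CPU tasks in threads:     {timed_in_threads(cpu_task, 2_000_000, 3):.2f}s")
```

This is why extra threads help the sample app, which spends its time sleeping, while CPU-heavy views usually benefit more from extra workers.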