Hey everyone, welcome back to the channel.
In this video, we will explore different techniques for handling file uploads in FastAPI,
discuss how each technique works behind the scenes, and also cover best
practices for handling files of different sizes.
So the most common way to handle file uploads is the UploadFile class, which can be imported from fastapi. To read the
content of the uploaded file, we call its read method, which is actually a coroutine, so we have to
await it. From there I can go on and process the content of the uploaded file; to keep things simple,
I am just printing it. I will save the file and test it. So, I go to my browser, upload
a file, in my case a text file, and click the Execute button. Now, if I go to my terminal,
you can see the content of the file is printed.
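Roughly, that first endpoint looks like the sketch below; the /upload path and the print call are just placeholders for whatever your real logic is.

```python
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")
async def upload(uploaded_file: UploadFile):
    # read() is a coroutine, so it must be awaited
    content = await uploaded_file.read()
    # keep it simple: just print the raw bytes
    print(content)
    return {"filename": uploaded_file.filename}
```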
Since this flow is so common, where all we do
is call the read method of the UploadFile class, FastAPI provides it out of the box: all
you have to do is declare the uploaded_file parameter as bytes and specify File as the source of
the uploaded_file. If I run the endpoint again and go to the terminal, I can see that the content
is printed. So, if your use case lets you use this out-of-the-box functionality, you can
go ahead and use it. And remember, behind the scenes all that FastAPI is doing here is
calling the read method of the UploadFile class, just as I did in the previous example.
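A sketch of that shortcut version might look like this; again, the route path and the print call are just assumptions.

```python
from fastapi import FastAPI, File

app = FastAPI()

@app.post("/upload-bytes")
async def upload(uploaded_file: bytes = File(...)):
    # FastAPI reads the whole upload for you and hands it over as bytes
    print(uploaded_file)
    return {"size": len(uploaded_file)}
```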
Don't use this for large files. It's OK for file sizes of up to a few megabytes, but beyond that,
avoid it. The reason is that FastAPI loads the complete file into memory, so if we do this
for, say, a 2GB file when our server has only 1GB of memory, we might get an error.
Anyway, let's figure out how UploadFile worked in our previous example. The UploadFile
class has a file attribute; let's print it. I will comment out my previous example. So, it uses a
SpooledTemporaryFile. The way SpooledTemporaryFile works is that it has a parameter called max_size:
it keeps the file in memory as long as its size is less than max_size, and once the
file content grows beyond max_size, it rolls the file over to a temporary file on disk.
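You can see this rollover behaviour with the tempfile module on its own. This standalone snippet is just an illustration, and note that _rolled is a private CPython implementation detail, not a public API.

```python
from tempfile import SpooledTemporaryFile

# keep at most 1 MB in memory before rolling over to disk
f = SpooledTemporaryFile(max_size=1024 * 1024)

f.write(b"x" * 1024)             # 1 KB: still held in memory
print(f._rolled)                 # False (private attribute)

f.write(b"x" * 2 * 1024 * 1024)  # grow past max_size: spilled to disk
print(f._rolled)                 # True
```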
To check whether a file is being kept in memory, I can print the uploaded_file._in_memory attribute. If
I upload a small file, in my case 1KB, you can see it prints True, which means the
file is kept in memory. Now, if I upload a file of more than 1MB, in my case a
file of size 1050KB, you can see it prints False, which means the
file is being kept on disk in a temporary file.
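Put together, the inspection inside the endpoint could look like this sketch. Keep in mind that _in_memory is a private Starlette attribute and may change between versions, so treat it as a debugging aid rather than something to rely on in production.

```python
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")
async def upload(uploaded_file: UploadFile):
    # .file is the underlying SpooledTemporaryFile
    print(uploaded_file.file)
    # private Starlette attribute: True while the spooled file is still in memory
    print(uploaded_file._in_memory)
    return {"filename": uploaded_file.filename}
```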
Overall, what you have to remember is that, by default, files larger than 1MB are kept on disk. And if you want, you can change that
limit. To do that, import MultiPartParser from starlette.formparsers.
If I go inside it, you can see it has a max_file_size
class attribute which is set to 1MB, and you can see it passes this class
attribute to SpooledTemporaryFile's max_size. So, to change the limit, you can change the value
of this class attribute. I am changing it to 2MB. Now, if I run the endpoint again,
you can see that it prints True, even though my file size was more than 1MB.
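A sketch of that override, assuming a Starlette version where MultiPartParser exposes the max_file_size class attribute:

```python
from fastapi import FastAPI, UploadFile
from starlette.formparsers import MultiPartParser

# keep uploads up to 2 MB in memory before spooling to disk
MultiPartParser.max_file_size = 2 * 1024 * 1024

app = FastAPI()

@app.post("/upload")
async def upload(uploaded_file: UploadFile):
    # True for files under the new limit (private Starlette attribute)
    print(uploaded_file._in_memory)
    return {"filename": uploaded_file.filename}
```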
I hope you now understand how UploadFile works. And remember, the read method is faster for in-memory
files than for on-disk files, so you can play around with
this number for your particular use case. Earlier I showed you how to read data from
an UploadFile, that is, using the uploaded_file.read method. Since this loads all the content of the
file into memory, it is not efficient for very large files. If processing your data in
chunks makes sense, then you can read the file in chunks: all you have to do is pass the chunk size
to the read method. If I upload a text file, you can see that it only reads the first two characters
of the file, since I passed a chunk size of 2. I can also loop over the content of the
file in chunks; this way I can read all the content of the file without the
memory usage ever exceeding the chunk size. You can set the chunk size depending on your
use case and the available memory resources.
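Here is a rough sketch of chunked reading inside an endpoint; the 1 MB chunk size and the process_chunk helper are just placeholders for your own logic.

```python
from fastapi import FastAPI, UploadFile

app = FastAPI()

CHUNK_SIZE = 1024 * 1024  # 1 MB; tune this to your memory budget

def process_chunk(chunk: bytes) -> None:
    # placeholder for your real processing
    print(len(chunk))

@app.post("/upload-chunked")
async def upload(uploaded_file: UploadFile):
    # read the file piece by piece so we never hold more than one chunk in memory
    while chunk := await uploaded_file.read(CHUNK_SIZE):
        process_chunk(chunk)
    return {"status": "ok"}
```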
Anyway, let's discuss the final way to handle
file uploads, which can speed up your API by a mile, especially for huge
files. If you understand how UploadFile works, then you know FastAPI is writing data to disk,
which, being an IO operation, is slow by nature; then in your endpoint you are reading data back
from disk, which is again a slow IO operation. And then, while processing the data you have read,
you may write it somewhere safe, like a database or a permanent file on disk. Overall, three
IO operations are happening here. And believe me, you can avoid the first two IO operations by not
using UploadFile and directly consuming the stream of form data. To do that, accept the Request
object in your endpoint; it provides a stream method that you can use to directly access the
stream of data from the client. As you can see, this method gives you the complete form data, including
the form headers. So processing is a little more involved here, and there are several edge cases as well,
for example a disconnect from the client. I will put some links in the description on how you can use
it effectively. I have covered this in this video just to let you know that such a thing exists, so
you can use it when you think your endpoint is slow and you are looking for ways to speed it up.
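As a very rough sketch, consuming the raw request stream could look like the snippet below. This only shows how the data arrives; the chunks still contain the multipart boundaries and part headers, so a real implementation needs a proper streaming multipart parser plus handling for client disconnects.

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/upload-stream")
async def upload(request: Request):
    total = 0
    # request.stream() yields the raw body as it arrives from the client,
    # including the multipart boundaries and part headers
    async for chunk in request.stream():
        # a real implementation would feed each chunk into a streaming
        # multipart parser instead of just counting bytes
        total += len(chunk)
    return {"bytes_received": total}
```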
Let's end this video here. If you like this kind of content, you can subscribe to my channel.
Thank you for watching.