How To Download and Add a File to a Persistent Modal Volume - Part 3
You are reading Part 3 of a multi-part series in which I explain the inner workings of How Tape Index Uses Modal for Its Natural Language Processing Flow. Read other parts 👉 [Part 1] [Part 2]
–
At the end of the previous article, there were two functions that were spawned. The first was a downloader function, the other was a metadata function. In this article I’ll be going over the downloader function and following up with another article on the metadata function.
What Is the Point of a Downloader Function?
The purpose of the downloader function is less about downloading and more about getting the file into a state where it can be shared among the other functions within the Tape Index natural language processing flow. Since each step needs to reference the file being processed, it’s much faster and more efficient to download it once rather than every time it’s required. Here’s a refresher of the NLP flow:
The Basics of a Modal Function
In the previous article I showed the basics of how to expose a WSGI Flask web endpoint with Modal. In this article you’ll learn how to call a Modal function directly. As with a web endpoint, there’s not a lot required to get a basic function up and running. Since I’m using a class for this example, the basics are slightly more involved, but not by much:
We’re stubbing out the class with a basic Debian image and locking the Python version to 3.10.8.
The class here is a typical Python class, the main difference being that we add the @method() decorator to the run function. This adds a bunch of Modal-specific attributes to the run function.
The Specifics of the Downloader Function
Create and Mount the Volume
Since the downloader function will also be storing the file for sharing across multiple Modal functions, we will need to create and use a persistent Modal Volume. Doing this is really simple:
Import the Volume class
Create a persisted Volume with a unique name
The unique name will be used elsewhere
Mount the volume to your Modal function with the /data folder being the “root” of the volume.
Download the File to the Modal Volume and Make It Persistent
Downloading a file doesn’t deviate from what you might do with typical Python code. A couple of things you’ll note with the code that has now been added:
I’m using a Modal-specific function, imports(), that includes class-specific imports to be used within the class.
I added the __init__ function since it will need to have a file_url passed to it, and I set the directory that we need to be writing to.
Within the run function you’ll see basic Python downloading code. The main thing to point out is that the code is downloading to the Modal Volume directory that was mounted in the @stub decorator.
The final and most important piece of code to point out is the vol.commit() function. This ensures that the data is permanently committed to the Modal Volume and can be used elsewhere.
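To make the download-then-commit order concrete, here’s a minimal sketch in plain Python. It is not the Tape Index code and it deliberately imports nothing from Modal: the class shape, the fallback file name, and the injectable commit callback are all assumptions for illustration. Inside a Modal function you would pass the volume’s commit method (the vol.commit() described above) as that callback and leave data_dir pointing at the mounted /data folder.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


class Downloader:
    """Plain-Python stand-in for the article's Modal class (hypothetical shape)."""

    def __init__(
        self,
        file_url: str,
        data_dir: str = "/data",
        commit: Callable[[], None] = lambda: None,
    ) -> None:
        self.file_url = file_url
        self.data_dir = Path(data_dir)  # where the volume is mounted
        self.commit = commit            # assumption: vol.commit when run inside Modal

    def run(self) -> Path:
        # Derive a file name from the URL path; fall back to a fixed name.
        name = Path(urlparse(self.file_url).path).name or "download.bin"
        target = self.data_dir / name
        self.data_dir.mkdir(parents=True, exist_ok=True)

        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(self.file_url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)

        # Commit only after the file is fully written.
        self.commit()
        return target
```

Keeping the commit as a callback means the same class can be exercised locally against a temporary directory, with the real volume swapped in only when it runs on Modal.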
Putting All the Code Together
There’s a bunch more code that is not relevant to this discussion but that you might find interesting. When downloading a file, a couple of different things can happen: the file type determines how the file is processed before it is downloaded to the Modal Volume.
1. Audio & Video
If the file is an audio or video file, the flow doesn’t care much about audio quality and doesn’t need the video track at all. When downloading an audio or video file, the downloader downsamples it to a 128 kbps .mp3 file. This significantly reduces the overall file size and speeds up the downstream steps in the pipeline.
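The article doesn’t say which tool performs the conversion, so the sketch below assumes ffmpeg is installed in the image. It only builds the command line; the function name and the /data default are hypothetical. The flags are standard ffmpeg options: -vn drops the video stream, -b:a sets the audio bitrate, and -y overwrites an existing output file.

```python
from pathlib import PurePosixPath
from typing import List


def build_mp3_command(source: str, data_dir: str = "/data", bitrate: str = "128k") -> List[str]:
    """Build an ffmpeg command that writes an audio-only MP3 into data_dir."""
    # Keep the original base name and swap the extension for .mp3.
    target = PurePosixPath(data_dir) / (PurePosixPath(source).stem + ".mp3")
    return [
        "ffmpeg",
        "-y",             # overwrite the output if it already exists
        "-i", source,     # input file (audio or video)
        "-vn",            # discard any video stream
        "-b:a", bitrate,  # target audio bitrate
        str(target),
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) so that a failed conversion raises instead of silently leaving a partial file on the volume.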
2. Other Files
Other files are not pre-processed and are downloaded directly to the volume.
3. Spawn the Functions
At the end of the Downloader, there are two paths that can be taken. If the file is an audio or video file, the Diarization (speaker identification) function will be spawned.
Any file that is not an audio or video file will be passed to the Textractor (extracts text from the file) function.
With either of these functions, the flow will eventually meet back at a Categorization function and call back to Tape Index to close the loop.
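The branching described above can be sketched as a small routing helper. This is an assumption about how the decision could be made (guessing the MIME type from the file name with Python’s standard mimetypes module), not a description of the actual Tape Index code; the function name and the returned labels are hypothetical stand-ins for the Diarization and Textractor functions.

```python
import mimetypes


def next_step(file_name: str) -> str:
    """Pick the follow-up step based on the guessed MIME type of the file."""
    mime_type, _ = mimetypes.guess_type(file_name)
    # Audio and video go to speaker identification; everything else,
    # including unknown types, goes to text extraction.
    if mime_type and mime_type.split("/")[0] in ("audio", "video"):
        return "diarization"
    return "textractor"
```

In the real flow the returned label would decide which Modal function gets spawned next. Guessing from the extension is cheap but can be fooled by mislabeled files, so inspecting the response’s Content-Type header is a reasonable alternative.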