Use event notification to summarize PDF files on upload
In this tutorial, you will learn how to use event notifications to process a PDF file when it is uploaded to an R2 bucket. You will use Workers AI to summarize the PDF and store the summary as a text file in the same bucket.
To continue, you will need:
- A Cloudflare account with access to R2.
- An existing R2 bucket. Refer to the Get started tutorial for R2.
- Node.js installed.
Node.js version manager
Use a Node version manager like Volta or nvm to avoid permission issues and change Node.js versions. Wrangler, discussed later in this guide, requires a Node version of 16.17.0 or later.
You will create a new Worker project that will use Static Assets to serve the front-end of your application. A user can upload a PDF file using this front-end, which will then be processed by your Worker.
Create a new Worker project by running the following commands:
For setup, select the following options:
- For What would you like to start with?, choose Hello World example.
- For Which template would you like to use?, choose Hello World Worker.
- For Which language do you want to use?, choose TypeScript.
- For Do you want to use git for version control?, choose Yes.
- For Do you want to deploy your application?, choose No (we will be making some changes before deploying).
Navigate to the pdf-summarizer directory:
Using Static Assets, you can serve the front-end of your application from your Worker. To use Static Assets, you need to add the required configuration to your wrangler.toml file.
Next, create a public directory and add an index.html file. The index.html file should contain the following HTML code:
To view the front-end of your application, run the following command and navigate to the URL displayed in the terminal:
When you open the URL in your browser, you will see a file upload form. If you try uploading a file, the file is not uploaded, because the Worker does not yet handle requests sent by the form. In the next step, you will update your Worker to handle the file upload.
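The HTML code for the form is not reproduced in this section. As a rough illustration of what the front-end needs to do, the following TypeScript sketch posts a file to the /api/upload endpoint as multipart form data. The function name uploadPdf, the form field name pdfFile, and the injectable fetchImpl parameter are assumptions made for this example; the field name must match whatever your Worker reads.

```typescript
// A minimal sketch of a browser-side upload helper (hypothetical names).
// Assumption: the Worker reads the uploaded file from the form field "pdfFile".
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function uploadPdf(
  file: Blob,
  fileName: string,
  endpoint = "/api/upload",
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const form = new FormData();
  // The third argument sets the file name the Worker sees on the File object.
  form.append("pdfFile", file, fileName);

  // Do not set Content-Type manually: the runtime adds the multipart boundary.
  const response = await fetchImpl(endpoint, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.text();
}
```

The fetch function is a parameter only so the sketch can be exercised without a running server; in a page you would call uploadPdf from the form's submit handler and rely on the default.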
To handle the file upload, you will first need to add the R2 binding. In the wrangler.toml file, add the following code:
Replace <R2_BUCKET_NAME> with the name of your R2 bucket.
Next, update the src/index.ts file. The src/index.ts file should contain the following code:
The above code does the following:
- Checks if the request is a POST request to the /api/upload endpoint. If it is, it gets the file from the request and uploads it to R2 using the R2 Workers API.
- If the request is not a POST request to the /api/upload endpoint, it returns a 404 response.
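The Worker code itself is not included here. The following is a minimal sketch of the fetch handler described above, not the tutorial's exact code. BucketLike is a hypothetical stand-in for the R2Bucket type (only put() is modeled), and the binding name MY_BUCKET and the form field name pdfFile are assumptions that must match your wrangler.toml and HTML form.

```typescript
// Minimal structural stand-in for the R2Bucket binding (only put() is modeled).
// In a real project, use the generated Env type and the Workers runtime types.
interface BucketLike {
  put(key: string, value: ArrayBuffer | string): Promise<unknown>;
}

interface Env {
  MY_BUCKET: BucketLike; // hypothetical binding name; match your wrangler.toml
}

async function handleFetch(request: Request, env: Env): Promise<Response> {
  const url = new URL(request.url);

  if (request.method === "POST" && url.pathname === "/api/upload") {
    const formData = await request.formData();
    const file = formData.get("pdfFile"); // hypothetical form field name

    // get() returns null for a missing field and a string for non-file fields.
    if (file === null || typeof file === "string") {
      return new Response("No file uploaded", { status: 400 });
    }

    // Use the original file name as the object key.
    await env.MY_BUCKET.put(file.name, await file.arrayBuffer());
    return new Response(`Uploaded ${file.name}`, { status: 200 });
  }

  // Anything else that reaches the Worker gets a 404.
  return new Response("Not found", { status: 404 });
}

const worker = { fetch: handleFetch };
export default worker;
```

Unlike the behavior listed above, this sketch also returns a 400 response when the form contains no file; that check is a design choice added for illustration.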
Since the Worker code is written in TypeScript, you should run the following command to add the necessary type definitions. While this is not required, it will help you avoid errors.
You can restart the development server to test the changes:
Event notifications send a message to a queue whenever data in your R2 bucket changes. You will need to create a new queue named pdf-summarizer to receive these notifications:
Add the binding to the wrangler.toml file:
Now that you have a queue to receive event notifications, you need to update the Worker to handle the event notifications. You will need to add a Queue handler that will extract the textual content from the PDF, use Workers AI to summarize the content, and then save it in the R2 bucket.
Update the src/index.ts file to add the Queue handler:
The above code does the following:
- The queue handler is called with a batch of messages from the queue. It loops through the messages in the batch and logs the name of each uploaded file.
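A queue consumer receives R2 event notifications as message bodies. The sketch below models a subset of the documented message fields (account, action, bucket, object, eventTime) and a minimal batch type standing in for the runtime's MessageBatch. In a real Worker, this logic lives in the queue(batch, env, ctx) method of the default export; returning the keys is only there to make the sketch testable.

```typescript
// Subset of the R2 event notification message body.
interface R2EventNotification {
  account: string;
  action: string; // for example "PutObject"
  bucket: string;
  object: { key: string; size?: number; eTag?: string };
  eventTime: string;
}

// Minimal stand-ins for the Queues MessageBatch and Message types.
interface QueueMessage<T> {
  body: T;
}
interface QueueBatch<T> {
  queue: string;
  messages: readonly QueueMessage<T>[];
}

// Logs the key of each uploaded object in the batch.
async function handleQueue(batch: QueueBatch<R2EventNotification>): Promise<string[]> {
  const keys: string[] = [];
  for (const message of batch.messages) {
    const key = message.body.object.key;
    console.log(`Processing ${key} from bucket ${message.body.bucket}`);
    keys.push(key);
  }
  return keys;
}
```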
For now, the queue handler only logs file names. In the next steps, you will update the queue handler to extract the textual content from the PDF, use Workers AI to summarize the content, and then add the summary to the bucket.
To extract the textual content from the PDF, the Worker will use the unpdf library. The unpdf library provides utilities to work with PDF files.
Install the unpdf library by running the following command:
Update the src/index.ts file to import the required modules from the unpdf library:
Next, update the queue handler to extract the textual content from the PDF:
The above code does the following:
- The queue handler gets the file from the R2 bucket.
- The queue handler extracts the textual content from the PDF using the unpdf library.
- The queue handler logs the textual content.
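The extraction step can be sketched as follows. To keep the example self-contained, the PDF parser is passed in as a function instead of importing unpdf directly; the comment shows how such a function might wrap unpdf's getDocumentProxy and extractText. ReadableBucket and ObjectBodyLike are hypothetical stand-ins for R2Bucket and R2ObjectBody, and the null check covers an object deleted before its message is processed.

```typescript
// Minimal stand-ins for R2ObjectBody and R2Bucket (only what is used here).
interface ObjectBodyLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface ReadableBucket {
  get(key: string): Promise<ObjectBodyLike | null>;
}

// The parser is injected so this sketch runs without unpdf installed.
// In the Worker, the function might wrap unpdf like this:
//   const pdf = await getDocumentProxy(data);
//   const { text } = await extractText(pdf, { mergePages: true });
//   return text;
type TextExtractor = (data: Uint8Array) => Promise<string>;

async function extractPdfText(
  bucket: ReadableBucket,
  key: string,
  extract: TextExtractor,
): Promise<string | null> {
  const object = await bucket.get(key);
  if (object === null) {
    // The object may have been deleted between the upload and this message.
    console.log(`Object ${key} not found; skipping`);
    return null;
  }
  const data = new Uint8Array(await object.arrayBuffer());
  const text = await extract(data);
  console.log(`Extracted ${text.length} characters from ${key}`);
  return text;
}
```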
To use Workers AI, you will need to add the Workers AI binding to the wrangler.toml file. The wrangler.toml file should contain the following code:
Execute the following command to add the AI type definition:
Update the src/index.ts file to use Workers AI to summarize the content:
The queue handler now uses Workers AI to summarize the content.
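The summarization call might look like the following sketch. The model name @cf/facebook/bart-large-cnn (which accepts input_text and returns summary) is an assumption, since this section does not name the model, and AiLike is a hypothetical stand-in for the Ai binding type. The MAX_INPUT_CHARS cap is also hypothetical: long PDFs can exceed a model's input limit, so truncating is one simple way to keep the request bounded.

```typescript
// Minimal stand-in for the Workers AI binding (env.AI).
interface AiLike {
  run(
    model: string,
    inputs: { input_text: string; max_length?: number },
  ): Promise<{ summary?: string }>;
}

// Assumption: a summarization model that takes input_text and returns { summary }.
const SUMMARY_MODEL = "@cf/facebook/bart-large-cnn";

// Hypothetical cap on how much extracted text is sent to the model.
const MAX_INPUT_CHARS = 10_000;

async function summarize(ai: AiLike, text: string): Promise<string> {
  const input = text.length > MAX_INPUT_CHARS ? text.slice(0, MAX_INPUT_CHARS) : text;
  const result = await ai.run(SUMMARY_MODEL, { input_text: input });
  if (!result.summary) {
    throw new Error("Model returned no summary");
  }
  return result.summary;
}
```

Truncation discards the end of long documents; summarizing the text in chunks would be an alternative with more requests per file.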
Now that you have the summary, you need to add it to the R2 bucket. Update the src/index.ts file to add the summary to the R2 bucket:
The queue handler now adds the summary to the R2 bucket as a text file.
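Storing the summary is a single put() call on the bucket binding. In this sketch, the naming scheme (report.pdf becomes report-summary.txt) and the helper names summaryKeyFor and storeSummary are hypothetical, and WritableBucket models only the part of R2Bucket.put() used here, including the httpMetadata.contentType option.

```typescript
// Minimal stand-in for R2Bucket.put() with HTTP metadata.
interface WritableBucket {
  put(
    key: string,
    value: string,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}

// Hypothetical naming scheme: "report.pdf" -> "report-summary.txt".
function summaryKeyFor(pdfKey: string): string {
  const base = pdfKey.toLowerCase().endsWith(".pdf") ? pdfKey.slice(0, -4) : pdfKey;
  return `${base}-summary.txt`;
}

// Writes the summary next to the original PDF and returns the new key.
async function storeSummary(bucket: WritableBucket, pdfKey: string, summary: string): Promise<string> {
  const key = summaryKeyFor(pdfKey);
  await bucket.put(key, summary, { httpMetadata: { contentType: "text/plain; charset=utf-8" } });
  return key;
}
```

Because the summary key ends in .txt rather than pdf, writing it does not match a pdf suffix filter, so the Worker does not trigger itself in a loop.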
Your queue handler is ready to handle incoming event notification messages. Next, enable event notifications for your bucket with the wrangler r2 bucket notification create command. The following command creates an event notification for the object-create event type, filtered to keys with the pdf suffix:
Replace <R2_BUCKET_NAME> with the name of your R2 bucket.
This creates an event notification rule for the pdf suffix. When a new object whose key ends in pdf is uploaded to the R2 bucket, R2 sends a message to the pdf-summarizer queue, and your Worker's queue handler processes it.
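R2 applies this filter before any message reaches the queue, so the Worker only sees matching events. If you want the queue handler to double-check, a guard such as the following hypothetical shouldSummarize function mirrors the rule. The list of create actions (PutObject, CopyObject, CompleteMultipartUpload) reflects the action values documented for the object-create event type; verify it against the current R2 event notification reference.

```typescript
// Action values assumed to belong to the object-create event type.
const CREATE_ACTIONS = new Set(["PutObject", "CopyObject", "CompleteMultipartUpload"]);

// Hypothetical defense-in-depth check mirroring the notification rule:
// only object-create events for keys ending in "pdf" are summarized.
function shouldSummarize(action: string, key: string): boolean {
  return CREATE_ACTIONS.has(action) && key.endsWith("pdf");
}
```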
To deploy your Worker, run the wrangler deploy command:
In the output of the wrangler deploy command, copy the URL. This is the URL of your deployed application.
To test the application, navigate to the URL of your deployed application and upload a PDF file. Alternatively, you can use the Cloudflare dashboard to upload a PDF file directly to your R2 bucket.
To view the logs, you can use the wrangler tail command.
You will see the logs in your terminal. You can also navigate to the Cloudflare dashboard and view the logs in the Workers Logs section.
If you check your R2 bucket, you will see the summary file.
In this tutorial, you learned how to use R2 event notifications to process an object on upload. You created an application to upload a PDF file, and created a consumer Worker that creates a summary of the PDF file. You also learned how to use Workers AI to summarize the content of the PDF file, and upload the summary to the R2 bucket.
You can use the same approach to process other types of files, such as images, videos, and audio files. You can also use the same approach to process other types of events, such as object deletion (the object-delete event type). Overwriting an existing object is reported as an object-create event.
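To react to more than one event type, the consumer can branch on the action field of each message. The sketch below is a hypothetical dispatcher; it assumes the notification rule also covers the object-delete event type, and the action names are taken from the R2 event notification message format (DeleteObject and LifecycleDeletion for deletions).

```typescript
// Hypothetical callbacks for each kind of event.
type Handlers = {
  onCreate(key: string): Promise<void>;
  onDelete(key: string): Promise<void>;
};

// Routes one notification to the matching callback based on its action.
async function dispatch(
  action: string,
  key: string,
  handlers: Handlers,
): Promise<"create" | "delete" | "ignored"> {
  switch (action) {
    case "PutObject":
    case "CopyObject":
    case "CompleteMultipartUpload":
      await handlers.onCreate(key);
      return "create";
    case "DeleteObject":
    case "LifecycleDeletion":
      await handlers.onDelete(key);
      return "delete";
    default:
      console.log(`Ignoring unknown action ${action} for ${key}`);
      return "ignored";
  }
}
```

In a delete handler, you might remove the corresponding summary file so the bucket does not keep summaries of PDFs that no longer exist.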
If you want to view the code for this tutorial, you can find it on GitHub.