Access to a BigQuery table with some test data that allows you to create a BigQuery Job Query ↗. For this tutorial it is recommended you that you create your own table as sampled tables ↗, unless cloned to your own GCP namespace, won’t allow you to run job queries against them. For this example, the Hacker News Corpus ↗ was used under its MIT licence.
1. Set up your Cloudflare Worker
To ingest the data into Cloudflare and feed it into Workers AI, you will be using a Cloudflare Worker. If you have not created one yet, please feel free to review our tutorial on how to get started.
After following the steps to create a Worker, you should have the following code in your new Worker project:
If the Worker project has successfully been created, you should also be able to run npx wrangler dev in a console to run the Worker locally:
Open a browser tab at http://localhost:8787/ to see your deployed Worker. Please note that the port 8787 may be a different one in your case.
You should be seeing Hello World! in your browser:
If you are running into any issues during this step, please make sure to review Worker’s Get Started Guide.
2. Import GCP Service key into the Worker as Secrets
Now that you have verified that the Worker has been created successfully, you will need to reference the Google Cloud Platform service key created in the Prerequisites section of this tutorial.
Your downloaded key JSON file from Google Cloud Platform should have the following format:
For this tutorial, you will only be needing the values of the following fields: client_email, private_key, private_key_id, and project_id.
Instead of storing this information in plain text in the Worker, you will use secrets to make sure its unencrypted content is only accessible via the Worker itself.
Import those three values from the JSON file into Secrets, starting with the field from the JSON key file called client_email, which we will now call BQ_CLIENT_EMAIL (you can use another variable name):
You will be asked to enter a secret value, which will be the value of the field client_email in the JSON key file.
If the secret was uploaded successfully, the following message will be displayed:
Now import the secrets for the three remaining fields; private_key, private_key_id, and project_id as BQ_PRIVATE_KEY, BQ_PRIVATE_KEY_ID, and BQ_PROJECT_ID respectively:
At this point, you have successfully imported three fields from the JSON key file downloaded from Google Cloud Platform into Cloudflare secrets to be used in a Worker.
Secrets are only made available to Workers once they are deployed. To make them available during development, create a .dev.vars file to locally store these credentials and reference them as environment variables.
Your dev.vars file should look like the following:
Make sure to include .dev.vars to your .gitignore file in your project to prevent getting your credentials uploaded to a repository if you are using a version control system.
Check that secrets are loaded correctly in src/index.js by logging their values into a console output:
Restart the Worker and run npx wrangler dev. You should see that the server now mentions the newly added variables:
If you open http://localhost:8787 in your browser, you should see the values of the variables show up in your console where the npx wrangler dev command is running, while still seeing only the Hello World! text in the browser window.
You now have access to the GCP credentials from a Worker. Next, you will install a library to help with the creation of the JSON Web Token needed to interact with GCP’s API.
3. Install library to handle JWT operations
To interact with BigQuery’s REST API, you will need to generate a JSON Web Token ↗ to authenticate your requests using the credentials that you have loaded into Worker secrets in the previous step.
For this tutorial, you will be using the jose ↗ library for JWT-related operations. Install it by running the following command in a console:
To verify that the installation succeeded, you can run npm list, which lists all the installed packages and see if the jose dependency has been added:
4. Generate JSON Web Token
Now that you have installed the jose library, it is time to import it and add a function to your code that generates a signed JWT:
Now that you have created a JWT, it is time to do an API call to BigQuery to fetch some data.
5. Make authenticated requests to Google BigQuery
With the JWT token created in the previous step, issue an API request to BigQuery’s API to retrieve data from a table.
You will now query the table that you already have created in BigQuery as part of the prerequisites of this tutorial. This example uses a sampled version of the Hacker News Corpus ↗ that was used under its MIT licence and uploaded to BigQuery.
Having the raw row data from BigQuery means that you can now format it in a JSON-like style up next.
6. Format results from the query
Now that you have retrieved the data from BigQuery, it is time to note that a BigQuery API response looks something like this:
This format may be difficult to read and to work with when iterating through results, which will go on to do later in this tutorial. So you will now implement a function that maps the schema into each individual value, and the resulting output will be easier to read, as shown below. Each row corresponds to an object within an array.
Create a formatRows function that takes a number of rows and fields returned from the BigQuery response body and returns an array of results as objects with named fields.
7. Feed data into Workers AI
Now that you have converted the response from the BigQuery API into an array of results, generate some tags and attach an associated sentiment score using an LLM via Workers AI:
Uncomment the following lines from the wrangler.toml file in your project:
Restart the Worker that is running locally, and after doing so, go to your application endpoint:
It is likely that you will be asked to log in to your Cloudflare account and grant temporary access to Wrangler (the Cloudflare CLI) to use your account when using Worker AI.
Once you access http://localhost:8787 you should see an output similar to the following:
The actual values and fields will mostly depend on the query made in Step 5 that are then fed into the LLMs models.
Final result
All the code shown in the different steps are combined into the following code in src/index.js:
If you wish to deploy this Worker, you can do so by running npx wrangler deploy:
This will create a public endpoint that you can use to access the Worker globally. Please keep this in mind when using production data, and make sure to include additional access controls in place.
Conclusion
In this tutorial, you have learnt how to integrate Google BigQuery and Cloudflare Workers by creating a GCP service account key and storing part of it as Worker secrets. This was later imported in the code, and by using the jose npm library, you created a JSON Web Token to authenticate the API query to BigQuery.
Once you obtained the results, you formatted them to later be passed to generative AI models via Workers AI to generate tags and to perform sentiment analysis on the extracted data.
Next Steps
If, instead of displaying the results of ingesting the data to the AI model in a browser, your workflow requires fetching and store data (for example in R2 or D1) on regular intervals, you may want to consider adding a scheduled handler for this Worker. It allows triggering the Worker with a predefined cadence via a Cron Trigger. Consider reviewing the Reference Architecture Diagrams on Ingesting BigQuery Data into Workers AI.
A use case to ingest data from other sources, like you did in this tutorial, is to create a RAG system. If this sounds relevant to you, please check out the tutorial Build a Retrieval Augmented Generation (RAG) AI.
To learn more about what other AI models you can use at Cloudflare, please visit the Workers AI section of our docs.