Puppeteer
Puppeteer is one of the most popular libraries that abstract away the lower-level DevTools protocol and provide a high-level API that you can use to easily instrument Chrome/Chromium and automate browsing sessions. Puppeteer is used for tasks like creating screenshots, crawling pages, and testing web applications.
Puppeteer typically connects to a local Chrome or Chromium browser using the DevTools port. Refer to the Puppeteer API documentation on the Puppeteer.connect() method for more information.
The Workers team forked a version of Puppeteer and patched it to connect to the Workers Browser Rendering API instead. The changes between the Workers Puppeteer fork and Puppeteer core are minimal. After connecting, developers can use the full Puppeteer API as they would in a standard setup.
Our version is open source and can be found in Cloudflare’s fork of Puppeteer. The package is published on npm as @cloudflare/puppeteer and can be installed with npm install @cloudflare/puppeteer.
Once the browser binding is configured and the @cloudflare/puppeteer library is installed, Puppeteer can be used in a Worker.
For example, a basic Worker script can launch the env.MYBROWSER browser, open a new page, go to https://example.com/, get the page load metrics, close the browser, and print the metrics as JSON.
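The flow above can be sketched in TypeScript as follows. This is a minimal sketch, not Cloudflare's official example. PageLike, BrowserLike, LaunchFn, and collectPageMetrics are hypothetical names. The interfaces are structural stand-ins for the Browser and Page types from @cloudflare/puppeteer, which lets the logic run without a browser binding. The try/finally block makes sure browser.close() runs even if navigation fails.

```typescript
// Structural stand-ins for the parts of the Puppeteer Page and Browser types
// used here (hypothetical names, not exports of @cloudflare/puppeteer).
interface PageLike {
  goto(url: string): Promise<unknown>;
  metrics(): Promise<object>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Mirrors how a launcher is called with a browser binding such as env.MYBROWSER.
type LaunchFn<B> = (binding: B) => Promise<BrowserLike>;

// Launch a browser, open a page, navigate to `url`, read the page metrics,
// and return them as a JSON string. The finally block closes the browser
// even when navigation or metrics collection throws.
async function collectPageMetrics<B>(
  launch: LaunchFn<B>,
  binding: B,
  url: string,
): Promise<string> {
  const browser = await launch(binding);
  try {
    const page = await browser.newPage();
    await page.goto(url);
    const metrics = await page.metrics();
    return JSON.stringify(metrics);
  } finally {
    await browser.close();
  }
}
```

In a Worker, you would pass a launcher such as (binding) => puppeteer.launch(binding) together with env.MYBROWSER, then return the resulting string in a Response with a JSON content type. Removing the finally block leaves the browser open for reuse, subject to the idle timeout.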
If users omit the browser.close() statement, the browser stays open, ready to be connected to again and reused. By default, however, it closes automatically after 1 minute of inactivity. Users can optionally extend this idle time to up to 10 minutes with the keep_alive option, which is set in milliseconds. For example, launching with puppeteer.launch(env.MYBROWSER, { keep_alive: 600000 }) keeps the browser open for up to 10 minutes, even if it is inactive.
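Because keep_alive is expressed in milliseconds, 10 minutes corresponds to 10 × 60 × 1000 = 600,000 ms. The helper below, keepAliveOptions, is hypothetical and not part of @cloudflare/puppeteer. It converts an idle time in minutes into that option object and rejects values above the 10-minute maximum described above, so an out-of-range value fails in your own code.

```typescript
// Upper bound for keep_alive described above: 10 minutes in milliseconds.
const MAX_KEEP_ALIVE_MS = 10 * 60_000; // 600,000 ms

// Hypothetical helper: build the options object passed as the second
// argument to puppeteer.launch(), converting minutes to milliseconds.
function keepAliveOptions(idleMinutes: number): { keep_alive: number } {
  const ms = Math.round(idleMinutes * 60_000);
  if (!Number.isFinite(ms) || ms <= 0) {
    throw new RangeError("idle time must be a positive number of minutes");
  }
  if (ms > MAX_KEEP_ALIVE_MS) {
    throw new RangeError(`keep_alive cannot exceed ${MAX_KEEP_ALIVE_MS} ms (10 minutes)`);
  }
  return { keep_alive: ms };
}
```

For example, keepAliveOptions(10) yields { keep_alive: 600000 }, the same value used in the launch call above.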
To facilitate browser session management, we’ve added new methods to puppeteer:
puppeteer.sessions() lists the currently running sessions.
In a sample output, session 478f4d7d-e943-40f6-a414-837d3736a1dc has an active Worker connection (connectionId=2a2246fa-e234-4dc1-8433-87e6cee80145), while session 565e05fb-4d2a-402b-869b-5b65b1381db7 is free. While a connection is active, no other Workers may connect to that session.
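A session with an active connection cannot accept another Worker. A Worker that wants to reuse a browser therefore has to look for a session without a connectionId. The following is a minimal sketch. It assumes that each entry returned by puppeteer.sessions() has a sessionId field and includes connectionId only while connected. The field names other than connectionId are assumptions, and SessionInfo, freeSessionIds, and pickFreeSession are hypothetical names.

```typescript
// Assumed shape of one puppeteer.sessions() entry (only connectionId is
// confirmed by the text above; the other fields are assumptions).
interface SessionInfo {
  sessionId: string;
  startTime?: number;
  connectionId?: string | null;
}

// IDs of sessions that no Worker is currently connected to.
function freeSessionIds(sessions: SessionInfo[]): string[] {
  return sessions
    .filter((s) => !s.connectionId) // no connectionId means the session is free
    .map((s) => s.sessionId);
}

// First free session ID, or null when every session is busy and a new
// browser would have to be launched instead.
function pickFreeSession(sessions: SessionInfo[]): string | null {
  const free = freeSessionIds(sessions);
  return free[0] ?? null;
}
```

In a Worker, you might call puppeteer.sessions(env.MYBROWSER), connect to a returned ID with puppeteer.connect(env.MYBROWSER, sessionId), and fall back to puppeteer.launch(env.MYBROWSER) when the helper returns null. Verify these signatures against Cloudflare’s fork before relying on them. Two Workers can race for the same free session, so treat a failed connect as a signal to try another ID or launch a new browser.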
puppeteer.history() lists recent sessions, both open and closed. It is useful for getting a sense of your current usage.
In a sample output, session 2be00a21-9fb6-4bb2-9861-8cd48e40e771 was closed explicitly by the client with browser.close(), while session 478f4d7d-e943-40f6-a414-837d3736a1dc was closed after reaching the maximum idle time (see limits).
You should also be able to access this information in the dashboard, albeit with a slight delay.
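To turn puppeteer.history() output into a quick usage summary, you could count sessions per close reason and add up how long the closed sessions ran. This sketch makes several assumptions about the response shape: that startTime and endTime are epoch milliseconds, that open sessions have no endTime, and that closeReasonText holds a readable reason. HistoryEntry, closeReasonCounts, and totalClosedDurationMs are hypothetical names. A high share of idle closures may point to code paths that skip browser.close().

```typescript
// Assumed shape of one puppeteer.history() entry (field names are assumptions).
interface HistoryEntry {
  sessionId: string;
  startTime: number; // epoch milliseconds (assumed)
  endTime?: number; // absent while the session is still open (assumed)
  closeReasonText?: string;
}

// Count sessions per close reason; still-open sessions are counted as "open".
function closeReasonCounts(history: HistoryEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of history) {
    const reason = entry.endTime === undefined ? "open" : (entry.closeReasonText ?? "unknown");
    counts.set(reason, (counts.get(reason) ?? 0) + 1);
  }
  return counts;
}

// Total run time in milliseconds across closed sessions, a rough usage gauge.
function totalClosedDurationMs(history: HistoryEntry[]): number {
  return history.reduce(
    (sum, e) => (e.endTime === undefined ? sum : sum + (e.endTime - e.startTime)),
    0,
  );
}
```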
puppeteer.limits() lists your active limits:
activeSessions lists the IDs of the currently open sessions.
maxConcurrentSessions defines how many browsers can be open at the same time.
allowedBrowserAcquisitions specifies whether a new browser session can be opened under the rate limits in place.
timeUntilNextAllowedBrowserAcquisition defines the waiting period before a new browser can be launched.
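These fields let a Worker decide, before launching, whether it can start a new browser right away. In the minimal sketch below, BrowserLimits mirrors the four fields listed above. Treating allowedBrowserAcquisitions as a count and timeUntilNextAllowedBrowserAcquisition as milliseconds are both assumptions. planAcquisition is a hypothetical helper.

```typescript
// Mirrors the fields listed above; numeric types and units are assumptions.
interface BrowserLimits {
  activeSessions: unknown[]; // session IDs (exact element shape assumed)
  maxConcurrentSessions: number;
  allowedBrowserAcquisitions: number; // assumed: > 0 means a launch is allowed now
  timeUntilNextAllowedBrowserAcquisition: number; // assumed: milliseconds
}

type AcquisitionPlan =
  | { action: "launch" }
  | { action: "wait"; waitMs: number }
  | { action: "at-capacity" };

// Decide whether to launch now, wait for the rate limit, or stop because
// the concurrency limit is already reached.
function planAcquisition(limits: BrowserLimits): AcquisitionPlan {
  if (limits.activeSessions.length >= limits.maxConcurrentSessions) {
    // No room for another browser: reuse a free session or retry later.
    return { action: "at-capacity" };
  }
  if (limits.allowedBrowserAcquisitions > 0) {
    return { action: "launch" };
  }
  // Rate-limited: wait before trying to launch again.
  return { action: "wait", waitMs: Math.max(0, limits.timeUntilNextAllowedBrowserAcquisition) };
}
```

In a Worker, the input would come from puppeteer.limits(env.MYBROWSER). On "wait", you could retry after waitMs. On "at-capacity", connecting to a free session from puppeteer.sessions() avoids launching a new browser.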
The full Puppeteer API can be found in Cloudflare’s fork of Puppeteer.