Changelog
Workers AI Birthday Week 2024 announcements
- Meta Llama 3.2 1B, 3B, and 11B Vision are now available on Workers AI
- `@cf/black-forest-labs/flux-1-schnell` is now available on Workers AI
- Workers AI is fast! Powered by new GPUs and optimizations, you can expect faster inference on Llama 3.1, Llama 3.2, and FLUX models.
- No more neurons. Workers AI is moving towards unit-based pricing
- Model pages get a refresh with better documentation on parameters, pricing, and model capabilities
- Closed beta for our Run Any* Model feature; sign up here
- Check out the product announcements blog post for more information
- And the technical blog post if you want to learn about how we made Workers AI fast
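A minimal sketch of calling the new FLUX model from a Worker. It assumes the binding is named `AI`, that the model accepts `prompt` and `steps` inputs, and that it returns `{ image: "<base64 JPEG>" }`; check the model page for the exact parameters and output format before relying on them. The local `AiBinding` interface stands in for the `Ai` type from `@cloudflare/workers-types`.

```typescript
// Minimal local stand-in for the Workers AI binding type.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// Decode a base64 string into raw bytes (atob exists in Workers and Node 16+).
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // Assumed response shape: { image: "<base64-encoded JPEG>" }
    const result = (await env.AI.run("@cf/black-forest-labs/flux-1-schnell", {
      prompt: "a cyberpunk lizard",
      steps: 4, // assumed parameter: number of diffusion steps
    })) as { image: string };
    return new Response(base64ToBytes(result.image), {
      headers: { "Content-Type": "image/jpeg" },
    });
  },
};

export default worker;
```

Returning the decoded bytes with an image content type lets a browser render the result directly instead of receiving JSON.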
Meta Llama 3.1 now available on Workers AI
Workers AI now supports Meta Llama 3.1.
New community-contributed tutorial
- Added a community-contributed tutorial on how to create APIs that recommend products on e-commerce sites using Workers AI and Stripe.
Introducing embedded function calling
- Added embedded function calling, a new way to do function calling
- Published the new `@cloudflare/ai-utils` npm package
- Open-sourced `ai-utils` on GitHub
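In embedded function calling, each tool carries its own implementation, and a loop runs the requested tools and feeds their results back to the model until it answers in text. The sketch below is a simplified, hypothetical illustration of that idea; it does not use or reproduce the `@cloudflare/ai-utils` API (whose helper is `runWithTools`), and the stub model, message shapes, and `add` tool are made up for the example.

```typescript
// A tool that carries its own implementation ("embedded").
interface EmbeddedTool {
  name: string;
  description: string;
  function: (args: Record<string, unknown>) => string;
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Assumed shape of one model step: tool calls or a final text answer.
interface ModelStep {
  tool_calls?: { name: string; arguments: Record<string, unknown> }[];
  response?: string;
}

type ModelFn = (messages: Message[]) => ModelStep;

// Call the model, run any requested tools, append their results as
// "tool" messages, and repeat until the model answers in text.
function runEmbeddedTools(
  model: ModelFn,
  messages: Message[],
  tools: EmbeddedTool[],
  maxSteps = 5,
): string {
  const history = [...messages];
  for (let step = 0; step < maxSteps; step++) {
    const result = model(history);
    if (!result.tool_calls || result.tool_calls.length === 0) {
      return result.response ?? "";
    }
    for (const call of result.tool_calls) {
      const tool = tools.find((t) => t.name === call.name);
      if (!tool) throw new Error(`Unknown tool: ${call.name}`);
      history.push({ role: "tool", content: tool.function(call.arguments) });
    }
  }
  throw new Error("Tool loop did not finish within maxSteps");
}

// Stub model: request the tool once, then answer using its output.
const stubModel: ModelFn = (history) => {
  const toolMsg = history.find((m) => m.role === "tool");
  return toolMsg
    ? { response: `The answer is ${toolMsg.content}` }
    : { tool_calls: [{ name: "add", arguments: { a: 2, b: 3 } }] };
};

const addTool: EmbeddedTool = {
  name: "add",
  description: "Add two numbers",
  function: (args) => String(Number(args.a) + Number(args.b)),
};

console.log(
  runEmbeddedTools(stubModel, [{ role: "user", content: "What is 2 + 3?" }], [addTool]),
); // prints "The answer is 5"
```

The `maxSteps` cap guards against a model that keeps requesting tools indefinitely.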
Added support for traditional function calling
- Function calling is now supported on enabled models
- Added properties on the models page to show which models support function calling
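With traditional function calling, you pass tool definitions alongside the messages and your own code executes whichever tool the model asks for. A minimal sketch, assuming JSON Schema style tool definitions and a response entry shaped like `{ name, arguments }`; the field names, the `getWeather` tool, and the stubbed handler are illustrative assumptions, so confirm the schema on the relevant model page.

```typescript
// Assumed tool definition shape (JSON Schema style parameters).
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

// Assumed shape of one tool call requested by the model.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Hypothetical tool: look up the weather for a city.
const getWeatherTool: ToolDefinition = {
  name: "getWeather",
  description: "Return the current weather for a city",
  parameters: {
    type: "object",
    properties: { city: { type: "string", description: "Name of the city" } },
    required: ["city"],
  },
};

// Inputs you would pass as the second argument of env.AI.run(model, inputs).
function buildToolInput(userPrompt: string, tools: ToolDefinition[]) {
  return {
    messages: [{ role: "user", content: userPrompt }],
    tools,
  };
}

// The model only requests a call; your code decides how to execute it.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  getWeather: (args) => `Sunny in ${String(args.city)}`, // stubbed result
};

function dispatchToolCall(call: ToolCall): string {
  if (!Object.prototype.hasOwnProperty.call(handlers, call.name)) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return handlers[call.name](call.arguments);
}
```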
Native support for AI Gateways
Workers AI now natively supports AI Gateway.
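A minimal sketch of routing a binding call through AI Gateway, assuming the third `options` argument of `env.AI.run` accepts a `gateway` object with an `id` and optional `skipCache`. The gateway ID `my-gateway` is a hypothetical placeholder for one you create yourself, and `AiBinding` is a local stand-in for the `Ai` type from `@cloudflare/workers-types`.

```typescript
// Local stand-in for the binding type, including the assumed options argument.
interface AiBinding {
  run(
    model: string,
    inputs: Record<string, unknown>,
    options?: { gateway?: { id: string; skipCache?: boolean; cacheTtl?: number } },
  ): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// Build the options object that selects a gateway (field names assumed).
function gatewayOptions(gatewayId: string) {
  return { gateway: { id: gatewayId, skipCache: false } };
}

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(
      "@cf/meta/llama-3.1-8b-instruct",
      { prompt: "What is AI Gateway?" },
      gatewayOptions("my-gateway"), // hypothetical gateway ID
    );
    return Response.json(result);
  },
};

export default worker;
```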
Deprecation announcement for `@cf/meta/llama-2-7b-chat-int8`
We will be deprecating `@cf/meta/llama-2-7b-chat-int8` on 2024-06-30.
Replace the model ID in your code with a new model of your choice:
- `@cf/meta/llama-3-8b-instruct` is the newest model in the Llama family (and is currently free for a limited time on Workers AI).
- `@cf/meta/llama-3-8b-instruct-awq` is the new Llama 3 in a similar precision to your currently selected model. This model is also currently free for a limited time.
If you do not switch to a different model by June 30th, we will automatically start returning inference from `@cf/meta/llama-3-8b-instruct-awq`.
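If your code reads model IDs from configuration, a small lookup can swap the deprecated ID before calling the model. A minimal sketch; `MODEL_REPLACEMENTS` and `resolveModelId` are hypothetical names, and the mapping uses the fallback model named in this entry.

```typescript
// Deprecated model ID mapped to the replacement named in this entry.
// A Map avoids accidental hits on Object.prototype keys.
const MODEL_REPLACEMENTS = new Map<string, string>([
  ["@cf/meta/llama-2-7b-chat-int8", "@cf/meta/llama-3-8b-instruct-awq"],
]);

// Return the replacement for a deprecated ID, otherwise the ID unchanged.
function resolveModelId(modelId: string): string {
  return MODEL_REPLACEMENTS.get(modelId) ?? modelId;
}

const configuredModel = "@cf/meta/llama-2-7b-chat-int8";
console.log(resolveModelId(configuredModel)); // prints "@cf/meta/llama-3-8b-instruct-awq"
```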
Add new public LoRAs and note on LoRA routing
- Added documentation on new public LoRAs.
- Noted that you can now run LoRA inference with the base model rather than explicitly calling the `-lora` version
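A minimal sketch of running LoRA inference against the base model ID. It assumes the adapter is selected with a `lora` field in the inputs; `baseModelId` and `loraName` are placeholders you supply (for example, one of the public LoRAs or your own upload), and `AiBinding` is a local stand-in for the binding type.

```typescript
// Local stand-in for the Workers AI binding type.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Inputs for a chat request routed through a LoRA adapter.
// Assumption: the adapter is chosen via a `lora` field.
function loraInputs(prompt: string, loraName: string): Record<string, unknown> {
  return {
    messages: [{ role: "user", content: prompt }],
    lora: loraName,
  };
}

// baseModelId is the plain model ID, without a "-lora" suffix.
async function runWithLora(
  ai: AiBinding,
  baseModelId: string,
  loraName: string,
  prompt: string,
): Promise<unknown> {
  return ai.run(baseModelId, loraInputs(prompt, loraName));
}
```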
Add OpenAI-compatible API endpoints
Added OpenAI-compatible API endpoints for `/v1/chat/completions` and `/v1/embeddings`. For more details, refer to Configurations.
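A minimal sketch of building a request for the OpenAI-compatible chat completions endpoint, assuming the base URL `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1` and bearer-token authentication. The account ID and API token are placeholders for your own credentials; `buildChatCompletionRequest` is a hypothetical helper name.

```typescript
// Build a POST request in the OpenAI chat completions format.
function buildChatCompletionRequest(
  accountId: string,
  apiToken: string,
  model: string,
  userMessage: string,
): Request {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`;
  return new Request(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userMessage }],
    }),
  });
}

// Usage (sends a real request, so it needs a valid account ID and token):
// const res = await fetch(buildChatCompletionRequest(ACCOUNT_ID, API_TOKEN, "@cf/meta/llama-3.1-8b-instruct", "Hello"));
// const data = await res.json();
```

Because the body follows the OpenAI format, existing OpenAI-style clients can often be pointed at this base URL instead of building requests by hand.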
Add AI native binding
- Added a new native AI binding; you can now run models with `const resp = await env.AI.run(modelName, inputs)`
- Deprecated the `@cloudflare/ai` npm package. While existing solutions using the `@cloudflare/ai` package will continue to work, no new Workers AI features will be supported. Moving to native AI bindings is highly recommended.
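A minimal Worker using the native binding, assuming the binding is named `AI` in your Wrangler configuration (`[ai]` with `binding = "AI"`). The local `AiBinding` interface stands in for the `Ai` type from `@cloudflare/workers-types`, and the prompt is illustrative.

```typescript
// Local stand-in for the Workers AI binding type.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

export interface Env {
  AI: AiBinding;
}

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // Same call shape as above: env.AI.run(modelName, inputs)
    const resp = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "Tell me a joke about Cloudflare",
    });
    return Response.json(resp);
  },
};

export default worker;
```

Unlike the deprecated `@cloudflare/ai` package, no client object is constructed; the binding is injected on `env` by the runtime.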