Models

deepseek-coder-6.7b-instruct-awq (Beta)

DeepSeek Coder is a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.

deepseek-r1-distill-qwen-32b

Text Generation • deepseek-ai

DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

detr-resnet-50 (Beta)

Object Detection • facebook

DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).

discolm-german-7b-v1-awq (Beta)

DiscoLM German 7b is a Mistral-based large language model with a focus on German-language applications. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

distilbert-sst-2-int8

Text Classification • HuggingFace

Distilled BERT model that was fine-tuned on SST-2 for sentiment classification.

dreamshaper-8-lcm (Beta)

Text-to-Image • lykon

Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

flux-1-schnell

Text-to-Image • black-forest-labs

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

falcon-7b-instruct (Beta)

Text Generation • tiiuae

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets.

gemma-2b-it-lora (Beta)

Text Generation • Google

This is a Gemma-2B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

LoRA

gemma-7b-it-lora (Beta)

Text Generation • Google

This is a Gemma-7B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

LoRA

gemma-7b-it (Beta)

Text Generation • Google

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.

LoRA

hermes-2-pro-mistral-7b (Beta)

Text Generation • nousresearch

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Function calling

llama-2-13b-chat-awq (Beta)

Llama 2 13B Chat AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Llama 2 variant.

llama-2-7b-chat-fp16

Full precision (fp16) generative text model with 7 billion parameters from Meta.

llama-2-7b-chat-hf-lora (Beta)

Text Generation • meta-llama

This is a Llama2 base model that Cloudflare dedicated for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

LoRA

llama-2-7b-chat-int8

Quantized (int8) generative text model with 7 billion parameters from Meta

llama-3-8b-instruct-awq

Quantized (int4) generative text model with 8 billion parameters from Meta.

llama-3-8b-instruct

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

llama-3.1-70b-instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

llama-3.1-8b-instruct-awq

Quantized (int4) generative text model with 8 billion parameters from Meta.

llama-3.1-8b-instruct-fast

[Fast version] The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

llama-3.1-8b-instruct-fp8

Llama 3.1 8B quantized to FP8 precision.

llama-3.1-8b-instruct

llama-3.2-11b-vision-instruct

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

llama-3.2-1b-instruct

The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

llama-3.2-3b-instruct

The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

llama-3.3-70b-instruct-fp8-fast

Llama 3.3 70B quantized to fp8 precision, optimized to be faster.

llamaguard-7b-awq (Beta)

Text Generation • meta-llama

Llama Guard is a model for classifying the safety of LLM prompts and responses, using a taxonomy of safety risks.

m2m100-1.2b

Translation • Meta

Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation

meta-llama-3-8b-instruct

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

llava-1.5-7b-hf (Beta)

Image-to-Text • llava-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

mistral-7b-instruct-v0.1-awq (Beta)

Text Generation • MistralAI

Mistral 7B Instruct v0.1 AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Mistral variant.

mistral-7b-instruct-v0.1

Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters

LoRA

mistral-7b-instruct-v0.2-lora (Beta)

Text Generation • MistralAI

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

LoRA

mistral-7b-instruct-v0.2 (Beta)

Text Generation • MistralAI

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1: 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no Sliding-Window Attention.

LoRA

neural-chat-7b-v3-1-awq (Beta)

This model is a 7B-parameter LLM fine-tuned on the Intel Gaudi 2 processor from mistralai/Mistral-7B-v0.1 using the open-source dataset Open-Orca/SlimOrca.

openchat-3.5-0106 (Beta)

Text Generation • openchat

OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning.

openhermes-2.5-mistral-7b-awq (Beta)

OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune, a continuation of the OpenHermes 2 model, trained on additional code datasets.

phi-2 (Beta)

Text Generation • Microsoft

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of synthetic and web datasets for NLP and coding.

qwen1.5-0.5b-chat (Beta)

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.

qwen1.5-1.8b-chat (Beta)

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.

qwen1.5-14b-chat-awq (Beta)

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

qwen1.5-7b-chat-awq (Beta)

resnet-50

Image Classification • Microsoft

50-layer deep image classification CNN trained on more than 1M images from ImageNet.

sqlcoder-7b-2 (Beta)

Text Generation • defog

This model is intended to be used by non-technical users to understand data inside their SQL databases.

stable-diffusion-v1-5-img2img (Beta)

Text-to-Image • runwayml

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generates a new image from an input image with Stable Diffusion.

stable-diffusion-v1-5-inpainting (Beta)

Text-to-Image • runwayml

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

stable-diffusion-xl-base-1.0 (Beta)

Text-to-Image • Stability.ai

Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts.

stable-diffusion-xl-lightning (Beta)

Text-to-Image • bytedance

SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

starling-lm-7b-beta (Beta)

Text Generation • nexusflow

We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and policy optimization method Fine-Tuning Language Models from Human Preferences (PPO).

tinyllama-1.1b-chat-v1.0 (Beta)

Text Generation • tinyllama

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.

uform-gen2-qwen-500m (Beta)

Image-to-Text • unum

UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.

una-cybertron-7b-v2-bf16 (Beta)

Text Generation • fblgit

Cybertron 7B v2 is a 7B MistralAI-based model, the best in its series. It was trained with SFT, DPO and UNA (Unified Neural Alignment) on multiple datasets.

whisper-large-v3-turbo (Beta)

Automatic Speech Recognition • OpenAI

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

whisper-tiny-en (Beta)

Automatic Speech Recognition • OpenAI

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.

whisper

Automatic Speech Recognition • OpenAI

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

zephyr-7b-beta-awq (Beta)