zephyr-7b-beta-awq Beta
Text Generation • theblokeZephyr 7B Beta AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Zephyr model variant.
Playground
Try out this model with Workers AI LLM Playground. It does not require any setup or authentication and an instant way to preview and test a model directly in the browser.
Launch the LLM PlaygroundUsage
Worker - Streaming
Worker
Python
curl
Parameters
Input
- Prompt object
-
prompt
string min 1 max 131072The input text prompt for the model to generate a response.
-
image
one of-
0
arrayAn array of integers that represent the image data constrained to 8-bit unsigned integer values
-
items
numberA value between 0 and 255
-
-
1
stringBinary string representing the image contents.
-
-
raw
booleanIf true, a chat template is not applied and you must adhere to the specific model's expected formatting.
-
stream
booleanIf true, the response will be streamed back incrementally using SSE, Server Sent Events.
-
max_tokens
integer default 256The maximum number of tokens to generate in the response.
-
temperature
number default 0.6 min 0 max 5Controls the randomness of the output; higher values produce more random results.
-
top_p
number min 0 max 2Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
-
top_k
integer min 1 max 50Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
-
seed
integer min 1 max 9999999999Random seed for reproducibility of the generation.
-
repetition_penalty
number min 0 max 2Penalty for repeated tokens; higher values discourage repetition.
-
frequency_penalty
number min 0 max 2Decreases the likelihood of the model repeating the same lines verbatim.
-
presence_penalty
number min 0 max 2Increases the likelihood of the model introducing new topics.
-
lora
stringName of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
-
- Messages object
-
messages
arrayAn array of message objects representing the conversation history.
-
items
object-
role
stringThe role of the message sender (e.g., 'user', 'assistant', 'system', 'tool').
-
content
string max 131072The content of the message as a string.
-
-
-
image
one of-
0
arrayAn array of integers that represent the image data constrained to 8-bit unsigned integer values
-
items
numberA value between 0 and 255
-
-
1
stringBinary string representing the image contents.
-
-
functions
array-
items
object-
name
string -
code
string
-
-
-
tools
arrayA list of tools available for the assistant to use.
-
items
one of-
0
object-
name
stringThe name of the tool. More descriptive the better.
-
description
stringA brief description of what the tool does.
-
parameters
objectSchema defining the parameters accepted by the tool.
-
type
stringThe type of the parameters object (usually 'object').
-
required
arrayList of required parameter names.
-
items
string
-
-
properties
objectDefinitions of each parameter.
-
additionalProperties
object-
type
stringThe data type of the parameter.
-
description
stringA description of the expected parameter.
-
-
-
-
-
1
object-
type
stringSpecifies the type of tool (e.g., 'function').
-
function
objectDetails of the function tool.
-
name
stringThe name of the function.
-
description
stringA brief description of what the function does.
-
parameters
objectSchema defining the parameters accepted by the function.
-
type
stringThe type of the parameters object (usually 'object').
-
required
arrayList of required parameter names.
-
items
string
-
-
properties
objectDefinitions of each parameter.
-
additionalProperties
object-
type
stringThe data type of the parameter.
-
description
stringA description of the expected parameter.
-
-
-
-
-
-
-
-
stream
booleanIf true, the response will be streamed back incrementally.
-
max_tokens
integer default 256The maximum number of tokens to generate in the response.
-
temperature
number default 0.6 min 0 max 5Controls the randomness of the output; higher values produce more random results.
-
top_p
number min 0 max 2Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
-
top_k
integer min 1 max 50Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
-
seed
integer min 1 max 9999999999Random seed for reproducibility of the generation.
-
repetition_penalty
number min 0 max 2Penalty for repeated tokens; higher values discourage repetition.
-
frequency_penalty
number min 0 max 2Decreases the likelihood of the model repeating the same lines verbatim.
-
presence_penalty
number min 0 max 2Increases the likelihood of the model introducing new topics.
-
Output
-
0
object-
response
stringThe generated text response from the model
-
tool_calls
arrayAn array of tool calls requests made during the response generation
-
items
object-
arguments
objectThe arguments passed to be passed to the tool call request
-
name
stringThe name of the tool to be called
-
-
-
-
1
string
API Schemas
The following schemas are based on JSON Schema