Chat and reasoning models

Developer documentation

Chat and reasoning models

Language, code, reasoning, multimodal chat, tool calling, and streaming responses.

Model Reference

Chat and reasoning models

Language, code, reasoning, multimodal chat, tool calling, and streaming responses. Endpoint: https://www.omixa.cloud/api/v1/chat/completions

Antigravity Agent Preview

antigravity-agent-preview

Antigravity Agent Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.250000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

AQA

aqa

AQA for text generation, reasoning, tool calling, and live streaming responses.

Chat Context window: 7,168 tokens Max output: 1,024 tokens
minimum hold $0.010000
Integration docs

Claude Haiku 4.5

claude-haiku-4-5

Claude Haiku 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $5.000000
Integration docs

Claude Opus 4.1

claude-opus-4-1

Claude Opus 4.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 32,000 tokens
input per 1m tokens $15.000000
cached input per 1m tokens $1.500000
output per 1m tokens $75.000000
Integration docs

Claude Opus 4.5

claude-opus-4-5

Claude Opus 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.6

claude-opus-4-6

Claude Opus 4.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.7

claude-opus-4-7

Claude Opus 4.7 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Opus 4.8

claude-opus-4-8

Claude Opus 4.8 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $25.000000
Integration docs

Claude Sonnet 4.5

claude-sonnet-4-5

Claude Sonnet 4.5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 200,000 tokens Max output: 64,000 tokens
input per 1m tokens $3.000000
cached input per 1m tokens $0.300000
output per 1m tokens $15.000000
Integration docs

Claude Sonnet 4.6

claude-sonnet-4-6

Claude Sonnet 4.6 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,000,000 tokens Max output: 128,000 tokens
input per 1m tokens $3.000000
cached input per 1m tokens $0.300000
output per 1m tokens $15.000000
Integration docs

Computer Use Preview

computer-use-preview

Computer Use Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.250000
output per 1m tokens $10.000000
minimum hold $0.010000
Integration docs

DeepSeek OCR

DeepSeek-OCR

DeepSeek OCR through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Context window: 32,768 tokens Max output: 8,192 tokens
input per 1m tokens $0.560000
output per 1m tokens $1.680000
minimum hold $0.010000
Integration docs

DeepSeek R1 0528

DeepSeek-R1-0528

DeepSeek R1 0528 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 163,840 tokens Max output: 32,768 tokens
input per 1m tokens $1.350000
output per 1m tokens $5.400000
minimum hold $0.010000
Integration docs

DeepSeek V3.1

DeepSeek-V3.1

DeepSeek V3.1 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 32,768 tokens
input per 1m tokens $1.230000
output per 1m tokens $4.940000
minimum hold $0.010000
Integration docs

DeepSeek V3.2

DeepSeek-V3.2

DeepSeek V3.2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 163,840 tokens Max output: 65,536 tokens
input per 1m tokens $0.560000
cached input per 1m tokens $0.056000
output per 1m tokens $1.680000
Integration docs

Gemini 2.0 Flash

gemini-2.0-flash

Gemini 2.0 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

Gemini 2.0 Flash-Lite

gemini-2.0-flash-lite

Gemini 2.0 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.075000
output per 1m tokens $0.300000
minimum hold $0.010000
Integration docs

Gemini 2.5 Flash

gemini-2.5-flash

Gemini 2.5 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.300000
cached input per 1m tokens $0.030000
output per 1m tokens $2.500000
Integration docs

Gemini 2.5 Flash-Lite

gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.100000
cached input per 1m tokens $0.010000
output per 1m tokens $0.400000
Integration docs

Gemini 2.5 Pro

gemini-2.5-pro

Gemini 2.5 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

Gemini 3 Flash Preview

gemini-3-flash-preview

Gemini 3 Flash Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.500000
cached input per 1m tokens $0.050000
output per 1m tokens $3.000000
Integration docs

Gemini 3.1 Flash-Lite

gemini-3.1-flash-lite

Gemini 3.1 Flash-Lite for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.250000
cached input per 1m tokens $0.025000
output per 1m tokens $1.500000
Integration docs

Gemini 3.1 Pro Preview

gemini-3.1-pro-preview

Gemini 3.1 Pro Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini 3.1 Pro Preview Custom Tools

gemini-3.1-pro-preview-customtools

Gemini 3.1 Pro Preview Custom Tools for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini 3.5 Flash

gemini-3.5-flash

Gemini 3.5 Flash for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $1.500000
cached input per 1m tokens $0.150000
output per 1m tokens $9.000000
Integration docs

Gemini Deep Research Max Preview

gemini-deep-research-max-preview

Gemini Deep Research Max Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini Deep Research Preview

gemini-deep-research-preview

Gemini Deep Research Preview for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $2.000000
cached input per 1m tokens $0.200000
output per 1m tokens $12.000000
Integration docs

Gemini Flash Latest

gemini-flash-latest

Gemini Flash Latest for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 65,536 tokens
input per 1m tokens $0.500000
cached input per 1m tokens $0.050000
output per 1m tokens $3.000000
Integration docs

Gemini Robotics-ER 1.6 Preview

gemini-robotics-er-1.6-preview

Gemini Robotics-ER 1.6 Preview for text generation, reasoning, tool calling, and streaming responses.

Chat Streaming Tools Streaming supported Tool/function calling supported
input per 1m tokens $1.000000
output per 1m tokens $5.000000
minimum hold $0.010000
Integration docs

GLM 4.7

glm-4.7

GLM 4.7 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 200,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $3.200000
Integration docs

GLM 5

glm-5

GLM 5 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 200,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.000000
cached input per 1m tokens $0.100000
output per 1m tokens $3.200000
Integration docs

GPT Chat Latest

gpt-chat-latest

GPT Chat Latest for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 128,000 tokens Max output: 16,384 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $30.000000
Integration docs

GPT OSS 120B

gpt-oss-120b

GPT OSS 120B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $0.150000
output per 1m tokens $0.600000
minimum hold $0.010000
Integration docs

GPT OSS 20B

gpt-oss-20b

GPT OSS 20B through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $0.070000
output per 1m tokens $0.300000
minimum hold $0.010000
Integration docs

GPT-5

gpt-5

GPT-5 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.1

gpt-5.1

GPT-5.1 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.125000
output per 1m tokens $10.000000
Integration docs

GPT-5.3 Codex

gpt-5.3-codex

GPT-5.3 Codex for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 400,000 tokens Max output: 128,000 tokens
input per 1m tokens $1.750000
cached input per 1m tokens $0.175000
output per 1m tokens $14.000000
Integration docs

GPT-5.4

gpt-5.4

GPT-5.4 for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 922,000 tokens Max output: 128,000 tokens
input per 1m tokens $2.500000
cached input per 1m tokens $0.250000
output per 1m tokens $15.000000
Integration docs

GPT-5.4 Mini

gpt-5.4-mini

GPT-5.4 Mini for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 272,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.750000
cached input per 1m tokens $0.075000
output per 1m tokens $4.500000
Integration docs

GPT-5.4 Pro

gpt-5.4-pro

GPT-5.4 Pro for text generation, reasoning, tool calling, and live streaming responses.

Chat Streaming Tools Context window: 1,050,000 tokens Max output: 128,000 tokens
input per 1m tokens $30.000000
output per 1m tokens $180.000000
minimum hold $0.010000
Integration docs

GPT-5.5

gpt-5.5

GPT-5.5 for language generation, reasoning, tool calling, and streaming chat responses.

Chat Streaming Tools Context window: 922,000 tokens Max output: 128,000 tokens
input per 1m tokens $5.000000
cached input per 1m tokens $0.500000
output per 1m tokens $30.000000
Integration docs

Grok 4.1 Fast (Non-Reasoning)

grok-4.1-fast-non-reasoning

Grok 4.1 Fast (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.050000
output per 1m tokens $0.500000
Integration docs

Grok 4.1 Fast (Reasoning)

grok-4.1-fast-reasoning

Grok 4.1 Fast (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 128,000 tokens Max output: 128,000 tokens
input per 1m tokens $0.200000
cached input per 1m tokens $0.050000
output per 1m tokens $0.500000
Integration docs

Grok 4.20 (Non-Reasoning)

grok-4-20-non-reasoning

Grok 4.20 (Non-Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 2,000,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Grok 4.20 (Reasoning)

grok-4-20-reasoning

Grok 4.20 (Reasoning) through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 2,000,000 tokens Max output: 8,192 tokens
input per 1m tokens $1.250000
cached input per 1m tokens $0.200000
output per 1m tokens $2.500000
Integration docs

Kimi K2 Thinking

Kimi-K2-Thinking

Kimi K2 Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 65,536 tokens
input per 1m tokens $1.045000
cached input per 1m tokens $0.176000
output per 1m tokens $4.400000
Integration docs

Meta Llama 3 405B Instruct

llama-3-405b-instruct

Meta Llama 3 405B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $2.700000
output per 1m tokens $2.700000
minimum hold $0.010000
Integration docs

Meta Llama 3 70B Instruct

llama-3-70b-instruct

Meta Llama 3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 8,192 tokens Max output: 8,192 tokens
input per 1m tokens $0.710000
output per 1m tokens $0.710000
minimum hold $0.010000
Integration docs

Meta Llama 3 8B Instruct

llama-3-8b-instruct

Meta Llama 3 8B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 8,192 tokens Max output: 8,192 tokens
input per 1m tokens $0.200000
output per 1m tokens $0.200000
minimum hold $0.010000
Integration docs

Meta Llama 3.2 90B Instruct

llama-3.2-90b-instruct

Meta Llama 3.2 90B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.900000
output per 1m tokens $0.900000
minimum hold $0.010000
Integration docs

Meta Llama 3.3 70B Instruct

Llama-3.3-70B-Instruct

Meta Llama 3.3 70B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 131,072 tokens Max output: 8,192 tokens
input per 1m tokens $0.710000
output per 1m tokens $0.710000
minimum hold $0.010000
Integration docs

Meta Llama 4 Maverick Instruct

Llama-4-Maverick-17B-128E-Instruct-FP8

Meta Llama 4 Maverick Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 524,288 tokens Max output: 8,192 tokens
input per 1m tokens $0.350000
output per 1m tokens $1.150000
minimum hold $0.010000
Integration docs

Meta Llama 4 Scout Instruct

llama-4-scout-17b-16e-instruct

Meta Llama 4 Scout Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 1,048,576 tokens Max output: 8,192 tokens
input per 1m tokens $0.180000
output per 1m tokens $0.590000
minimum hold $0.010000
Integration docs

MiniMax M2

MiniMax-M2

MiniMax M2 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 196,608 tokens Max output: 196,608 tokens
input per 1m tokens $0.300000
cached input per 1m tokens $0.030000
output per 1m tokens $1.200000
Integration docs

Qwen3 235B A22B Instruct 2507

qwen3-235b-a22b-instruct-2507

Qwen3 235B A22B Instruct 2507 through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Coder 480B A35B Instruct

qwen3-coder-480b-a35b-instruct

Qwen3 Coder 480B A35B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Next 80B A3B Instruct

qwen3-next-80b-a3b-instruct

Qwen3 Next 80B A3B Instruct through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs

Qwen3 Next 80B A3B Thinking

qwen3-next-80b-a3b-thinking

Qwen3 Next 80B A3B Thinking through Google Vertex AI Model Garden/MaaS with Omixa routing and streaming.

Chat Streaming Tools Context window: 262,144 tokens Max output: 65,536 tokens
input per 1m tokens $0.220000
cached input per 1m tokens $0.022000
output per 1m tokens $1.800000
Integration docs
Copied Markdown