LLM Configuration

Control model selection, sampling parameters, and custom tool schemas to shape how your agent reasons and acts.

Available models

llama-3.3-70b-versatile

Default

Groq-hosted Llama 3.3 70B. Near-frontier reasoning with very low time-to-first-token, which keeps end-to-end voice latency around a second. Recommended for production.

gpt-4o

Premium

OpenAI GPT-4o. Highest quality on complex reasoning, function calling, and vision. Available on Pro and Custom plans.

gpt-4o-mini

Fast

Faster and cheaper than GPT-4o. Good for simple scripts or high-volume outbound calling, where throughput matters more than nuance.
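The choice between these three models can be summarized as a small decision function. This is a hypothetical helper, not part of Talknex. The lowercase plan keys and the "starter" plan name used below are placeholders, and this page does not say which configuration field holds the model id.

```python
# Model identifiers as listed on this page.
DEFAULT_MODEL = "llama-3.3-70b-versatile"
PREMIUM_MODEL = "gpt-4o"
FAST_MODEL = "gpt-4o-mini"

# Plans on which gpt-4o is available. Lowercase plan keys are an assumption.
PREMIUM_PLANS = {"pro", "custom"}


def choose_model(plan: str, needs_complex_reasoning: bool = False,
                 high_volume_simple: bool = False) -> str:
    """Hypothetical helper (not part of Talknex) that picks a model id."""
    # gpt-4o only when the task needs it and the plan includes it.
    if needs_complex_reasoning and plan.lower() in PREMIUM_PLANS:
        return PREMIUM_MODEL
    # gpt-4o-mini for simple, high-volume scripts.
    if high_volume_simple:
        return FAST_MODEL
    # Everything else, including complex tasks on other plans, uses the default.
    return DEFAULT_MODEL


print(choose_model("pro", needs_complex_reasoning=True))      # gpt-4o
print(choose_model("starter", needs_complex_reasoning=True))  # llama-3.3-70b-versatile
print(choose_model("starter", high_volume_simple=True))       # gpt-4o-mini
```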

Sampling parameters

llmTemperature

number 0.0–2.0

Controls randomness. Use around 0.2 for factual, scripted agents (support, bookings) and 0.7–1.0 for conversational, creative personas (sales, concierge).

llmMaxTokens

integer

Maximum number of tokens in each agent response. Default 200. Keep this low: long responses add TTS latency. Agents rarely need more than 150 tokens per turn.
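A back-of-the-envelope estimate shows why the token budget matters for voice. The sketch converts a token count into approximate speaking time. Both conversion factors are common rules of thumb for English, not Talknex figures, and actual values vary with language, voice, and tokenizer.

```python
# Rules of thumb, not Talknex figures: both vary with language, voice, and tokenizer.
WORDS_PER_TOKEN = 0.75   # roughly three words per four tokens of English text
WORDS_PER_MINUTE = 150   # a typical conversational speaking rate


def estimated_speech_seconds(tokens: int) -> float:
    """Approximate playback time of a spoken reply that is `tokens` tokens long."""
    words = tokens * WORDS_PER_TOKEN
    return words / WORDS_PER_MINUTE * 60


for budget in (50, 150, 200):
    print(f"{budget} tokens ~ {estimated_speech_seconds(budget):.0f} s of speech")
# 50 tokens ~ 15 s of speech
# 150 tokens ~ 45 s of speech
# 200 tokens ~ 60 s of speech
```

Under these assumptions, a reply that uses the full default budget keeps the caller listening for about a minute.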

llmTopP

number 0.0–1.0

Nucleus sampling threshold. Default 1.0 (disabled). Leave it at the default unless you have a specific reason to change it.
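The three sampling fields can be checked together before you save an agent. The sketch below keeps the field names from this page and enforces the documented ranges and defaults. llmTemperature has no default documented here, so the sketch requires it. The positive lower bound on llmMaxTokens, the preset names, and treating these fields as one payload are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class SamplingConfig:
    """Sampling fields named as on this page (camelCase kept to match).

    Ranges and the llmMaxTokens/llmTopP defaults come from this page.
    llmTemperature has no documented default here, so it is required.
    """
    llmTemperature: float
    llmMaxTokens: int = 200
    llmTopP: float = 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.llmTemperature <= 2.0:
            raise ValueError("llmTemperature must be between 0.0 and 2.0")
        # bool is a subclass of int in Python, so reject it explicitly.
        if (isinstance(self.llmMaxTokens, bool)
                or not isinstance(self.llmMaxTokens, int)
                or self.llmMaxTokens < 1):
            raise ValueError("llmMaxTokens must be a positive integer")
        if not 0.0 <= self.llmTopP <= 1.0:
            raise ValueError("llmTopP must be between 0.0 and 1.0")


# Starting points based on the guidance above (hypothetical preset names).
PRESETS = {
    "support": SamplingConfig(llmTemperature=0.2, llmMaxTokens=150),
    "sales": SamplingConfig(llmTemperature=0.8, llmMaxTokens=150),
}

print(asdict(PRESETS["support"]))
# {'llmTemperature': 0.2, 'llmMaxTokens': 150, 'llmTopP': 1.0}
```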

Custom function tools

Custom tools let you extend agent behavior beyond the built-in book_appointment and transfer_to_human tools. When the LLM decides to call a custom tool, Talknex POSTs the arguments to your configured webhook URL and injects the response back into the conversation.

Example tool definition (the parameters field uses JSON Schema):

{
  "name": "lookup_order",
  "description": "Look up an order by order number and return shipping status.",
  "parameters": {
    "type": "object",
    "properties": {
      "orderNumber": {
        "type": "string",
        "description": "The order number the caller provided, e.g. #82144"
      }
    },
    "required": ["orderNumber"]
  },
  "webhookUrl": "https://your-api.com/webhooks/talknex/lookup-order"
}
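The model fills in tool arguments from the conversation, so your webhook may want to check them against the same parameters schema before using them. This page does not say whether Talknex validates arguments before POSTing them. The sketch checks only the required and type keywords, a deliberately small subset of JSON Schema; the names JSON_TYPES and check_arguments are hypothetical.

```python
# JSON Schema "type" keywords mapped to Python types. This is a deliberately tiny
# subset of JSON Schema; a full validator such as the jsonschema package covers more.
JSON_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}

# The "parameters" object from the lookup_order definition above.
LOOKUP_ORDER_PARAMETERS = {
    "type": "object",
    "properties": {
        "orderNumber": {
            "type": "string",
            "description": "The order number the caller provided, e.g. #82144",
        }
    },
    "required": ["orderNumber"],
}


def check_arguments(args: dict, schema: dict) -> list[str]:
    """Return problems found in `args`; an empty list means none were detected."""
    problems = []
    # Every name listed in "required" must be present.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    # Present arguments must match their declared type. Extra arguments are
    # ignored, matching JSON Schema's default of allowing additional properties.
    for name, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if name in args and expected is not None and not isinstance(args[name], expected):
            problems.append(f"{name} must be of type {spec['type']}")
    return problems


print(check_arguments({"orderNumber": "#82144"}, LOOKUP_ORDER_PARAMETERS))  # []
print(check_arguments({}, LOOKUP_ORDER_PARAMETERS))
# ['missing required argument: orderNumber']
print(check_arguments({"orderNumber": 82144}, LOOKUP_ORDER_PARAMETERS))
# ['orderNumber must be of type string']
```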

Expected webhook response:

{
  "result": "Order #82144 shipped 2026-05-04. Expected delivery: 2026-05-07. Carrier: FedEx. Tracking: 794644792798."
}
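A webhook for lookup_order only needs to accept the POST and reply with a JSON object containing a result string. The sketch below uses Python's standard-library http.server. It assumes the request body is the tool arguments as a JSON object (for example {"orderNumber": "#82144"}); this page does not specify the exact payload or any request-signing scheme, so neither is handled. The ORDERS data and all function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for your order system.
ORDERS = {
    "82144": {"shipped": "2026-05-04", "eta": "2026-05-07",
              "carrier": "FedEx", "tracking": "794644792798"},
}


def lookup_order(args: dict) -> dict:
    """Build the {"result": ...} body for one lookup_order call."""
    # Accept "#82144", "82144", or " 82144 " from the model.
    number = str(args.get("orderNumber", "")).strip().lstrip("#")
    if not number:
        return {"result": "No order number was provided."}
    order = ORDERS.get(number)
    if order is None:
        return {"result": f"No order found with number #{number}."}
    return {"result": (
        f"Order #{number} shipped {order['shipped']}. "
        f"Expected delivery: {order['eta']}. "
        f"Carrier: {order['carrier']}. Tracking: {order['tracking']}."
    )}


class ToolWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: the request body is the tool arguments as a JSON object.
        length = int(self.headers.get("Content-Length", 0))
        try:
            args = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_error(400, "Body must be JSON")
            return
        if not isinstance(args, dict):
            self.send_error(400, "Body must be a JSON object")
            return
        body = json.dumps(lookup_order(args)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the webhook locally; put it behind HTTPS before using it as webhookUrl."""
    HTTPServer(("0.0.0.0", port), ToolWebhookHandler).serve_forever()
```

Calling serve() starts a blocking local server. The lookup logic lives in a plain function so it can be tested without any HTTP traffic.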

Warning: Tool webhooks must respond within 3 seconds. Otherwise, the agent falls back to a generic "I couldn't retrieve that information" response.
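One way to stay inside the 3-second limit is to put your own, shorter deadline around the slow part of the handler and return a more specific result string yourself, instead of letting the call time out. The sketch uses concurrent.futures from the standard library. The 0.5-second margin, the fallback wording, and the names with_deadline and slow_backend are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

TALKNEX_LIMIT_S = 3.0                       # documented response limit for tool webhooks
DEFAULT_BUDGET_S = TALKNEX_LIMIT_S - 0.5    # assumption: leave headroom for network time

_pool = ThreadPoolExecutor(max_workers=8)


def with_deadline(fn: Callable[[dict], dict], args: dict, fallback: str,
                  budget_s: float = DEFAULT_BUDGET_S) -> dict:
    """Run fn(args); if it is too slow or fails, answer with your own fallback."""
    future = _pool.submit(fn, args)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        # The worker thread keeps running in the background; only the reply is cut short.
        return {"result": fallback}
    except Exception:
        # A failing backend also gets a specific fallback instead of an HTTP error.
        return {"result": fallback}


def slow_backend(args: dict) -> dict:
    """Stand-in for a backend call that takes too long (demonstration only)."""
    time.sleep(1.0)
    return {"result": "too late"}


print(with_deadline(slow_backend, {}, "Our order system is slow right now.", budget_s=0.2))
# {'result': 'Our order system is slow right now.'}
```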

Prompt engineering tips

Be explicit about when to use tools

The LLM needs clear cues. An explicit instruction such as "When the caller gives their order number, call lookup_order immediately" works better than expecting the model to infer when a tool applies.

Constrain output length in the prompt

Add an instruction such as "Keep all responses under 30 words" or "Be brief" to the system prompt. Unconstrained LLMs tend toward long replies, which take longer to speak.

Use few-shot examples for edge cases

Append 2–3 example exchanges to the system prompt covering calls you know are tricky. For specific patterns, a few examples work better than lengthy instructions.
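The three prompt tips above (explicit tool cues, a length constraint, and a few examples) can be combined when assembling a system prompt. This is a hypothetical helper, and the store persona and example exchange are made up for illustration.

```python
def build_system_prompt(role: str, tool_cues: list[str], max_words: int,
                        examples: list[tuple[str, str]]) -> str:
    """Hypothetical helper that assembles a system prompt from the tips above."""
    lines = [role, ""]
    # Explicit cues telling the model when to call each tool.
    lines += tool_cues
    # A hard length constraint keeps spoken replies short.
    lines.append(f"Keep all responses under {max_words} words.")
    # A few worked examples for known tricky calls.
    if examples:
        lines += ["", "Examples:"]
        for caller, agent in examples:
            lines += [f"Caller: {caller}", f"Agent: {agent}"]
    return "\n".join(lines)


prompt = build_system_prompt(
    role="You are the phone assistant for an online store.",
    tool_cues=["When the caller gives their order number, call lookup_order immediately."],
    max_words=30,
    examples=[("Where's my stuff? It's order 82144.",
               "Let me check order 82144 for you.")],
)
print(prompt)
```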

Separate facts from behavior

Keep product and policy facts in the knowledge base (RAG), not in the system prompt. Shorter prompts mean lower latency and easier updates.
