Voice API

Audio-in, audio-out astrology. Send a spoken question, receive a spoken answer — one API call handles speech recognition, AI reasoning, and speech synthesis.

Production-ready. Both /api/v1/voice (buffered) and /api/v1/voice/stream (SSE streaming) are live at api.vedika.io.

How It Works

1. Upload audio — POST a multipart form with the caller's recorded audio (webm, mp3, wav, m4a, or ogg; max 25 MB).
2. Speech-to-text — Vedika AI transcribes the audio and detects the spoken language automatically.
3. AI astrology — The transcribed question is routed through the Vedika Intelligence pipeline (fast mode) with birth chart context, yielding a personalized astrology response.
4. Text-to-speech — The response is synthesized into natural-sounding speech (MP3) in the detected language.
5. Audio response — The MP3 binary is returned directly (or streamed as base64 SSE events on the streaming endpoint).

Endpoints

POST /api/v1/voice

Buffered voice query. Accepts multipart audio, returns a complete MP3 file as the response body (audio/mpeg).

Best for: mobile apps, IVR systems, any client that plays audio after full download.

POST /api/v1/voice/stream

Streaming voice query (SSE). Same multipart input, but the response is a text/event-stream that emits audio chunks as they are generated — sub-200ms time-to-first-audio on the Jarvis tier.

Best for: real-time voice assistants, conversational UIs, and any client that can play audio progressively.

Authentication

Authenticate with your Vedika API key using either method:

Authorization: Bearer vk_live_your_api_key

or

x-api-key: vk_live_your_api_key

Voice requires a live API key (vk_live_* or vk_ent_*). Test keys (vk_test_*) are not accepted. A minimum wallet balance of $0.15 is required per call.
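Either authentication header can be built like this — a minimal sketch; the key values shown are placeholders:

```python
# Two equivalent ways to attach a Vedika API key to a request.
def bearer_headers(api_key: str) -> dict:
    """Authorization: Bearer <key> style."""
    return {"Authorization": f"Bearer {api_key}"}

def api_key_headers(api_key: str) -> dict:
    """x-api-key: <key> style."""
    return {"x-api-key": api_key}
```

Pass the resulting dict as the `headers` argument to your HTTP client, as in the examples below.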

Request Format

Content-Type: multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| audio | File | required | Audio file. Max 25 MB. Accepted formats: webm, mp3, wav, m4a, ogg. |
| birthDetails | JSON string | optional | Birth data for personalized chart-based answers. Also accepts birth_details (snake_case alias). |
| language | String | optional | Hint for speech recognition. ISO 639-1 code: en, hi, ta, te, kn, ml, bn, gu, mr, etc. Auto-detected if omitted. |
| speed | String | optional | Must be "fast" or omitted. Voice only supports fast mode. Sending "standard" returns a 400 error. |
| tier | String | optional | Voice quality tier. One of: cheap-hi, premium-hi, premium-en, premium-multi, premium-gpt, jarvis. Auto-selected by language if omitted. See Voice Tiers. |
| conversationId | String | optional | Pass a previous conversation ID for multi-turn follow-up questions. The AI will reference prior context. |
| signal | String | optional | Routing hint. "b2c" or "free" forces the budget tier. "gpt" selects steerable multi-voice synthesis. "jarvis" selects the real-time streaming tier. |

Example birthDetails value:

{"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}

Voice Tiers

Each tier balances cost, latency, and audio quality. If tier is omitted, Vedika auto-selects based on the detected language.

| Tier | Public Label | Languages | Approx. Cost | Best For |
| --- | --- | --- | --- | --- |
| cheap-hi | vedika-voice-cheap | Hindi, English | ~$0.02/min | High-volume B2C, budget apps |
| premium-hi | vedika-voice-premium | Hindi, Hinglish, English | ~$0.30/min | Ultra-natural Hindi voice |
| premium-en | vedika-voice-premium | English | ~$0.30/min | Ultra-natural English voice |
| premium-multi | vedika-voice-premium | Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Urdu, +20 more | ~$0.23/min | Any Indic language beyond Hindi |
| premium-gpt | vedika-voice-premium | English, Hindi, multilingual | ~$0.07/min | 6 steerable voices, tone control |
| jarvis | vedika-voice-jarvis | Hindi, English | ~$0.32/min | Real-time streaming, sub-200ms TTFT |

Auto-selection logic: Hindi/Hinglish → premium-hi | English → premium-en | Other Indic → premium-multi | signal=b2c → cheap-hi.

All costs shown are customer price (raw provider cost × 4.0 markup). The exact cost per call is returned in the response metadata. LLM inference cost is billed separately on top of voice STT+TTS cost.

Response: Buffered Endpoint

Success (audio available)

Content-Type: audio/mpeg
The response body is raw MP3 binary. Save it directly as an .mp3 file or pipe it to an audio player.

Response Headers

| Header | Description |
| --- | --- |
| Content-Type | audio/mpeg |
| Content-Length | Size of the MP3 in bytes |
| X-Vedika-Voice-Meta | Base64-encoded JSON with transcription, language, tier, billing, and processing time. Decode with atob() / base64.b64decode(). |
| X-Vedika-Voice-Tier | Public tier label: vedika-voice-cheap, vedika-voice-premium, or vedika-voice-jarvis |
| X-Vedika-Voice-Lang | Detected language ISO code (e.g., hi, en) |
| X-Vedika-Transcription | URL-encoded transcription of the input audio (max 2000 chars) |
| X-Vedika-Signature | HMAC watermark for response integrity verification |

X-Vedika-Voice-Meta (decoded)

{
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-voice-premium",
  "tierSource": "auto",
  "processingMs": 4200,
  "sttDurationSec": 3.2,
  "ttsDurationSec": 12.5,
  "engine": "vedika-voice",
  "billing": {
    "rawCostUsd": 0.003200,
    "markupFactor": 4.0,
    "customerCostUsd": 0.012800
  },
  "totalPriceUsd": 0.012800
}

Fallback (TTS failed)

If speech synthesis fails, the endpoint degrades gracefully to a JSON response with audio: null and the text answer:

{
  "success": true,
  "audio": null,
  "response": "Your chart shows a strong period for career growth...",
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-voice-premium",
  "billing": { ... }
}
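Because the fallback switches the Content-Type from audio/mpeg to JSON, clients should branch on it rather than assume MP3 bytes. A minimal sketch (the function name is ours; `body` stands for the raw response bytes):

```python
import json

def handle_voice_response(content_type: str, body: bytes):
    """Branch on Content-Type to handle the TTS-failure JSON fallback."""
    if content_type.startswith("audio/mpeg"):
        return ("audio", body)                # raw MP3 bytes, play or save
    payload = json.loads(body)
    if payload.get("audio") is None:
        return ("text", payload["response"])  # TTS failed; text answer only
    return ("json", payload)
```

This keeps the happy path (binary MP3) cheap while still surfacing the text answer when synthesis fails.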

Response: Streaming Endpoint

Content-Type: text/event-stream (Server-Sent Events). The stream emits the following event types in order:

event: started

Fired after successful speech recognition. Contains the transcription and detected language.

data: {"transcription":"What about my marriage?","language":"hi","tier":"vedika-voice-jarvis","sttMs":820}
event: text

The complete AI-generated text answer. Emitted as a single frame (the AI pipeline returns the full answer, not token-by-token).

data: {"delta":"Your Venus is exalted in Pisces...","done":true}
event: audio

Base64-encoded MP3 chunks. Multiple audio events are emitted in sequence. Decode each chunk and append to a buffer or MediaSource for progressive playback.

data: {"bytesBase64":"//uQxAAAAAANIAAAAAExBTUUzLjEw...","seq":0}
data: {"bytesBase64":"AAAAIGZ0eXBpc29t...","seq":1}
data: {"bytesBase64":"...","seq":2}
event: completed

Final event with billing summary and total processing time.

data: {"processingMs":2400,"sttDurationSec":1.2,"ttsDurationSec":8.5,"totalChunks":14,"billing":{"rawCostUsd":0.004100,"markupFactor":4.0,"customerCostUsd":0.016400}}
event: error

Emitted on failure at any stage. The stream closes after this event.

data: {"code":"STT_FAILED","message":"Could not transcribe the submitted audio."}

Error Codes

| HTTP | Code | Description |
| --- | --- | --- |
| 400 | NO_AUDIO | The audio multipart field is missing or empty. |
| 400 | VOICE_REQUIRES_FAST_MODE | speed was set to something other than "fast". Voice only supports fast mode. |
| 400 | BAD_BIRTH_DETAILS_JSON | birthDetails is not valid JSON. |
| 401 | NO_API_KEY | Missing or invalid API key. |
| 402 | INSUFFICIENT_BALANCE | Wallet balance is below $0.15. Top up via the dashboard. |
| 422 | STT_FAILED | Speech recognition failed. The audio may be corrupted, silent, or in an unsupported format. |
| 502 | LLM_FAILED | AI pipeline failed after successful transcription. The transcription field is included so the client can retry via the text API. |
| 502 | EMPTY_LLM | AI returned an empty response. Rare; retry typically resolves it. |
| 500 | VOICE_INTERNAL | Unexpected server error. |

On the streaming endpoint, errors are emitted as SSE event: error instead of HTTP status codes (since the stream starts with HTTP 200). Check the code field in the error event data.

Code Examples

cURL

curl -X POST https://api.vedika.io/api/v1/voice \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@question.webm" \
  -F 'birthDetails={"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}' \
  -F "language=hi" \
  -F "speed=fast" \
  -F "tier=cheap-hi" \
  --output response.mp3

The --output flag saves the MP3 binary to a file. To also read the metadata header, add -D headers.txt.

JavaScript (Browser / Node.js)

const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('language', 'hi');
form.append('speed', 'fast');
form.append('tier', 'cheap-hi');

const res = await fetch('https://api.vedika.io/api/v1/voice', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

// Play the audio
const audioBuffer = await res.arrayBuffer();
const audio = new Audio(URL.createObjectURL(
  new Blob([audioBuffer], { type: 'audio/mpeg' })
));
audio.play();

// Read metadata from header
const meta = JSON.parse(atob(
  res.headers.get('X-Vedika-Voice-Meta')
));
console.log('Transcription:', meta.transcription);
console.log('Cost:', meta.billing.customerCostUsd);

JavaScript (Streaming)

const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('tier', 'jarvis');
form.append('speed', 'fast');

const res = await fetch('https://api.vedika.io/api/v1/voice/stream', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let eventType = ''; // persists across reads so events split over chunks survive

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Parse SSE events
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep incomplete line

  for (const line of lines) {
    if (line.startsWith('event: ')) {
      eventType = line.slice(7);
    } else if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));

      switch (eventType) {
        case 'started':
          console.log('Transcribed:', data.transcription);
          break;
        case 'text':
          console.log('Answer:', data.delta);
          break;
        case 'audio':
          // Decode base64 and queue for playback
          const bytes = Uint8Array.from(
            atob(data.bytesBase64), c => c.charCodeAt(0)
          );
          // Append to MediaSource or Web Audio buffer
          break;
        case 'completed':
          console.log('Done in', data.processingMs, 'ms');
          break;
        case 'error':
          console.error('Voice error:', data.code);
          break;
      }
    }
  }
}

Python

import requests
import base64
import json

url = "https://api.vedika.io/api/v1/voice"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

birth = json.dumps({
    "datetime": "1992-08-20T14:30:00",
    "latitude": 12.97,
    "longitude": 77.59,
    "timezone": "Asia/Kolkata"
})

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {
        "birthDetails": birth,
        "language": "hi",
        "speed": "fast",
        "tier": "cheap-hi"
    }
    resp = requests.post(url, headers=headers, files=files, data=data)

if resp.status_code == 200:
    # Save audio
    with open("response.mp3", "wb") as out:
        out.write(resp.content)

    # Read metadata
    meta_b64 = resp.headers.get("X-Vedika-Voice-Meta", "")
    if meta_b64:
        meta = json.loads(base64.b64decode(meta_b64))
        print("Transcription:", meta["transcription"])
        print("Cost: $", meta["billing"]["customerCostUsd"])
else:
    print("Error:", resp.status_code, resp.json())

Python (Streaming)

import requests
import json
import base64

url = "https://api.vedika.io/api/v1/voice/stream"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {"tier": "jarvis", "speed": "fast"}
    resp = requests.post(url, headers=headers, files=files,
                         data=data, stream=True)

audio_chunks = []
event_type = ""  # defined before the loop in case a data line arrives first

for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue
    if line.startswith("event: "):
        event_type = line[7:]
    elif line.startswith("data: "):
        payload = json.loads(line[6:])

        if event_type == "started":
            print("Transcribed:", payload["transcription"])
        elif event_type == "text":
            print("Answer:", payload["delta"][:100], "...")
        elif event_type == "audio":
            chunk = base64.b64decode(payload["bytesBase64"])
            audio_chunks.append(chunk)
        elif event_type == "completed":
            print(f"Done: {payload['processingMs']}ms, "
                  f"{payload['totalChunks']} chunks")
        elif event_type == "error":
            print("Error:", payload["code"], payload.get("message"))

# Save assembled audio
with open("response.mp3", "wb") as f:
    f.write(b"".join(audio_chunks))
print(f"Saved {len(audio_chunks)} chunks to response.mp3")

Supported Languages

| Language | Code | Available Tiers |
| --- | --- | --- |
| English | en | All tiers |
| Hindi | hi | All tiers |
| Tamil | ta | premium-multi |
| Telugu | te | premium-multi |
| Kannada | kn | premium-multi |
| Malayalam | ml | premium-multi |
| Bengali | bn | premium-multi |
| Gujarati | gu | premium-multi |
| Marathi | mr | premium-multi |
| Punjabi | pa | premium-multi |
| Odia | or | premium-multi |
| Urdu | ur | premium-multi |
| Arabic | ar | premium-multi |
| Russian | ru | premium-multi |
| Spanish | es | premium-multi |
| French | fr | premium-multi |
| German | de | premium-multi |
| Chinese | zh | premium-multi |
| Japanese | ja | premium-multi |
| Korean | ko | premium-multi |
| Thai | th | premium-multi |

Speech recognition supports 50+ languages. If the language hint is omitted, the system auto-detects from the audio. For best accuracy with short clips, provide the language hint.

Billing

Voice calls are billed from your wallet in two parts:

  1. Voice cost (STT + TTS) — raw provider cost × 4.0 markup.
  2. LLM cost — Vedika Intelligence inference cost (same pricing as the text /query endpoint).

Both costs are deducted from your wallet balance after the response is generated. The exact breakdown is returned in X-Vedika-Voice-Meta (buffered) or the completed SSE event (streaming).

Billing transparency: Every response includes rawCostUsd (what Vedika pays), markupFactor (4.0), and customerCostUsd (what you pay). No hidden fees.
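The markup arithmetic can be reproduced from the metadata fields — a minimal sketch that matches the example numbers shown in the metadata samples earlier on this page:

```python
MARKUP_FACTOR = 4.0  # documented markup on raw provider cost

def customer_cost(raw_cost_usd: float) -> float:
    """Customer price for the voice (STT + TTS) portion of a call."""
    return round(raw_cost_usd * MARKUP_FACTOR, 6)

# Buffered example: rawCostUsd 0.0032 -> customerCostUsd 0.0128
# Streaming example: rawCostUsd 0.0041 -> customerCostUsd 0.0164
```

Remember that LLM inference is billed on top of this, so `customerCostUsd` alone understates the total wallet deduction.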

Best Practices

Audio Quality

  • Use WebM/Opus at 48kHz for best speech recognition accuracy at small file sizes.
  • MP3 and WAV are fully supported but produce larger uploads.
  • Keep recordings under 60 seconds for optimal response time.
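Oversized or unsupported files fail server-side, so a cheap client-side preflight saves a network round trip. A sketch under the documented limits (the helper name is ours):

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # documented 25 MB limit
ALLOWED = {".webm", ".mp3", ".wav", ".m4a", ".ogg"}

def validate_audio(path: str) -> None:
    """Raise ValueError before uploading a file the API would reject."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED:
        raise ValueError(f"unsupported format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("audio exceeds the 25 MB limit")
```

Checking duration against the 60-second recommendation would additionally require an audio library, so it is omitted here.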

Multi-turn Conversations

  • Save the conversationId from the first response metadata and pass it in subsequent calls.
  • The AI will reference prior questions and answers for contextual follow-ups.
  • Birth details only need to be sent on the first call — they persist in the conversation.
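A follow-up call then only needs the conversation ID plus routing hints as its multipart fields — a sketch; where exactly the conversation ID appears in the first response's metadata may vary, so check your meta payload:

```python
def follow_up_fields(conversation_id: str, language: str = "hi") -> dict:
    """Multipart form fields for a follow-up voice query.

    Birth details are deliberately omitted: per the docs they persist in
    the conversation after the first call.
    """
    return {
        "conversationId": conversation_id,
        "language": language,
        "speed": "fast",
    }
```

Post these as the `data` fields alongside the new `audio` file, exactly as in the buffered Python example above.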

Choosing a Tier

  • High-volume B2C: Use cheap-hi (or signal=b2c) for the lowest cost per query.
  • Premium experience: Use premium-hi (Hindi) or premium-en (English) for the most natural voice.
  • Multi-language: Use premium-multi for Tamil, Telugu, Bengali, and other Indic languages.
  • Real-time assistant: Use jarvis with the /voice/stream endpoint for sub-200ms time-to-first-audio.

Error Handling

  • Always check for audio: null in buffered responses — it means TTS failed but the text answer is available.
  • On the streaming endpoint, handle the error event gracefully and close the EventSource.
  • If LLM_FAILED is returned, the transcription field contains the recognized text — retry via the text /query endpoint as a fallback.
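Since the LLM_FAILED error body carries the transcription, the fallback retry can be derived mechanically. A sketch — the `question` field name for the text /query endpoint is an assumption here; adapt it to that endpoint's documented request format:

```python
import json

def text_fallback_payload(error_body: bytes):
    """Build a text-API retry payload from an LLM_FAILED error body.

    Returns None for errors that have no usable transcription.
    """
    err = json.loads(error_body)
    if err.get("code") != "LLM_FAILED":
        return None
    return {"question": err["transcription"]}  # field name assumed
```

This way the user's question is never lost even when the voice pipeline fails mid-call.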

Get Started

Sign up for an API key and start building voice-powered astrology experiences.
