Voice API
Audio-in, audio-out astrology. Send a spoken question, receive a spoken answer — one API call handles speech recognition, AI reasoning, and speech synthesis.
Production-ready. Both /api/v1/voice (buffered) and /api/v1/voice/stream (SSE streaming) are live at api.vedika.io.
How It Works
One call runs a three-stage pipeline: speech recognition (STT) transcribes the uploaded audio, Vedika Intelligence generates the answer, and speech synthesis (TTS) converts it back to audio.
Endpoints
POST /api/v1/voice
Buffered voice query. Accepts multipart audio, returns a complete MP3 file as the response body (audio/mpeg).
Best for: mobile apps, IVR systems, any client that plays audio after full download.
POST /api/v1/voice/stream
Streaming voice query (SSE). Same multipart input, but the response is a text/event-stream that emits audio chunks as they are generated — sub-200ms time-to-first-audio on the Jarvis tier.
Best for: real-time voice assistants, conversational UIs, and any client that can play audio progressively.
Authentication
Authenticate with your Vedika API key using either method:
Authorization: Bearer vk_live_your_api_key
or
x-api-key: vk_live_your_api_key
Voice requires a live API key (vk_live_* or vk_ent_*). Test keys (vk_test_*) are not accepted. A minimum wallet balance of $0.15 is required per call.
Request Format
Content-Type: multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | File | required | Audio file. Max 25 MB. Accepted formats: webm, mp3, wav, m4a, ogg. |
| `birthDetails` | JSON string | optional | Birth data for personalized chart-based answers. Also accepts `birth_details` (snake_case alias). Example: `{"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}` |
| `language` | String | optional | Hint for speech recognition. ISO 639-1 code: en, hi, ta, te, kn, ml, bn, gu, mr, etc. Auto-detected if omitted. |
| `speed` | String | optional | Must be `"fast"` or omitted. Voice only supports fast mode; sending `"standard"` returns a 400 error. |
| `tier` | String | optional | Voice quality tier. One of: cheap-hi, premium-hi, premium-en, premium-multi, premium-gpt, jarvis. Auto-selected by language if omitted. See Voice Tiers. |
| `conversationId` | String | optional | Pass a previous conversation ID for multi-turn follow-up questions. The AI will reference prior context. |
| `signal` | String | optional | Routing hint. `"b2c"` or `"free"` forces the budget tier; `"gpt"` selects steerable multi-voice synthesis; `"jarvis"` selects the real-time streaming tier. |
Voice Tiers
Each tier balances cost, latency, and audio quality. If tier is omitted, Vedika auto-selects based on the detected language.
| Tier | Public Label | Languages | Approx. Cost | Best For |
|---|---|---|---|---|
| `cheap-hi` | vedika-voice-cheap | Hindi, English | ~$0.02/min | High-volume B2C, budget apps |
| `premium-hi` | vedika-voice-premium | Hindi, Hinglish, English | ~$0.30/min | Ultra-natural Hindi voice |
| `premium-en` | vedika-voice-premium | English | ~$0.30/min | Ultra-natural English voice |
| `premium-multi` | vedika-voice-premium | Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Urdu, +20 more | ~$0.23/min | Any Indic language beyond Hindi |
| `premium-gpt` | vedika-voice-premium | English, Hindi, multilingual | ~$0.07/min | 6 steerable voices, tone control |
| `jarvis` | vedika-voice-jarvis | Hindi, English | ~$0.32/min | Real-time streaming, sub-200ms TTFT |
Auto-selection logic: Hindi/Hinglish → `premium-hi` | English → `premium-en` | Other Indic → `premium-multi` | `signal=b2c` → `cheap-hi`.
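The auto-selection rule above can be sketched as a small client-side helper — a sketch only, mirroring the documented routing; the server may consider additional signals, and the function name is ours:

```python
# Sketch of the documented tier auto-selection. Tier names and signal
# values come from the tables above; the helper itself is illustrative.
INDIC = {"ta", "te", "kn", "ml", "bn", "gu", "mr", "pa", "or", "ur"}

def select_tier(language: str, signal: str = "") -> str:
    """Pick a voice tier from a detected language and optional routing hint."""
    if signal in ("b2c", "free"):
        return "cheap-hi"       # budget tier forced by signal
    if signal == "gpt":
        return "premium-gpt"    # steerable multi-voice synthesis
    if signal == "jarvis":
        return "jarvis"         # real-time streaming tier
    if language in ("hi", "hinglish"):
        return "premium-hi"
    if language == "en":
        return "premium-en"
    return "premium-multi"      # other Indic and non-English languages
```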
All costs shown are customer price (raw provider cost × 4.0 markup). The exact cost per call is returned in the response metadata. LLM inference cost is billed separately on top of voice STT+TTS cost.
Response: Buffered Endpoint
Success (audio available)
Content-Type: audio/mpeg
The response body is raw MP3 binary. Save it directly as an .mp3 file or pipe it to an audio player.
Response Headers
| Header | Description |
|---|---|
| `Content-Type` | audio/mpeg |
| `Content-Length` | Size of the MP3 in bytes |
| `X-Vedika-Voice-Meta` | Base64-encoded JSON with transcription, language, tier, billing, and processing time. Decode with `atob()` / `base64.b64decode()`. |
| `X-Vedika-Voice-Tier` | Public tier label: vedika-voice-cheap, vedika-voice-premium, or vedika-voice-jarvis |
| `X-Vedika-Voice-Lang` | Detected language ISO code (e.g., hi, en) |
| `X-Vedika-Transcription` | URL-encoded transcription of the input audio (max 2000 chars) |
| `X-Vedika-Signature` | HMAC watermark for response integrity verification |
X-Vedika-Voice-Meta (decoded)
```json
{
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-voice-premium",
  "tierSource": "auto",
  "processingMs": 4200,
  "sttDurationSec": 3.2,
  "ttsDurationSec": 12.5,
  "engine": "vedika-voice",
  "billing": {
    "rawCostUsd": 0.003200,
    "markupFactor": 4.0,
    "customerCostUsd": 0.012800
  },
  "totalPriceUsd": 0.012800
}
```
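Decoding the header is a one-liner; a sketch, using a minimal illustrative payload rather than a real API response:

```python
import base64
import json

def decode_voice_meta(header_value: str) -> dict:
    """Decode the base64-encoded X-Vedika-Voice-Meta header into a dict."""
    return json.loads(base64.b64decode(header_value))

# Illustrative round-trip with a trimmed-down payload:
sample = {
    "transcription": "What does my chart say about career?",
    "language": "en",
    "tier": "vedika-voice-premium",
}
encoded = base64.b64encode(json.dumps(sample).encode()).decode()
assert decode_voice_meta(encoded)["language"] == "en"
```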
Fallback (TTS failed)
If speech synthesis fails, the endpoint degrades gracefully to a JSON response with audio: null and the text answer:
```json
{
  "success": true,
  "audio": null,
  "response": "Your chart shows a strong period for career growth...",
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-voice-premium",
  "billing": { ... }
}
```
Response: Streaming Endpoint
Content-Type: text/event-stream (Server-Sent Events). The stream emits the following event types in order:
event: started
Fired after successful speech recognition. Contains the transcription and detected language.
data: {"transcription":"What about my marriage?","language":"hi","tier":"vedika-voice-jarvis","sttMs":820}
event: text
The complete AI-generated text answer. Emitted as a single frame (the AI pipeline returns the full answer, not token-by-token).
data: {"delta":"Your Venus is exalted in Pisces...","done":true}
event: audio
Base64-encoded MP3 chunks. Multiple audio events are emitted in sequence. Decode each chunk and append to a buffer or MediaSource for progressive playback.
data: {"bytesBase64":"//uQxAAAAAANIAAAAAExBTUUzLjEw...","seq":0}
data: {"bytesBase64":"AAAAIGZ0eXBpc29t...","seq":1}
data: {"bytesBase64":"...","seq":2}
event: completed
Final event with billing summary and total processing time.
data: {"processingMs":2400,"sttDurationSec":1.2,"ttsDurationSec":8.5,"totalChunks":14,"billing":{"rawCostUsd":0.004100,"markupFactor":4.0,"customerCostUsd":0.016400}}
event: error
Emitted on failure at any stage. The stream closes after this event.
data: {"code":"STT_FAILED","message":"Could not transcribe the submitted audio."}
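A minimal parser for this event stream can be sketched as a generator over decoded lines — a sketch only; production clients should also handle SSE comments and multi-line `data:` fields per the spec:

```python
import json

def parse_sse(lines):
    """Yield (event_type, payload) pairs from decoded SSE lines.

    Assumes each 'event:' line precedes its 'data:' line, as in the
    stream documented above.
    """
    event_type = ""
    for line in lines:
        if line.startswith("event: "):
            event_type = line[len("event: "):]
        elif line.startswith("data: "):
            yield event_type, json.loads(line[len("data: "):])

# Illustrative frames mirroring the stream above:
frames = [
    'event: started',
    'data: {"transcription":"What about my marriage?","language":"hi"}',
    'event: audio',
    'data: {"bytesBase64":"AAAA","seq":0}',
]
events = list(parse_sse(frames))
```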
Error Codes
| HTTP | Code | Description |
|---|---|---|
| 400 | NO_AUDIO | The audio multipart field is missing or empty. |
| 400 | VOICE_REQUIRES_FAST_MODE | speed was set to something other than "fast". Voice only supports fast mode. |
| 400 | BAD_BIRTH_DETAILS_JSON | birthDetails is not valid JSON. |
| 401 | NO_API_KEY | Missing or invalid API key. |
| 402 | INSUFFICIENT_BALANCE | Wallet balance is below $0.15. Top up via the dashboard. |
| 422 | STT_FAILED | Speech recognition failed. The audio may be corrupted, silent, or in an unsupported format. |
| 502 | LLM_FAILED | AI pipeline failed after successful transcription. The transcription field is included so the client can retry via the text API. |
| 502 | EMPTY_LLM | AI returned an empty response. Rare; retry typically resolves it. |
| 500 | VOICE_INTERNAL | Unexpected server error. |
On the streaming endpoint, errors are emitted as SSE event: error instead of HTTP status codes (since the stream starts with HTTP 200). Check the code field in the error event data.
Code Examples
cURL
```shell
curl -X POST https://api.vedika.io/api/v1/voice \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@question.webm" \
  -F 'birthDetails={"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}' \
  -F "language=hi" \
  -F "speed=fast" \
  -F "tier=cheap-hi" \
  --output response.mp3
```
The --output flag saves the MP3 binary to a file. To also read the metadata header, add -D headers.txt.
JavaScript (Browser / Node.js)
```javascript
const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('language', 'hi');
form.append('speed', 'fast');
form.append('tier', 'cheap-hi');

const res = await fetch('https://api.vedika.io/api/v1/voice', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

// Play the audio
const audioBuffer = await res.arrayBuffer();
const audio = new Audio(URL.createObjectURL(
  new Blob([audioBuffer], { type: 'audio/mpeg' })
));
audio.play();

// Read metadata from header
const meta = JSON.parse(atob(
  res.headers.get('X-Vedika-Voice-Meta')
));
console.log('Transcription:', meta.transcription);
console.log('Cost:', meta.billing.customerCostUsd);
```
Streaming (SSE):

```javascript
const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('tier', 'jarvis');
form.append('speed', 'fast');

const res = await fetch('https://api.vedika.io/api/v1/voice/stream', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
// Declared outside the read loop: an event line and its data line
// may arrive in different network chunks.
let eventType = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Parse SSE events line by line
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep incomplete line for the next read

  for (const line of lines) {
    if (line.startsWith('event: ')) {
      eventType = line.slice(7);
    } else if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      switch (eventType) {
        case 'started':
          console.log('Transcribed:', data.transcription);
          break;
        case 'text':
          console.log('Answer:', data.delta);
          break;
        case 'audio': {
          // Decode base64 and queue for playback
          const bytes = Uint8Array.from(
            atob(data.bytesBase64), c => c.charCodeAt(0)
          );
          // Append bytes to a MediaSource or Web Audio buffer
          break;
        }
        case 'completed':
          console.log('Done in', data.processingMs, 'ms');
          break;
        case 'error':
          console.error('Voice error:', data.code);
          break;
      }
    }
  }
}
```
Python
```python
import requests
import base64
import json

url = "https://api.vedika.io/api/v1/voice"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

birth = json.dumps({
    "datetime": "1992-08-20T14:30:00",
    "latitude": 12.97,
    "longitude": 77.59,
    "timezone": "Asia/Kolkata"
})

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {
        "birthDetails": birth,
        "language": "hi",
        "speed": "fast",
        "tier": "cheap-hi"
    }
    resp = requests.post(url, headers=headers, files=files, data=data)

if resp.status_code == 200:
    # Save audio
    with open("response.mp3", "wb") as out:
        out.write(resp.content)

    # Read metadata
    meta_b64 = resp.headers.get("X-Vedika-Voice-Meta", "")
    if meta_b64:
        meta = json.loads(base64.b64decode(meta_b64))
        print("Transcription:", meta["transcription"])
        print("Cost: $", meta["billing"]["customerCostUsd"])
else:
    print("Error:", resp.status_code, resp.json())
```
Streaming (SSE):

```python
import requests
import json
import base64

url = "https://api.vedika.io/api/v1/voice/stream"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {"tier": "jarvis", "speed": "fast"}
    resp = requests.post(url, headers=headers, files=files,
                         data=data, stream=True)

audio_chunks = []
event_type = ""  # each event line precedes its data line
for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue
    if line.startswith("event: "):
        event_type = line[7:]
    elif line.startswith("data: "):
        payload = json.loads(line[6:])
        if event_type == "started":
            print("Transcribed:", payload["transcription"])
        elif event_type == "text":
            print("Answer:", payload["delta"][:100], "...")
        elif event_type == "audio":
            chunk = base64.b64decode(payload["bytesBase64"])
            audio_chunks.append(chunk)
        elif event_type == "completed":
            print(f"Done: {payload['processingMs']}ms, "
                  f"{payload['totalChunks']} chunks")
        elif event_type == "error":
            print("Error:", payload["code"], payload.get("message"))

# Save assembled audio
with open("response.mp3", "wb") as f:
    f.write(b"".join(audio_chunks))
print(f"Saved {len(audio_chunks)} chunks to response.mp3")
```
Supported Languages
| Language | Code | Available Tiers |
|---|---|---|
| English | en | All tiers |
| Hindi | hi | All tiers |
| Tamil | ta | premium-multi |
| Telugu | te | premium-multi |
| Kannada | kn | premium-multi |
| Malayalam | ml | premium-multi |
| Bengali | bn | premium-multi |
| Gujarati | gu | premium-multi |
| Marathi | mr | premium-multi |
| Punjabi | pa | premium-multi |
| Odia | or | premium-multi |
| Urdu | ur | premium-multi |
| Arabic | ar | premium-multi |
| Russian | ru | premium-multi |
| Spanish | es | premium-multi |
| French | fr | premium-multi |
| German | de | premium-multi |
| Chinese | zh | premium-multi |
| Japanese | ja | premium-multi |
| Korean | ko | premium-multi |
| Thai | th | premium-multi |
Speech recognition supports 50+ languages. If the language hint is omitted, the system auto-detects from the audio. For best accuracy with short clips, provide the language hint.
Billing
Voice calls are billed from your wallet in two parts:
- Voice cost (STT + TTS): raw provider cost × 4.0 markup.
- LLM cost: Vedika Intelligence inference cost (same pricing as the text `/query` endpoint).
Both costs are deducted from your wallet balance after the response is generated. The exact breakdown is returned in X-Vedika-Voice-Meta (buffered) or the completed SSE event (streaming).
Billing transparency: Every response includes rawCostUsd (what Vedika pays), markupFactor (4.0), and customerCostUsd (what you pay). No hidden fees.
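The relationship between the billing fields can be verified client-side — a sketch; the rounding shown is an assumption, and the helper name is ours:

```python
def expected_customer_cost(raw_cost_usd: float, markup_factor: float = 4.0) -> float:
    """Customer price is raw provider cost times the markup factor,
    per the billing fields documented above."""
    return round(raw_cost_usd * markup_factor, 6)

# Matches the example metadata earlier on this page:
assert expected_customer_cost(0.0032) == 0.0128
```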
Best Practices
Audio Quality
- Use WebM/Opus at 48kHz for best speech recognition accuracy at small file sizes.
- MP3 and WAV are fully supported but produce larger uploads.
- Keep recordings under 60 seconds for optimal response time.
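The upload constraints can be checked before sending. A sketch: the 25 MB limit and format list come from the request table above, while the helper itself is illustrative:

```python
import os

ALLOWED_FORMATS = {"webm", "mp3", "wav", "m4a", "ogg"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB upload limit from the request table

def validate_audio(path: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_BYTES}")
    if size_bytes == 0:
        problems.append("empty file (would return NO_AUDIO)")
    return problems

assert validate_audio("question.webm", 1024) == []
assert "unsupported format: flac" in validate_audio("question.flac", 1024)
```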
Multi-turn Conversations
- Save the `conversationId` from the first response metadata and pass it in subsequent calls.
- The AI will reference prior questions and answers for contextual follow-ups.
- Birth details only need to be sent on the first call; they persist in the conversation.
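A follow-up request then reduces to reusing the conversation ID and omitting the birth details. A sketch — field names come from the request table, while the helper name and the sample ID are hypothetical:

```python
def build_followup_fields(conversation_id: str, language: str = "") -> dict:
    """Multipart text fields for a follow-up voice call.

    birthDetails is deliberately omitted: per the docs it persists
    in the conversation on the server after the first call.
    """
    fields = {"conversationId": conversation_id, "speed": "fast"}
    if language:
        fields["language"] = language
    return fields

fields = build_followup_fields("conv_123", language="hi")  # hypothetical ID
assert "birthDetails" not in fields
assert fields["conversationId"] == "conv_123"
```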
Choosing a Tier
- High-volume B2C: use `cheap-hi` (or `signal=b2c`) for the lowest cost per query.
- Premium experience: use `premium-hi` (Hindi) or `premium-en` (English) for the most natural voice.
- Multi-language: use `premium-multi` for Tamil, Telugu, Bengali, and other Indic languages.
- Real-time assistant: use `jarvis` with the `/voice/stream` endpoint for sub-200ms time-to-first-audio.
Error Handling
- Always check for `audio: null` in buffered responses; it means TTS failed but the text answer is available.
- On the streaming endpoint, handle the `error` event gracefully and close the EventSource.
- If `LLM_FAILED` is returned, the `transcription` field contains the recognized text; retry via the text `/query` endpoint as a fallback.
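The fallback paths above can be condensed into one decision helper — a sketch; the action labels are ours, while the response fields (`success`, `audio`, `code`, `transcription`) come from the responses documented above:

```python
def fallback_action(resp: dict) -> str:
    """Map a buffered voice response (parsed as a dict) to the documented fallback."""
    if resp.get("success") and resp.get("audio") is None:
        return "use-text-answer"       # TTS failed; 'response' holds the text
    if resp.get("code") == "LLM_FAILED":
        return "retry-via-text-query"  # resend 'transcription' to /query
    if resp.get("code") == "STT_FAILED":
        return "ask-user-to-rerecord"  # audio was silent, corrupt, or unsupported
    return "none"

assert fallback_action({"success": True, "audio": None, "response": "..."}) == "use-text-answer"
assert fallback_action({"code": "LLM_FAILED", "transcription": "..."}) == "retry-via-text-query"
```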