Handling rate limits with an astrology API

A developer guide to rate limiting on the Vedika astrology API: reading rate-limit headers, retrying 429s with backoff and jitter, caching, and batching.

To handle rate limits with the Vedika astrology API, read the rate-limit headers on every response, retry 429 responses with exponential backoff plus jitter while honouring the Retry-After header, and cache the computations that never change for a given birth chart. Because most astrology workloads are bursty — a batch of kundli generations, a matchmaking sweep, or a daily transit refresh — the bulk of your throughput problems disappear once you separate one-time computed data from per-request AI calls and queue the rest.

This guide covers how rate limiting works on the Vedika API, how to read the headers, a production-ready retry implementation, and the caching and batching patterns that keep you well under any ceiling while controlling cost.

How rate limiting works on the Vedika API

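Rate limits exist to keep the service responsive for everyone and to protect you from a runaway loop quietly draining your wallet. The Vedika API applies limits per API key, so your vk_live_* key has its own budget that is not affected by other customers. Two distinct ceilings matter: the request-rate window, which returns a 429 when you send too many requests too quickly, and your wallet balance, which returns a 402 when it cannot cover a query.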

Higher plans carry more generous request windows alongside the larger wallet. If you are throughput-bound rather than balance-bound, moving up a tier or talking to us about an Enterprise window is usually cheaper than engineering around a low ceiling. The pricing page lists current tiers.

Which endpoints count differently

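Not every operation carries the same weight. It helps to think in three buckets: computation endpoints under /v2/astrology/*, which are the lighter path; AI queries to /api/v1/astrology/query, which are the heavier path; and streaming queries to /api/v1/astrology/query/stream, where one connection counts as a single request.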

Read the rate-limit headers

Every response carries headers that tell you exactly where you stand, so you should never have to guess or hard-code a number. These headers are part of the public API surface, so they are safe to rely on in client code.

Header                  Meaning
X-RateLimit-Limit       Maximum requests allowed in the current window.
X-RateLimit-Remaining   Requests left before you hit the ceiling.
X-RateLimit-Reset       When the window resets (epoch seconds).
Retry-After             Present on a 429; seconds to wait before retrying.

The disciplined pattern is to slow down before you are throttled. When X-RateLimit-Remaining drops near zero, pause your sender until X-RateLimit-Reset rather than firing the requests that will bounce. A quick way to inspect the headers on a single call:

curl -i -X POST https://api.vedika.io/api/v1/astrology/query \
  -H "x-api-key: vk_live_yourkey" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What does my Moon sign say about my temperament?",
    "birthDetails": {
      "datetime": "1990-05-15T08:30:00",
      "latitude": 18.5204,
      "longitude": 73.8567,
      "timezone": "Asia/Kolkata"
    }
  }'
# Inspect the X-RateLimit-* headers in the response before scaling up.
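
The "slow down before you are throttled" pattern can be sketched as a small helper that parses the headers and works out how long to pause. This is a minimal sketch, not an official client: the names RateLimitState, parse_rate_limit, and seconds_to_pause, and the threshold of 2 remaining requests, are illustrative assumptions. Only the header names and the epoch-seconds reset format come from the table above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: Optional[int]
    remaining: Optional[int]
    reset_epoch: Optional[float]  # X-RateLimit-Reset, epoch seconds


def _to_number(value):
    """Return value as a float, or None if it is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Read the X-RateLimit-* headers from a response's header mapping."""
    limit = _to_number(headers.get("X-RateLimit-Limit"))
    remaining = _to_number(headers.get("X-RateLimit-Remaining"))
    reset = _to_number(headers.get("X-RateLimit-Reset"))
    return RateLimitState(
        limit=None if limit is None else int(limit),
        remaining=None if remaining is None else int(remaining),
        reset_epoch=reset,
    )


def seconds_to_pause(state, now=None, threshold=2):
    """Seconds to sleep before the next request; 0.0 if there is headroom."""
    if state.remaining is None or state.reset_epoch is None:
        return 0.0  # headers missing: nothing to act on
    if state.remaining > threshold:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, state.reset_epoch - now)
```

With the requests library, res.headers is a case-insensitive mapping, so it can be passed to parse_rate_limit directly; a sender would then call time.sleep(seconds_to_pause(state)) before its next request.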

Retry 429s with backoff and jitter

When you do get a 429, the correct response is to wait and retry — but not on a fixed delay, and never in a tight loop. Fixed delays cause synchronised retry storms where every worker wakes at the same moment and overwhelms the window again. Exponential backoff with random jitter spreads the retries out. Always prefer the server's Retry-After value when it is present; fall back to computed backoff when it is not.

Node.js

const BASE_URL = "https://api.vedika.io";

async function vedikaQuery(body, { maxRetries = 5 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${BASE_URL}/api/v1/astrology/query`, {
      method: "POST",
      headers: {
        "x-api-key": process.env.VEDIKA_API_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

    if (res.status !== 429) return res; // success or a non-retryable error

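    // Honour Retry-After if the server sent it, else exponential backoff + jitter.
    // headers.get() returns null when the header is absent and Number(null) is 0,
    // so test for null before parsing or the fallback would never run.
    const header = res.headers.get("retry-after");
    const retryAfter = header === null ? NaN : Number(header);
    const backoff = Number.isFinite(retryAfter)
      ? retryAfter * 1000
      : Math.min(2 ** attempt * 500, 30000);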
    const jitter = Math.random() * 250;
    await new Promise((r) => setTimeout(r, backoff + jitter));
  }
  throw new Error("Rate limit retries exhausted");
}

Python

import os
import random
import time

import requests

BASE_URL = "https://api.vedika.io"

def vedika_query(body, max_retries=5):
    headers = {
        "x-api-key": os.environ["VEDIKA_API_KEY"],
        "Content-Type": "application/json",
    }
    for attempt in range(max_retries + 1):
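        res = requests.post(
            f"{BASE_URL}/api/v1/astrology/query", json=body, headers=headers,
            timeout=30,  # assumed value; without a timeout a stalled call hangs
        )
        if res.status_code != 429:
            return res  # success, or an error this helper does not retry

        # Honour Retry-After when it is a number of seconds; if it is absent
        # or an HTTP-date, fall back to exponential backoff.
        retry_after = res.headers.get("Retry-After")
        try:
            backoff = float(retry_after)
        except (TypeError, ValueError):
            backoff = min(2 ** attempt * 0.5, 30)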
        time.sleep(backoff + random.uniform(0, 0.25))
    raise RuntimeError("Rate limit retries exhausted")
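
To make the arithmetic concrete, the delay calculation can be pulled out into a pure function. This is a sketch using the same constants as the Python example above (0.5 s base, 30 s cap, up to 0.25 s of jitter); the function name backoff_delay and its rng parameter are illustrative and not part of any Vedika SDK.

```python
import random


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0,
                  max_jitter=0.25, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After wins; otherwise the delay doubles per attempt
    from `base` up to `cap`. Jitter is added in both cases.
    """
    try:
        delay = float(retry_after)
    except (TypeError, ValueError):
        delay = min(base * 2 ** attempt, cap)
    return delay + rng() * max_jitter


# Schedule with jitter switched off, for attempts 0 through 7:
schedule = [backoff_delay(n, rng=lambda: 0.0) for n in range(8)]
print(schedule)  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

The cap matters from the seventh attempt onward: the uncapped delay would be 32 s, so the schedule flattens at 30 s instead of continuing to double.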

Only retry on 429 and on transient 5xx responses. A 400 (bad birth data), 401 (key problem), or 402 (insufficient wallet balance) will never succeed on retry. Retrying them just wastes time, and a balance error is a signal to top up rather than to loop. Note that the two helpers above retry 429 only and hand every other status, 5xx included, back to the caller.
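
That rule can be written as a small classifier that a retry loop consults before sleeping. A sketch only: which 5xx codes count as "transient" is an assumption here (500, 502, 503, 504), and the names RETRYABLE, classify, and the action labels are hypothetical.

```python
RETRYABLE = {429, 500, 502, 503, 504}  # assumption: the "transient" 5xx set


def classify(status: int) -> str:
    """Map an HTTP status code to the action a client should take."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status == 402:
        return "top_up"  # insufficient wallet balance: add funds, do not loop
    return "fail"        # 400, 401 and similar will not succeed on retry
```

Keeping the decision in one place means the backoff loop only ever sleeps when classify returns "retry", and a 402 can be routed to billing alerts instead of being retried.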

Cache what never changes

The single biggest lever for staying under a rate limit is not calling the API at all. A birth chart is fixed: a person's natal positions, divisional charts (D1 through D60), Vimshottari dasha sequence, and ashtakavarga bindus do not change after birth. Compute them once via /v2/astrology/* and store the result keyed on the normalised birth input.

In practice, teams that cache the computed layer find their actual API call volume drops sharply, because repeat visits to the same chart are served locally. That keeps you under the request ceiling and trims the per-query spend at the same time.
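
One way to key such a cache is to normalise the birth input and hash it. A minimal sketch under assumptions: the field names follow the birthDetails object in the curl example, rounding coordinates to four decimal places is an arbitrary choice, the in-memory dict stands in for whatever store you use, and compute is a hypothetical callable that wraps your own /v2/astrology/* request.

```python
import hashlib
import json

_cache = {}  # stand-in for Redis, a database table, or similar


def chart_key(birth, operation):
    """Stable cache key for one computation on one birth chart."""
    normalised = {
        "datetime": birth["datetime"].strip(),
        "latitude": round(float(birth["latitude"]), 4),
        "longitude": round(float(birth["longitude"]), 4),
        "timezone": birth["timezone"].strip(),
        "operation": operation,  # opaque label chosen by the caller
    }
    payload = json.dumps(normalised, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_compute(birth, operation, compute):
    """Return the stored result, calling compute(birth, operation) on a miss."""
    key = chart_key(birth, operation)
    if key not in _cache:
        _cache[key] = compute(birth, operation)
    return _cache[key]
```

Normalising before hashing is what makes the cache effective: "18.52040" and 18.5204 describe the same chart and should not trigger two API calls.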

Batch and queue bursty workloads

Matrimony platforms running compatibility sweeps, and dashboards refreshing transits for thousands of users, generate spiky load. Rather than firing every request the moment a job starts, push the work through a queue with a bounded concurrency limit tuned to your X-RateLimit-Limit.

  1. Set a concurrency cap below your per-window limit — for example, run 5–10 workers rather than unbounded parallelism.
  2. Spread overnight jobs. A daily transit refresh for a large user base does not need to finish in one minute; pace it across the window and you will never see a 429.
  3. Pause the queue on backpressure. When X-RateLimit-Remaining approaches zero, hold the queue until X-RateLimit-Reset instead of letting workers hammer the wall.
  4. Prefer computation endpoints for bulk. If a sweep only needs scores or yogas, use the /v2/astrology/* compute path; reserve the heavier AI query path for the moments a user actually asks a question.
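
The first two steps above, a concurrency cap and paced starts, can be sketched with the standard library alone. Everything named here (Pacer, run_bounded, worker, interval) is illustrative, and the right interval depends on your plan: with a hypothetical limit of 60 requests per 60-second window, interval=1.0 would keep a job at or under the ceiling.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Pacer:
    """Hands out start slots at least `interval` seconds apart."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots meanwhile.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        time.sleep(max(0.0, slot - now))


def run_bounded(jobs, worker, max_workers=5, interval=0.0):
    """Run worker(job) for every job with capped concurrency and pacing.

    Results come back in the same order as `jobs`.
    """
    pacer = Pacer(interval)

    def task(job):
        pacer.wait()  # spread request starts across the window
        return worker(job)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, jobs))
```

Here worker would be whatever function issues one API call, for example a wrapper around the vedika_query helper shown earlier. The pool bounds how many calls are in flight; the pacer bounds how fast new ones start.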

You can prototype all of this against the free sandbox, which needs no API key, so you can validate your backoff and queue logic before a single real request is metered.

FAQ

What HTTP status does the Vedika API return when I exceed the rate limit?

A 429 Too Many Requests, accompanied by a Retry-After header indicating how many seconds to wait. The X-RateLimit-* headers on every response let you back off before you ever reach that point.

How is a 429 different from a 402?

A 429 means you sent too many requests too quickly — wait and retry. A 402 means your wallet balance is insufficient to cover the query — retrying will not help; add funds or move to a higher plan. They are separate ceilings.

Does streaming count differently against my rate limit?

One streaming connection to /api/v1/astrology/query/stream counts as a single request, not one per token or SSE event. Streaming changes how the answer is delivered, not how the request is metered.

How do I avoid hitting limits on a large batch job?

Cache the immutable computed layer so repeat charts never hit the API, run the job through a queue with bounded concurrency tuned to your X-RateLimit-Limit, and use the lighter /v2/astrology/* computation endpoints for bulk work. Pace overnight refreshes across the window rather than firing them all at once.

Can I test my retry logic without spending money?

Yes. The free sandbox at vedika.io/sandbox exposes mock endpoints with no API key required, so you can validate backoff, jitter, and queue behaviour before any real request is metered. See the API docs for the full endpoint reference.
