
What is concurrency?

Concurrency refers to the number of API requests you can have in progress (running) simultaneously. If your plan supports 10 concurrent requests, you can process up to 10 requests at the same time; sending an 11th request while 10 are already processing returns a rate limit error.

Think of concurrency like a team of workers in an office. Each worker represents a “concurrent request slot.” With 10 workers, you can assign them 10 tasks (requests) simultaneously; if you try to assign an 11th task while all workers are occupied, you’ll need to wait until one worker finishes. In cloro, each “task” is an API request to an AI model, and each “worker” is a concurrent request slot available based on your subscription.
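The worker analogy can be sketched as a small client-side limiter that never starts more than a fixed number of tasks at once. This is only an illustration of the concept, not part of the cloro SDK; the name createLimiter and its shape are hypothetical:

```javascript
// Minimal sketch of "concurrent request slots": at most `limit` tasks
// run at once; the rest wait in a queue until a slot frees up.
function createLimiter(limit) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active < limit && queue.length > 0) {
      active++;
      const { task, resolve, reject } = queue.shift();
      task()
        .then(resolve, reject)
        .finally(() => {
          active--;
          next(); // a slot freed up, start the next queued task
        });
    }
  };

  // Schedule a task; it starts only when a "worker slot" is free
  return function run(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}
```

With `createLimiter(10)`, an 11th task simply waits in the queue instead of failing, which is exactly the behavior the server enforces with a 429 when you exceed your plan's slots.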

Rate limits vs. concurrency limits

cloro uses two different types of limits depending on the endpoint type:
| Limit type | Endpoints affected | How it works |
| --- | --- | --- |
| Rate limits | All endpoints (/v1/*) | 500 requests per second shared across all endpoints |
| Concurrency limits | Monitor endpoints (/v1/monitor/*) | Based on your subscription plan (simultaneous requests) |
Rate limits restrict how many requests you can make per second, while concurrency limits restrict how many requests can be processing simultaneously. Monitor endpoints are subject to both rate limits (500/sec) and concurrency limits (subscription-based).

Monitoring concurrency with headers

Each response includes HTTP headers to help you manage and optimize your API usage:
| Header | Description |
| --- | --- |
| X-Concurrent-Limit | Total concurrent requests allowed by your plan |
| X-Concurrent-Current | Number of requests currently processing |
| X-Concurrent-Remaining | Concurrent slots available when the request was received |
For example, if your plan supports 20 concurrent requests and you send 3 requests simultaneously:
X-Concurrent-Limit: 20
X-Concurrent-Current: 3
X-Concurrent-Remaining: 17
This means 17 slots were available when the request was processed.

Monitoring rate limits with headers

All endpoints include rate limit headers in each response:
| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests per second allowed (500) |
| X-RateLimit-Remaining | Requests remaining in the current second |
For example, if you make a request:
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 499
This means you can make 499 more requests in the current second before hitting the rate limit. The counter resets every second.

Using headers for optimization

Monitor these headers to optimize your request patterns:
function checkConcurrencyUsage(response) {
  const limit = parseInt(response.headers["x-concurrent-limit"]);
  const current = parseInt(response.headers["x-concurrent-current"]);
  const remaining = parseInt(response.headers["x-concurrent-remaining"]);

  console.log(`Concurrency: ${current}/${limit} (${remaining} available)`);

  // Adjust your batch size based on remaining slots
  return Math.min(remaining, 5); // Don't exceed 5 requests per batch
}
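As a usage sketch, the batch size returned above can decide how many tasks to send in the next wave. The helper below and its inputs are illustrative only; in practice the headers come from a real cloro API response:

```javascript
// Hypothetical sketch: size the next wave of requests from the
// x-concurrent-remaining header of the previous response.
function planNextWave(lastResponseHeaders, pendingTasks, cap = 5) {
  const remaining = parseInt(lastResponseHeaders["x-concurrent-remaining"], 10);
  // Take at most `remaining` tasks (capped), leave the rest queued
  const size = Math.max(0, Math.min(remaining, cap));
  return {
    wave: pendingTasks.slice(0, size),
    rest: pendingTasks.slice(size),
  };
}
```

Capping the wave below your plan's limit leaves headroom for other processes that share the same API key.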
For all endpoints with rate limits, you can monitor usage and implement backoff:
async function checkRateLimit(response) {
  const limit = parseInt(response.headers["x-ratelimit-limit"]);
  const remaining = parseInt(response.headers["x-ratelimit-remaining"]);

  console.log(`Rate limit: ${remaining}/${limit} requests remaining`);

  // If you're running low on requests, wait before continuing
  if (remaining < 50) {
    const waitTime = 1000; // Wait 1 second for the counter to reset
    console.log(`Rate limit nearly exceeded. Waiting ${waitTime}ms...`);
    await new Promise((resolve) => setTimeout(resolve, waitTime));
  }

  return remaining;
}

Implementation patterns

Most programming languages require explicit concurrency handling. Here are proven approaches:

Pattern 1: Async with webhooks

For large-scale processing, submit tasks and handle results via webhooks. You don’t need to send requests in batches; cloro handles concurrency automatically. Send API requests for all your tasks concurrently (one request per task):
import axios from "axios";

const API_KEY = process.env.API_KEY;
const TASK_API = "https://api.cloro.dev/v1/async/task";

async function submitTasks(tasks, webhookUrl) {
  // Send API requests concurrently (one request per task)
  await Promise.all(
    tasks.map((task) =>
      axios.post(
        TASK_API,
        {
          taskType: "CHATGPT",
          webhook: { url: webhookUrl },
          payload: task,
        },
        {
          headers: { Authorization: `Bearer ${API_KEY}` },
        }
      )
    )
  );
}

// Webhook handler (Express.js)
app.post("/webhook-handler", (req, res) => {
  const { task, response } = req.body;
  console.log(`Task ${task.id} completed: ${response.text.slice(0, 100)}...`);

  // Process your result here
  saveResult(task.id, response);

  // Always respond quickly
  res.status(200).send();
});

// Usage
const tasks = [
  { prompt: "Analyze market trends", country: "US" },
  { prompt: "Research competitors", country: "US" },
  // ... hundreds more
];

submitTasks(tasks, "https://your-app.com/webhook-handler");

Pattern 2: Concurrent workers

For real-time processing where you want immediate results, run multiple workers that make direct API calls:
import axios from "axios";

const API_KEY = process.env.API_KEY;
const API_URL = "https://api.cloro.dev/v1/monitor/chatgpt";

async function makeRequest(id, prompt) {
  const start = Date.now();

  try {
    const response = await axios.post(API_URL, {
      prompt: prompt,
      country: "US",
    }, {
      headers: {
        Authorization: `Bearer ${API_KEY}`
      }
    });

    const latency = Date.now() - start;
    console.log(`Request #${id}: Success (${latency}ms)`);

    // Monitor concurrency usage
    const limit = parseInt(response.headers["x-concurrent-limit"]);
    const current = parseInt(response.headers["x-concurrent-current"]);
    const remaining = parseInt(response.headers["x-concurrent-remaining"]);

    return {
      success: true,
      latency,
      data: response.data,
      usage: { limit, current, remaining }
    };

  } catch (error) {
    const latency = Date.now() - start;
    console.log(`Request #${id}: Failed (${latency}ms)`);

    if (error.response?.status === 429) {
      console.log(`Rate limited - ${error.response.headers["retry-after"] || "unknown"} seconds`);
    }

    return { success: false, latency, error: error.message };
  }
}

async function runConcurrentRequests(prompts, concurrency = 10) {
  console.log(`Starting ${prompts.length} requests with ${concurrency} concurrent workers\n`);

  const startTime = Date.now();
  const results = [];
  let requestId = 0;

  // Worker function
  async function worker() {
    while (requestId < prompts.length) {
      const id = ++requestId;
      // Index with `id`, which is captured before any await runs
      const result = await makeRequest(id, prompts[id - 1]);
      results.push(result);
    }
  }

  // Run concurrent workers
  await Promise.all(
    Array(concurrency).fill(0).map(() => worker())
  );

  const duration = Date.now() - startTime;
  const successful = results.filter(r => r.success).length;
  // axios error messages include the HTTP status code, e.g. "... status code 429"
  const rateLimited = results.filter(r => !r.success && r.error?.includes("429")).length;

  console.log("\n" + "=".repeat(40));
  console.log(`Total: ${prompts.length}`);
  console.log(`Success: ${successful} (${((successful/prompts.length)*100).toFixed(1)}%)`);
  console.log(`Rate limited: ${rateLimited}`);
  console.log(`Duration: ${(duration/1000).toFixed(1)}s`);
  console.log(`RPS: ${(prompts.length/duration*1000).toFixed(1)}`);
  console.log("=".repeat(40));

  return results;
}

// Usage
const prompts = [
  "What is AI and how does it work?",
  "Explain machine learning basics",
  "What are neural networks?",
  "How does deep learning work?",
  "What is natural language processing?",
  // ... add more prompts as needed
];

runConcurrentRequests(prompts, 5) // Start with conservative concurrency
  .then(() => console.log("Completed processing"))
  .catch(console.error);

Quick reference

| Use case | Pattern | When to use |
| --- | --- | --- |
| Large batches | Async with webhooks | Don’t need immediate results |
| Real-time results | Concurrent workers | Need immediate responses, smaller batches |

Common questions

Why am I getting 429 rate limit errors?

A 429 error means you’re hitting a limit. This can happen for two reasons:

Concurrency limit exceeded (monitor endpoints only): you’re making more simultaneous requests than your plan’s concurrent request limit allows. Solutions:
  • Check your current usage with response headers: X-Concurrent-Limit, X-Concurrent-Current, X-Concurrent-Remaining
  • Implement request queuing in your application
  • Use exponential backoff when retrying
  • Upgrade your plan for higher limits
Rate limit exceeded (all endpoints): you’ve exceeded 500 requests per second. Solutions:
  • Monitor X-RateLimit-Remaining header
  • Spread requests over time (the counter resets every second)
  • Use the async queue for non-time-sensitive requests
  • Implement retry logic with exponential backoff
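The exponential backoff suggested above can be sketched as a small retry wrapper. The name retryWithBackoff and its options are illustrative, not a cloro SDK API:

```javascript
// Sketch of exponential backoff for 429 responses: retry with a delay
// that doubles on each attempt, and give up after `maxRetries` retries.
async function retryWithBackoff(fn, { maxRetries = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const is429 = error.response?.status === 429;
      if (!is429 || attempt >= maxRetries) throw error;
      // Double the delay each time: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Adding random jitter to the delay helps avoid retry storms when many clients back off in lockstep.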
See the error handling guide for detailed error responses and implementation patterns.

How do I check my concurrency limit?

Your concurrency limit is shown in the response headers of every API call:
X-Concurrent-Limit: 20        # Your total limit
X-Concurrent-Current: 5       # Currently in use
X-Concurrent-Remaining: 15    # Available slots
You can also use the async status endpoint to see your account’s concurrency stats.

Can I increase my concurrency limit?

Yes, you can increase your concurrency limit by upgrading your plan. Changes to your concurrency limit take effect immediately after upgrading.

What’s the best way to handle large batches of requests?

For large batches, choose the pattern that fits your needs.

For non-time-sensitive batches (recommended):
  • Use Pattern 1: Async with webhooks
  • Send all API requests concurrently; cloro handles queuing automatically
  • Receive results via webhook when complete
  • No need to manage concurrency yourself
For real-time results:
  • Use Pattern 2: Concurrent workers
  • Respect your plan’s concurrency limit
  • Monitor X-Concurrent-Remaining header
  • Implement exponential backoff for 429 errors