
What is concurrency?

Concurrency refers to the number of API requests you can have in progress (running) simultaneously. If your plan supports 10 concurrent requests, you can process up to 10 requests at the same time; if you send an 11th request while 10 are already processing, you’ll get a rate limit error.

Think of concurrency like a team of workers in an office. Each worker represents a “concurrent request slot.” If you have 10 workers, you can assign them 10 tasks (requests) simultaneously. If you try to assign an 11th task while all workers are occupied, you’ll need to wait until one worker finishes. In cloro, each “task” is an API request to an AI model, and each “worker” is a concurrent request slot available under your subscription.
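
When every slot is busy, the API rejects the extra request with HTTP 429. Here is a minimal sketch of detecting that case with axios (the endpoint and payload shape follow the examples later on this page):
import axios from "axios";

const API_KEY = process.env.API_KEY;
const API_URL = "https://api.cloro.dev/v1/monitor/chatgpt";

try {
  await axios.post(
    API_URL,
    { prompt: "Hello", country: "US" },
    { headers: { Authorization: `Bearer ${API_KEY}` } }
  );
} catch (error) {
  // 429 means all concurrent slots were in use when this request arrived
  if (error.response?.status === 429) {
    // Wait for an in-flight request to finish, then retry
  }
}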

Monitoring concurrency with headers

Each response includes HTTP headers to help you manage and optimize your API usage:
Header                   Description
X-Concurrent-Limit       Total concurrent requests allowed by your plan
X-Concurrent-Current     Number of requests currently processing
X-Concurrent-Remaining   Available concurrent slots when the request was received
For example, if your plan supports 20 concurrent requests and you send 3 requests simultaneously:
X-Concurrent-Limit: 20
X-Concurrent-Current: 3
X-Concurrent-Remaining: 17
This means 17 of the 20 slots were still available when the request was received.

Using headers for optimization

Monitor these headers to optimize your request patterns:
function checkConcurrencyUsage(response) {
  const limit = parseInt(response.headers["x-concurrent-limit"]);
  const current = parseInt(response.headers["x-concurrent-current"]);
  const remaining = parseInt(response.headers["x-concurrent-remaining"]);

  console.log(`Concurrency: ${current}/${limit} (${remaining} available)`);

  // Adjust your batch size based on remaining slots
  return Math.min(remaining, 5); // Don't exceed 5 requests per batch
}
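
For example, call it on any cloro response and use the return value to size your next batch (a sketch; API_URL, payload, and API_KEY stand in for the values shown in the patterns below):
const response = await axios.post(API_URL, payload, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

// Size the next batch from the slots that were free
const nextBatchSize = checkConcurrencyUsage(response);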

Implementation patterns

Depending on how you integrate, you may need to manage how many requests your application has in flight at once. Here are two proven approaches:

Pattern 1: Async with webhooks

For large-scale processing, submit tasks and handle results via webhooks. You don’t need to send requests in batches, because cloro handles concurrency automatically; you can submit all tasks at once:
import axios from "axios";

const API_KEY = process.env.API_KEY;
const TASK_API = "https://api.cloro.dev/v1/task";

async function submitTasks(tasks, webhookUrl) {
  // Submit all tasks at once - cloro handles concurrency automatically
  await Promise.all(
    tasks.map((task) =>
      axios.post(
        TASK_API,
        {
          taskType: "CHATGPT",
          webhook: { url: webhookUrl },
          payload: task,
        },
        {
          headers: { Authorization: `Bearer ${API_KEY}` },
        }
      )
    )
  );
}

// Webhook handler (Express.js)
import express from "express";

const app = express();
app.use(express.json()); // Parse JSON webhook payloads

app.post("/webhook-handler", (req, res) => {
  const { task, response } = req.body;
  console.log(`Task ${task.id} completed: ${response.text.slice(0, 100)}...`);

  // Process your result here (saveResult is your own persistence function)
  saveResult(task.id, response);

  // Always respond quickly
  res.status(200).send();
});

app.listen(3000);

// Usage
const tasks = [
  { prompt: "Analyze market trends", country: "US" },
  { prompt: "Research competitors", country: "US" },
  // ... hundreds more
];

submitTasks(tasks, "https://your-app.com/webhook-handler");

Pattern 2: Concurrent workers

For real-time processing where you want immediate results, run multiple workers that make direct API calls:
import axios from "axios";

const API_KEY = process.env.API_KEY;
const API_URL = "https://api.cloro.dev/v1/monitor/chatgpt";

async function makeRequest(id, prompt) {
  const start = Date.now();

  try {
    const response = await axios.post(
      API_URL,
      {
        prompt: prompt,
        country: "US",
      },
      {
        headers: {
          Authorization: `Bearer ${API_KEY}`,
        },
      }
    );

    const latency = Date.now() - start;
    console.log(`Request #${id}: Success (${latency}ms)`);

    // Monitor concurrency usage
    const limit = parseInt(response.headers["x-concurrent-limit"]);
    const current = parseInt(response.headers["x-concurrent-current"]);
    const remaining = parseInt(response.headers["x-concurrent-remaining"]);

    return {
      success: true,
      latency,
      data: response.data,
      usage: { limit, current, remaining },
    };
  } catch (error) {
    const latency = Date.now() - start;
    const status = error.response?.status;
    console.log(`Request #${id}: Failed (${latency}ms)`);

    if (status === 429) {
      console.log(`Rate limited - retry after ${error.response.headers["retry-after"] || "unknown"} seconds`);
    }

    // Return the status so callers can count rate-limited requests
    return { success: false, latency, status, error: error.message };
  }
}

async function runConcurrentRequests(prompts, concurrency = 10) {
  console.log(`Starting ${prompts.length} requests with ${concurrency} concurrent workers\n`);

  const startTime = Date.now();
  const results = [];
  let requestId = 0;

  // Worker function: each worker pulls the next prompt until none remain
  async function worker() {
    while (requestId < prompts.length) {
      const id = ++requestId;
      const result = await makeRequest(id, prompts[id - 1]);
      results.push(result);
    }
  }

  // Run concurrent workers
  await Promise.all(
    Array(concurrency).fill(0).map(() => worker())
  );

  const duration = Date.now() - startTime;
  const successful = results.filter((r) => r.success).length;
  const rateLimited = results.filter((r) => r.status === 429).length;

  console.log("\n" + "=".repeat(40));
  console.log(`Total: ${prompts.length}`);
  console.log(`Success: ${successful} (${((successful / prompts.length) * 100).toFixed(1)}%)`);
  console.log(`Rate limited: ${rateLimited}`);
  console.log(`Duration: ${(duration / 1000).toFixed(1)}s`);
  console.log(`RPS: ${((prompts.length / duration) * 1000).toFixed(1)}`);
  console.log("=".repeat(40));

  return results;
}

// Usage
const prompts = [
  "What is AI and how does it work?",
  "Explain machine learning basics",
  "What are neural networks?",
  "How does deep learning work?",
  "What is natural language processing?",
  // ... add more prompts as needed
];

runConcurrentRequests(prompts, 5) // Start with conservative concurrency
  .then((results) => console.log(`Completed ${results.length} requests`))
  .catch(console.error);
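
If a worker does hit a 429, you can retry instead of dropping the request. A sketch of a simple backoff around makeRequest above (the fixed 2-second wait is an assumption; parse the Retry-After header instead if your responses include it):
// Retry rate-limited requests with a fixed backoff between attempts
async function makeRequestWithRetry(id, prompt, maxRetries = 3) {
  let result;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    result = await makeRequest(id, prompt);
    // Stop on success or on any failure other than a concurrency 429
    if (result.success || result.status !== 429) return result;

    const waitSeconds = 2; // Assumed backoff; use Retry-After here if available
    await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
  }
  return result; // Still rate limited after all retries
}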

Quick reference

Use case            Pattern              When to use
Large batches       Async + webhooks     You don’t need immediate results
Real-time results   Concurrent workers   You need immediate responses on smaller batches