Your Agent Inbox Needs a Callback

HTTP endpoints that trigger LLM turns can't block waiting for a response. Accept immediately, process async, give the agent a route to reply.

The Problem

Tony needed to send tasks to Mel over HTTP. Simple enough: Tony POSTs a TASK message to Mel’s endpoint, Mel processes it and replies. Clean, direct, no Discord dependency.

The endpoint accepted messages and returned {"status": "received"}. Then it started returning 500s. Then it started timing out entirely.

The error, once we dug it out of the uvicorn logs:

TimeoutError
  File "server.py", line 76, in send_to_agent
    stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=60)

Sixty seconds wasn’t enough. An OpenClaw agent turn — reading context, running Claude, producing a response — takes longer than that. The HTTP handler was await-ing the entire thing inline, the client timed out, and the whole request blew up.

Why This Happens

HTTP is request-response. The client sends a request and holds the connection open waiting for a reply. If you trigger an LLM inference call inside that handler, you’re asking the client to wait for Claude. Claude takes as long as Claude takes.

This is a shape mismatch, not a bug. HTTP expects milliseconds to seconds. LLM turns expect seconds to minutes. You can’t bridge that gap by raising a timeout — you bridge it by changing the shape.

The fix has two parts:

  1. Accept immediately — return 200 before the agent starts working
  2. Callback endpoint — give the agent a route to POST its response when it’s done

The Fix

Part 1: Fire and Forget

Move the agent invocation off the request path using asyncio.create_task():

async def _run_agent(message: str):
    """Run openclaw agent in background; log errors."""
    try:
        proc = await asyncio.create_subprocess_exec(
            "openclaw", "agent", "--agent", "main", "-m", message,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=300)
        if proc.returncode != 0:
            log("agent_error", {"error": stderr.decode()})
    except Exception as e:
        log("agent_error", {"error": str(e)})


_background_tasks = set()

def send_to_agent(message: str):
    """Schedule agent turn in background; return immediately."""
    task = asyncio.create_task(_run_agent(message))
    # Hold a reference: asyncio keeps only weak refs to tasks,
    # so an unreferenced task can be garbage-collected mid-run.
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)


@app.post("/message")
async def receive_message(payload: dict, x_shared_secret: str = Header(None)):
    if x_shared_secret != SHARED_SECRET:
        raise HTTPException(status_code=401, detail="unauthorized")
    log("inbound", payload)
    await handle(payload)  # dispatch by type; TASK messages route to send_to_agent()
    return {"status": "received"}  # returns before the agent starts

The HTTP response goes back in milliseconds. The agent turn runs in the background.

Note: asyncio.create_task() requires a running event loop. FastAPI’s async handlers run inside one, so calling send_to_agent from a request handler works even though send_to_agent itself is a sync function. Calling it from code with no running loop (a plain script, or a worker thread) raises RuntimeError.
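If something does need to schedule a turn from outside the loop (a worker thread, for instance), asyncio.run_coroutine_threadsafe is the thread-safe alternative. A minimal, self-contained sketch, with asyncio.sleep standing in for the real agent subprocess:

```python
import asyncio
import threading

async def _run_agent(message: str):
    # Stand-in for the real agent invocation.
    await asyncio.sleep(0.01)
    return f"handled: {message}"

def schedule_from_thread(loop: asyncio.AbstractEventLoop, message: str):
    # Thread-safe scheduling; asyncio.create_task() would fail here
    # because this thread has no running event loop.
    return asyncio.run_coroutine_threadsafe(_run_agent(message), loop)

async def main():
    loop = asyncio.get_running_loop()
    results = []
    t = threading.Thread(
        target=lambda: results.append(schedule_from_thread(loop, "ping").result())
    )
    t.start()
    while not results:          # yield so the loop can run _run_agent
        await asyncio.sleep(0.01)
    t.join()
    return results[0]

print(asyncio.run(main()))
```

The returned object is a concurrent.futures.Future, so the calling thread can block on .result() without touching the loop.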

Part 2: Give the Agent a Way to Reply

The agent now runs in the background with no way to send a response back. You need a callback endpoint — a route the agent can hit when it has something to say:

@app.post("/send")
async def send_message(request: Request, x_shared_secret: str = Header(None)):
    """The agent calls this to send a message to the other agent."""
    if x_shared_secret != SHARED_SECRET:
        raise HTTPException(status_code=401, detail="unauthorized")
    body = await request.json()
    msg_type = body.get("type")
    payload = body.get("payload", {})
    if not msg_type:
        raise HTTPException(status_code=400, detail="missing type")
    await send_to_other_agent(msg_type, payload)
    return {"status": "sent"}

Then include the curl command in the message you forward to the agent, so it knows exactly how to call back:

REPLY_INSTRUCTIONS = f"""
To send your response, POST to /send:

  curl -s -X POST http://your-server-ip:8700/send \\
    -H "X-Shared-Secret: your-shared-secret" \\
    -H "Content-Type: application/json" \\
    -d '{{"type": "BREAKDOWN", "payload": {{...}}}}'

Valid types: BREAKDOWN, STATUS, PR_READY.
"""

The agent reads this, does the work, constructs the response, and fires the curl. No polling. No long-polling. No WebSockets.

The Shape You’re Building

Sender → POST /message → "received" (immediate)

                         [agent works]

                    Agent → POST /send → forward to sender

Two endpoints, two directions, fully decoupled from HTTP timeouts.

Key Takeaway

Multi-agent HTTP communication isn’t request-response — it’s message-passing with a callback. The moment you put an LLM turn on the request path, you’ve already lost. Accept the message, fire the agent, return immediately. Give the agent a route to reply when it’s ready. That’s the shape.


FAQ

Q: Why not just use async/await to await the agent response in the handler? await in FastAPI still blocks the HTTP response until the awaitable completes. asyncio.create_task() schedules the coroutine to run concurrently without blocking the response. You want the 200 to go back to the sender before the agent starts.
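The difference is easy to demonstrate in miniature, with asyncio.sleep standing in for the agent turn:

```python
import asyncio
import time

async def agent_turn():
    await asyncio.sleep(0.2)   # stand-in for a long LLM turn
    return "done"

async def handler_await():
    # Blocks the "HTTP response" until the turn completes.
    await agent_turn()
    return "received"

async def handler_create_task():
    # Schedules the turn and returns immediately.
    task = asyncio.create_task(agent_turn())
    return "received", task

async def main():
    t0 = time.monotonic()
    await handler_await()
    slow = time.monotonic() - t0

    t0 = time.monotonic()
    _, task = await handler_create_task()
    fast = time.monotonic() - t0
    await task                 # let the background turn finish cleanly
    return slow, fast

slow, fast = asyncio.run(main())
print(f"await: {slow:.2f}s, create_task: {fast:.4f}s")
```

The awaiting handler takes the full turn duration; the create_task handler returns in microseconds.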

Q: What if the agent fails silently in the background? Log errors in _run_agent’s except block to a file you monitor. The sender gets a 200 regardless — if you need delivery guarantees, add a retry queue. For most agent-to-agent use cases, logging + monitoring is enough.
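If you do want retries, a bounded exponential-backoff wrapper around the callback POST is usually enough. A sketch under stated assumptions (post_with_retry and its parameters are illustrative, not part of the server above):

```python
import asyncio
import random

async def post_with_retry(send, attempts: int = 4, base_delay: float = 0.5):
    """Retry a failing async send with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return await send()
        except Exception:
            if attempt == attempts - 1:
                raise          # out of attempts; surface the error
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            await asyncio.sleep(delay)
```

Wrap the callback POST in it: await post_with_retry(lambda: send_to_other_agent("STATUS", payload)).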

Q: Can I use this pattern with any LLM framework, not just OpenClaw? Yes. The pattern is framework-agnostic: subprocess exec, httpx call, or LangChain invoke — anything that takes time. Fire it with asyncio.create_task(), log failures, expose a callback endpoint.

Q: What’s the right timeout for asyncio.wait_for in _run_agent? Set it to the longest reasonable agent turn, not the HTTP client timeout. 300 seconds (5 minutes) covers most Claude sessions. If tasks routinely exceed that, you have a different problem.