Goroutines, channels, and context cancellation: how these patterns show up when building RAG and LLM pipelines in Go.
Go's concurrency model fits AI pipeline work well. LLM calls are slow and I/O-bound; goroutines let you run multiple things in parallel without blocking, and context cancellation gives you clean timeouts on API calls that might hang. Here's how these patterns actually show up in practice.
When a RAG pipeline retrieves multiple chunks and needs to process them in parallel, or when you're running multiple LLM calls simultaneously, goroutines are the straightforward tool.
func processChunks(ctx context.Context, chunks []string) ([]Result, error) {
    results := make([]Result, len(chunks))
    var wg sync.WaitGroup
    var mu sync.Mutex
    var firstErr error

    for i, chunk := range chunks {
        wg.Add(1)
        go func(i int, chunk string) {
            defer wg.Done()
            result, err := embedChunk(ctx, chunk)
            if err != nil {
                mu.Lock()
                if firstErr == nil {
                    firstErr = err // record only the first failure
                }
                mu.Unlock()
                return
            }
            results[i] = result // each goroutine writes a distinct index: no lock needed
        }(i, chunk)
    }

    wg.Wait()
    return results, firstErr
}
sync.WaitGroup coordinates waiting for all goroutines to finish. Each goroutine writes to its own index of results, so those writes don't race; the mutex only guards firstErr, which captures the first failure without throwing away the results that did succeed.
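If a dependency on golang.org/x/sync is acceptable, errgroup condenses the same fan-out and adds something the hand-rolled version lacks: once one call fails, the shared context is cancelled and the remaining calls can bail out early. A sketch against the same embedChunk helper:

func processChunksGroup(ctx context.Context, chunks []string) ([]Result, error) {
    results := make([]Result, len(chunks))
    g, ctx := errgroup.WithContext(ctx) // ctx is cancelled on the first error

    for i, chunk := range chunks {
        i, chunk := i, chunk // capture loop variables (needed before Go 1.22)
        g.Go(func() error {
            result, err := embedChunk(ctx, chunk)
            if err != nil {
                return err
            }
            results[i] = result // each goroutine owns its own index
            return nil
        })
    }
    return results, g.Wait() // Wait returns the first non-nil error
}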
LLM API calls can hang. Without a timeout, a stuck goroutine blocks forever. context.WithTimeout gives you a hard deadline that propagates down through every function that accepts a context.
func callLLM(parentCtx context.Context, prompt string) (string, error) {
    ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)
    defer cancel() // release the timer even on the happy path

    req, err := http.NewRequestWithContext(ctx, http.MethodPost, llmEndpoint, buildBody(prompt))
    if err != nil {
        return "", err
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        if ctx.Err() == context.DeadlineExceeded {
            return "", fmt.Errorf("LLM call timed out after 30s: %w", err)
        }
        return "", err
    }
    defer resp.Body.Close()

    return parseResponse(resp)
}
The key is http.NewRequestWithContext: it ties the HTTP request lifecycle to your context. When the deadline hits, the request cancels automatically. No manual cleanup.
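A second layer worth having: http.Client has its own Timeout field, which caps the entire exchange, including connecting, redirects, and reading the body. It's a backstop for callers that forget to set a deadline. A small tweak, assuming you swap a shared client in for http.DefaultClient above (the 60-second ceiling is an arbitrary choice):

// One shared client for all LLM calls; Timeout bounds the whole
// request even if no per-request context deadline was set.
var llmClient = &http.Client{Timeout: 60 * time.Second}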
When streaming LLM output token by token, channels model the data flow cleanly. The producer goroutine pushes tokens as they arrive; the consumer processes them without waiting for the full response.
func streamLLMResponse(ctx context.Context, prompt string) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out) // always signal the consumer that we're done

        stream, err := openLLMStream(ctx, prompt)
        if err != nil {
            return // error dropped for brevity; a real pipeline would surface it
        }
        defer stream.Close()

        for {
            token, err := stream.Next()
            if err != nil {
                return // io.EOF means the stream ended; anything else is a failure
            }
            select {
            case out <- token:
            case <-ctx.Done():
                return // consumer gave up; exit instead of leaking
            }
        }
    }()
    return out
}
The select inside the loop handles cancellation: if the parent context is cancelled while the goroutine is waiting to send, it exits cleanly instead of leaking.
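On the consumer side, ranging over the channel is all it takes; the loop ends when the producer closes the channel, whatever the reason. A usage sketch:

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
defer cancel()

var sb strings.Builder
for token := range streamLLMResponse(ctx, "summarize this document") {
    sb.WriteString(token) // accumulate the full response
    fmt.Print(token)      // render tokens as they arrive
}
fmt.Println("\nfull response:", sb.String())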
Sometimes you want whichever result comes back first, or a timeout if nothing does. select handles this without extra synchronization:
func getWithFallback(ctx context.Context, query string) (string, error) {
    primary := make(chan string, 1) // buffered so the losing goroutine can still send
    fallback := make(chan string, 1)

    go func() { primary <- queryPrimaryLLM(ctx, query) }()
    go func() { fallback <- queryFallbackLLM(ctx, query) }()

    select {
    case result := <-primary:
        return result, nil
    case result := <-fallback:
        return result, nil
    case <-time.After(10 * time.Second):
        return "", errors.New("no LLM response within 10s")
    case <-ctx.Done():
        return "", ctx.Err()
    }
}
Buffered channels (make(chan string, 1)) matter here: if select picks one result, the losing goroutine can still complete its send without blocking forever, so it doesn't leak.
Goroutine leaks are the main thing to watch. A goroutine that's stuck waiting on a channel send with no receiver, or on an HTTP call with no timeout, will sit in memory until the process restarts. Context propagation is what prevents this: every goroutine that does I/O should accept a context and respect cancellation.
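One way to catch leaks early is to assert on goroutine counts in tests. A rough sketch using the standard library's runtime.NumGoroutine (counts are noisy, so go.uber.org/goleak is the more robust option):

func TestStreamDoesNotLeak(t *testing.T) {
    before := runtime.NumGoroutine()

    ctx, cancel := context.WithCancel(context.Background())
    ch := streamLLMResponse(ctx, "hello")
    cancel()       // cancel immediately; the producer should exit
    for range ch { // drain until the producer closes the channel
    }

    time.Sleep(50 * time.Millisecond) // give the scheduler a moment
    if after := runtime.NumGoroutine(); after > before {
        t.Fatalf("goroutine leak: %d before, %d after", before, after)
    }
}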
The pattern I reach for most often in pipeline work: goroutines for parallelism, sync.WaitGroup for coordination, context for cancellation, buffered channels when producers and consumers run at different speeds.
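And when the provider rate-limits you, a buffered channel doubles as a semaphore that bounds how many calls run at once. A sketch combining it with errgroup from earlier; maxInFlight is an assumption to tune against your API quota:

func processChunksBounded(ctx context.Context, chunks []string) ([]Result, error) {
    const maxInFlight = 8 // assumption: tune to your provider's rate limit
    sem := make(chan struct{}, maxInFlight)
    results := make([]Result, len(chunks))
    g, ctx := errgroup.WithContext(ctx)

    for i, chunk := range chunks {
        i, chunk := i, chunk
        g.Go(func() error {
            select {
            case sem <- struct{}{}: // acquire a slot
            case <-ctx.Done():
                return ctx.Err()
            }
            defer func() { <-sem }() // release the slot

            result, err := embedChunk(ctx, chunk)
            if err != nil {
                return err
            }
            results[i] = result
            return nil
        })
    }
    return results, g.Wait()
}

If you're already on errgroup, g.SetLimit(maxInFlight) does the same bounding in one line; the manual semaphore is worth knowing because it works anywhere a channel does.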