Goroutines, channels, and context cancellation: how these patterns show up when building RAG and LLM pipelines in Go.
Go's concurrency model fits AI pipeline work well. LLM calls are slow and I/O-bound; goroutines let you run multiple things in parallel without blocking, and context cancellation gives you clean timeouts on API calls that might hang. Here's how these patterns actually show up in practice.
When a RAG pipeline retrieves multiple chunks and needs to process them in parallel, or when you're running multiple LLM calls simultaneously, goroutines are the straightforward tool.
func processChunks(ctx context.Context, chunks []string) ([]Result, error) {
    results := make([]Result, len(chunks))
    var wg sync.WaitGroup
    var mu sync.Mutex
    var firstErr error

    for i, chunk := range chunks {
        wg.Add(1)
        go func(i int, chunk string) {
            defer wg.Done()
            result, err := embedChunk(ctx, chunk)
            if err != nil {
                mu.Lock()
                if firstErr == nil {
                    firstErr = err // record only the first failure
                }
                mu.Unlock()
                return
            }
            results[i] = result // each goroutine writes a distinct index: no lock needed
        }(i, chunk)
    }

    wg.Wait()
    return results, firstErr
}
sync.WaitGroup coordinates waiting for all goroutines to finish. Each goroutine writes to its own index of results, so those writes don't race; the mutex only guards firstErr, which captures the first failure without throwing away the results that did succeed.
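If a dependency on golang.org/x/sync is acceptable, errgroup condenses the same fan-out and adds something the hand-rolled version lacks: once one call fails, the shared context is cancelled and the remaining calls can bail out early. A sketch against the same embedChunk helper:

func processChunksGroup(ctx context.Context, chunks []string) ([]Result, error) {
    results := make([]Result, len(chunks))
    g, ctx := errgroup.WithContext(ctx) // ctx is cancelled on the first error

    for i, chunk := range chunks {
        i, chunk := i, chunk // capture loop variables (needed before Go 1.22)
        g.Go(func() error {
            result, err := embedChunk(ctx, chunk)
            if err != nil {
                return err
            }
            results[i] = result // each goroutine owns its own index
            return nil
        })
    }
    return results, g.Wait() // Wait returns the first non-nil error
}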
LLM API calls can hang. Without a timeout, a stuck goroutine blocks forever. context.WithTimeout gives you a hard deadline that propagates down through every function that accepts a context.
func callLLM(parentCtx context.Context, prompt string) (string, error) {
    ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)
    defer cancel() // release the timer even on the happy path

    req, err := http.NewRequestWithContext(ctx, http.MethodPost, llmEndpoint, buildBody(prompt))
    if err != nil {
        return "", err
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        if ctx.Err() == context.DeadlineExceeded {
            return "", fmt.Errorf("LLM call timed out after 30s: %w", err)
        }
        return "", err
    }
    defer resp.Body.Close()

    return parseResponse(resp)
}
The key is http.NewRequestWithContext: it ties the HTTP request lifecycle to your context. When the deadline hits, the request cancels automatically. No manual cleanup.
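A second layer worth having: http.Client has its own Timeout field, which caps the entire exchange, including connecting, redirects, and reading the body. It's a backstop for callers that forget to set a deadline. A small tweak, assuming you swap a shared client in for http.DefaultClient above (the 60-second ceiling is an arbitrary choice):

// One shared client for all LLM calls; Timeout bounds the whole
// request even if no per-request context deadline was set.
var llmClient = &http.Client{Timeout: 60 * time.Second}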
When streaming LLM output token by token, channels model the data flow cleanly. The producer goroutine pushes tokens as they arrive; the consumer processes them without waiting for the full response.
func streamLLMResponse(ctx context.Context, prompt string) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out) // always signal the consumer that we're done

        stream, err := openLLMStream(ctx, prompt)
        if err != nil {
            return // error dropped for brevity; a real pipeline would surface it
        }
        defer stream.Close()

        for {
            token, err := stream.Next()
            if err != nil {
                return // io.EOF means the stream ended; anything else is a failure
            }
            select {
            case out <- token:
            case <-ctx.Done():
                return // consumer gave up; exit instead of leaking
            }
        }
    }()
    return out
}
The select inside the loop handles cancellation: if the parent context is cancelled while the goroutine is waiting to send, it exits cleanly instead of leaking.
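On the consumer side, ranging over the channel is all it takes; the loop ends when the producer closes the channel, whatever the reason. A usage sketch:

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
defer cancel()

var sb strings.Builder
for token := range streamLLMResponse(ctx, "summarize this document") {
    sb.WriteString(token) // accumulate the full response
    fmt.Print(token)      // render tokens as they arrive
}
fmt.Println("\nfull response:", sb.String())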
Sometimes you want whichever result comes back first, or a timeout if nothing does. select handles this without extra synchronization:
func getWithFallback(ctx context.Context, query string) (string, error) {
    primary := make(chan string, 1) // buffered so the losing goroutine can still send
    fallback := make(chan string, 1)

    go func() { primary <- queryPrimaryLLM(ctx, query) }()
    go func() { fallback <- queryFallbackLLM(ctx, query) }()

    select {
    case result := <-primary:
        return result, nil
    case result := <-fallback:
        return result, nil
    case <-time.After(10 * time.Second):
        return "", errors.New("no LLM response within 10s")
    case <-ctx.Done():
        return "", ctx.Err()
    }
}
Buffered channels (make(chan string, 1)) matter here: if select picks one result, the losing goroutine can still complete its send without blocking forever, so it doesn't leak.
Goroutine leaks are the main thing to watch. A goroutine that's stuck waiting on a channel send with no receiver, or on an HTTP call with no timeout, will sit in memory until the process restarts. Context propagation is what prevents this: every goroutine that does I/O should accept a context and respect cancellation.
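One way to catch leaks early is to assert on goroutine counts in tests. A rough sketch using the standard library's runtime.NumGoroutine (counts are noisy, so go.uber.org/goleak is the more robust option):

func TestStreamDoesNotLeak(t *testing.T) {
    before := runtime.NumGoroutine()

    ctx, cancel := context.WithCancel(context.Background())
    ch := streamLLMResponse(ctx, "hello")
    cancel()       // cancel immediately; the producer should exit
    for range ch { // drain until the producer closes the channel
    }

    time.Sleep(50 * time.Millisecond) // give the scheduler a moment
    if after := runtime.NumGoroutine(); after > before {
        t.Fatalf("goroutine leak: %d before, %d after", before, after)
    }
}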
The pattern I reach for most often in pipeline work: goroutines for parallelism, sync.WaitGroup for coordination, context for cancellation, buffered channels when producers and consumers run at different speeds.
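And when the provider rate-limits you, a buffered channel doubles as a semaphore that bounds how many calls run at once. A sketch combining it with errgroup from earlier; maxInFlight is an assumption to tune against your API quota:

func processChunksBounded(ctx context.Context, chunks []string) ([]Result, error) {
    const maxInFlight = 8 // assumption: tune to your provider's rate limit
    sem := make(chan struct{}, maxInFlight)
    results := make([]Result, len(chunks))
    g, ctx := errgroup.WithContext(ctx)

    for i, chunk := range chunks {
        i, chunk := i, chunk
        g.Go(func() error {
            select {
            case sem <- struct{}{}: // acquire a slot
            case <-ctx.Done():
                return ctx.Err()
            }
            defer func() { <-sem }() // release the slot

            result, err := embedChunk(ctx, chunk)
            if err != nil {
                return err
            }
            results[i] = result
            return nil
        })
    }
    return results, g.Wait()
}

If you're already on errgroup, g.SetLimit(maxInFlight) does the same bounding in one line; the manual semaphore is worth knowing because it works anywhere a channel does.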