We all use Nginx, HAProxy, or AWS ELB daily. They are the gatekeepers of our microservices, magically distributing traffic to healthy instances. But to many developers, they remain "black boxes."
In this post, we are going to break that black box open. We will build a high-performance Layer-4 (TCP) Load Balancer in Go using only the standard library. No frameworks, no magic, just raw TCP sockets, goroutines, and atomic operations.
By the end, you'll understand exactly how traffic shifting, health checks, and connection handling work under the hood.
Layer 4 vs. Layer 7: What Are We Building?
Before writing code, we must distinguish between the two main types of load balancing:
- Layer 7 (Application): Understands HTTP. Can route based on headers, cookies, or URL paths (e.g., /api/v1 goes to Server A).
- Layer 4 (Transport): Blind to the content. It simply sees TCP packets and forwards bytes from the client to a backend. It's faster and protocol-agnostic (it works for HTTP, gRPC, Redis, and databases alike).
We are building a Layer 4 Load Balancer. It will accept a connection, pick a backend, and shovel bytes back and forth.
Step 1: The Core Mechanism (Shoveling Bytes)
At its heart, a TCP proxy is simple:
- Accept a connection from a client.
- Dial a connection to a backend server.
- Copy bytes from Client → Backend.
- Copy bytes from Backend → Client.
In Go, io.Copy combined with goroutines makes this trivial.
func proxy(clientConn net.Conn, backendURL string) {
	backendConn, err := net.Dial("tcp", backendURL)
	if err != nil {
		log.Printf("Failed to dial backend: %v", err)
		clientConn.Close()
		return
	}
	// Pipe data: Client -> Backend
	go io.Copy(backendConn, clientConn)
	// Pipe data: Backend -> Client
	go io.Copy(clientConn, backendConn)
}
This is a working proxy, but it's not a balancer yet because it only knows one backend. It also fires off its two copy goroutines and returns immediately without any cleanup; the complete implementation below waits for the transfer to finish and closes both connections.
Step 2: The Round-Robin Algorithm
To distribute traffic, we need a pool of backends and a strategy to pick one. We'll implement Round-Robin (selecting servers sequentially: A, B, C, A, B...).
Because our load balancer will handle thousands of concurrent requests, our selection logic must be thread-safe. We'll use Go's sync/atomic package for lock-free performance.
type Backend struct {
	URL   string
	Alive bool
	mux   sync.RWMutex
}

type ServerPool struct {
	backends []*Backend
	current  uint64
}

func (s *ServerPool) NextIndex() int {
	return int(atomic.AddUint64(&s.current, 1) % uint64(len(s.backends)))
}

func (s *ServerPool) GetNextPeer() *Backend {
	next := s.NextIndex()
	// Walk the ring at most once, starting from the next index,
	// skipping any backend currently marked dead.
	l := len(s.backends) + next
	for i := next; i < l; i++ {
		idx := i % len(s.backends)
		if s.backends[idx].IsAlive() {
			atomic.StoreUint64(&s.current, uint64(idx))
			return s.backends[idx]
		}
	}
	return nil
}
Step 3: Active Health Checks
A load balancer that sends traffic to a dead server is useless. We need a background worker that periodically "pings" the backends. If a server dies, we mark it Alive = false so the Round-Robin loop skips it.
func (s *ServerPool) HealthCheck() {
	for _, b := range s.backends {
		status := "up"
		conn, err := net.DialTimeout("tcp", b.URL, 2*time.Second)
		if err != nil {
			status = "down"
			b.SetAlive(false)
		} else {
			conn.Close()
			b.SetAlive(true)
		}
		log.Printf("%s [%s]", b.URL, status)
	}
}
Complete Implementation
Below is the full runnable main.go. This program listens on port 3030 and balances traffic between three backend servers.
package main

import (
	"io"
	"log"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// ---------------- Backend ----------------

type Backend struct {
	URL   string
	Alive bool
	mux   sync.RWMutex
}

func (b *Backend) SetAlive(alive bool) {
	b.mux.Lock()
	b.Alive = alive
	b.mux.Unlock()
}

func (b *Backend) IsAlive() bool {
	b.mux.RLock()
	defer b.mux.RUnlock()
	return b.Alive
}

// ---------------- Server Pool ----------------

type ServerPool struct {
	backends []*Backend
	current  uint64
}

func (s *ServerPool) NextIndex() int {
	return int(atomic.AddUint64(&s.current, 1) % uint64(len(s.backends)))
}

func (s *ServerPool) GetNextPeer() *Backend {
	next := s.NextIndex()
	l := len(s.backends) + next
	for i := next; i < l; i++ {
		idx := i % len(s.backends)
		if s.backends[idx].IsAlive() {
			if i != next {
				atomic.StoreUint64(&s.current, uint64(idx))
			}
			return s.backends[idx]
		}
	}
	return nil
}

func (s *ServerPool) HealthCheck() {
	for _, b := range s.backends {
		conn, err := net.DialTimeout("tcp", b.URL, 2*time.Second)
		if err != nil {
			log.Printf("Backend unreachable: %s", b.URL)
			b.SetAlive(false)
		} else {
			conn.Close()
			b.SetAlive(true)
		}
	}
}

// ---------------- Proxy Logic ----------------

func proxy(src net.Conn, dest *Backend) {
	dst, err := net.Dial("tcp", dest.URL)
	if err != nil {
		log.Printf("Backend unavailable: %s", dest.URL)
		src.Close()
		return
	}
	defer dst.Close()
	defer src.Close()

	// Buffered so the slower copier's send never blocks after we
	// return; an unbuffered channel here would leak that goroutine.
	done := make(chan struct{}, 2)
	go func() {
		io.Copy(dst, src)
		done <- struct{}{}
	}()
	go func() {
		io.Copy(src, dst)
		done <- struct{}{}
	}()
	// Once one direction finishes, the deferred Close calls unblock
	// the other copier.
	<-done
}

// ---------------- Main ----------------

func main() {
	backends := []*Backend{
		{URL: "localhost:5001", Alive: true},
		{URL: "localhost:5002", Alive: true},
		{URL: "localhost:5003", Alive: true},
	}
	serverPool := ServerPool{backends: backends}

	// Health check loop
	go func() {
		for {
			serverPool.HealthCheck()
			time.Sleep(10 * time.Second)
		}
	}()

	listener, err := net.Listen("tcp", ":3030")
	if err != nil {
		log.Fatalf("Error starting load balancer listener: %v", err)
	}
	defer listener.Close()
	log.Println("Load Balancer started on port 3030")

	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Printf("Error accepting connection: %v", err)
			continue
		}
		peer := serverPool.GetNextPeer()
		if peer != nil {
			go proxy(conn, peer)
		} else {
			log.Println("No available backends")
			conn.Close()
		}
	}
}
Testing It Out
Start Backend Servers (each in a separate terminal)
python3 -m http.server 5001
python3 -m http.server 5002
python3 -m http.server 5003
Start the Load Balancer
go run main.go
Send Requests
curl localhost:3030
You will see requests distributed across the backend servers.
If you kill one backend, the health check loop will detect it and stop routing traffic to that instance automatically.
Key Takeaways
Building a load balancer demystifies distributed systems. We learned that:
- Concurrency is Key: Goroutines allow thousands of simultaneous connections with minimal overhead.
- Atomic Operations: sync/atomic enables lock-free round-robin state management.
- Resilience Requires Monitoring: Active health checks are essential for production-grade systems.
- Layer 4 Is Powerful: Even without understanding HTTP, we can build a high-performance, protocol-agnostic traffic distributor.
While tools like Nginx and HAProxy are battle-tested and production-ready, understanding how a Layer-4 proxy works gives you deeper insight into networking, performance, and system design.