We all use Nginx, HAProxy, or AWS ELB daily. They are the gatekeepers of our microservices, magically distributing traffic to healthy instances. But to many developers, they remain "black boxes."
In this post, we are going to break that black box open. We will build a high-performance Layer-4 (TCP) Load Balancer in Go using only the standard library. No frameworks, no magic, just raw TCP sockets, goroutines, and atomic operations.
By the end, you'll understand exactly how traffic shifting, health checks, and connection handling work under the hood.
Layer 4 vs. Layer 7: What Are We Building?
Before writing code, we must distinguish between the two main types of load balancing:
- Layer 7 (Application): Understands HTTP. Can route based on headers, cookies, or URL paths (e.g., /api/v1 goes to Server A).
- Layer 4 (Transport): Blind to the content. It simply sees TCP packets and forwards bytes from the client to a backend. It's faster and protocol-agnostic (it works for HTTP, gRPC, Redis, and databases alike).
We are building a Layer 4 Load Balancer. It will accept a connection, pick a backend, and shovel bytes back and forth.
Step 1: The Core Mechanism (Shoveling Bytes)
At its heart, a TCP proxy is simple:
- Accept a connection from a client.
- Dial a connection to a backend server.
- Copy bytes from Client → Backend.
- Copy bytes from Backend → Client.
In Go, io.Copy combined with goroutines makes this trivial.
func proxy(clientConn net.Conn, backendURL string) {
	backendConn, err := net.Dial("tcp", backendURL)
	if err != nil {
		log.Printf("Failed to dial backend: %v", err)
		clientConn.Close()
		return
	}
	// Pipe data: Client -> Backend
	go io.Copy(backendConn, clientConn)
	// Pipe data: Backend -> Client
	go io.Copy(clientConn, backendConn)
}
This is a working proxy, but it's not a balancer yet because it only knows one backend. It also fires off its two copy goroutines and returns immediately without any cleanup; the complete implementation below waits for the transfer to finish and closes both connections.
Step 2: The Round-Robin Algorithm
To distribute traffic, we need a pool of backends and a strategy to pick one. We'll implement Round-Robin (selecting servers sequentially: A, B, C, A, B...).
Because our load balancer will handle thousands of concurrent requests, our selection logic must be thread-safe. We'll use Go's sync/atomic package for lock-free performance.
type Backend struct {
	URL   string
	Alive bool
	mux   sync.RWMutex
}

type ServerPool struct {
	backends []*Backend
	current  uint64
}

func (s *ServerPool) NextIndex() int {
	return int(atomic.AddUint64(&s.current, 1) % uint64(len(s.backends)))
}

func (s *ServerPool) GetNextPeer() *Backend {
	next := s.NextIndex()
	// Walk the ring at most once, starting from the next index,
	// skipping any backend currently marked dead.
	l := len(s.backends) + next
	for i := next; i < l; i++ {
		idx := i % len(s.backends)
		if s.backends[idx].IsAlive() {
			atomic.StoreUint64(&s.current, uint64(idx))
			return s.backends[idx]
		}
	}
	return nil
}
Step 3: Active Health Checks
A load balancer that sends traffic to a dead server is useless. We need a background worker that periodically "pings" the backends. If a server dies, we mark it Alive = false so the Round-Robin loop skips it.
func (s *ServerPool) HealthCheck() {
	for _, b := range s.backends {
		status := "up"
		conn, err := net.DialTimeout("tcp", b.URL, 2*time.Second)
		if err != nil {
			status = "down"
			b.SetAlive(false)
		} else {
			conn.Close()
			b.SetAlive(true)
		}
		log.Printf("%s [%s]", b.URL, status)
	}
}
Complete Implementation
Below is the full runnable main.go. This program listens on port 3030 and balances traffic between three backend servers.
package main

import (
	"io"
	"log"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// ---------------- Backend ----------------

type Backend struct {
	URL   string
	Alive bool
	mux   sync.RWMutex
}

func (b *Backend) SetAlive(alive bool) {
	b.mux.Lock()
	b.Alive = alive
	b.mux.Unlock()
}

func (b *Backend) IsAlive() bool {
	b.mux.RLock()
	defer b.mux.RUnlock()
	return b.Alive
}

// ---------------- Server Pool ----------------

type ServerPool struct {
	backends []*Backend
	current  uint64
}

func (s *ServerPool) NextIndex() int {
	return int(atomic.AddUint64(&s.current, 1) % uint64(len(s.backends)))
}

func (s *ServerPool) GetNextPeer() *Backend {
	next := s.NextIndex()
	l := len(s.backends) + next
	for i := next; i < l; i++ {
		idx := i % len(s.backends)
		if s.backends[idx].IsAlive() {
			if i != next {
				atomic.StoreUint64(&s.current, uint64(idx))
			}
			return s.backends[idx]
		}
	}
	return nil
}

func (s *ServerPool) HealthCheck() {
	for _, b := range s.backends {
		conn, err := net.DialTimeout("tcp", b.URL, 2*time.Second)
		if err != nil {
			log.Printf("Backend unreachable: %s", b.URL)
			b.SetAlive(false)
		} else {
			conn.Close()
			b.SetAlive(true)
		}
	}
}

// ---------------- Proxy Logic ----------------

func proxy(src net.Conn, dest *Backend) {
	dst, err := net.Dial("tcp", dest.URL)
	if err != nil {
		log.Printf("Backend unavailable: %s", dest.URL)
		src.Close()
		return
	}
	defer dst.Close()
	defer src.Close()

	// Buffered so the slower copier's send never blocks after we
	// return; an unbuffered channel here would leak that goroutine.
	done := make(chan struct{}, 2)
	go func() {
		io.Copy(dst, src)
		done <- struct{}{}
	}()
	go func() {
		io.Copy(src, dst)
		done <- struct{}{}
	}()
	// Once one direction finishes, the deferred Close calls unblock
	// the other copier.
	<-done
}

// ---------------- Main ----------------

func main() {
	backends := []*Backend{
		{URL: "localhost:5001", Alive: true},
		{URL: "localhost:5002", Alive: true},
		{URL: "localhost:5003", Alive: true},
	}
	serverPool := ServerPool{backends: backends}

	// Health check loop
	go func() {
		for {
			serverPool.HealthCheck()
			time.Sleep(10 * time.Second)
		}
	}()

	listener, err := net.Listen("tcp", ":3030")
	if err != nil {
		log.Fatalf("Error starting load balancer listener: %v", err)
	}
	defer listener.Close()
	log.Println("Load Balancer started on port 3030")

	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Printf("Error accepting connection: %v", err)
			continue
		}
		peer := serverPool.GetNextPeer()
		if peer != nil {
			go proxy(conn, peer)
		} else {
			log.Println("No available backends")
			conn.Close()
		}
	}
}
Testing It Out
Start Backend Servers (each in a separate terminal)
python3 -m http.server 5001
python3 -m http.server 5002
python3 -m http.server 5003
Start the Load Balancer
go run main.go
Send Requests
curl localhost:3030
You will see requests distributed across the backend servers.
If you kill one backend, the health check loop will detect it and stop routing traffic to that instance automatically.
Key Takeaways
Building a load balancer demystifies distributed systems. We learned that:
- Concurrency is Key: Goroutines allow thousands of simultaneous connections with minimal overhead.
- Atomic Operations: sync/atomic enables lock-free round-robin state management.
- Resilience Requires Monitoring: Active health checks are essential for production-grade systems.
- Layer 4 Is Powerful: Even without understanding HTTP, we can build a high-performance, protocol-agnostic traffic distributor.
While tools like Nginx and HAProxy are battle-tested and production-ready, understanding how a Layer-4 proxy works gives you deeper insight into networking, performance, and system design.