Zero-Copy Techniques in Go for Network Programming

In high-performance networking, minimizing memory copies is crucial. Every time data is copied, whether from the kernel to user space or from one buffer to another, it consumes CPU cycles and memory bandwidth, which reduces throughput. Zero-copy techniques aim to reduce or eliminate these memory copy operations.

Go, known for its simplicity and productivity, provides a decent foundation for building network services. But what if you're building something where every microsecond counts, like a proxy, load balancer, or custom protocol handler? In this article, we'll explore zero-copy techniques and patterns you can adopt in Go to minimize overhead and squeeze out more performance from your network applications.

What is Zero-Copy?

Zero-copy is a method of transferring data from one memory location to another without involving the CPU in copying bytes. In the context of network programming, it means reading from the network interface and writing to disk or another socket without ever copying the data into user-space memory or intermediate buffers.

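Go exposes kernel-level zero-copy primitives such as sendfile() and splice() less directly than C does, but they are not out of reach: on Linux the standard library already uses them behind io.Copy for certain source and destination pairs, and the syscall and golang.org/x/sys/unix packages give raw access when you need it. Beyond that, there are several ways to reduce memory copying and achieve "zero-copy-like" behavior in Go programs.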

Traditional I/O vs Zero-Copy I/O

In traditional I/O, data flow looks like this:

[Disk/Socket] -> [Kernel Buffer] -> [User-space Buffer] -> [Application Logic]

In zero-copy:

[Disk/Socket] -> [Kernel Buffer] -> [Kernel Buffer / Another Socket]

Avoiding the intermediate user-space buffer can save context switches and CPU cycles, especially at scale.
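
To make the traditional path concrete, here is a minimal sketch of the read/write loop that stages every chunk in a user-space buffer. The names userSpaceCopy and copyString are hypothetical, introduced only for illustration, and copyString is an in-memory driver so the loop can be tried without sockets.

```go
package main

import (
    "bytes"
    "io"
    "strings"
)

// userSpaceCopy is the traditional path: every chunk is read into buf (a
// user-space buffer) and then written back out. With sockets or files on
// both ends, each byte crosses the kernel/user boundary twice.
func userSpaceCopy(dst io.Writer, src io.Reader, buf []byte) (int64, error) {
    if len(buf) == 0 {
        return 0, io.ErrShortBuffer // an empty buffer could never make progress
    }
    var total int64
    for {
        n, rerr := src.Read(buf)
        if n > 0 {
            w, werr := dst.Write(buf[:n])
            total += int64(w)
            if werr != nil {
                return total, werr
            }
            if w < n {
                return total, io.ErrShortWrite
            }
        }
        if rerr == io.EOF {
            return total, nil // end of stream is the normal exit
        }
        if rerr != nil {
            return total, rerr
        }
    }
}

// copyString is a hypothetical in-memory driver for userSpaceCopy.
func copyString(s string, bufSize int) (string, int64, error) {
    var out bytes.Buffer
    n, err := userSpaceCopy(&out, strings.NewReader(s), make([]byte, bufSize))
    return out.String(), n, err
}
```

The zero-copy techniques in the rest of this article are all ways of removing or shrinking the buf step in this loop.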

Techniques for Zero-Copy or Minimal-Copy in Go

1. io.Copy with net.TCPConn

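Go's io.Copy first checks whether the source implements io.WriterTo or the destination implements io.ReaderFrom, and only falls back to a 32KB intermediate buffer when neither does. *net.TCPConn implements ReadFrom, and on Linux that method can hand the transfer to splice() (socket to socket) or sendfile() (file to socket). A plain io.Copy may therefore already avoid the user-space copy when copying between two TCP connections: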

func proxyConn(dst, src net.Conn) {
    defer dst.Close()
    defer src.Close()
    io.Copy(dst, src)
}

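Where no such fast path applies, io.Copy allocates one buffer per call and loops over Read and Write. That is not zero-copy, but it still avoids manual data handling and per-chunk allocations.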
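
The dispatch that io.Copy performs can be sketched with two type assertions. This is an illustration of the documented interface checks, not the standard library's source; copyPath, onlyReader, onlyWriter, and tcpDestinationPath are hypothetical names.

```go
package main

import (
    "io"
    "net"
)

// copyPath reports which branch io.Copy is expected to take for a given
// pair: the source's WriteTo, the destination's ReadFrom, or the generic
// loop through an intermediate buffer.
func copyPath(dst io.Writer, src io.Reader) string {
    if _, ok := src.(io.WriterTo); ok {
        return "src.WriteTo"
    }
    if _, ok := dst.(io.ReaderFrom); ok {
        return "dst.ReadFrom"
    }
    return "generic buffer loop"
}

// onlyReader and onlyWriter expose nothing but Read or Write, which hides
// any fast-path methods of the wrapped value.
type onlyReader struct{ io.Reader }
type onlyWriter struct{ io.Writer }

// tcpDestinationPath checks the case of a TCP connection as destination.
// A nil *net.TCPConn is enough here because only the method set is
// inspected; nothing is read or written.
func tcpDestinationPath() string {
    var conn *net.TCPConn
    return copyPath(conn, onlyReader{})
}
```

One practical consequence: wrapping a connection in another type (a counting writer, a deadline wrapper, and so on) hides ReadFrom and silently moves the copy back to the buffered path.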

2. Using syscall.Sendfile on Unix Systems

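The syscall.Sendfile function transfers data from one file descriptor to another inside the kernel, bypassing user space. On Linux the input descriptor must be a regular file (or another descriptor that supports mmap-like access), not a socket; the output descriptor is typically a socket.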

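import "syscall"

// sendFile asks the kernel to copy up to count bytes from inFd to outFd.
// A nil offset means "read from inFd's current position and advance it".
// The call may transfer fewer bytes than requested, so callers should loop.
func sendFile(outFd, inFd, count int) (int, error) {
    return syscall.Sendfile(outFd, inFd, nil, count)
}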

This is close to true zero-copy. However, it works only with specific file descriptor types and lacks portability across all platforms.
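
A portable alternative is to let io.Copy choose the mechanism. The sketch below passes the *os.File directly to io.Copy; when the destination is a *net.TCPConn on Linux, the net package is expected to use sendfile() internally. The names sendFileTo and roundTripFile are hypothetical, and roundTripFile copies file to file only so the function can be tried without a network.

```go
package main

import (
    "io"
    "os"
    "path/filepath"
)

// sendFileTo streams the file at path into dst. The *os.File is passed to
// io.Copy unwrapped so that any fast path of the destination stays visible.
func sendFileTo(dst io.Writer, path string) (int64, error) {
    f, err := os.Open(path)
    if err != nil {
        return 0, err
    }
    defer f.Close()
    return io.Copy(dst, f)
}

// roundTripFile is a hypothetical driver: it writes content to a temporary
// file, copies it to a second file with sendFileTo, and reads the result.
func roundTripFile(content string) (string, error) {
    dir, err := os.MkdirTemp("", "sendfile-demo-")
    if err != nil {
        return "", err
    }
    defer os.RemoveAll(dir)

    srcPath := filepath.Join(dir, "src.bin")
    dstPath := filepath.Join(dir, "dst.bin")
    if err := os.WriteFile(srcPath, []byte(content), 0o600); err != nil {
        return "", err
    }

    out, err := os.Create(dstPath)
    if err != nil {
        return "", err
    }
    if _, err := sendFileTo(out, srcPath); err != nil {
        out.Close()
        return "", err
    }
    if err := out.Close(); err != nil {
        return "", err
    }

    got, err := os.ReadFile(dstPath)
    return string(got), err
}
```

In a server, dst would be the accepted connection, for example sendFileTo(conn, "/path/to/file").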

3. Memory Pooling with sync.Pool

If zero-copy isn't possible, reducing allocations helps. sync.Pool lets you reuse buffers instead of allocating new ones on each read/write:

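var bufferPool = sync.Pool{
    New: func() interface{} {
        // Store a pointer: putting a bare slice into an interface
        // allocates on every Put.
        buf := make([]byte, 32*1024) // 32KB buffer
        return &buf
    },
}

func proxyWithPool(dst, src net.Conn) {
    bufp := bufferPool.Get().(*[]byte)
    defer bufferPool.Put(bufp)
    // io.CopyBuffer only uses the buffer when neither side offers a
    // ReadFrom/WriteTo fast path.
    io.CopyBuffer(dst, src, *bufp)
}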

This isn't zero-copy, but it avoids frequent heap allocations and improves performance under load.

4. Advanced Zero-Copy with splice() System Call

For advanced users, you can use the splice() system call on Linux to achieve true zero-copy between file descriptors:

package main

import (
    "syscall"
    "unsafe"
)

// splice moves data between two file descriptors without copying it through
// user space. One of the two descriptors must be a pipe.
func splice(rfd int, roff *int64, wfd int, woff *int64, length int, flags int) (int, error) {
    r1, _, errno := syscall.Syscall6(syscall.SYS_SPLICE,
        uintptr(rfd),
        uintptr(unsafe.Pointer(roff)),
        uintptr(wfd),
        uintptr(unsafe.Pointer(woff)),
        uintptr(length),
        uintptr(flags))

    if errno != 0 {
        return 0, errno
    }
    return int(r1), nil
}

// zeroCopyProxy forwards src to dst until EOF. It assumes blocking descriptors.
func zeroCopyProxy(src, dst int) error {
    // The pipe acts as the in-kernel buffer between the two descriptors
    pipefd := make([]int, 2)
    if err := syscall.Pipe(pipefd); err != nil {
        return err
    }
    defer syscall.Close(pipefd[0])
    defer syscall.Close(pipefd[1])

    for {
        // Splice from source to pipe
        n, err := splice(src, nil, pipefd[1], nil, 32*1024, 0)
        if err != nil {
            return err
        }
        if n == 0 {
            return nil // EOF on the source
        }

        // Splice from pipe to destination; a single call may move fewer
        // than n bytes, so loop until the pipe is drained
        for n > 0 {
            m, err := splice(pipefd[0], nil, dst, nil, n, 0)
            if err != nil {
                return err
            }
            n -= m
        }
    }
}

5. Memory Mapping with mmap

Memory-mapping files with mmap allows applications to access file content as if it were in memory, avoiding explicit copies:

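import (
    "net"
    "os"
    "syscall"
)

// mmapFile maps the whole file read-only. The caller must release the
// mapping with syscall.Munmap(data) when done.
func mmapFile(path string) ([]byte, error) {
    file, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer file.Close() // the mapping stays valid after the descriptor is closed

    stat, err := file.Stat()
    if err != nil {
        return nil, err
    }
    if stat.Size() == 0 {
        return nil, nil // mmap rejects zero-length mappings
    }

    // Memory map the file
    data, err := syscall.Mmap(int(file.Fd()), 0, int(stat.Size()),
        syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        return nil, err
    }

    return data, nil
}

func sendMmappedFile(conn net.Conn, data []byte) error {
    // Write still copies from the mapped pages into the socket buffer; what
    // is saved is the read() copy into a separate user-space buffer.
    _, err := conn.Write(data)
    return err
}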

6. Batching and Scatter/Gather I/O

Using batching strategies and minimizing syscall invocations can indirectly achieve zero-copy-like efficiencies.

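For example, net.Buffers (available in Go 1.8+) lets you write several slices with one call; on connections that support it, the runtime turns this into a single writev() system call. The type is defined in the standard library roughly as follows (a simplified excerpt, not code to redeclare in your own program):

type Buffers [][]byte // implements io.WriterTo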

func (v *Buffers) WriteTo(w io.Writer) (n int64, err error) {
    if wv, ok := w.(buffersWriter); ok {
        return wv.writeBuffers(v)
    }
    for _, buf := range *v {
        nn, err := w.Write(buf)
        n += int64(nn)
        if err != nil {
            return n, err
        }
    }
    return n, nil
}

// Example usage for batched writes
func sendMultipleBuffers(conn net.Conn, data [][]byte) error {
    buffers := net.Buffers(data)
    _, err := buffers.WriteTo(conn)
    return err
}
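
A typical use is sending a header and a payload without first concatenating them into a temporary buffer. The sketch below assumes a simple header-plus-body framing; writeFrame and frameString are hypothetical names, and frameString writes to an in-memory buffer only so the result can be inspected.

```go
package main

import (
    "bytes"
    "io"
    "net"
)

// writeFrame sends header and payload as one net.Buffers value, so the two
// slices never have to be joined into a single contiguous buffer.
func writeFrame(w io.Writer, header, payload []byte) (int64, error) {
    // WriteTo modifies the Buffers value as it consumes it, so build a
    // fresh one per call instead of reusing it.
    bufs := net.Buffers{header, payload}
    return bufs.WriteTo(w)
}

// frameString is a hypothetical driver that collects the written bytes.
func frameString(header, payload string) (string, error) {
    var out bytes.Buffer
    _, err := writeFrame(&out, []byte(header), []byte(payload))
    return out.String(), err
}
```

With a *net.TCPConn as w, the same call is where the batched write path can apply; with any other writer it degrades to one Write per slice.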

7. Optimized Buffer Copying with copy()

When you must copy data, use Go's built-in copy() function, which the compiler lowers to the runtime's optimized memmove:

func efficientCopy(dst, src []byte) int {
    // This is faster than manual loops
    return copy(dst, src)
}

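// copy already calls the runtime's memmove, so the unsafe variant below is
// not faster. It is shown only to illustrate what copy does internally, and
// relying on go:linkname may break across Go releases.
func unsafeCopy(dst, src []byte) {
    if len(dst) < len(src) {
        panic("destination too small")
    }
    if len(src) == 0 {
        return // &src[0] would panic on an empty slice
    }

    // Direct memory copy using unsafe (use with caution)
    memmove(unsafe.Pointer(&dst[0]), unsafe.Pointer(&src[0]), uintptr(len(src)))
}

//go:linkname memmove runtime.memmove
func memmove(to, from unsafe.Pointer, n uintptr)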
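
Two properties of copy matter in network code: it copies min(len(dst), len(src)) bytes and returns that count, and it is safe when the two slices overlap. The sketch below uses both; compact and fill are hypothetical helper names.

```go
package main

// compact moves the unread tail buf[off:] to the front of buf and returns
// the slice that now holds it. The source and destination overlap, which
// copy handles correctly. It expects 0 <= off <= len(buf).
func compact(buf []byte, off int) []byte {
    n := copy(buf, buf[off:])
    return buf[:n]
}

// fill copies as much of src as fits into dst and returns the number of
// bytes copied together with the part of src that did not fit.
func fill(dst, src []byte) (written int, rest []byte) {
    written = copy(dst, src)
    return written, src[written:]
}
```

compact is the usual way to keep a partially parsed message at the start of a read buffer before the next Read appends to it.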

8. HTTP/2 and HTTP/3 Optimizations

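For HTTP services, rely on Go's HTTP/2 support: net/http negotiates HTTP/2 automatically for TLS servers, and golang.org/x/net/http2 lets you tune it. HTTP/3 is not part of the standard library and requires a third-party QUIC implementation.

// http2 below is golang.org/x/net/http2.
func setupOptimizedHTTPServer() (*http.Server, error) {
    server := &http.Server{
        Addr:              ":8080",
        ReadHeaderTimeout: 5 * time.Second,
        WriteTimeout:      10 * time.Second,
        IdleTimeout:       120 * time.Second,
    }

    // Tune HTTP/2. It is only negotiated over TLS here, so serve with
    // server.ListenAndServeTLS.
    err := http2.ConfigureServer(server, &http2.Server{
        MaxConcurrentStreams: 250,
        MaxReadFrameSize:     16384,
        IdleTimeout:          60 * time.Second,
    })
    if err != nil {
        return nil, err
    }

    return server, nil
}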

Real-World Performance Example: High-Performance Proxy

Here's a complete example of a high-performance TCP proxy using zero-copy techniques:

package main

import (
    "context"
    "fmt"
    "io"
    "net"
    "sync"
    "time"
)

type ZeroCopyProxy struct {
    bufferPool sync.Pool
}

func NewZeroCopyProxy() *ZeroCopyProxy {
    return &ZeroCopyProxy{
        bufferPool: sync.Pool{
            New: func() interface{} {
                return make([]byte, 32*1024) // 32KB buffers
            },
        },
    }
}

func (p *ZeroCopyProxy) handleConnection(clientConn net.Conn, targetAddr string) {
    defer clientConn.Close()

    // Connect to target server
    serverConn, err := net.DialTimeout("tcp", targetAddr, 10*time.Second)
    if err != nil {
        fmt.Printf("Failed to connect to target: %v\n", err)
        return
    }
    defer serverConn.Close()

    // Bidirectional proxy with zero-copy techniques
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    go p.copyData(ctx, serverConn, clientConn) // Client -> Server
    p.copyData(ctx, clientConn, serverConn)    // Server -> Client
}

func (p *ZeroCopyProxy) copyData(ctx context.Context, dst, src net.Conn) {
    // Try to use optimized copy first
    if tcpDst, ok := dst.(*net.TCPConn); ok {
        if tcpSrc, ok := src.(*net.TCPConn); ok {
            p.tcpCopy(ctx, tcpDst, tcpSrc)
            return
        }
    }

    // Fallback to buffer pool copy
    p.bufferedCopy(ctx, dst, src)
}

func (p *ZeroCopyProxy) tcpCopy(ctx context.Context, dst, src *net.TCPConn) {
    // Use io.Copy which is optimized for TCP connections
    done := make(chan error, 1)
    go func() {
        _, err := io.Copy(dst, src)
        done <- err
    }()

    select {
    case <-ctx.Done():
        return
    case <-done:
        return
    }
}

func (p *ZeroCopyProxy) bufferedCopy(ctx context.Context, dst, src net.Conn) {
    buffer := p.bufferPool.Get().([]byte)
    defer p.bufferPool.Put(buffer)

    for {
        select {
        case <-ctx.Done():
            return
        default:
        }

        src.SetReadDeadline(time.Now().Add(30 * time.Second))
        n, err := src.Read(buffer)
        if err != nil {
            return
        }

        dst.SetWriteDeadline(time.Now().Add(30 * time.Second))
        _, err = dst.Write(buffer[:n])
        if err != nil {
            return
        }
    }
}

func main() {
    proxy := NewZeroCopyProxy()

    listener, err := net.Listen("tcp", ":8080")
    if err != nil {
        panic(err)
    }
    defer listener.Close()

    fmt.Println("Zero-copy proxy listening on :8080")

    for {
        conn, err := listener.Accept()
        if err != nil {
            continue
        }

        go proxy.handleConnection(conn, "example.com:80")
    }
}

Performance Benchmarks

Here's how to benchmark your zero-copy implementations:

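// Needs imports: bytes, io, sync, testing.
func BenchmarkZeroCopy(b *testing.B) {
    data := make([]byte, 1024*1024) // 1MB
    for i := range data {
        data[i] = byte(i % 256)
    }

    // bytes.Reader implements io.WriterTo and bytes.Buffer implements
    // io.ReaderFrom, so io.Copy and io.CopyBuffer would bypass their
    // buffers entirely. Wrapping both sides hides those methods and
    // forces the buffered path that this benchmark is meant to compare.
    type onlyReader struct{ io.Reader }
    type onlyWriter struct{ io.Writer }

    b.Run("io.Copy", func(b *testing.B) {
        b.SetBytes(int64(len(data)))
        for i := 0; i < b.N; i++ {
            src := onlyReader{bytes.NewReader(data)}
            dst := onlyWriter{&bytes.Buffer{}}
            io.Copy(dst, src)
        }
    })

    b.Run("Buffer Pool", func(b *testing.B) {
        pool := sync.Pool{
            New: func() interface{} {
                return make([]byte, 32*1024)
            },
        }

        b.SetBytes(int64(len(data)))
        for i := 0; i < b.N; i++ {
            src := onlyReader{bytes.NewReader(data)}
            dst := onlyWriter{&bytes.Buffer{}}
            buf := pool.Get().([]byte)
            io.CopyBuffer(dst, src, buf)
            pool.Put(buf)
        }
    })
}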

Limitations in Go

Garbage Collector: Go's garbage collector can introduce latency in high-performance systems.
Limited Kernel API Access: Calls such as splice() or vmsplice() are reachable only through the Linux-specific parts of the syscall and golang.org/x/sys/unix packages, not through a portable API.
Portability: syscall usage often breaks portability or future compatibility.
Safety vs Performance: Go prioritizes safety over raw performance, limiting some optimizations.

That said, Go 1.20+ and the x/sys packages have improved access to lower-level APIs, enabling more control.

Best Practices for Zero-Copy in Go

Use io.Copy when possible - it's optimized and handles many edge cases
Pool your buffers - avoid allocations in hot paths
Profile your code - use go tool pprof to identify bottlenecks
Consider the trade-offs - sometimes readable code is better than micro-optimizations
Test thoroughly - zero-copy code can be more error-prone
Benchmark everything - assumptions about performance can be wrong

Monitoring and Profiling

Monitor your zero-copy implementations:

import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

func init() {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}

Then use:

go tool pprof http://localhost:6060/debug/pprof/profile

Key Takeaways

Think twice before copying data - In Go, even small changes in buffer handling can lead to significant gains
Profile first, optimize later - Don't optimize bottlenecks you think you might have
Balance readability and performance - Premature optimization is the root of all evil
Stay current - The Go ecosystem continues to evolve with new performance opportunities

When performance matters, these zero-copy techniques can make the difference between a good Go network service and a great one. Remember to always benchmark your specific use case and measure the actual impact of your optimizations.

Conclusion

While Go isn't traditionally viewed as a systems-level language, it offers enough tools to implement efficient network programs. By using techniques like io.Copy, sync.Pool, syscall.Sendfile, and net.Buffers, developers can approach zero-copy performance in many real-world applications.

True zero-copy on every path might require platform-specific system calls and some unsafe code, but for most Go network services, these patterns offer a solid balance between maintainability and performance. The key is to reduce unnecessary allocations and minimize syscall overhead while leveraging the Go runtime's concurrency model efficiently.