Load balancing is a performance optimization technique that also provides fault tolerance. It distributes incoming work across multiple computing resources so that no single resource becomes overloaded.

Graphic for load balancing by MaxCDN.


Two of the most critical requirements for any online service provider are availability and redundancy. The time a server takes to respond to a request depends on how much spare capacity it has at that moment. If even a single component fails or is overwhelmed by requests, the server becomes overloaded, and both the customer and the business suffer.

Load balancing attempts to resolve this issue by sharing the workload across multiple components. An incoming request can be routed away from an overtaxed server to one that has more resources available. Load balancing has a variety of applications, from network switches to database servers.

How Load Balancing Works

Service providers typically build their networks with Internet-facing frontend servers that shuttle information to and from backend servers. These frontend servers run load balancing software, which forwards each request to one of the backend servers based on resource availability. The software contains internal rules and logic that determine when and where to forward each request.
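
One way to picture forwarding "based on resource availability" is a rule that picks the backend with the most spare capacity. The following is a minimal sketch under stated assumptions, not the logic of any particular product: the Backend class, the server names, and the idea that each backend reports an in-flight request count and a fixed capacity are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str             # hypothetical server name
    active_requests: int  # requests currently in flight (assumed to be reported)
    capacity: int         # maximum concurrent requests (assumed to be known)

    def free_slots(self) -> int:
        return self.capacity - self.active_requests


def pick_backend(backends):
    """Return the backend with the most free slots, or None if all are full."""
    candidates = [b for b in backends if b.free_slots() > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.free_slots())


backends = [
    Backend("backend-a", active_requests=8, capacity=10),
    Backend("backend-b", active_requests=3, capacity=10),
    Backend("backend-c", active_requests=10, capacity=10),
]
chosen = pick_backend(backends)
print(chosen.name)  # backend-b has the most free slots (7)
```

Real load balancing software usually combines several signals, such as connection counts, response times, and health status, but the shape of the decision is the same: filter out servers that cannot take work, then rank the rest.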

Here’s a rundown of how load balancing works:

  1. A user opens a webpage such as
  2. A frontend server receives the request and determines where to forward it. Various algorithms can be used to determine where to forward a request, with some of the more basic algorithms including random choice or round robin. If there are no available backend servers, then the frontend server performs a predetermined action such as returning an error message to the user.
  3. The backend server processes the request and generates a response. Meanwhile, the backend server periodically reports its current state to the load balancer.
  4. The backend server returns the response to the frontend server, which forwards it to the user.

If all goes well, the user receives a response in a timely manner regardless of the state of the service provider’s network. As long as at least one frontend server and at least one backend server are available, the user’s request is handled properly.
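
The steps above can be sketched with round robin, one of the basic algorithms mentioned. This is an illustrative sketch only: the RoundRobinBalancer class, the backend names, and the "503 Service Unavailable" string standing in for the predetermined error action are assumptions, not part of any specific load balancer.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical frontend that rotates requests across backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.available = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, name):
        # Stop sending traffic to a backend that is failing or overloaded.
        self.available.discard(name)

    def mark_up(self, name):
        # Put a known backend back into rotation.
        if name in self.backends:
            self.available.add(name)

    def handle(self, request):
        # Try each backend at most once per request, in rotation order.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.available:
                return f"{backend} handled {request}"
        # No backend is available: fall back to the predetermined action.
        return "503 Service Unavailable"


lb = RoundRobinBalancer(["backend-1", "backend-2", "backend-3"])
lb.mark_down("backend-2")
results = [lb.handle(f"request-{i}") for i in range(4)]
for line in results:
    print(line)
```

With backend-2 marked down, the four requests alternate between backend-1 and backend-3. If every backend is marked down, handle returns the error string instead of forwarding the request.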

Example of Load Balancing

Google’s Compute Engine is built on the same load balancing techniques used by several Google products including Gmail, Search and AdWords. Compute Engine periodically reviews the state of all backend servers and marks them as healthy or unhealthy based on their current load.

When a user connects to a Google service, Compute Engine forwards the request to a healthy server. The response is then forwarded from the healthy server through Compute Engine back to the user. Meanwhile, unhealthy servers are repaired, replaced or taken offline.
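
The health-marking step described above can be sketched roughly as follows. This is not Compute Engine's API or its actual criteria: the 0.8 load threshold, the report format (a load fraction per server, or None when a server has not reported), and the function names are all assumptions made for illustration.

```python
import random

LOAD_THRESHOLD = 0.8  # assumed cutoff; real systems use their own criteria


def classify(reported_load):
    """Mark each server healthy or unhealthy from its reported load.

    reported_load maps a server name to a load fraction between 0 and 1,
    or to None if the server sent no report (treated as unhealthy).
    """
    status = {}
    for server, load in reported_load.items():
        ok = load is not None and load < LOAD_THRESHOLD
        status[server] = "healthy" if ok else "unhealthy"
    return status


def route(status):
    """Pick any healthy server, or None if there is none."""
    healthy = [s for s, state in status.items() if state == "healthy"]
    return random.choice(healthy) if healthy else None


reports = {"backend-1": 0.35, "backend-2": 0.95, "backend-3": None}
status = classify(reports)
print(status)
print(route(status))  # only backend-1 is healthy in this example
```

Servers marked unhealthy simply stop receiving new requests until a later review marks them healthy again, which gives operators time to repair or replace them.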

With load balancing, a server can be upgraded with no interruptions to the end user’s experience. Google and other service providers push application updates by upgrading their backend servers in waves. For instance, as a server is taken offline for upgrade, other servers take responsibility for the workload and are subsequently updated in turn.

In Compute Engine, the ability to take a system offline for maintenance and upgrade is known as “lame-duck mode”. This is how Google’s web products can be seamlessly updated even between active sessions.
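
Upgrading in waves can be sketched as a loop that drains a few servers, upgrades them, and returns them to rotation before moving on. This is a simplified illustration, not Google's implementation: the server records, the wave_size parameter, and the rolling_upgrade function are hypothetical, and the actual upgrade work is reduced to changing a version string.

```python
def rolling_upgrade(servers, wave_size, new_version):
    """Upgrade servers in waves while keeping at least one in rotation.

    servers maps a name to a dict with "version" and "serving" keys.
    Returns a log of (wave, servers still serving during that wave).
    """
    log = []
    names = list(servers)
    for start in range(0, len(names), wave_size):
        wave = names[start:start + wave_size]
        remaining = [n for n in names
                     if n not in wave and servers[n]["serving"]]
        if not remaining:
            raise ValueError("wave would leave no server in rotation")
        for name in wave:
            servers[name]["serving"] = False  # drain: stop new requests
        for name in wave:
            servers[name]["version"] = new_version
            servers[name]["serving"] = True   # back into rotation
        log.append((wave, remaining))
    return log


servers = {
    f"backend-{i}": {"version": "v1", "serving": True} for i in range(1, 5)
}
log = rolling_upgrade(servers, wave_size=2, new_version="v2")
for wave, remaining in log:
    print(f"upgrading {wave} while {remaining} keep serving")
```

The check before each wave is the important part: a wave is refused if it would take every serving machine out of rotation, since the remaining servers are what keep users from noticing the upgrade.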

Benefits of Load Balancing

Load balancing makes it easier for system administrators to handle incoming requests while decreasing wait time for users.

  • Users experience faster, uninterrupted service. Users won’t have to wait for a single struggling server to finish its previous tasks. Instead, their requests are immediately passed on to a more readily available resource.
  • Service providers experience less downtime and greater throughput. Even a full server failure won’t impact the end user experience as the load balancer will simply route around it to a healthy server.
  • System administrators experience fewer failed or stressed components. Instead of a single device performing a lot of work, load balancing has several devices perform a little bit of work.

Conclusion

Many of us rely on web services being available 24/7. A 30-minute downtime for Facebook could cost almost $600,000. For high-traffic web applications, load balancing is essential for maintaining the integrity and availability of a service.

From DNS requests to web servers, load balancing can mean the difference between costly downtime and a seamless end user experience.