source:http://www.javaworld.com/javaworld/jw-10-2008/jw-10-load-balancing-1.html
Availability and scalability
Server
load balancing distributes service requests across a group of real
servers and makes those servers look like a single big server to the
clients. Often dozens of real servers are behind a URL that implements
a single virtual service.
How does this
work? In a widely used server load balancing architecture, the incoming
request is directed to a dedicated server load balancer that is
transparent to the client. Based on parameters such as availability or
current server load, the load balancer decides which server should
handle the request and forwards it to the selected server. To provide
the load balancing algorithm with the required input data, the load
balancer also retrieves information about the servers' health and load
to verify that they can respond to traffic. Figure 1 illustrates this
classic load balancer architecture.
Figure 1. Classic load balancer architecture (load dispatcher)
The load-dispatcher architecture illustrated in Figure 1 is just one of several approaches. To decide which load balancing
solution is the best for your infrastructure, you need to consider availability and scalability.
Availability is defined by uptime
-- the time between failures. (Downtime is the time to detect the
failure, repair it, perform required recovery, and restart tasks.)
During uptime the system must respond to each request within a
predetermined, well-defined time. If that time is exceeded, the client
perceives a server malfunction. High availability is, at its core,
redundancy in the system: if one server fails, the others take over the
failed server's load transparently, so the failure of an individual server
is invisible to the client.
Scalability means
that the system can serve a single client, as well as thousands of
simultaneous clients, by meeting quality-of-service requirements such
as response time. Under increased load, a highly scalable system can
increase throughput almost linearly in proportion to the added hardware
resources.
In the scenario in
Figure 1, high scalability is reached by distributing the incoming
requests across the servers. If the load increases, additional servers can
be added, as long as the load balancer does not become the bottleneck.
To reach high availability, the load balancer must monitor the servers
to avoid forwarding requests to overloaded or dead servers.
Furthermore, the load balancer itself must be redundant. I'll
discuss this point later in this article.
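To make the dispatcher's job concrete, the following minimal sketch (a hypothetical illustration, not part of any product discussed in this article) selects a target server with round-robin scheduling while skipping servers that a health monitor has marked as dead:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the dispatcher's core decision: round-robin over healthy servers.
// The Server class and its isHealthy() method are assumptions made for illustration only.
class RoundRobinDispatcher {

    static class Server {
        final String address;
        volatile boolean healthy = true;   // updated by a separate health-monitoring component

        Server(String address) { this.address = address; }
        boolean isHealthy() { return healthy; }
    }

    private final List<Server> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDispatcher(List<Server> servers) {
        this.servers = servers;
    }

    // returns the next healthy server, or null if no server is available
    Server select() {
        for (int i = 0; i < servers.size(); i++) {
            Server candidate = servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
            if (candidate.isHealthy()) {
                return candidate;
            }
        }
        return null;
    }
}

A real load balancer would base the same decision on richer input, such as the current load reported by each server.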
Server load balancing techniques
In general, server load balancing solutions are of two main types:
- Transport-level load balancing -- such as the DNS-based approach or TCP/IP-level load balancing -- acts independently of the application
payload.
- Application-level load balancing uses the application payload to make load balancing decisions.
Load
balancing solutions can be further classified into software-based load
balancers and hardware-based load balancers. Hardware-based load
balancers are specialized hardware boxes that include
application-specific integrated circuits (ASICs) customized for a
particular use. ASICs enable high-speed forwarding of network traffic
without the overhead of a general-purpose operating system.
Hardware-based load balancers are often used for transport-level load
balancing. In general, hardware-based load balancers are faster than
software-based solutions. Their drawback is their cost.
Server farms achieve high scalability and high
availability through server load balancing, a technique that makes the
server farm appear to clients as a single server. In this two-part
article, Gregor Roth explores server load balancing architectures, with
a focus on open source solutions. Part 1 covers server load balancing
basics and discusses the pros and cons of transport-level server load
balancing. Part 2 covers application-level server load balancing architectures, which address some of the limitations of the architectures
discussed in Part 1.
The
barrier to entry for many Internet companies is low. Anyone with a good
idea can develop a small application, purchase a domain name, and set
up a few PC-based servers to handle incoming traffic. The initial
investment is small, so the start-up risk is minimal. But a successful
low-cost infrastructure can become a serious problem quickly. A single
server that handles all the incoming requests may not have the capacity
to handle high traffic volumes once the business becomes popular. In
such situations, companies often start to scale up:
they upgrade the existing infrastructure by buying a larger box with
more processors or adding more memory to run the applications.
Scaling up,
though, is only a short-term solution. And it's a limited approach
because the cost of upgrading is disproportionately high relative to
the gains in server capability. For these reasons, most successful
Internet companies follow a scale-out approach: application components run as multiple instances on server farms built from low-cost hardware
and operating systems. As traffic increases, more servers are added.
The
server-farm approach has its own unique demands. On the software side,
you must design applications so that they can run as multiple instances
on different servers. You do this by splitting the application into
smaller components that can be deployed independently. This is trivial
if the application components are stateless. Because the components
don't retain any transactional state, any of them can handle the same
requests equally. If more processing power is required, you just add
more servers and install the application components.
A more challenging
problem arises when the application components are stateful. For
instance, if the application component holds shopping-cart data, an
incoming request must be routed to an application component instance
that holds that requester's shopping-cart data. Later in this article,
I'll discuss how to handle such application-session data in a
distributed environment. However, to reduce complexity, most successful
Internet-based application systems try to avoid stateful application
components whenever possible.
On the
infrastructure side, the processing load must be distributed among the
group of servers. This is known as server load balancing. Load
balancing technologies also pertain to other domains, for instance
spreading work among components such as network links, CPUs, or hard
drives. This article focuses on server load balancing.
In contrast to hardware load balancers, software-based load balancers
run on standard operating systems and standard hardware components such
as PCs. Software-based solutions run either on a dedicated load-balancer
hardware node, as in Figure 1, or directly in the application.
DNS-based load balancing
DNS-based
load balancing represents one of the early server load balancing
approaches. The Internet's domain name system (DNS) associates IP
addresses with a host name. If you type a host name (as part of the
URL) into your browser, the browser requests that the DNS server
resolve the host name to an IP address.
The DNS-based approach is based on the fact that DNS allows multiple IP addresses (real servers) to be assigned to one host
name, as shown in the DNS lookup example in Listing 1.
Listing 1. Example DNS lookup
>nslookup amazon.com
Server: ns.box
Address: 192.168.1.1
Name: amazon.com
Addresses: 72.21.203.1, 72.21.210.11, 72.21.206.5
If
the DNS server implements a round-robin approach, the order of the IP
addresses for a given host changes after each DNS response. Usually
clients such as browsers try to connect to the first address returned
from a DNS query. The result is that responses to multiple clients are
distributed among the servers. In contrast to the server load balancing
architecture in Figure 1, no intermediate load balancer hardware node
is required.
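The effect is easy to observe from Java. The following sketch (the host name and port are placeholders) resolves all addresses registered for a host and, like a browser, tries them in the order the resolver returns them, falling back to the next address if a connection attempt fails:

import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DnsRoundRobinClient {

    public static void main(String[] args) throws Exception {
        // DNS may return several A records for a single host name
        InetAddress[] addresses = InetAddress.getAllByName("www.example.org");
        for (InetAddress address : addresses) {
            System.out.println("resolved: " + address.getHostAddress());
        }

        // try the first address; fall back to the next one if the connection fails
        for (InetAddress address : addresses) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(address, 80), 3000);
                System.out.println("connected to " + address.getHostAddress());
                return;
            } catch (IOException ioe) {
                System.out.println("failed to reach " + address.getHostAddress() + ", trying next address");
            }
        }
        System.out.println("no server reachable");
    }
}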
DNS is an
efficient solution for global server load balancing, where load must be
distributed between data centers at different locations. Often
DNS-based global server load balancing is combined with other server
load balancing solutions to distribute the load within a dedicated data
center.
Although easy to
implement, the DNS approach has serious drawbacks. To reduce the number of DNS
queries, clients tend to cache the query results. If a server becomes
unavailable, the client cache, as well as the DNS server, will continue
to hand out a dead server's address. For this reason, the DNS approach
does little to provide high availability.
TCP/IP server load balancing
TCP/IP server load balancers operate on low-level layer switching. A popular software-based low-level server load balancer
is the Linux Virtual Server
(LVS). The real servers appear to the outside world as a single
"virtual" server. The incoming requests on a TCP connection are
forwarded to the real servers by the load balancer, which runs a Linux
kernel patched to include IP Virtual Server (IPVS) code.
To ensure high
availability, in most cases a pair of load balancer nodes are set up,
with one load balancer node in passive mode. If a load balancer fails,
the heartbeat program that runs on both load balancers activates the
passive load balancer node and initiates the takeover of the Virtual IP
address (VIP). While the heartbeat is responsible for managing the
failover between the load balancers, simple send/expect scripts are
used to monitor the health of the real servers.
Transparency to
the client is achieved by using a VIP that is assigned to the load
balancer. When a client issues a request, the requested host name
is first resolved to the VIP. On receiving the request packet, the
load balancer decides which real server should handle it and rewrites
the packet's target IP address into the Real IP (RIP) of that server.
LVS supports several scheduling
algorithms for distributing requests to the real servers. It is often
set up to use round-robin scheduling, similar to DNS-based load
balancing. With LVS, the load balancing decision is made on the TCP
level (Layer 4 of the OSI Reference Model).
After
receiving the request packet, the real server handles it and returns
the response packet. To force the response packet to be returned
through the load balancer, the real server uses the VIP as its default
response route. When the load balancer receives the response packet, it
rewrites the packet's source IP with the VIP (OSI Model
Layer 3). This LVS routing mode is called Network Address Translation
(NAT) routing. Figure 2 shows an LVS implementation that uses NAT
routing.
Figure 2. LVS implemented with NAT routing
LVS also supports other routing modes such as Direct Server Return.
In this case the response packet is sent directly to the client by the
real server. To do this, the VIP must be assigned to all real servers,
too. It is important to make the server's VIP unresolvable to the
network; otherwise, the load balancer becomes unreachable. When the load
balancer receives a request packet, it rewrites the MAC address (OSI Model Layer 2)
of the request instead of the IP address. The real server
receives the request packet and processes it. Based on the source IP
address, the response packet is sent directly to the client, bypassing
the load balancer. For Web traffic this approach can reduce the
load balancer's workload dramatically, because typically many more response packets
are transferred than request packets. For instance, a request for a
Web page often fits into a single IP packet, whereas several response
IP packets are required to transfer a larger page.
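To make the idea of transport-level load balancing concrete, the following user-space sketch forwards TCP connections round-robin to a set of real servers without ever inspecting the application payload. It is an illustration only -- LVS performs the equivalent packet rewriting inside the Linux kernel, which is far more efficient -- and the backend addresses and ports are placeholders:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

// User-space sketch of transport-level (Layer 4) load balancing, for illustration only.
public class TinyTcpBalancer {

    private static final InetSocketAddress[] REAL_SERVERS = {   // placeholder addresses
            new InetSocketAddress("10.0.0.11", 8180),
            new InetSocketAddress("10.0.0.12", 8180)
    };
    private static final AtomicInteger next = new AtomicInteger();

    public static void main(String[] args) throws IOException {
        try (ServerSocket vip = new ServerSocket(8080)) {   // the "virtual" service port
            while (true) {
                Socket client = vip.accept();
                // round-robin scheduling, similar to a simple LVS setup
                InetSocketAddress target =
                        REAL_SERVERS[Math.floorMod(next.getAndIncrement(), REAL_SERVERS.length)];
                Socket server = new Socket(target.getAddress(), target.getPort());
                pipe(client, server);   // client -> real server
                pipe(server, client);   // real server -> client
            }
        }
    }

    // copies bytes in one direction without ever looking at the application payload
    private static void pipe(Socket from, Socket to) {
        new Thread(() -> {
            try (InputStream in = from.getInputStream(); OutputStream out = to.getOutputStream()) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } catch (IOException ignored) {
                // connection closed
            }
        }).start();
    }
}

Because the balancer only copies bytes, it cannot base its decisions on cookies, URLs, or other application-level information -- which is exactly the limitation discussed in the remainder of this article.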
Caching
Low-level
server load balancer solutions such as LVS reach their limit if
application-level caching or application-session support is required.
Caching is an important scalability principle for avoiding expensive
operations that fetch the same data repeatedly. A cache is a temporary
store that holds redundant data resulting from a previous data-fetch
operation. The value of a cache depends on the cost to retrieve the
data versus the hit rate and required cache size.
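A rough back-of-the-envelope calculation illustrates this trade-off. The numbers below are made up purely for illustration:

// Made-up numbers, purely to illustrate how the hit rate drives the value of a cache
public class CacheValue {

    public static void main(String[] args) {
        double fetchCostMs = 50.0;   // assumed cost of fetching the data from the backend
        double cacheCostMs = 1.0;    // assumed cost of a cache lookup

        for (double hitRate : new double[] {0.0, 0.5, 0.9, 0.99}) {
            // every lookup pays the cache cost; only misses pay the fetch cost as well
            double expectedMs = cacheCostMs + (1 - hitRate) * fetchCostMs;
            System.out.printf("hit rate %.2f -> expected cost per request: %.1f ms%n", hitRate, expectedMs);
        }
    }
}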
Depending on the load
balancer's scheduling algorithm, the requests of a user session are
handled by different servers. If a cache is used on the server side,
such straying requests become a problem. One approach to handling this is
to place the cache in a global space. memcached is a popular distributed cache solution that provides a large cache across multiple machines. It is a partitioned, distributed
cache that uses consistent hashing
to determine the cache server (daemon) for a given cache entry: based
on the cache key's hash code, the client library always maps the same
key to the same cache server address. This address is then used
to store the cache entry. Figure 3 illustrates this caching approach.
Figure 3. Load balancer architecture enhanced by a partitioned, distributed cache
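The spymemcached client implements this hashing logic for you, so you never write it yourself. Purely to illustrate the principle, the following sketch hashes server addresses onto a ring and maps each cache key to the first server found at or after the key's position, wrapping around at the end of the ring:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustration of the consistent hashing principle; not the algorithm spymemcached actually uses.
class ConsistentHashRing {

    private final SortedMap<Long, String> ring = new TreeMap<Long, String>();

    void addServer(String serverAddress) {
        // several virtual nodes per server smooth out the key distribution
        for (int i = 0; i < 100; i++) {
            ring.put(hash(serverAddress + "#" + i), serverAddress);
        }
    }

    String serverFor(String cacheKey) {
        // the first ring position at or after the key's hash; wrap around if none is found
        SortedMap<Long, String> tail = ring.tailMap(hash(cacheKey));
        Long position = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(position);
    }

    private static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {   // use the first 8 digest bytes as the ring position
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

The benefit of this scheme is that adding or removing a cache server moves only the keys between that server and its neighbor on the ring, instead of reshuffling the whole cache.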
Listing 2 uses spymemcached, a memcached client written in Java, to cache HttpResponse messages across multiple machines. The spymemcached library implements the required client logic I just described.
Listing 2. memcached-based HttpResponse cache
interface IHttpResponseCache {
    IHttpResponse put(String key, IHttpResponse response) throws IOException;
    void remove(String key) throws IOException;
    IHttpResponse get(String key) throws IOException;
}

class RemoteHttpResponseCache implements IHttpResponseCache {

    private MemcachedClient memCachedClient;

    public RemoteHttpResponseCache(InetSocketAddress... cacheServers) throws IOException {
        memCachedClient = new MemcachedClient(Arrays.asList(cacheServers));
    }

    public IHttpResponse put(String key, IHttpResponse response) throws IOException {
        // store the response body in memcached for one hour (3600 seconds)
        byte[] bodyData = response.getBlockingBody().readBytes();
        memCachedClient.set(key, 3600, bodyData);
        return null;
    }

    public IHttpResponse get(String key) throws IOException {
        byte[] bodyData = (byte[]) memCachedClient.get(key);
        if (bodyData != null) {
            return new HttpResponse(200, "text/plain", bodyData);
        } else {
            return null;
        }
    }

    public void remove(String key) throws IOException {
        memCachedClient.delete(key);
    }
}
Listing 2 and the rest of this article's example code also use the xLightweb HTTP library. Listing 3 shows an example business service implementation. The onRequest(...) method -- similar to the Servlet API's doGet(...) or doPost(...) methods -- is called each time a request header is received. The exchange.send() method sends the response.
Listing 3. Example business service implementation
class MyRequestHandler implements IHttpRequestHandler {

    public void onRequest(IHttpExchange exchange) throws IOException {
        IHttpRequest request = exchange.getRequest();
        int customerId = request.getRequiredIntParameter("id");
        long amount = request.getRequiredLongParameter("amount");

        //...
        // perform some operations
        //..
        String response = ...

        // and return the response
        exchange.send(new HttpResponse(200, "text/plain", response));
    }
}

class Server {

    public static void main(String[] args) throws Exception {
        HttpServer httpServer = new HttpServer(8180, new MyRequestHandler());
        httpServer.run();
    }
}
Based on the HttpResponse
cache, a simple caching solution can be implemented that stores the
HTTP response for a given HTTP request. If the same request is received
twice, the corresponding response can be taken from the cache without
calling the business service. This requires intercepting the
request-handling flow, which can be done with the interceptor shown in
Listing 4.
Listing 4. Cache-supported business service example
class CacheInterceptor implements IHttpRequestHandler {

    private IHttpResponseCache cache;

    public CacheInterceptor(IHttpResponseCache cache) {
        this.cache = cache;
    }

    public void onRequest(final IHttpExchange exchange) throws IOException {
        IHttpRequest request = exchange.getRequest();

        // check if request is cacheable (Cache-Control header, ...)
        // ...
        boolean isCacheable = ...

        // if the request is not cacheable, forward it to the next handler of the chain
        if (!isCacheable) {
            exchange.forward(request);
            return;
        }

        // create the cache key from the request URI and the sorted parameters
        StringBuilder sb = new StringBuilder(request.getRequestURI());
        TreeSet<String> sortedParamNames = new TreeSet<String>(request.getParameterNameSet());
        for (String paramName : sortedParamNames) {
            sb.append(URLEncoder.encode(paramName) + "=");
            List<String> paramValues = Arrays.asList(request.getParameterValues(paramName));
            Collections.sort(paramValues);
            for (String paramValue : paramValues) {
                sb.append(URLEncoder.encode(paramValue) + ", ");
            }
        }
        final String cacheKey = URLEncoder.encode(sb.toString());

        // is the request in the cache?
        IHttpResponse cachedResponse = cache.get(cacheKey);
        if (cachedResponse != null) {
            IHttpResponse response = HttpUtils.copy(cachedResponse);
            response.setHeader("X-Cached", "true");
            exchange.send(response);

        // .. no -> forward it to the next handler of the chain
        } else {
            // define an intermediate response handler to intercept and copy the response
            IHttpResponseHandler respHdl = new IHttpResponseHandler() {

                @InvokeOn(InvokeOn.MESSAGE_RECEIVED)
                public void onResponse(IHttpResponse response) throws IOException {
                    cache.put(cacheKey, HttpUtils.copy(response));
                    exchange.send(response); // forward the response to the client
                }

                public void onException(IOException ioe) throws IOException {
                    exchange.sendError(ioe); // forward the error to the client
                }
            };

            // forward the request to the next handler of the chain
            exchange.forward(request, respHdl);
        }
    }
}

class Server {

    public static void main(String[] args) throws Exception {
        RequestHandlerChain handlerChain = new RequestHandlerChain();
        handlerChain.addLast(new CacheInterceptor(new RemoteHttpResponseCache(new InetSocketAddress(cachSrv1, 11211), new InetSocketAddress(cachSrv2, 11211))));
        handlerChain.addLast(new MyRequestHandler());

        HttpServer httpServer = new HttpServer(8180, handlerChain);
        httpServer.run();
    }
}
The CacheInterceptor
in Listing 4 uses the memcached-based
implementation to cache responses, using a cache key derived from the
request URI and its parameters. If the cache contains a response for this key,
the request is not forwarded to the business-service handler; instead,
the response is returned from the cache. If the cache does not contain
a response, the request is forwarded, and a response handler is added to
intercept the response flow. When a response is received from the
business-service handler, it is added to the cache. (Note
that Listing 4 does not show cache invalidation. Often dedicated
business operations require cache entries to be invalidated.)
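Invalidation could be hooked into the handler that performs the state-changing operation. The following fragment is a hypothetical sketch (the handler itself is not part of Listing 4, and it assumes the key is rebuilt exactly the way the CacheInterceptor builds it): after the update it simply removes the affected entry, so the next read repopulates the cache.

// Hypothetical sketch: invalidate the cached response after a state-changing operation.
// The cache key must be rebuilt the same way the CacheInterceptor in Listing 4 builds it.
class UpdatingRequestHandler implements IHttpRequestHandler {

    private final IHttpResponseCache cache;

    UpdatingRequestHandler(IHttpResponseCache cache) {
        this.cache = cache;
    }

    public void onRequest(IHttpExchange exchange) throws IOException {
        IHttpRequest request = exchange.getRequest();

        // ... perform the state-changing business operation ...

        // remove the stale entry; the next matching request will miss and repopulate the cache
        String staleCacheKey = URLEncoder.encode(request.getRequestURI());
        cache.remove(staleCacheKey);

        exchange.send(new HttpResponse(200, "text/plain", "updated"));
    }
}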
The consistent-hashing approach leads to high scalability. Based on consistent hashing, the memcached
client implements a failover strategy to support high availability. But if a daemon crashes, the cache data it holds is lost. This
is a minor problem, because cache data is redundant by definition.
A simple approach to make the memcached
architecture fail-safe is to store the cache entry on a primary and a
secondary cache server. If the primary cache server goes down, the
secondary server probably contains the entry. If not, the required
(cached) data must be recovered from the underlying data source.
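Reusing the IHttpResponseCache interface from Listing 2, such a primary/secondary setup might be sketched as a wrapper like the one below (this wrapper is an illustration of the idea, not a spymemcached feature): every entry is written to two independent caches, and a read falls back to the secondary cache if the primary misses or is unreachable.

// Illustrative sketch: store each entry on a primary and a secondary cache and
// fall back to the secondary cache if the primary misses or its daemon is down.
class ReplicatingHttpResponseCache implements IHttpResponseCache {

    private final IHttpResponseCache primary;
    private final IHttpResponseCache secondary;

    ReplicatingHttpResponseCache(IHttpResponseCache primary, IHttpResponseCache secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    public IHttpResponse put(String key, IHttpResponse response) throws IOException {
        IHttpResponse copy = HttpUtils.copy(response);   // each cache consumes its own copy of the body
        primary.put(key, response);
        secondary.put(key, copy);
        return null;
    }

    public IHttpResponse get(String key) throws IOException {
        IHttpResponse response = null;
        try {
            response = primary.get(key);
        } catch (IOException ioe) {
            // primary cache server unreachable -> fall through to the secondary cache
        }
        return (response != null) ? response : secondary.get(key);
    }

    public void remove(String key) throws IOException {
        primary.remove(key);
        secondary.remove(key);
    }
}

The two wrapped caches would typically be two RemoteHttpResponseCache instances pointing to different memcached daemons.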
Application session data support
Supporting
application session data in a fail-safe way is more problematic.
Application session data represents the state of a user-specific
application session. Examples include the ID of a selected folder or
the articles in a user's shopping cart. The application session data
must be maintained across requests. In classic ("Web 1.0") Web
applications, such session data must be held on the server side.
Storing it in the client by using cookies or hidden fields has two
major weaknesses. It exposes internal session data, such as the price
fields in shopping-cart data, to attacks on the client side, so you must
address this security risk. And this approach works only for a small
amount of data, limited by the maximum size of the HTTP cookie
header and the overhead of transferring the application session data to
and from the client.
Similar to the memcached
architecture, session servers can be used to store the application
session data on the server side. However, in contrast to cached data,
application session data is not redundant by definition. For this
reason, application session data is not removed to make room for new
data when the maximum memory size is reached. Caches, by contrast, are free to remove
cache entries for memory-management reasons at any time; caching
algorithms such as least recently used (LRU) remove cache entries when the
maximum cache size is reached.
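In Java, an LRU eviction policy can be expressed with an access-ordered LinkedHashMap. The following minimal, generic sketch evicts the least recently used entry once a maximum size is reached (Listing 5 later uses the same removeEldestEntry hook, but with the default insertion order):

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: the third constructor argument switches LinkedHashMap to
// access order, so the least recently used entry is evicted first when the cap is hit.
class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);   // true = order entries by access instead of insertion
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}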
If the session
server crashes, the application session data is lost. In contrast to
cached data, application session data is not recoverable in most cases.
For this reason it is important that failover solutions support
application session data in a fail-safe way.
Client affinity
The
disadvantage of the cache and session server approach is that each
request leads to an additional network call from the server to the
cache or session server. In most cases call latency is not a problem
because the cache or session server and the business servers are placed
in the same, fast network segment. But latency can become problematic
if the size of the data entries increases. To avoid moving large sets
of data between the business server and the cache or session servers again and
again, a given client's requests must always be forwarded to the
same server. This means all of a user session's requests are handled by
the same server instance.
In the case of caching, a local cache can be used instead of the distributed memcached
server infrastructure. This approach does not require cache servers, but it does require client affinity: the load balancer always directs a client to "its" particular server.
The example in Listing 5 implements a local cache and therefore requires client affinity.
Listing 5. Local cached-based example requiring client affinity
class LocalHttpResponseCache extends LinkedHashMap<String, IHttpResponse> implements IHttpResponseCache {

    public synchronized IHttpResponse put(String key, IHttpResponse value) {
        return super.put(key, value);
    }

    public synchronized void remove(String key) {
        super.remove(key);
    }

    public synchronized IHttpResponse get(String key) {
        return super.get(key);
    }

    // evict the eldest entry once the cache holds more than 1,000 entries
    protected boolean removeEldestEntry(Entry<String, IHttpResponse> eldest) {
        return size() > 1000;
    }
}

class Server {

    public static void main(String[] args) throws Exception {
        RequestHandlerChain handlerChain = new RequestHandlerChain();
        handlerChain.addLast(new CacheInterceptor(new LocalHttpResponseCache()));
        handlerChain.addLast(new MyRequestHandler());

        HttpServer httpServer = new HttpServer(8080, handlerChain);
        httpServer.run();
    }
}
LVS supports client affinity by enabling persistence -- remembering the last connection for a predefined period of time -- so that a particular client connects to the same real
server across different TCP connections. But persistence doesn't really help in the case of incoming dial-up links: if a dial-up client comes in through a
provider proxy, different TCP connections within the same session can arrive from different proxy addresses and so are not recognized as belonging to the same client.
Conclusion to Part 1
Infrastructures
based on pure transport-level server load balancers are common. They
are simple, flexible, and highly efficient, and they impose no
restrictions on the client side. Often such architectures are combined
with distributed cache or session servers to handle application-level
caching and session-data issues. However, if the overhead of
moving data to and from the cache or session servers grows, such
architectures become increasingly inefficient. By implementing client
affinity with an application-level server load balancer, you can avoid
copying large datasets between servers. Read Server load balancing architectures, Part 2 for a discussion of application-level load balancing.
About the author
Gregor
Roth, creator of the xLightweb HTTP library, works as a software
architect at United Internet group, a leading European Internet service
provider to which GMX, 1&1, and Web.de belong. His areas of
interest include software and system architecture, enterprise
architecture management, object-oriented design, distributed computing,
and development methodologies.