The Concrete Architecture of the Apache Web Server
Department of Computer Science, University of Waterloo,
Assignment 2 for CS746G
February 9, 1999
Abstract:
This report gives a tour of the concrete architecture of the
Apache web server (release 1.3.4). The goal is to provide support for anyone
who wants to modify a subsystem, or add extra functionality.
The main
components of the concrete architecture of the Apache server are the core and
the modules.
This paper covers the details of Apache core architecture, the
essential data structures with their uses, and gives an extended insight into
the concrete architecture of a module. The concurrency approach employed in
Apache is also detailed.
In general, anyone who wants to add extra
functionality only has to write a new module. This usually means providing one
or more handlers (functions) for one of the phases of processing an HTTP
request. In fact, even an important part of the Apache core has the "look and
feel" of a module, although it is not a proper one (it shares information with
other core sub-components).
The way a module's handlers are called is
transparent to the module, and all communication with a module is done through
pointers to functions. Because of this, fact extractors cannot capture the
interaction between core and modules.
To extract the concrete architecture
we have used a variety of sources: fact extractors (Portable Book Shelf),
papers on Apache, README files and, to a large extent, the analysis of relevant
parts of the source code.
Keywords:
Apache, concrete architecture, design, web server
Available online at:
http://www.grad.math.uwaterloo.ca/~oadragoi/CS746G/a2/caa.html
A web server
provides users of the web a means of retrieving data, usually documents, through
web browsers. The user clicks on a link in the browser and a request is sent
from the browser to the server. The server retrieves the document from storage
and returns it to the browser, which presents it to the user.
The Apache server divides the handling of requests into separate phases.
These request phases are:
- URI -> filename translation
- Auth ID checking
- Auth access checking
- Other access checking
- Determining MIME type of requested object
- Sending response to client
- Logging the request
(Apache Group)
In Apache, each phase is handled by a module or set of modules. Each module
is looked at in succession, to see if it has a handler for the phase. This
results in a flow of control and data that is similar to a pipeline.
Figure 1. Apache pipeline of request-handling phases
The above figure illustrates, with the broken arrows, the conceptual movement of
the data structure request_rec and the flow of control. The process starts
and ends in the core, where request_rec is created and where the cleanup
is done after the request has been handled.
Actually, control moves from the core to each phase and then back to the
core, as is shown by the solid arrow lines. As well, request_rec is first
created by the core, then passed to each phase and back to the core in turn.
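The pipeline idea can be sketched in a few lines of C. This is a toy model, not Apache's code: `toy_request` stands in for request_rec, the `PHASE_*` status codes and all function names are hypothetical, and each phase is a single function rather than a set of module handlers. The "core" hands the same record to each phase in order and stops at the first error.

```c
#include <stddef.h>
#include <string.h>

#define PHASE_OK 0          /* hypothetical: the phase finished normally */
#define PHASE_FORBIDDEN 403 /* hypothetical: abort with an HTTP error code */

/* Toy stand-in for request_rec: the URI plus a count of phases visited. */
typedef struct {
    const char *uri;
    int phases_run;
} toy_request;

typedef int (*phase_fn)(toy_request *r);

/* Generic phase that only records that it saw the request. */
static int pass_phase(toy_request *r) {
    r->phases_run++;
    return PHASE_OK;
}

/* "Other access checking" phase that refuses anything under /private. */
static int access_phase(toy_request *r) {
    r->phases_run++;
    return strncmp(r->uri, "/private", 8) == 0 ? PHASE_FORBIDDEN : PHASE_OK;
}

/* Phases in the order listed above: URI translation, auth ID, auth access,
   other access, MIME type, response, logging. */
static const phase_fn pipeline[] = {
    pass_phase, pass_phase, pass_phase, access_phase,
    pass_phase, pass_phase, pass_phase
};

/* The "core": passes the same record to each phase in turn and stops at
   the first phase that reports an error. */
int run_pipeline(toy_request *r) {
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++) {
        int status = pipeline[i](r);
        if (status != PHASE_OK)
            return status;
    }
    return PHASE_OK;
}
```

Unlike the real server, which still logs a request that failed, this toy stops at the first error and skips the remaining phases.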
The high level concrete architecture of the Apache web server
is not very different from the high level conceptual architecture, in the sense
that we still find the same splitting of functionality between the core and the
modules. In addition to the Apache core and the modules, we can identify three new
small components. Two of them (ap and regex) are essentially libraries of
utility functions, used by both the core and the modules. The third one
(os) isolates the Apache core and the standard modules from the specifics of the
operating system.
Figure 2. High level concrete architecture of Apache.
The next section
presents in more detail each component of Apache.
The directory structure of the Apache source code reflects
a similar partition of the code into separate components. Almost all include
files are grouped in a separate include/ directory, although in
fact they declare only the functions implemented by the Apache core
(main/ subdirectory).
It is worth noting that all Apache functions, utility functions, wrappers and
re-implementations are prefixed by ap_. This rule was introduced
in release 1.3 of Apache in order to avoid the name conflicts that were possible
in release 1.2, where these names had no ap_ prefix. However, there are header
files that map the old names onto the new ones, so modules written for the
1.2 release can be adapted easily.
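Such a compatibility header can be written with the preprocessor. The sketch below assumes a plain `#define` mapping from the old name to the prefixed one; `ap_demo_strdup` and `demo_strdup` are hypothetical names, not functions from the Apache sources.

```c
#include <stdlib.h>
#include <string.h>

/* New, prefixed utility (hypothetical; not taken from the Apache sources). */
char *ap_demo_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Compatibility header: maps the old, unprefixed name onto the prefixed
   one, so module source written against the old name still compiles. */
#define demo_strdup ap_demo_strdup

/* "Old" module code that still uses the unprefixed name. */
char *old_module_copy(const char *s) {
    return demo_strdup(s);
}
```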
main is the Apache core, which implements the basic
functions of a web server.
modules is a component containing the different modules
that are shipped with the Apache distribution. This includes a set of standard
modules that extend and complement the Apache core.
The os component encapsulates the functions strictly
dependent on the operating system and platform. In the source code tree the
os/ directory contains a directory for each specific platform
supported. Platform is used here in the sense of a software platform that offers
a common programming environment. For example, unix/ contains
specific files for Unix platforms. Mainframes running proprietary operating
systems that use EBCDIC character encoding instead of ASCII have their own
directories (bs2000, tpf). OS/2 from IBM and Windows NT
are also supported (os2, win32). The "link" between the other,
platform-independent components and the platform-dependent os
component is the file os.h. It is included in all Apache source
code, and it declares the functions specific to that system. Such functions are,
for instance, UNIX functions not available in a Windows environment. The
implementation of such functions goes in os.c for regular
functions and in os-inline.c for functions that can be in-lined. Functions and
code strictly related to a platform should also go in the directory for that
system. For example, the Windows code for storing configuration information in the
registry (a repository of configuration information used by all applications
in this operating system) is also part of the os component for
Windows (win32/).
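A minimal sketch of the os.h / os-inline.c pattern, assuming the platform is selected with the preprocessor at compile time. `demo_os_is_path_absolute` is a hypothetical helper, not a function taken from the Apache tree; the point is that platform-independent code calls one name and each platform supplies its own definition.

```c
#include <ctype.h>

/* One definition per platform, chosen at compile time, so the rest of the
   code can call demo_os_is_path_absolute() without any #ifdefs.
   (Hypothetical helper in the spirit of os.h / os-inline.c.) */
#if defined(_WIN32)
static inline int demo_os_is_path_absolute(const char *path) {
    /* Accept "/x", "\x" and drive-letter forms such as "C:/x" or "C:\x". */
    if (path[0] == '/' || path[0] == '\\')
        return 1;
    return isalpha((unsigned char)path[0]) && path[1] == ':' &&
           (path[2] == '/' || path[2] == '\\');
}
#else
static inline int demo_os_is_path_absolute(const char *path) {
    /* On Unix an absolute path simply starts at the root directory. */
    return path[0] == '/';
}
#endif
```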
regex is a separate component, used as a library of
general functions dealing with regular expression manipulation (e.g. splitting
a string into tokens in an awk fashion). It is called from the main component
(alloc.c, http_core.c, http_request.c, util.c, util_uri.c).
ap is a component that defines wrappers for
functions whose behavior is not the same across platforms (e.g. strncpy, which
behaves differently with respect to the trailing '\0' in the copied string),
re-implementations of unstable library functions, and new utility functions
(e.g. formatting functions for numbers, for Internet addresses,
etc.).
The functions grouped in this component do not provide utilities specific
to a web server. That is why they form a separate directory (component).
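To illustrate the kind of wrapper described for strncpy, here is a sketch of a copy routine that always terminates the destination. It is modeled on the idea behind Apache's ap_cpystrn(), but the name `demo_cpystrn` and this implementation are assumptions for illustration.

```c
#include <stddef.h>

/* Copy at most dst_size - 1 characters and always NUL-terminate, unlike
   strncpy(), which leaves dst unterminated when src is too long and pads
   with NULs when it is short. Returns a pointer to the terminating NUL. */
char *demo_cpystrn(char *dst, const char *src, size_t dst_size) {
    char *d = dst;
    char *end;

    if (dst_size == 0)
        return dst;              /* no room even for the terminator */
    end = dst + dst_size - 1;    /* last slot is reserved for '\0' */
    while (d < end && *src != '\0')
        *d++ = *src++;
    *d = '\0';
    return d;
}
```

Returning the end pointer lets a caller append further text without another strlen() call.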
support is a separate component containing shell
scripts and source code for helper programs for the Apache server
administrator, such as rotating log files to save space, manipulating password
files, and generating statistics from the log files. The files in this
component are not part of the actual Apache server.
helpers contains shell scripts used as helpers by the
compile-time configuration routine for Apache.
All the files forming the Apache core are grouped in the
main/ directory. The designers of Apache wanted as much
functionality of the web server as possible to be implemented as separate
modules, and therefore there are many interactions between the sub-components of
the Apache core. The idea is that someone extending Apache should not have to
modify anything in the core. The only sub-component that might need changing in
order to extend the server is the one that implements the HTTP protocol (which
is part of the core). On the positive side, the HTTP protocol is implemented as a
separate sub-component of the core; however, it has no well-defined API.
Figure 3. The concrete architecture of the Apache core.
The http_main.c file contains
code that starts up the server (i.e. the actual main()), the main
server loop, code for managing children and code for managing timeouts.
The main server loop is the one that waits for a TCP/IP connection
request, accepts it (i.e. establishes a TCP/IP connection), allocates a
resource pool, reads the HTTP request from the TCP/IP stream, and calls the appropriate
function in the http_request.c file to handle the HTTP request.
After the request has been processed, it frees the resource pool and,
eventually, closes the TCP/IP connection.
Figure 3 shows the interaction of this file with the other sub-components of
the Apache core. This sub-component also controls the number of active child
processes, through a special shared data structure (the scoreboard) that holds
information on the status of each child. More on this data structure and the
way Apache manages concurrency is presented in the sections on data structures
and concurrency.
The http_core.c file implements the most basic functions of processing
an HTTP request. A comment in http_core.c describes it as being
"just 'barely' functional enough to serve documents,
though not terribly well".
This file could almost have been mod_core.c. In fact, it defines
a module structure like any other module. http_core.c defines the command table
for all the standard configuration commands. It also implements handlers for
self-initialization and for some of the phases of the HTTP request cycle:
- a very rudimentary URI to file name translation
- a "do-nothing" handler for MIME type checking
- a "do-nothing" handler for fix-ups
- a very rudimentary default handler for all the MIME types (it only delivers
a document as it is, with no support for the special operations defined by
HTTP).
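The "very rudimentary" translation can be pictured as prefixing the document root to the URI. This sketch assumes that behavior; `demo_translate`, its parameters and its return convention are hypothetical, and it performs no safety checks (for example against `..` path segments).

```c
#include <stdio.h>
#include <string.h>

/* Prepend the document root to the URI path.
   Returns 0 on success, or -1 if the URI does not start with '/' or the
   resulting file name would not fit in `out`. */
int demo_translate(const char *doc_root, const char *uri,
                   char *out, size_t out_size) {
    int n;

    if (uri[0] != '/')
        return -1;   /* not a local path: leave it to other modules */
    n = snprintf(out, out_size, "%s%s", doc_root, uri);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```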
One reason why http_core.c is not a separate module is related
to the legacy configuration commands (from the NCSA web server) that Apache must
implement. These commands, and in particular Options, are more
powerful than the typical Apache configuration commands in the sense that they
affect more than one module. In order to implement this kind of behavior,
http_core.c must have access to some Apache core data structures
that are not accessible to ordinary modules.
http_config.c is an important support sub-component
in the Apache core. It is responsible for managing all the information related
to modules and what they offer. It is also responsible for initializing modules
and other sub-components of the Apache core (e.g. http_core.c).
This includes parsing configuration files and invoking the appropriate commands
in the command tables advertised by the modules.
Another major function of http_config.c is walking through the
linked list of modules and invoking the appropriate handlers for a phase of the
HTTP request handling cycle, when asked to do so by the http_request.c
sub-component. The rationale behind having this function here and not in
http_request.c is that, in a way, http_config.c is the
owner of the data structure that holds information on current modules and does
all the bookkeeping related to this structure. It should be noted that
http_config.c does not decide when a phase is invoked; it just performs
the implicit invocation of the appropriate handlers.
The responsibility for controlling the order in which different handlers are
invoked lies with http_request.c.
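The walk performed by http_config.c can be sketched as follows. Assumptions: the module list is a singly linked list with one optional handler per module for the phase in question, `DEMO_DECLINED` means "not my job", and the `run_all` flag models phases (assumed here to include fix-ups and logging) in which every handler is called rather than only the first one that accepts. All names are hypothetical.

```c
#include <stddef.h>

#define DEMO_OK 0
#define DEMO_DECLINED (-1)

/* Toy request: records which module finally handled the phase. */
typedef struct { int handled_by; } demo_request;

/* One entry in the module list, with an optional handler for one phase. */
typedef struct demo_module {
    int id;
    int (*handler)(demo_request *r);   /* NULL: no handler for this phase */
    struct demo_module *next;
} demo_module;

/* Walk the linked list of modules for one phase.
   run_all == 0: the first handler that does not decline ends the phase.
   run_all != 0: every handler is called unless one reports an error. */
int demo_run_phase(demo_module *list, demo_request *r, int run_all) {
    for (demo_module *m = list; m != NULL; m = m->next) {
        if (m->handler == NULL)
            continue;
        int rv = m->handler(r);
        if (rv == DEMO_DECLINED)
            continue;
        if (!run_all || rv != DEMO_OK)
            return rv;
    }
    return run_all ? DEMO_OK : DEMO_DECLINED;
}

/* Example handlers for the usage below. */
int demo_declines(demo_request *r) { (void)r; return DEMO_DECLINED; }
int demo_claims(demo_request *r) { r->handled_by = 2; return DEMO_OK; }
```

Because the handlers are reached only through function pointers, nothing in the calling code names a module, which is also why static fact extractors cannot see these calls.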
An interesting feature of http_request.c is the possibility
of handling sub-requests, which can be viewed as a sort of recursion in the flow
of handling an HTTP request (e.g. while handling one phase, a module can issue a
sub-request to convert an additional URI to a file name). More on sub-requests
can be found in the section on data structures.
The protocol sub-component (http_protocol.c)
implements the actual HTTP protocol and manages the connection with the HTTP
client. It ensures that the dialog with the client follows the HTTP
protocol, converting information contained in the current request data structure
into HTTP headers (text lines that advise the client on the characteristics of
the entity delivered) and HTTP content (binary information representing the
requested entity). This sub-component also takes care of correctly closing the
HTTP connection in case of error, by informing the client of the error using HTTP
protocol error codes.
The sub-component is called when the connection is established and closed,
and at the appropriate times during the processing cycle of the HTTP request
(for example when the document to be delivered must be written to the client).
The resource management sub-component is
formed by the files alloc.c and buff.c (and their headers).
buff.c
offers functions for buffered I/O and buffered character
conversion, which replace similar library functions which have semantics that
vary from platform to platform.
alloc.c
implements the management of resource pools. A resource
pool is a big memory pool which is used to allocate memory needed to process the
current request. File descriptors are also allocated on the resource pool. The
advantage of having a unique resource pool for each HTTP request is that memory
and file descriptors can be freed all at once, when the processing ends. This not
only frees the programmer of a module from having to explicitly free each allocated
resource, but also prevents resource leakage.
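A minimal sketch of a resource pool, assuming allocations are simply chained malloc() blocks and that file descriptors and similar resources are handled by registering cleanup callbacks. `demo_pool`, `demo_palloc`, `demo_register_cleanup` and `demo_clear_pool` are hypothetical names; Apache's own allocator in alloc.c is more elaborate (it carves many small allocations out of larger blocks).

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the union keeps the
   payload that follows it suitably aligned for any type. */
typedef union demo_block {
    union demo_block *next;
    max_align_t align;
} demo_block;

/* A cleanup runs when the pool is cleared, e.g. to close a file. */
typedef struct demo_cleanup {
    void (*fn)(void *data);
    void *data;
    struct demo_cleanup *next;
} demo_cleanup;

typedef struct {
    demo_block *blocks;      /* every allocation made from this pool */
    demo_cleanup *cleanups;  /* resources released together with the pool */
} demo_pool;

/* Allocate from the pool; the caller never frees the result itself. */
void *demo_palloc(demo_pool *p, size_t size) {
    demo_block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    return b + 1;            /* usable memory starts after the header */
}

/* Remember a resource that must be released with the pool. */
int demo_register_cleanup(demo_pool *p, void (*fn)(void *), void *data) {
    demo_cleanup *c = demo_palloc(p, sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Release everything at once: run the cleanups (which live in pool
   memory), then free every block. */
void demo_clear_pool(demo_pool *p) {
    for (demo_cleanup *c = p->cleanups; c != NULL; c = c->next)
        c->fn(c->data);
    p->cleanups = NULL;
    while (p->blocks != NULL) {
        demo_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
}

/* Example cleanup for the usage below: counts how often it ran. */
void demo_count_cleanup(void *data) { (*(int *)data)++; }
```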
The Apache core component also contains a number of utility functions that
are not of general interest outside the component.
The logging sub-component (http_log.c) exports functions that
perform logging operations. It ensures that logs are properly accessed (one
message at a time) and that they are properly closed. It also defines a priority-based
logging system. A user of the sub-component can specify the importance of
the message to be logged, ranging from debug-level messages (lowest
priority) to "system is unusable" messages (highest priority). The latter means the
system/process has become highly unstable and the message should be logged
immediately (e.g. before a crash happens).
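A sketch of priority-based filtering, assuming eight syslog-style severities (from "emerg" down to "debug") and a single configured threshold. `demo_log` and the enum names are hypothetical. Each message is written with one call and flushed immediately, so an urgent message is not left in a buffer if the process crashes.

```c
#include <stdio.h>

/* Syslog-style severities, most important first (assumed ordering). */
enum demo_level {
    DEMO_EMERG, DEMO_ALERT, DEMO_CRIT, DEMO_ERR,
    DEMO_WARNING, DEMO_NOTICE, DEMO_INFO, DEMO_DEBUG
};

/* Write msg only if it is at least as important as `threshold`.
   Returns 1 if the message was written, 0 if it was filtered out. */
int demo_log(FILE *log, enum demo_level threshold,
             enum demo_level level, const char *msg) {
    static const char *const names[] = {
        "emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"
    };
    if (level > threshold)
        return 0;
    fprintf(log, "[%s] %s\n", names[level], msg);   /* one call per message */
    fflush(log);                                     /* do not keep it buffered */
    return 1;
}
```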
The virtual host sub-component (http_vhost.c) allows the server to respond to more
than one name (e.g. www.example and www2.example), each assigned to one of the
multiple IP addresses of the machine. The multiple IP addresses can be
associated with physical network interfaces or with virtual network interfaces
(simulated via logical devices by the operating system). Apache is able to
"tell" which name the host has been referenced under and use different
configuration options (e.g. allowing more access rights to users accessing the
host through an interface on the local network, as opposed to users
accessing the web server via an interface on the network outside the company).
Modules also have access to this information.
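Selecting a configuration by the address a connection arrived on might look like this. The sketch assumes IP-based virtual hosts, exact string matching, and a fallback to the first (main server) entry; `demo_vhost`, `demo_find_vhost`, the `allow_intranet_pages` option and the addresses in the usage are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *local_ip;      /* address the connection arrived on */
    const char *server_name;   /* name served on that address */
    int allow_intranet_pages;  /* example of a per-host option */
} demo_vhost;

/* Pick the configuration for the local address of the connection;
   fall back to the first (main) entry when nothing matches. */
const demo_vhost *demo_find_vhost(const demo_vhost *hosts, size_t n,
                                  const char *local_ip) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(hosts[i].local_ip, local_ip) == 0)
            return &hosts[i];
    return n > 0 ? &hosts[0] : NULL;
}
```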
The role of modules is to implement/override/extend the
functionality of the Apache web server. All modules have the same interface to
the Apache core, and they do not interact directly with each other. In fact they
are not active entities, they just offer a set of handlers for the distinct
phases of handling HTTP requests. These phases are "controlled" by the
http_request.c
component of the Apache core.
Figure 4 gives a detailed overview of the architecture of a module. A module is
a collection of exported functions (handlers) which are called by the Apache
core. The glue between a module and the core is the module
structure. Such a structure is defined by each module and is accessible by the
core. The structure holds pointers to all handlers exported by the module.
First a module must be initialized (when it is loaded). As the core has no
information on the internal structure of the module, the module must export a
handler for this purpose.
The idea behind Apache is to be as versatile as possible; therefore, each
module can define its own commands to be used in configuration files.
However, the core is the one that actually reads the configuration files,
so there must be a way to match a command with a module, or more precisely with
the handler exported by the module to execute that command. This matching is
done through the command table stored in the module structure.
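The matching between a configuration command and the module that handles it can be sketched as a search through each module's command table. The table layout, the names and the error string are assumptions for illustration; only UserDir is a real directive (of mod_userdir), used here as an example. Apache directive names are case-insensitive, while this sketch uses strcmp for brevity.

```c
#include <stddef.h>
#include <string.h>

/* A command handler returns NULL on success or an error message. */
typedef const char *(*demo_cmd_fn)(const char *arg);

typedef struct {
    const char *name;   /* directive as written in the configuration file */
    demo_cmd_fn func;   /* handler exported by the module */
} demo_command;

typedef struct {
    const char *module_name;
    const demo_command *cmds;   /* terminated by a {NULL, NULL} entry */
} demo_cmd_module;

/* Core side: find the module that advertises `name` and run its handler. */
const char *demo_invoke_command(const demo_cmd_module *mods, size_t n,
                                const char *name, const char *arg) {
    for (size_t i = 0; i < n; i++)
        for (const demo_command *c = mods[i].cmds; c->name != NULL; c++)
            if (strcmp(c->name, name) == 0)
                return c->func(arg);
    return "Invalid command";
}

/* Example module-side handler and table for the usage below. */
static const char *user_dir_value = NULL;

static const char *set_user_dir(const char *arg) {
    if (arg == NULL || arg[0] == '\0')
        return "UserDir requires an argument";
    user_dir_value = arg;
    return NULL;
}

static const demo_command userdir_cmds[] = {
    {"UserDir", set_user_dir},
    {NULL, NULL}
};
```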
The purpose of a module, as has been said, is to add new functionality to
the way the web server services HTTP requests. However, the flow of executing
a request is controlled by the core (http_request.c). A module is
not required to export handlers for all the distinct phases of the HTTP request
cycle. Figure 4 shows a module that implements handlers for all of the phases, but in
fact most modules implement only one or a few of them.
A special case of handler is the one for the phase that actually delivers the
object to the client. Those handlers are called content handlers or
response handlers. They are special because a module often implements
several content handlers, one for each type of object the module knows how to
deliver (e.g. a module might know how to deliver disk directories, but might
also know how to deliver lists of people formatted in the same way as
directories). In order to discover which content handler must be called for the
current request, the core uses the content handler table in the module's
module structure. It consists of pairs of a content type (i.e.
MIME type) and a pointer to the handler. Of course, a module can export the same
content handler for more than one MIME type.
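Looking up a content handler by content type can be sketched as a scan of a NULL-terminated table of (type, handler) pairs. The structure and function names are hypothetical, the lookup only does exact matches, and `application/x-people-list` is an invented type; `httpd/unix-directory` is the pseudo-type Apache uses for directories. One handler is registered under two types, as the text allows.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_DECLINED (-1)

typedef struct { int handled_as; } demo_req;

/* One (content type, handler) pair from a module's handler table. */
typedef struct {
    const char *content_type;
    int (*handler)(demo_req *r);
} demo_handler_rec;

/* Find the handler registered for `type` (exact matches only). */
int demo_invoke_content_handler(const demo_handler_rec *table,
                                const char *type, demo_req *r) {
    for (const demo_handler_rec *h = table; h->content_type != NULL; h++)
        if (strcmp(h->content_type, type) == 0)
            return h->handler(r);
    return DEMO_DECLINED;
}

/* Example: one listing handler exported under two types. */
static int send_listing(demo_req *r) { r->handled_as = 1; return 0; }

static const demo_handler_rec dir_handlers[] = {
    {"httpd/unix-directory", send_listing},
    {"application/x-people-list", send_listing},   /* invented type */
    {NULL, NULL}
};
```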
An important element to take into account when writing a module is that
although handlers are invoked implicitly (or might not be invoked at all if
handlers of other modules for the same phase inform the core that they have
completed that phase), different handlers can communicate through data structures
private to the module (usually static ones).
Figure 4. The detailed concrete architecture of a complex Apache module.
To summarize,
writing an Apache module means providing functions (i.e. handlers), with a fixed
prototype, that implement routines for initialization, configuration, and
handling some of the phases of the HTTP request.
The simplest module is one that does not provide (i.e. does not need) any
initialization or configuration handlers, does not define any custom
configuration commands and implements only one handler (usually the content
handler). A complex module (like the one in Figure 4) will implement all or
most of the possible handlers.
Apache comes with a set of modules grouped in the modules
directory. The standard modules, grouped in the standard
subdirectory, are essential for the functioning of Apache.
An extension of Apache (in the form of a module) enables it to work as a web
proxy. Additionally, the source contains a demo module that provides every
handler a module can implement. It is well commented and is located in the
example/ subdirectory.
In order
to be fully functional, Apache installs by default a number of standard modules.
As standard modules they are not dynamically loaded, although they could be.
Modules can be statically linked with the core or can be compiled as separate
dynamically loaded pieces. The statically linked modules are always pre-loaded
at start-up. Dynamically loaded modules can also be pre-loaded.
When the configuration scripts are run, a special .c file,
modules.c, is automatically generated in the root of the source
code. modules.c defines special arrays of pointers to module
structures: module *ap_prelinked_modules[] for the modules that are
linked with the core, and module *ap_preloaded_modules[] for those
that will be preloaded.
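The shape of the generated modules.c can be sketched as NULL-terminated arrays of module pointers. Here `module` is a one-field stand-in for Apache's real module structure, the three module objects are hypothetical, and `count_modules` merely shows how start-up code can walk such an array up to its sentinel.

```c
#include <stddef.h>

/* Stand-in for Apache's module structure (only a name here). */
typedef struct { const char *name; } module;

/* Hypothetical module objects; real ones are defined by each module. */
module core_module = {"core"};
module mime_module = {"mod_mime"};
module cgi_module  = {"mod_cgi"};

/* Generated arrays: pointers to module structures, ended by NULL. */
module *ap_prelinked_modules[] = { &core_module, &mime_module, &cgi_module, NULL };
module *ap_preloaded_modules[] = { &core_module, &mime_module, &cgi_module, NULL };

/* Start-up code can walk an array until the NULL sentinel. */
size_t count_modules(module **list) {
    size_t n = 0;
    while (list[n] != NULL)
        n++;
    return n;
}
```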
It is interesting that even mod_core, which is the module
structure defined by the http_core.c sub-component, is listed in the
arrays generated in modules.c. The following list gives an overview of some of
the standard modules. mod_core (the pseudo-module defined in the Apache core
sub-component http_core.c) is also included.
- For the URI to file name translation phase:
  - mod_userdir: translates user home directories into actual paths
  - mod_rewrite: rewrites URLs based on regular expressions; it has additional handlers for fix-ups and for determining the MIME type
- For the authentication / authorization phases:
  - mod_auth, mod_auth_anon, mod_auth_db, mod_auth_dbm: user authentication using text files, anonymous in FTP style, using Berkeley DB files, and using DBM files, respectively
  - mod_access: host-based access control
- For determining the MIME type of the requested object (the content type, the encoding and the language):
  - mod_mime: determines document types using file extensions
  - mod_mime_magic: determines document types using "magic numbers" (e.g. all GIF files start with a certain code)
- For the fix-ups phase:
  - mod_alias: replaces aliases by the actual path
  - mod_env: fixes up the environment (based on information in configuration files)
  - mod_speling: automatically corrects minor typos in URLs
- For sending the actual data back to the client (the MIME type, or a pseudo MIME type such as the one for a CGI script, is used to choose the appropriate module for this phase):
  - mod_actions: file type/method-based script execution
  - mod_asis: sends the file as it is
  - mod_autoindex: sends an automatically generated representation of a directory listing
  - mod_cgi: invokes CGI scripts and returns the result
  - mod_include: handles server-side includes (documents parsed by the server, which includes certain additional data before handing the document to the client)
  - mod_dir: basic directory handling
  - mod_imap: handles image-map files
- For the request logging phase:
  - mod_log_*: various types of logging modules
A summary of what each standard module has to offer (i.e. which phases it
handles) is given in the following table. Note that a module can define handlers
for more than one phase. Again, mod_core has been included.
No | Phase | Modules |
2. | filename translation | mod_alias.c mod_userdir.c mod_core |
3. | check_user_id | mod_auth.c mod_auth_anon.c mod_auth_db.c mod_auth_dbm.c mod_digest.c |
4. | check auth | mod_auth.c mod_auth_anon.c mod_auth_db.c mod_auth_dbm.c mod_digest.c |
5. | check access | mod_access.c |
6. | type_checker | mod_mime.c mod_mime_magic.c mod_negotiation.c mod_core |
7. | fixups | mod_alias.c mod_cern_meta.c mod_env.c mod_expires.c mod_headers.c mod_negotiation.c mod_speling.c mod_usertrack.c mod_core |
8. | content handlers | mod_actions.c mod_asis.c mod_autoindex.c mod_cgi.c mod_dir.c mod_imap.c mod_include.c mod_info.c mod_negotiation.c mod_status.c mod_core |
9. | logger | mod_log_agent.c mod_log_config.c mod_log_referer.c |
As can be seen from the above table, the functionality of
mod_core is indeed minimal. One can also observe
that a module tends to provide handlers for related phases (e.g. authorization
and authentication, or MIME type checking and content handling), although this
is not a requirement.
The behavior of a web proxy is very similar to that of a web server, at least as
far as the client is concerned (i.e. it follows the same protocol). Therefore Apache
has been easily extended to implement proxy behavior by means of a module.
Being a more complex module, it has been implemented as a set of sub-components:
mod_proxy.h, mod_proxy.c are in a sense the main files
of this module, since they define the module structure. The handlers
implemented by this module are:
- URI to filename translation: this phase is necessary because a request
might refer to files that are not on the local disk but on different machines.
- fix-up handler: this only puts the requested URL in a canonical form (so it
can be used as a search key in the cache).
- content handler: this is implemented because the proxy does not simply
deliver files from the local disk; it might contact a web server in order
to obtain the file, and it might save the file into a cache.
proxy_cache.c manages the cache implemented by the
proxy module. The cache is an example of a private data structure that survives
between implicit invocations of the handlers of the proxy module.
proxy_connect.c implements the code that connects this
server to a web server. The proxy acts as a client for the web server and as
a server for the HTTP client.
proxy_ftp.c, proxy_http.c implement utility routines
specific to the FTP and HTTP protocols.
proxy_util.c implements various routines that mainly
deal with matching symbolic host names, host Internet addresses, etc.
5. Concurrency
Apache provides access to two
levels of concurrency. Multiple server processes are forked at startup. The
default number of these processes is five, and this is the minimum number of
idle servers that Apache tries to maintain at all times. As well, if
multi-threading is supported by the operating system, a default of up to 50
threads is allowed for each process.
Each request that the server receives is actually handled by a copy of the
httpd program. Rather than creating a new copy when it is needed and killing it
when a request is finished, Apache maintains at least 5 and at most 10 inactive
children at any given time. The parent process runs a periodic check on a
structure called the scoreboard, which keeps track of all existing server
processes and their status. If the scoreboard lists fewer than the minimum number
of idle servers, then the parent will spawn more. If the scoreboard lists more
than the maximum number of idle servers, which is by default 10, then the parent
will proceed to kill off the extra children.
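The maintenance decision described above can be sketched as a pure function of two scoreboard counts. The thresholds are the defaults quoted in this section (5 and 10 idle servers, plus the 256-server limit discussed below); `demo_spare_adjustment` is hypothetical, and the real parent may adjust more gradually from one check to the next.

```c
/* Defaults quoted in the text. */
#define DEMO_MIN_SPARE    5
#define DEMO_MAX_SPARE    10
#define DEMO_SERVER_LIMIT 256

/* Decide, from the scoreboard counts, how many children to spawn (> 0)
   or kill (< 0) in one maintenance pass; 0 means leave things alone. */
int demo_spare_adjustment(int idle, int total) {
    if (idle < DEMO_MIN_SPARE) {
        int want = DEMO_MIN_SPARE - idle;
        int room = DEMO_SERVER_LIMIT - total;   /* never exceed the limit */
        if (room <= 0)
            return 0;
        return want < room ? want : room;
    }
    if (idle > DEMO_MAX_SPARE)
        return -(idle - DEMO_MAX_SPARE);
    return 0;
}
```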
When it receives a request, the parent process passes it along to the next
idle child on the scoreboard. Then the parent goes back to listening for the
next request.
Figure 5. The parent passes a request to a child to be served
There is also a default limit of 256 on the total number of servers that can
exist at one time. The authors of Apache provided this upper bound in order to
keep the machine that the software is running on from being swamped by servers
and crashing. The default was picked to keep the scoreboard file small enough so
that it can be scanned by the processes without causing overhead concerns.
Since the number of requests that can be processed at any one time is limited
by the number of processes that can exist, there is a queue provided for waiting
requests. The maximum number of pending requests that can sit on the queue is
511.
5.1. Keep-alive or Persistent Connections
Apache uses persistent connections to allow multiple requests from a
client to be handled over one connection, rather than opening and closing a
connection for each request. The default maximum number of requests allowed over
one connection is 100. An idle connection is closed after a timeout.
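The persistent-connection loop can be sketched as follows, assuming a callback that reports whether another request arrived before the idle timeout expired. The limit of 100 is the default quoted above; `demo_serve_connection`, the callback type and the example client are hypothetical.

```c
#define DEMO_MAX_KEEPALIVE_REQUESTS 100

/* Returns 1 if another request arrived on the connection before the
   idle timeout, 0 if the timeout expired. */
typedef int (*demo_next_request_fn)(void *conn);

/* Serve requests on one persistent connection until the client stops
   sending or the per-connection limit is reached; returns the number
   of requests served before the connection is closed. */
int demo_serve_connection(demo_next_request_fn next, void *conn) {
    int served = 0;
    while (served < DEMO_MAX_KEEPALIVE_REQUESTS && next(conn))
        served++;   /* the real server would process the request here */
    return served;
}

/* Example client with a fixed number of requests left to send. */
int demo_client_with_n(void *conn) {
    int *remaining = conn;
    if (*remaining == 0)
        return 0;
    (*remaining)--;
    return 1;
}
```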
6. Data Structures
This section describes some data structures that are central to the
functioning of the Apache server.
6.1. request_rec
Once a request has been read in, http_request.c is the code which handles the
main line of request processing. It finds the right per-directory configuration,
building it if necessary. It then calls all the module dispatch functions in the
right order.
Figure 6. Calls to modules for the request processing cycle
When the module handlers are called, the only argument passed to them is
request_rec. The pieces of this structure which are public to the modules allow
them to learn what the request is and how it should be handled. Most of the
handlers complete their part of the request cycle by changing some fields in the
request_rec. But the response handlers must actually return something to the
client. Sometimes these handlers need to direct a server to return some other
file instead of the one that the client originally requested. This is a
redirected request.
Some handlers can farm out part of their job to a separate request, called a
sub-request.
Several request_rec structures form a linked list if the request is redirected by a handler.
The structure can contain pointers to the request_rec the request is redirected
to and the request_rec it is redirected from. Or, if it is a sub-request, it can
contain a pointer to the original request_rec.
6.2. Internal Redirected Requests
The response handler may find that the request to be served is better handled
as another type of request. If the request is to an imagemap, a type map, or a
CGI script, then the actual resource the user requested is at some other URI
than the one originally used. In this case, the module's handler generates a new
request and passes it back through the request-processing cycle.
The handler invokes ap_internal_redirect, which initiates a new
request_rec. The chain of redirects is kept in a list of request_recs which are
linked by pointers. The result of the final response handler is passed back up
the chain to the one that caught the original request, and is then sent back to
the client.
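The linking described in this section and in 6.1 can be sketched with a cut-down record holding only the chain pointers and a status. The field names prev, next and main mirror the description above, but `demo_rec` and the helper functions are hypothetical stand-ins for ap_internal_redirect and the surrounding core code.

```c
#include <stddef.h>

/* Cut-down, hypothetical request record: only the links and a status. */
typedef struct demo_rec {
    const char *uri;
    struct demo_rec *prev;   /* request we were redirected from */
    struct demo_rec *next;   /* request we were redirected to */
    struct demo_rec *main;   /* original request, for a sub-request */
    int status;
} demo_rec;

/* Initialize the record for the redirect target and link it to `from`. */
void demo_internal_redirect(demo_rec *from, demo_rec *to, const char *new_uri) {
    to->uri = new_uri;
    to->prev = from;
    to->next = NULL;
    to->main = NULL;
    to->status = 0;
    from->next = to;
}

/* Walk back to the request that the client actually sent. */
demo_rec *demo_original_request(demo_rec *r) {
    while (r->prev != NULL)
        r = r->prev;
    return r;
}

/* Pass the final handler's result back up the chain. */
void demo_propagate_status(demo_rec *last) {
    for (demo_rec *r = last->prev; r != NULL; r = r->prev)
        r->status = last->status;
}
```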
6.3. Sub-requests
The sub-request mechanism allows a response handler to look up files and URIs
without actually sending a response. This is done using the functions
ap_sub_req_lookup_file or ap_sub_req_lookup_uri. These
construct a new request_rec which is processed up to the point of the response.
6.4. request_rec
(from http_request.h) The data structure request_rec is the central
tie that binds the system together: it contains all the information that each
module needs in order to handle the request.
6.4.1. Structure of request_rec
Here is a partial list of the fields that are contained in a request_rec.
- pointers to other request_recs, as described above
- pointer to the resource pool
- object being requested
  - URI
  - filename
  - path
  - file status information - e.g. set to zero if there is no such file
- document content information
  - MIME header tables - in, out, and error headers
- information about the request
  - protocol to use
  - method - e.g. GET, HEAD, POST
- information for logging
  - bytes sent
  - request description
6.5. Resource Pools
Rather than keeping track of which files are opened and where allocated
memory is, and then explicitly tracking it all down to deallocate it, Apache
uses the idea of resource pools. A resource pool is a data structure
which keeps track of all allocations of finite resources that are associated
with a request. When the request cycle is finished, all the resources held in
the pool are released at one time.
This provides the advantages of garbage collection without the extensive
code, and small amounts of space can be allocated without adding large amounts
of record keeping.
One disadvantage of this method is that resources that are no longer actively
used cannot be released until the pool is cleared. This can create problems,
especially with memory. For this reason, modules can establish private resource
pools that they can clear or destroy as they want.
6.6. Command Tables
The core's command table is held in http_core.c (Thau, pg 3).
Each module may have its own command table, which allows it to handle
commands read from configuration files. The entries for each command listed in
the table are:
- the name of the command
- a pointer to the command handler - this is a C function which processes
the command
- an argument which is passed to the command handler - a command handler may
process many commands
- where the command may appear
- the type and number of arguments the command takes
- a description of the arguments that the command takes (Thau, pg 3)
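The six fields listed above can be written as a C structure. In this sketch the argument-type enum, the context flags, the `DemoFlag` directive and all function names are hypothetical. The dispatcher checks where the directive appears and how many arguments it received before calling the handler, and one handler can serve several directives through the extra argument.

```c
#include <stddef.h>
#include <string.h>

/* How the arguments are taken (hypothetical subset). */
enum demo_args_how { DEMO_FLAG, DEMO_TAKE1, DEMO_TAKE2 };

/* Where a directive may appear (hypothetical bit flags). */
#define DEMO_IN_SERVER_CONFIG 1
#define DEMO_IN_DIRECTORY     2

typedef const char *(*demo_cmd_handler)(void *cmd_data, int argc,
                                        const char **argv);

/* One command-table entry with the six fields listed above. */
typedef struct {
    const char *name;            /* name of the command */
    demo_cmd_handler func;       /* command handler */
    void *cmd_data;              /* argument passed to the handler */
    int req_override;            /* where the command may appear */
    enum demo_args_how args_how; /* type and number of arguments */
    const char *errmsg;          /* description of the arguments */
} demo_command_rec;

/* Validate context and arguments, then call the handler.
   Returns NULL on success or an error message. */
const char *demo_run_entry(const demo_command_rec *c, int where,
                           int argc, const char **argv) {
    int expected = (c->args_how == DEMO_TAKE2) ? 2 : 1;
    if ((c->req_override & where) == 0)
        return "directive not allowed here";
    if (argc != expected)
        return c->errmsg;
    if (c->args_how == DEMO_FLAG &&
        strcmp(argv[0], "On") != 0 && strcmp(argv[0], "Off") != 0)
        return c->errmsg;
    return c->func(c->cmd_data, argc, argv);
}

/* One handler for many flag directives: cmd_data says which int to set. */
const char *demo_set_flag(void *cmd_data, int argc, const char **argv) {
    (void)argc;
    *(int *)cmd_data = (strcmp(argv[0], "On") == 0);
    return NULL;
}
```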
6.7. Scoreboard
The scoreboard structure is used to keep track of the child processes. The
information is kept brief, basically just the status value and the pid, the
process id number. The creators of Apache have plans to add a separate set of
longer score structures that will give the number of requests serviced, and data
on the current or most recent request.
Each time a parent process spawns a
child, a record is created for the child in the scoreboard. When a child is killed,
its record is removed from the scoreboard. The status value of a process is written
to the scoreboard by the process itself. The parent process uses the status value of
each child to determine if new children need to be created, or if there are too
many idle processes.
The status values defined in scoreboard.h are the following:
SERVER_DEAD 0
SERVER_STARTING 1 /* Server Starting up */
SERVER_READY 2 /* Waiting for connection (or accept() lock) */
SERVER_BUSY_READ 3 /* Reading a client request */
SERVER_BUSY_WRITE 4 /* Processing a client request */
SERVER_BUSY_KEEPALIVE 5 /* Waiting for more requests via keepalive */
SERVER_BUSY_LOG 6 /* Logging the request */
SERVER_BUSY_DNS 7 /* Looking up a hostname */
SERVER_GRACEFUL 8 /* server is gracefully finishing request */
SERVER_NUM_STATUS 9 /* number of status settings */
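A sketch of the decision the parent makes from the scoreboard, assuming a simplified scoreboard that is an array of (status, pid) slots in which SERVER_DEAD marks a free slot. The min_spare and max_spare limits are analogous to Apache's MinSpareServers and MaxSpareServers directives, but the type and function names are hypothetical, and the real parent logic in Apache is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Status values as listed from scoreboard.h above. */
#define SERVER_DEAD 0
#define SERVER_STARTING 1
#define SERVER_READY 2
#define SERVER_BUSY_READ 3
#define SERVER_BUSY_WRITE 4
#define SERVER_BUSY_KEEPALIVE 5
#define SERVER_BUSY_LOG 6
#define SERVER_BUSY_DNS 7
#define SERVER_GRACEFUL 8
#define SERVER_NUM_STATUS 9

/* Hypothetical, simplified scoreboard slot: only status and pid. */
typedef struct {
    int status;  /* written by the child itself */
    int pid;     /* process ID of the child */
} score_slot;

typedef enum { DO_NOTHING, SPAWN_CHILD, KILL_IDLE_CHILD } maintenance_action;

/* Decide, from the children's status values, whether the parent should
 * create a new child or get rid of an idle one. */
maintenance_action check_idle_children(const score_slot *board, size_t n,
                                       int min_spare, int max_spare) {
    int idle = 0, free_slots = 0;
    assert(min_spare <= max_spare);
    for (size_t i = 0; i < n; i++) {
        switch (board[i].status) {
        case SERVER_STARTING:  /* assumption: about to be ready, count as idle */
        case SERVER_READY:
            idle++;
            break;
        case SERVER_DEAD:      /* empty slot, available for a new child */
            free_slots++;
            break;
        default:               /* busy in one of the other states */
            break;
        }
    }
    if (idle < min_spare && free_slots > 0)
        return SPAWN_CHILD;
    if (idle > max_spare)
        return KILL_IDLE_CHILD;
    return DO_NOTHING;
}
```

Counting children that are still starting as idle is a design choice of the sketch: it keeps the parent from spawning extra children while earlier ones are still initializing.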
6.8. Module Structure
The specific contents of a module are determined by the type of function the
module performs. Figure 4 shows a generalized picture of a module.
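The following C sketch gives one possible picture of a generalized module, assuming a module is a record holding a name, private configuration data, and one optional handler per phase of request processing. All names (toy_request, toy_module, HANDLED, NOT_MINE, html_type_checker) are hypothetical; Apache's real module structure has many more fields and phases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified request record (not Apache's request_rec). */
typedef struct {
    const char *uri;          /* URI requested by the client */
    const char *filename;     /* file the URI was mapped to */
    const char *content_type; /* MIME type, once determined */
} toy_request;

/* Result codes of this sketch. */
enum { HANDLED = 0, NOT_MINE = -1 };

/* Every phase handler has the same signature. */
typedef int (*phase_handler)(toy_request *r);

/* Generalized module: a name, private configuration and one optional
 * handler per phase; NULL means the module skips that phase. */
typedef struct {
    const char *name;
    void *config;               /* module-private configuration data */
    phase_handler translate;    /* URI -> file name */
    phase_handler type_checker; /* file name -> MIME type */
    phase_handler logger;       /* log the finished request */
} toy_module;

/* A module that takes part only in MIME type checking. */
static int html_type_checker(toy_request *r) {
    assert(r != NULL && r->filename != NULL);
    size_t n = strlen(r->filename);
    if (n >= 5 && strcmp(r->filename + n - 5, ".html") == 0) {
        r->content_type = "text/html";
        return HANDLED;
    }
    return NOT_MINE;
}

toy_module html_module = { "html_module", NULL, NULL, html_type_checker, NULL };
```

The core would keep a list of such records and, for each phase, call every non-NULL handler for that phase; the module itself never names the core function that calls it.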
The source code has been analyzed with the Portable Bookshelf (PBS) Tools.
Besides a number of incompatibilities between the fact extractor and the C
source of Apache (which have been resolved by editing the source files in such
a way as not to modify the facts), the main problem was that the extracted facts
were not very useful in the analysis of the source code.
One of the main difficulties lies in the characteristics of the Apache source
code, which defines a large number of macros, not only for data structures but
also for procedures, their parameters and their return types. In many cases this
misled PBS into showing the .h files as suppliers/users, when the actual
suppliers/users were, in fact, the .c files (through macros).
As an example, Figure 7 shows the result for the Apache core structure. The
contents of the sub-components are hidden for clarity. The arrows that point
down (at the bottom of the picture) go to the utilities component described
earlier (ap, regex, os).
The following table shows what has been grouped under each sub-system (file.*
means both file.c and file.h, prefix*.h means all files with that prefix).
Sub-component | Files |
http_main | http_main.*, httpd.h |
protocol | http_protocol.*, rfc1413.* |
http_request | http_request.* |
http_core | http_core.c, http_core.h |
http_config | http_config.*, http_config.global.h, ap_*.h |
resources | buff.*, alloc.* |
util | util*.* |
http_log | http_log.* |
http_vhosts | http_vhost.* |
Another major difficulty in using PBS, or any other automatic fact extractor,
is that there is no way to extract relations between the Apache core and the
modules. All the calls and references are done through pointers to functions or
data structures. These kinds of interactions are difficult, if not impossible,
to extract at compile time. Even most of the interactions between http_core.c
and the rest of the sub-components of the Apache core are done through the same
mechanism (see the section on http_core.c).
Figure 7. Facts extracted using the Portable Bookshelf Tools
The containment relation used in the generation of the above figure is based on
Figure 3, which has been constructed using information from documentation,
source code comments and read-me files.
It should be noted that the suppliers and users of http_config have not been
drawn, because nearly every other sub-component references it: many .h files
have been grouped under it. The above figure is more a validation of the
concrete architecture depicted in Figure 3 than a source for it.
This report has offered a tour of the concrete architecture of the Apache web
server (release 1.3.4). The modular architecture of the server seems to offer
great opportunities for extending the code. The designers of Apache strove to
move as much of the functionality as possible into the modules; therefore,
modules must implement a well-defined API.
Communication between the core and the external modules is done through the
modules' handler functions. The module handlers are invoked to perform certain
phases of processing a request. Handlers receive a reference to the request_rec,
which contains information about the request and the resources the handlers
need.
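The sketch below shows, in C, how the core might run one phase (here, URI-to-filename translation) by calling each module's handler through a function pointer with a reference to the request record, stopping at the first handler that does not decline. It assumes the OK and DECLINED result codes of Apache 1.3 (0 and -1); the request structure, handler names and file paths are hypothetical, and some phases in Apache run every handler rather than stopping at the first. Because the handlers are named only in the table initializer and the call site is just handlers[i](r), a static fact extractor sees no call from the core to, say, cgi_translate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Result codes; assumed to match Apache 1.3's OK and DECLINED. */
#define OK 0
#define DECLINED -1

/* Hypothetical, much-simplified request record passed to each handler. */
typedef struct {
    const char *uri;
    char filename[256];
} toy_request_rec;

typedef int (*translate_handler)(toy_request_rec *r);

/* Module A: only handles URIs below /cgi-bin/ (paths are made up). */
static int cgi_translate(toy_request_rec *r) {
    if (strncmp(r->uri, "/cgi-bin/", 9) != 0)
        return DECLINED;
    snprintf(r->filename, sizeof r->filename,
             "/usr/local/apache/cgi-bin/%s", r->uri + 9);
    return OK;
}

/* Module B: catch-all that maps any URI under a document root. */
static int docroot_translate(toy_request_rec *r) {
    snprintf(r->filename, sizeof r->filename,
             "/usr/local/apache/htdocs%s", r->uri);
    return OK;
}

/* Core side: call each handler through its pointer until one does not
 * decline. Handlers are named only in the table initializer below. */
int run_translate_phase(const translate_handler *handlers, size_t n,
                        toy_request_rec *r) {
    assert(r != NULL);
    for (size_t i = 0; i < n; i++) {
        int status = handlers[i](r);
        if (status != DECLINED)
            return status;   /* OK, or an HTTP error status */
    }
    return DECLINED;         /* no module handled this phase */
}

static const translate_handler translate_handlers[] = {
    cgi_translate, docroot_translate
};
```

Ordering matters in this design: the more specific module (cgi_translate) is placed before the catch-all, so it gets the first chance to claim the request.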
We did not observe the same independence and well-defined API between the
components of the core. Of course, there are some clear utility components that
offer services to the other components of the core and to the modules, but the
important parts of the core are tightly linked together. One example of this
interdependence is the relationship between http_request.c, which controls the
flow of processing a request; http_config.c, which performs the actual
invocation of handlers; and http_protocol.c, which communicates with the HTTP
client. This coupling of the core components makes it somewhat more difficult to
change the behavior of Apache by modifying the core. Fortunately, modules can do
the same jobs as well, if not better, and they are usually easier to write.
Since the way its handlers are called is transparent to a module, and all
communication with a module is done through pointers to functions, we have found
that fact extractors do not capture the interaction between the core and the
modules.
API: Application Programming Interface.
component: term used throughout this report instead of "module", which is
reserved for Apache modules. This distinction is not standard terminology; its
only purpose is to avoid confusion.
core (Apache): the part of the Apache server that defines and manages the steps
in answering a request and implements the HTTP protocol.
CGI (script): Common Gateway Interface, an interface describing how a web server
passes parameters to, and receives results from, another process on the same
machine, called a CGI script (executed by the web server when it receives a
request referencing the script).
FTP: File Transfer Protocol, a protocol that coordinates how binary and ASCII
files are transferred over the Internet.
handler: a function of a module that is implicitly invoked by the core to handle
the phase of processing the HTTP request for which the handler was designed.
HTTP: Hypertext Transfer Protocol, the protocol that coordinates how hypertext
files are transferred over the Internet. However, any kind of file can be
transferred via HTTP.
httpd: the usual name for the web server (stands for HTTP daemon).
IPC (mechanisms): inter-process communication mechanisms (e.g. queues,
semaphores, shared memory).
MIME type: MIME stands for Multipurpose Internet Mail Extensions. MIME types are
the types (e.g. gif, html) of the entities defined in the MIME Requests for
Comments (RFCs).
module (Apache module): part of the Apache server that provides some
functionality in one or more phases of servicing an HTTP request. Its functions
(handlers) are implicitly invoked by the Apache core, and it interfaces with the
core through a special API.
NCSA web server: the web server provided and maintained by the Development Group
of the National Center for Supercomputing Applications, at the University of
Illinois at Urbana-Champaign.
proxy (web): a server that is contacted by HTTP clients to fetch web pages on
their behalf from the actual web servers. It caches web pages in order to serve
subsequent requests for the cached pages.
request (HTTP client): a message from the client containing information about
the resource requested and how it should be delivered.
resource (HTTP): a network data object or service which can be identified by a
URI.
response (HTTP server): the response from the web server to an HTTP request,
containing a header and usually the actual resource. The header contains status
information and information on the resource (e.g. type, length of the binary
representation).
resource pool: a large data structure allocated in one step by the Apache core,
which holds the resources (memory blocks, open files) associated with a given
request. When the resource pool is no longer needed, it is deallocated in one
step (memory is freed and files are closed).
URI / URL: Uniform Resource Identifiers / Uniform Resource Locators.
TCP/IP: Transmission Control Protocol / Internet Protocol, the protocol suite
used at the transport level (TCP) and the network level (IP) in the Internet.
virtual host: a single physical host might have more than one network interface,
each with a different IP address and a different host name. To clients, it
appears as a number of virtual hosts, one for each name.
[Thau96] Robert Thau, "Design considerations for the Apache Server API", Fifth
International World Wide Web Conference, Paris, 1996.
[APINotes] Robert S. Thau, "Apache API notes".
[ApacheDocs] Apache server documentation.
[Preston99] Jean Preston, "Conceptual Architecture of the Apache Web Server",
assignment for CS746G, Dept. of Computer Science, University of Waterloo,
Feb 2, 1999 (http://www.grad.math.uwaterloo.ca/~je2preston/).
[Dragoi99] O. Andrei Dragoi, "Conceptual Architecture of the Apache Web
Server", assignment for CS746G, Dept. of Computer Science, University of
Waterloo, Feb 2, 1999
(http://www.grad.math.uwaterloo.ca/~oadragoi/CS746G/a1/apache_conceptual_arch.html).
Copyright notice: all rights reserved @zhyiwww.
When quoting, please cite the source: http://www.blogjava.net/zhyiwww
Posted on 2008-05-09 16:39 by zhyiwww. Category: software