The Apache Web Server
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
In This Chapter
Chapter 20: The Apache Web Server
Download and Install The Apache Package
How To Get Apache Started
Configuring DNS For Apache
DHCP and Apache
General Configuration Steps
Configuration - Multiple Sites And IP Addresses
Using Data Compression On Web Pages
Apache Running On A Server Behind A NAT Firewall
How To Protect Web Page Directories With Passwords
The /etc/httpd/conf.d Directory
Troubleshooting Apache
Conclusion
(c) Peter Harrison, www.linuxhomenetworking.com
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Apache is probably the most popular Linux-based Web server application in use. Once you have DNS correctly set up and your server has access to the Internet, you'll need to configure Apache to accept surfers wanting to access your Web site.
This chapter explains how to configure Apache in a number of commonly encountered scenarios for small Web sites.
Download and Install The Apache Package
Most RedHat and Fedora Linux software products are available in the RPM format. When searching for the file, remember that the Apache RPM's filename usually starts with the word httpd followed by a version number, as in httpd-2.0.48-1.2.rpm. It is best to use the latest version of Apache. (For more on RPMs, see Chapter 6, "Installing RPM Software.")
How To Get Apache Started
Use the chkconfig command to configure Apache to start at boot:
[root@bigboy tmp]# chkconfig httpd on
Use the httpd init script in the /etc/init.d directory to start, stop, and restart Apache after booting:
[root@bigboy tmp]# service httpd start
[root@bigboy tmp]# service httpd stop
[root@bigboy tmp]# service httpd restart
You can test whether the Apache process is running with:
[root@bigboy tmp]# pgrep httpd
You should get a response of plain old process ID numbers.
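If you want to script this check, something like the following may do. It is a minimal sketch: the function name is_running is hypothetical, and it only wraps pgrep, using the -x option to match the process name exactly.

```shell
#!/bin/sh
# is_running NAME: succeed if a process named exactly NAME exists.
# (Hypothetical helper name; it relies only on pgrep.)
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

if is_running httpd; then
    echo "httpd is running"
else
    echo "httpd is not running"
fi
```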
Configuring DNS For Apache
Remember that you will never receive the correct traffic unless you configure DNS for your domain to make your new Linux box Web server the target of the DNS domain's www entry. To do this, refer to Chapter 18, "Configuring DNS," or Chapter 19, "Dynamic DNS."
DHCP and Apache
As you remember, if your Internet connection uses DHCP to get its IP address, then you need to use dynamic DNS to get the correct Internet DNS entry for your Web server. If your Web server and firewall are different machines, then you probably also need to set up port forwarding for your Web traffic to reach the Web server correctly. (Chapter 19 explains port forwarding as well.)
DHCP on your protected home network is different. In the book's
sample topology, the web server lives on the
192.168.1.0 home network protected by a firewall. The firewall uses NAT and
port forwarding to pass Internet traffic on to the web server. Remember that
the IP address of your web server can change if it gets its IP address using
DHCP. This could cause your firewall port forwarding, not Dynamic DNS, to
break.
In this case I recommend that your web server on the 192.168.1.0 network use a fixed, or static, IP address that is outside of the range of the DHCP server to prevent you from having this problem.
General Configuration Steps
The configuration file used by Apache is /etc/httpd/conf/httpd.conf. As for
most Linux applications, you must restart Apache before changes to this
configuration file take effect.
Where To Put Your Web Pages
All the statements that define the features of each web site
are grouped together inside their own <VirtualHost>
section, or container, in the httpd.conf
file. The most commonly used statements, or directives, inside a <VirtualHost> container are:
o ServerName: Defines the name of the website managed by the <VirtualHost> container. This is needed in named virtual hosting only, as I'll explain soon.
o DocumentRoot: Defines the directory in which the web pages for the site can be found.
By default, Apache searches the DocumentRoot directory for an index, or home, page named index.html. So for example, if you have a ServerName of www.my-site.com with a DocumentRoot directory of /home/www/site1/, Apache displays the contents of the file /home/www/site1/index.html when you enter http://www.my-site.com in your browser.
Some editors, such as Microsoft FrontPage, create files with an .htm extension, not .html. This isn't usually a problem if all your HTML files have hyperlinks pointing to files ending in .htm, as FrontPage's do. The problem occurs with Apache not recognizing the topmost index.htm page. The easiest solution is to create a symbolic link (known as a shortcut to Windows users) called index.html pointing to the file index.htm. This then enables you to edit or copy the file index.htm with index.html being updated automatically. You'll almost never have to worry about index.html and Apache again!
This example creates a symbolic link named index.html that points to index.htm in the /home/www/site1 directory.
[root@bigboy tmp]# cd /home/www/site1
[root@bigboy site1]# ln -s index.htm index.html
[root@bigboy site1]# ll index.*
-rw-rw-r-- 1 root root 48590 Jun 18 23:43 index.htm
lrwxrwxrwx 1 root root 9 Jun 21 18:05 index.html -> index.htm
[root@bigboy site1]#
The l at the very beginning of the index.html entry signifies a link and the -> the link target.
The Default File Location
By default, Apache expects to find all its web page files in
the /var/www/html/ directory
with a generic DocumentRoot
statement at the beginning of httpd.conf.
The examples in this chapter use the /home/www
directory to illustrate how you can place them in other locations successfully.
File Permissions And Apache
Apache will display Web page files as long as they are world readable. You have to make sure that all the files and subdirectories in your DocumentRoot have the correct permissions.
It is a good idea to have the files owned by a nonprivileged user so that Web developers can update the files using FTP or SCP without requiring the root password.
To do this:
1. Create a user with a home directory of /home/www.
2. Recursively change the file ownership of the /home/www directory and all its subdirectories.
3. Change the permissions on the /home/www directory to 755, which allows all users, including Apache's httpd daemon, to read the files inside.
[root@bigboy tmp]# useradd -g users www
[root@bigboy tmp]# chown -R www:users /home/www
[root@bigboy tmp]# chmod 755 /home/www
Now we test for the new ownership with the ll command.
[root@bigboy tmp]# ll /home/www/site1/index.*
-rw-rw-r-- 1 www users 48590 Jun 25 23:43 index.htm
lrwxrwxrwx 1 www users 9 Jun 25 18:05 index.html -> index.htm
[root@bigboy tmp]#
Note: Be sure to FTP or SCP new files to your web server
as this new user. This will make all the transferred files automatically have
the correct ownership.
If you browse your Web site after configuring Apache and get a "403 Forbidden" permissions-related error on your screen, then your files or directories under your DocumentRoot most likely have incorrect permissions. Appendix II, "Codes, Scripts, and Configurations," has a short script that you can use to recursively set the file permissions in a directory to match those expected by Apache. You may also have to use the Directory directive to make Apache serve the pages once the file permissions have been correctly set. If you have your files in the default /var/www/html directory then this second step becomes unnecessary.
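As a rough illustration of what such a script might do (this is a minimal sketch, not the Appendix II script), the function below sets directories to 755 and regular files to 644 under a given tree, then lists anything that is still not world readable. The function name fix_web_perms is hypothetical, and /home/www/site1 is simply the example DocumentRoot used in this chapter.

```shell
#!/bin/sh
# fix_web_perms DIR: hypothetical helper, not the Appendix II script.
# Directories get 755 (world searchable), regular files 644 (world readable).
fix_web_perms() {
    find "$1" -type d -exec chmod 755 {} +
    find "$1" -type f -exec chmod 644 {} +
    # Report anything Apache still could not read (normally prints nothing).
    find "$1" ! -type l ! -perm -o=r -print
}

# Example: run against the DocumentRoot only if it exists on this machine.
if [ -d /home/www/site1 ]; then
    fix_web_perms /home/www/site1
fi
```

Note that mode 644 removes the execute bit, so a tree that also holds CGI scripts would need those files set back to executable afterwards.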
Security Contexts For Web Pages
Fedora Core 3 introduced the concept of security contexts as part of the Security Enhanced Linux (SELinux) definition. (See Appendix I, "Miscellaneous Linux Topics," for details.) A Web page may have the right permissions, but the Apache httpd daemon won't be able to read it unless you assign it the correct security context or daemon access permissions. Context-related configuration errors will give "403 Forbidden" browser messages, and in some cases, you will get the default Fedora Apache page where your expected Web page should be.
When a file is created, it inherits the security context of its
parent directory. If you decide to place your Web pages in the default /var/www/ directory, then they will
inherit the context of that directory and you should have very few problems.
The context of a file depends on the SELinux label it is given.
The most important types of security label are listed in Table 20-1.
Security Label | Description
httpd_sys_content_t | The type used by regular static web pages with .html and .htm extensions.
httpd_sys_script_ro_t | Required for CGI scripts to read files and directories.
httpd_sys_script_ra_t | Same as the httpd_sys_script_ro_t type but also allows appending data to files by the CGI script.
httpd_sys_script_rw_t | Files with this type may be changed by a CGI script in any way, including deletion.
httpd_sys_script_exec_t | The type required for the execution of CGI scripts.
As expected, security contexts become important when Web pages
need to be placed in directories that are not the Apache defaults. In this
example, user root creates a directory /home/www/site1
in which the pages for a new Web site will be placed. Using the ls -Z command, you can see that the user_home_t security label has been
assigned to the directory and the index.html
page created in it. This label is not accessible by Apache.
[root@bigboy tmp]# mkdir /home/www/site1
[root@bigboy tmp]# ls -Z /home/www/
drwxr-xr-x root root root:object_r:user_home_t site1
[root@bigboy tmp]# touch /home/www/site1/index.html
[root@bigboy tmp]# ls -Z /home/www/site1/index.html
-rw-r--r-- root root root:object_r:user_home_t /home/www/site1/index.html
[root@bigboy tmp]#
Accessing the index.html file via a Web browser gets a "403 Forbidden" error on your screen, even though the permissions are correct. Viewing the /var/log/httpd/error_log gives a "Permission denied" message and the /var/log/messages file shows kernel audit errors.
[root@bigboy tmp]# tail /var/log/httpd/error_log
[Fri Dec 24 17:59:24 2004] [error] [client 216.10.119.250]
(13)Permission denied: access to / denied
[root@bigboy tmp]# tail /var/log/messages
Dec 24 17:59:24 bigboy kernel: audit(1103939964.444:0): avc:
denied { getattr } for pid=2188 exe=/usr/sbin/httpd path=/home/www/site1
dev=hda5 ino=73659 scontext=system_u:system_r:httpd_t
tcontext=root:object_r:user_home_t tclass=dir
[root@bigboy tmp]#
SELinux security context labels can be modified using the chcon command. Recognizing the error,
user root uses chcon with the -R (recursive) and -h
(modify symbolic links) qualifiers to modify the label of the directory to httpd_sys_content_t with the -t qualifier.
[root@bigboy tmp]# chcon -R -h -t httpd_sys_content_t /home/www/site1
[root@bigboy tmp]# ls -Z /home/www/site1/
-rw-r--r-- root root root:object_r:httpd_sys_content_t index.html
[root@bigboy tmp]#
Browsing now works without errors. User root won't have to run the chcon
command again for the directory, because new files created in the directory
will inherit the SELinux security label of the parent directory. You can see
this when the file /home/www/site1/test.txt
is created.
[root@bigboy tmp]# touch /home/www/site1/test.txt
[root@bigboy tmp]# ls -Z /home/www/site1/
-rw-r--r-- root root root:object_r:httpd_sys_content_t index.html
-rw-r--r-- root root root:object_r:httpd_sys_content_t test.txt
[root@bigboy tmp]#
Security Contexts For CGI Scripts
You can use Apache to trigger the execution of programs called Common Gateway Interface (CGI) scripts. CGI scripts can be written in a variety of languages, including Perl and PHP, and can be used to do such things as generate new Web page output or update data files. A Web page's Submit button usually has a CGI script lurking somewhere beneath. By default, CGI scripts are placed in the /var/www/cgi-bin/ directory as defined by the ScriptAlias directive you'll find in the httpd.conf file, which I'll discuss in more detail later.
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
In the default case, any URL with the string /cgi-bin/ will trigger Apache to
search for an equivalent executable file in this directory. So, for example,
the URL, http://192.168.1.100/cgi-bin/test/test.cgi
actually executes the script file /var/www/cgi-bin/test/test.cgi.
SELinux contexts have to be modified according to the values in Table 20-1 for a CGI script to be run in another directory or to access data files. In the example case, the Perl script test.cgi was created to display the word "Success" on the screen of your Web browser.
#!/usr/bin/perl
# CGI Script "test.cgi"
# A CGI script must send a Content-Type header and a blank line before the page body.
print "Content-Type: text/html\n\n";
print qq(
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html">
<title>Linux Home Networking</title>
</head>
<body>
<p align="center"><font size="7">Success!</font></p>
</body>
</html>
);
The ScriptAlias
directive has been set to point to /home/www/cgi-bin/
instead of /var/www/cgi-bin/.
ScriptAlias /cgi-bin/ "/home/www/cgi-bin/"
User root creates the /home/www/cgi-bin/
directory, changes the directory's security context label to httpd_sys_script_exec_t, and then
creates the script /home/www/cgi-bin/test/test.cgi
mentioned previously with the correct executable file permissions.
[root@bigboy tmp]# mkdir -p /home/www/cgi-bin
[root@bigboy tmp]# chcon -h -t httpd_sys_script_exec_t /home/www/cgi-bin/
[root@bigboy tmp]# mkdir /home/www/cgi-bin/test
[root@bigboy tmp]# ls -Z /home/www/cgi-bin
drwxr-xr-x root root root:object_r:httpd_sys_script_exec_t test
[root@bigboy tmp]# vi /home/www/cgi-bin/test/test.cgi
[root@bigboy tmp]# chmod o+x /home/www/cgi-bin/test/test.cgi
[root@bigboy tmp]#
Accessing the URL http://192.168.1.100/cgi-bin/test/test.cgi
is successful. Problems occur when the same test.cgi
file needs to be used by a second Web site housed on the same Web server. The
file is copied to a directory /web/cgi-bin/site2/
governed by the ScriptAlias in
the second Web site's <VirtualHost>
container (explained later), but the security context label isn't copied along
with it.
ScriptAlias /cgi-bin/ "/web/cgi-bin/site2/"
The file inherits the context of its new parent.
[root@bigboy tmp]# cp /home/www/cgi-bin/test/test.cgi /web/cgi-bin/site2/test.cgi
[root@bigboy tmp]# ls -Z /web/cgi-bin/site2/test.cgi
-rw-r--r-x root root root:object_r:tmp_t /web/cgi-bin/site2/test.cgi
[root@bigboy tmp]#
Permission denied and kernel audit errors occur once more; you
can fix them only by changing the security context of the test.cgi file.
[root@bigboy tmp]# tail /var/log/httpd/error_log
[Fri Dec 24 18:36:08 2004] [error] [client 216.10.119.250]
(13)Permission denied: access to /cgi-bin/texcelon/test.cgi denied
[root@bigboy tmp]# tail /var/log/messages
Dec 24 18:36:08 bigboy kernel: audit(1103942168.549:0): avc:
denied { getattr } for pid=2191 exe=/usr/sbin/httpd
path=/web/cgi-bin/site2/test.cgi dev=hda5 ino=77491
scontext=system_u:system_r:httpd_t tcontext=root:object_r:tmp_t tclass=file
[root@bigboy tmp]#
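One way to apply that fix is sketched below, reusing the chcon command and the httpd_sys_script_exec_t type from Table 20-1. The wrapper function relabel_cgi and its DRY_RUN switch are assumptions added for illustration. By default it only prints the commands, so you can review them before running them as root.

```shell
#!/bin/sh
# relabel_cgi FILE: hypothetical wrapper around chcon.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
# Assumes FILE contains no spaces.
relabel_cgi() {
    for cmd in "chcon -t httpd_sys_script_exec_t $1" "ls -Z $1"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

relabel_cgi /web/cgi-bin/site2/test.cgi
```

Setting DRY_RUN=0 in the environment before calling the function would run the two commands instead of printing them.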
Note: If you find security contexts too restrictive, you
can turn them off system wide by editing your /etc/selinux/config
file, modifying the SELINUX
parameter to disabled. SELinux
will be disabled after your next reboot.
Named Virtual Hosting
You can make your Web server host more than one site per IP
address by using Apache's named virtual hosting feature. You use the NameVirtualHost directive in the /etc/httpd/conf/httpd.conf file to
tell Apache which IP addresses will participate in this feature.
The <VirtualHost>
containers in the file then tell Apache where it should look for the Web
pages used on each Web site. You must specify the IP address for which each <VirtualHost> container applies.
Named Virtual Hosting Example
Consider an example in which the server is configured to
provide content on 97.158.253.26. In the code that follows, notice that within
each <VirtualHost> container
you specify the primary Web site domain name for that IP address with the ServerName directive. The DocumentRoot directive defines the directory
that contains the index page for that site.
You can also list secondary domain names that will serve the
same content as the primary ServerName using
the ServerAlias directive.
Apache searches for a perfect match of NameVirtualHost, <VirtualHost>,
and ServerName when making a
decision as to which content to send to the remote user's Web browser. If there
is no match, then Apache uses the first <VirtualHost>
in the list that matches the target IP address of the request.
This is why the first <VirtualHost>
statement contains an asterisk: to indicate it should be used for all other Web
queries.
NameVirtualHost 97.158.253.26
<VirtualHost *>
# Default directives (in other words, not site #1 or site #2)
</VirtualHost>
<VirtualHost 97.158.253.26>
ServerName www.my-site.com
# Directives for site #1
</VirtualHost>
<VirtualHost 97.158.253.26>
ServerName www.another-site.com
# Directives for site #2
</VirtualHost>
Be careful with using the asterisk in other containers. A <VirtualHost> with a specific IP
address always gets higher priority than a <VirtualHost>
statement with an * intended to cover the same IP address, even if the ServerName directive doesn't match. To
get consistent results, try to limit the use of your <VirtualHost *> statements to the beginning of the
list to cover any other IP addresses your server may have.
You can also have multiple NameVirtualHost
directives, each with a single IP address, in cases where your Web server has
more than one IP address.
IP-Based Virtual Hosting
The other virtual hosting option is to have one IP address per Web site, which is also known as IP-based virtual hosting. In this case, you will not have a NameVirtualHost directive for the IP address, and you must have only a single <VirtualHost> container per IP address.
Also, because there is only one Web site per IP address, the ServerName directive isn't needed in
each <VirtualHost>
container, unlike in named virtual hosting.
IP Virtual Hosting Example: Single Wild Card
In this example, Apache listens on all interfaces, but gives
the same content. Apache displays the content in the first <VirtualHost *> directive even
if you add another right after it. Apache also seems to enforce the single <VirtualHost> container per IP
address requirement by ignoring any ServerName
directives you may use inside it.
<VirtualHost *>
DocumentRoot /home/www/site1
</VirtualHost>
IP Virtual Hosting Example: Wild Card and IP Addresses
In this example, Apache listens on all interfaces, but gives
different content for addresses 97.158.253.26 and 97.158.253.27. Web surfers
get the site1 content if they
try to access the web server on any of its other IP addresses.
<VirtualHost *>
DocumentRoot /home/www/site1
</VirtualHost>
<VirtualHost 97.158.253.26>
DocumentRoot /home/www/site2
</VirtualHost>
<VirtualHost 97.158.253.27>
DocumentRoot /home/www/site3
</VirtualHost>
A Note On Virtual Hosting And SSL
Because it makes configuration easier, system administrators
commonly replace the IP address in the <VirtualHost>
and NameVirtualHost directives
with the * wildcard character to
indicate all IP addresses.
If you installed Apache with support for secure HTTPS/SSL,
which is used frequently in credit card and shopping cart Web pages, then wild
cards won't work. The Apache SSL module demands at least one explicit <VirtualHost> directive for
IP-based virtual hosting. When you use wild cards, Apache interprets it as an
overlap of name-based and IP-based <VirtualHost>
directives and gives error messages because it can't make up its mind about
which method to use:
Starting httpd: [Sat Oct 12 21:21:49 2002] [error]
VirtualHost _default_:443 -- mixing * ports and non-* ports with a
NameVirtualHost
address is not supported, proceeding with undefined results
If you try to load any Web page on your web server, you'll see
the error:
Bad request!
Your browser (or proxy) sent a request that this server could
not understand.
If you think this is a server error, please contact the
webmaster
The best solution to this problem is to use wild cards more
sparingly. Don't use virtual hosting statements with wild cards except for the
very first <VirtualHost>
directive that defines the web pages to be displayed when matches to the other <VirtualHost> directives cannot
be found. Here is an example.
NameVirtualHost *
<VirtualHost *>
# Directives for other sites
</VirtualHost>
<VirtualHost 97.158.253.28>
# Directives for the site that also runs on SSL
</VirtualHost>
Configuration - Multiple Sites And IP Addresses
To help you better understand the edits needed to configure the
/etc/httpd/conf/httpd.conf file,
I'll walk you through an example scenario. The parameters are:
> The web site's systems administrator previously created DNS entries for www.my-site.com, my-site.com, www.my-cool-site.com, and www.default-site.com that map to the IP address 97.158.253.26 on this web server. The domain www.another-site.com is also configured to point to alias IP address 97.158.253.27. The administrator wants to be able to get to www.test-site.com on all the IP addresses.
> Traffic to www.my-site.com, my-site.com, and www.my-cool-site.com must get content from subdirectory site2. Hitting these URLs causes Apache to display the contents of file index.html in this directory.
> Traffic to www.test-site.com must get content from subdirectory site3.
> Named virtual hosting will be required for 97.158.253.26, as in this case we have a single IP address serving different content for a variety of domains. A NameVirtualHost directive for 97.158.253.26 is therefore required.
> Traffic going to www.another-site.com will get content from directory site4.
> All other domains pointing to this server that don't have a matching ServerName directive will get Web pages from the directory defined in the very first <VirtualHost> container: directory site1. Site www.default-site.com falls in this category.
Table 20-2 summarizes these requirements.
Domain | IP address | Directory | Type of Virtual Hosting
www.my-site.com, my-site.com, www.my-cool-site.com | 97.158.253.26 | site2 | Name Based
www.test-site.com | 97.158.253.27 | site3 | Name Based (Wild card)
www.another-site.com | 97.158.253.27 | site4 | Name Based
All other domains | 97.158.253.27 | site1 | Name Based
www.default-site.com | 97.158.253.26 | site1 | Name Based
How do these requirements translate into code? Here is a sample
snippet of a working httpd.conf
file:
ServerName localhost
NameVirtualHost 97.158.253.26
NameVirtualHost 97.158.253.27
#
# Match a webpage directory with each website
#
<VirtualHost *>
DocumentRoot /home/www/site1
</VirtualHost>
<VirtualHost 97.158.253.26>
DocumentRoot /home/www/site2
ServerName www.my-site.com
ServerAlias my-site.com www.my-cool-site.com
</VirtualHost>
<VirtualHost 97.158.253.27>
DocumentRoot /home/www/site3
ServerName www.test-site.com
</VirtualHost>
<VirtualHost 97.158.253.27>
DocumentRoot /home/www/site4
ServerName www.another-site.com
</VirtualHost>
#
# Make sure the directories specified above
# have restricted access to read-only.
#
<Directory "/home/www/*">
Order allow,deny
Allow from all
AllowOverride FileInfo AuthConfig Limit
Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
<Limit GET POST OPTIONS>
Order allow,deny
Allow from all
</Limit>
<LimitExcept GET POST OPTIONS>
Order deny,allow
Deny from all
</LimitExcept>
</Directory>
These statements would normally be found at the very bottom of
the file where the virtual hosting statements reside. The last section of this
configuration snippet has some additional statements to ensure read-only access
to your Web pages with the exception of Web-based forms using POSTs (pages with
"submit" buttons). Remember to restart Apache every time you update
the httpd.conf file for the
changes to take effect on the running process.
Note: You will have to configure your DNS server to
point to the correct IP address used for each of the Web sites you host.
Chapter 18 shows you how to configure multiple domains, such as my-site.com and another-site.com, on your DNS server.
Testing Your Website Before DNS Is Fixed
You may not be able to wait for DNS to be configured correctly
before starting your project. The easiest way to temporarily bypass this is to
modify the hosts file on the Web developer's client PC or workstation (not the
Apache server). By default, PCs and Linux workstations query the hosts file
first before checking DNS, so if a value for www.my-site.com
is listed in the file, that's what the client will use.
The Windows equivalent of the Linux /etc/hosts file is named C:\WINDOWS\system32\drivers\etc\hosts.
You need to open and edit it with a text editor, such as Notepad. Here you
could add an entry similar to:
97.158.253.26 www.my-site.com
Do not remove the localhost entry in this file.
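To confirm what a hosts file will return for a name before you open a browser, a small lookup like the following can help. It is a minimal sketch: the function name lookup_host is hypothetical, and it reads /etc/hosts unless you pass another file as the second argument.

```shell
#!/bin/sh
# lookup_host NAME [FILE]: print the first IP address that FILE
# (default /etc/hosts) maps to NAME. Hypothetical helper for illustration.
lookup_host() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # rest of the line is a comment
                if (tolower($i) == tolower(name)) { print $1; exit }
            }
        }
    ' "${2:-/etc/hosts}"
}

# Example: check the temporary entry, if this machine has a hosts file.
if [ -r /etc/hosts ]; then
    lookup_host www.my-site.com
fi
```

An empty result means the name is not in the file, in which case the client falls back to DNS.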
Disabling Directory Listings
Be careful to include an index.html page in each subdirectory under your DocumentRoot directory. If one isn't found, Apache will default to giving a listing of all the files in that subdirectory.
Say, for example, you create a subdirectory named /home/www/site1/example under www.my-site.com's DocumentRoot of /home/www/site1/. Now you'll be able
to view the contents of the file my-example.html
in this subdirectory if you point your browser to:
http://www.my-site.com/example/my-example.html
If curious surfers decide to see what the index page is for www.my-site.com/example, they would
type the link:
http://www.my-site.com/example
Apache lists all the files in the example directory if it can't find the index.html file. You can disable the directory listing by using a -Indexes option in the <Directory> directive for the DocumentRoot like this. Once any option on the line carries a + or - sign, every option on that line should carry one.
<Directory "/home/www/*">
...
...
...
Options +MultiViews -Indexes +SymLinksIfOwnerMatch +IncludesNoExec
</Directory>
Remember to restart Apache after the changes. Users attempting to access the nonexistent index page will now get a "403 Forbidden" message.
Note: When setting up a yum
server it's best to enable directory listings for the RPM subdirectories. This
allows web surfers to double check the locations of files through their
browsers.
Handling Missing Pages
You can tell Apache to display a predefined HTML file whenever a surfer attempts to access a non-index page that doesn't exist. You can place this statement in the httpd.conf file, which will make Apache display the contents of missing.htm instead of a generic "404 Not Found" message:
ErrorDocument 404 /missing.htm
Remember to put a file with this name in each DocumentRoot directory. You can see
the missing.htm file I use by trying the nonexistent link.
http://www.linuxhomenetworking.com/bogus-file.htm
Notice that this gives the same output as
http://www.linuxhomenetworking.com/missing.htm.
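Because the file has to exist in every DocumentRoot, a quick loop can flag the directories where it is missing. This is a minimal sketch: the function name check_missing_page is hypothetical, and the directory list is taken from this chapter's examples.

```shell
#!/bin/sh
# check_missing_page DIR...: print each DocumentRoot that lacks missing.htm.
# Hypothetical helper; prints nothing when every directory has the file.
check_missing_page() {
    for docroot in "$@"; do
        if [ ! -f "$docroot/missing.htm" ]; then
            echo "$docroot"
        fi
    done
}

check_missing_page /home/www/site1 /home/www/site2 /home/www/site3 /home/www/site4
```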
Using Data Compression On Web Pages
Apache also has the ability to dynamically compress static Web
pages into gzip format and then send the result to the remote Web surfers' Web
browser. Most current Web browsers support this format, transparently
uncompressing the data and presenting it on the screen. This can significantly
reduce bandwidth charges if you are paying for Internet access by the megabyte.
First you need to load Apache version 2's deflate module in
your httpd.conf file and then use Location directives to specify which type of
files to compress. After making these modifications and restarting Apache, you
will be able to verify from your /var/log/httpd/access_log file that the sizes
of the transmitted HTML pages have shrunk.
Compare the file sizes in this Apache log.
[root@bigboy tmp]# grep dns-static /var/log/httpd/access_log
...
...
67.119.25.115 - - [15/Feb/2003:23:06:51 -0800] "GET /dns-static.htm HTTP/1.1" 200 15190 "http://www.linuxhomenetworking.com/sendmail.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; AT&T CSM6.0; YComp 5.0.2.6)"
...
...
[root@bigboy tmp]#
and the corresponding directory listing:
[root@bigboy tmp]# ll /web-dir/dns-static.htm
-rw-r--r-- 1 user group 78350 Feb 15 00:53 /home/www/ccie/dns-static.htm
[root@bigboy tmp]#
As you can see, 78,350 bytes shrank to 15,190 bytes, a reduction of just over 80%.
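The arithmetic behind that figure can be reproduced with awk. This is a minimal sketch: the function name savings is hypothetical, and the two numbers are the file size from the directory listing and the size logged in access_log above.

```shell
#!/bin/sh
# savings ORIGINAL COMPRESSED: percentage of bytes saved, one decimal place.
savings() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (1 - c / o) * 100 }'
}

savings 78350 15190    # prints 80.6
```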
Compression Configuration Example
You can insert these statements just before your virtual
hosting section of your httpd.conf
file to activate the compression of static pages. Remember to restart Apache
when you do.
Note: Fedora's version of httpd.conf
loads the compression module mod_deflate
by default. This means that the LoadModule
line (the first line of the example snippet) is not required for Fedora. The
location statements are required, however.
LoadModule deflate_module modules/mod_deflate.so
<Location />
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</Location>
Apache Running On A Server Behind A NAT Firewall
If your Web server is behind a NAT firewall and you are logged on a machine behind the firewall as well, then you may encounter problems when trying to access www.my-site.com or www.another-site.com. Because of NAT (network address translation), firewalls frequently don't allow access from their protected network to IP addresses that they masquerade on the outside.
For example, Linux Web server bigboy has an internal IP address of 192.168.1.100, but the firewall presents it to the world with an external IP address of 97.158.253.26 via NAT/masquerading. If you are on the inside, 192.168.1.X network, you may find it impossible to hit URLs that resolve in DNS to 97.158.253.26.
There is a two-part solution to this problem:
Step 1: Configure Virtual Hosting on Multiple IPs
You can configure Apache to serve the correct content when accessing www.my-site.com or www.another-site.com from the outside, and also when accessing the specific IP address 192.168.1.100 from the inside. Fortunately, Apache allows you to specify multiple IP addresses in the <VirtualHost> statements to help you overcome this problem.
Here is an example:
NameVirtualHost 192.168.1.100
NameVirtualHost 97.158.253.26
<VirtualHost 192.168.1.100 97.158.253.26>
DocumentRoot /www/server1
ServerName www.my-site.com
ServerAlias bigboy www.my-site-192-168-1-100.com
</VirtualHost>
Step 2: Configure DNS "Views"
You now need to fix the DNS problem that NAT creates. Users on
the Internet need to access IP address 97.158.253.26 when visiting www.my-site.com and users on your home
network need to access IP address 192.168.1.100 when visiting the same site.
You can configure your DNS server to use views which makes your DNS server give
different results depending on the source IP address of the Web surfer's PC
doing the query. Chapter 18 explains how to do this in detail.
Note: If you have to rely on someone else to do the DNS
change, then you can edit your PC's hosts file as a quick and dirty temporary
solution to the problem. Remember that this will fix the problem on your PC
alone.
How To Protect Web Page Directories With Passwords
You can password protect content in both the main and
subdirectories of your DocumentRoot
fairly easily. I know people who allow normal access to their regular Web
pages, but require passwords for directories or pages that show MRTG or Webalizer data. This example shows how to password
protect the /home/www directory.
1. Use Apache's htpasswd password utility to create username/password
combinations independent of your system login password for Web page access.
You have to specify the location of the password file, and if it doesn't yet
exist, you have to include a -c, or create, switch on the command line. I
recommend placing the file in your /etc/httpd/conf directory, away from the
DocumentRoot tree where Web users could possibly view it. Here is an example
for a first user named peter and a second named paul:
[root@bigboy tmp]# htpasswd -c /etc/httpd/conf/.htpasswd peter
New password:
Re-type new password:
Adding password for user peter
[root@bigboy tmp]# htpasswd /etc/httpd/conf/.htpasswd paul
New password:
Re-type new password:
Adding password for user paul
[root@bigboy tmp]#
2. Make the .htpasswd file readable by all users.
[root@bigboy tmp]# chmod 644 /etc/httpd/conf/.htpasswd
3. Create a .htaccess file in the directory you want to password protect,
with these entries.
AuthUserFile /etc/httpd/conf/.htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
require user peter
Remember that this password protects the directory and all its
subdirectories. The AuthUserFile directive tells Apache to use the .htpasswd
file. The require user statement tells Apache that only user peter in the
.htpasswd file should have access. If you want all .htpasswd users to have
access, replace this line with require valid-user. AuthType Basic instructs
Apache to accept basic unencrypted passwords from the remote users' Web
browser.
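If every user in the password file should be admitted, the .htaccess file could be generated along these lines. This is a minimal sketch: write_htaccess is a hypothetical helper name, and the default password file path is simply the one used in this example.

```shell
# Sketch: write an .htaccess that admits every user in the password file.
# write_htaccess is a hypothetical helper, not part of Apache.
write_htaccess() {
    target_dir="$1"
    passwd_file="${2:-/etc/httpd/conf/.htpasswd}"

    # Same entries as the example above, with "require valid-user"
    # in place of "require user peter".
    cat > "$target_dir/.htaccess" <<EOF
AuthUserFile $passwd_file
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
require valid-user
EOF

    # Apache must be able to read the file.
    chmod 644 "$target_dir/.htaccess"
}
```

Calling write_htaccess /home/www would protect the example directory for both peter and paul.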
4. Set the correct file protections on your new .htaccess file in the
directory /home/www.
[root@bigboy tmp]# chmod 644 /home/www/.htaccess
5. Make sure your /etc/httpd/conf/httpd.conf file has an AllowOverride
statement in a <Directory> directive for /home/www or for a directory above
it in the tree. In the example below, .htaccess password authorization is
honored in /home/www and all directories below it.
<Directory /home/www>
AllowOverride AuthConfig
</Directory>
6. Make sure that you have a <VirtualHost> directive that defines access to
/home/www or another directory higher up in the tree.
<VirtualHost *>
ServerName 97.158.253.26
DocumentRoot /home/www
</VirtualHost>
7. Restart Apache.
Try accessing the web site and you'll be prompted for a
password.
The /etc/httpd/conf.d Directory
Files in the /etc/httpd/conf.d directory are read and automatically appended
to the configuration in the httpd.conf file every time Apache is restarted.
In complicated configurations, in which a Web server has to host many Web
sites, you can create one configuration file per Web site, each with its own
set of <VirtualHost> and <Directory> containers. This can make Web site
management much simpler. Follow these steps to do this correctly.
1. Back up your httpd.conf file, in case you make a mistake.
2. Create files in this directory that contain the required Apache
<VirtualHost> and <Directory> containers and directives.
3. If each site has a dedicated IP address, then place the NameVirtualHost
statements in the corresponding /etc/httpd/conf.d directory file. If the IP
address is shared, the statement needs to remain in the main httpd.conf file.
4. Remove the corresponding directives from the httpd.conf file.
5. Restart Apache, and test.
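The files located in the /etc/httpd/conf.d directory don't need individual
entries in the httpd.conf file. The default Red Hat and Fedora httpd.conf
loads them with a single Include conf.d/*.conf statement, so each filename
must end in .conf but can otherwise be anything you like.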
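A per-site file for this directory could be generated as sketched below. The helper name write_site_conf, the site name, and the document root are illustrative assumptions. The sketch also assumes the stock Include conf.d/*.conf statement is present in httpd.conf and that a matching NameVirtualHost * statement remains in the main file.

```shell
# Sketch: create one configuration file per Web site in conf.d.
# write_site_conf is a hypothetical helper; adjust names and paths to suit.
write_site_conf() {
    site="$1"
    docroot="$2"
    conf_dir="${3:-/etc/httpd/conf.d}"

    # One <VirtualHost> and one <Directory> container for this site only.
    cat > "$conf_dir/$site.conf" <<EOF
<VirtualHost *>
    ServerName $site
    DocumentRoot $docroot
</VirtualHost>

<Directory $docroot>
    AllowOverride AuthConfig
</Directory>
EOF
}
```

For example, write_site_conf www.my-site.com /home/www/my-site would create /etc/httpd/conf.d/www.my-site.com.conf. Restart Apache afterwards so that the new file is read.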
Troubleshooting Apache
Troubleshooting a basic Apache configuration is fairly
straightforward; you'll find errors in the /var/log/httpd/error_log
file during normal operation or displayed on the screen when Apache starts up.
Most of the errors you'll encounter will probably be related to incorrect
syntax in the <VirtualHost> statements caused by typing errors.
Testing Basic HTTP Connectivity
The very first step is to determine whether your web server is
accessible on TCP port 80 (HTTP).
Lack of connectivity could be caused by a firewall with
incorrect permit, NAT, or port forwarding rules to your Web server. Other
sources of failure include Apache not being started at all, the server being
down, or network-related failures.
If you can connect on port 80 but no pages are being served,
then the problem is usually due to a bad Web application, not the Web server
software itself.
It is best to test this from both inside your network and from
the Internet. Troubleshooting with TELNET is covered in Chapter 4, "Simple Network Troubleshooting."
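As a quick alternative to the TELNET test, a port check can be scripted. The sketch below does not use TELNET. It relies on the /dev/tcp pseudo-device of the bash shell and on the timeout command from GNU coreutils, both of which are assumed to be installed. The function name port_open is hypothetical.

```shell
# Sketch: report whether a TCP port accepts connections.
# Returns 0 if the connection succeeds, non-zero if it is refused or
# if no answer arrives within three seconds.
port_open() {
    host="$1"
    port="${2:-80}"

    # bash opens a TCP connection when /dev/tcp/HOST/PORT is redirected.
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

A test such as "if port_open www.my-site.com 80; then echo open; else echo closed; fi" can then be run from both inside and outside your network. A closed result points to the firewall, NAT, or Apache startup problems listed above.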
Browser 403 Forbidden Messages
Browser 403 Forbidden messages are usually caused by file
permissions and security context issues. Please refer to the "General
Configuration Steps" section for further details.
A sure sign of problems related to security context is the presence of
"avc: denied" messages in your /var/log/messages log file.
Nov 21 20:41:23 bigboy kernel: audit(1101098483.897:0): avc: denied { getattr } for pid=1377
exe=/usr/sbin/httpd path=/home/www/index.html dev=hda5 ino=12
scontext=root:system_r:httpd_t tcontext=root:object_r:home_root_t tclass=file
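To pick such messages out of a busy log, a simple filter is usually enough. The sketch below assumes the messages look like the sample above. The function name httpd_avc_denials is hypothetical.

```shell
# Sketch: list "avc: denied" messages that mention httpd.
httpd_avc_denials() {
    log_file="${1:-/var/log/messages}"

    # First keep the denial messages, then only those involving httpd.
    grep 'avc: *denied' "$log_file" | grep 'httpd'
}
```

Each line returned names the file in its path= field and the file's current security context in its tcontext= field, which is the information you need when correcting the context.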
Only The Default Apache Page Appears
When only the default Apache page appears, there are two main
causes. The first is the lack of an index.html file in your Web site's DocumentRoot directory. The second cause
is usually related to an incorrect security context for the Web page's file.
Please refer to the "General Configuration Steps" section for further
details.
Incompatible /etc/httpd/conf/httpd.conf Files When Upgrading
Your old configuration files will be incompatible when
upgrading from Apache version 1.3 to Apache 2.X. The new version 2.X default
configuration file is stored in /etc/httpd/conf/httpd.conf.rpmnew.
For the simple virtual hosting example above, it would be easiest to:
1. Save the old httpd.conf file with another name, httpd.conf-version-1.x
for example. Copy the ServerName, NameVirtualHost, and VirtualHost containers
from the old file and place them in the new httpd.conf.rpmnew file.
2. Copy the httpd.conf.rpmnew file and name it httpd.conf.
3. Restart Apache.
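The file handling in these steps can be sketched as two small helpers. The function names are hypothetical, and the configuration directory is a parameter only so that the commands can be rehearsed on copies first. Merging your directives into httpd.conf.rpmnew remains a manual edit between the two calls.

```shell
# Sketch of step 1 (first half): keep the version 1.x file under a new name.
save_old_conf() {
    conf_dir="${1:-/etc/httpd/conf}"
    cp -p "$conf_dir/httpd.conf" "$conf_dir/httpd.conf-version-1.x"
}

# Sketch of step 2: make the edited version 2.x file the live configuration.
# Run this only after your site directives have been copied into
# httpd.conf.rpmnew by hand.
activate_new_conf() {
    conf_dir="${1:-/etc/httpd/conf}"
    cp -p "$conf_dir/httpd.conf.rpmnew" "$conf_dir/httpd.conf"
}
```

After activate_new_conf, restart Apache with service httpd restart as described at the start of the chapter.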
Server Name Errors
All ServerName
directives must list a domain that is resolvable in DNS, or else you'll get an
error similar to these when starting httpd.
Starting httpd: httpd: Could not determine the server's fully
qualified domain name, using 127.0.0.1 for ServerName
Starting httpd: [Wed Feb 04 21:18:16 2004] [error]
(EAI 2)Name or service not known: Failed to resolve server name for
192.16.1.100 (check
DNS) -- or specify an explicit ServerName
You can avoid this by adding a default generic ServerName directive at the top of the
httpd.conf file that references localhost instead of the default
new.host.name:80.
#ServerName new.host.name:80
ServerName localhost
The Apache Status Log Files
The /var/log/httpd/access_log file is updated after every
HTTP query and is a good source of general purpose information about your
website. There is a fixed formatting style with each entry being separated by
spaces or quotation marks. Table 20-3 lists the layout.
Field Number | Description | Separator
1 | IP Address of the remote web surfer | Spaces
2 | Time Stamp | Square Brackets []
3 | HTTP query including the web page served | Quotes ""
4 | HTTP result code | Spaces
5 | The amount of data in bytes sent to the remote web browser | Spaces
6 | The web page that contained the link to the page served. | Quotes ""
7 | The version of the web browser used to get the page | Quotes ""
From the sample entry that follows, you can determine that someone at IP
address 67.119.25.115 on February 15 looked at the web page /dns-static.htm,
which returned a successful 200 status code. The amount of data sent was
15190 bytes and the surfer got to the site by clicking on the link
http://www.linuxhomenetworking.com/sendmail.htm using Microsoft Internet
Explorer version 5.5.
67.119.25.115 - - [15/Feb/2003:23:06:51 -0800] "GET /dns-static.htm HTTP/1.1" 200 15190 "http://www.linuxhomenetworking.com/sendmail.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; AT&T CSM6.0; YComp 5.0.2.6)"
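The field layout can be illustrated by splitting an entry with awk. This is a sketch under one assumption: that the quoted fields never contain an embedded quotation mark. The function name parse_access_line is hypothetical.

```shell
# Sketch: break one access_log entry into labeled fields.
# Reads entries on standard input.
parse_access_line() {
    awk -F'"' '{
        # $1 holds the unquoted start of the entry: IP address and time stamp.
        split($1, pre, " ")
        # $3 holds the unquoted middle: result code and byte count.
        split($3, res, " ")
        printf "ip=%s\nrequest=%s\nstatus=%s\nbytes=%s\nreferer=%s\nagent=%s\n",
               pre[1], $2, res[1], res[2], $4, $6
    }'
}

# The sample entry from this chapter.
sample_line='67.119.25.115 - - [15/Feb/2003:23:06:51 -0800] "GET /dns-static.htm HTTP/1.1" 200 15190 "http://www.linuxhomenetworking.com/sendmail.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; AT&T CSM6.0; YComp 5.0.2.6)"'
printf '%s\n' "$sample_line" | parse_access_line
```

Splitting on the quotation marks first keeps the spaces inside the query and browser fields from being mistaken for separators.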
The HTTP status code can provide some insight into the types of
operations surfers are attempting and may help to isolate problems with
your pages rather than with the operation of Apache itself. For example, 404
errors are generated when someone tries to access a web page that doesn't
exist anymore. This could be caused by incorrect URL links in other pages on
your site. Table 20-4 has some of the more common examples.
HTTP Code | Description
200 | Successful request
304 | Successful request, but the web page requested hasn't been modified since the current version in the remote web browser's cache. This means the web page will not be sent to the remote browser; it will just use its cached version instead. Frequently occurs when a surfer is browsing back and forth on a site.
401 | Unauthorized access. Someone entered an incorrect username / password on a password protected page.
403 | Forbidden. File permissions prevent Apache from reading the file. Often occurs when the web page file is owned by user "root" even though it has universal read access.
404 | Not found. Page requested doesn't exist.
500 | Internal server error. Frequently generated by CGI scripts that fail due to bad syntax. Check your error_log file for further details on the script's error message.
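A tally of result codes gives a quick picture of how a site is behaving. The sketch below assumes the entry layout described in this section, with no embedded quotation marks in the quoted fields. The function name status_counts is hypothetical.

```shell
# Sketch: count how many times each HTTP result code appears in the log.
status_counts() {
    log_file="${1:-/var/log/httpd/access_log}"

    # The result code is the first word after the quoted HTTP query.
    awk -F'"' '{ split($3, res, " "); count[res[1]]++ }
               END { for (code in count) print code, count[code] }' "$log_file" |
        sort
}
```

A sudden rise in 404 or 500 counts is a hint to look for broken links or failing CGI scripts.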
The Apache Error Log Files
The /var/log/httpd/error_log
file is a good source for error information. Unlike the /var/log/httpd/access_log file, there is no standardized
formatting.
Typical errors that you'll find here are HTTP queries for files
that don't exist or forbidden requests for directory listings. The file will
also include Apache startup errors which can be very useful.
The /var/log/httpd/error_log
file also is the location where CGI script errors are written. Many
times CGI scripts fail with a blank screen on your browser; the /var/log/httpd/error_log file most
likely has the cause of the problem.
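When a CGI script or a startup fails, the most recent entries are usually the ones that matter. The sketch below assumes that error entries carry a bracketed "[error]" tag, as in the startup message quoted in the Server Name Errors section. Because the file has no standardized format, some entries may not match. The function name recent_errors is hypothetical.

```shell
# Sketch: show the most recent entries tagged [error] in the error log.
recent_errors() {
    log_file="${1:-/var/log/httpd/error_log}"
    count="${2:-20}"

    # -F treats the square brackets literally instead of as a pattern.
    grep -F '[error]' "$log_file" | tail -n "$count"
}
```

Running recent_errors with no arguments right after reproducing the blank screen should show the script's error message among the last lines.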
Conclusion
Web sites both personal and commercial can be very rewarding
exercises as they share your interests with the world and allow you to meet new
people with whom to develop friendships or transact business.
Unfortunately, even the best Web sites can be impersonal as
they frequently only provide information that the designer expects the visitor
to need. E-mail, although ancient in comparison to newer personalized interactive
Internet technologies, such as IP telephony and instant messaging, has the
advantage of being able to relay documents and other information without
interrupting the addressee. This allows them to schedule a response when they
are better prepared to answer, a valuable quality when replies need to be
complex.
Chapter 21, "Configuring Linux Mail Servers,"
explains how to configure a Linux e-mail server to reduce spam and provide
personalized addresses across multiple domains. No Web site should be without
one.