Getting Started with CGI Programming in C (reposted, translated)

This is an introduction to writing CGI programs in the C language. The reader is assumed to know the basics of C as well as how to write simple forms in HTML and to be able to install CGI scripts on a Web server. The principles are illustrated with very simple examples.

Two important warnings:
  • To avoid wasting your time, please check--from applicable local documents or by contacting the local webmaster--whether you can install and run CGI scripts written in C on the server. At the same time, please check how to do that in detail--specifically, where you need to put your CGI scripts.
  • This document was written to illustrate the idea of CGI scripting to C programmers. In practice, CGI programs are usually written in other languages, such as Perl, and for good reasons: except for very simple cases, CGI programming in C is clumsy and error-prone.

Why CGI programming?

As my document How to write HTML forms briefly explains, you need a server-side script in order to use HTML forms reliably. Typically there are simple server-side scripts available for simple, common ways of processing form submissions, such as sending the data in text format by E-mail to a specified address.

But for more advanced processing, such as collecting data into a file or database, or retrieving information and sending it back, or doing some calculations with the submitted data, you will probably need to write a server-side script of your own.

CGI is simply an interface between HTML forms and server-side scripts. It is not the only possibility--see the excellent tutorial How the web works: HTTP and CGI explained by Lars Marius Garshol for both an introduction to the concepts of CGI and notes on other possibilities. But CGI is widely used and usable.

If someone suggests using JavaScript as an alternative to CGI, ask him to read my JavaScript and HTML: possibilities and caveats. Briefly, JavaScript is inherently unreliable, at least when it is not "backed up" with server-side scripting.

So what is CGI programming?

The above-mentioned How the web works: HTTP and CGI explained is a great tutorial. There are some shorter introductions like Introduction to the Common Gateway Interface (CGI) in the Virtualville Library. The following introduction of mine is just another attempt to present the basics; please consult other sources if you get confused or need more information.

Let us consider the following simple HTML form:

<FORM ACTION="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi">
<P>Please specify the multiplicands:
<INPUT NAME="m" SIZE="5">
<INPUT NAME="n" SIZE="5"><BR>
<INPUT TYPE="SUBMIT" VALUE="Multiply!">
</FORM>

In a browser, the form shows the prompt "Please specify the multiplicands:" followed by two input fields and a Multiply! button.

If the server is not running and accessible when you try the form, here is what you would get as the result:

Multiplication results

The product of 4 and 9 is 36.

What we discuss here is how it works.

Assume that you type 4 into one input field and 9 into another and then invoke submission (typically, by clicking on a submit button). Your browser will then send, by HTTP, a request to the server at www.cs.tut.fi. The browser picks up this server name from the value of the ACTION attribute, where it occurs as the host name part of a URL. (Quite often the ACTION attribute refers, usually through a relative URL, to a script on the same server as the document resides on, but this is not necessary, as this example shows.)

When sending the request, the browser provides additional information, specifying a relative URL, in this case
/cgi-bin/run/~jkorpela/mult.cgi?m=4&n=9
This was constructed from that part of the ACTION value which follows the host name, by appending a question mark ? and the form data in a specifically encoded format.
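The construction described above can be sketched in C. This is only an illustration of the resulting format for two numeric fields, not what a browser actually runs; the function name build_request_path is made up for this example, and values other than plain numbers would also need URL encoding, which is omitted here.

```c
#include <stdio.h>

/* Hypothetical helper: append "?m=<m>&n=<n>" to the path part of the
   ACTION value. Returns 1 on success, 0 if buf is too small. */
int build_request_path(char *buf, size_t size,
                       const char *path, long m, long n)
{
    int written = snprintf(buf, size, "%s?m=%ld&n=%ld", path, m, n);
    return written >= 0 && (size_t)written < size;
}
```

Called with the path /cgi-bin/run/~jkorpela/mult.cgi and the values 4 and 9, it fills the buffer with the relative URL shown above.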

The server to which the request was sent (in this case, www.cs.tut.fi) will then process it according to its own rules. Typically, the server's configuration defines how the relative URLs are mapped to file names and which directories/folders are interpreted as containing CGI scripts. As you may guess, the part cgi-bin/ in the URL causes such interpretation in this case. This means that instead of just picking up and sending back (to the browser which sent the request) an HTML document or some other file, the server invokes a script or a program specified in the URL (mult.cgi in this case) and passes some data to it (the data m=4&n=9 in this case).

It depends on the server how this really happens. In this particular case, the server actually runs the (executable) program in the file mult.cgi in the subdirectory cgi-bin of user jkorpela's home directory. It could be something quite different, depending on server configuration.

The often-mystified abbreviation CGI, for Common Gateway Interface, refers just to a convention on how the invocation and parameter passing take place in detail. Invocation means different things in different cases. For a Perl script, the server would invoke a Perl interpreter and make it execute the script in an interpretive manner. For an executable program, which has typically been produced by a compiler and a loader from a source program in a language like C, it would just be started as a separate process. Although the word script typically suggests that the code is interpreted, the term CGI script refers both to such scripts and to executable programs. See the answer to the question Is it a script or a program? in CGI Programming FAQ by Nick Kew.

You need to compile and load your C program on the server (or, in principle, on a system with the same architecture, so that binaries produced for it are executable on the server too).

And you need to put the executable into a suitable directory and name it according to server-specific conventions. For example, if the server runs some flavor of Unix and has the GNU C compiler available, you would typically use a compilation command like gcc -o mult.cgi mult.c and then move (mv) the resulting executable mult.cgi to a directory with a name like cgi-bin. But you really need to check local instructions for such issues.

The filename extension .cgi has no fixed meaning in general. But there can be server-dependent (and operating system dependent) rules for naming executable files. Typical extensions for executables are .cgi and .exe.

How to process a simple form

For forms which use METHOD="GET" (as our simple example above uses, since this is the default), CGI specifications say that the data is passed to the script or program in an environment variable called QUERY_STRING.

It depends on the scripting or programming language used how a program can access the value of an environment variable. In the C language, you would use the library function getenv (declared in the standard header stdlib.h) to access the value as a string. You might then use various techniques to pick up data from the string, convert parts of it to numeric values, etc.
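One such technique for picking up data from the string is sketched below: a small function that finds a named field in a query string and copies its raw value. The name get_field and its interface are assumptions made for this example; in a CGI program the query argument would typically be the result of getenv("QUERY_STRING"), and the copied value is still in URL-encoded form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the raw (still URL-encoded) value of field
   `name` from a query string such as "m=4&n=9" into out.
   Returns 1 if the field was found and fits, 0 otherwise. */
int get_field(const char *query, const char *name, char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    const char *p = query;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '&');   /* end of this name=value pair */
        size_t pairlen = end ? (size_t)(end - p) : strlen(p);

        if (pairlen > namelen && strncmp(p, name, namelen) == 0
            && p[namelen] == '=') {
            size_t vallen = pairlen - namelen - 1;
            if (vallen >= outsize)
                return 0;                   /* value too long for out */
            memcpy(out, p + namelen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;           /* move on to the next pair */
    }
    return 0;
}
```

Unlike a fixed sscanf pattern, this does not depend on the order in which the fields appear in the string.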

The output from the script or program to the "primary output stream" (such as stdout in the C language) is handled in a special way. Effectively, it is directed so that it gets sent back to the browser. Thus, by writing a C program that writes an HTML document onto its standard output, you will make that document appear on the user's screen as a response to the form submission.

In this case, the source program in C is the following:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *data;
    long m, n;

    /* HTTP header, followed by the empty line that ends the headers */
    printf("%s%c%c\n",
           "Content-Type:text/html;charset=iso-8859-1", 13, 10);
    printf("<TITLE>Multiplication results</TITLE>\n");
    printf("<H3>Multiplication results</H3>\n");
    data = getenv("QUERY_STRING");   /* form data, e.g. "m=4&n=9" */
    if (data == NULL)
        printf("<P>Error! Error in passing data from form to script.");
    else if (sscanf(data, "m=%ld&n=%ld", &m, &n) != 2)
        printf("<P>Error! Invalid data. Data must be numeric.");
    else
        printf("<P>The product of %ld and %ld is %ld.", m, n, m*n);
    return 0;
}

As a disciplined programmer, you have probably noticed that the program makes no check against integer overflow, so it will return bogus results for very large operands. In real life, such checks would be needed, but such considerations would take us too far from our topic.
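For readers who want to see what such a check could look like, here is a minimal sketch. The function name checked_mul is chosen for this example; it tests the operands against LONG_MAX and LONG_MIN before multiplying, so the multiplication itself never overflows.

```c
#include <limits.h>

/* Store a*b in *result and return 1, or return 0 (leaving *result
   untouched) if the product would not fit in a long. */
int checked_mul(long a, long b, long *result)
{
    if (a > 0) {
        if (b > 0) { if (a > LONG_MAX / b) return 0; }
        else       { if (b < LONG_MIN / a) return 0; }
    } else {
        if (b > 0) { if (a < LONG_MIN / b) return 0; }
        else       { if (a != 0 && b < LONG_MAX / a) return 0; }
    }
    *result = a * b;
    return 1;
}
```

The program could call this instead of computing m*n directly and print an error message when it returns 0.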

Note: The first printf function call prints out data which will be sent by the server as an HTTP header. This is required for several reasons, including the fact that a CGI script can send any data (such as an image or a plain text file) to the browser, not just HTML documents. For HTML documents, you can just use the printf function call above as such; however, if your character encoding is different from ISO 8859-1 (ISO Latin 1), which is the most common on the Web, you need to replace iso-8859-1 by the registered name of the encoding ("charset") you use.
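If a script needs to send different media types or encodings, the header can be built from parameters. The following is a sketch under that assumption; format_content_type is a name invented for this example, and it writes the header line followed by the empty line that ends the header section.

```c
#include <stdio.h>

/* Hypothetical helper: format a Content-Type header followed by the
   empty line that ends the header section.
   Returns 1 on success, 0 if buf is too small. */
int format_content_type(char *buf, size_t size,
                        const char *media_type, const char *charset)
{
    int n = snprintf(buf, size, "Content-Type: %s; charset=%s\r\n\r\n",
                     media_type, charset);
    return n >= 0 && (size_t)n < size;
}
```

The resulting string would be written to standard output, for instance with fputs(buf, stdout), before any other output.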

I have compiled this program and saved the executable program under the name mult.cgi in my directory for CGI scripts at www.cs.tut.fi. This implies that any form with action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi" will, when submitted, be processed by that program.

As a consequence, anyone could write a form of his own with the same ACTION attribute and pass whatever data he likes to my program. Therefore, the program needs to be able to handle any data. Generally, you need to check the data before starting to process it.
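As an illustration of stricter checking, note that a sscanf pattern like "m=%ld&n=%ld" accepts trailing garbage after the second number and does not detect values that are too large for a long. The sketch below uses strtol instead and accepts exactly "m=<integer>&n=<integer>"; the function names are made up for this example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a decimal long that follows the literal `prefix` at *p and
   advance *p past it. Returns 1 on success, 0 on any mismatch or
   out-of-range value. */
static int parse_after(const char **p, const char *prefix, long *value)
{
    size_t plen = strlen(prefix);
    const char *s;
    char *end;

    if (strncmp(*p, prefix, plen) != 0)
        return 0;
    s = *p + plen;
    /* strtol would skip spaces and accept '+', so check the first char */
    if (!(*s == '-' || (*s >= '0' && *s <= '9')))
        return 0;
    errno = 0;
    *value = strtol(s, &end, 10);
    if (end == s || errno == ERANGE)
        return 0;
    *p = end;
    return 1;
}

/* Accept exactly "m=<integer>&n=<integer>" and nothing else. */
int parse_mult_query(const char *query, long *m, long *n)
{
    const char *p = query;
    return query != NULL
        && parse_after(&p, "m=", m)
        && parse_after(&p, "&n=", n)
        && *p == '\0';
}
```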

How to process a form with METHOD="POST"

Let us consider next a different processing for form data. Assume that we wish to write a form which takes a line of text as input so that the form data is sent to a CGI script which appends the data to a text file on the server. (That text file could be readable by the author of the form and the script only, or it could be made readable to the world through another script.)

It might seem that the problem is similar to the example considered above; one would just need a different form and a different script (program). But in fact, there is a difference. The example above can be regarded as a "pure query" which does not change the "state of the world", and in particular it is "idempotent", i.e. the same form data could be submitted as many times as you like without causing any problems (except minor waste of resources). But our current task needs to cause such changes--a change in the content of a file which is intended to be more or less permanent. Therefore, one should use METHOD="POST". This is explained in more detail in the document Methods GET and POST in HTML forms - what's the difference? Here we will take it for granted that METHOD="POST" needs to be used and consider the technical consequences.

For forms which use METHOD="POST", CGI specifications say that the data is passed to the script or program in the standard input stream (stdin), and the length (in bytes, i.e. characters) of the data is passed in an environment variable called CONTENT_LENGTH.
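Since anyone can invoke a script with any method, a script that expects POST may want to verify this before reading its input. The CGI specification passes the request method in the environment variable REQUEST_METHOD; the small function below, whose name is chosen for this example, is a sketch of such a check.

```c
#include <string.h>

/* Return 1 if the request method (as found in the REQUEST_METHOD
   environment variable) is exactly "POST"; 0 otherwise or if missing. */
int is_post_request(const char *method)
{
    return method != NULL && strcmp(method, "POST") == 0;
}
```

A script would call it as is_post_request(getenv("REQUEST_METHOD")) and report an error if the result is 0.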

Reading from standard input probably sounds simpler than reading from an environment variable, but there are complications. The server is not required to pass the data so that the CGI script gets an end of file indication when it tries to read more data than there is! That is, if you read e.g. using the getchar function in a C program, it is undefined what happens after reading all the data characters; it is not guaranteed that the function will return EOF.

When reading the input, the program must not try to read more than CONTENT_LENGTH characters.
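One way to respect that limit is to ask fread for exactly the announced number of bytes. The following is a minimal sketch; the function name read_body and its interface are assumptions for this example, and the content_length argument is expected to have been parsed from CONTENT_LENGTH beforehand.

```c
#include <stdio.h>

/* Read exactly content_length bytes from `in` into buf and NUL-terminate.
   Returns the number of bytes read, or -1 if the length is negative,
   does not fit in buf, or the stream ends early. */
long read_body(FILE *in, char *buf, size_t bufsize, long content_length)
{
    size_t want, got;

    if (content_length < 0 || (size_t)content_length >= bufsize)
        return -1;
    want = (size_t)content_length;
    got = fread(buf, 1, want, in);   /* never asks for more than `want` */
    if (got != want)
        return -1;
    buf[got] = '\0';
    return (long)got;
}
```

In a CGI program the stream argument would be stdin.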

A relatively simple C program for accepting input via CGI and METHOD="POST" is the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLEN 80
#define EXTRA 5
/* 4 for field name "data", 1 for "=" */
#define MAXINPUT (MAXLEN+EXTRA+2)
/* 1 for added line break, 1 for trailing NUL */
#define DATAFILE "../data/data.txt"

/* Decode the URL-encoded text in [src, last) into dest and append a line break. */
void unencode(char *src, char *last, char *dest)
{
    for (; src < last; src++, dest++)
        if (*src == '+')
            *dest = ' ';
        else if (*src == '%') {
            unsigned int code;
            /* need two hex digits within the data; otherwise store '?' */
            if (last - src < 3 || sscanf(src+1, "%2x", &code) != 1)
                code = '?';
            *dest = (char)code;
            src += 2;
        }
        else
            *dest = *src;
    *dest = '\n';
    *++dest = '\0';
}

int main(void)
{
    char *lenstr;
    char input[MAXINPUT], data[MAXINPUT];
    long len;

    printf("%s%c%c\n",
           "Content-Type:text/html;charset=iso-8859-1", 13, 10);
    printf("<TITLE>Response</TITLE>\n");
    lenstr = getenv("CONTENT_LENGTH");
    /* the length must cover at least "data=" and at most MAXLEN characters */
    if (lenstr == NULL || sscanf(lenstr, "%ld", &len) != 1
        || len < EXTRA || len > MAXLEN)
        printf("<P>Error in invocation - wrong FORM probably.");
    else if (fgets(input, (int)len + 1, stdin) == NULL)
        printf("<P>Error in reading the form data.");
    else {
        FILE *f;
        /* use the length actually read, which may be less than len */
        unencode(input+EXTRA, input+strlen(input), data);
        f = fopen(DATAFILE, "a");
        if (f == NULL)
            printf("<P>Sorry, cannot store your data.");
        else {
            fputs(data, f);
            fclose(f);
            /* note: data is echoed as is; HTML markup in it is not escaped */
            printf("<P>Thank you! The following contribution of yours has "
                   "been stored:<BR>%s", data);
        }
    }
    return 0;
}

Essentially, the program retrieves the information about the number of characters in the input from the value of the CONTENT_LENGTH environment variable. Then it unencodes (decodes) the data, since the data arrives in a specifically encoded format. The program has been written for a form where the text input field has the name data (actually, just the length of the name matters here). For example, if the user types
Hello there!
then the data will be passed to the program encoded as data=Hello+there%21
(with space encoded as + and exclamation mark encoded as %21). The unencode routine in the program converts this back to the original format. After that, the data is appended to a file (with a fixed file name), as well as echoed back to the user.
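The decoding step can also be written as a standalone function that works in place and checks both hexadecimal digits explicitly. This is a sketch, not the routine used in the program above; the names hex_value and url_decode are chosen for this example, and leaving a malformed % sequence unchanged is one possible policy among several.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode URL-encoded text in place: '+' becomes a space and %XX becomes
   the byte with hexadecimal code XX. A malformed % sequence is copied
   unchanged. Returns the decoded length. */
size_t url_decode(char *s)
{
    char *src = s, *dest = s;

    while (*src != '\0') {
        if (*src == '+') {
            *dest++ = ' ';
            src++;
        } else if (*src == '%' && hex_value((unsigned char)src[1]) >= 0
                               && hex_value((unsigned char)src[2]) >= 0) {
            *dest++ = (char)(hex_value((unsigned char)src[1]) * 16
                             + hex_value((unsigned char)src[2]));
            src += 3;
        } else {
            *dest++ = *src++;
        }
    }
    *dest = '\0';
    return (size_t)(dest - s);
}
```

Applied to a buffer holding Hello+there%21, it leaves Hello there! in the same buffer.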

Having compiled the program, I have saved it as collect.cgi into the directory for CGI scripts. Now a form like the following can be used for data submissions:

<FORM ACTION="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/collect.cgi"
METHOD="POST">
<P>Please type your input (80 chars max.):<BR>
<INPUT NAME="data" SIZE="60" MAXLENGTH="80"><BR>
<INPUT TYPE="SUBMIT" VALUE="Send">
</FORM>

Finally, we can write a simple program for viewing the data; it only needs to copy the content of a given text file onto standard output:

#include <stdio.h>
#include <stdlib.h>
#define DATAFILE "../data/data.txt"

int main(void)
{
    FILE *f = fopen(DATAFILE, "r");
    int ch;

    if (f == NULL) {
        /* report the failure as an HTML page */
        printf("%s%c%c\n",
               "Content-Type:text/html;charset=iso-8859-1", 13, 10);
        printf("<TITLE>Failure</TITLE>\n");
        printf("<P><EM>Unable to open data file, sorry!</EM>");
    }
    else {
        /* send the file content as plain text */
        printf("%s%c%c\n",
               "Content-Type:text/plain;charset=iso-8859-1", 13, 10);
        while ((ch = getc(f)) != EOF)
            putchar(ch);
        fclose(f);
    }
    return 0;
}

Notice that this program prints (when successful) the data as plain text, preceded by a header which says this, i.e. has text/plain instead of text/html.
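Sending the file as text/plain means that the stored text is shown literally. If submitted data is instead embedded in an HTML page, characters such as < and & would first need to be replaced by character references, since otherwise they would be interpreted as markup. The following is a minimal sketch of such a replacement; html_escape is a name chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dest, replacing &, <, > and " with character references.
   Returns 1 on success, 0 if dest (of destsize bytes) is too small. */
int html_escape(char *dest, size_t destsize, const char *src)
{
    size_t used = 0;

    for (; *src != '\0'; src++) {
        const char *rep;
        char one[2] = { *src, '\0' };   /* the character itself, as a string */
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = one;      break;
        }
        len = strlen(rep);
        if (used + len >= destsize)     /* keep room for the trailing NUL */
            return 0;
        memcpy(dest + used, rep, len);
        used += len;
    }
    if (used >= destsize)
        return 0;
    dest[used] = '\0';
    return 1;
}
```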

A form which invokes that program can be very simple, since no input data is needed:

<FORM ACTION="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/viewdata.cgi">
<P><INPUT TYPE="SUBMIT" VALUE="View">
</FORM>

Finally, here's what the two forms look like. You can now test them:

Form for submitting data

Please notice that anything you submit here will become visible to the world.

Form for checking submitted data

The content of the text file in which the submissions are stored will be displayed as plain text.


You may now wish to read The CGI specification which tells you all the basic details about CGI. The next step is probably to see what the CGI Programming FAQ contains.

There is a lot of material, including introductions and tutorials, in the CGI Resource Index. Notice in particular the section Programs and Scripts: C and C++: Libraries and Classes which contains libraries which can make it easier to process form data. It can be instructive to parse a simple data format by using code of your own, as was done in the simple examples above, but in practical applications a library routine might be better.


Date of last update: 2001-04-15. Technical fix 2002-11-26.

Posted on 2006-06-26 08:41 by soochow_hhb. Category: Reading

