
Chapter 17

CGI scripts

WARNING: CGI scripts without appropriate safeguards can compromise your site's security. The scripts presented here are simple examples and are not assured to be secure for actual Web use.

CGI scripts [4] are scripts that reside on a web server and can be run by a client (browser). The client accesses a CGI script by its URL, just as they would a regular page. The server, recognizing that the URL requested is a CGI script, runs it. How the server recognizes certain URLs as scripts is up to the server administrator. For the purposes of this text, we will assume that they are stored in a distinguished directory called cgi-bin. Thus, the script testcgi.scm on the server www.foo.org would be accessed as http://www.foo.org/cgi-bin/testcgi.scm.

The server runs the CGI script as the user nobody, who cannot be expected to have any PATH knowledge (which is highly subjective anyway). Therefore the introductory magic line for a CGI script written in Scheme needs to be a bit more explicit than the one we used for ordinary Scheme scripts. E.g., the line

":";exec mzscheme -r $0 "$@"

implicitly assumes that there is a particular shell (bash, say), and that there is a PATH, and that mzscheme is in it. For CGI scripts, we will need to be more expansive:

#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

This gives fully qualified pathnames for the shell and the Scheme executable. The transfer of control from shell to Scheme proceeds as for regular scripts.

17.1  Example: Displaying environment variables

Here is an example Scheme CGI script, testcgi.scm, that outputs the settings of some commonly used CGI environment variables. This information is returned as a new, freshly created, page to the browser. The returned page is simply whatever the CGI script writes to its standard output. This is how CGI scripts talk back to whoever called them -- by giving them a new page.

Note that the script first outputs the line

content-type: text/plain

followed by a blank line. This is standard ritual for a web server serving up a page. These two lines aren't part of what is actually displayed as the page. They are there to inform the browser that the page being sent is plain (i.e., un-marked-up) text, so the browser can display it appropriately. If we were producing text marked up in HTML, the content-type would be text/html.

The script testcgi.scm:

#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

;Identify content-type as plain text.

(display "content-type: text/plain") (newline)
(newline)

;Generate a page with the requested info.  This is
;done by simply writing to standard output.

(for-each
 (lambda (env-var)
   (display env-var)
   (display " = ")
   (display (or (getenv env-var) ""))
   (newline))
 '("AUTH_TYPE"
   "CONTENT_LENGTH"
   "CONTENT_TYPE"
   "DOCUMENT_ROOT"
   "GATEWAY_INTERFACE"
   "HTTP_ACCEPT"
   "HTTP_REFERER" ; [sic]
   "HTTP_USER_AGENT"
   "PATH_INFO"
   "PATH_TRANSLATED"
   "QUERY_STRING"
   "REMOTE_ADDR"
   "REMOTE_HOST"
   "REMOTE_IDENT"
   "REMOTE_USER"
   "REQUEST_METHOD"
   "SCRIPT_NAME"
   "SERVER_NAME"
   "SERVER_PORT"
   "SERVER_PROTOCOL"
   "SERVER_SOFTWARE"))

testcgi.scm can be called directly by opening it on a browser. The URL is:

http://www.foo.org/cgi-bin/testcgi.scm

Alternately, testcgi.scm can occur as a link in an HTML file, which you can click. E.g.,

... To view some common CGI environment variables, click
<a href="http://www.foo.org/cgi-bin/testcgi.scm">here</a>.
...

However testcgi.scm is launched, it will produce a plain text page containing the settings of the environment variables. An example output:

AUTH_TYPE =
CONTENT_LENGTH =
CONTENT_TYPE =
DOCUMENT_ROOT = /home/httpd/html
GATEWAY_INTERFACE = CGI/1.1
HTTP_ACCEPT = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
HTTP_REFERER =
HTTP_USER_AGENT = Mozilla/3.01Gold (X11; I; Linux 2.0.32 i586)
PATH_INFO =
PATH_TRANSLATED =
QUERY_STRING =
REMOTE_ADDR = 127.0.0.1
REMOTE_HOST = 127.0.0.1
REMOTE_IDENT =
REMOTE_USER =
REQUEST_METHOD = GET
SCRIPT_NAME = /cgi-bin/testcgi.scm
SERVER_NAME = localhost.localdomain
SERVER_PORT = 80
SERVER_PROTOCOL = HTTP/1.0
SERVER_SOFTWARE = Apache/1.2.4

17.2  Example: Displaying a selected environment variable

testcgi.scm does not take any input from the user. A more focused script would take an argument environment variable from the user, and output the setting of that variable and none else. For this, we need a mechanism for feeding arguments to CGI scripts. The form tag of HTML provides this capability. Here is a sample HTML page for this purpose:

<html>
<head>
<title>Form for checking environment variables</title>
</head>
<body>

<form method=get action="http://www.foo.org/cgi-bin/testcgi2.scm">
Enter environment variable: <input type=text name=envvar size=30>
<p>

<input type=submit>
</form>

</body>
</html>

The user enters the desired environment variable (e.g., GATEWAY_INTERFACE) in the textbox and clicks the submit button. This causes all the information in the form -- here, the setting of the parameter envvar to the value GATEWAY_INTERFACE -- to be collected and sent to the CGI script identified by the form, viz., testcgi2.scm. The information can be sent in one of two ways: (1) if the form's method=get (the default), the information is sent via the environment variable called QUERY_STRING; (2) if the form's method=post, the information is available to the CGI script at the latter's standard input port (stdin). Our form uses QUERY_STRING.

It is testcgi2.scm's responsibility to extract the information from QUERY_STRING, and output the answer page accordingly.

The information to the CGI script, whether arriving via an environment variable or through stdin, is formatted as a sequence of parameter/argument pairs. The pairs are separated from each other by the & character. Within a pair, the parameter occurs first and is separated from the argument by the = character. In this case, there is only one parameter/argument pair, viz., envvar=GATEWAY_INTERFACE.

The script testcgi2.scm:

#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

(display "content-type: text/plain") (newline)
(newline)

;string-index returns the leftmost index in string s
;that has character c

(define string-index
  (lambda (s c)
    (let ((n (string-length s)))
      (let loop ((i 0))
        (cond ((>= i n) #f)
              ((char=? (string-ref s i) c) i)
              (else (loop (+ i 1))))))))

;split breaks string s into substrings separated by character c

(define split
  (lambda (c s)
    (let loop ((s s))
      (if (string=? s "") '()
          (let ((i (string-index s c)))
            (if i (cons (substring s 0 i)
                        (loop (substring s (+ i 1)
                                         (string-length s))))
                (list s)))))))

(define args
  (map (lambda (par-arg)
         (split #\= par-arg))
       (split #\& (getenv "QUERY_STRING"))))

(define envvar (cadr (assoc "envvar" args)))

(display envvar)
(display " = ")
(display (getenv envvar))

(newline)

Note the use of a helper procedure split to split the QUERY_STRING into parameter/argument pairs along the & character, and then splitting parameter and argument along the = character. (If we had used the post method rather than get, we would have needed to extract the parameters and arguments from the standard input.)
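
As written, testcgi2.scm assumes that the form always supplies envvar together with a value. If QUERY_STRING is empty, or contains envvar with nothing after the = character, the expression (cadr (assoc "envvar" args)) signals an error instead of producing a page. A minimal sketch of a more forgiving lookup follows. The name lookup-arg and the sample data are hypothetical and not part of the original script; the sample list only imitates the shape that args has after splitting.

```scheme
;; Hypothetical helper, not part of the original script.
;; args is a list of lists such as (("envvar" "GATEWAY_INTERFACE")),
;; the shape produced by mapping split over the parameter/argument pairs.
(define lookup-arg
  (lambda (name args default)
    (let ((entry (assoc name args)))
      ;; entry is #f when the parameter is absent, and a one-element
      ;; list when the parameter arrived without a value.
      (if (and entry (pair? (cdr entry)))
          (cadr entry)
          default))))

;; Sample data standing in for a parsed QUERY_STRING.
(define sample-args '(("envvar" "GATEWAY_INTERFACE") ("flag")))

(display (lookup-arg "envvar" sample-args ""))
(newline)
(display (lookup-arg "missing" sample-args "(not supplied)"))
(newline)
```

With such a helper, the script could fall back to a default or print an explanatory message when the parameter is missing, rather than failing partway through the page.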

The <input type=text> and <input type=submit> are but two of the many different input tags possible in an HTML form. Consult [4] for the full repertoire.

17.3  CGI script utilities

In the example above, neither the parameter's name nor the argument it assumed contained any `&' or `=' characters. In general, they may. To accommodate such characters, and not have them be mistaken for separators, the CGI argument-passing mechanism treats all characters other than letters, digits, and the underscore, as special, and transmits them in an encoded form. A space is encoded as a `+'. For other special characters, the encoding is a three-character sequence, and consists of `%' followed by the special character's hexadecimal code. Thus, the character sequence `20% + 30% = 50%, &c.' will be encoded as

20%25+%2b+30%25+%3d+50%25%2c+%26c%2e

(Space becomes `+'; `%' becomes `%25'; `+' becomes `%2b'; `=' becomes `%3d'; `,' becomes `%2c'; `&' becomes `%26'; and `.' becomes `%2e'.)
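
The encoding is normally done by the browser, but it can help to see the rule written out. The following is a minimal sketch of an encoder that follows the description above. The names url-encode, char->hex and hex-digits are hypothetical, and the sketch assumes the input contains only ASCII characters.

```scheme
;; Hypothetical encoder following the rule described in the text.
;; Assumes ASCII input: character codes must be below 256.
(define hex-digits "0123456789abcdef")

;; Produce the three-character sequence %xy for character c.
(define char->hex
  (lambda (c)
    (let ((n (char->integer c)))
      (string #\%
              (string-ref hex-digits (quotient n 16))
              (string-ref hex-digits (remainder n 16))))))

(define url-encode
  (lambda (s)
    (let loop ((cs (string->list s)) (acc '()))
      (if (null? cs)
          (apply string-append (reverse acc))
          (let ((c (car cs)))
            (loop (cdr cs)
                  (cons (cond ((char=? c #\space) "+")
                              ;; letters, digits and underscore pass through
                              ((or (char-alphabetic? c)
                                   (char-numeric? c)
                                   (char=? c #\_))
                               (string c))
                              (else (char->hex c)))
                        acc)))))))

(display (url-encode "20% + 30% = 50%, &c."))
(newline)
```

Applied to the sample string, this produces the encoded sequence shown above.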

Instead of dealing anew with the task of getting and decoding the form data in each CGI script, it is convenient to collect some helpful procedures into a library file cgi.scm. testcgi2.scm can then be written more compactly as

#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

;load the cgi utilities

(load-relative "cgi.scm")

(display "content-type: text/plain") (newline)
(newline)

;read the data input via the form

(parse-form-data)

;get the envvar parameter

(define envvar (form-data-get/1 "envvar"))

;display the value of the envvar

(display envvar)
(display " = ")
(display (getenv envvar))
(newline)

This shorter CGI script uses two utility procedures defined in cgi.scm. parse-form-data reads the data supplied by the user via the form. The data consists of parameters and their associated values. form-data-get/1 finds the value associated with a particular parameter.

cgi.scm defines a global table called *form-data-table* to store form data.

;load our table definitions

(load-relative "table.scm")

;define the *form-data-table*

(define *form-data-table* (make-table 'equ string=?))
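
The file table.scm is not shown in this chapter. For readers who want to experiment without it, the following is a stand-in sketch that provides only the three operations the CGI utilities call: make-table with an 'equ argument, table-get with an optional default, and table-put!. The vector representation and the way the 'equ argument is handled are assumptions made for illustration, not the actual contents of table.scm.

```scheme
;; Stand-in sketch, NOT the real table.scm.
;; A table is a two-element vector: slot 0 holds the equality
;; predicate, slot 1 holds an association list of (key . value) pairs.
(define make-table
  (lambda opts
    ;; opts is expected to look like ('equ pred); default to eqv?
    (let ((equ (if (and (pair? opts)
                        (pair? (cdr opts))
                        (eq? (car opts) 'equ))
                   (cadr opts)
                   eqv?)))
      (vector equ '()))))

;; Return the value stored under k, or the optional default, or #f.
(define table-get
  (lambda (tbl k . d)
    (let ((equ (vector-ref tbl 0)))
      (let loop ((al (vector-ref tbl 1)))
        (cond ((null? al) (if (pair? d) (car d) #f))
              ((equ (caar al) k) (cdar al))
              (else (loop (cdr al))))))))

;; Store v under k, replacing any earlier entry for k.
(define table-put!
  (lambda (tbl k v)
    (let ((equ (vector-ref tbl 0)))
      (vector-set! tbl 1
                   (cons (cons k v)
                         (let loop ((al (vector-ref tbl 1)))
                           (cond ((null? al) '())
                                 ((equ (caar al) k) (cdr al))
                                 (else (cons (car al)
                                             (loop (cdr al)))))))))))

;; Usage mirroring how the form-data table accumulates values.
(define t (make-table 'equ string=?))
(table-put! t "envvar" (cons "HOME" (table-get t "envvar" '())))
(table-put! t "envvar" (cons "PATH" (table-get t "envvar" '())))
(display (table-get t "envvar" '()))
(newline)
```

The association list is rebuilt on each update rather than mutated in place, which keeps the sketch independent of whether the Scheme in use allows pairs to be modified.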

An advantage of using a general mechanism such as the parse-form-data procedure is that we can hide the details of what method (get or post) was used.

(define parse-form-data
  (lambda ()
    ((if (string-ci=? (or (getenv "REQUEST_METHOD") "GET") "GET")
         parse-form-data-using-query-string
         parse-form-data-using-stdin))))

The environment variable REQUEST_METHOD tells which method was used to transmit the form data. If the method is GET, then the form data was sent as the string available via another environment variable, QUERY_STRING. The auxiliary procedure parse-form-data-using-query-string is used to pick apart QUERY_STRING:

(define parse-form-data-using-query-string
  (lambda ()
    (let ((query-string (or (getenv "QUERY_STRING") "")))
      (for-each
       (lambda (par=arg)
         (let ((par/arg (split #\= par=arg)))
           (let ((par (url-decode (car par/arg)))
                 (arg (url-decode (cadr par/arg))))
             (table-put! *form-data-table* par
                         (cons arg (table-get *form-data-table* par '()))))))
       (split #\& query-string)))))

The helper procedure split, and its helper string-index, are defined as in sec. 17.2. As noted, the incoming form data is a sequence of name-value pairs separated by &s. Within each pair, the name comes first, followed by an = character, followed by the value. Each name-value combination is collected into a global table, the *form-data-table*.

Both name and value are encoded, so we need to decode them using the url-decode procedure to get their actual representation.

(define url-decode
  (lambda (s)
    (let ((s (string->list s)))
      (list->string
       (let loop ((s s))
         (if (null? s) '()
             (let ((a (car s)) (d (cdr s)))
               (case a
                 ((#\+) (cons #\space (loop d)))
                 ((#\%) (cons (hex->char (car d) (cadr d)) (loop (cddr d))))
                 (else (cons a (loop d)))))))))))

`+' is converted into a space. A triliteral of the form `%xy' is converted, using the procedure hex->char, into the character whose ASCII encoding is the hex number `xy'.

(define hex->char
  (lambda (x y)
    (integer->char
     (string->number (string x y) 16))))
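
Note that url-decode and hex->char trust their input: a `%' that is not followed by two hexadecimal digits (for example, a string ending in `%', or containing `%zz') leads to an error. Since form data comes from the outside world, a script may prefer to pass such sequences through unchanged. The following is a sketch of a more lenient decoder under that assumption. The names url-decode/lenient and hex-digit-value are hypothetical and are not part of cgi.scm.

```scheme
;; Hypothetical lenient decoder, not part of the original cgi.scm.
;; Return the numeric value of hex digit c, or #f if c is not one.
(define hex-digit-value
  (lambda (c)
    (let ((lc (char-downcase c)))
      (cond ((and (char>=? lc #\0) (char<=? lc #\9))
             (- (char->integer lc) (char->integer #\0)))
            ((and (char>=? lc #\a) (char<=? lc #\f))
             (+ 10 (- (char->integer lc) (char->integer #\a))))
            (else #f)))))

(define url-decode/lenient
  (lambda (s)
    (list->string
     (let loop ((cs (string->list s)))
       (cond ((null? cs) '())
             ((char=? (car cs) #\+)
              (cons #\space (loop (cdr cs))))
             ;; decode %xy only when both x and y are hex digits
             ((and (char=? (car cs) #\%)
                   (pair? (cdr cs))
                   (pair? (cddr cs))
                   (hex-digit-value (cadr cs))
                   (hex-digit-value (caddr cs)))
              (cons (integer->char
                     (+ (* 16 (hex-digit-value (cadr cs)))
                        (hex-digit-value (caddr cs))))
                    (loop (cdddr cs))))
             ;; anything else, including a malformed %, is kept as is
             (else (cons (car cs) (loop (cdr cs)))))))))

(display (url-decode/lenient "20%25+%2b+30%25+%3d+50%25%2c+%26c%2e"))
(newline)
```

Whether to pass malformed sequences through or to reject the request outright is a design choice; the sketch takes the permissive route.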

We still need a form-data parser for the case where the request method is POST. The auxiliary procedure parse-form-data-using-stdin does this.

(define parse-form-data-using-stdin
  (lambda ()
    (let* ((content-length (getenv "CONTENT_LENGTH"))
           (content-length (if content-length
                               (string->number content-length) 0))
           (i 0))
    (let par-loop ((par '()))
      (let ((c (read-char)))
        (set! i (+ i 1))
        (if (or (> i content-length) (eof-object? c) (char=? c #\=))
            (let arg-loop ((arg '()))
              (let ((c (read-char)))
                (set! i (+ i 1))
                (if (or (> i content-length) (eof-object? c) (char=? c #\&))
                    (let ((par (url-decode (list->string (reverse! par))))
                          (arg (url-decode (list->string (reverse! arg)))))
                      (table-put! *form-data-table* par
                                  (cons arg (table-get *form-data-table*
                                                       par '())))
                      (unless (or (> i content-length)
                                  (eof-object? c))
                        (par-loop '())))
                    (arg-loop (cons c arg)))))
            (par-loop (cons c par))))))))

The POST method sends form data via the script's stdin. The number of characters sent is placed in the environment variable CONTENT_LENGTH. parse-form-data-using-stdin reads the required number of characters from stdin, and populates the *form-data-table* as before, making sure to decode the parameters' names and values.

It remains to retrieve the values for specific parameters from the *form-data-table*. Note that the table associates a list with each parameter, in order to accommodate the possibility of multiple values for a parameter. form-data-get retrieves all the values assigned to a parameter. If there is only one value, it returns a singleton containing that value.

(define form-data-get
  (lambda (k)
    (table-get *form-data-table* k '())))

form-data-get/1 returns the first (or most significant) value associated with a parameter.

(define form-data-get/1
  (lambda (k . default)
    (let ((vv (form-data-get k)))
      (cond ((pair? vv) (car vv))
            ((pair? default) (car default))
            (else "")))))

In our examples so far, the CGI script has generated plain text. Generally, though, we will want to generate an HTML page. It is not uncommon for a combination of HTML form and CGI script to trigger a series of HTML pages with forms. It is also common to code all the action corresponding to these various forms in a single CGI script. In any case, it is helpful to have a utility procedure that writes out strings in HTML format, i.e., with the HTML special characters encoded appropriately:

(define display-html
  (lambda (s . o)
    (let ((o (if (null? o) (current-output-port)
                 (car o))))
      (let ((n (string-length s)))
        (let loop ((i 0))
          (unless (>= i n)
            (let ((c (string-ref s i)))
              (display
               (case c
                 ((#\<) "&lt;")
                 ((#\>) "&gt;")
                 ((#\") "&quot;")
                 ((#\&) "&amp;")
                 (else c)) o)
              (loop (+ i 1)))))))))

17.4  A calculator via CGI

Here is a CGI calculator script, cgicalc.scm, that exploits Scheme's arbitrary-precision arithmetic.

#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0

;load the CGI utilities
(load-relative "cgi.scm")

(define uhoh #f)

(define calc-eval
  (lambda (e)
    (if (pair? e)
        (apply (ensure-operator (car e))
               (map calc-eval (cdr e)))
        (ensure-number e))))

(define ensure-operator
  (lambda (e)
    (case e
      ((+) +)
      ((-) -)
      ((*) *)
      ((/) /)
      ((**) expt)
      (else (uhoh "unpermitted operator")))))

(define ensure-number
  (lambda (e)
    (if (number? e) e
        (uhoh "non-number"))))

(define print-form
  (lambda ()
    (display "<form action=\"")
    (display (getenv "SCRIPT_NAME"))
    (display "\">
  Enter arithmetic expression:<br>
  <input type=textarea name=arithexp><p>
  <input type=submit value=\"Evaluate\">
  <input type=reset value=\"Clear\">
</form>"
)))

(define print-page-begin
  (lambda ()
    (display "content-type: text/html

<html>
  <head>
    <title>A Scheme Calculator</title>
  </head>
  <body>"
)))

(define print-page-end
  (lambda ()
    (display "</body>
</html>"
)))

(parse-form-data)

(print-page-begin)

(let ((e (form-data-get "arithexp")))
  (unless (null? e)
    (let ((e1 (car e)))
      (display-html e1)
      (display "<p>
  =&gt;&nbsp;&nbsp;"
)
      (display-html
       (call/cc
        (lambda (k)
          (set! uhoh
                (lambda (s)
                  (k (string-append "Error: " s))))
          (number->string
           (calc-eval (read (open-input-string (car e))))))))
      (display "<p>"))))

(print-form)
(print-page-end)
