Reverse Proxy Pain

mod_proxy, mod_cache, and mod_proxy_html can’t save this site.

Recently, I was experimenting with mod_cache, mod_proxy, and mod_proxy_html in a reverse proxy configuration. A friend had a site and was interested in breaking off pieces of it to maintain himself. This seemed like another great reason to use these modules and to show how one can stitch a site together from parts hosted on various external servers. Everything worked as planned except for the top page, and it wasn’t until I tried nginx that I spotted the following code, which I had not run into before. See line 17 of the response below: the onload attribute on the body tag.

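Hopefully this was done on purpose, because for the life of me I can’t understand why you would send a page to the user’s browser only to ask it to load another page. Why not do the redirect up front on the server (an HTTP 301 or 302, whose Location header ProxyPassReverse would happily rewrite) and save the user the latency of that first page display? The catch for a reverse proxy is that this URL is absolute and sits inside a JavaScript event handler, so the browser follows it straight to www.genealowiki.com and leaves the proxy behind. Anyway, here is the offending page as stored in my nginx cache, and a reminder to myself that mod_proxy_html won’t find that URL by default (it only rewrites scripting events such as onload when ProxyHTMLExtended is On), which breaks the reverse proxy.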

HTTP/1.1 200 OK
Date: Sun, 23 Jan 2011 23:56:35 GMT
Server: Apache
Last-Modified: Mon, 08 Dec 2008 05:49:09 GMT
ETag: "7b605d-368-493cb555"
Accept-Ranges: bytes
Content-Length: 872
Connection: close
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
<head>
 <title>Genealowiki.com</title>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
</head>
<body bgcolor="#ffffff" onload="self.location='http://www.genealowiki.com/bin/view.cgi'">
<!--
<h1>Welcome to TWiki</h1>
 <ul>
  <li> <a href="readme.txt">readme.txt</a> </li>
  <li> <a href="license.txt">license.txt</a> </li>
  <li> <a href="TWikiDocumentation.html">TWikiDocumentation.html</a> </li>
  <li> <a href="TWikiHistory.html">TWikiHistory.html</a> </li>
 </ul>
 <p><b>Note:</b> These pages do not need to be accessible by http. Access
 <tt>http://your.server.com/your-cgi-bin/view/Main/WebHome</tt> after you install TWiki</p>
-->
</body>
</html>
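
To see why a link-only rewriter misses this page, here is a small Python sketch using the standard library's html.parser. It sorts attributes that contain the upstream URL into ordinary link attributes (href, src, action) and event-handler attributes (anything starting with "on"). That split is my own rough approximation of what mod_proxy_html looks at by default, not its actual code, and rewrite_event is a hypothetical helper standing in for what ProxyHTMLExtended On plus a suitable ProxyHTMLURLMap rule would do.

```python
from html.parser import HTMLParser

# Upstream site and proxy front end from this post.
ORIGIN = "http://www.genealowiki.com/"
PROXY = "http://morris.medicine-hat.net/"

# Assumption: a rough approximation of the attributes a link-only rewriter
# inspects; this is NOT mod_proxy_html's real attribute table.
LINK_ATTRS = {"href", "src", "action"}


class OriginRefFinder(HTMLParser):
    """Collect attributes that still point at the upstream origin."""

    def __init__(self):
        super().__init__()
        self.link_refs = []   # (tag, attr, value) a link rewriter would fix
        self.event_refs = []  # (tag, attr, value) hidden in event handlers

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None or ORIGIN not in value:
                continue
            if name in LINK_ATTRS:
                self.link_refs.append((tag, name, value))
            elif name.startswith("on"):
                self.event_refs.append((tag, name, value))


def rewrite_event(value):
    """Hypothetical helper: map the origin URL onto the proxy inside a handler."""
    return value.replace(ORIGIN, PROXY)


page = """<body bgcolor="#ffffff" onload="self.location='http://www.genealowiki.com/bin/view.cgi'">
<!-- <a href="readme.txt">readme.txt</a> -->
</body>"""

finder = OriginRefFinder()
finder.feed(page)
finder.close()
print(finder.link_refs)  # prints []: nothing for a link-only rewriter to fix
for tag, attr, value in finder.event_refs:
    # prints: body onload self.location='http://morris.medicine-hat.net/bin/view.cgi'
    print(tag, attr, rewrite_event(value))
```

The only reference to the origin lives in the onload handler, which is exactly the case a default, links-only pass skips.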

The Apache configuration looks like this:

    # disk cache
    CacheEnable disk /
    CacheRoot /nfsdisk/clients/proxy/morris.medicine-hat.net
    CacheIgnoreCacheControl On
    CacheIgnoreNoLastMod On
    CacheIgnoreQueryString On
    CacheLastModifiedFactor 0.5
    CacheStoreNoStore On
    CacheStorePrivate On
    CacheDirLevels 1
    CacheDirLength 3
    # 60*60 = 3600 = 1hr (when no expire specified)
    CacheDefaultExpire 3600
    # 1 day to cache a document before requesting new document
    CacheMaxExpire 86400
    #CacheIgnoreHeaders Set-Cookie
    CacheIgnoreHeaders Expires Set-Cookie If-Modified-Since


    # memory cache: MCacheSize is in KB, so 2048 KB = 2 MB in total
    # MCacheMaxObjectSize is in bytes, so 2 KB is the largest object we allow in
    <IfModule mod_mem_cache.c>
       CacheEnable mem /graphics
       MCacheSize 2048
       MCacheMaxObjectCount 100
       MCacheMinObjectSize 1
       MCacheMaxObjectSize 2048
    </IfModule>

    ProxyRequests off
    <Proxy *>
       Order deny,allow
       Allow from all
    </Proxy>

    <Location />
        ProxyPass   http://www.genealowiki.com/
        ProxyPassReverse  http://www.genealowiki.com/
        ProxyPassReverseCookieDomain www.genealowiki.com morris.medicine-hat.net
        ProxyPassReverseCookiePath / /

        # rewrite any absolute pathnames if found
        ProxyHTMLEnable On
        # ProxyHTMLURLMap from-pattern to-pattern [flags] [cond]
    </Location>
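
CacheLastModifiedFactor, CacheDefaultExpire and CacheMaxExpire interact in a way that is easy to misread. As I understand the mod_cache documentation, a response with no explicit expiry but with a Last-Modified header stays fresh for (Date minus Last-Modified) times the factor, capped at CacheMaxExpire; with neither header, CacheDefaultExpire applies. The Python sketch below approximates that rule (the function name is my own, and it ignores Cache-Control and the other directives above) and feeds it the headers from the response shown earlier.

```python
from email.utils import parsedate_to_datetime

# Values from the Apache config above, in seconds.
CACHE_LAST_MODIFIED_FACTOR = 0.5  # CacheLastModifiedFactor
CACHE_DEFAULT_EXPIRE = 3600       # CacheDefaultExpire
CACHE_MAX_EXPIRE = 86400          # CacheMaxExpire


def heuristic_lifetime(date_hdr, last_modified_hdr=None,
                       factor=CACHE_LAST_MODIFIED_FACTOR,
                       default=CACHE_DEFAULT_EXPIRE,
                       maximum=CACHE_MAX_EXPIRE):
    """Approximate seconds of freshness for a response with no expiry info."""
    if last_modified_hdr is None:
        # No Last-Modified: fall back to the default lifetime.
        return default
    date = parsedate_to_datetime(date_hdr)
    last_mod = parsedate_to_datetime(last_modified_hdr)
    age = (date - last_mod).total_seconds()
    if age <= 0:
        return default
    # Scale the document's age by the factor, then clamp to the maximum.
    return min(int(age * factor), maximum)


print(heuristic_lifetime("Sun, 23 Jan 2011 23:56:35 GMT",
                         "Mon, 08 Dec 2008 05:49:09 GMT"))  # 86400
```

For this page the uncapped lifetime would be about 388 days, since the file was last modified in December 2008, so CacheMaxExpire clamps it to one day.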

The nginx configuration is a little simpler:

    proxy_cache_path  /usr/share/nginx/cache  levels=1:2 
                    keys_zone=STATIC:10m inactive=24h  max_size=1m;
    server {
        listen       80;
        server_name  morris.medicine-hat.net;

        access_log  /nfsdisk/clients/morris.medicine-hat.net/logs/host.access.log  main;

        # Main location
        location / {
            proxy_pass             http://www.genealowiki.com;
            proxy_set_header       Host www.genealowiki.com;

            proxy_cache            STATIC;
            proxy_cache_valid      200  1d;
            proxy_cache_use_stale  error timeout invalid_header updating
                                   http_500 http_502 http_503 http_504;
        }

    }
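
Digging the page out of the nginx cache is how the onload trick turned up, so it helps to know where nginx puts things. Per the nginx documentation, a cache file is named after the MD5 of the cache key, and with levels=1:2 the two directory levels come from the last character of that hex digest and the two characters before it. The default proxy_cache_key is $scheme$proxy_host$request_uri, concatenated with no separators. The Python sketch below (helper names are mine, purely illustrative) computes the file to open for the top page, assuming the default key and the proxy_cache_path above.

```python
import hashlib
import posixpath


def proxy_cache_key(scheme, proxy_host, request_uri):
    """Assumed default key: $scheme$proxy_host$request_uri, no separators."""
    return scheme + proxy_host + request_uri


def cache_path_for_digest(cache_root, digest, levels=(1, 2)):
    """Build the cache file path from an MD5 hex digest and the levels setting."""
    parts = []
    end = len(digest)
    for width in levels:
        # Each level takes the next `width` characters, working back from the end.
        parts.append(digest[end - width:end])
        end -= width
    return posixpath.join(cache_root, *parts, digest)


def cache_file_path(cache_root, key, levels=(1, 2)):
    """Hash the cache key and locate its file under cache_root."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_path_for_digest(cache_root, digest, levels)


key = proxy_cache_key("http", "www.genealowiki.com", "/")
print(key)  # httpwww.genealowiki.com/
print(cache_file_path("/usr/share/nginx/cache", key))
```

If you set a custom proxy_cache_key, pass that key string to cache_file_path instead.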