Mod_proxy, Mod_cache, and Mod_proxy_html can’t save this site.
Recently, I was experimenting with mod_cache, mod_proxy, and mod_proxy_html in a reverse proxy configuration. A friend had a site and was interested in breaking off pieces of it for his own maintenance, which seemed like another great reason to use these modules and show how one can stitch together parts of a site from various external sites. Everything worked as planned except for the top page, and it wasn't until I tried nginx that I saw the following code, which I had not run into before. Look at the onload attribute on the body tag.
Hopefully this was done on purpose, because for the life of me I can't understand why you would send a page to the user's browser only to ask it to load another page. Why not do the redirect up front on the server and save the user the latency before the first page displays? Anyway, here is the top page of the offending site as stored in my nginx cache, along with a reminder to myself that mod_proxy_html won't find that redirect by default, which breaks the reverse proxy.
HTTP/1.1 200 OK
Date: Sun, 23 Jan 2011 23:56:35 GMT
Server: Apache
Last-Modified: Mon, 08 Dec 2008 05:49:09 GMT
ETag: "7b605d-368-493cb555"
Accept-Ranges: bytes
Content-Length: 872
Connection: close
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Genealowiki.com</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
</head>
<body bgcolor="#ffffff" onload="self.location='http://www.genealowiki.com/bin/view.cgi'">
<!--
<h1>Welcome to TWiki</h1>
<ul>
<li> <a href="readme.txt">readme.txt</a> </li>
<li> <a href="license.txt">license.txt</a> </li>
<li> <a href="TWikiDocumentation.html">TWikiDocumentation.html</a> </li>
<li> <a href="TWikiHistory.html">TWikiHistory.html</a> </li>
</ul>
<p><b>Note:</b> These pages do not need to be accessible by http. Access <tt>http://your.server.com/your-cgi-bin/view/Main/WebHome</tt> after you install TWiki</p>
-->
</body>
</html>
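To see why a link rewriter misses this, here is a short Python sketch (my own illustration, not how mod_proxy_html is implemented) that walks a page's tags with the standard library's html.parser and reports every attribute mentioning the origin site, tagging each one as an ordinary link attribute or an event handler. As far as I can tell from its documentation, mod_proxy_html with its default ProxyHTMLExtended Off setting rewrites URLs in link attributes but not in scripting events such as onload. The LINK_ATTRS set and the find_origin_urls and OriginURLFinder names are hypothetical; the point is only that the redirect target lives inside a JavaScript string, not in an href or src.

```python
from html.parser import HTMLParser

# Backend taken from the ProxyPass / proxy_pass lines in this post.
ORIGIN = "http://www.genealowiki.com/"
# Assumption: the attributes a link-only rewriter would look at.
LINK_ATTRS = {"href", "src", "action"}


class OriginURLFinder(HTMLParser):
    """Record every tag attribute whose value mentions the origin site."""

    def __init__(self, origin):
        super().__init__()
        self.origin = origin
        self.hits = []  # (tag, attribute, kind, value) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value or self.origin not in value:
                continue
            if name in LINK_ATTRS:
                kind = "link"   # a plain URL a link rewriter can map
            elif name.startswith("on"):
                kind = "event"  # JavaScript code that merely contains the URL
            else:
                kind = "other"
            self.hits.append((tag, name, kind, value))


def find_origin_urls(html, origin=ORIGIN):
    """Return the attributes in `html` that point at `origin`."""
    finder = OriginURLFinder(origin)
    finder.feed(html)
    finder.close()
    return finder.hits


page = """<body bgcolor="#ffffff" onload="self.location='http://www.genealowiki.com/bin/view.cgi'">
<!-- <a href="readme.txt">readme.txt</a> -->
</body>"""
for hit in find_origin_urls(page):
    print(hit)
```

Fed the HTML part of the cached page, it should report a single hit, the body tag's onload handler, because the only href links on that page are relative and sit inside an HTML comment anyway.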
The Apache configuration looks like this:
# disk cache
CacheEnable disk /
CacheRoot /nfsdisk/clients/proxy/morris.medicine-hat.net
CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
CacheIgnoreQueryString On
CacheLastModifiedFactor 0.5
CacheStoreNoStore On
CacheStorePrivate On
CacheDirLevels 1
CacheDirLength 3
# 60*60 = 3600 = 1hr (when neither an expiry date nor Last-Modified is sent)
CacheDefaultExpire 3600
# at most 1 day to cache a document before requesting a new copy
CacheMaxExpire 86400
#CacheIgnoreHeaders Set-Cookie
CacheIgnoreHeaders Expires Set-Cookie If-Modified-Since

# memory cache (2048 KB = 2M)
# 2K (2048 bytes) is the maximum object size we allow in the memory cache
<IfModule mod_mem_cache.c>
    CacheEnable mem /graphics
    MCacheSize 2048
    MCacheMaxObjectCount 100
    MCacheMinObjectSize 1
    MCacheMaxObjectSize 2048
</IfModule>

ProxyRequests off
<Proxy *>
    Order deny,allow
    Allow from all
</Proxy>

<Location />
    ProxyPass http://www.genealowiki.com/
    ProxyPassReverse http://www.genealowiki.com/
    ProxyPassReverseCookieDomain www.genealowiki.com morris.medicine-hat.net
    ProxyPassReverseCookiePath / /
    # rewrite any absolute pathnames if found
    ProxyHTMLEnable On
    # ProxyHTMLURLMap from-pattern to-pattern [flags] [cond]
</Location>
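Because the origin sends Last-Modified but no Expires header, mod_cache has to guess how long to keep this page. As I read the mod_cache documentation, the guess is (Date minus Last-Modified) times CacheLastModifiedFactor, capped at CacheMaxExpire, with CacheDefaultExpire used when there is no Last-Modified at all. The Python sketch below applies that rule to the values in the configuration above; heuristic_lifetime is a hypothetical helper, and it deliberately ignores Cache-Control and the other checks the real module performs.

```python
from email.utils import parsedate_to_datetime

# Values copied from the Apache configuration above.
CACHE_DEFAULT_EXPIRE = 3600       # CacheDefaultExpire, seconds
CACHE_MAX_EXPIRE = 86400          # CacheMaxExpire, seconds
CACHE_LAST_MODIFIED_FACTOR = 0.5  # CacheLastModifiedFactor


def heuristic_lifetime(headers):
    """Estimate how long mod_cache keeps a response that has no Expires header.

    Sketch of the rule as I understand it: (Date - Last-Modified) * factor,
    capped at CacheMaxExpire; CacheDefaultExpire when Last-Modified is
    missing or not earlier than Date. Cache-Control is ignored here.
    """
    if "Expires" in headers:
        raise ValueError("this sketch only covers responses without Expires")
    if "Last-Modified" not in headers:
        return CACHE_DEFAULT_EXPIRE
    date = parsedate_to_datetime(headers["Date"])
    last_modified = parsedate_to_datetime(headers["Last-Modified"])
    age = (date - last_modified).total_seconds()
    if age <= 0:
        return CACHE_DEFAULT_EXPIRE
    return min(int(age * CACHE_LAST_MODIFIED_FACTOR), CACHE_MAX_EXPIRE)


# Headers from the cached top page.
top_page = {
    "Date": "Sun, 23 Jan 2011 23:56:35 GMT",
    "Last-Modified": "Mon, 08 Dec 2008 05:49:09 GMT",
}
print(heuristic_lifetime(top_page))  # 86400: the one-day cap applies
```

For the top page (Date in January 2011, Last-Modified in December 2008), half the age is more than a year, so the CacheMaxExpire cap wins and the page would be kept for one day.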
The nginx configuration is a little simpler:
proxy_cache_path /usr/share/nginx/cache levels=1:2 keys_zone=STATIC:10m inactive=24h max_size=1m;

server {
    listen 80;
    server_name morris.medicine-hat.net;
    access_log /nfsdisk/clients/morris.medicine-hat.net/logs/host.access.log main;

    # Main location
    location / {
        proxy_pass http://www.genealowiki.com;
        proxy_set_header Host www.genealowiki.com;
        proxy_cache STATIC;
        proxy_cache_valid 200 1d;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
    }
}
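The levels=1:2 parameter decides where nginx puts the cache file I pulled the top page out of. As the nginx documentation describes it, the file name is the MD5 hash of the cache key, and with levels=1:2 the first directory is the last hex digit of that hash and the second is the two digits before it. The Python sketch below reproduces that layout. It assumes the default proxy_cache_key of $scheme$proxy_host$request_uri, since the configuration above does not set one, and the function names are hypothetical.

```python
import hashlib
import posixpath

# Values taken from the proxy_cache_path line above.
CACHE_ROOT = "/usr/share/nginx/cache"
LEVELS = (1, 2)  # levels=1:2


def cache_key(scheme, proxy_host, request_uri):
    """Build a key the way the default proxy_cache_key would (assumption)."""
    # Default: proxy_cache_key $scheme$proxy_host$request_uri;
    return f"{scheme}{proxy_host}{request_uri}"


def cache_file_for_digest(digest, root=CACHE_ROOT, levels=LEVELS):
    """Map an MD5 hex digest to its on-disk path under the given levels."""
    parts = []
    end = len(digest)
    for width in levels:
        # Each level takes the next `width` hex digits, working back from the end.
        parts.append(digest[end - width:end])
        end -= width
    return posixpath.join(root, *parts, digest)


def cache_file(key, root=CACHE_ROOT, levels=LEVELS):
    """Hash a cache key with MD5 and return the file nginx would use."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_file_for_digest(digest, root, levels)


print(cache_file(cache_key("http", "www.genealowiki.com", "/")))
```

The printed path should be where the cached top page lives, assuming $proxy_host expands to www.genealowiki.com for this proxy_pass line and the request came in over plain http.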