From gerald@impressive.net Fri Jan 15 00:23:10 1999
From: gerald@impressive.net (Gerald Oskoboiny)
Message-Id:
Newsgroups: comp.infosystems.www.servers.unix,comp.infosystems.www.misc
Subject: Archiving http proxy cache?
Organization: impressive.net
Reply-To: gerald@impressive.net

I've been archiving my incoming and outgoing e-mail for the past 6 years
or so, and now that disk space is basically free I'd like to do the same
for my personal HTTP traffic. Does anyone have ideas on what
software/configuration to use for something like this?

I installed Squid and gave it a big cache to fill, but it doesn't quite
do what I want:

 - it stores HTTP response headers and other metadata inside the cached
   files (so the files are no longer valid GIF or HTML files on their
   own because there's extra stuff at the top); this data should be
   stored externally, IMO.

 - it doesn't keep previous revisions of documents, only the one that
   was most recently fetched (hmm, I could probably fix this just by
   replacing the unlinkd program with one that does nothing.)

 - its cache storage scheme makes sense for a general proxy cache
   system, but for archiving I'd prefer a directory/file structure
   more like:

       $cache_root/1999/01/15/http/www.w3.org/foo.html

Any ideas? Would I be better off using Apache or Jigsaw for this?
(as a basis for hacking/customization, I mean; I doubt that there's
anything that does exactly what I want as-is.)

It would probably be easiest for me to just write a Perl script that
does what I want and install that as the root document of a
locally-running Apache httpd, but that would probably slow things
down too much.

(My environment is Redhat Linux 5.1 with kernel 2.0.34 on a P133 :(
with 64M RAM and plenty of disk.)

-- 
Gerald Oskoboiny                       http://impressive.net/people/gerald/
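[A minimal sketch of the date-based naming scheme described above, in
Python rather than Perl; the `archive_path` function, its parameters,
and the idea of keeping headers in a separate sidecar file are my own
assumptions, not anything Squid or Apache provides as-is:]

```python
from datetime import date
from urllib.parse import urlparse
import posixpath

def archive_path(cache_root, url, fetch_date):
    """Map a URL and fetch date to a path like
    $cache_root/1999/01/15/http/www.w3.org/foo.html.

    Because the path includes the fetch date, refetching a changed
    document on a later day naturally keeps the previous revision.
    Response headers would go in a sidecar file (e.g. foo.html.hdr,
    a hypothetical convention) so the body stays a valid GIF/HTML
    file on its own.
    """
    parts = urlparse(url)
    rel = parts.path.lstrip("/") or "index.html"
    return posixpath.join(
        cache_root,
        fetch_date.strftime("%Y/%m/%d"),  # date-based directories
        parts.scheme,                     # e.g. "http"
        parts.netloc,                     # e.g. "www.w3.org"
        rel,                              # e.g. "foo.html"
    )

print(archive_path("/cache", "http://www.w3.org/foo.html", date(1999, 1, 15)))
# /cache/1999/01/15/http/www.w3.org/foo.html
```

[Query strings, ports, and URLs ending in "/" would need extra escaping
rules before this could be used for real traffic.]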