Thoughts on the design of HURL-NG: (written in 1998-2000)

All URIs must be bookmarkable; exceptions to this should be clearly marked. (eventually in a machine-readable way)

URIs shouldn't expose implementation internals: use msgids in the URIs instead of arbitrary numbers like hypermail, so archives can be moved without breaking links when one installation disappears.

It should be easy to set up large numbers of separately-named archives, like 100-200 of them. (hwg, w3c lists.) Maybe http://impressive.net/archives should be the single CGI script whose first argument is the name of the archive? (then it uses that arg to load the appropriate DB/indexes)? Would be nice if people could add archives without having to tweak their httpd.conf's to add extra ScriptAliases or whatever.

URIs for a typical archive: have a certain number of predefined names like 'search', 'status'; otherwise assume it's a msgid and try to look it up.

  http://impressive.net/archives/fogo/            nice overview a la egroups
  http://impressive.net/archives/fogo/1999/12     monthly overview pages
  http://impressive.net/archives/fogo/search      search form
  http://impressive.net/archives/fogo/status      archive status
  http://impressive.net/archives/fogo/search?subject=cmgi
  http://impressive.net/archives/fogo/search?subject=foo+bar&range=1-50
  http://impressive.net/archives/fogo/search?subject=foo+bar&range=51-100
  http://impressive.net/archives/fogo/20000214012338.E18072@impressive.net

Don't have "next" and "previous" links on the message pages any more. (well, maybe. But use real cookies this time instead of URL munging.)

Adding articles to the archive:

  add_articles: takes a list of filenames on stdin, invoked from crontab or procmailrc or

    find . -type f -print | ./add_articles

Support MH format archives first, then mboxes, then gzipped mboxes (?), eventually http access to mboxes (?)
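The URI rule above (a few predefined names, otherwise treat it as a msgid) could be sketched roughly like this. This is only an illustration, not HURL-NG code: classify_path() and the action names it returns are made up here, and it assumes the single CGI script receives the part of the URI after /archives as a path string (e.g. from PATH_INFO), with any query string handled separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Predefined names from the notes; anything else is assumed to be a msgid.
my %RESERVED = map { $_ => 1 } qw(search status);

# classify_path() is a hypothetical helper. It takes the path that follows
# the script name (e.g. "/fogo/1999/12") and returns a three-element list:
# (archive name, action, argument).
sub classify_path {
    my ($path) = @_;
    $path =~ s{^/}{};                               # drop the leading slash
    my ($archive, $rest) = split m{/}, $path, 2;    # first argument = archive name

    # No archive named at all: nothing to dispatch on.
    return (undef, 'list-archives', undef)
        unless defined $archive && length $archive;

    # "/fogo/" or "/fogo": the overview page.
    return ($archive, 'overview', undef)
        unless defined $rest && length $rest;

    # Predefined names such as 'search' and 'status'.
    return ($archive, $rest, undef) if $RESERVED{$rest};

    # yyyy/mm: monthly overview page.
    return ($archive, 'month', "$1/$2") if $rest =~ m{^(\d{4})/(\d{2})/?$};

    # Otherwise assume it's a msgid; the caller tries to look it up.
    return ($archive, 'message', $rest);
}

for my $p ('/fogo/', '/fogo/1999/12', '/fogo/search',
           '/fogo/20000214012338.E18072@impressive.net') {
    my ($archive, $action, $arg) = classify_path($p);
    printf "%-50s => %s %s\n", $p, $action, defined $arg ? $arg : '';
}
```

Checking the predefined names before falling back to the msgid lookup means a msgid that happened to be exactly "search" or "status" could never be reached; real msgids contain an "@", so that seems like an acceptable trade.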
Internals:

Each new article gets assigned a unique ID when it is added to the archive, and is then referenced internally by that ID instead of the msgid. Msgids are stored in a table which points to these IDs; IDs are never exposed to the outside world.

Log variable amounts of stuff, configurable with $loglevel

For msgids mentioned in the body of other messages (in <foo@bar.com>, joe wrote:), don't precalculate and store which of them are valid msgids; do that on the fly when displaying the message. (shouldn't be too expensive, just a couple extra hashed lookups.)

Keep track of various things in variables in the DB, so I can display a nice "status of the archive" page:

  $DB{id}: numeric sequence id for messages
  $DB{lock}: pid of current process (?) or just use "lockfile" from procmail distribution instead?
  $DB{date:1998}: number of articles in the archive for 1998
  $DB{date:199811}: number of articles in the archive for 1998/11
  $DB{date:19981101}: number of articles in the archive for 1998/11/01
  $DB{date:earliest}: date of the earliest article in the archive
  $DB{date:latest}: date of the latest (latest meaning having the most recent date, not most-recently-added) article in the archive
  $DB{date:19981101:list}: space-separated list of $DB{id}'s of messages posted on that date
    ($DB{date:199810:list} can be generated from the yyyymmdd:list's, no need to store it separately)
  $DB{num-articles}: number of articles in the archive
  etc.

Should they have a namespace a la $DB{hurl-internal.date:1998}, to avoid clashing with msgids?

Ideas on OO-happy code structure:

  $m = new Message;
  $m->lookup( $msgid );
  $m->parse;
  $m->header('from');     (calls $m->parse if not done already)
  $m->header('subject');
  $m->subject;            (?)
  ...
  $m->body;

Use MJD's Memoize.pm to cache function calls: http://www.plover.com/~mjd/perl/Memoize/
Most of these functions should be Memoizable. This could simplify the design a lot. Might want bits of that cache to be persistent on disk; does Memoize do that?
If not, store frequently-used stuff in the main DB so it can be accessed via hashes.

Templates:

All pages should be customizable using templates. But what language to use? Text::Template? Webmacro? PHP? Or make up my own? Simply linking to external style sheets should handle a lot of the customization that most users will need.

Misc frills:

Make the format of the "search results" pages defined by a variable a la "date:10;from:20;subject:50"; hardcode this for now, make it handle arbitrary formats later?

Also, on the search results page headers, have arrows to widen or narrow each field a la:

      Date <>   Author <>   Subject <>
  or: <Date>   <Author>   <Subject>
  or: < Date >   < Author >   < Subject >
  or: Date -+   Author -+   Subject -+

and if the field is less than 10 chars, widen/narrow it by 1 column; if it's greater than 10 but less than 20, widen/narrow by 2; greater than 20, widen/narrow by 5; greater than 40, widen/narrow by 15? coooool....

At the bottom of search result pages, display: "Next 20" "Next 50" "Next 100" "Next 200"

Article transforms: when displaying an article, apply a list of transforms to each line of text, supplied in config.pl; for the hwg list archives, look for html elements and attributes and link them to the html spec via dtrt, etc.; for a perl newsgroup, look for perl functions and builtins, link to online perl manual; expensive but cool! (here's a sample of the HTML element one)

See also Hypermail vs HURL discussion on www-talk, Dec 1995

$Id: index.html,v 1.22 2009/05/08 06:30:14 gerald Exp $
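For the search-results column idea under "Misc frills", a rough sketch of how the "date:10;from:20;subject:50" format variable and the widen/narrow arrows might fit together. None of this is from HURL itself: parse_format(), widen_step() and resize() are made-up names, and the behaviour at exactly 10 and 20 chars (which the notes leave open) is an assumption, here rounded up to the bigger step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_format() (hypothetical): turn "date:10;from:20;subject:50" into an
# ordered list of [field, width] pairs.
sub parse_format {
    my ($spec) = @_;
    return map { my ($f, $w) = split /:/, $_, 2; [ $f, $w ] } split /;/, $spec;
}

# widen_step() (hypothetical): how many columns one arrow click changes a
# field that is currently $w chars wide, following the thresholds in the notes.
sub widen_step {
    my ($w) = @_;
    return 15 if $w > 40;
    return 5  if $w >= 20;   # the notes say "greater than 20"; exactly 20 is an assumption
    return 2  if $w >= 10;   # exactly 10 is an assumption too
    return 1;
}

# resize() (hypothetical): return a new format string with $field widened
# ($direction = +1) or narrowed ($direction = -1) by one step.
sub resize {
    my ($spec, $field, $direction) = @_;
    my @out;
    for my $col (parse_format($spec)) {
        my ($f, $w) = @$col;
        if ($f eq $field) {
            $w += $direction * widen_step($w);
            $w = 1 if $w < 1;            # never shrink a column away entirely
        }
        push @out, join(':', $f, $w);
    }
    return join ';', @out;
}

my $spec = 'date:10;from:20;subject:50';
print resize($spec, 'subject', +1), "\n";   # widen the subject column
print resize($spec, 'date',    -1), "\n";   # narrow the date column
```

Each arrow in the page header could then just be a link back to the same search with the resized format string as an extra parameter, which keeps the result pages bookmarkable.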