HURL: Feature List

This is a poorly-organized list of the features I would like to implement for HURL. It's mostly for my own use, to keep track of what's implemented and what's not, and to make sure ideas are written down somewhere so I don't forget about them.

Each item is either already implemented or not yet implemented. You can also see what's new in this document.

Article Pages

Should be a button to show the original (unformatted) article.

Browsing icons:

Currently these icons are individual inline images, which makes loading them slow (although only the first time with browsers like Mosaic that cache images); would one long icon with image mapping be better? I like the individual icons for two reasons. First, if an icon points to an article you've already seen, you can tell from the border that Mosaic puts on the link. Second, individual icons can be `dimmed' when an article is unavailable--this isn't really useful for next/previous by date or author, but it will be for thread and selection browsing. With image mapping, if you clicked on ``next in thread'' and there was no next article in that thread, you'd get an annoying error message. Image mapping would also impose a higher load on the server machine (although it shouldn't be much higher).

Should be able to `mark' an article, and save a list of marked articles somewhere: either keep the list on the archive (with a size limit) or download it to add to your home page. This could be used to define a selection of articles which could later be viewed with the supplied message-ID list feature.

Also, it would be nice if there was some way to `push' and `pop' message-IDs onto a list stored on the server; maybe a different list than the one you're using for marked articles (or `n' lists, along with a ``select current list'' feature). This would be used to remember your current location when you get distracted and read a bunch of articles that take you away from the ones you had initially wanted to read. This happens all the time: for instance, you enter a query and read some of the returned articles, then see an author you like and start scrolling through his or her posts, then get tired of them and want to resume your position in the query-browsing. Before you digress, you could `push' the current message onto a stack, then click on `pop' later to return to that position. Currently the only way to return to a previous location is to use your browser to `back' out of all the extra messages you've read.
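
A rough sketch of the push/pop idea, just to pin down the behaviour. The class name, the ``default'' list name, and keeping everything in memory are all assumptions for illustration; a real version would have to store the lists on the server per reader.

```python
class LocationStacks:
    """Named stacks of message-IDs for one reader (hypothetical structure)."""

    def __init__(self):
        self.stacks = {"default": []}
        self.current = "default"

    def select(self, name):
        # "select current list": create the list on first use
        self.stacks.setdefault(name, [])
        self.current = name

    def push(self, message_id):
        # remember where we are before digressing
        self.stacks[self.current].append(message_id)

    def pop(self):
        # return the most recently pushed location, or None if the list is empty
        stack = self.stacks[self.current]
        return stack.pop() if stack else None
```

Pushing before a digression and popping afterwards returns the most recent saved position first, so nested digressions unwind in the right order.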

A button to hide or show buttons to extra features (that is, the features that don't normally merit a single click from the article browsing interface).

A button to apply a certain filter to this article (for example, to add links pointing to a dictionary on the Web for each word in the current article.) Currently this and other filters are implemented as individual icons, but this will eventually be changed to a generic ``apply a filter...'' button which brings up a list of 10-20 filters that can be applied.

A button to view the thread's structure.

A button to rot13-decode the current article. (implemented, but not documented anywhere; append ``&filter=rot13'' to a message-ID? query).
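
For reference, rot13 decoding is tiny. This sketch uses Python's standard ``rot_13'' codec and is not the code behind the existing filter; the function name is made up here.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter 13 places; everything else is left alone."""
    return codecs.encode(text, "rot_13")
```

Since rot13 is its own inverse, the same function both encodes and decodes.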

Preferences (customizable features):

Author Pages

Show the person's real name.

Link to their real home page.

Link to submit information about this person (

aliases

Mail should be sent to their most recently-used e-mail address.

If the author has not been active for a defined period of time (a year?), a warning should be issued.

This should be made intelligent enough so that it checks the frequency of use of recently-used e-mail addresses, then sends mail to the one that seems to be used the most often.

Alternatively, this could be something the author can define themselves.
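
One possible way to do the frequency check, as a sketch only. The input format (a list of (date, address) pairs with ISO-style date strings), the function name, and the window of 50 recent posts are assumptions, not anything that exists yet.

```python
from collections import Counter


def best_address(posts, recent=50):
    """Pick the address used most often among the `recent` newest posts.

    posts: list of (date, address) pairs; dates must sort chronologically
    (e.g. "1994-10-21"). Ties go to the more recently used address.
    """
    latest = sorted(posts, key=lambda p: p[0])[-recent:]
    # count newest-first so that max() prefers the more recent address on ties
    counts = Counter(address for _, address in reversed(latest))
    return max(counts, key=counts.get) if counts else None
```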

aliases

Query Page

Should be able to search for a large variety of things, mostly based on information found in the headers of articles. Good lists of these things can be found here and here. Queries I think we should implement:

Searches on specific header lines such as Keywords or Organization (defined at installation, depending on the amount of disk space available for indexes).

Searches for a certain `rating', as registered by our voting software.

Searches within the body of articles should only be allowed after the search has been narrowed to a defined level (e.g., 100 articles), to reduce the load on the server machine. (No longer true; HURL will support full-text searches Real Soon Now.)

Return a random selection of articles from the defined search (e.g., ``give me a random 1% of Gerald's articles'').
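
The random-selection query could look something like this. It is a sketch: the function name and the rule of always returning at least one article from a non-empty result are my assumptions.

```python
import random


def random_fraction(message_ids, fraction, rng=random):
    """Return a random sample holding roughly `fraction` of the given list."""
    if not message_ids:
        return []
    count = max(1, round(len(message_ids) * fraction))
    return rng.sample(message_ids, count)
```

So ``a random 1% of Gerald's articles'' would be random_fraction(geralds_ids, 0.01), where geralds_ids is whatever list the author search returned.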

Most queries should be possible using a regular URL instead of just a form, so people can make a pointer to query the archive directly. For example, people can put ``click here to see a list of the top twenty articles I posted to talk.bizarre, as rated by the voting software'' in their home pages (although ``click here'' is bad document design).
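
To make the idea concrete, here is a sketch of building such a URL. The base address and the parameter names (author, max) are invented for the example; the real query parameters may well differ.

```python
from urllib.parse import urlencode


def query_url(base, **criteria):
    """Build a bookmarkable query URL from keyword search criteria."""
    # sort the criteria so the same query always gives the same URL
    return base + "?" + urlencode(sorted(criteria.items()))
```

For example, query_url("http://archive.example/query", author="gerald", max=20) gives a link that someone could paste into a home page.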

The query page returns a list of message-IDs, to be browsed with the...

Message-ID List Browser

This will be like Mosaic's news interface, which shows 20 articles, with a link at the top that says ``Earlier articles...'' and one at the bottom that says ``Later articles...''

The user should be able to set the number of articles that are displayed on a single page (implemented, but not currently documented anywhere; append ``&max=100'' or whatever to the browse? URL.)
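
The paging itself is simple slicing. In this sketch the function name and the (page, earlier, later) return shape are assumptions; `max_items` stands in for the ``&max='' value.

```python
def page_of(message_ids, start=0, max_items=20):
    """Return one page of IDs plus the start offsets of neighbouring pages.

    `earlier` and `later` are None when there is no such page, so the
    corresponding link can be left out.
    """
    page = message_ids[start:start + max_items]
    earlier = max(0, start - max_items) if start > 0 else None
    later = start + max_items if start + max_items < len(message_ids) else None
    return page, earlier, later
```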

The format of the list should be customizable, with intelligent defaults. For example, if you do a search for all of a certain author's articles, you'd probably like to see a list of Date and Subject. However, the user should be able to specify a certain return format.

The script will point to a temporary file of message-IDs that resides on the server machine, in a directory that gets purged every week or whatever.

The user should be able to supply a URL pointing to a list of message-IDs on their own machine, and have this list formatted and presented by the archive itself. This would be used by people who want to make a pointer to a list of their favorite articles from their own home pages.

Author Aliases

People who have used multiple e-mail addresses should be grouped together for their author pages, queries, etc.

Should make a submission program that lets people submit this kind of information about themselves, which will automatically get included in the next build.

Might be able to build a preliminary database of this kind of stuff by looking at people's names (e.g., gerald@vnet.ibm.com (Gerald Oskoboiny) is most likely the same guy as gerald@amisk.cs.ualberta.ca (Gerald Oskoboiny) for a particular newsgroup).

It would be convenient to assume that someone@machine.network.hostname.com is someone@*.hostname.com, but is that a valid assumption? If not, how about someone@*.network.hostname.com? (This is important because of things like user@netcom11.netcom.com vs. user@netcom8.netcom.com, gerald@amisk.cs.ualberta.ca vs. gerald@gsb008.cs.ualberta.ca, etc.)
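
Here is a sketch of the hostname-wildcard grouping, to make the two options easy to compare. Whether the assumption is valid is still the open question above; the function names and the `keep_labels` parameter are made up for the example.

```python
from collections import defaultdict


def alias_key(address, keep_labels=2):
    """Reduce user@a.b.host.com to (user, last `keep_labels` host labels)."""
    user, _, host = address.lower().partition("@")
    domain = ".".join(host.split(".")[-keep_labels:])
    return (user, domain)


def group_aliases(addresses, keep_labels=2):
    """Group addresses that share a user name and a trailing domain."""
    groups = defaultdict(set)
    for address in addresses:
        groups[alias_key(address, keep_labels)].add(address)
    return dict(groups)
```

With keep_labels=2 this is the someone@*.hostname.com rule; keep_labels=3 gives the more cautious someone@*.network.hostname.com rule.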

Threading

When looking at any article, user should be able to jump to articles in the same thread, as defined by the ``References:'' header lines.

A complex problem: we have some good advice on this from Wayne Davison, graciously forwarded to bizarchive by Pope Clifton.

Should also be able to see a tree of the thread.

Should also have ``next/previous in this thread'' buttons. Someone once mentioned that next/previous might not make sense in complex thread structures, but I think if a tree is traversed depth-first, and each node (?) is sorted by date, it would be a logical way to view a thread.

Could hopefully steal code from trn for this, maybe from other places.
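
In the meantime, the depth-first, date-sorted ordering suggested above can be sketched in a few lines. This is not trn's algorithm. The input format is an assumption: a dict mapping each message-ID to (date, parent), where the parent would be the last entry of the ``References:'' line.

```python
from collections import defaultdict


def thread_order(articles):
    """Return message-IDs depth-first, with siblings sorted by date.

    articles: dict of message_id -> (date, parent_id or None).
    Articles whose parent is not in the archive are treated as roots.
    """
    children = defaultdict(list)
    roots = []
    for mid, (_, parent) in articles.items():
        if parent in articles:
            children[parent].append(mid)
        else:
            roots.append(mid)

    def by_date(mid):
        return articles[mid][0]

    order, seen = [], set()
    stack = sorted(roots, key=by_date, reverse=True)
    while stack:
        mid = stack.pop()
        if mid in seen:  # guard against weird reference loops
            continue
        seen.add(mid)
        order.append(mid)
        # push newest first so the oldest reply is visited next
        stack.extend(sorted(children[mid], key=by_date, reverse=True))
    return order
```

``Next in thread'' and ``previous in thread'' are then just the neighbours of the current article in the returned list.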

Article Page Filters

Should automatically recognize all e-mail addresses and message-ID references throughout the article, and make a link to the relevant place in the archive.

This requires a lot of pre-computing, to check whether each possible article reference is in the archive. However, it ensures that all links will be resolved, since references to articles that are not in the archive won't get a link. Otherwise, we have the ``article not found'' syndrome.

This is particularly important to groups like talk.bizarre, which has a lot of stuff crossposted from other groups (whose original articles will not be in the archive). Possibly this can be an option for the installers: whether or not to assume that all article references will be resolved.

Should automatically put links on any URLs within articles.
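
A sketch of how the message-ID and URL linking could fit together. The regular expressions are deliberately crude, the /article?id= link format is invented, and `archive_ids` stands for whatever pre-computed set of known message-IDs the build produces. E-mail address linking is left out here.

```python
import html
import re
from urllib.parse import quote

# group 1: something that looks like <local@host>; group 2: a URL
TOKEN_RE = re.compile(
    r"(<[^\s<>@]+@[^\s<>@]+>)|((?:http|ftp|gopher)://[^\s<>\"]+)"
)


def linkify(text, archive_ids, article_url="/article?id="):
    """Escape article text and add links for URLs and known message-IDs."""
    out, pos = [], 0
    for match in TOKEN_RE.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        token = match.group(0)
        shown = html.escape(token)
        if match.group(1):
            if token in archive_ids:
                out.append(f'<a href="{article_url}{quote(token)}">{shown}</a>')
            else:
                out.append(shown)  # not in the archive: no dead link
        else:
            out.append(f'<a href="{shown}">{shown}</a>')
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Note that a URL followed directly by punctuation will swallow the punctuation; a real filter would need to trim that.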

Should (?) do keyword replacement; for example to add features such as acronym expansion, sound effects (e.g., *plonk*), etc. We could get really carried away with this, if we wanted to.

We could define other filters, such as:

dictionary gateway

Acronym Expansion thing

Jargon File

Dictionary of Computing

Voting Interface

Should have some way to make posts anonymous; i.e., strip anything away that identifies who the author was.

Should only allow people to vote on articles assigned to them at random. (So you can't bring up all your articles with a query, then vote on them).

Should have a decreasing probability of presenting an article that is known to be crappy. For example, if, after receiving 20 votes, the average vote on a scale of 1 to 10 is less than 2, we can probably assume it will not significantly improve, so we should start to bring this article up for voting less often, to make the voting experience more enjoyable. We shouldn't exclude the article from being voted upon forever, of course, because it may redeem itself somehow in the future (maybe it was based on a sensitive subject that will wear off or something?); the probability of it coming up again should just decrease, that's all. I don't know what sort of formula to use for this... probably just linear, but maybe exponential. This sounds messy, but will be easy to write.
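
Here is one possible formula, using the exponential option. It is a sketch: the thresholds come from the example above (20 votes, average below 2), while the decay rate of 0.5 and the function names are assumptions I picked for illustration.

```python
import random


def presentation_weight(votes, min_votes=20, low_average=2.0, decay=0.5):
    """Relative chance of offering an article for voting (1.0 = normal).

    The weight shrinks as more low votes pile up, but never reaches zero,
    so the article can still come up again.
    """
    if len(votes) < min_votes:
        return 1.0
    if sum(votes) / len(votes) >= low_average:
        return 1.0
    extra = len(votes) - min_votes
    return decay ** (1 + extra / min_votes)


def pick_article(vote_table, rng=random):
    """Choose one article ID at random, weighted by presentation_weight.

    vote_table: dict of message_id -> list of votes received so far.
    """
    ids = list(vote_table)
    weights = [presentation_weight(vote_table[i]) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With these numbers an article averaging under 2 comes up half as often after 20 votes and a quarter as often after 40.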

Votes will be tabulated daily (?), then they can be used as a criterion of a search query.

For initial testing, we can use the votes that PV's been collecting for the last few months.

Online Help

Not much to say here, just need to have good help for each page (article pages, author pages, the query page(!)).

Might also be fun to implement context-sensitive online help: when you click on the CSOH button, it reloads the current screen with each component replaced with a link to help on that feature. This would be easy to implement, but reloading the whole page might make it too slow to be useful.

Article Annotations

The main reason I want to do this is so I can make excuses about lame articles from my past, saying why I posted what I did, explaining jokes that Didn't Quite Work, etc. It might also come in handy for some other purposes, but if they're too handy, people might start using them to discuss things...

There will have to be a limit on the amount of text that can be contributed, to save disk space. Might also want to escape '<', '>', and '&' characters so people can't put HTML links in.
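
A minimal sketch of both steps, using Python's standard html.escape. The 500-character limit and the function name are placeholders, not decided values.

```python
import html


def clean_annotation(text, max_chars=500):
    """Truncate an annotation, then escape '<', '>', and '&'."""
    # truncate first so an escaped entity is never cut in half
    return html.escape(text[:max_chars], quote=False)
```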

Killfiles

Graphs

Caching

This could be generalized so that any arbitrary list of message-IDs becomes just another cached search item. (For instance, lists for each author, etc.)

Images that were generated on-the-fly.

Exception / Problem handling

Some possible exceptions are: weird thread structures (e.g., jfw's circular thread), bad header data such as duplicate message-IDs within an archive of articles (this will definitely happen), forged articles (we should flag them as such).

For bad thread structures, could just say ``no thread information is available''.

For bad headers, forged articles, etc., we could add information to the archive about exceptions, without changing the original article. For a duplicate message-ID, we could replace the Message-ID line in the header with a new, unique message-ID (for convenience), then make a note that we've done this somewhere else... possibly by adding an extra header line such as ``X-HURL-Info: old-message-ID: stupid-duplicate-message-ID@hostname.com''. Ugly, but manageable.
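
The duplicate message-ID handling could be sketched like this. Headers are assumed to be a plain dict for one article, and the ``.hurl-dupN'' naming scheme for replacement IDs is invented for the example; only the X-HURL-Info line follows the format suggested above.

```python
def make_unique(headers, seen_ids):
    """Return headers with a unique Message-ID, noting any replacement.

    headers: dict of header name -> value for one article.
    seen_ids: set of message-IDs already in the archive (updated in place).
    """
    original = headers["Message-ID"]
    if original not in seen_ids:
        seen_ids.add(original)
        return headers
    local, _, host = original.strip("<>").partition("@")
    n = 1
    while f"<{local}.hurl-dup{n}@{host}>" in seen_ids:
        n += 1
    replacement = f"<{local}.hurl-dup{n}@{host}>"
    seen_ids.add(replacement)
    fixed = dict(headers)  # leave the caller's copy untouched
    fixed["Message-ID"] = replacement
    fixed["X-HURL-Info"] = "old-message-ID: " + original.strip("<>")
    return fixed
```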

More Features

Do you have any ideas? If so, please mail them to me.

What's New in this Document

12/9/94: Checked off a few more implemented features. Added the pushing and popping stuff to the ``mark an article'' section.
10/21/94: Added the section on caching. Added the ``send e-mail'' item to the author page section. Added the ``someone@*.hostname.com'' item to the author aliases section.
10/11/94: Added the section on graphs.

Gerald Oskoboiny

(gerald@amisk.cs.ualberta.ca)