Gerald's comments on searching the archive
Here are a couple of things I mailed to the bizarchive mailing list:
Subject: Some practical query examples
From: Gerald Oskoboiny <gerald@amisk.cs.ualberta.ca>
To: bizarchive@media.mit.edu
Date: Fri, 28 Jan 1994 15:01:08 -0700
Glad to see some discussion on here. I agree and disagree with lots of
what I've seen, but instead of following up, I'll just say how I think
things should be. Some of this stuff I've covered before.
First, as I and others have mentioned, we should forget about interfaces
for the time being, and concentrate on tasks. Obviously, the prime task
will be searching for a certain article or articles.
I think people should be able to search for ANY of the following things
(or for the absence of any of the following things):
1. Author
   a. exact      gerald@torolab.vnet.ibm.com
   b. alias      gerald oskoboiny
   c. match      schnitzius
2. Subject
   a. subject    short, confession
3. Date
   a. since      mm/dd/yyyy
   b. between    mm/dd/yyyy and mm/dd/yyyy
   c. before     mm/dd/yyyy
   d. during     mm/dd/yyyy
4. Header
   a. specific   Followup-To: misc.test
   b. anywhere   FAQ
5. Body
   a. first 10   rigler, banana
   b. last 10    endeavor, persevere
   c. anywhere   gerald
All of these could be combined, with any kind of logic. For example:

    author alias gerald
    subject short, confession
    date since 01/01/1992
    header specific Keywords: pathos

    author alias lstewart
    body anywhere "veinless coital error"

    subject money
    not subject make, money, fast
    not subject Re:
    date during 1993

Listing a number of words on a line indicates that you want the logic to
be "OR", putting them on separate lines indicates "AND", and prefacing the
line with `not' indicates "NOT". Putting a phrase within quotes on a line
indicates that you want to find those words as a phrase, not independently.
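
As a rough illustration of the rule just described, here is a minimal Python sketch of a parser and matcher for this query format. It follows the rule exactly as stated in this paragraph (commas mean OR, separate lines mean AND, a leading `not' negates the line); the follow-up message further down questions whether commas should really mean OR. The names parse_line, parse_query, matches, and MODES are hypothetical, the mode keywords are taken from the list above, and matching is reduced to a case-insensitive substring test, so the modes are parsed but not acted on. Commas inside a quoted phrase are not handled.

```python
# Mode keywords per field, taken from the list of searchable things above.
# "subject" has no mode keyword in the examples, so its set is empty.
MODES = {
    "author": {"exact", "alias", "match"},
    "subject": set(),
    "date": {"since", "between", "before", "during"},
    "header": {"specific", "anywhere"},
    "body": {"first", "last", "anywhere"},
}


def parse_line(line):
    """Parse one well-formed query line into a clause dictionary."""
    words = line.split(None, 1)
    negated = words[0].lower() == "not"
    if negated:
        # Drop the leading `not' and re-split the remainder.
        words = words[1].split(None, 1)
    field = words[0].lower()
    rest = words[1] if len(words) > 1 else ""
    mode = None
    head = rest.split(None, 1)
    if head and head[0].lower() in MODES.get(field, set()):
        mode = head[0].lower()
        rest = head[1] if len(head) > 1 else ""
    # Comma-separated terms are alternatives; quotes mark a phrase.
    terms = [t.strip().strip('"') for t in rest.split(",") if t.strip()]
    return {"negated": negated, "field": field, "mode": mode, "terms": terms}


def parse_query(text):
    """Parse a multi-line query; every non-blank line is one clause."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


def matches(message, clauses):
    """Return True if the message satisfies every clause (AND of lines).

    `message` is assumed to be a dict keyed by field name. The mode is
    ignored here; each term is a case-insensitive substring test.
    """
    for clause in clauses:
        text = message.get(clause["field"], "").lower()
        hit = any(term.lower() in text for term in clause["terms"])  # OR
        if hit == clause["negated"]:
            return False
    return True
```

For example, parse_query("author alias gerald\nsubject short, confession") gives two clauses, and a message whose author contains "gerald" and whose subject contains either "short" or "confession" satisfies both.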
I guess most of that's obvious, and doesn't really affect our discussion.
I can't think of many queries that can't be handled by this approach.
We could add lots of intelligence in various places, to make searching
more friendly. We could also do approximate searches... supposedly the
"agrep" program does this very well already. We could also look for text
that spans lines, etc.
We could also have a certain number of keywords used to specify how the
result of the search is to be returned, for example:
mail - mail them in one massive file, concatenated together
mails - mail each post as a separate message
tarred - mail a uuencoded .tar.Z file
ftp - make the file (temporarily) available for FTP (attr-dammit: PV)
daily - mail one article every day
msgids - return only a list of message-IDs
subjects - return only a list of subjects
authors - return only a list of authors
dates - etc.
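
One way the result keywords above could be handled is a small dispatch function that turns the matched articles into the payloads to send back. This is only a sketch: the function name build_payloads, the LIST_FIELDS table, and the message dictionary keys ("message-id", "subject", "from", "date", "raw") are all assumptions made for illustration. Only the simplest keywords are covered; tarred, ftp, and daily are left unimplemented.

```python
# Hypothetical mapping from a list-style result keyword to the message
# field it reports. The field names are assumptions, not a fixed schema.
LIST_FIELDS = {
    "msgids": "message-id",
    "subjects": "subject",
    "authors": "from",
    "dates": "date",
}


def build_payloads(messages, keyword):
    """Return the list of text payloads to send for a result keyword.

    `messages` is assumed to be a list of dicts, each holding the header
    fields named in LIST_FIELDS plus the full article text under "raw".
    """
    if keyword in LIST_FIELDS:
        field = LIST_FIELDS[keyword]
        # One payload: one line per matched article.
        return ["\n".join(m[field] for m in messages)]
    if keyword == "mail":
        # One payload: all articles concatenated together.
        return ["\n".join(m["raw"] for m in messages)]
    if keyword == "mails":
        # One payload per article.
        return [m["raw"] for m in messages]
    raise NotImplementedError("result keyword not sketched here: " + keyword)
```

Keeping the keyword-to-field table separate from the function means a new list-style keyword only needs one more table entry.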
OK. Most of this applies particularly to an e-mail search, but it can
also be used in WWW (or gopher, or WAIS, I presume... never really used
them).
In WWW mode, you would be presented with a form with fields that you
fill out. The results of your query are given to you in whatever form
is applicable. If you search for a subject, it presents you with a list
of articles showing authors and dates, maybe. You can then browse these
using all the hypertext features.
Every interface would use the same base script which does the search and
returns a list of message-IDs. This script can use whatever indexes we
decide are necessary to make it speedier. We could also store a list of
popular queries and their results, to speed things up.
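
The idea of storing popular queries and their results could look roughly like the following Python sketch. It assumes the base search is available as a function that takes the query text and returns message-IDs; the class name QueryCache, the normalize helper, and the hits counter are hypothetical. Because separate query lines are ANDed, their order should not matter, so the cache key sorts them. A real version would also have to discard stored results whenever new articles are added to the archive, which is not shown.

```python
def normalize(query_text):
    """Build a cache key: trimmed, non-blank query lines in sorted order."""
    lines = [line.strip() for line in query_text.splitlines()]
    return tuple(sorted(line for line in lines if line))


class QueryCache:
    """Remember the message-IDs returned for queries seen before."""

    def __init__(self, search):
        # `search` is assumed to be the shared base search: a callable
        # that takes query text and returns an iterable of message-IDs.
        self._search = search
        self._results = {}
        self.hits = 0

    def lookup(self, query_text):
        key = normalize(query_text)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        ids = tuple(self._search(query_text))
        self._results[key] = ids
        return ids
```

Every interface (e-mail, WWW, and so on) would then call lookup instead of calling the base search directly.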
I'll mail something else about the WWW stuff.
Gerald
Subject: Re: Some practical query examples
From: Gerald Oskoboiny <gerald@amisk.cs.ualberta.ca>
To: nj@CS.Berkeley.EDU
Date: Fri, 28 Jan 1994 15:49:18 -0700
Cc: bizarchive@media.mit.edu
In-Reply-To: <199401282213.OAA06982@birch.CS.Berkeley.EDU> from "Narciso Jaramillo" at Jan 28, 94 03:13:05 pm
[me]
> > author alias gerald
> > subject short, confession
> > date since 01/01/1992
> > header specific Keywords: pathos
>
> > Listing a number of words on a line indicates that you want the logic to
> > be "OR", putting them on separate lines indicates "AND", and prefacing
> > the line with `not' indicates "NOT".
[nj]
> That doesn't seem right--you don't want subject words short OR confession,
> you want subject words short AND confession, no?
Yes, of course. Sorry about that.
There are plenty of ways we could specify the different logic. I thought
my example was so simple that I couldn't screw it up. Oops.
A few other things I forgot:
random 10% - returns a random selection of your queried articles
cabal-rating higher 8
cabal-rating lower 1
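
The `random 10%' option above might be sketched like this in Python. The function name random_selection and its parameters are hypothetical, and rounding the count up (so that a small non-empty result still yields at least one article) is an assumption; the message does not say how fractions should be handled.

```python
import random


def random_selection(msgids, percent, rng=None):
    """Return a random sample of about `percent` percent of the message-IDs.

    `percent` is assumed to be a whole number from 0 to 100. The count is
    rounded up, which is an assumption and not part of the proposal.
    """
    rng = rng or random
    ids = list(msgids)
    # Integer ceiling of len(ids) * percent / 100.
    count = (len(ids) * percent + 99) // 100
    count = min(count, len(ids))
    return rng.sample(ids, count)
```

Passing a seeded random.Random instance as rng makes the selection repeatable, which could help when testing.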
Gerald