Gerald's comments on searching the archive

Here are a couple of things I mailed to the bizarchive mailing list:

Subject: Some practical query examples
From:	Gerald Oskoboiny <gerald@amisk.cs.ualberta.ca>
To:	bizarchive@media.mit.edu
Date:	Fri, 28 Jan 1994 15:01:08 -0700

Glad to see some discussion on here. I agree and disagree with lots of
what I've seen, but instead of following up, I'll just say how I think
things should be. Some of this stuff I've covered before.

First, as I and others have mentioned, we should forget about interfaces
for the time being, and concentrate on tasks. Obviously, the prime task
will be searching for a certain article or articles.

I think people should be able to search for ANY of the following things
(or for the absence of any of the following things):

   1. Author

      a. exact gerald@torolab.vnet.ibm.com
      b. alias gerald oskoboiny
      c. match schnitzius

   2. Subject

      a. subject short, confession

   3. Date

      a. since mm/dd/yyyy
      b. between mm/dd/yyyy and mm/dd/yyyy
      c. before mm/dd/yyyy
      d. during mm/dd/yyyy

   4. Header

      a. specific Followup-To: misc.test
      b. anywhere FAQ

   5. Body

      a. first 10 rigler, banana
      b. last 10 endeavor, persevere
      c. anywhere gerald

All of these could be combined, with any kind of logic. For example:

   author alias gerald
   subject short, confession
   date since 01/01/1992
   header specific Keywords: pathos

   author alias lstewart
   body anywhere "veinless coital error"

   subject money
   not subject make, money, fast
   not subject Re:
   date during 1993

Listing a number of words on a line indicates that you want the logic to
be "OR", putting them on separate lines indicates "AND", and prefacing the
line with `not' indicates "NOT". Putting phrases within quotes on a line
indicates that you want to find those words as a phrase, not independently.
I guess most of that's obvious, and doesn't really affect our discussion.

I can't think of many queries that can't be handled by this approach.
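
The combining rules exactly as stated in this message (commas mean OR, separate lines mean AND, a leading `not' negates, quotes mark a phrase) could be sketched in Python roughly as follows. Everything here is hypothetical: the `Clause` type, the function names, and the choice to match by case-insensitive substring. The mode word is recorded but not interpreted, date lines are not handled, and a comma inside a quoted phrase would be split incorrectly.

```python
from dataclasses import dataclass

# Mode words taken from the field list in the message.
MODES = {
    "author": {"exact", "alias", "match"},
    "header": {"specific", "anywhere"},
    "body": {"first", "last", "anywhere"},
}

@dataclass
class Clause:
    negated: bool
    field: str
    mode: str     # "" when the line gives no mode
    terms: list   # alternatives; any one of them may match

def parse_query(text):
    """Turn a block of query lines into a list of clauses."""
    clauses = []
    for line in text.splitlines():
        words = line.split()
        negated = bool(words) and words[0] == "not"
        if negated:
            words = words[1:]
        if not words:
            continue
        field, rest = words[0], words[1:]
        mode = ""
        if rest and rest[0] in MODES.get(field, ()):
            mode, rest = rest[0], rest[1:]
        # Commas separate alternatives; quotes around a phrase are dropped.
        terms = [t.strip().strip('"') for t in " ".join(rest).split(",")]
        clauses.append(Clause(negated, field, mode, [t for t in terms if t]))
    return clauses

def matches(article, clauses):
    """AND across clauses, OR across the terms within one clause."""
    for clause in clauses:
        text = article.get(clause.field, "").lower()
        hit = any(term.lower() in text for term in clause.terms)
        if hit == clause.negated:
            return False
    return True
```

With an article given as a dict such as `{"author": "gerald oskoboiny", "subject": "A short story"}`, the query "author alias gerald" followed by "subject short, confession" matches. Swapping `any` for `all` in `matches` would give AND within a line instead.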

We could add lots of intelligence in various places, to make searching
more friendly. We could also do approximate searches... supposedly the
"agrep" program does this very well already. We could also look for text
that spans lines, etc.
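
To give a feel for approximate searching, here is a small Python sketch. It does not reproduce agrep's algorithm; it stands in the standard library's `difflib.get_close_matches`, and the function name `approx_find` and the 0.8 similarity cutoff are arbitrary assumptions.

```python
import difflib

def approx_find(word, text, cutoff=0.8):
    """Return the words in `text` that are close to `word`, best match first."""
    # Strip surrounding punctuation and fold case before comparing.
    candidates = {w.strip('.,;:!?"\'()').lower() for w in text.split()}
    candidates.discard("")
    return difflib.get_close_matches(
        word.lower(), sorted(candidates), n=5, cutoff=cutoff
    )
```

A misspelled name still turns up: searching for "schnitzius" in "posted by Schnitzus yesterday" returns the word "schnitzus".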

We could also have a certain number of keywords used to specify how the
result of the search is to be returned, for example:

   mail - mail them in one massive file, concatenated together
   mails - mail each post as a separate message
   tarred - mail a uuencoded .tar.Z file
   ftp - make the file (temporarily) available for FTP (attr-dammit: PV)
   daily - mail one article every day
   msgids - return only a list of message-IDs
   subjects - return only a list of subjects
   authors - return only a list of authors
   dates - etc.
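
One way to picture the result keywords is a small dispatcher that turns the matched articles into the bodies of the replies to send. This Python sketch is only an assumption about how that might look. The dict keys (`message_id`, `subject`, `author`, `date`, `text`) and the name `format_results` are made up, and the tarred, ftp, and daily modes are left out because they need delivery machinery beyond a few lines.

```python
# Keywords that return a single column of the matched articles.
COLUMNS = {
    "msgids": "message_id",
    "subjects": "subject",
    "authors": "author",
    "dates": "date",
}

def format_results(articles, mode):
    """Render matched articles according to a result keyword."""
    if mode in COLUMNS:
        return [article[COLUMNS[mode]] for article in articles]
    if mode == "mail":
        # One reply holding every article, concatenated together.
        return ["\n\n".join(article["text"] for article in articles)]
    if mode == "mails":
        # One reply per article.
        return [article["text"] for article in articles]
    raise ValueError(f"unsupported result keyword: {mode}")
```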

OK. Most of this applies particularly to an e-mail search, but it can
also be used in WWW (or gopher, or WAIS, I presume... never really used
them).

In WWW mode, you would be presented with a form with fields that you
fill out. The results of your query are given to you in whatever form
is applicable. If you search for a subject, it presents you with a list
of articles showing authors and dates, maybe. You can then browse these
using all the hypertext features.

Every interface would use the same base script which does the search and
returns a list of message-IDs. This script can use whatever indexes we
decide are necessary to make it speedier. We could also store a list of
popular queries and their results, to speed things up.
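
A minimal Python sketch of such a base script, assuming a simple word index and an in-memory cache of recent queries, might look like this. The `Archive` class and its methods are hypothetical, the index covers whole words only, and a real archive would presumably keep both the index and the cache on disk.

```python
from collections import defaultdict
from functools import lru_cache

class Archive:
    def __init__(self, articles):
        # articles: iterable of (message_id, text) pairs.
        self.index = defaultdict(set)   # word -> set of message-IDs
        for msgid, text in articles:
            for word in text.lower().split():
                self.index[word].add(msgid)
        # Remember the results of recent queries to speed up repeats.
        self.cached_search = lru_cache(maxsize=128)(self._search)

    def _search(self, words):
        """Return the message-IDs of articles containing every word."""
        sets = [self.index.get(word, set()) for word in words]
        return tuple(sorted(set.intersection(*sets))) if sets else ()

    def search(self, query):
        # Normalise the query so equivalent queries share one cache entry.
        return self.cached_search(tuple(sorted(set(query.lower().split()))))
```

Every interface (mail, WWW, and so on) would call `search` and then decide for itself how to present the returned message-IDs.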

I'll mail something else about the WWW stuff.

Gerald


Subject: Re: Some practical query examples
From:	Gerald Oskoboiny <gerald@amisk.cs.ualberta.ca>
To:	nj@CS.Berkeley.EDU
Date:	Fri, 28 Jan 1994 15:49:18 -0700
Cc:	bizarchive@media.mit.edu
In-Reply-To: <199401282213.OAA06982@birch.CS.Berkeley.EDU> from "Narciso Jaramillo" at Jan 28, 94 03:13:05 pm

[me]
> >    author alias gerald
> >    subject short, confession
> >    date since 01/01/1992
> >    header specific Keywords: pathos
>
> > Listing a number of words on a line indicates that you want logic to be
> > "OR", putting them on separate lines indicates "AND", and prefacing the
> > line with `not' indicates "NOT".

[nj]
> That doesn't seem right--you don't want subject words short OR confession,
> you want subject words short AND confession, no?

Yes, of course. Sorry about that. There are plenty of ways we could
specify the different logic. I thought my example was so simple that I
couldn't screw it up. Oops.

A few other things I forgot:

   random 10% - returns a random selection of your queried articles
   cabal-rating higher 8
   cabal-rating lower 1

Gerald