HURL-like project at sourceXchange

sourceXchange is a site that links open source software developers
with paying development projects. The blurb on their home page says:

   sourceXchange is a new forum linking Open Source developers
   worldwide with intensifying commercial interest in Open
   Source software.  Call it one version of the Bazaar:
   sourceXchange is a wide-open marketplace to facilitate a
   dynamic exchange between buyers and sellers, a place where
   highly skilled Open Source developers supply their expertise
   to committed buyers with well-defined, financially backed
   Open Source projects. -- http://www.sourcexchange.com/

I signed up for an account there a while ago out of general
interest in the open source community; I don't have much interest
in finding extra work or whatever.

On Friday I got mail from them titled "new RFPs on sourceXchange",
and one of the projects (included below) is basically identical
to what my HURL software aspires to be, and pays $15k USD!

HURL is something I wrote in 1994 for putting news/mail archives
online:

   http://impressive.net/software/hurl/

I recently started rewriting it from scratch, and the new version
is used in the current fogo list archives:

   http://impressive.net/archives/fogo/

It still needs a fair bit of work, but I don't think it would
take more than a month of full-time work to meet the needs of
this proposal (add mbox support, MIME support, finish other stuff),
assuming I could convince them that it's okay to do it in Perl.
(Part-time, that could be anywhere from 2-6 months depending on how
much my social life is allowed to suffer.)

I don't think W3C or MIT would have a huge problem with my
working on something else on the side, but it might be a pain to
get permission from the INS to have income from something besides
the job for which I was granted my visa. (I have no idea how much
of a pain that would be -- maybe I'll make some calls this week.)

I really don't want any extra work right now as I hope to travel
a fair bit this year, but I was planning to write this code in my
spare time anyway so it seems stupid not to get $15k for it. Hmm.

Here are the project details as proposed:

http://www.sourceXchange.com/RfpBrowse?Button=Details&rfpID=18

> Project Title: Java servlet- and WebMacro-based browser for Unix
> mbox-format mail archives
>
>       RFP:  18
>   Sponsor:  Collab.Net
>   Project:  Java servlet- and WebMacro-based browser for Unix
>             mbox-format mail archives
>    Skills:  Java servlets, WebMacro, mbox format files
>    Detail:  Background:
>
> There is no decent Web-based mail archive browser out there, at
> least none I've seen, and definitely none java servlet based.
> Hypermail is great, but it has three serious drawbacks as do most
> of the others:
>
>  1. the format of the HTML being presented is hardcoded at the time
>     the processing is done
>
>  2. every message is a separate file - nice for speed, but horrible if
>     we're talking about millions of messages and a potential shortage of
>     inodes.  The extra seek() is worth the optimization of disk space and
>     filesystem structure.
>
>  3. scalability - lots of manual or scripted work is usually necessary
>     to make Hypermail and other tools work for the volume suggested
>     above.
>
>
> I made a small feeble stab at implementing something like this in
> perl a while ago - see: http://www.apache.org/~brian/glob/
>
> There's no documentation there, but you will see an "indexer.pl"
> which I've used to index the messages from the .mbx files they
> were in, and then I used "AddHandler" in Apache to establish
> mbxhandler.cgi as the "handler" for all .mbx files.  The
> "handler" model would be ideal, rather than forcing people to
> access URLs under a /servlet/ hierarchy, but that complexity can
> possibly be hid using mod_rewrite or even mod_alias.
>
> Objective:
>
>   Develop a combination of WebMacro templates and Java classes that
> act as an interface to simple Unix mbox-format mailbox archives for
> the purposes of rendering them as HTML and attachments.
>
> Scope of Work:
>
>   Develop a combination of WebMacro templates and Java classes that
> act as an interface to simple Unix mbox-format mailbox archives for
> the purposes of rendering them as HTML and attachments.  The code
> should be tested against a very popular and diverse set of mailing
> lists.  The software must be capable of interpreting MIME attachments
> and rendering them as attachments in the message.  The messages in the
> archive need to be viewable by chron order, by thread, and by author;
> no need for threading or complex tree-like viewing is necessary, but
> the ability to step through the list of messages in bunches at a time
> is desired so that # of messages is never a problem.
>
>   The software must protect against any form of "malicious" code
> snippets by character-escaping all outsider-contributed portions of
> the resulting HTML pages.  The original message store must be and must
> remain mbox-format mail archives; an accompanying simple DB file to
> assist with the indexing is expected as well, to store things like
> byte offsets for beginnings and lengths of each message, the essential
> header info like subject/from/date, and thread references. Also
> implicit in this is that the software must be able to (re)build the
> indexes from the mail archives.  Not just the updating mentioned
> below, but the ability to parse through the archive and rebuild the
> index if it gets lost/corrupted/stale.
>
>   This DB would most likely be read into persistent memory by the
> servlet to avoid having to hit the disk for every entry, with a search
> to disk for cache misses (i.e., presume data doesn't change, it's just
> added). It is foreseen that there would be an archiving "script"
> invoked upon mail delivery that would insert the mail message into the
> mbox archive and update the archive's index.
>
>   In addition, the ability to view a directory of such mbox-format
> files (organized ideally by /year/month) and navigate between them
> seamlessly is strongly desired.  This software must be usable at the
> level of 20K messages per mbox file and an mbox file per month to
> cover an arbitrary number of years.  Since the interface is
> template-driven, there should be a per-list configuration file giving
> some simple intro information and perhaps even allow for some simple
> look & feel modifications per-list.
>
>   Finally, integration with a common Open Source search engine,
> especially Swish-E, for cross-archive searching, is desired.
>
> NOTE: The proposed design may leave lots to be desired.  If you feel
> that it should be different, feel free to indicate that in your
> proposal.
>
> Remember what it is that we think are shortcomings in the other tools.
> Yes, we realize that this design requires reparsing of MIME objects
> during serving instead of at indexing time; that's an acceptable
> tradeoff, but a scalable way of caching that would be interesting as
> well.
>
>  Deliverables:  Implementation plan
>                 WebMacro templates
>                 Java sourcecode
>                 Indexer script
>                 Documentation on how to set up and use this toolset
>
>    Milestones:  Please propose a set of milestones you feel is
>                 realistic.  Please include in there steps for
>                 community feedback and followup development based
>                 on that feedback.
>
>     Cash Compensation:  $15000
> Non-Cash Compensation:  none
>       Cash Equivalent:  $0
>           Review Date:  2000-02-17
>               License:  BSD
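
The index the RFP asks for (byte offsets and lengths of each message, plus
subject/from/date) can be sketched roughly as below. This is only an
illustration in Python, not the RFP's Java/WebMacro design or HURL's Perl
code; the function names `build_index`, `read_message` and `render_entry`
are hypothetical, and the scanner assumes body lines starting with "From "
were already quoted as ">From " when the mail was delivered.

```python
import email
import html


def build_index(path):
    """Scan an mbox file once and record where each message lives."""
    index = []
    with open(path, "rb") as f:
        offset = 0
        start = None
        for line in f:
            # Each mbox message begins with a "From " separator line.
            if line.startswith(b"From "):
                if start is not None:
                    index.append({"offset": start, "length": offset - start})
                start = offset
            offset += len(line)
        if start is not None:
            index.append({"offset": start, "length": offset - start})
    # Store the essential headers next to the offsets.
    for entry in index:
        msg = read_message(path, entry)
        for name in ("Subject", "From", "Date"):
            entry[name.lower()] = str(msg.get(name, ""))
    return index


def read_message(path, entry):
    """Fetch one message with a single seek() instead of one file per message."""
    with open(path, "rb") as f:
        f.seek(entry["offset"])
        raw = f.read(entry["length"])
    # Drop the "From " separator line; the rest is an ordinary mail message.
    return email.message_from_bytes(raw.partition(b"\n")[2])


def render_entry(entry):
    """Escape all outsider-contributed text before it reaches the HTML page."""
    return "<li>%s (%s)</li>" % (
        html.escape(entry["subject"]),
        html.escape(entry["from"]),
    )
```

Rebuilding a lost or stale index is then just a matter of running
`build_index` over the archive again, since the mbox file stays the only
authoritative store.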


--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/

Re: HURL-like project at sourceXchange

<disclaimer>
I'm not a tax accountant, attorney or immigration expert,
but I think there must be some way around this slight obstacle
with your work visa.  If you get booted implementing this
suggestion, remember this limitation of liability statement.
</disclaimer>

Go offshore, young man - incorporate in some tax haven like the Cayman
Islands (where I'm planning on co-locating my online gambling casino).
Get a domain name registered there, web site hosting on a server
there, a PO box (if needed), and a bank account.  Open an E-trade
account (or one with someone who can handle offshore investors online)
and play the market with the money, remembering you're allowed to bring
something like up to $20k/year cash into the US and you won't pay any
capital gains on your winnings.  If Uncle Sam ever gets really pissed,
so what, you're not a citizen anyway.

Put the code there, even finish writing it while ssh'd to the box
and/or take a vacation there with a laptop and have a coding frenzy on
the beach while watching bikinis and drinking rum.  The US can't tax you
for work done outside of the US; make enough grey area so they can't
prove it wasn't done here (add as many double negatives and rum drinks
as needed until this makes sense).

You could probably find a perl2java to get started if you need to port
to Java to collect. If you want help turning it into Java Servlets,
let me know.  I can afford some freelance time, having just turned
down a $150/hour moonlighting job yesterday because I know the client is
a total pain in the ass.  Servlets are by far my favorite web
platform: their speed parallels or surpasses mod_perl, and I like
staying as o-o in design as long as possible (until the presentation
layer, where sometimes there are diminishing returns), etc.  Perl is a
favorite for serious text mangling with regex; Java is lacking there
without someone else's packages like Oroinc's.  I could use a week in
the Caribbean, having just shoveled rain-soaked snow at 40 pounds a
shovelful out of my driveway.  Just pay for my airfare and the rum,
keep the $15k.



--
Ted Guild
Software Developer
http://www.guilds.net
ted@guilds.net  

Re: HURL-like project at sourceXchange

On Mon, Feb 14, 2000 at 01:23:38AM -0500, Gerald Oskoboiny wrote:
> sourceXchange is a site that links open source software developers
> with paying development projects. The blurb on their home page says:
:
> I signed up for an account there a while ago out of general
> interest in the open source community; I don't have much interest
> in finding extra work or whatever.
>
> On Friday I got mail from them titled "new RFPs on sourceXchange",
> and one of the projects (included below) is basically identical
> to what my HURL software aspires to be, and pays $15k USD!

So I submitted a proposal just now, included below.

I guess its acceptance will depend on how strongly he feels about
it being done with servlets and WebMacro, and what else was
submitted. We'll see...

> http://www.sourceXchange.com/RfpBrowse?Button=Details&rfpID=18
>
> > Project Title: Java servlet- and WebMacro-based browser for Unix
> > mbox-format mail archives

My proposal:

First a few words on my background:

I wrote something very similar to this in 1994, and have
maintained a few archives using that code since then.
(actually, they've pretty much maintained themselves. :)

But I've been rewriting this code in my head for the last few
years, and recently started writing it from scratch based on
what I've learned since my first implementation.

Notes on this project are here:

   http://impressive.net/software/hurl/
   http://impressive.net/software/hurl/design/

My proposal is to implement this in Perl, using Berkeley DB for
the indexes, and some kind of template system, but I'm not sure
which one yet. (possibly using the Text::Template perl module,
PHP, WebMacro, or some simple home-grown language; in any case,
it should be simple enough for non-programmers to edit.)

I'm interested in doing whatever it takes to make it able to run
hundreds of archives (totalling millions of messages) on a low-
to medium-end server machine (worth say $2000 USD) while being
fast enough to feel "quick" to the end user.

My code would handle Berkeley mboxes and MH files as the
underlying (read-only) archives. (And probably Usenet news
archives, too, since they're pretty much the same.)

I plan to make it run efficiently by calculating and storing the
most frequently accessed data in the Berkeley DB index files and
by caching query results and any other data that can be reused.
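
As a rough sketch of that idea (precomputed records in an on-disk index,
plus cached query results), something like the following would do. It is
written in Python rather than the proposed Perl, uses the standard `dbm.dumb`
module purely as a stand-in for Berkeley DB, and the `ArchiveIndex` class
and its record fields are hypothetical names chosen for the illustration.

```python
import dbm.dumb
import json


class ArchiveIndex:
    """On-disk message index with an in-memory cache of query results."""

    def __init__(self, path):
        # dbm.dumb stands in for Berkeley DB here (an assumption of this sketch).
        self.db = dbm.dumb.open(path, "c")
        self.cache = {}

    def add(self, msg_id, record):
        """Store one precomputed record, keyed by message id."""
        self.db[msg_id.encode()] = json.dumps(record).encode()
        # New mail makes cached result lists stale, so drop them all.
        self.cache.clear()

    def by_author(self, author):
        """Return the sorted ids of messages from one author, cached after first use."""
        key = ("author", author)
        if key not in self.cache:
            hits = []
            for raw_id in self.db.keys():
                record = json.loads(self.db[raw_id])
                if record["from"] == author:
                    hits.append(raw_id.decode())
            self.cache[key] = sorted(hits)
        return self.cache[key]

    def close(self):
        self.db.close()
```

Clearing the whole cache on every new message is the crudest possible
policy; since archives only ever grow, a real implementation could
probably append to cached results instead.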

It would have built-in header searching (for arbitrary headers,
configurable by the administrator), but no full-text searching.
The generated pages will be indexable by external software, however.
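
To illustrate what searching on administrator-configured headers might look
like, here is a small Python sketch. The `SEARCHABLE_HEADERS` setting and
the `index_headers` and `search` functions are hypothetical, made up for
this example; they are not part of HURL.

```python
from collections import defaultdict
from email import message_from_string

# Hypothetical administrator setting: which headers get indexed for search.
SEARCHABLE_HEADERS = ("from", "subject", "x-mailer")


def index_headers(messages, headers=SEARCHABLE_HEADERS):
    """Map each configured header name to a list of (message id, value) pairs."""
    table = defaultdict(list)
    for msg_id, raw in messages.items():
        msg = message_from_string(raw)
        for name in headers:
            # get_all() matches header names case-insensitively.
            for value in msg.get_all(name, []):
                table[name].append((msg_id, value))
    return table


def search(table, header, needle):
    """Case-insensitive substring match against one indexed header."""
    needle = needle.lower()
    return [
        msg_id
        for msg_id, value in table.get(header.lower(), [])
        if needle in value.lower()
    ]
```

A header that is not in the configured list simply returns no matches, and
message bodies are never examined, which matches the "no full-text
searching" limit above.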

I guess this design is a bit different than servlets and WebMacro,
but I think we have pretty much identical goals for the software.
(Actually, I wasn't even looking for extra work, but when I read the
RFP I figured I should apply for it since I'm going to be writing
that code anyway ;)

One potential obstacle is that I'm a Canadian citizen living and
working in the States on a visa, and I need to find out what the
rules are about making extra income aside from my day job while
I'm on this visa.


Deliverables: (Project components and functionality)

 - Implementation plan
 - Templates/sample config files
 - Perl and Bourne shell code to build the indexes and provide
   an interactive web interface to them
 - Documentation on how to set up and use this toolset


Milestones: (Components, functionality and completion dates)

1. Implementation plan:

  due: start+2 weeks
  compensation: 10% of total

2. First round of indexer/interface code and config files,
  with several sample archives online, public review starts:

  due: start+8 weeks
  compensation: 40% more, 50% of total

3. Second round of code, incorporating feedback,
  adding MIME and template support:

  due: start+14 weeks
  compensation: 30% more, 80% of total

4. Third round of code, incorporating feedback,
  performance tuning

  due: start+18 weeks
  compensation: 10% more, 90% of total

5. Documentation, final touches based on feedback:

  due: start+22 weeks
  compensation: 10% more, 100% of total

Cash Compensation:

  $15000 USD

Non-Cash Compensation: (Equipment, services, etc.)

  None

Cash Equivalent:

  None

Estimated Duration:

  22 weeks

License:

  BSD


--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny