Re: The Content-Addressable Web

On Thu, Oct 25, 2001 at 01:33:48AM -0500, Justin Chapweske wrote:
> I've just finished the first draft of "HTTP Extensions for a
> Content-Addressable Web".  I believe that these simple extensions are a
> huge step forward in providing interoperability between P2P systems.
:
> http://onionnetworks.com/caw/caw.txt

Interesting stuff...

> HTTP Extensions for a Content-Addressable Web
> Justin Chapweske, Onion Networks ([email protected])
> October 7, 2001
>
> Abstract
>
> The goal of the Content-Addressable Web (CAW) is to create
> a URN-based Web that can be optimized for content distribution.
> The use of URNs allows advanced caching techniques to be
> employed, and sets the foundation for creating ad hoc Content
> Distribution Networks (CDNs). This document specifies HTTP
> extensions that bridge the current Location-Based Web with
> the Content-Addressable Web.

The Web "is the universe of network-accessible information" [1],
i.e. anything with a URI, including URIs that are not tied to a
particular hostname.

You might find this useful:

   URIs, URLs, and URNs: Clarifications and Recommendations 1.0
   Report from the joint W3C/IETF URI Planning Interest Group
   W3C Note 21 September 2001
   http://www.w3.org/TR/uri-clarification/

it attempts to clarify confusion about URIs, URLs, and URNs.

> 2 Self-Verifying URNs
>
> While any kind of URN can be used within the Content-Addressable
> Web, there is a specific type of URN called a "Self-Verifying
> URN" that is particularly useful. These URNs have the
> property that the URN itself can be used to verify that
> the content has been received intact. It is RECOMMENDED
> that applications use cryptographically strong self-verifying
> URNs because hosts in ad hoc CDNs and the Transient Web
> are assumed to be untrusted. For instance, one could hash
> the content using the SHA-1 algorithm, and encode it using
> Base32 to produce the following URN:
>
> urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB

I think URIs based on sha-1 hashes are a fantastic way to identify
resources in P2P systems, and I don't understand why most of the
P2P systems I have used don't work like this already. (apparently,
anyway; I haven't studied the protocols, but that's my impression
as a user.)
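
For illustration, here is a minimal Python sketch of how a URN like the
one quoted above could be produced and checked: hash the content with
SHA-1, then Base32-encode the 20-byte digest. The function names
sha1_urn and verify are my own, not something defined in the draft.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Build a urn:sha1 URN: Base32 of the raw 20-byte SHA-1 digest."""
    digest = hashlib.sha1(data).digest()
    # 20 bytes is 160 bits, which is exactly 32 Base32 characters (no padding).
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def verify(data: bytes, urn: str) -> bool:
    """Return True if the content hashes to the given URN (hypothetical helper)."""
    return sha1_urn(data) == urn.strip()
```

Anyone who receives the bytes can recompute the URN and compare, so an
untrusted peer cannot substitute different content without the check
failing.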

When I search a P2P system for a particular file, I think the search
results should be a list of filenames, content-lengths, and sha-1
hash URIs, and maybe some other stuff.

Then the P2P client should present the results to me grouped by
their sha-1 hashes, let me sort those results by size/filename/
number-of-peers-with-that-hash, and when I pick one of them, it
should start downloading ranges of the file from each of, say, 10
of the peers that claim to have files with that hash.
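
A rough sketch of the grouping step, assuming each search hit carries a
peer address, filename, size, and sha-1 URN. The Hit record and
group_by_hash are made-up names for illustration, not part of any real
client.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical record layout)."""
    peer: str
    filename: str
    size: int
    urn: str


def group_by_hash(hits):
    """Group hits by URN and list the hash offered by the most peers first."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.urn].append(hit)
    # Count distinct peers per hash, so one peer listing a file twice counts once.
    return sorted(
        groups.items(),
        key=lambda item: len({h.peer for h in item[1]}),
        reverse=True,
    )
```

Sorting by size or filename instead would only change the key function.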

Then it should continue downloading other ranges of the file from
the peers that gave the highest throughput on the first transfer.
(and terminate the really slow ones prematurely.)
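
The range handling could be sketched like this: split the file into
byte ranges for the first round, then keep only the fastest peers for
later rounds. Both functions and their parameters are assumptions of
mine; real clients surely do something more elaborate.

```python
def split_ranges(size, parts):
    """Split [0, size) into up to `parts` contiguous, inclusive byte ranges."""
    if size <= 0 or parts <= 0:
        return []
    parts = min(parts, size)
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def rank_peers(throughput, keep):
    """Return the `keep` peers with the highest measured bytes per second."""
    return sorted(throughput, key=throughput.get, reverse=True)[:keep]
```

For example, split_ranges(10, 3) gives [(0, 3), (4, 6), (7, 9)], and each
pair maps directly onto an HTTP "Range: bytes=start-end" request.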

Also, it might be good for P2P systems to be able to use an external
resolver for these URIs, so I can have a general sha-1 URI resolver
on my desktop that gets used by any P2P/Web clients I might use,
and I can set its resolution strategy according to my preferences.

> 3 HTTP Extensions
>
> In order to provide a transparent bridge between the URL-based
> Web and the Content-Addressable Web, a few HTTP extensions
> must be introduced. The nature of these extensions is that
> they need not be widely deployed in order to be useful.
> They are specifically designed to allow for proxying for
> hosts that are not CAW-aware.

I haven't reviewed this section closely, but you might want to see:

   http://www.w3.org/Protocols/HTTP/ietf-http-ext/

for info on HTTP Extensions.

[1] About the World Wide Web
   http://www.w3.org/WWW/

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

deployment of urn:sha1 URIs in P2P clients (was Re: The Content-Addressable Web)

A few months ago in <mid:[email protected]> [1]
I wrote:
> I think URIs based on sha-1 hashes are a fantastic way to identify
> resources in P2P systems, and I don't understand why most of the
> P2P systems I have used don't work like this already. (apparently,
> anyway; I haven't studied the protocols, but that's my impression
> as a user.)

I was just playing with the latest version of gtk-gnutella [2]
(version 0.80), and I noticed that some of the search results
now include sha1 URIs in the 'Info' column! Screenshot:
http://impressive.net/people/gerald/2002/01/21/gtk-gnutella.png

I am still way out of touch with the way P2P clients actually
work (including whether or not gtk-gnutella uses these URIs in
any useful way), but this seems very promising.

In the slashdot article I referenced in my earlier message
tonight [3], people were discussing ways to deal with the
bandwidth needs of popular sites like ftp.kernel.org; it seems
to me that P2P systems combined with widespread deployment of
some hash-based URI scheme are a perfect way to solve this
problem: whenever someone announces a new kernel tarball,
they could send an announcement saying:

   linux kernel 3.14.159 is now available! fetch it from:

   ftp://ftp.example.org/pub/linux-kernel-3.14.159.tar.bz2
   a.k.a. urn:sha1:QWERTYASDF123412341234

then people can just paste the urn:sha1 URI into their P2P client-
of-choice to fetch the tarball (and then verify its checksum.)
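
The verification step might look something like the sketch below, which
streams a downloaded file through SHA-1 and compares the digest with
the one encoded in the URN. The name file_matches_urn is hypothetical.
Note that the URN in my example announcement is only a placeholder and
is not valid Base32, so this function would reject it.

```python
import base64
import hashlib

PREFIX = "urn:sha1:"


def file_matches_urn(path, urn, chunk_size=1 << 20):
    """Return True if the file at `path` hashes to the digest in `urn`."""
    if not urn.lower().startswith(PREFIX):
        raise ValueError("not a urn:sha1 URN")
    # b32decode raises binascii.Error (a ValueError) on malformed input.
    expected = base64.b32decode(urn[len(PREFIX):].upper())
    if len(expected) != 20:
        raise ValueError("a SHA-1 digest must be 20 bytes")
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so a large tarball never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.digest() == expected
```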

And when the urn:sha1 URI scheme gets commonly deployed, kernel.org
won't even need to offer copies by FTP any more, and we can get away
from centralized sites being a single point of {distribution,failure}.

[1] http://impressive.net/archives/fogo/[email protected]
[2] http://gtk-gnutella.sourceforge.net/
[3] http://slashdot.org/article.pl?sid=02/01/20/020244&mode=thread

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

file sharing developments, gnutella, hash URI support

On Mon, Jan 21, 2002 at 01:54:16AM -0500, Gerald Oskoboiny wrote
in <mid:[email protected]> [1]
> I was just playing with the latest version of gtk-gnutella
> (version 0.80), and I noticed that some of the search results
> now include sha1 URIs in the 'Info' column! Screenshot:
> http://impressive.net/people/gerald/2002/01/21/gtk-gnutella.png
>
> I am still way out of touch with the way P2P clients actually
> work (including whether or not gtk-gnutella uses these URIs in
> any useful way), but this seems very promising.

gtk-gnutella has been updated again with related features, e.g.
(from announcements, forwarded below):

 - Added full HUGE support (Hash/URN Gnutella Extensions).

hmm... I wonder what that is.

http://www.google.com/search?q=HUGE+hash+urn+gnutella+extensions
-> http://rfc-gnutella.sourceforge.net/Proposals/HUGE/
   http://rfc-gnutella.sourceforge.net/Proposals/HUGE/draft-gdf-huge-0_93.txt

   HUGE is a collection of incremental extensions to the Gnutella
   protocol (v 0.4) which allow files to be identified and located by
   Uniform Resource Names (URNs) -- reliable, persistent, location-
   independent names, such as those provided by secure hash values.

nifty.

 - URN searches now supported, by typing "urn:sha1:" in search text, followed
   by the base32 value.

Excellent... though it doesn't seem to work very well for me in
practice yet. (maybe I don't have a big enough pipe here, or
there aren't enough peers online that grok these URIs yet.)
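
I haven't read the gtk-gnutella source, so the following is only a
guess at the idea: a client could tell a URN search from a keyword
search by looking for the "urn:sha1:" prefix followed by 32 Base32
characters. parse_search is a made-up name.

```python
import re

# 32 characters from the Base32 alphabet (A-Z and 2-7) encode a SHA-1 digest.
_URN_RE = re.compile(r"^urn:sha1:([A-Z2-7]{32})$", re.IGNORECASE)


def parse_search(text):
    """Classify search text as ("urn", normalized_urn) or ("keyword", text)."""
    text = text.strip()
    match = _URN_RE.match(text)
    if match:
        # Normalize to a lowercase prefix and an uppercase Base32 value.
        return ("urn", "urn:sha1:" + match.group(1).upper())
    return ("keyword", text)
```

Anything that does not match exactly, such as a truncated hash, falls
back to an ordinary keyword search in this sketch.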

I keep seeing Gordon Mohr's and Bitzi's names on a bunch of this stuff,
so I finally spent some time playing with http://bitzi.com/ tonight
(as user ger [2]). Bitzi is a metadata catalog and rating service:

   Our primary publication is the OpenBits Catalog, a collection of
   the best available identifying, descriptive, and editorial
   metadata about all kinds of digital files.

   We collect this metadata from our customers and volunteer
   contributors, manage processes which improve the metadata's
   reliability and usefulness over time, and then publish the
   metadata in a wide variety of formats.

   -- http://bitzi.com/about/

This sounds useful; one of the biggest hassles with downloading
files from p2p systems is trying to find out which of the various
versions are the best ones.

I installed bitcollider, a client to use to submit ratings on my
own files (apt-get install bitcollider on debian woody/sid systems)
and used it to submit a few ratings, but I found it kind of clumsy;
I was hoping to be able to use it to tell Bitzi about my high-quality
collection of MP3s (mostly ripped by myself from my own CDs), but
didn't see an easy way to do that all at once for hundreds of
files. (would need to label each one manually, it seems)

Actually, I'm not convinced that a lot of explicit user labelling
is needed to improve the file-sharing experience; if I do a
search for Matrix.avi and my client comes back with a bunch of
possibilities (grouped by their sha1 hashes), I expect that the
most popular one will tend to be the best quality, since people
will just remove lesser quality versions.

Hmm... I'd like to write a lot more about this stuff, but I
should go to sleep.

I look forward to the day when I have a command-line p2p app that
lets me do stuff a la "wget urn:sha1:I5HVBFCT7TK4UB7VR7DELZAOXFCUYFVS"
to fetch a certain file that I know I want.

Then I won't even need to keep around thousands of MP3s on my
hard drive as I do now: just keep a list of my favorite albums
and their hashes, and fetch them as desired (maybe keeping a
hundred gigs or so in a local cache.)
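
A sketch of what that local cache could look like: files are tracked
by URN, and the least recently used ones are evicted once a byte budget
is exceeded. HashCache and its methods are purely illustrative, and the
eviction policy (LRU) is my assumption.

```python
from collections import OrderedDict


class HashCache:
    """Track cached files by URN and evict least recently used ones."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self._sizes = OrderedDict()  # urn -> size, least recently used first

    def touch(self, urn):
        """Mark a URN as recently used; return False if it is not cached."""
        if urn not in self._sizes:
            return False
        self._sizes.move_to_end(urn)
        return True

    def add(self, urn, size):
        """Record a fetched file and return the URNs evicted to make room."""
        if urn in self._sizes:
            self.used -= self._sizes.pop(urn)
        self._sizes[urn] = size
        self.used += size
        evicted = []
        # Never evict the entry that was just added.
        while self.used > self.max_bytes and len(self._sizes) > 1:
            old_urn, old_size = self._sizes.popitem(last=False)
            self.used -= old_size
            evicted.append(old_urn)
        return evicted
```

An evicted file is not lost: its hash stays in my list of favorites, so
it can be fetched again by URN when wanted.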

I'm excited that most of the features of my dream p2p client [3]
seem to be coming along nicely, without me having to get involved.
Shareaza (Windows-only) sounds great:

] SHAREAZA FEATURES
]
]  - A simple QuickStart wizard helps new users get up and
]    running in less than a minute. Choose the interface that's easiest for
]    you, share your files, and you're in!
]  - Swarmed downloads get what you want faster and more
]    reliably. Full support for multiple sources, automatic requerying, and
]    pause and resume across sessions.
]  - Powerful metadata support lets you search for content
]    by artist, album, bitrate, etc. Shareaza integrates varied metadata
]    formats from different clients into a simple unified display.
]  - Browser and URL integration works with web browsers and
]    other viewers to provide links to content on the Gnutella network. All
]    new and very cool! Plus you can find out more about local and remote
]    files with Bitzi lookup services.
]  - Full library management features put you in charge of
]    your collection, with a nested folder browser, file-level sharing
]    control, hit and fetch statistics, metadata editor, automatic change
]    scanning, Gnutella-aware move, copy, delete, rename and more.
]  - Automatic file hashing and metadata acquisition for
]    common music, image and application files provide the benefits of
]    metadata without the work. Operates in the background to keep your
]    metabase up to date. Shareaza automatically acquires metadata from
]    asf, bmp, exe, dll, gif, jpg, mp3, mpg, png, wma, and wmv files.
]  - Ultrapeer and query routing technology use intelligent
]    connections to save your bandwidth. With other Ultrapeer enabled
]    clients only. Shareaza gives you full control over your level of
]    Ultrapeer involvement.
]  - A kick-ass interface keeps you informed and gives you
]    the power to do what you want. Don't be restricted by dictated
]    "tab-based" interfaces. Shareaza lets you organise your session with
]    multiple windows and easy to use navigation. Right-click anywhere for
]    an intuitive popup command menu.
]  - Detailed diagnostic and monitoring capabilities
]    including smart packet dumping, search monitoring and filtered hit
]    monitoring enable you to keep an eye on the network.
]  - Fully featured and infinitely configurable
]    performance graphing visualises key indicators such as bandwidth
]    utilisation, neighbour interactions, downloads, uploads and network
]    characteristics over time. Create multiple graphs to keep track of
]    performance (or just to look cool).
]  - Chat peer-to-peer with other Shareaza or LimeWire
]    users. You can initiate a conversation from a search result, a download
]    or an upload.
]  - Support for the latest Gnutella protocol enhancements
]    such as v0.6 handshaking, Ultrapeer/Supernode, QRP, BYE, pong caching,
]    flow control, EQHD, HUGE, HTTP/1.1, chat and XML metadata ensure
]    your PC is a good citizen on the network.
]  - Extensive options put power users in total control of
]    the software, from high-level settings to low-level communications
]    parameters, buffer sizes, caching and routing policies.
]  - Shareaza's lean and mean design doesn't chew up half your
]    available memory and CPU time, or take ages to load. And it doesn't
]    crash your PC if you leave it running for too long. Multithreaded
]    native code ensures smooth operating conditions without going
]    overboard and taxing resources.

   -- http://www.shareaza.com/about/

Here are the two most recent gtk-gnutella announcements:

----- Forwarded message from Raphael Manfredi <[email protected]> -----

Date: Mon, 24 Jun 2002 23:44:24 +0200
From: Raphael Manfredi <[email protected]>
Subject: [Gtk-gnutella-announce] gtk-gnutella 0.90 beta released
To: [email protected],
[email protected]

Version 0.90 beta of gtk-gnutella has been released on sourceforge.
You can access it here:

http://prdownloads.sourceforge.net/gtk-gnutella/gtk-gnutella-0.90.tar.gz

Here is the highlight of the changes since last version (0.85):

- All configuration can now be made from the GUI.
- Fully redesigned search filters to work like ipchains/iptables on Linux.
- Obsoleted experimental "auto-download", superseded by the new filtering code.
- Added Gnutella bandwidth management.
- Both HTTP and Gnet bandwidth is now displayed real time.
- Added full HUGE support (Hash/URN Gnutella Extensions).
- Added support for local host preference.
- Gtk-gnutella can do traffic compression when connecting to a node that
 also supports it.
- Uploads and downloads will now show User-Agent/Server information.
- Added status to the download queue.
- You can now freeze the download queue whilst manipulating it.
- Searches can now be listed on the left side of the screen, or as tabs like
 in the previous versions.
- Added automatic banning of servents that are hammering us.
- Many more cool new features that you'll discover whilst playing with it.

Enjoy!

Raphael


----- End forwarded message -----

----- Forwarded message from Raphael Manfredi <[email protected]> -----

Date: Mon, 01 Jul 2002 23:08:50 +0200
From: Raphael Manfredi <[email protected]>
Subject: [Gtk-gnutella-announce] gtk-gnutella 0.90 beta2 released
To: [email protected],
[email protected]

Version 0.90 beta2 of gtk-gnutella has been released on sourceforge.
You can access it here:

http://prdownloads.sourceforge.net/gtk-gnutella/gtk-gnutella-0.90b2.tar.gz

Here is the highlight of the changes since last beta (0.90 beta):

- Bug fixes and slight improvements.
- Greatly enhanced auto-selection in searches to use far less CPU.
- Will now warn user when encountering a newer version of gtk-gnutella.
- URN searches now supported, by typing "urn:sha1:" in search text, followed
 by the base32 value.

Raphael

----- End forwarded message -----


[1] http://impressive.net/archives/fogo/[email protected]
[2] http://bitzi.com/bitizen/ger
[3] http://impressive.net/archives/fogo/[email protected]

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: deployment of urn:sha1 URIs in P2P clients (was Re: The Content-Addressable Web)

Well, I can't find the exact message in the archives but I remember you
saying something about Gordon Mohr. Since it doesn't seem like you're on
p2p-hackers I thought you might be interested in this thread:

http://zgp.org/pipermail/p2p-hackers/2002-July/000713.html
http://zgp.org/pipermail/p2p-hackers/2002-July/000715.html
http://zgp.org/pipermail/p2p-hackers/2002-July/000719.html
http://zgp.org/pipermail/p2p-hackers/2002-July/000720.html

All the best,
--
Aaron Swartz [http://www.aaronsw.com]
4FAC4838B7D8D13FA6D92EDB4145521E79F0DF4B
I will be in San Diego for the O'Reilly Open Source Convention the 24-26
July.

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny