Re: new mailing list: fogf

On Fri, Feb 02, 2001 at 02:32:26PM -0500, Eric Prud'hommeaux wrote:
> The recent displacement of Gerald Oskoboiny from the number one slot
> on http://www.google.com/search?q=gerald by some upstart US president
> has created the need for a new mailing list "fogf". I suggest everyone
> join this list to be in the new in-crowd.

Ha!

Funny, if I go to that URL I'm still #1, but if I visit it from
a machine at MIT, Gerald Ford is at the top. (temporarily!)
So it depends which of Google's servers you hit.

For some reason, some of their servers keep losing pages from
their indexes: my home page disappeared completely for a while,
and now the W3C html validator isn't there [1], and that used to
be the #1 result for a search for "HTML". [2] Hey, now the #1
result for "HTML" is my old validator at the U of Alberta.

These guys seem to be taking the Napster approach to load
balancing: hundreds of servers with none of them knowing anything
about any of the others. ;)

[1] http://www.google.com/search?q=w3c+html+validation+service
[2] http://www.google.com/search?q=html

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Google indexing again (was Re: new mailing list: fogf)

On Fri, Feb 02, 2001, Gerald Oskoboiny wrote:
> For some reason, some of their servers keep losing pages from
> their indexes: my home page disappeared completely for a while,
> and now the W3C html validator isn't there [1], and that used to
> be the #1 result for a search for "HTML". [2] Hey, now the #1
> result for "HTML" is my old validator at the U of Alberta.

I sent them an email about that two weeks ago and just got a reply
this afternoon. Unfortunately, it was the standard reply (I think I
already complained about a similar problem a while back and got the
same reply). I am copying it here; I don't think that disclosing an
automated reply is a violation of netiquette:

| Every time we update our database of web pages, our index invariably
| shifts: We find new sites, we lose some sites, and sites ranking may
| change.

Huh! I wonder how updating could make them lose sites.

| Here's some more information about how Google ranks pages: Google
| finds most of its pages when our robots crawl the web and jump from
| page to page via hyperlinks. The best way to ensure listing on
| Google is for a page to be linked from lots of other pages.

I think that the W3C HTML validator is linked from a few other pages.
:-)

[ useless crap ]
| If your page does not appear at all, there is another possible
| explanation.  Sometimes websites are not reachable when we tried to
| crawl them. We try to crawl a site multiple times, but if the site
| is not reachable, that can cause it to be left-out of the current
| index.  If that was a transient problem, the site will likely show
| up in the next index.

Hmmm... I don't believe that, because the other page that
disappeared, the one I originally complained about, was on my site,
which is never down. ;-)

I think that they have a problem in their indexing process. I would
actually like to know more about how Google works, in detail:
indexing, storage of information, search algorithms, clustering,
etc.

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Mais alors, tout se recoupe !

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny