Reputation and Trust

by Gerald Oskoboiny


(this is very drafty; it became a bit unfocused and kind of got away from me. Last major edit was Sep 2005; have only been adding links since then)

Reputation and trust in real life

In real life, we trust the people we are personally connected to somehow: friends, friends of friends, and the people and businesses they use or recommend.

We should be able to use our online relationships and connections to decide whom to trust online as well.

@@ needs work

The (only) solution to spam and phishing

Spam filtering technology has come a long way in the last few years, and email administrators can now block 99.9+% of incoming junk mail with few false positives. Efforts are underway to improve this further by allowing message senders to be authenticated, e.g. using SPF, Sender-ID, DKIM, or traditional PGP signatures.
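SPF, for example, lets a domain owner publish in a DNS TXT record which hosts may send mail claiming to be from that domain. A minimal sketch, with example.com and its mail host as placeholders:

    example.com.   IN TXT   "v=spf1 mx a:mail.example.com -all"

A receiving server that gets mail claiming to be from user@example.com can look up that record and reject messages arriving from anywhere else.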

But even if we know where mail is coming from, we still need to be able to decide whether a given source is trustworthy. Those of us who have been online for years have various tricks for figuring that out, but most people don't know how.

Various phishing attacks like IDN spoofing come and go, but the basic problem of reputation and trust will always be there -- most users have no way of knowing that paypal.com is more trustworthy than paypal-security.com, paypa1.com (with the digit 1 in place of the letter l), or pаypal.com (paypal.com with the Unicode character U+0430 (а), CYRILLIC SMALL LETTER A, in place of the first "a").
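A tiny Python sketch shows why this is invisible to the eye (the variable names are mine; the second string really does contain the Cyrillic letter):

    # These two names render almost identically, but only the first is PayPal:
    # the second contains U+0430, CYRILLIC SMALL LETTER A, instead of ASCII "a".
    real    = "paypal.com"
    spoofed = "p\u0430ypal.com"

    print(real == spoofed)         # False
    print(spoofed.encode("idna"))  # the punycode ("xn--...") form actually sent to DNS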

The only way to solve that problem is for browsers and email clients to display some kind of trust meter whenever they present web pages or email messages to users.

But... how should they decide which sites are trustworthy?

Decentralized trust

Some users may be happy with a centralized source of trust information, but the whole world is never going to agree on a single source of that info. Even within a certain geographical region or interest group, people will have wildly differing views on which sources of data are trustworthy.

Many users might be comfortable trusting Microsoft to tell them which sites are legitimate, and MSN could probably afford to build such a system just for their users, but I wouldn't want to trust it myself.

Many people currently place a lot of trust in Google because of their generally good track record of filtering good sites from bad and their "Don't be evil" policy, but like any public corporation they will only stay free from evil as long as doing so is profitable. So I wouldn't want to trust them indefinitely either.

So it is clear that web browsers and email user agents will need to allow users to select from multiple independent sources of trust data. Ideally they would allow trust data to be aggregated from a variety of sources, possibly based on some hierarchy or mathematical model. I may want to configure my browser to trust Google's notion of a site's reputation by default, but override Google's data in some cases, for example to exclude specific search engine spammers or phishing sites as they become known.
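A minimal sketch of that kind of layered lookup, in Python; the provider hook, the local lists, and the 0-to-1 score scale are all made up for illustration:

    # Hypothetical layered reputation lookup: local overrides win,
    # otherwise fall back through subscribed providers in order.
    LOCAL_BLOCK = {"paypa1.com", "paypal-security.com"}
    LOCAL_ALLOW = {"paypal.com"}

    def provider_google(domain):
        """Placeholder for a subscribed reputation feed; returns a
        score in [0, 1] or None if the provider has no opinion."""
        return None

    PROVIDERS = [provider_google]

    def reputation(domain):
        if domain in LOCAL_BLOCK:
            return 0.0
        if domain in LOCAL_ALLOW:
            return 1.0
        for provider in PROVIDERS:
            score = provider(domain)
            if score is not None:
                return score
        return 0.5   # unknown: neither trusted nor blocked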

Email administrators already work this way -- typically, sites will subscribe to a set of blacklists of known bad guys (e.g. via DNSBLs), but override that with a local whitelist of sites they never want blocked. The popular SpamAssassin mail filtering system calculates spam scores for messages using hundreds of tests with weighted scores; the most useful of these have proven to be the URI blacklists and email message checksum data published by various sites. The final decision of which of these sources to trust is left to the site's email administrator, and may be overridden by individual users.
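The same pattern in miniature, as a Python sketch: ask a DNSBL about a connecting host (the zen.spamhaus.org zone is just a familiar example), but let a local whitelist win.

    import socket

    LOCAL_WHITELIST = {"192.0.2.10"}   # hosts we never want blocked

    def listed_in_dnsbl(ip, zone="zen.spamhaus.org"):
        """A host is listed if reversed-IP.zone resolves; NXDOMAIN means not listed."""
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:
            return False

    def should_block(ip):
        if ip in LOCAL_WHITELIST:
            return False          # the local whitelist overrides the blacklist
        return listed_in_dnsbl(ip)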

Use Cases

When considering how to build a system to determine reputation info for a given individual or organization, it is useful to keep in mind specific use cases that we would like to handle.

Short term, deployable immediately

Medium term, deployable with a bit of new infrastructure

Long term

@@ move examples to some section below with more detail?

Abuse cases to keep in mind

How to get there from here

bootstrap using existing data sources, screen scraping; use our own system to maintain libraries of screen scraping code

@@

Existing sources of reputation information

Next steps

things to do, projects for the enthusiastic:

software libraries, configuration updates

Anyone who wants to use stock quotes in their apps has to implement screen-scraping code to grab quotes from Yahoo or somewhere; whenever Yahoo changes their page layout or URIs, thousands of people have to update their code. For Perl there are libraries in CPAN to do this, but such libraries typically aren't updated quickly enough to be relied upon. It should be possible for anyone to publish a stable API for such a service ("current stock quotes"), and for anyone else to subscribe to implementations of it, using RSS and a web of trust to auto-update their code. (Dapper sounds like it aims to solve related problems; I haven't looked at it closely.)
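A rough Python sketch of the subscribing side; the feed URL, entry fields, and trust list are all hypothetical, and real code would verify a cryptographic signature rather than trusting an author name in the feed:

    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "https://example.org/stock-quote-scrapers.rss"   # hypothetical feed of code updates
    TRUSTED_AUTHORS = {"gerald@impressive.net"}                 # my web of trust, drastically simplified

    def fetch_updates():
        """Yield (author, code) pairs from the update feed."""
        with urllib.request.urlopen(FEED_URL) as f:
            tree = ET.parse(f)
        for item in tree.iter("item"):
            author = item.findtext("author", default="")
            code = item.findtext("description", default="")
            yield author, code

    def apply_trusted_updates():
        for author, code in fetch_updates():
            if author in TRUSTED_AUTHORS:
                # install/replace the local scraping library here
                print("would install update from", author)
            else:
                print("ignoring update from untrusted source", author)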

This general idea can be applied to many other projects besides Perl libraries:

Exim configuration (e.g. ACLs)

Use RSS and a web of trust to create/publish/use Exim config info. For example, if I see an idea like "reject mail whose HELO does not match" or "reject mail whose subject contains raw 8-bit characters", how do I know whether to trust that advice?
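For instance, the "reject mail whose HELO does not match" advice might circulate as an ACL fragment like this (a sketch only; mail.example.org stands in for the local host name):

    acl_check_helo:
      # reject senders who claim to be this very host in their HELO --
      # legitimate mail from ourselves never arrives that way
      deny  condition = ${if eq {$sender_helo_name}{mail.example.org}}
            message   = forged HELO: you are not $sender_helo_name
      accept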

Spamassassin rules

Also use RSS+WoT to create/publish/use SpamAssassin rules, à la SARE.
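A published rule could be as small as this (a hedged example; the rule name and score are arbitrary):

    header   LOCAL_SUBJ_RAW_8BIT  Subject =~ /[\x80-\xff]/
    describe LOCAL_SUBJ_RAW_8BIT  Subject contains raw 8-bit characters
    score    LOCAL_SUBJ_RAW_8BIT  1.2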

sa-update provides automatic updates for SpamAssassin rules.

greasemonkey scripts

Also use RSS+WoT to distribute/select Greasemonkey scripts for various sites.

Performance issues

The Semantic Web approach may be slow, but let's optimize later.

Existing successful applications

Bad ideas

examples of the wrong way to do things:

Todo

@@ semweb motivation, techniques

other things to cover somewhere:

References

this article inspired in part by...

things I haven't read but probably have ideas worth stealing:

various related articles:

existing reputation data/services:


Last modified: $Date: 2011/05/09 21:54:42 $
Gerald Oskoboiny, <gerald@impressive.net>