Disintermediation through Aggregation
(Making your Data your Own)

Steven Pemberton, CWI and W3C, Amsterdam

Some predictions I have heard

"We will never have LCD screens - they will need too many connectors"

"Vector graphics are the future; raster graphics need too much memory"

"Full audio on computers will need too much bandwidth"

"Digital photography will never replace film"

"Moore's Law hasn't got much longer to go" (1977, 1985, 1995, 2005)

Moore's Law

We all know this one: the computing power you can buy at constant cost doubles roughly every 18 months. But people often don't understand its true effects.

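Take a piece of paper, divide it in two, and write this year's date in one half. Then divide the other half in two, and write the date 18 months earlier in one of those halves. Carry on halving the remaining piece, going back another 18 months each time:

(Figure: the repeatedly halved sheet of paper, its pieces labelled 2008, 2006, 2005, 2003, 2002, 2000, 1999 and 1997.)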

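Since the area of each piece stands for the computing power you could buy at that date, this demonstrates that your current computer is more powerful than all the other computers you have ever had put together. (By comparison, the original Macintosh of 1984 had only a tiny amount of computing power.)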
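The arithmetic behind the paper can be checked with a short sketch. It assumes, as Moore's Law is commonly quoted, that computing power at constant cost doubles every 18 months; the function names are illustrative. Each date is 18 months before the previous one, and each computer has half the power of the one after it, so all the earlier computers together always add up to slightly less than today's.

```python
# Sketch of the paper demonstration, assuming a Moore's Law doubling
# time of 18 months (1.5 years).
DOUBLING_YEARS = 1.5


def paper_dates(this_year: int, pieces: int) -> list[int]:
    """Label each successive piece of paper with a date 18 months earlier."""
    return [int(this_year - k * DOUBLING_YEARS) for k in range(pieces)]


def relative_power(pieces: int) -> list[float]:
    """Computing power at each date, relative to today's computer (1.0)."""
    return [0.5 ** k for k in range(pieces)]


dates = paper_dates(2008, 8)
power = relative_power(8)
print(dates)

# Today's computer compared with all the earlier ones combined.
earlier_total = sum(power[1:])
print(f"today: {power[0]:.3f}, all earlier combined: {earlier_total:.3f}")
```

With eight pieces this prints the same dates as the figure, and the earlier computers sum to 1 − 1/2⁷ ≈ 0.992 of today's machine; adding more pieces never gets the total past 1.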

And so, tell us Steven: do we all have a Cray on our desks?

Sure: in fact, current workstations are about 120 Crays' worth.

Even my previous mobile phone, a Nokia 9300, was 35 Crays' worth.

Nielsen's Law

What is less well-known is that bandwidth is also growing exponentially at constant cost, but the doubling time is 1 year!

(Actually 10½ months, according to a recent remark by an executive of one of the larger suppliers.)

Put another way, doubling every year means a 128-fold (2⁷) increase in 7 years, so in 7 years we could have 1 Gigabit connections to the home.
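To see where the 7-year figure comes from, here is a minimal sketch of the compounding. The 8 Mbit/s starting point is an assumption for illustration (the text does not state a baseline); the one-year doubling time is Nielsen's Law as given above, and the function names are hypothetical.

```python
import math

# Nielsen's Law: bandwidth at constant cost doubles roughly every year.
DOUBLING_YEARS = 1.0


def years_to_reach(start_mbps: float, target_mbps: float,
                   doubling_years: float = DOUBLING_YEARS) -> float:
    """Years of exponential growth needed to go from start to target bandwidth."""
    return doubling_years * math.log2(target_mbps / start_mbps)


def bandwidth_after(start_mbps: float, years: float,
                    doubling_years: float = DOUBLING_YEARS) -> float:
    """Bandwidth after the given number of years of doubling."""
    return start_mbps * 2 ** (years / doubling_years)


# Hypothetical starting point: an 8 Mbit/s home connection.
print(f"{years_to_reach(8, 1000):.1f} years to reach 1 Gbit/s")    # log2(125) ≈ 6.97
print(f"{bandwidth_after(8, 7):.0f} Mbit/s after 7 years")          # 8 × 128 = 1024
```

With the 10½-month doubling time mentioned above, passing `doubling_years=10.5 / 12` brings the same jump down to a little over six years.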

What if a Rose didn't have a Name?

The Sapir-Whorf hypothesis: there is a connection between thought and language.

If you haven't got a word for it, you can't think it.

If you don't perceive it as a concept, you won't invent a word for it.

For example, the Dutch word gezellig, which has no exact English equivalent.

An example: The Meaning of Liff

The Deeper Meaning of Liff: A Dictionary of Things There Aren't Any Words for Yet — But There Ought to Be

By Douglas Adams and John Lloyd

Such as:

PEORIA (n.): The fear of peeling too few potatoes.

ABINGER (n.): Person who washes up everything except the frying pan, the cheese grater and the saucepan which the chocolate sauce has been made in.

DUNGENESS (n.): The uneasy feeling that the plastic handles of the over-loaded supermarket carrier bag you are carrying are getting steadily longer.

Web 2.0

The term Web 2.0 was coined by a book publisher (O'Reilly) as a name to build a series of conferences around.

It captures the idea of Web sites that gain value from their users adding data to them, such as Wikipedia, Facebook, Flickr, ...

But the concept existed before the term: eBay was already Web 2.0 in the era of Web 1.0.

The dangers of Web 2.0

By putting a lot of work into a website, you commit yourself to it, and lock yourself in to its data formats too.

This is similar to data lock-in with software: when you use a proprietary program, you commit yourself and lock yourself in, and moving comes at great cost.

This was one of the justifications for creating XML: it reduces the possibility of data lock-in, and having a standard representation for data makes it easier to use the same data in different ways too.

But there is no standard way of getting your data out of one Web 2.0 site and into another.
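XML supplies the standard syntax, but not the agreement on what to export. As a minimal sketch of what such an agreement would make possible, the code below writes photo records to XML with Python's standard ElementTree module and reads them back, as a second site could. The element names photos, photo, title and tag, and the records themselves, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical photo records, as one site might hold them internally.
photos = [
    {"file": "canal.jpg", "title": "Canal at dusk", "tags": ["amsterdam", "water"]},
    {"file": "office.jpg", "title": "Office entrance", "tags": ["amsterdam", "work"]},
]


def export_photos(records) -> str:
    """Serialise photo records to XML (element names invented for this sketch)."""
    root = ET.Element("photos")
    for rec in records:
        photo = ET.SubElement(root, "photo", file=rec["file"])
        ET.SubElement(photo, "title").text = rec["title"]
        for tag in rec["tags"]:
            ET.SubElement(photo, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


def import_photos(xml_text: str):
    """What a second site could do with the same XML: rebuild the records."""
    root = ET.fromstring(xml_text)
    return [
        {
            "file": p.get("file"),
            "title": p.findtext("title"),
            "tags": [t.text for t in p.findall("tag")],
        }
        for p in root.findall("photo")
    ]


xml_text = export_photos(photos)
print(xml_text)
assert import_photos(xml_text) == photos  # nothing, including tags, is lost
```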

How do you decide?

As an example, if you commit to a particular photo-sharing website, you upload thousands of photos, tagging them extensively, and then a better site comes along. What do you do?

How do you decide which social networking site to join? Do you join several and repeat the work? I am currently being bombarded by emails from networking sites (LinkedIn, Dopplr, Plaxo, Facebook, MySpace, Hyves, Spock...) telling me that someone wants to be my friend, or business contact.

What about genealogy sites? You choose one and spend months creating your family tree. The site then spots people in your tree who also appear in other trees, and suggests you get together. But suppose a really important tree is on another site?

And what if it dies? Or your account is deleted?

What if your chosen site closes down? All your work is lost.

This happened with MP3.com, for instance. And Stage6. And Pownce. And Ficlets. And Jaiku. And Google Video. And Magnolia.

What if your account gets closed down? One person's Google account was hacked, and so the account was closed: four years of email lost, no calendar, no Orkut.

Then there was someone whose Facebook account was closed. Why? Because he was trying to download all his friends' email addresses into Outlook.

Or the woman whose account was closed for the heinous crime of posting a photo of herself breastfeeding.

Metcalfe's Law

Metcalfe proposed that the value of a network is proportional to the square of the number of nodes:

v(n) ∝ n²

Simple maths shows that if you split a network into two, it halves the total value:

(n/2)² + (n/2)² = n²/4 + n²/4 = n²/2

This is why it is good that there is only one email network, and bad that there are so many Instant Messenger networks. It is why it is good that there is only one World Wide Web.
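The same arithmetic extends to any number of networks: splitting n users evenly over k unconnected networks gives k·(n/k)² = n²/k, so every extra network costs value. A minimal sketch, taking the constant of proportionality in Metcalfe's Law as 1 (an assumption that cancels out of the ratios); the function names are illustrative.

```python
def network_value(n: float) -> float:
    """Metcalfe's Law, with the proportionality constant taken as 1."""
    return n ** 2


def split_value(n: float, parts: int) -> float:
    """Total value when n users are split evenly over `parts` unconnected networks."""
    return parts * network_value(n / parts)


n = 1_000_000
for parts in (1, 2, 4, 10):
    share = split_value(n, parts) / network_value(n)
    print(f"{parts:>2} network(s): {share:.0%} of the value of a single network")
```

This prints 100%, 50%, 25% and 10%: one email network keeps all the value, while ten competing messaging networks of equal size keep a tenth of it.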

Walled gardens

All the Web 2.0 examples show Metcalfe's Law in action.

Web 2.0 partitions the Web into a number of topical sub-Webs, and locks you in, thereby reducing the value of the network as a whole.

This is why you should have a Web Site

What should really happen is that you have a personal Website, with your photos, your family tree, your business details, and aggregators then turn this into added value by finding the links across the whole web.

So what do we need to realize this?

Firstly and principally, machine-readable Web pages.

When an aggregator comes to your Website, it should be able to see that this page represents (a part of) your family tree, and so on.

Machine-readable Web Sites

One of the technologies that can make this happen has the catchy name of RDFa.

You could describe it as a CSS for meaning: it allows you to add a small layer of markup to your page that adds machine-readable semantics.

It allows you to say "This is a date", "This is a place", "This is a person", and uniquely identify them on your web page: it turns your page into data.

It is comparable to microformats, but generalised.
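To give a feel for what an aggregator does with such markup, here is a deliberately tiny extractor written with Python's standard html.parser. It is a sketch, not a conforming RDFa processor: it only understands the about, property and content attributes, does not expand prefixes such as foaf:, and assumes property elements contain plain text. The page URL, the ex: prefix and the dc:modified date in the sample are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Sample page fragment; the URL, ex: prefix and date are hypothetical.
PAGE = """
<div about="#me" typeof="foaf:Person">
  My name is <span property="foaf:name">Steven Pemberton</span>.
  I work in <span property="ex:city">Amsterdam</span>.
  Updated <span property="dc:modified" content="2008-03-01">in March</span>.
</div>
"""


class ToyRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from about/property/content.

    Simplifications: prefixes are not expanded; typeof, vocab and resource are
    ignored; property elements must hold plain text; void elements such as <br>
    (which have no end tag) would unbalance the element stack.
    """

    def __init__(self, base):
        super().__init__()
        self.base = base      # URL of the page, used to resolve about="#..."
        self.stack = []       # one entry per open element: its subject, or None
        self.pending = None   # (subject, property) waiting for element text
        self.text = []
        self.triples = []

    def subject(self):
        # The nearest enclosing about attribute sets the subject.
        for s in reversed(self.stack):
            if s is not None:
                return s
        return self.base

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.stack.append(urljoin(self.base, a["about"]) if "about" in a else None)
        if "property" in a:
            if "content" in a:   # explicit machine-readable value
                self.triples.append((self.subject(), a["property"], a["content"]))
            else:                # value is the element's text content
                self.pending = (self.subject(), a["property"])
                self.text = []

    def handle_data(self, data):
        if self.pending:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending:
            subj, prop = self.pending
            self.triples.append((subj, prop, "".join(self.text).strip()))
            self.pending = None
        if self.stack:
            self.stack.pop()


parser = ToyRDFaExtractor(base="http://example.org/home.html")
parser.feed(PAGE)
parser.close()
for triple in parser.triples:
    print(triple)
```

Note how the content attribute lets the human-readable text ("in March") differ from the machine-readable value, which is the "This is a date" case from the text.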

RDF

Relational databases hold isolated data: the data can be joined, but usually isn't.

We could formulate a Metcalfe's Law for data.

RDF offers a future for data: a web of data.

Data is automatically joined. You really don't (need to) know the structure. You can just add a new piece of data (a new fact) with no extra work.
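The claim that RDF data joins automatically can be illustrated with plain Python sets of (subject, predicate, object) triples standing in for RDF graphs. This is only a sketch: the ex: identifiers are hypothetical, and a real application would use full IRIs and an RDF library rather than strings in a set.

```python
# Two independently published sets of facts, as (subject, predicate, object)
# triples. All identifiers are hypothetical.
my_site = {
    ("ex:steven", "ex:name", "Steven Pemberton"),
    ("ex:steven", "ex:worksAt", "ex:cwi"),
}
other_site = {
    ("ex:cwi", "ex:name", "CWI"),
    ("ex:cwi", "ex:locatedIn", "Amsterdam"),
}

# Joining the two sources is just set union: no schema, no table definitions.
web = my_site | other_site

# Adding a new fact needs no extra work either.
web.add(("ex:steven", "ex:memberOf", "ex:w3c"))


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# A query that needs both sources: in which city does Steven work?
cities = {city
          for org in objects(web, "ex:steven", "ex:worksAt")
          for city in objects(web, org, "ex:locatedIn")}
print(cities)  # {'Amsterdam'}
```

Neither source had to know the other's structure: the shared identifier ex:cwi is what links them.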

Advantages of machine-readable pages

If a page has machine-understandable semantics, you can do lots more with it.

Once a search engine can derive from the document that the text "the prime minister" refers to Gordon Brown, a search for "Gordon Brown" can find that page too, even if it doesn't mention him by name; a browser might also offer additional information about him.
If the browser really knows that something is an address, it can offer to add it to your address book, or find it for you on a map.
If the browser really knows that something is an announcement for an event like a conference, and can identify the sub-parts, it can offer to add it to your agenda, find it on a map, locate hotels, look up flights, ...
Upstream processors can also use the information for other purposes, such as transforming content to different devices.
Aggregators can create value by joining data across pages.
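As one concrete case of "add it to your agenda": once a browser has identified an event's sub-parts, turning them into an iCalendar (RFC 5545) entry that calendar programs can import is straightforward. The event data below is hypothetical, and the sketch skips details such as folding long lines.

```python
from datetime import datetime, timezone

# Hypothetical event data, as a browser might have extracted it from a page.
event = {
    "summary": "Example Web Conference",
    "location": "Amsterdam",
    "start": datetime(2008, 5, 20, 9, 0, tzinfo=timezone.utc),
    "end": datetime(2008, 5, 20, 17, 0, tzinfo=timezone.utc),
    "uid": "example-web-conference-2008@example.org",
}


def ical_time(dt: datetime) -> str:
    """Format an aware datetime as UTC in iCalendar form, e.g. 20080520T090000Z."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def escape_text(value: str) -> str:
    """Escape the characters RFC 5545 treats specially in TEXT values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(ev: dict, stamp: datetime) -> str:
    """Build a minimal one-event calendar (long lines are not folded)."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//agenda sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{ev['uid']}",
        f"DTSTAMP:{ical_time(stamp)}",
        f"DTSTART:{ical_time(ev['start'])}",
        f"DTEND:{ical_time(ev['end'])}",
        f"SUMMARY:{escape_text(ev['summary'])}",
        f"LOCATION:{escape_text(ev['location'])}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end in CRLF


print(to_icalendar(event, stamp=datetime.now(timezone.utc)))
```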

Keep your data your own

So rather than putting all your data on someone else's website, where its semantics are only implied by the fact that it is there, you should put your own data on your own website, with explicit semantics.

Then you get the true web effect, with its full Metcalfe value.

Advantages

No need for middle men: "Disintermediation"
Sell your stuff on your own page
Join a new photo site by just pointing it to your website
Your data belongs to you: you get to keep all your comments, reviews, ...

You can still use software to create your site (think how Blogger works)

Is it already there?

If eBay was Web 2.0 in the era of Web 1.0, is there already some Web 3.0 out there now?

Yes, I think so: Google News is an example, even though its semantics are implicit rather than explicit.

Where should you have your Website?

It doesn't really matter, since Websites are largely interoperable, but I am particularly charmed by this sort of device:

(Pictured: a router containing a webserver, and a Freecom storage gateway.) These are wireless routers containing network storage and a media server for use in your house, while offering FTP and a world-class Webserver to the outside. So you can switch off all your machines and still serve Web pages to the outside world, with rather low energy use.

Summary

Web 2.0 is damaging to the Web by dividing it into topical sub-webs.

With machine-readable pages, we don't need those separate websites, but can reclaim our data, and still get the value.

Web 3.0 sites will then aggregate data from the web, and in so doing add value that will attract users.

Full text at: http://www.cwi.nl/~steven/vandf/2008.03-website.html
