The author

Internet, Security & Privacy

Steven Pemberton, CWI, Amsterdam



According to one report, in 2013 3.5 zettabytes of data were produced worldwide.

Big Numbers

In computing we have to deal with big numbers. So big that we lose track of what they really mean.

For instance: a typical desktop computer has an internal clock of around 3GHz. What does that mean?

It means that the computer clock ticks as many times per second as a regular clock ticks in a person's lifetime.

That's a big number.
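The comparison can be checked with a little arithmetic. This is a minimal Python sketch; the 365.25-day year and the one-tick-per-second wall clock are assumptions made for the illustration.

```python
# Assumption: a regular clock ticks once per second, and a year is 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

clock_hz = 3e9  # 3 GHz: three thousand million ticks per second

# How long a once-per-second clock needs to tick that many times.
years_for_wall_clock = clock_hz / SECONDS_PER_YEAR

print(round(years_for_wall_clock))  # 95, a (long) human lifetime
```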

How to remember all those prefixes

Exa, Peta, Tera: how big are they?

In 1799 the prefix kilo was introduced for 1000, from the Greek word for, yep, "thousand".

It was mainly used for kilograms, and kilometres.

Prefixes 1960

It wasn't until 1960 that a need was sufficiently felt for mega (million), giga (thousand million), and tera (million million). Probably for hydrogen bombs.

These were based on the Greek words for "big", "giant", and "monster".

These are the household prefixes we use: megapixel cameras, gigabyte memories, and terabyte disks.

(Interestingly, we still say 1000 km, and not 1 Megametre)

I have heard youth say "That's Mega Cool" and fairly recently "That's Giga Cool!". I haven't yet heard "That's Tera Cool!".

Prefixes 1975

However, when in 1975 they wanted to upgrade again, they realised they had run out of Greek synonyms for big. So they did a sort-of clever thing.

They observed that tera is 1000⁴, and that tera is one letter short of tetra, the Greek prefix for "four". Bingo!

So we got

Peta (from penta, five): 1000⁵

Exa (from hexa, six): 1000⁶

You probably won't have a petabyte disk in your home for another ten to fifteen years.

Prefixes 1991

In 1991, realising they were sooner or later going to run out of single digit Greek numbers too, and that there were far more letters than digits, they altered the rule a bit, and added:

Zetta (based on hepta, seven): 1000⁷

Yotta (based on octa, eight): 1000⁸

That's as high as it goes for now, but my guess is that based on nona, the next one will be something like xonna, and then based on deca, something like wecca.

[By the way, last year I read the phrase "10,000 trillion gigaelectronvolts" in a news article. The author meant of course to say "10 yottaelectronvolts"].


As I was saying...

According to one report, in 2013 3.5 zettabytes of data were produced.

Which, we now know, is 3.5 × 1000⁷ bytes (which is 3.5 × 10²¹ bytes).

If that were a number of seconds, it would represent 110 million million years.

Which is about 8,000 times the age of the universe.
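For anyone who wants to redo the sum, here is a small Python sketch. The table of prefixes follows the powers of 1000 given earlier; the 365.25-day year and the 13.8 thousand million year age of the universe are assumptions brought in for the illustration.

```python
# Each prefix is a power of 1000, as described above.
PREFIX_POWERS = {
    "kilo": 1, "mega": 2, "giga": 3, "tera": 4,
    "peta": 5, "exa": 6, "zetta": 7, "yotta": 8,
}

def prefix_value(name):
    """Return the multiplier for an SI prefix, e.g. 'tera' -> 1000**4."""
    return 1000 ** PREFIX_POWERS[name]

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumption: 365.25-day year
AGE_OF_UNIVERSE_YEARS = 13.8e9             # assumption: commonly quoted estimate

data_bytes = 3.5 * prefix_value("zetta")   # 3.5 zettabytes

# Pretend each byte is one second and see how many years that is.
years = data_bytes / SECONDS_PER_YEAR
times_age_of_universe = years / AGE_OF_UNIVERSE_YEARS

print(f"{years:.2e} years")                # on the order of 10**14 years
print(f"{times_age_of_universe:,.0f} times the age of the universe")
```

Either way the answer comes out at thousands of times the age of the universe.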

It's a very big number...

Data, not information, and certainly not knowledge

Remember that that 'data' includes things like

Data is increasing exponentially

In computing we are used to exponential growth.

Exponential growth means that something doubles in size every fixed period.

For instance, computers double in power every 18 months (at constant price).

Bandwidth doubles every year (AMSIX now peaks at 4Tb/s, up from an original 64kb/s in 1988. The number of bits it carries per second now equals the number of seconds in more than 100,000 years...)
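The bandwidth figures can be turned into a number of doublings with a short Python sketch. It uses only the two rates quoted above; the 365.25-day year is an assumption.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: 365.25-day year

start_bps = 64e3   # 64 kb/s in 1988
peak_bps = 4e12    # 4 Tb/s peak

# How many times the rate must double to get from the first to the second.
doublings = math.log2(peak_bps / start_bps)

# Treat one second's worth of bits as seconds and convert to years.
years_of_seconds = peak_bps / SECONDS_PER_YEAR

print(f"{doublings:.1f} doublings")       # roughly 26
print(f"{years_of_seconds:,.0f} years")   # more than 100,000
```

Roughly 26 doublings since 1988 is in line with a doubling period of about a year.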

Exponential Data

And data is increasing exponentially as well, although it is trickier to measure what the doubling period actually is, and different reports give different numbers.

Let us assume that data doubles yearly (it may be a bit faster, it may be a bit slower).

Take a piece of paper, draw a line down the middle, and write this year's date in one half:



Now divide the other half in two vertically, and write last year in one half:



Now divide the remaining space in half, and write the year before in one half:



Repeat until your pen is thicker than the space you have to divide in two:



This shows that this year we have produced as much data as in the whole of history before it! And in the last three years we have produced almost 90% of the data produced in all history...
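The paper-folding picture is just a geometric series, which can be sketched in a few lines of Python. The function name is made up for the illustration, and it assumes exact yearly doubling with a history that stretches back indefinitely.

```python
def share_of_recent_years(years):
    """Fraction of all data ever produced that was produced in the most
    recent `years`, assuming production doubles every year."""
    # This year is 1/2 of the total, last year 1/4, the year before 1/8, ...
    # so the most recent n years add up to 1 - (1/2)**n.
    return 1 - 0.5 ** years

for years in (1, 2, 3):
    print(years, share_of_recent_years(years))
# 1 year -> 0.5, 2 years -> 0.75, 3 years -> 0.875
```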

Internet and Control

So all of a sudden, there is a new method of distributing information, much cheaper, much easier to use, accessible for next-to-no money by everyone.

This makes many companies happy, it makes criminals happy, but the state isn't completely happy. Let's start with them.

From my son's history book:

"At that time it was common practice for the church and the state to monitor everything that was said, written and printed. This practice is known as censorship. Anyone who dared to criticise the Church, the King and his officials was prohibited from speaking and could even go to prison. In most countries there were many officials who constantly screened everything that was said or written. [...] The enlightenment thinkers were totally opposed to censorship. They wanted the freedom to express their thoughts and ideas."

And what do we get?

Up until the internet, at least in free democracies, in order to read someone's mail, listen to their phone calls, or come into their houses and read their diaries, the authorities had to get a court order that accepted the need.

Nowadays that has all changed...

Example: Megaupload


Note that this (non US!) website was shut down, and its assets seized, because it had been accused of US federal crimes. It hadn't even gone to court. No "innocent until proved guilty" here.

"The US judge handling the case has expressed doubts about whether the case will come to court"

Megaupload was also used by journalists for passing information anonymously.

Example: Lavabit

Lavabit was a US-based service offering privacy-guaranteed email.

In July 2013 the US federal government obtained a search warrant demanding that Lavabit hand over the private SSL keys to its service, which would affect all Lavabit users.

However, Lavabit was forbidden from telling anyone about this search warrant.

Lavabit responded by closing down its service. It was also not in a position to tell anyone why it was closing down, though many guessed.

The US authorities had discovered that Edward Snowden used Lavabit, and wanted to read his email.

"the government argued that, since the 'inspection' of the data was to be carried out by a machine, they were exempt from the normal search-and-seizure protections of the Fourth Amendment."

In fact, no company is allowed to tell anyone about secret subpoenas that it receives from the US government. This has given rise to the Warrant Canary: a piece of text in a company's annual report saying that it hasn't received any subpoenas in the period. If the text is ever omitted, people can infer that the company has been served.

Example: UK GCHQ

"New documents revealing GCHQ's mass-surveillance activities have detailed an operation codenamed KARMA POLICE, which slurped up the details of 'every visible user on the Internet'."

But they still aren't happy

The state usually justifies spying by pointing to paedophiles and terrorists. Watch for them saying they need more powers after the next terrorist attack.

For instance:

MI5 chief seeks new powers after Paris magazine attack

And: David Cameron will on Monday tell Britain’s intelligence chiefs that he will introduce the so-called snooper’s charter

and: Cameron wants to ban encryption

But it's not only the state

Commercial interests, too, are watching us.

For instance, I travelled to England via the Channel Tunnel last Christmas...

Channel Tunnel

On the journey there, we had to stop the car at a booth at the entrance and type a long reservation number into a touch screen, in order to find out which train we would be put on.

On the way back, we drove up to the booth, and the screen already said "Welcome, Mr. Pemberton"! They must scan and store every car's number plates.

Google, Facebook

We all know how Google and Facebook are using our information.

(And don't forget, if you use Gmail, you are giving not only your own privacy away, but also the privacy of your correspondents).

But recently I was reading the British newspaper The Guardian, and even though I hadn't logged in, it offered me an advert for computer memory for the very computer I was using at that moment.

This isn't the Guardian's doing, but that of the advertising network they use to supply adverts to their site.

And then there's the criminals

In 2012, a non-criminal, just for fun, tried automated login attempts at random internet IP addresses. Well, he or she was really a sort-of criminal, because doing that isn't allowed. But he or she was just doing it for fun. I use this example because most criminals don't reveal their results.

"We started scanning and quickly realized that there should be several thousand unprotected devices on the Internet."

"After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand."

"Starting with one device and assuming a scan speed of ten IP addresses per second, it should find the next open device within one hour. The scan rate would be doubled if we deployed a scanner to the newly found device. After doubling the scan rate in this way about 16.5 times, all unprotected devices would be found; this would take only 16.5 hours. Additionally, with one hundred thousand devices scanning at ten probes per second we would have a distributed port scanner to port scan the entire IPv4 Internet within one hour."


"The completed scan proved our assumption was true. There were in fact several hundred thousand unprotected devices on the Internet making it possible to build a super fast distributed port scanner."
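The numbers in those quotes can be sanity-checked with a short Python sketch. It takes the quoted figures (one hundred thousand devices, ten probes per second each) at face value; counting every one of the 2³² IPv4 addresses, including reserved ranges, is a simplifying assumption.

```python
import math

devices = 100_000                  # quoted number of unprotected devices
probes_per_device_per_second = 10  # quoted scan speed per device

# Each doubling doubles the number of scanners, starting from one device.
doublings = math.log2(devices)

# Time for all devices together to probe every IPv4 address once.
ipv4_addresses = 2 ** 32           # assumption: count every address
scan_seconds = ipv4_addresses / (devices * probes_per_device_per_second)
scan_hours = scan_seconds / 3600

print(f"{doublings:.1f} doublings")   # close to the 16.5 in the quote
print(f"{scan_hours:.1f} hours")      # on the order of one hour
```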

Botnet map

This will only get worse

With the arrival of the $1 computer, many more household devices will have embedded computers.

Like this.

What is to be done?

Lots of things, which I don't have time to go into here, including new laws.

But one important one: end-to-end public-key cryptography.

This would make a lot of other things better too. Less spam for instance.

Public Key Cryptography

You know how in hotels, every room has a different lock, but there is a master key that can open all locks?

Public key cryptography works in a similar way, except the other way round:

The Basics



Public and Private Keys

Digital Signature

Digital Privacy

Combined: Secure messages

Now I can combine both things:

I am guaranteed that no one else will read it, and you are guaranteed that it really is from me. Secure messages.


In reality of course, there are no boxes, locks and keys: it is all done with mathematical formulas and numbers, but the principle is the same:

When you connect to a web site over https, for instance your bank's, all communications are encrypted with a single-key system, but which key to use is first agreed using a public-key system.
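To give a feel for the "mathematical formulas and numbers", here is a toy sketch of a secure message in Python, using textbook RSA with absurdly small primes. Everything in it (the primes, the function names, the message 1234) is made up for the illustration; real systems use keys hundreds of digits long, with padding, from a proper cryptography library.

```python
def make_keys(p, q, e):
    """Build a toy RSA key pair from two (far too small) primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse; needs Python 3.8 or later
    return (e, n), (d, n)      # (public key, private key)

def apply_key(key, number):
    """'Lock' or 'unlock' a number with a key: modular exponentiation."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

# Hypothetical key pairs for me (the sender) and you (the recipient).
my_public, my_private = make_keys(61, 53, 17)      # modulus 3233
your_public, your_private = make_keys(89, 97, 5)   # modulus 8633

message = 1234  # the message as a number, smaller than both moduli

# Sender: lock with my private key (signature), then your public key (privacy).
signed = apply_key(my_private, message)
sealed = apply_key(your_public, signed)

# Recipient: unlock with your private key, then with my public key.
unsealed = apply_key(your_private, sealed)
recovered = apply_key(my_public, unsealed)

print(recovered == message)  # True
```

Only the holder of your private key can undo the outer lock, and only something locked with my private key opens correctly with my public key, which is the combination described above.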

That's it

This is really all that is involved.

There are extra advantages though.

Credit cards

For instance, I could order something from a shop by sending a secure message to them. I know that only the shop will read it, and the shop knows it really is from me.

But instead of giving them my credit card number, I give them a box with my credit card number in it, locked with my private key, and the bank's public key. So the shop doesn't know what my credit card number is, but they can send it on to the bank, and I know that only the bank can read it, and the bank knows that it really is from me.

So the only people who know what my credit card number is are me and my bank (in fact, there is no reason really to have credit card numbers at all in this system because the box can just contain the message "Please pay this shop €20" and the bank knows it is from me).

The shop doesn't need to know my address for similar reasons.

No Passwords Needed Anymore

I could try to log in to a site.

I say "Hi, I'm Steven"

The site says "Oh yeah? Here's a random message. Encrypt that for me".

My browser encrypts it with my private key, the site checks it with my public key, and lets me in.

(Equivalently they could also say "Here is a message encrypted with your public key; tell me what it says"; makes no difference.)
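That exchange can be sketched in Python with the same kind of toy textbook RSA. The key values, function names and challenge size are all invented for the illustration, and a real login scheme would use a proper signature algorithm with full-size keys.

```python
import secrets

# Toy key pair (assumption: textbook RSA with tiny primes 61 and 53).
N = 61 * 53   # modulus, part of both keys
E = 17        # public exponent: the site knows (E, N)
D = 2753      # private exponent: only my browser knows this

def site_make_challenge():
    """The site picks a fresh random number for every login attempt."""
    return secrets.randbelow(N - 2) + 2

def browser_respond(challenge):
    """My browser 'encrypts' the challenge with my private key."""
    return pow(challenge, D, N)

def site_check(challenge, response):
    """The site undoes the response with my public key and compares."""
    return pow(response, E, N) == challenge

challenge = site_make_challenge()
response = browser_respond(challenge)
logged_in = site_check(challenge, response)

print(logged_in)  # True
```

Because the challenge is different every time, an eavesdropper who records one response cannot replay it later, and the private key itself never leaves the browser.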

Your private key

Of course your private key is your crown jewel. If anyone gets their hands on it, you are done for.


The internet was designed in an environment where you could trust everybody.

The infrastructure really needs to be redesigned to take away that design fault.

It is time for public-key cryptography to be an underlying part of the infrastructure. (In fact, they did consider making it part of the infrastructure in the beginning, but computers weren't fast enough then, so it would have required specialised equipment, which was too expensive.)

As part of email, it would reduce spam: you would know if a mail really was from your bank; and it would increase privacy, because you are assured that only the recipient can read the mail.

If ISPs offered it for end-to-end connections, it would mean no one could listen in to your communications (for instance over Wifi).

If done right, it could even do away with the need for passwords.