Internet, Security & Privacy

Big Numbers

In computing we have to deal with big numbers. So big that we lose track of what they really mean.

For instance: a typical desktop computer has an internal clock of around 3 GHz. What does that mean?

It means that the computer clock ticks as many times per second as a regular clock ticks in a person's lifetime.

That's a big number.
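A quick check of that comparison, as a sketch only: the 80-year lifetime and the clock that ticks once a second are my assumptions, not figures from the slides.

```python
# Compare a 3 GHz clock with a one-tick-per-second clock over a lifetime.
# Assumptions (illustrative only): an 80-year lifetime, 365.25 days per year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

cpu_ticks_per_second = 3e9                    # 3 GHz
lifetime_ticks = 80 * SECONDS_PER_YEAR        # ticks of a regular clock in 80 years
ratio = cpu_ticks_per_second / lifetime_ticks

print(f"CPU ticks in one second: {cpu_ticks_per_second:.2e}")
print(f"Clock ticks in 80 years: {lifetime_ticks:.2e}")
print(f"Ratio: {ratio:.2f}")
```

The two numbers come out within about 20% of each other, which is close enough for the comparison to hold.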

How to remember all those prefixes

Exa, Peta, Tera: how big are they?

In 1799 the prefix kilo was introduced for 1000, from the Greek word for, yep, "thousand".

It was mainly used for kilograms, and kilometres.

Prefixes 1960

It wasn't until 1960 that a need was sufficiently felt for mega (million), giga (thousand million), and tera (million million). Probably for hydrogen bombs.

These were based on the Greek words for "big", "giant", and "monster".

These are the household prefixes we use: megapixel cameras, gigabyte memories, and terabyte disks.

(Interestingly, we still say 1000 km, and not 1 megametre.)

I have heard youth say "That's Mega Cool" and fairly recently "That's Giga Cool!". I haven't yet heard "That's Tera Cool!".

Prefixes 1975

However, when in 1975 they wanted to upgrade again, they realised they had run out of Greek synonyms for big. So they did a sort-of clever thing.

They observed that tera is 1000⁴, and that tera is one letter short of tetra, the Greek prefix for "four". Bingo!

So we got

Peta (from penta, five): 1000⁵

Exa (from hexa, six): 1000⁶

You probably won't have a petabyte disk in your home for another ten to fifteen years.

Prefixes 1991

In 1991, realising they were sooner or later going to run out of single-digit Greek numbers too, and that there were far more letters than digits, they altered the rule a bit, and added:

Zetta (based on hepta, seven): 1000⁷

Yotta (based on octa, eight): 1000⁸

That's as high as it goes for now, but my guess is that based on nona, the next one will be something like xonna, and then based on deca, something like wecca.

[By the way, last year I read the phrase "10,000 trillion gigaelectronvolts" in a news article. The author meant of course to say "10 yottaelectronvolts"].
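As a sketch of how the prefixes stack up, here is that news-article figure put through a small converter. The function name `with_prefix` is my own invention, and I am taking "trillion" to mean 10¹².

```python
# SI prefixes from the slides, in order, each one a further factor of 1000.
PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def with_prefix(value, unit):
    """Express value (given in base units) with the largest fitting prefix."""
    power = 0
    # Divide by 1000 until the number is below 1000 or we run out of prefixes.
    while value >= 1000 and power < len(PREFIXES) - 1:
        value /= 1000
        power += 1
    return f"{value:g} {PREFIXES[power]}{unit}"

# "10,000 trillion gigaelectronvolts", with trillion assumed to be 10**12
electronvolts = 10_000 * 10**12 * 10**9
print(with_prefix(electronvolts, "electronvolts"))
```

The same function turns 3.5 × 10²¹ bytes into "3.5 zettabytes".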

Data

As I was saying...

According to one report, 3.5 zettabytes of data were produced in 2013.

Which, we now know, is 3.5 × 1000⁷ bytes (which is 3.5 × 10²¹ bytes).

If that were a number of seconds, it would represent 110 million million years.

Which is about 10,000 times the age of the universe.
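The arithmetic can be checked in a few lines. The age of the universe used here (13.8 thousand million years) is an assumption on my part, being the commonly quoted estimate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60     # about 3.16e7
AGE_OF_UNIVERSE_YEARS = 13.8e9               # assumed: commonly quoted estimate

data_bytes = 3.5 * 1000**7                   # 3.5 zettabytes
years = data_bytes / SECONDS_PER_YEAR        # that many seconds, in years
universes = years / AGE_OF_UNIVERSE_YEARS

print(f"{years:.2e} years")
print(f"{universes:,.0f} times the age of the universe")
```

With these assumptions the ratio comes out at roughly 8,000, the same order of magnitude as the round figure of 10,000.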

It's a very big number...

Data is increasing exponentially

In computing we are used to exponential growth.

Exponential growth: it doubles in size per period.

For instance, computers double in power per 18 months (at constant price)

Bandwidth doubles per year (AMS-IX is now peaking at 4 Tb/s, from an original value of 64 kb/s in 1988. The number of bits now passing per second is more than the number of seconds in 100,000 years...)
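Both claims about bandwidth can be checked from the two figures given. This is only a sketch; the variable names are mine.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

start_bps = 64e3      # 64 kb/s in 1988
peak_bps = 4e12       # 4 Tb/s peak

# How many times must 64 kb/s double to reach 4 Tb/s?
doublings = math.log2(peak_bps / start_bps)
print(f"{doublings:.1f} doublings")

# One second's worth of bits, counted as seconds and converted to years.
years_of_seconds = peak_bps / SECONDS_PER_YEAR
print(f"{years_of_seconds:,.0f} years")
```

That is about 26 doublings in the 27 years since 1988, so one doubling per year is a fair description.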

Exponential Data

And data is increasing exponentially as well, although it is trickier to measure what the doubling period actually is, and different reports give different numbers.

Let us assume that data doubles yearly (it may be a bit faster, it may be a bit slower).

Take a piece of paper, draw a line down the middle, and write this year's date in one half. Halve what remains and write last year's date in it, and so on:

[Figure: the sheet of paper halved again and again, the successive halves labelled 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008.]

This shows that this year we have produced as much data as the whole of history before it! And in the last three years we have produced nearly 90% of the data produced in all history...
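The paper-halving argument is just a sum of halves. A minimal sketch, assuming exactly one doubling per year (the function name is made up for the illustration):

```python
def share_of_last_years(n):
    """Fraction of all data ever produced that dates from the latest n years,
    assuming production doubles every year: 1/2 + 1/4 + ... + 1/2**n."""
    return sum(0.5**k for k in range(1, n + 1))

for n in (1, 2, 3, 8):
    print(n, f"{share_of_last_years(n):.1%}")
```

One year gives a half, and the last three years come to 87.5%, which is where the figure of about 90% comes from.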

Internet and Control

So all of a sudden, there is a new method of distributing information: much cheaper, much easier to use, and accessible to everyone for next to no money.

This makes many companies happy, it makes criminals happy, but the state isn't completely happy. Let's start with them.

From my son's history book:

"At that time it was common practice for the church and the state to monitor everything that was said, written and printed. This practice is known as censorship. Anyone who dared to criticise the Church, the King and his officials was prohibited from speaking and could even go to prison. In most countries there were many officials who constantly screened everything that was said or written. [...] The enlightenment thinkers were totally opposed to censorship. They wanted the freedom to express their thoughts and ideas."

And what do we get?

Up until the internet, at least in free democracies, in order to read someone's mail, listen to their phone calls, or come into their houses and read their diaries, the authorities had to get a court order that accepted the need.

Nowadays that has all changed...

Example: Megaupload

Megaupload

Note that this (non-US!) website was shut down, and its assets seized, because it had been accused of US federal crimes. It hadn't even gone to court. No "innocent until proved guilty" here.

"The US judge handling the case has expressed doubts about whether the case will come to court"

Megaupload was also used by journalists for - anonymously - passing information.

Example: Lavabit

Lavabit was a US-based service offering privacy-guaranteed email.

In July 2013 the US federal government obtained a search warrant demanding that Lavabit hand over the private SSL keys to its service, which would affect all Lavabit users.

However, Lavabit was forbidden from telling anyone about this search warrant.

Lavabit responded by closing down its service. It was also not in a position to tell anyone why it was closing down, though many guessed.

The US authorities had discovered that Edward Snowden used Lavabit, and wanted to read his email.

"the government argued that, since the 'inspection' of the data was to be carried out by a machine, they were exempt from the normal search-and-seizure protections of the Fourth Amendment."

In fact, no company is allowed to tell anyone about secret subpoenas that it receives from the US government. This has given rise to the warrant canary: a piece of text, in annual reports for instance, saying that the company hasn't received any such subpoenas in the period. If the text is later omitted, people know that the company has been served.

Example: UK GCHQ

http://www.theregister.co.uk/2015/09/25/gchq_tracked_web_browsing_habits_karma_police/

"New documents revealing GCHQ's mass-surveillance activities have detailed an operation codenamed KARMA POLICE, which slurped up the details of "every visible user on the Internet"."

But they still aren't happy

The state usually justifies spying by pointing to paedophiles and terrorists. Watch for them saying they need more powers at the next terrorist attack.

For instance:

MI5 chief seeks new powers after Paris magazine attack

And: David Cameron will on Monday tell Britain’s intelligence chiefs that he will introduce the so-called snooper’s charter

and: Cameron wants to ban encryption

But it's not only the state

Commercial interests, too, are watching us.

For instance, I travelled to England via the Channel Tunnel last Christmas...

Channel Tunnel

On the journey there, we had to stop the car at a booth at the entrance, and fill in a long reservation number on a touch screen, in order to know which train we would be put on.

On the way back, we drove up to the booth, and the screen already said "Welcome, Mr. Pemberton"! They must scan and store every car's number plates.

Google, Facebook

We all know how Google and Facebook are using our information.

(And don't forget, if you use Gmail, you are giving not only your own privacy away, but also the privacy of your correspondents).

But recently I was reading the British newspaper The Guardian, and even though I hadn't logged in, it offered me an advert for computer memory for the very computer I was using at that moment.

This isn't the Guardian's doing, but that of the advertising network that they use to supply adverts to their site.

And then there's the criminals

In 2012, a non-criminal, just for fun, tried automated login attempts at random internet IP addresses. Well, he or she was really a sort-of criminal, because you're not allowed to do that. But he or she was just doing it for fun. I use this example because most criminals don't reveal their results.

"We started scanning and quickly realized that there should be several thousand unprotected devices on the Internet."

"After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand."

"Starting with one device and assuming a scan speed of ten IP addresses per second, it should find the next open device within one hour. The scan rate would be doubled if we deployed a scanner to the newly found device. After doubling the scan rate in this way about 16.5 times, all unprotected devices would be found; this would take only 16.5 hours. Additionally, with one hundred thousand devices scanning at ten probes per second we would have a distributed port scanner to port scan the entire IPv4 Internet within one hour."

Botnet

"The completed scan proved our assumption was true. There were in fact several hundred thousand unprotected devices on the Internet making it possible to build a super fast distributed port scanner."

Botnet map
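The scan arithmetic in the quotes above can be checked from the figures given there: one hundred thousand devices, each probing ten addresses per second. This is only a sketch; the variable names are mine, and it counts every one of the 2³² IPv4 addresses.

```python
import math

targets = 100_000            # unprotected devices, from the quote
probes_per_device = 10       # IP addresses probed per second per device
ipv4_addresses = 2**32       # every possible IPv4 address

# Each doubling of the scanner pool doubles the number of devices found.
doublings = math.log2(targets)
print(f"{doublings:.1f} doublings to go from 1 device to {targets:,}")

# Time for the whole pool to probe every address once.
minutes = ipv4_addresses / (targets * probes_per_device) / 60
print(f"{minutes:.0f} minutes to probe every IPv4 address once")
```

The doublings come out at about 16.6, matching the quoted 16.5, and the full sweep takes a little over an hour, so "within one hour" is a rounded figure.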

What is to be done?

Lots of things, which I don't have time to go into here, including new laws.

But one important one: end-to-end public-key cryptography.

This would make a lot of other things better too. Less spam for instance.

Public Key Cryptography

You know how in hotels, every room has a different lock, but there is a master key that can open all locks?

Public key cryptography works in a similar way, except the other way round:

there is just one sort of lock, and
all keys work in all locks.

The Basics

Keys

Keys come in pairs: a clockwise key, which will only turn clockwise in the lock, and a paired anticlockwise key.
If you lock a lock with a clockwise key, then the only way to unlock it is with its paired anticlockwise key,
and vice versa: if you lock it with an anticlockwise key, you can only unlock it with its paired clockwise key.

The Basics

You can't tell by looking at one key what its paired key would look like.
Therefore, you can't tell if two keys are paired just by looking at them. (You have to try them).
You can lock a lock two (or more) times, for instance, first with clockwise key A and then with anticlockwise key B. If you do that, then you can only unlock by first using clockwise key B, and then anticlockwise key A.
Lock with A↷, lock with B↶, unlock with B↷, unlock with A↶

Public and Private Keys

Everyone is given a matched pair of keys, and we declare that all clockwise keys are public, so that anyone can have a copy of anyone else's clockwise key,
but all the anticlockwise keys are private: only one person may have any particular anticlockwise key.

Digital Signature

I write a message, put it in a box with one of the locks, and I lock it with my private key.
Then I send the box to you with the message "From Steven". You know from this that I have locked it with my private key.
So you get a copy of my public key, and try to unlock it. If it opens, then you know for sure that it really is from me, since only I have a copy of my private key.

Combined: Secure messages

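Now I can combine both things, the signature (locking with my private key) and privacy (locking with your public key, so that only your private key can open the box):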

I put a message in a box, and lock it first with my private key.
Then I lock it with your public key.
I mark it "From Steven, to You".

I am guaranteed that no one else will read it, and you are guaranteed that it really is from me. Secure messages.
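With numbers in place of boxes and keys, the two lockings can be sketched as follows. This is a toy under loud assumptions: I have picked textbook RSA as the formula (the slides don't name one), the primes are ridiculously small, the message is just a number, and `make_keys` is a made-up helper. Nothing like this should be used for real secrets.

```python
def make_keys(p, q, e):
    """Toy RSA key pair from two small primes: (public, private, modulus)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse of e (needs Python 3.8+)
    return e, d, n

# Hypothetical key pairs; "you" get the larger modulus so my signed
# number always fits inside your lock.
steven_pub, steven_priv, steven_n = make_keys(61, 53, 17)   # n = 3233
you_pub, you_priv, you_n = make_keys(89, 97, 5)             # n = 8633

message = 2015                 # must be smaller than steven_n

# Sender: lock with my private key, then with your public key.
signed = pow(message, steven_priv, steven_n)
sealed = pow(signed, you_pub, you_n)

# Recipient: unlock with your private key, then with my public key.
unsealed = pow(sealed, you_priv, you_n)
recovered = pow(unsealed, steven_pub, steven_n)

print(recovered == message)
```

Only the holder of your private number can undo the outer lock, and the inner lock only opens with my public number if it was closed with my private one.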

Reality

In reality of course, there are no boxes, locks and keys: it is all done with mathematical formulas and numbers, but the principle is the same:

There are matched pairs of (very large) numbers that are used as parameters to the mathematical formulas.
One of the numbers anyone may know, and the other you keep secret, and only you may use.
If I encode a message with my private number, you can decode it with my public number and vice versa.
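To make "matched pairs of numbers" concrete, here is a minimal sketch using textbook RSA with tiny primes. RSA is my choice of example, not something the slides specify, and real keys use primes hundreds of digits long.

```python
# Toy RSA: build one matched pair of numbers from two small primes.
p, q = 61, 53
n = p * q                        # 3233, used together with both numbers
phi = (p - 1) * (q - 1)          # 3120
public = 17                      # chosen to share no factor with phi
private = pow(public, -1, phi)   # 2753, the matching secret number

message = 1234                   # any number below n

# Encode with the private number, decode with the public one ...
assert pow(pow(message, private, n), public, n) == message
# ... and vice versa.
assert pow(pow(message, public, n), private, n) == message

print(public, private, n)
```

Knowing 17 and 3233 doesn't help you find 2753 unless you can factor 3233 back into 61 × 53, which is easy here and infeasible for numbers of realistic size.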

When you use https: to a web site, for instance with your bank, all communications are encrypted with a single-key system (the same key locks and unlocks), but which key to use is agreed using a public-key system first.

Credit cards

For instance, I could order something from a shop by sending a secure message to them. I know that only the shop will read it, and the shop knows it really is from me.

But instead of giving them my credit card number, I give them a box with my credit card number in it, locked with my private key, and the bank's public key. So the shop doesn't know what my credit card number is, but they can send it on to the bank, and I know that only the bank can read it, and the bank knows that it really is from me.

So the only people who know what my credit card number is are me and my bank (in fact, there is no reason really to have credit card numbers at all in this system because the box can just contain the message "Please pay this shop €20" and the bank knows it is from me).

The shop doesn't need to know my address for similar reasons.

No Passwords Needed Anymore

I could try to log in to a site.

I say "Hi, I'm Steven"

The site says "Oh yeah? Here's a random message. Encrypt that for me".

My browser encrypts it with my private key, the site checks it with my public key, and lets me in.

(Equivalently they could also say "Here is a message encrypted with your public key; tell me what it says"; makes no difference.)
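The exchange above can be sketched as a challenge and a response. Everything named here is hypothetical: the function names, the `site_records` table, and the toy RSA numbers (built from the primes 61 and 53), which are far too small for real use.

```python
import secrets

# Toy key pair (assumed for illustration): textbook RSA from primes 61 and 53.
PUBLIC, PRIVATE, MODULUS = 17, 2753, 3233

# What the site stores at registration: only my public numbers, no password.
site_records = {"Steven": (PUBLIC, MODULUS)}

def site_challenge(user):
    """Site: pick a fresh random number for this login attempt."""
    _, n = site_records[user]
    return secrets.randbelow(n - 2) + 2       # random value from 2 to n - 1

def browser_respond(challenge):
    """Browser: transform the challenge with my private number."""
    return pow(challenge, PRIVATE, MODULUS)

def site_verify(user, challenge, response):
    """Site: undo the response with the public number and compare."""
    public, n = site_records[user]
    return pow(response, public, n) == challenge

challenge = site_challenge("Steven")
response = browser_respond(challenge)
print(site_verify("Steven", challenge, response))                    # genuine
print(site_verify("Steven", challenge, (response + 1) % MODULUS))    # forged
```

Because the challenge is new each time, someone who overhears one login can't replay it later, and the site never holds anything secret about me.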

Conclusion

The internet was designed in an environment where you could trust everybody.

The infrastructure really needs to be redesigned to take away that design fault.

It is time for public-key cryptography to be an underlying part of the infrastructure. (In fact, making it part of the infrastructure was considered at the beginning, but computers weren't fast enough then: it would have required specialised equipment, which was too expensive.)

As part of email, it would reduce spam: you would know if a mail really was from your bank; and it would increase privacy, because you are assured that only the recipient can read the mail.

If ISPs offered it for end-to-end connections, it would mean no one could listen in to your communications (for instance over Wifi).

If done right, it could even do away with the need for passwords.
