Meaning in AI

Steven Pemberton, CWI, Amsterdam

Representing Meaning

Consider the UK political landscape.

Because of the ancient voting system, it tends to produce a small number of parties: two large parties, plus a few regional and other parties.

Because there is such a small number of parties, the two main parties tend to be very broad, each a sort of pre-arranged coalition of interests.

Normally the UK parties are described on a left-right axis:

Left..............Centre..............Right
         Labour  Libdem    Tory→

Dimension

Because there is a large group of people who would never vote Tory, and another large group who would never vote Labour, the parties tend to drift towards the centre, where the voters who do change their vote are situated.

You could describe the British parties by a position representing (approximately) where they are located on this left-right axis from -1 to 1:

Labour: -0.25
Libdem: 0
Tory: 0.7

Second dimension

Another axis might reflect their current position on Europe:

Anti...................................Pro
Tory             Labour             Libdem

Tory: -1
Labour: 0
Libdem: 1

You could then create a two-dimensional idea of the parties by combining these axes:

Labour: (-0.25, 0)
Libdem: (0, 1)
Tory: (0.7, -1)

Values

There is nothing essential to using -1 to +1 as the numbers.

You could just as well use 0 to 1 with the same effect, with 0.5 representing 'in the middle':

Labour: (0.375, 0.5)
Libdem: (0.5, 1)
Tory: (0.85, 0)
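The change of scale can be sketched in a few lines of Python. This is only an illustration: the function name `rescale` is made up for the example, and the party positions are the approximate ones given above.

```python
# Party positions on the (left-right, Europe) axes, each in the range -1..1,
# as given in the text above.
parties = {
    "Labour": (-0.25, 0.0),
    "Libdem": (0.0, 1.0),
    "Tory": (0.7, -1.0),
}

def rescale(value):
    """Map a value from the range -1..1 onto the range 0..1."""
    return (value + 1) / 2

# Apply the same mapping to every coordinate of every party.
rescaled = {
    name: tuple(rescale(v) for v in position)
    for name, position in parties.items()
}

for name, position in rescaled.items():
    print(name, position)
```

The relative positions of the parties are unchanged; only the labels on the axes differ.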

Other systems

More modern voting systems allow a greater range of parties.

For instance, the Netherlands had 25 parties at the last election, of which 15 got elected.

It is less informative to display them just on a left-right axis.

One way they are displayed there is on two axes: left-right and progressive-conservative.

The Dutch Political Landscape

So you could represent the parties on this diagram by a position of two coordinates. For instance, D66, about the same as the UK Libdems, is at roughly (0, 0.5).

Other dimensions

The CDA and the VVD are very close on the above diagram, both similar to the (pre-Brexit) Conservatives, but the CDA are Christian, and the VVD secular.

So you could add another dimension of religion.

Two parties, the Dutch Labour Party and the Green-Left party, considered themselves close enough to coalesce, at least for the election; the main difference between them was on the environment.

So you could add environment as a dimension. Or Europeanism vs Nationalism.

Similarly there's a party for older people, and one for animal rights, and so on.

30 Dimensions

The website that produced the above image helps voters discover who they should vote for.

They ask 30 questions, and on that basis say which parties you are closest to.

This means that they use 30 dimensions to represent the parties, so really the 'semantics' of a party is a list of 30 numbers.

Your position is also a list of 30 numbers, and then a good match is the party that is the 'nearest' to you in those 30 dimensions.

You could subtract the lists of numbers for two parties, and get a list of numbers that would expose the differences in approach between them, or between a party and you.
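As a rough sketch of the idea, here it is in Python with three questions standing in for the 30. The party names, the numbers, and the choice of straight-line (Euclidean) distance are all assumptions made for the example; the website's actual method is not described here.

```python
import math

# Hypothetical positions on a -1..1 scale; three questions stand in for the 30.
parties = {
    "A": [-0.5, 0.8, 0.1],
    "B": [0.6, -0.4, 0.3],
    "C": [0.0, 0.2, -0.9],
}
you = [-0.3, 0.6, 0.0]

def nearest(position, candidates):
    """Return the name of the candidate at the smallest Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(position, candidates[name]))

def difference(a, b):
    """Subtract two positions axis by axis."""
    return [x - y for x, y in zip(a, b)]

best = nearest(you, parties)
gap = difference(parties[best], you)

print("Closest party:", best)
print("Where it differs from you, per question:", gap)
```

The same two functions work unchanged for lists of 30 numbers, or of any other length.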

Visualising

We are very bad at visualising anything above 3 dimensions, so they reduce the picture to the two above.

Computers don't have that problem, so they can find clusters, and tell you the semantic 'distance' you are from various parties.

This is the basis of how GPT programs represent the meaning of words: each word has a list of numbers, each number representing that word's position on a particular meaning axis.

Words that are synonyms, or near synonyms, are then close to each other in the semantic space.
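To make 'close' concrete, here is a small Python sketch. The three-number word lists are invented for the example (real models learn theirs), and cosine similarity is just one common way to measure closeness, assumed here for illustration.

```python
import math

# Invented three-number "meanings"; real models learn thousands of axes.
words = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.85, 0.15, 0.05],
    "tiny": [-0.9, 0.1, 0.0],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two lists of numbers (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_synonym = cosine_similarity(words["big"], words["large"])
sim_opposite = cosine_similarity(words["big"], words["tiny"])

print("big vs large:", sim_synonym)
print("big vs tiny:", sim_opposite)
```

With these made-up numbers the near synonyms score close to 1, and the opposites score below 0.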

Learned Axes

There are two notable things:

  1. because the axes are discovered by machine learning, we don't know what they are, nor can the program explain them to us;
  2. there are more than 12,000 axes! (So each word is represented by a list of more than 12,000 numbers.)

The axes likely include male-female, big-small, young-old, singular-plural, and so on, but because machine learning is so good at spotting patterns that we can't even see, there are probably axes that we don't even have a name for.

Arithmetic on meaning

There are interesting properties of those lists of numbers: you can do a sort of arithmetic on them.

For instance, you can subtract Man from Woman:

D = Woman - Man

The resulting list of numbers then represents the semantic 'distance' from Man to Woman. The extraordinary thing is that you can do things with this difference. For instance

Father + D

gives you a position very close to Mother.

Similarly

Uncle + D

gives you a position very close to Aunt.
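This arithmetic can be tried out on toy data. In the Python sketch below the word lists are invented, with three made-up axes (roughly gender, generation, and side-branch of the family), and the difference is taken as Woman minus Man so that adding it moves a male word towards its female counterpart. Real embeddings are learned and far longer, so treat this purely as an illustration.

```python
import math

# Invented vectors on three made-up axes: (gender, generation, side-branch).
words = {
    "man": [1.0, 0.0, 0.0],
    "woman": [-1.0, 0.0, 0.0],
    "father": [0.9, 1.0, 0.0],
    "mother": [-1.0, 1.0, 0.1],
    "uncle": [1.0, 0.9, 1.0],
    "aunt": [-0.9, 1.0, 1.0],
}

def add(a, b):
    """Add two lists of numbers axis by axis."""
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    """Subtract list b from list a axis by axis."""
    return [x - y for x, y in zip(a, b)]

def nearest_word(position, exclude=()):
    """Return the known word closest to the position, skipping excluded words."""
    candidates = [w for w in words if w not in exclude]
    return min(candidates, key=lambda w: math.dist(position, words[w]))

# The step from Man to Woman.
d = subtract(words["woman"], words["man"])

result_father = nearest_word(add(words["father"], d), exclude=("father",))
result_uncle = nearest_word(add(words["uncle"], d), exclude=("uncle",))

print("father + D is nearest to:", result_father)
print("uncle + D is nearest to:", result_uncle)
```

The result of the addition rarely lands exactly on a word, which is why the sketch looks for the nearest one.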

Another example is

F = Pizza - Italy

If you add F to Germany

Germany + F

you get a position very close to Bratwurst.

GPT

So when GPTs produce the next word, they don't just do it on the basis of syntax (as we have been doing up to now), they also use meaning to help choose the next word.

Video: youtube.com/watch?v=wjZofJX0v4M