The Book of Doublends Jined: Parsing Finnegans Wake with ixml

The author

Steven Pemberton, CWI, Amsterdam

Abstract

Finnegans Wake by James Joyce is probably the hardest book to read in the English language. A principal hurdle is the length and convolutedness of the sentences. This paper reports an attempt to handle this complexity by parsing the sentences (at a structural, not a semantic level) to reveal their top-level structure. It takes the reader step-by-step through the construction of an ixml grammar for dealing with one chapter of the book.

Keywords: ixml, parsing, context-free grammars, XML, Finnegans Wake, James Joyce

Finnegans Wake

Finnegans Wake is the last book by James Joyce, and probably the hardest book to read in English.

If you can even say it is in English.

There are three aspects to its difficulty.

Joyce

It is by James Joyce. His aim is not to make books easy to read.

Vocabulary

Joyce uses a lot of his own made-up portmanteau words and puns.

When he refers to Finnegans Wake as "The Book of Doublends Jined" he's punning on "Dublin's Giant" and "Double Ends Joined", since Finn is a giant said to be buried under Dublin, and the two ends of the book join up to make a complete sentence, and thus a circular book.

Or take the description of the wake for the dead giant: "And the all gianed in with the shoutmost shoviality." The underlying meaning is "They all joined in with the utmost joviality", with the added elements of giant, shouting, and shoving, and probably the slurred speech of people who have had a little too much to drink. It is an adroit use of language to express so much with so few words.

Portmanteau words

Such words may remind you of Humpty Dumpty's explanation in Through the Looking-Glass:

“Well, ‘SLITHY’ means ‘lithe and slimy.’ ‘Lithe’ is the same as ‘active.’ You see it’s like a portmanteau—there are two meanings packed up into one word.”

But in fact we use them all the time:

Structure

Thirdly, and harder yet, there is the length and structure of the sentences.

The TUNC page

In the book, on page 122, when Finnegans Wake is being self-referentially described, there is a reference to the TUNC page of the Book of Kells, an Irish mediaeval illustrated manuscript. This page, which begins with the word TUNC, contains but a single sentence (in Latin) that reads "Then they crucified Christ with two thieves".

But the sentence is formatted in the form of a large X, with numerous adornments around the page.

Joyce's sentences are a bit like that: there is a simple essence, but he has added an enormous number of symbolic adornments, making them hard to decipher at a first reading. And they often take up a whole page.

Example

Our cad’s bit of strife (knee Bareniece Maxwelton) with a quick ear for spittoons (as the aftertale hath it) glaned up as usual with dumbestic husbandry (no persicks and armelians for thee, Pomeranzia!) but, slipping the clav in her claw, broke of the matter among a hundred and eleven others in her usual curtsey (how faint these first vhespers womanly are, a secret pispigliando, amad the lavurdy den of their manfolker!) the next night nudge one as was Hegesippus over a hup a ’ chee, her eys dry and small and speech thicklish because he appeared a funny colour like he couldn’t stood they old hens no longer, to her particular reverend, the director, whom she had been meaning in her mind primarily to speak with (hosch, intra! jist a timblespoon!) trusting, between cuppled lips and annie lawrie promises (mighshe never have Esnekerry pudden come Hunanov for her pecklapitschens!) that the gossiple so delivered in his epistolear, buried teatoastally in their Irish stew would go no further than his jesuit’s cloth, yet (in vinars venitas! volatiles valetotum!) it was this overspoiled priest Mr Browne, disguised as a vincentian, who, when seized of the facts, was overheard, in his secondary personality as a Nolan and underreared, poul soul, by accident — if, that is, the incident it was an accident for here the ruah of Ecclectiastes of Hippo outpuffs the writress of Havvah-ban-Annah — to pianissime a slightly varied version of Crookedribs confidentials, (what Mère Aloyse said but for Jesuphine’s sake!) hands between hahands, in fealty sworn (my bravor best! my fraur!) and, to the strains of The Secret of Her Birth, hushly pierce the rubiend aurellum of one Philly Thurnston, a layteacher of rural science and orthophonethics of a nearstout figure and about the middle of his forties during a priestly flutter for safe and sane bets at the hippic runfields of breezy Baldoyle on a date (W. W. goes through the cald) easily capable of rememberance by all pickers-up of events national and Dublin details, the doubles of Perkin and Paullock, peer and prole, when the classic Encourage Hackney Plate was captured by two noses in a stablecloth finish, ek and nek, some and none, evelo nevelo, from the cream colt Bold Boy Cromwell after a clever getaway by Captain Chaplain Blount’s roe hinny Saint Dalough, Drummer Coxon, nondepict third, at breakneck odds, thanks to you great little, bonny little, portey little, Winny Widger! you’re all their nappies! who in his never-rip mud and purpular cap was surely leagues unlike any other phantomweight that ever toppitt our timber maggies.

Distillation

Our cad’s bit of strife broke of the matter the next night nudge one to her particular reverend, trusting that the gossiple so delivered would go no further than his jesuit’s cloth, yet it was this overspoiled priest, Mr Browne, who was overheard to pianissime a slightly varied version and hushly pierce the rubiend aurellum of a layteacher of rural science during a priestly flutter at the hippic runfields.
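
Part of this distillation is mechanical: the bracketed asides can be dropped without touching the main clause. The following is a minimal Python sketch of that one step only, assuming asides are delimited by round brackets and may nest. The function name strip_asides is invented for this illustration; pruning the remaining subordinate phrases still needs human judgement.

```python
def strip_asides(sentence):
    """Drop everything inside round brackets (nesting allowed)."""
    kept = []
    depth = 0  # how many brackets we are currently inside
    for ch in sentence:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    # Collapse the double spaces left behind where an aside was removed.
    return " ".join("".join(kept).split())


opening = ("Our cad’s bit of strife (knee Bareniece Maxwelton) with a quick "
           "ear for spittoons (as the aftertale hath it) glaned up as usual")
print(strip_asides(opening))
```

This only removes adornments that are explicitly marked; it says nothing about which of the remaining phrases carry the sentence.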

Purpose

I am currently writing a book about Finnegans Wake, and so getting to the underlying meaning of sentences is of absolute importance.

In order to understand the long sentences, it is imperative to determine what the essence is, as shown in the above example.

So as a first step in breaking down the complexity of Joyce's work, I decided I would try writing some ixml [ixml] to break down the sentences into their structural form. I quickly discovered that each chapter has its own peculiarities, so I am writing a different grammar per chapter, in order to keep it simpler.

ixml

Invisible XML (ixml) is a notation and process that uses context-free grammars to describe the format of textual documents.

This allows documents to be parsed into an abstract parse-tree, which can be processed in various ways, but principally serialised into an XML document, thus making the implicit structure of the textual document explicit in the XML.

First Pass

So I started off as simple as can be with the following grammar:

  chapter: paragraph+. {a chapter is one or more paragraphs}
paragraph: line+, #a.  {a paragraph is one or more lines, followed by a blank line}
     line: ~[#a]+, #a. {a line is characters (except end of line),
                        followed by end-of-line}

This failed because (obviously, once I had thought about it) the last paragraph is not followed by a blank line. A blank line separates paragraphs:

  chapter: paragraph++#a.
paragraph: line+.
     line: ~[#a]+, #a.

This now produced a first, very basic output (not shown here). But we are not interested in lines, but in the internal structure of paragraphs.
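
The difference between the two line-based grammars can be reproduced outside ixml. The sketch below is a Python approximation using regular expressions, not the ixml processor itself; the names LINE, PARAGRAPH, FIRST and SECOND and the sample text are invented for the illustration.

```python
import re

LINE = r"[^\n]+\n"           # line: ~[#a]+, #a.
PARAGRAPH = rf"(?:{LINE})+"  # paragraph: line+.

# First attempt: every paragraph must be followed by a blank line.
FIRST = re.compile(rf"(?:{PARAGRAPH}\n)+")
# Second attempt: a blank line only separates paragraphs (paragraph++#a).
SECOND = re.compile(rf"{PARAGRAPH}(?:\n{PARAGRAPH})*")

text = "first paragraph, line one\nline two\n\nlast paragraph\n"

print(FIRST.fullmatch(text) is not None)   # False: no blank line after the last paragraph
print(SECOND.fullmatch(text) is not None)  # True
```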

Structure

    chapter: paragraph++#a.            {a chapter is one or more paragraphs}
  paragraph: sentence+.                {a paragraph is one or more sentences}
   sentence: phrase++punctuation, ".". {A sentence is one or more phrases,
                                        separated by some punctuation, 
                                        terminated with a point}
     phrase: word++" ".                {a phrase is one or more words,
                                        separated by spaces}
       word: [L]+.                     {a word is one or more letters}
punctuation: [",;:"].                  {phrases are separated by one of ",;:"}

This failed on the very first line of the chapter:

Now concerning the genesis of Harold or Humphrey Chimpden’s
                                                         ^
**** Character: "’" (#2019).

Ah, a word consists of more than letters. Fix that:

word: [L; "’"]+.

Next

It immediately failed on the same line:

**** Parsing failed at line 1, position 58
Now concerning the genesis of Harold or Humphrey Chimpden’s
                                                           ^
**** Character: (#A).

Of course, words are not only separated by spaces, but sometimes also by newlines.

Change

phrase: word++" ".

to

phrase: word++s.
s: [" "; #a].

Next

At least we now got to line 2:

occupational agnomen, the best authenticated version has it that it
                     ^
**** Character: " " (#20).

Oh yes, punctuation can also be followed by space...

punctuation: [",;:"], s*.

Next

Now we get to line 3:

was this way. We are told how in the beginning it came to pass that
             ^
**** Character: " " (#20).

Ah, full-stops can be followed by space as well:

sentence: phrase++punctuation, ".", s*.

Next

This gets us to line 17!

seldomer than an earwigger! Comes the question are these the facts of
                          ^
**** Character: "!" (#21).

Of course, sentences can also end with "!". Let's also add "?", just in case:

sentence: phrase++punctuation, [".!?"], s*.

Next

At line 62 we discover another character that can appear in a word:

Ides-of-April morning (the anniversary, as it fell out, of his first
    ^
**** Character: "-" (#2D).

Fix that:

word: [L; "’-"]+.

Next

And on the same line, we come to our first structuring problem:

Ides-of-April morning (the anniversary, as it fell out, of his first
                      ^
**** Character: "(" (#28).

Nested phrases! So where do we put this in the structure? I'm going to experiment, and put it at the level of a word. First separate the definition of phrases:

sentence: phrases, [".!?"], s*.
phrases: phrase++punctuation.

and add bracketed phrases:

word: [L; "’-"]+; bracketed.
bracketed: "(", phrases, ")".
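
To make the nesting concrete, here is a small hand-written recursive-descent parser in Python that mirrors the rules as they stand at this point. It is only a sketch: an ixml processor explores all possible parses, whereas this code is deterministic, and every function name in it is invented for the illustration.

```python
class ParseError(Exception):
    """Raised with the position at which the toy parser gave up."""


SPACES = " \n"       # s: [" "; #a].
PUNCTUATION = ",;:"  # punctuation: [",;:"], s*.
ENDINGS = ".!?"


def is_word_char(c):
    # str.isalpha() is true for Unicode letters, like the class L in ixml.
    return c.isalpha() or c in "’-"


def skip_spaces(text, pos):
    while pos < len(text) and text[pos] in SPACES:
        pos += 1
    return pos


def parse_word(text, pos):
    # word: [L; "’-"]+; bracketed.     bracketed: "(", phrases, ")".
    if pos < len(text) and text[pos] == "(":
        inner, pos = parse_phrases(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ParseError(pos)
        return {"bracketed": inner}, pos + 1
    start = pos
    while pos < len(text) and is_word_char(text[pos]):
        pos += 1
    if pos == start:
        raise ParseError(pos)
    return text[start:pos], pos


def parse_phrase(text, pos):
    # phrase: word++s.
    words = []
    while True:
        word, pos = parse_word(text, pos)
        words.append(word)
        nxt = pos + 1
        if (pos < len(text) and text[pos] in SPACES and nxt < len(text)
                and (is_word_char(text[nxt]) or text[nxt] == "(")):
            pos = nxt  # consume one separator; another word follows
        else:
            return words, pos


def parse_phrases(text, pos):
    # phrases: phrase++punctuation.
    phrases = []
    while True:
        phrase, pos = parse_phrase(text, pos)
        phrases.append(phrase)
        if pos < len(text) and text[pos] in PUNCTUATION:
            pos = skip_spaces(text, pos + 1)
        else:
            return phrases, pos


def parse_sentence(text, pos=0):
    # sentence: phrases, [".!?"], s*.
    phrases, pos = parse_phrases(text, pos)
    if pos >= len(text) or text[pos] not in ENDINGS:
        raise ParseError(pos)
    return phrases, skip_spaces(text, pos + 1)


sample = "Ides-of-April morning (the anniversary, as it fell out) ages after."
tree, end = parse_sentence(sample)
print(tree)
```

A bracketed group appears in the result as one item inside its enclosing phrase, which is the effect of placing it at the level of a word.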

Next

This does well, and gets us to line 136, when we get this surprise:

fellow—me—lieder was first poured forth to an overflow meeting of all
      ^
**** Character: "—" (#2014).

The typesetters had used two different characters for hyphenated words! Fix that:

word: [L; "’-—"]+; bracketed.
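
The reason these characters have to be listed explicitly is that none of them belongs to the Unicode letter class that [L] matches. A quick check in Python (used here only as a convenient way to query the Unicode database; the variable names are invented for the illustration):

```python
import unicodedata

# The three non-letter characters that turn up inside words in this chapter.
extra_word_chars = ["\u2019", "-", "\u2014"]  # ’  -  —

categories = {c: unicodedata.category(c) for c in extra_word_chars}
for char, category in categories.items():
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {category}")
```

The right single quotation mark is final punctuation (Pf) and both dashes are dash punctuation (Pd), so a character class containing only L rejects all three.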

Success! (ish)

And we get to the end of the chapter! Hooray! The output starts:

<chapter ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS">
   <paragraph>
      <sentence>
         <phrases>
            <phrase>
               <word>Now</word>
               <s> </s>
               <word>concerning</word>
               <s> </s>
               <word>the</word>
               <s> </s>
               <word>genesis</word>
               <s> </s>
               <word>of</word>
               <s> </s>
               <word>Harold</word>
               <s> </s>
               <word>or</word>
               <s> </s>
               <word>Humphrey</word>
               <s> </s>
               <word>Chimpden’s</word>
               <s>
</s>
               <word>occupational</word>
               <s> </s>
               <word>agnomen</word>
            </phrase>
            <punctuation>,
               <s> </s>
            </punctuation>

Cleaning up the Output

We'll look at the ambiguity in a minute, but first let's get rid of the elements we don't need, by adding "-" before some rules. This produces a better-looking result:

<chapter ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS">
   <paragraph>
      <sentence>
         <phrase>Now concerning the genesis of Harold or Humphrey Chimpden’s
occupational agnomen</phrase>, 
         <phrase>the best authenticated version has it that it
was this way</phrase>. </sentence>
      <sentence>
         <phrase>We are told how in the beginning it came to pass that
the grand old gardener was saving daylight under his redwoodtree one
sultry sabbath afternoon</phrase>, 
         <phrase>when royalty was announced to have been
pleased to have halted itself on the highroad</phrase>. </sentence>

Newlines

What is also obvious is that the newline characters in the input are visible in the output. On the other hand we want to keep the spaces. So we could change

s: [" "; #a].

to

s: " "; -#a.

but then words separated by a newline will run together; so we replace newlines with spaces:

s: " "; -#a, +" ".

An alternative is to delete all whitespace, and replace it with a single space character:

s: -[" "; #a], +" ".
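
The effect of these variants on the serialised text can be imitated with plain string operations. This is a Python approximation of what the marks do to the output, not of how the ixml processor works; the dictionary VARIANTS and its keys are invented for the illustration.

```python
source = "Humphrey Chimpden’s\noccupational agnomen"

VARIANTS = {
    # s: [" "; #a].         newlines are kept as they are
    "keep": lambda text: text,
    # s: " "; -#a.          newlines are deleted
    "delete": lambda text: text.replace("\n", ""),
    # s: " "; -#a, +" ".    newlines are deleted and a space inserted
    "replace": lambda text: text.replace("\n", " "),
}

for name, apply in VARIANTS.items():
    print(f"{name:8} {apply(source)!r}")
```

Only the last variant gives one line of text with the words still separated.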

Next

Now we get

<chapter ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS">
   <paragraph>
      <sentence>
         <phrase>Now concerning the genesis of Harold or Humphrey Chimpden’s 
                 occupational agnomen</phrase>, 
         <phrase>the best authenticated version has it that it was 
                 this way</phrase>. </sentence>
      <sentence>
         <phrase>We are told how in the beginning it came to pass that 
                 the grand old gardener was saving daylight under his 
                 redwoodtree one sultry sabbath afternoon</phrase>, 
         <phrase>when royalty was announced to have been pleased to have
                 halted itself on the highroad</phrase>. </sentence>

Bracketed phrases

If you're interested in how the bracketed phrases look, here's an example:

<sentence>
   <phrase>They tell the story how one happygogusty Ides-of-April morning 
      <bracketed>(
         <phrase>the anniversary</phrase>, 
         <phrase>as it fell out</phrase>, 
         <phrase>of his first assumption of his mirthday suit</phrase>)
      </bracketed> ages and ages after the alleged misdemeanour
      when the tried friend of all creation was billowing across 
      the wide expanse of our greatest park</phrase>, 
   <phrase>he met a cad with a pipe</phrase>. </sentence>

Ambiguity

The output claims there are ten different interpretations of the chapter, so let's see why.

The input from line.pos 1.1 to 148.1 can be interpreted as 'paragraph++#a'
in 10 different ways:
     1: paragraph++#a[1.1:]:  paragraph[:20.1] #a[:21.1] paragraph++#a[:148.1] 
     2: paragraph++#a[1.1:]:  paragraph[:31.1] #a[:32.1] paragraph++#a[:148.1] 
     3: paragraph++#a[1.1:]:  paragraph[:87.1] #a[:88.1] paragraph++#a[:148.1] 
     4: paragraph++#a[1.1:]:  paragraph[:94.48] #a[:95.1] paragraph++#a[:148.1] 
     5: paragraph++#a[1.1:]:  paragraph[:102.29] #a[:103.1] paragraph++#a[:148.1] 
     6: paragraph++#a[1.1:]:  paragraph[:109.1] #a[:110.1] paragraph++#a[:148.1] 
     7: paragraph++#a[1.1:]:  paragraph[:121.1] #a[:122.1] paragraph++#a[:148.1] 
     8: paragraph++#a[1.1:]:  paragraph[:139.1] #a[:140.1] paragraph++#a[:148.1] 
     9: paragraph++#a[1.1:]:  paragraph[:145.1] #a[:146.1] paragraph++#a[:148.1] 
     10: paragraph++#a[1.1:]:  paragraph[:148.1]

If we look at the input, we find that each of the lines it mentions (20, 31, 87, etc.) is a paragraph break. The problem is that we have allowed a sentence to be followed by any number of whitespace characters, and since s also matches a newline, this can swallow the blank line after a paragraph.

sentence: phrases, [".!?"], s*.

So let's delete that *.
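
The effect can be seen on a toy version of the grammar. The Python sketch below counts the parses of a miniature chapter under a simplified grammar in which a sentence is just letters followed by a full stop; the function count_parses and the sample text are invented for the illustration, and the counts apply to this toy only, not to the real chapter.

```python
from functools import lru_cache

WHITESPACE = " \n"  # s: [" "; #a].


def count_parses(text, allow_many_spaces):
    """Count parses of: chapter: paragraph++#a.  paragraph: sentence+.
    sentence: letters, ".", s*   (or exactly one s if allow_many_spaces is False)."""
    n = len(text)

    def sentence_ends(i):
        j = i
        while j < n and text[j].isalpha():
            j += 1
        if j == i or j >= n or text[j] != ".":
            return []
        j += 1
        if allow_many_spaces:
            ends = [j]  # s* may match nothing at all
            while j < n and text[j] in WHITESPACE:
                j += 1
                ends.append(j)
            return ends
        return [j + 1] if j < n and text[j] in WHITESPACE else []

    @lru_cache(maxsize=None)
    def paragraph_ends(i):
        # (end position, number of ways) for sentence+ starting at i
        totals = {}
        for end in sentence_ends(i):
            totals[end] = totals.get(end, 0) + 1
            for later, ways in paragraph_ends(end):
                totals[later] = totals.get(later, 0) + ways
        return tuple(totals.items())

    @lru_cache(maxsize=None)
    def chapter(i):
        total = 0
        for end, ways in paragraph_ends(i):
            if end == n:
                total += ways
            elif text[end] == "\n":  # the newline acts as paragraph separator
                total += ways * chapter(end + 1)
        return total

    return chapter(0)


toy = "A. B.\n\nC.\n"
print(count_parses(toy, allow_many_spaces=True))   # 2
print(count_parses(toy, allow_many_spaces=False))  # 1
```

With s* the blank line can be read either as a paragraph separator or as trailing whitespace of the sentence before it, so the toy chapter parses both as two paragraphs and as one.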

Initialisms

This exposes a new problem:

Haromphrey bear the sigla H.C.E. and while he was only and long and
                            ^
**** Character: "C" (#43).

Full-stops are not only used to separate sentences! "H.C.E." clearly looks like it could end a sentence, because it ends with a full-stop and a space. We'll have to treat it as a special kind of word:

-word: [L; "’-—"]+; bracketed; initialism.
initialism: ([Lu], ".")+.

(Lu matches any upper-case letter).
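
What the initialism rule accepts can be spelled out in a few lines of Python. This is a sketch of the same pattern, one or more pairs of an upper-case letter and a full stop, with the function name is_initialism invented for the illustration.

```python
import unicodedata


def is_initialism(token):
    """True if token is one or more (upper-case letter, ".") pairs."""
    if not token or len(token) % 2:
        return False
    pairs = zip(token[0::2], token[1::2])
    return all(unicodedata.category(letter) == "Lu" and stop == "."
               for letter, stop in pairs)


for token in ["H.C.E.", "H.", "H.C", "h.c.e."]:
    print(token, is_initialism(token))
```

Note that a lone "H." also qualifies, which matters as soon as initials are written with spaces between them.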

Another ambiguity

This reveals another ambiguity, only this time, it's a real one, and not a mistake in the grammar:

To anyone who knew and loved the christlikeness of the big cleanminded
giant H. C. Earwicker throughout his excellency long vicefreegal existence

We can see as humans that this is not ambiguous, but consider this sentence:

There are people who claim sentences
never end with a capital H. C. Earwicker 
however, in his seminal paper "Sentences 
that end with a capital H", proves otherwise.

What we get

What we get is

<sentence>
   <phrase>To anyone who knew and loved the christlikeness of
           the big cleanminded giant H</phrase>. </sentence>
<sentence>
   <phrase>C</phrase>. </sentence>
<sentence>
   <phrase>Earwicker throughout his excellency long vicefreegal existence ...

To tell you the truth, at this point I cheated. I deleted the first space in "H. C. Earwicker", and the chapter was parsed to completion without ambiguity.
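
The same edit could in principle be automated. The following Python sketch is hypothetical and not something I ran on the chapter; it assumes ASCII capitals only, and the name join_spaced_initials is invented. It removes the space between two consecutive single-letter initials.

```python
import re

# A capital letter and full stop, then a space, then another capital and full stop.
SPACED_INITIALS = re.compile(r"\b([A-Z]\.) (?=[A-Z]\.)")


def join_spaced_initials(text):
    return SPACED_INITIALS.sub(r"\1", text)


print(join_spaced_initials("giant H. C. Earwicker throughout"))
print(join_spaced_initials("never end with a capital H. C. Earwicker however"))
```

The second call shows why this remains a cheat: the counter-example sentence is rewritten in exactly the same way, so the rule cannot tell the two readings apart any better than the grammar can.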

Another Chapter

I presented the development of chapter 2 above, since it is fairly simply structured.

The sentence presented at the beginning of the paper on the other hand is from chapter 3, which is more complicated.

So to end, I will simply show the current ixml for chapter 3 (and chapter 1 as it happens), and the result of the parsing of the example sentence. You will see that I have handled a number of aspects, such as punctuation, differently.

                 chapter: paragraph++(-#a, -#a), -#a.
               paragraph: sentence++s; pagenumber.
              pagenumber: ["0"-"9"]+.
               -sentence: question; exclamation; statement.
                question: phrases, "?".
             exclamation: phrases, "!".
               statement: phrases, ".".
                -phrases: punctuated-phrase*, phrase.
punctuated-phrase>phrase: -phrase, ["?!"]?, punc.
                      -s: -" "+, -#a?; -#a.
                  phrase: word++(s, +" ").
                   -word: bit++("-", (-#a, +"?")?); bracketed.
                    -bit: ([L; "0"-"9"; #2019]; ".", ~[" "; #a])+.
               bracketed: s?, -"(", (phrases; -paragraph), -")";
                          s?, -"—", s?, phrases, -"—".
                   @punc: [",;:"], s.

Result

Applied to the example sentence, this grammar produces a structure like this:

<statement>
   <phrase punc=','>Our cad’s bit of strife 
      <bracketed>
         <phrase>knee Bareniece Maxwelton</phrase>
      </bracketed> with a quick ear for spittoons 
      <bracketed>
         <phrase>as the aftertale hath it</phrase>
      </bracketed>
      glaned up as usual with dumbestic husbandry 
      <bracketed>
         <exclamation>
            <phrase punc=','>no persicks and armelians
               for thee
            </phrase>
            <phrase>Pome-?ranzia</phrase>!
         </exclamation>
      </bracketed> but
   </phrase>
   <phrase punc=','>slipping the clav in her claw</phrase>
   <phrase punc=','>broke of the matter among a hundred
      and eleven others in her usual curtsey 
      <bracketed>
         <exclamation>
            <phrase punc=','>how faint these first
            vhespers womanly are
            </phrase>
            <phrase punc=','>a secret pispigliando</phrase>
            <phrase>amad the lavurdy den of their
            manfolker
            </phrase>!
         </exclamation>
      </bracketed>
      the next night nudge one as was
      Hegesippus over a hup a ’ chee
   </phrase>
   <phrase punc=','>her eys dry and small and speech
      thicklish because he appeared a funny colour like
      he couldn’t stood they old hens no longer
   </phrase>
   <phrase punc=','>to her particular reverend</phrase>
   <phrase punc=','>the director</phrase>
   <phrase punc=','>whom she had been meaning in her mind
      primarily to speak with 
      <bracketed>
         <exclamation>
            <phrase punc=','>hosch</phrase>
            <phrase>intra</phrase>!
         </exclamation>
         <exclamation>
            <phrase>jist a timblespoon</phrase>!
         </exclamation>
      </bracketed>
      trusting
   </phrase>
   <phrase punc=','>between cuppled lips and annie lawrie
      promises 
      <bracketed>
         <exclamation>
            <phrase>mighshe never have Esnekerry pudden come
               Hunanov for her pecklapitschens
            </phrase>!
         </exclamation>
      </bracketed>
      that the gossiple so delivered in his epistolear
   </phrase>
   <phrase>buried teatoastally in their Irish stew would go no
      further than his jesuit’s cloth
   </phrase>
</statement>

Line breaks

The only thing I will explain here is this:

<phrase>Pome-?ranzia</phrase>!

If a word is hyphenated over the end of a line (as this word was), you can't tell if the hyphen is meant to be part of the word, or is only there to signal a word split over two lines.

So I add a question mark after such a hyphen (since ends of line are deleted), to make it clear that this is a special type of hyphen.
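
The treatment of line-end hyphens can be imitated with two substitutions. This is a Python approximation of the effect on the text rather than the ixml rule itself, and the function name mark_line_end_hyphens is invented for the illustration.

```python
import re


def mark_line_end_hyphens(text):
    # A hyphen directly before a line break: drop the break, flag the hyphen.
    text = re.sub(r"-\n", "-?", text)
    # Any other line break simply separates two words.
    return text.replace("\n", " ")


print(mark_line_end_hyphens("for thee, Pome-\nranzia! but\nslipping"))
```

A hyphen in the middle of a line, as in "never-rip", is left untouched, so only the doubtful cases are flagged.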

Conclusion

This paper is intended to give insight into the processes that an ixml grammar author can go through while trying to describe a document whose structure is not yet completely obvious.

Writing grammars is a learned skill, and fluency comes only with experience. In particular, learning to deal with ambiguity is difficult, because what is ambiguous to the computer doesn't always appear ambiguous to the human eye.

Once the skill is learned, however, being able to parse large texts can enormously simplify the automatic processing of large documents.