Finnegans Wake by James Joyce is probably the hardest book to read in the English language. A principal hurdle is the length and convolutedness of the sentences. This paper reports an attempt to handle this complexity by parsing the sentences (at a structural, not a semantic level) to reveal their top-level structure. It takes the reader step-by-step through the construction of an ixml grammar for dealing with one chapter of the book.
Keywords: ixml, parsing, context-free grammars, XML, Finnegans Wake, James Joyce
Finnegans Wake is the last book by James Joyce, and probably the hardest book to read in English.
If you can even say it is in English.
There are three aspects to its difficulty.
It is by James Joyce. His aim is not to make books easy to read.
Joyce uses a lot of his own made-up portmanteau words and puns.
When he refers to Finnegans Wake as "The Book of Doublends Jined" he's punning on "Dublin's Giant" and "Double Ends Joined", since Finn is a giant said to be buried under Dublin, and the two ends of the book join up to make a complete sentence, and thus a circular book.
Or with the description of the wake for the dead giant: "And the all gianed in with the shoutmost shoviality." The underlying meaning is "They all joined in with the utmost joviality", with the added elements of giant, shouting, and shoving, and probably the slurred speech of people who have had a little too much to drink. It is adroit use of language to express so much with so few words.
Such words may remind you of Humpty Dumpty's explanation in Through the Looking-Glass:
“Well, ‘SLITHY’ means ‘lithe and slimy.’ ‘Lithe’ is the same as ‘active.’ You see it’s like a portmanteau—there are two meanings packed up into one word.”
But in fact we use them all the time:
And thirdly, and harder yet, is the length and structure of the sentences.
In the book, on page 122, when Finnegans Wake is being self-referentially described, there is a reference to the TUNC page of the Book of Kells, an Irish mediaeval illuminated manuscript. This page, which begins with the word TUNC, contains but a single sentence (in Latin) that reads "Then they crucified Christ with two thieves".
But the sentence is formatted in the form of a large X, with numerous adornments around the page.
Joyce's sentences are a bit like that: there is a simple essence, but he has added an enormous amount of symbolic adornment, making them hard to decipher at a first reading. And they often take up a whole page.
Our cad’s bit of strife (knee Bareniece Maxwelton) with a quick ear for spittoons (as the aftertale hath it) glaned up as usual with dumbestic husbandry (no persicks and armelians for thee, Pomeranzia!) but, slipping the clav in her claw, broke of the matter among a hundred and eleven others in her usual curtsey (how faint these first vhespers womanly are, a secret pispigliando, amad the lavurdy den of their manfolker!) the next night nudge one as was Hegesippus over a hup a ’ chee, her eys dry and small and speech thicklish because he appeared a funny colour like he couldn’t stood they old hens no longer, to her particular reverend, the director, whom she had been meaning in her mind primarily to speak with (hosch, intra! jist a timblespoon!) trusting, between cuppled lips and annie lawrie promises (mighshe never have Esnekerry pudden come Hunanov for her pecklapitschens!) that the gossiple so delivered in his epistolear, buried teatoastally in their Irish stew would go no further than his jesuit’s cloth, yet (in vinars venitas! volatiles valetotum!) it was this overspoiled priest Mr Browne, disguised as a vincentian, who, when seized of the facts, was overheard, in his secondary personality as a Nolan and underreared, poul soul, by accident — if, that is, the incident it was an accident for here the ruah of Ecclectiastes of Hippo outpuffs the writress of Havvah-ban-Annah — to pianissime a slightly varied version of Crookedribs confidentials, (what Mère Aloyse said but for Jesuphine’s sake!) hands between hahands, in fealty sworn (my bravor best! my fraur!) and, to the strains of The Secret of Her Birth, hushly pierce the rubiend aurellum of one Philly Thurnston, a layteacher of rural science and orthophonethics of a nearstout figure and about the middle of his forties during a priestly flutter for safe and sane bets at the hippic runfields of breezy Baldoyle on a date (W. W. 
goes through the cald) easily capable of rememberance by all pickers-up of events national and Dublin details, the doubles of Perkin and Paullock, peer and prole, when the classic Encourage Hackney Plate was captured by two noses in a stablecloth finish, ek and nek, some and none, evelo nevelo, from the cream colt Bold Boy Cromwell after a clever getaway by Captain Chaplain Blount’s roe hinny Saint Dalough, Drummer Coxon, nondepict third, at breakneck odds, thanks to you great little, bonny little, portey little, Winny Widger! you’re all their nappies! who in his never-rip mud and purpular cap was surely leagues unlike any other phantomweight that ever toppitt our timber maggies.
Our cad’s bit of strife broke of the matter the next night nudge one to her particular reverend, trusting that the gossiple so delivered would go no further than his jesuit’s cloth, yet it was this overspoiled priest, Mr Browne, who was overheard to pianissime a slightly varied version and hushly pierce the rubiend aurellum of a layteacher of rural science during a priestly flutter at the hippic runfields.
I am currently writing a book about Finnegans Wake, so getting to the underlying meaning of its sentences is essential.
To understand the long sentences, the first task is to determine their essence, as shown in the above example.
So as a first step in breaking down the complexity of Joyce's work, I decided I would try writing some ixml [ixml] to break down the sentences into their structural form. I quickly discovered that each chapter has its own peculiarities, so I am writing a different grammar per chapter, in order to keep it simpler.
Invisible XML (ixml) is a notation and process that uses context-free grammars to describe the format of textual documents.
This allows documents to be parsed into an abstract parse-tree, which can be processed in various ways, but principally serialised into an XML document, thus making the implicit structure of the textual document explicit in the XML.
So I started off as simple as can be with the following grammar:
chapter: paragraph+. {a chapter is one or more paragraphs}
paragraph: line+, #a. {a paragraph is one or more lines, followed by a blank line}
line: ~[#a]+, #a. {a line is characters (except end of line), followed by end-of-line}
This failed because (obviously, once I had thought about it) the last paragraph is not followed by a blank line. A blank line separates paragraphs:
chapter: paragraph++#a.
paragraph: line+.
line: ~[#a]+, #a.
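The change from the first grammar to the second is the change from a terminator to a separator. The following is a rough Python analogue of that difference, not ixml; the function names are made up for illustration, and runs of more than one blank line are deliberately not handled.

```python
def split_terminated(text: str) -> list[str]:
    """Mimic `paragraph: line+, #a.`: every paragraph, including the
    last one, must be followed by a blank line."""
    if not text.endswith("\n\n"):
        raise ValueError("last paragraph is not followed by a blank line")
    # Drop the final line end and blank line, then split on blank lines.
    return text[:-2].split("\n\n")


def split_separated(text: str) -> list[str]:
    """Mimic `chapter: paragraph++#a.`: a blank line sits only
    between paragraphs, and the text ends with a single line end."""
    if not text.endswith("\n") or text.endswith("\n\n"):
        raise ValueError("text must end with exactly one newline")
    return text[:-1].split("\n\n")
```

A text whose last paragraph ends with a single newline is rejected by the first function and accepted by the second, which mirrors the failure described above.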
This now produced a first, very basic output (not shown here). But we are not interested in lines, but in the internal structure of paragraphs.
chapter: paragraph++#a. {a chapter is one or more paragraphs}
paragraph: sentence+. {a paragraph is one or more sentences}
sentence: phrase++punctuation, ".". {a sentence is one or more phrases, separated by some punctuation, terminated with a point}
phrase: word++" ". {a phrase is one or more words, separated by spaces}
word: [L]+. {a word is one or more letters}
punctuation: [",;:"]. {phrases are separated by one of ",;:"}
This failed on the very first line of the chapter:
Now concerning the genesis of Harold or Humphrey Chimpden’s ^ **** Character: "’" (#2019).
Ah, a word consists of more than letters. Fix that:
word: [L; "’"]+.
It immediately failed on the same line:
**** Parsing failed at line 1, position 58 Now concerning the genesis of Harold or Humphrey Chimpden’s ^ **** Character: (#A).
Of course, words are not only separated by spaces, but sometimes also by newlines.
Change
phrase: word++" ".
to
phrase: word++s. s: [" "; #a].
At least we now got to line 2:
occupational agnomen, the best authenticated version has it that it ^ **** Character: " " (#20).
Oh yes, punctuation can also be followed by space...
punctuation: [",;:"], s*.
Now we get to line 3:
was this way. We are told how in the beginning it came to pass that ^ **** Character: " " (#20).
Ah, full-stops can be followed by space as well:
sentence: phrase++punctuation, ".", s*.
This gets us to line 17!
seldomer than an earwigger! Comes the question are these the facts of ^ **** Character: "!" (#21).
Of course, sentences can also end with "!". Let's also add "?", just in case:
sentence: phrase++punctuation, [".!?"], s*.
At line 62 we discover another character that can appear in a word:
Ides-of-April morning (the anniversary, as it fell out, of his first ^ **** Character: "-" (#2D).
Fix that:
word: [L; "’-"]+.
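The fix-and-retry loop so far can be imitated with a small hand-written recogniser. The sketch below is Python, not ixml, and only approximates the grammar as it stands at this point for a single paragraph; `first_failure` and `is_word_char` are hypothetical names, and the function reports the index of the first character it cannot consume, much like the error messages above.

```python
import unicodedata

WORD_EXTRA = "’-"      # non-letters allowed inside a word at this stage
SPACE = " \n"          # s: [" "; #a]


def is_word_char(ch: str) -> bool:
    # [L] in ixml is any Unicode letter: categories Lu, Ll, Lt, Lm, Lo.
    return unicodedata.category(ch).startswith("L") or ch in WORD_EXTRA


def first_failure(paragraph: str):
    """Return the index of the first character that cannot be consumed,
    len(paragraph) if the input ends too early, or None on success."""
    i, n = 0, len(paragraph)
    if n == 0:
        return 0                       # a paragraph needs at least one sentence
    while i < n:                       # one sentence per iteration
        while True:                    # phrase++punctuation
            start = i
            while i < n and is_word_char(paragraph[i]):
                i += 1
            if i == start:
                return i               # a word was required here
            if i < n and paragraph[i] in SPACE:
                i += 1                 # s between two words
                continue
            if i < n and paragraph[i] in ",;:":
                i += 1                 # punctuation, then s*
                while i < n and paragraph[i] in SPACE:
                    i += 1
                continue
            break
        if i >= n or paragraph[i] not in ".!?":
            return i                   # sentence terminator missing
        i += 1
        while i < n and paragraph[i] in SPACE:
            i += 1                     # s* after the terminator
    return None
```

Run on the line quoted next, the recogniser would stop at the opening bracket, which is the structuring problem the grammar has to deal with from here on.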
And on the same line, we come to our first structuring problem:
Ides-of-April morning (the anniversary, as it fell out, of his first ^ **** Character: "(" (#28).
Nested phrases! So where do we put this in the structure? I'm going to experiment, and put it at the level of a word. First separate the definition of phrases:
sentence: phrases, [".!?"], s*.
phrases: phrase++punctuation.
and add bracketed phrases:
word: [L; "’-"]+; bracketed.
bracketed: "(", phrases, ")".
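Because `bracketed` refers back to `phrases`, the grammar is now recursive. A minimal Python sketch of the same idea follows, assuming a much cruder notion of a word (any run of characters other than whitespace, brackets and `,;:`); `TOKEN`, `parse_phrases` and `parse` are illustrative names and not part of any ixml tooling.

```python
import re

# Tokens: a bracket, a phrase separator, or a run of other non-space characters.
TOKEN = re.compile(r"[()]|[,;:]|[^\s(),;:]+")


def parse_phrases(tokens, pos=0):
    """Parse phrases until the end of input or a closing bracket.
    Returns (phrases, next_position). A phrase is a list whose items are
    words (str) or nested lists of phrases for a bracketed group."""
    phrases, current = [], []
    while pos < len(tokens) and tokens[pos] != ")":
        tok = tokens[pos]
        if tok == "(":
            inner, pos = parse_phrases(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unclosed bracket")
            current.append(inner)      # the group sits where a word would
            pos += 1
        elif tok in ",;:":
            phrases.append(current)    # separator closes the current phrase
            current = []
            pos += 1
        else:
            current.append(tok)
            pos += 1
    phrases.append(current)
    return phrases, pos


def parse(text: str):
    tokens = TOKEN.findall(text)
    phrases, pos = parse_phrases(tokens)
    if pos != len(tokens):
        raise ValueError("unmatched closing bracket")
    return phrases
```

Placing the nested group inside the list of words, rather than beside the phrases, corresponds to the decision above to treat a bracketed group at the level of a word.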
This does well, and gets us to line 136, when we get this surprise:
fellow—me—lieder was first poured forth to an overflow meeting of all ^ **** Character: "—" (#2014).
The typesetters had used two different characters for hyphenated words! Fix that:
word: [L; "’-—"]+; bracketed.
And we get to the end of the chapter! Hooray! The output starts:
<chapter ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"> <paragraph> <sentence> <phrases> <phrase> <word>Now</word> <s> </s> <word>concerning</word> <s> </s> <word>the</word> <s> </s> <word>genesis</word> <s> </s> <word>of</word> <s> </s> <word>Harold</word> <s> </s> <word>or</word> <s> </s> <word>Humphrey</word> <s> </s> <word>Chimpden’s</word> <s> </s> <word>occupational</word> <s> </s> <word>agnomen</word> </phrase> <punctuation>, <s> </s> </punctuation>
We'll look at the ambiguity in a minute, but first let's get rid of the elements we don't need by adding "-" before some rules. This produces a better-looking result:
<chapter ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"> <paragraph> <sentence> <phrase>Now concerning the genesis of Harold or Humphrey Chimpden’s occupational agnomen</phrase>, <phrase>the best authenticated version has it that it was this way</phrase>. </sentence> <sentence> <phrase>We are told how in the beginning it came to pass that the grand old gardener was saving daylight under his redwoodtree one sultry sabbath afternoon</phrase>, <phrase>when royalty was announced to have been pleased to have halted itself on the highroad</phrase>. </sentence>
What is also obvious is that the newline characters in the input are visible in the output. On the other hand we want to keep the spaces. So we could change
s: [" "; #a].
to
s: " "; -#a.
but then words separated by a newline will run together; so we replace newlines with spaces:
s: " "; -#a, +" ".
An alternative is to delete all whitespace, and replace it with a single space character:
s: -[" "; #a], +" ".
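The visible effect of these variants on the text can be imitated with plain string replacement. This is only a Python analogy of the deletion ("-") and insertion ("+") marks, with made-up function names; it says nothing about how an ixml processor works internally.

```python
def drop_newlines(text: str) -> str:
    """Mimic `s: " "; -#a.`: spaces are kept, newlines are deleted."""
    return text.replace("\n", "")


def newlines_to_spaces(text: str) -> str:
    """Mimic `s: " "; -#a, +" ".`: each newline is deleted and a
    space is inserted in its place."""
    return text.replace("\n", " ")
```

With the first variant, a word at the end of one line runs into the first word of the next; the second keeps them apart.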
Now we get
<chapter ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"> <paragraph> <sentence> <phrase>Now concerning the genesis of Harold or Humphrey Chimpden’s occupational agnomen</phrase>, <phrase>the best authenticated version has it that it was this way</phrase>. </sentence> <sentence> <phrase>We are told how in the beginning it came to pass that the grand old gardener was saving daylight under his redwoodtree one sultry sabbath afternoon</phrase>, <phrase>when royalty was announced to have been pleased to have halted itself on the highroad</phrase>. </sentence>
If you're interested in how the bracketed phrases look, here's an example:
<sentence> <phrase>They tell the story how one happygogusty Ides-of-April morning <bracketed>( <phrase>the anniversary</phrase>, <phrase>as it fell out</phrase>, <phrase>of his first assumption of his mirthday suit</phrase>) </bracketed> ages and ages after the alleged misdemeanour when the tried friend of all creation was billowing across the wide expanse of our greatest park</phrase>, <phrase>he met a cad with a pipe</phrase>. </sentence>
The output claims there are ten different interpretations of the chapter, so let's look at why.
The input from line.pos 1.1 to 148.1 can be interpreted as 'paragraph++#a' in 10 different ways:
1: paragraph++#a[1.1:]: paragraph[:20.1] #a[:21.1] paragraph++#a[:148.1]
2: paragraph++#a[1.1:]: paragraph[:31.1] #a[:32.1] paragraph++#a[:148.1]
3: paragraph++#a[1.1:]: paragraph[:87.1] #a[:88.1] paragraph++#a[:148.1]
4: paragraph++#a[1.1:]: paragraph[:94.48] #a[:95.1] paragraph++#a[:148.1]
5: paragraph++#a[1.1:]: paragraph[:102.29] #a[:103.1] paragraph++#a[:148.1]
6: paragraph++#a[1.1:]: paragraph[:109.1] #a[:110.1] paragraph++#a[:148.1]
7: paragraph++#a[1.1:]: paragraph[:121.1] #a[:122.1] paragraph++#a[:148.1]
8: paragraph++#a[1.1:]: paragraph[:139.1] #a[:140.1] paragraph++#a[:148.1]
9: paragraph++#a[1.1:]: paragraph[:145.1] #a[:146.1] paragraph++#a[:148.1]
10: paragraph++#a[1.1:]: paragraph[:148.1]
and if we look at the input, we find that each of the lines it mentions (20, 31, 87, etc.) is a paragraph break. The problem is that we have allowed sentences to be followed by more than one whitespace character, which also matches the extra newline after a paragraph.
sentence: phrases, [".!?"], s*.
So let's delete that "*":
sentence: phrases, [".!?"], s.
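The effect of that star can be checked on a toy input. The sketch below is Python, not ixml, and rests on a simplifying assumption: a sentence body is any run of characters other than ".!?" and newline that does not start with whitespace. The function `count_parses` is a hypothetical helper that counts complete parses of a small text, which is a different figure from the per-node count in the report above.

```python
from functools import lru_cache


def count_parses(text: str, star: bool) -> int:
    """Count the parses of `text` as paragraph++#a, with paragraph: sentence+
    and sentence: body, [".!?"], followed by s* (star=True) or one s (star=False)."""
    n = len(text)

    def sentence_ends(i):
        if i < n and text[i] in " \n":
            return []                      # a sentence cannot start with whitespace
        j = i
        while j < n and text[j] not in ".!?\n":
            j += 1
        if j == i or j == n or text[j] == "\n":
            return []                      # empty body or missing terminator
        j += 1                             # consume the terminator
        k = j
        while k < n and text[k] in " \n":
            k += 1                         # k marks the end of the whitespace run
        if star:
            return list(range(j, k + 1))   # s* may take any prefix of the run
        return [j + 1] if k > j else []    # s takes exactly one character

    @lru_cache(maxsize=None)
    def paragraph_ends(i):
        """Pairs (end, ways) for sentence+ starting at position i."""
        totals = {}
        for j in sentence_ends(i):
            totals[j] = totals.get(j, 0) + 1
            for end, ways in paragraph_ends(j):
                totals[end] = totals.get(end, 0) + ways
        return tuple(totals.items())

    @lru_cache(maxsize=None)
    def chapter(i):
        total = 0
        for end, ways in paragraph_ends(i):
            if end == n:
                total += ways              # last paragraph reaches the end
            elif text[end] == "\n":
                total += ways * chapter(end + 1)   # #a, then more paragraphs
        return total

    return chapter(0)
```

For a three-paragraph toy text, each of the two paragraph breaks can be read either as a separator or as extra whitespace after a sentence when the star is present, so the count doubles per break; without the star only one reading remains.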
This exposes a new problem:
Haromphrey bear the sigla H.C.E. and while he was only and long and ^ **** Character: "C" (#43).
Full-stops are not only used to separate sentences! "H.C.E." clearly looks like it could end a sentence, because it ends with a full-stop and a space. We'll have to treat it as a special kind of word:
-word: [L; "’-—"]+; bracketed; initialism.
initialism: ([Lu], ".")+.
([Lu] matches any upper-case letter.)
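As a cross-check of what the `initialism` rule accepts, here is a small Python sketch (not ixml). It uses the Unicode general category "Lu" from the standard `unicodedata` module as the counterpart of `[Lu]`; the function name `is_initialism` is made up for illustration.

```python
import unicodedata


def is_initialism(token: str) -> bool:
    """True if token is one or more (upper-case letter, full-stop) pairs,
    the Python counterpart of `([Lu], ".")+`."""
    if not token or len(token) % 2:
        return False
    pairs = zip(token[0::2], token[1::2])
    return all(unicodedata.category(letter) == "Lu" and dot == "."
               for letter, dot in pairs)
```

Note that a single "H." also qualifies, which is why a spaced form such as "H. C. Earwicker" can still be confused with the end of a sentence.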
This reveals another ambiguity, only this time, it's a real one, and not a mistake in the grammar:
To anyone who knew and loved the christlikeness of the big cleanminded giant H. C. Earwicker throughout his excellency long vicefreegal existence
We can see as humans that this is not ambiguous, but consider this sentence:
There are people who claim sentences never end with a capital H. C. Earwicker however, in his seminal paper "Sentences that end with a capital H", proves otherwise.
What we get is
<sentence> <phrase>To anyone who knew and loved the christlikeness of the big cleanminded giant H</phrase>. </sentence> <sentence> <phrase>C</phrase>. </sentence> <sentence> <phrase>Earwicker throughout his excellency long vicefreegal existence ...
To tell you the truth, at this point I cheated. I deleted the first space in "H. C. Earwicker", and the chapter was parsed to completion without ambiguity.
I presented the development of chapter 2 above, since it is fairly simply structured.
The sentence presented at the beginning of the paper on the other hand is from chapter 3, which is more complicated.
So to end, I will simply show the current ixml for chapter 3 (and chapter 1 as it happens), and the result of the parsing of the example sentence. You will see that I have handled a number of aspects, such as punctuation, differently.
chapter: paragraph++(-#a, -#a), -#a.
paragraph: sentence++s; pagenumber.
pagenumber: ["0"-"9"]+.
-sentence: question; exclamation; statement.
question: phrases, "?".
exclamation: phrases, "!".
statement: phrases, ".".
-phrases: punctuated-phrase*, phrase.
punctuated-phrase>phrase: -phrase, ["?!"]?, punc.
-s: -" "+, -#a?; -#a.
phrase: word++(s, +" ").
-word: bit++("-", (-#a, +"?")?); bracketed.
-bit: ([L; "0"-"9"; #2019]; ".", ~[" "; #a])+.
bracketed: s?, -"(", (phrases; -paragraph), -")"; s?, -"—", s?, phrases, -"—".
@punc: [",;:"], s.
which applied to the example sentence produces a structure like this:
<statement> <phrase punc=','>Our cad’s bit of strife <bracketed> <phrase>knee Bareniece Maxwelton</phrase> </bracketed> with a quick ear for spittoons <bracketed> <phrase>as the aftertale hath it</phrase> </bracketed> glaned up as usual with dumbestic husbandry <bracketed> <exclamation> <phrase punc=','>no persicks and armelians for thee </phrase> <phrase>Pome-?ranzia</phrase>! </exclamation> </bracketed> but </phrase> <phrase punc=','>slipping the clav in her claw</phrase> <phrase punc=','>broke of the matter among a hundred and eleven others in her usual curtsey <bracketed> <exclamation> <phrase punc=','>how faint these first vhespers womanly are </phrase> <phrase punc=','>a secret pispigliando</phrase> <phrase>amad the lavurdy den of their manfolker </phrase>! </exclamation> </bracketed> the next night nudge one as was Hegesippus over a hup a ’ chee </phrase> <phrase punc=','>her eys dry and small and speech thicklish because he appeared a funny colour like he couldn’t stood they old hens no longer </phrase> <phrase punc=','>to her particular reverend</phrase> <phrase punc=','>the director</phrase> <phrase punc=','>whom she had been meaning in her mind primarily to speak with <bracketed> <exclamation> <phrase punc=','>hosch</phrase> <phrase>intra</phrase>! </exclamation> <exclamation> <phrase>jist a timblespoon</phrase>! </exclamation> </bracketed> trusting </phrase> <phrase punc=','>between cuppled lips and annie lawrie promises <bracketed> <exclamation> <phrase>mighshe never have Esnekerry pudden come Hunanov for her pecklapitschens </phrase>! </exclamation> </bracketed> that the gossiple so delivered in his epistolear </phrase> <phrase>buried teatoastally in their Irish stew would go no further than his jesuit’s cloth </phrase> </statement>
The only thing I will explain here is this:
<phrase>Pome-?ranzia</phrase>!
If a word is hyphenated over the end of a line (as this word was), you can't tell if the hyphen is meant to be part of the word, or is only there to signal a word split over two lines.
So I add a question mark after such a hyphen (since ends of line are deleted), to make it clear that this is a special type of hyphen.
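The same marking can be imitated outside the grammar. The sketch below is a Python approximation, not the ixml rule itself: it treats any hyphen directly followed by a newline, with a word character on both sides, as a line-end hyphen. The name `mark_line_end_hyphens` is hypothetical.

```python
import re


def mark_line_end_hyphens(text: str) -> str:
    """Replace a hyphen plus newline inside a word with "-?", so that the
    joined word keeps a trace of the uncertain hyphen."""
    return re.sub(r"(?<=\w)-\n(?=\w)", "-?", text)
```

Hyphens that are not at the end of a line are left alone, so an ordinary hyphenated word stays distinguishable from one that was split by the typesetter.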
This paper is intended to give insight into the processes that an ixml grammar author can go through while trying to describe a document whose structure is not yet completely obvious.
Writing grammars is a learned skill, and fluency comes only with experience. In particular, learning to deal with ambiguity is difficult, because what is ambiguous to the computer doesn't always appear ambiguous to the human eye.
However, once learned, the ability to parse large texts can enormously simplify the automatic processing of large documents.