Banking with ixml and XForms

Steven Pemberton, CWI, Amsterdam

Cite as: Steven Pemberton, Banking with ixml and XForms, Proc. Declarative Amsterdam 2024, Amsterdam, The Netherlands.

A History of ixml

The year 2024 has turned out to be a year of ixml. Looking at the graph of conference papers referencing the language, we see an explosion this year:

[Figure: The number of ixml talks over the years]

That first mention in 2004 wasn't actually a talk about ixml, but a keynote on the design of notations [notations] where I said, in passing: "Parsing is quite easy. It would be fairly easy to add a generalised part to the XML pipeline that parsed unmarked-up text, and produced XML as a parse tree: it's just a different sort of transform. We could have our cake and eat it!"

The first real paper was in 2013 [ixml1], followed by iterations based on user experience in 2016 [ixml2] and 2017 [ixml3]. In 2020 a working group was formed [wg], which led in 2022 to the formal specification being published [spec]. At the time of writing there are 5 or 6 serious implementations, with others in development [impl].

The purpose of ixml

Most textual data has an implicit structure. For instance dates, like 8/11/2024, URLs such as http://cwi.nl/~steven/Talks/2024/11-08-banking/ and bibliographic references, such as Steven Pemberton, Banking with ixml and XForms, Proc. Declarative Amsterdam 2024, all have a structure obvious to the human reader, but opaque to programs processing the data.

Using ixml turns data with implicit structure such as these into data with explicit structure, such as

<date>
   <day>8</day>
   <month>11</month>
   <year>2024</year>
</date>

To achieve this, you describe the format, such as this simple example for dates:

date: day, -"/", month, -"/", year.
day: digit, digit?.
month: digit, digit?.
year: digit, digit, digit, digit.
-digit: ["0"-"9"].

which is then fed, together with the data it is describing, through the ixml processor to give you a structured version of the data. Diagrammatically, it looks like this.

[Figure: The ixml process]

Since both description and document are in textual format, they both get processed separately by the ixml processor to produce the equivalent structured versions.
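
As a rough illustration of what the date description above produces, here is a hand-coded Python sketch of the same structure. It is not an ixml processor and uses no ixml library: the regular expression simply restates the grammar by hand, and the function name parse_date is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written restatement of:  date: day, -"/", month, -"/", year.
# One or two digits, a slash, one or two digits, a slash, four digits.
DATE = re.compile(r"([0-9][0-9]?)/([0-9][0-9]?)/([0-9]{4})")

def parse_date(text):
    """Return a <date> element for text such as '8/11/2024' (hypothetical helper)."""
    match = DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date: {text!r}")
    date = ET.Element("date")
    # The slashes are dropped, just as the -"/" terminals are in the ixml.
    for name, value in zip(("day", "month", "year"), match.groups()):
        ET.SubElement(date, name).text = value
    return date

print(ET.tostring(parse_date("8/11/2024"), encoding="unicode"))
# <date><day>8</day><month>11</month><year>2024</year></date>
```

The difference is that with ixml the description is data that a general processor interprets, whereas here the structure is baked into the program.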

Aims

One of the principal aims of invisible markup in general is to draw attention to the abstract document that underlies any data representation. Once you have that abstract document after parsing the textual representation, there are various purposes you can put it to, that may not even involve transcription to XML only to be reparsed by an XML processor. For instance in the diagram above, the description document may be converted directly to data structures suitable for the parser, rather than converted to XML.

However, ixml does specifically open up the XML pipeline to more than just XML, and one of the perceived targets was XForms.

XForms is a declarative, XML-based programming language [XForms]. Being declarative rather than procedural has proven to make life much easier for the programmer: it is like spreadsheets, but generalised. Experience over the years has demonstrated huge productivity gains for large projects, typically tenfold but sometimes better, such as 10 people in 1 year instead of 30 people in 5 years, with the best case to date being 1 person in 3 years against 70 people in 10 years [CityHCR]. However, XForms expects its data in XML, and many potential sources of data aren't in XML: ixml is a way of making those data sources accessible to XForms.

However, there is another side to XML and XForms: XForms programs are written directly in XML. But some people prefer to write programs and other documents in textual form rather than marked-up form. This is why we have such formats as Markdown, to make document entry easier for the user. It can be faster, easier, and more human-oriented to write programs in text, and let the computer convert it to XML. It is future work to define a textual version for XForms programs.

How ixml is being used

It is educational to see how ixml is being put to use. Just drawing from reports in emails or conference papers, we see:

Banking

My bank has stopped sending paper statements, and offers online statements instead, either in PDF for printing yourself, or CSV for digital storage. While the bank does offer some online search facilities, they are slow and inconvenient. I needed something similar to how Quicken used to work, and so I wrote an application in XForms.

To write the app in XForms I first had to get the data into a usable form.

Since the bank only supplied PDF or CSV, CSV had to be the source of the data:

"Date","Name / Description","Account","Counterparty","Code","Debit/credit","Amount (EUR)","Transaction type","Notifications"
"20161230","The Movies Art House AMSTERDAM","NL80INGB1234567890","","BA","Debit","11,00","Payment terminal","Card sequence no.: 009 29/12/2016"
"20161229","DIJKMAN B.V. MUZIEK AMSTERDAM","NL80INGB1234567890","","BA","Debit","99,00","Payment terminal","Card sequence no.: 009 28/12/2016"
"20161227","CCV*FOODHALLEN AMSTERD AMSTERDAM","NL80INGB1234567890","","BA","Debit","6,25","Payment terminal","Card sequence no.: 009 24/12/2016"

In the first instance I used the stream editor sed, which resulted in largely opaque code:

head -n 2 $1 | tail -1 | sed 's/"\(....\)[^,]*,[^,]*,"NL..INGB\([^"]*\)".*/<bank year="\1" acct="\2">/'

tail --lines=+2  $1 | sed '
     s/^/<entry>/
     s/&/&amp;/g
     s/"\(....\)\(..\)\(..\)",/<date>\1-\2-\3<\/date>/
     s/"\([^"]*\)",/<type>other<\/type><name>\1<\/name>/
     s/"\([^"]*\)",/<from>\1<\/from>/
     s/"\([^"]*\)",/<to>\1<\/to>/
     s/"\([^"]*\)",/<code>\1<\/code>/
     s/"Debit","\([^,]*\),\([^"]*\)",/<amount>-\1.\2<\/amount>/
     s/"Credit","\([^,]*\),\([^"]*\)",/<amount>\1.\2<\/amount>/
     s/"\([^"]*\)",/<sort>\1<\/sort>/
     s/"\([^"]*\)"/<description>\1<\/description>/
     s/$/<\/entry>/
    '
echo '</bank>'

Once ixml was available, the code was at least more descriptive and tractable:

       bank: labels, entry*.
    -labels: -'"Date","Name / Description","Account","Counterparty",',
             -'"Code","Debit/credit","Amount (EUR)","Transaction type",',
             -'"Notifications"', nl.
      entry: date, type, name, from, to, code, amount, sort, description, nl.
       type: +"other".
       date: -'"', y, +"-", m, +"-", d, -'",'.
         -y: digit, digit, digit, digit.
         -m: digit, digit.
         -d: digit, digit.
       name: field, -",".
       from: field, -",".
         to: field, -",".
       code: field, -",".
       sort: field, -",".
description: field.
     -field: -'"', c*, -'"'.
         -c: ~['"'; #a; #d].
     amount: neg; pos.
       -neg: -'"Debit",', +"-", number, -",".
       -pos: -'"Credit",',      number, -",".
    -number: -'"', euros, -",", +".", cents, -'"'.
     -euros: digit+.
     -cents: digit, digit.
     -digit: ["0"-"9"].
        -nl: (-#a; -#d)+.

Let's consider this code in detail. A bank statement consists of a line of the labels that are completely ignored, followed by any number of entries:

   bank: labels, entry*.
-labels: -'"Date","Name / Description","Account","Counterparty",',
         -'"Code","Debit/credit","Amount (EUR)","Transaction type",',
         -'"Notifications"', nl.

Because these labels are included literally (rather than the line just being skipped), an immediate error will be given if the bank ever changes the format. nl is just a newline.
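
The same fail-fast idea can be sketched outside ixml. The Python below is an illustration only, with a hypothetical helper check_labels: it compares the first line of the CSV literally with the expected labels, and refuses to continue if they differ.

```python
# The labels exactly as they appear in the bank's CSV export above.
EXPECTED_LABELS = (
    '"Date","Name / Description","Account","Counterparty",'
    '"Code","Debit/credit","Amount (EUR)","Transaction type",'
    '"Notifications"'
)

def check_labels(first_line):
    """Raise ValueError unless the header line is exactly the expected one."""
    if first_line.rstrip("\r\n") != EXPECTED_LABELS:
        raise ValueError("unexpected column labels: has the bank changed the format?")

check_labels(EXPECTED_LABELS + "\n")          # passes silently
try:
    check_labels('"Date","Name","Amount"\n')  # an invented, changed format
except ValueError as err:
    print("rejected:", err)
```

In the ixml version no separate check is needed: a header that doesn't match the literal strings simply fails to parse.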

Each entry, which takes up a single line, has a number of fields:

entry: date, type, name, from, to, code, amount, sort, description, nl.

Of these, one extra field has been added, type, that the application will use. It matches nothing in the input, and appears in the output as a preset field:

type: +"other".

which will always appear as

<type>other</type>

which can later be changed in the application.

A date is a field with a string of digits in the input, like "20161230", so hyphens are added at the right places:

date: -'"', y, +"-", m, +"-", d, -'",'.
  -y: digit, digit, digit, digit.
  -m: digit, digit.
  -d: digit, digit.

which would give something like

<date>2016-12-30</date>

This matches the lexical form of the date data type in XML Schema (xs:date), and so does not need to be structured any more than that.
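
For comparison, here is a small Python sketch of the same rearrangement; the function name iso_date is hypothetical. It also passes the result through datetime.date.fromisoformat to confirm that it is a real calendar date, which is a check the ixml description itself does not make.

```python
import re
from datetime import date

def iso_date(field):
    """Turn '20161230' into '2016-12-30' (hypothetical helper)."""
    match = re.fullmatch(r"([0-9]{4})([0-9]{2})([0-9]{2})", field)
    if match is None:
        raise ValueError(f"not an 8-digit date: {field!r}")
    iso = "-".join(match.groups())   # insert the hyphens between y, m and d
    date.fromisoformat(iso)          # raises ValueError for, say, month 13
    return iso

print(iso_date("20161230"))   # 2016-12-30
```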

A number of the fields just contain unstructured character data:

       name: field, -",".
       from: field, -",".
         to: field, -",".
       code: field, -",".
       sort: field, -",".
description: field.
     -field: -'"', c*, -'"'.
         -c: ~['"'; #a; #d].

A field is just zero or more characters surrounded by quotes, where a character is anything except a quote or an end-of-line character.

<name>The Movies Art House AMSTERDAM</name>

Finally, amounts are supplied in an odd way, using two fields, since banks don't believe in negative numbers, just positive amounts of debit or credit:

"Credit","11,00"
"Debit","6,25"

Note the European style of number representation, which gets treated suitably, by adding a minus sign before the debits, and replacing the commas with points:

 amount: neg; pos.
   -neg: -'"Debit",', +"-", number, -",".
   -pos: -'"Credit",',      number, -",".
-number: -'"', euros, -",", +".", cents, -'"'.
 -euros: digit+.
 -cents: digit, digit.
 -digit: ["0"-"9"].

Which gives results like

<amount>11.00</amount>

or

<amount>-6.25</amount>
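
The two-field treatment can be sketched in Python as follows. This is only an illustration of the rule just described, and the function name signed_amount is made up; it accepts exactly what the ixml accepts, one or more digits for the euros and two for the cents.

```python
import re

def signed_amount(kind, amount):
    """Combine a 'Debit'/'Credit' field with a '6,25' style amount field."""
    match = re.fullmatch(r"([0-9]+),([0-9]{2})", amount)
    if match is None or kind not in ("Debit", "Credit"):
        raise ValueError(f"unexpected amount fields: {kind!r}, {amount!r}")
    sign = "-" if kind == "Debit" else ""            # debits become negative
    return f"{sign}{match.group(1)}.{match.group(2)}"  # comma becomes point

print(signed_amount("Credit", "11,00"))  # 11.00
print(signed_amount("Debit", "6,25"))    # -6.25
```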

Processing the data with this description gives XML like this:

<bank>
   <entry>
      <date>2016-12-30</date>
      <type>other</type>
      <name>The Movies Art House AMSTERDAM</name>
      <from>NL80INGB1234567890</from>
      <to/>
      <code>BA</code>
      <amount>-11.00</amount>
      <sort>Payment terminal</sort>
      <description>Card sequence no.: 009 29/12/2016</description>
   </entry>
   <entry>
      <date>2016-12-29</date>
      <type>other</type>
      <name>DIJKMAN B.V. MUZIEK AMSTERDAM</name>
      <from>NL80INGB1234567890</from>
      <to/>
      <code>BA</code>
      <amount>-99.00</amount>
      <sort>Payment terminal</sort>
      <description>Card sequence no.: 009 28/12/2016</description>
   </entry>
   <entry>
      <date>2016-12-27</date>
      <type>other</type>
      <name>CCV*FOODHALLEN AMSTERD AMSTERDAM</name>
      <from>NL80INGB1234567890</from>
      <to/>
      <code>BA</code>
      <amount>-6.25</amount>
      <sort>Payment terminal</sort>
      <description>Card sequence no.: 009 24/12/2016</description>
   </entry>
</bank>

This can then be used for the XForms banking application.

[Figure: The XForms Banking App]

Credit card account

While my bank will gladly give me downloadable statements for my current account, for some reason it doesn't for my credit card, even though it doesn't send paper statements for that either.

All it does is display my transactions on the screen.

[Figure: The bank's display for a credit card account]

To deal with this, what I do is select all this text from the screen, and copy it as text to a file. Then I can use ixml to transform it to structured data (note that the blank lines are not separators between entries).

Transactions current period
2024
Today
EUR

Kobo Software Ireland D02T380 IE
−5.49
22 October
EUR

THEATRE ROYAL HAYMARKE LONDON GBR
−214.62
10 October
EUR

SP THEPHONESHOPBE JETTE BEL
−1,199.00
Period of 5 Sept 2024 to 4 Oct 2024
Opening balance for this period:
0.00
Transaction total:
−289.33
Monthly repayment:
289.33
Closing balance for this period:
0.00
2024
4 October
EUR

AFLOSSING
289.33
3 October
EUR

OTT* NT AT HOME LONDON GBR
−11.39
2 October
EUR

Google Payment IE LTD Dublin IRL
−14.99
24 September
EUR

TEAPIGS BRENTFORD GBR
−140.56
23 September
EUR

The Pilgrim Hotel London GBR
−4.38
22 September
EUR

The Pilgrim Hotel London GBR
−9.18
18 September
EUR

TRIPIT REDMOND USA
−44.99

The ixml: top level

The ixml for this has at the top level the current period, followed by a number of earlier periods, all of which contain a number of days.

     cc: current, period*.
current: -"Transactions current period", -#a,
         day*.
 period: -"Period of ", from, -" to ", to, -#a,
         opening, total, repayment, closing,
         day*.
   from: -date.
     to: -date.

This is mostly uninteresting stuff, since the opening and closing balances are always zero, the repayment and total are always the same, and the repayment amount occurs in the transactions anyway.

  opening: -"Opening balance for this period:", -#a, -amount, -#a.
  closing: -"Closing balance for this period:", -#a, -amount, -#a.
repayment: -"Monthly repayment:", -#a,               -amount, -#a.
    total: -"Transaction total:", -#a,               -amount, -#a.

The interesting detail is in the days. Each day has the date and a number of transactions (the currency is always EUR, which gets deleted):

        day: date, -#a, (-"EUR", -#a)?, transaction+.
transaction: -#a, payee, -#a, amount, -#a.
      payee: ~[#a]*.

About the only interesting thing about the transactions is that they do acknowledge that negative numbers exist, and (correctly) use the Unicode character at code point #2212 (MINUS SIGN) as the minus sign, which the ixml replaces with a hyphen, which is the minus sign in XML.

They use commas to separate thousands, which get deleted, and point for the decimal separator:

amount: (-#2212, +"-")?, (digit; -",")+, ".", digit, digit.
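
The effect of this rule can be mimicked in a few lines of Python. This is a sketch under the same assumptions as the ixml (an optional U+2212, digits and commas, a point, and two digits), with a made-up function name normalise_amount.

```python
import re

MINUS = "\u2212"   # Unicode MINUS SIGN, as used on the bank's page

def normalise_amount(text):
    """Turn a displayed amount such as U+2212 '1,199.00' into '-1199.00'."""
    if re.fullmatch(MINUS + r"?[0-9,]+\.[0-9]{2}", text) is None:
        raise ValueError(f"not an amount: {text!r}")
    # Replace the minus sign with a hyphen and delete the thousands separators.
    return text.replace(MINUS, "-").replace(",", "")

print(normalise_amount("\u22121,199.00"))  # -1199.00
print(normalise_amount("289.33"))          # 289.33
```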

Dates are a bit of a mess: sometimes the year is before, sometimes after, sometimes it's not there at all. Sometimes it's the year and the word "Today":

date: y, -#a, (d, -" ", m; "Today"); 
      d, -" ", m, (-" ", y)?.
   d: digit, digit?.

Month names are either written in full, or as 3 letters, with the exception of Sept...

     m: "January"; "February"; "March"; "April";
        "May"; "June"; "July"; "August"; "September";
        "October"; "November"; "December";
        "Jan"; "Feb"; "Mar"; "Apr"; "Jun";
        "Jul"; "Aug"; "Sept"; "Oct"; "Nov"; "Dec".
     y: digit, digit, digit, digit.
-digit: ["0"-"9"].
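
To show how the variants fit together, here is a Python sketch that recognises the same date forms with two regular expressions, one for each alternative of the date rule. It is an illustration only: the function name parse_cc_date and the "today" key are inventions of this example. May is listed once, since its full and three-letter forms coincide.

```python
import re

MONTH = ("(January|February|March|April|May|June|July|August|September"
         "|October|November|December"
         "|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept|Oct|Nov|Dec)")
DAY = "([0-9][0-9]?)"
YEAR = "([0-9]{4})"

# First alternative: the year on its own line, then day and month, or "Today".
YEAR_FIRST = re.compile(YEAR + "\n(?:" + DAY + " " + MONTH + "|(Today))")
# Second alternative: day and month, optionally followed by the year.
YEAR_LAST = re.compile(DAY + " " + MONTH + "(?: " + YEAR + ")?")

def parse_cc_date(text):
    """Return the parts of the date that are present, as a dict."""
    match = YEAR_FIRST.fullmatch(text)
    if match:
        y, d, m, today = match.groups()
    else:
        match = YEAR_LAST.fullmatch(text)
        if match is None:
            raise ValueError(f"not a recognised date: {text!r}")
        d, m, y = match.groups()
        today = None
    parts = {"y": y, "d": d, "m": m, "today": today}
    # Leave out whatever the input did not supply.
    return {key: value for key, value in parts.items() if value is not None}

for sample in ("2024\nToday", "2024\n4 October", "22 October", "5 Sept 2024"):
    print(parse_cc_date(sample))
```

As in the ixml, a date without a year stays without one: working out which year "22 October" belongs to is left to whatever uses the data.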

Processing the scraped text with this description gives XML like this:

<cc>
   <current>
      <day>
         <date>
            <y>2024</y>Today</date>
         <transaction>
            <payee>Kobo Software Ireland D02T380 IE</payee>
            <amount>-5.49</amount>
         </transaction>
      </day>
      <day>
         <date>
            <d>22</d>
            <m>October</m>
         </date>
         <transaction>
            <payee>THEATRE ROYAL HAYMARKE LONDON GBR</payee>
            <amount>-214.62</amount>
         </transaction>
      </day>
   </current>
   <period>
      <from>
         <d>5</d>
         <m>Sept</m>
         <y>2024</y>
      </from>
      <to>
         <d>4</d>
         <m>Oct</m>
         <y>2024</y>
      </to>
      <opening>0.00</opening>
      <total>-289.33</total>
      <repayment>289.33</repayment>
      <closing>0.00</closing>
      <day>
         <date>
            <y>2024</y>
            <d>4</d>
            <m>October</m>
         </date>
         <transaction>
            <payee>AFLOSSING</payee>
            <amount>289.33</amount>
         </transaction>
      </day>
      <day>
         <date>
            <d>3</d>
            <m>October</m>
         </date>
         <transaction>
            <payee>OTT* NT AT HOME LONDON GBR</payee>
            <amount>-11.39</amount>
         </transaction>
      </day>
   </period>
</cc>

XForms Code

Once we have the data, the code to display it is simple, and very much reflects the structure of the data as described by the ixml: we repeat over the top level elements (either current or period), provide a heading, then repeat over the days, and in the days, repeat over the transactions:

<repeat ref="*">
   <label class="period">
      <output value="if(from, concat(from/y, '-', from/m, '-', from/d, ' to ', 
                                       to/y, '-',   to/m, '-',   to/d),
                        'Current')"/>
   </label>
   <repeat ref="day">
      <output ref="date/d"/> <output ref="date/m"/>
      <repeat ref="transaction">
         <output class="amount" ref="amount"/> <output class="payee" ref="payee"/>
      </repeat>
   </repeat>
</repeat>
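
For readers more used to procedural code, the same nesting can be sketched in Python with the standard library's ElementTree. This is only an analogy to the XForms repeats, not how XForms works internally; the sample is an abbreviated, hand-made fragment in the shape of the XML shown earlier, and the names ymd and render are made up.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<cc>
   <current>
      <day>
         <date><d>22</d><m>October</m></date>
         <transaction>
            <payee>THEATRE ROYAL HAYMARKE LONDON GBR</payee>
            <amount>-214.62</amount>
         </transaction>
      </day>
   </current>
   <period>
      <from><d>5</d><m>Sept</m><y>2024</y></from>
      <to><d>4</d><m>Oct</m><y>2024</y></to>
      <day>
         <date><d>3</d><m>October</m></date>
         <transaction>
            <payee>OTT* NT AT HOME LONDON GBR</payee>
            <amount>-11.39</amount>
         </transaction>
      </day>
   </period>
</cc>"""

def ymd(element):
    """Join the y, m and d children with hyphens, like the concat() above."""
    return "-".join(element.findtext(name) for name in ("y", "m", "d"))

def render(cc):
    """Yield a heading per top-level element, then a line per transaction."""
    for block in cc:                                  # repeat over current/period
        start = block.find("from")
        if start is not None:
            yield f"{ymd(start)} to {ymd(block.find('to'))}"
        else:
            yield "Current"
        for day in block.findall("day"):              # repeat over the days
            when = f"{day.findtext('date/d')} {day.findtext('date/m')}"
            for item in day.findall("transaction"):   # repeat over transactions
                yield f"  {when}  {item.findtext('amount')}  {item.findtext('payee')}"

for line in render(ET.fromstring(SAMPLE)):
    print(line)
```

The loops have the same shape as the repeats, but the XForms version also keeps the display up to date when the data changes, which this sketch does not attempt.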

Adding a simple search facility:

<input ref="instance('search')/q" incremental="true">
   <label>Search</label>
</input>
<trigger>
   <label>×</label>
   <setvalue ref="instance('search')/q" ev:event="DOMActivate"/>
</trigger>

and modifying the top-level repeat:

<repeat ref="*[contains(., instance('search')/q)]">

and we already have a very useful application:

[Figure: The XForms Credit Card App]

Conclusion

Invisible markup is about creating an abstract structured document from textual data where the structure is implicit. Invisible XML makes an XML serialisation of that abstract document, which can then be used as a source of data for the XML pipeline. It should be noted that ixml is being added to XPath and XQuery as a function [xpath], which means that ixml will shortly be available automatically to XForms (and any other XML-based technology that uses XPath or XQuery).

References

[accts] Steven Pemberton, Banking with ixml and XForms, Proc. Declarative Amsterdam 2024, Amsterdam, The Netherlands.

[aero] Nordström, Ari. “Adventures in Mainframes, Text-based Messaging, and iXML.” Presented at Balisage: The Markup Conference 2024, Washington, DC, July 29 - August 2, 2024. In Proceedings of Balisage: The Markup Conference 2024. Balisage Series on Markup Technologies, vol. 29 (2024). https://doi.org/10.4242/BalisageVol29.Nordstrom01.

[art] Holstege, Mary. “Invisible Fish: API Experimentation with InvisibleXML.” Presented at Balisage: The Markup Conference 2024, Washington, DC, July 29 - August 2, 2024. In Proceedings of Balisage: The Markup Conference 2024. Balisage Series on Markup Technologies, vol. 29 (2024). https://doi.org/10.4242/BalisageVol29.Holstege01.

[CityHCR] John Chelsom, Scalability of an Open Source XML Database for Big Data, Proc. XML London 2016, pp 57-63, https://xmllondon.com/2016/xmllondon-2016-proceedings.pdf#page=57

[code] Pieter Lamers, Nico Verwer, Syntax highlighting for code blocks using ixml, https://declarative.amsterdam/resources/da/slides/da.2024.verwer.syntax-highlighting-using-ixml.pdf

[crochet] Tovey-Walsh, Bethan. “When women do algorithms: a semi-generative approach to overlay crochet with iXML and XSLT.” Presented at Balisage: The Markup Conference 2024, Washington, DC, July 29 - August 2, 2024. In Proceedings of Balisage: The Markup Conference 2024. Balisage Series on Markup Technologies, vol. 29 (2024). https://doi.org/10.4242/BalisageVol29.Tovey-Walsh01.

[impl] ixml Implementations, Invisible XML Organisation, invisiblexml.org

[ixml1] Steven Pemberton, “Invisible XML.” In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:10.4242/BalisageVol10.Pemberton01. http://www.balisage.net/Proceedings/vol10/html/Pemberton01/BalisageVol10-Pemberton01.html. Revised version https://cwi.nl/~steven/Talks/2013/08-07-invisible-xml/invisible-xml-4.html

[ixml2] Steven Pemberton, Data Just Wants to Be Format-Neutral, Proc. XML Prague, 2016, Prague, Czech Republic, pp109-120, ISBN 978-80-906259-0-7, http://archive.xmlprague.cz/2016/files/xmlprague-2016-proceedings.pdf#page=121

[ixml3] Steven Pemberton, On the Descriptions of Data: The Usability of Notations, Proc. XML Prague, 2017, Prague, Czech Republic, pp143-159. https://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf#page=155

[laws] Nico Verwer, Transparent Invisible XML, Proc. XML Prague 2024, https://archive.xmlprague.cz/2024/files/xmlprague-2024-proceedings.pdf#page=147

[nmr] Courtney, Joseph Michael, and Michael Robert Gryk. “Pulse, Parse, and Ponder: Using Invisible XML to Dissect a Scientific Domain Specific Language.” Presented at Balisage: The Markup Conference 2024, Washington, DC, July 29 - August 2, 2024. In Proceedings of Balisage: The Markup Conference 2024. Balisage Series on Markup Technologies, vol. 29 (2024). https://doi.org/10.4242/BalisageVol29.Courtney01.

[notations] Steven Pemberton, On the Design of Notations, XML Europe 2004, https://www.w3.org/2004/Talks/05-steven-XMLEuropeKeynote/

[spec] Steven Pemberton (ed.), Invisible XML Specification, Invisible XML Organisation, 2022, https://invisiblexml.org/1.0/

[trials] Sperberg-McQueen, C. M. “From Word to XML via iXML: a Word-first XML workflow in the TLRR 2e project.” Presented at Balisage: The Markup Conference 2024, Washington, DC, July 29 - August 2, 2024. In Proceedings of Balisage: The Markup Conference 2024. Balisage Series on Markup Technologies, vol. 29 (2024). https://doi.org/10.4242/BalisageVol29.Sperberg-McQueen01.

[vin] Ari Nordström, It's Useful After All — VIN Numbers, DITA, and iXML, Proc. XML Prague 2024, https://archive.xmlprague.cz/2024/files/xmlprague-2024-proceedings.pdf#page=305

[wg] Invisible Markup Community Group, W3C, https://www.w3.org/community/ixml/

[XForms] Erik Bruchez, et al., (eds.) XForms 2.0 https://www.w3.org/community/xformsusers/wiki/XForms_2.0

[xpath] Michael Kay (ed.), XPath and XQuery Functions and Operators 4.0, https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-invisible-xml