Invisible XML

Steven Pemberton, CWI

The ixml flow

Data is an abstraction: there is no essential difference between the JSON

{"temperature": {"value": 21, "scale": "C"}}

and an equivalent XML

<temperature value="21" scale="C"/>

or

<temperature scale="C">21</temperature>

or

<temperature>
   <value>21</value>
   <scale>C</scale>
</temperature>

or indeed

Temperature: 21°C

since the underlying abstractions being represented are the same.
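
As an illustration of that claim, the following minimal Python sketch reads each of the representations above and reduces it to the same (value, scale) pair. The function names from_json, from_xml and from_text are hypothetical, chosen only for this example, and the readers handle only the exact shapes shown above.

```python
import json
import re
import xml.etree.ElementTree as ET


def from_json(text):
    """Read the JSON form: {"temperature": {"value": ..., "scale": ...}}."""
    t = json.loads(text)["temperature"]
    return (int(t["value"]), t["scale"])


def from_xml(text):
    """Read any of the three XML forms shown above."""
    root = ET.fromstring(text)
    # The scale is either an attribute or a child element.
    scale = root.get("scale")
    if scale is None:
        scale = root.findtext("scale")
    # The value is an attribute, a child element, or the element's own text.
    value = root.get("value")
    if value is None:
        value = root.findtext("value")
    if value is None:
        value = root.text
    return (int(value), scale.strip())


def from_text(text):
    """Read the plain-text form, e.g. 'Temperature: 21°C'."""
    m = re.fullmatch(r"Temperature:\s*(-?[0-9]+)°([CF])", text)
    if m is None:
        raise ValueError(f"not a temperature: {text!r}")
    return (int(m.group(1)), m.group(2))


forms = [
    from_json('{"temperature": {"value": 21, "scale": "C"}}'),
    from_xml('<temperature value="21" scale="C"/>'),
    from_xml('<temperature scale="C">21</temperature>'),
    from_xml('<temperature><value>21</value><scale>C</scale></temperature>'),
    from_text("Temperature: 21°C"),
]
# Every representation carries the same underlying data.
assert all(form == (21, "C") for form in forms)
```

Each reader here is written by hand for one format; the point of the sketch is only that the data recovered is identical whichever surface syntax carries it.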

We choose which representation of our data to use (JSON, CSV, XML, or whatever) depending on habit, convenience, or the context in which we want to use that data. On the other hand, an interoperable generic toolchain for processing data, such as the one XML provides, is of immense value. How do we resolve the conflicting requirements of convenience, habit, and context, and still enable a generic toolchain?

Invisible XML (ixml) is a method for treating non-XML documents as if they were XML, enabling authors to write documents and data in a format they prefer while providing XML for processes that are more effective with XML content. For example, CSS code like:

body {color: blue; font-weight: bold}

can be turned into XML like:

<css>
   <rule>
      <simple-selector name="body"/>
      <block>
         <property>
            <name>color</name>
            <value>blue</value>
         </property>
         <property>
            <name>font-weight</name>
            <value>bold</value>
         </property>
      </block>
   </rule>
</css>

or, depending on choice, like this:

<css>
   <rule>
      <selector>body</selector>
      <block>
         <property name="color" value="blue"/>
         <property name="font-weight" value="bold"/>
      </block>
   </rule>
</css>

Method

The format of the textual document is described with a context-free grammar, which is used with a general parser to create an internal abstract structured document. The grammar is decorated as necessary with details of how the resulting abstract document should be serialised as XML. For example:

temperature: value, -"°", scale.
-value: "-"?, digit+.
@scale: "C"; "F".
-digit: ["0"-"9"].

With the input 21°C this would yield

<temperature scale="C">21</temperature>

Changing -value to @value would give

<temperature value="21" scale="C"/>

and then changing both @value and @scale to plain value and scale, with no marks, would give

<temperature>
   <value>21</value>
   <scale>C</scale>
</temperature>
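
The effect of the marks can be imitated in a few lines of Python. This is only a sketch under stated assumptions: it is not an ixml processor, the recogniser is a hand-written regular expression standing in for the grammar above, and the serialise function with its marks parameter is hypothetical, invented for this illustration. It shows how the same parse yields different XML depending on whether a nonterminal is marked "-" (content only), "@" (attribute), or left unmarked (element).

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the temperature grammar above; a real ixml
# processor would derive the parser from the grammar itself.
TEMPERATURE = re.compile(r"(-?[0-9]+)°([CF])")


def serialise(text, marks):
    """Serialise `text` as XML.

    `marks` maps "value" and "scale" to a mark:
    "-" (content only), "@" (attribute), or "" (element).
    """
    m = TEMPERATURE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a temperature: {text!r}")
    root = ET.Element("temperature")
    last = None  # most recently added child element, if any
    for name, matched in (("value", m.group(1)), ("scale", m.group(2))):
        mark = marks[name]
        if mark == "@":
            # Attribute: the matched text becomes an attribute value.
            root.set(name, matched)
        elif mark == "-":
            # Hidden: only the matched text appears, kept in document order.
            if last is None:
                root.text = (root.text or "") + matched
            else:
                last.tail = (last.tail or "") + matched
        else:
            # Unmarked: the matched text is wrapped in its own element.
            last = ET.SubElement(root, name)
            last.text = matched
    return ET.tostring(root, encoding="unicode")


print(serialise("21°C", {"value": "-", "scale": "@"}))
print(serialise("21°C", {"value": "@", "scale": "@"}))
print(serialise("21°C", {"value": "", "scale": ""}))
```

The three calls correspond to the three serialisations shown above, apart from whitespace and indentation. The "°" never appears in the output because the sketch simply drops it, mirroring the -"°" in the grammar.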

Learning

Resources

Papers

There are a number of papers on the development including:

Examples of Use

Here are some published examples of ixml use in the wild:

Implementations

There are a number of implementations already available; see invisiblexml.org for details.