Designing a Notation Using ixml

Steven Pemberton, CWI, Amsterdam

Abstract

The design of ixml was not about converting text files into particular XML document types, but just converting them to some XML document type for further transformation. However, the other direction is possible: if you have a particular document type, you can design a textual notation for it. This paper treats a particular use case, in order to reveal some of the options available to designers of such a notation.

Keywords: Markup, ixml, Invisible Markup, notations, parsing, XML, design, XForms.

Introduction

The ixml language [ixml] was originally designed with the principal aim of allowing un-marked-up textual documents to be treated as if they were XML documents with markup.

This can be seen as part of a progression of abstractions being made on documents: originally we had individual documents, with markup to detail the structure, and with embedded presentation details for the styling. Style sheets allowed us to abstract out the presentation into a separate file, and consequently use the style sheet for a whole class of similar documents. In the same way, ixml allows us to abstract the markup out of the documents into a separate file, similarly to be used for a whole class of related documents.

Although ixml was not initially designed to convert textual documents to particular XML document types, but only to get a textual document into an initial XML form that could later be refined as necessary using existing XML tools, it is possible to work in the other direction: if you have an XML document type, you can use ixml to design a textual representation for it. People often seem to prefer authoring flat textual documents because they can see and understand the structure unaided, and find the need to add markup to make it readable for computers a distraction. Examples are Markdown [md], and indeed almost any programming language. ixml supports both approaches.

Markdown is an example of this second approach, where the target is an HTML document produced from a textual file, and indeed there is an example of ixml being used to process Markdown in exactly this way [adv].

In an earlier ixml paper on modularisation [m12n], there was a hint of a similar approach for XForms [xf], which exists as an XML language with no equivalent textual form, but in that paper it served only to demonstrate the use of modularisation on a larger example. In this paper we take this further, and examine the processes you have to go through to design a flat textual notation, and the options you have, using XForms as an example target language.

The Approach

The most important and distinguishing factor in designing a notation for an existing XML document type is that the structure has already been specified: there are no decisions to be made on that front. As pointed out in the earlier example of defining ixml for Markdown, the top-level ixml rules for Markdown must be html, head, and body, since they have to match the final target structure.

Similarly in the case of XForms, the overall structure of the rules has already been decided for us, which we can determine directly from the XForms schema, where at the top level we have:

         model: (instance; bind; action; submission)*.
      -Content: Controls.
     -Controls: Core-Controls; group; switch; repeat.
-Core-Controls: input; secret; textarea; output; upload; 
                range; trigger; submit; select; select1.

(as in the XForms specification, all names with an initial lower-case letter are used for actual elements that will occur in the output, and names with an initial capital for other rules).

Of course, XForms wasn't designed to be a standalone language, but one embedded in other languages, so we need to specify a top-level structure in a host language, in this case XHTML:

html: head, body.

where head contains the models, and body contains the content. For instance:

head: title, Style*, model+.
body: Content.

Recognising Input

There are two approaches to recognising input: either by position, or by adding extra characters to identify what we are dealing with.

For instance, since the title is the first item in our input, we can just require that the first line be the title of our XForm:

title: ~[#a]+, nl.

The rule for nl requires a newline, and allows extra optional trailing space:

-nl: -#a, s?.

The rule for s is to allow trailing space, but we will also use it where spacing is required, not just optional:

-s: -[" "; #9; #a]+.

For styling we use extra characters to identify the input, in this case the word "style". Although it would also be possible to allow embedded CSS, to keep it simple we will just use HTML link elements:

           Style: -"style", s, link.
            link: href, Style-type, Style-rel.
           @href: URL.
@Style-type>type: +"text/css".
  @Style-rel>rel: +"stylesheet".
            -URL: [L; "0"-"9"; ":/@.~#?"]+, s?. {A simple version for now}

This requires a URL, and adds two other attributes to the output. Note how ixml renaming has been used; although this is not yet officially part of the language, it is in the future specification [ixml2] and in all implementations. So if a flat XForms begins

XForms Example
style app.css

we will get an output that starts

<html>
   <head>
      <title>XForms Example</title>
      <link href='app.css' type='text/css' rel='stylesheet'/>

A Reminder About Spacing

Although this has been treated elsewhere [adv], it is worth pointing out the best technique for dealing with white space, since it is an easy source of ambiguity.

The first tip is: consume extraneous spaces after recognising a symbol. For instance,

name: [L]+, s?.

In that way, having recognised a name, the parser is positioned at the next meaningful character, and doesn't have to try lots of different rules beginning with a space. It also means that extra whitespace at the end of the document is already dealt with.

Secondly: recognise spaces as early as possible. Do this:

id: -"#", name.
-name: [L]+, s?.

and not this:

id: -"#", name, s?.
-name: [L]+.

and certainly never this:

id: -"#", name, s?.
-name: [L]+, s?.

because in that case, if you had #abc followed by a space, the parser wouldn't know whether the space was a part of id or of name; in other words, you would get an ambiguous parse.
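
The same discipline extends to punctuation. As a minimal sketch, assuming hypothetical rules names and comma that are not part of the XForms notation, each symbol consumes the space that follows it, so that no space can be claimed by two rules at once:

```ixml
 names: name++comma.
  name: [L]+, s?.
-comma: -",", s?.
    -s: -[" "; #9; #a]+.
```

With these rules an input such as a, b ,c should have exactly one parse: the space after the first comma belongs to comma, and the space after b belongs to name.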

Namespaces

This brings us to the sticky question of namespaces; sticky, because at the time of writing, the issue is not yet resolved in the working group.

The XML design group did a clever thing when designing a notation for namespaces [xmlns]: they designed the namespace declarations to look like attributes, so that XML documents would be syntactically compatible with earlier software. Thus although namespace declarations look like attributes, they have a different semantic interpretation because they begin with the characters xmlns.

It is this author's opinion that ixml can use the same approach, by specifying that things that look like attributes should be interpreted as namespace declarations if the serialisation of the node starts with the letters xmlns. For implementations that produce textual output, this adds no extra processing; for implementations that go directly to an XML internal form, the namespace declarations have to be recognised and handled appropriately.

Accepting this, we can redefine the html rule to include a namespace in this way:

           html: xhtml-ns, head, body.
@xhtml-ns>xmlns: +"http://www.w3.org/1999/xhtml".

which will give

<html xmlns='http://www.w3.org/1999/xhtml'>
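
The same technique should extend to prefixed namespace declarations. The sketch below assumes both the interpretation of xmlns proposed above and a quoted alias containing a colon, a form this paper also uses later for ev:event; the rule name ev-ns is hypothetical. It declares the XML Events namespace, whose ev:event attribute appears on the actions later in this paper:

```ixml
             html: xhtml-ns, ev-ns, head, body.
  @xhtml-ns>xmlns: +"http://www.w3.org/1999/xhtml".
@ev-ns>"xmlns:ev": +"http://www.w3.org/2001/xml-events".
```

Under those assumptions, the html start tag would be expected to carry both the default xmlns declaration and an xmlns:ev declaration.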

Content

We can use a similar approach to enclose the XForms controls in the body in an element that declares the namespace:

         body: Content.
Content>group: xf-ns, Controls.
 @xf-ns>xmlns: +"http://www.w3.org/2002/xforms".
    -Controls: Control*.
     -Control: CoreControl; group; switch; repeat.
 -CoreControl: input; secret; textarea; output; upload;
               range; trigger; submit; select; select1.

which will give

<body>
   <group xmlns='http://www.w3.org/2002/xforms'>

Simple Controls

Most controls have a number of required parameters, and a number of optional ones. For instance, consider input:

<input ref="person/@age">
   <label>Age</label>
</input>

We can define this using positioning after a leading keyword:

input person/@age "Age"

like this:

input: -"input", s, ref, label.
 @ref: XPath.
label: -'"', ~['"'; #a]*, -'"', s?.
XPath: [L; "0"-"9"; "/:@[]()+-*'><!=."]+, s?. {A simple version for now}

There's one other useful attribute for several controls, incremental="true", which specifies that the control activates for every character typed. Since incremental="false" is the default, we never have to specify it, so a bare keyword is enough and you can write:

input person/@age "Age" incremental

by changing the rule for input to:

       input: -"input", s, ref, label, incremental?.
@incremental: -"incremental", +"true", s?.

so that we get

<input ref='person/@age' incremental='true'>
   <label>Age</label>
</input>
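
Other simple controls can follow the same positional pattern. As a sketch, and assuming the rules for ref, label and incremental given above, secret and textarea (both of which also accept incremental in XForms) might be defined as:

```ixml
  secret: -"secret", s, ref, label, incremental?.
textarea: -"textarea", s, ref, label, incremental?.
```

so that secret login/password "Password" would be expected to yield a secret element with ref='login/password' and a label child.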

Common Attributes

Nearly all elements in XForms can have certain common attributes, in particular class for presentation purposes, and id for identification.

<output class="error" id="out1" ref="message">
   <label>Error</label>
</output>

One option would be to give these a keyword to identify them:

output class:error id:out1 message "Error"

but another would be to use the same notation as used in CSS [css]:

output.error #out1 message "Error"

like this:

output: -"output", s?, class?, id?, ref, label.
@class: -".", name.
   @id: -"#", name.
 -name: [L], [L; "0"-"9"; "-"]*, s?.

We can group them together as Common attributes:

-Common: class?, id?.

and use them everywhere:

output: -"output", s?, Common, ref, label.

The Model

Going back to the definition of the head

head: title, Style*, model+.

we have to define the model, for instance:

         model: -"model", s, id?, Model-content.
-Model-content: (instance; bind; Action; submission)*.
      instance: -"data", s, id?, src.
          @src: URL.
          bind: -"bind", s, ref, Property+.
     -Property: type; constraint; relevant; required; readonly.
         @type: -"type:", s?, name.
   @constraint: -"constraint:", s?, Expression.
   -Expression: XPath.

(we'll come back to Action and submission later), looking like this:

model
   data people.xml
   bind person/@age type:integer constraint:.>0

As you can see, we are not obliged to use the same keywords in the input as the elements in the output, so in this case we have replaced the somewhat technical instance with the more general data.
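
The remaining properties are not spelled out above. A minimal sketch, assuming they follow the same keyword-and-colon pattern as type and constraint, and that each takes an XPath expression as in XForms:

```ixml
@relevant: -"relevant:", s?, Expression.
@required: -"required:", s?, Expression.
@readonly: -"readonly:", s?, Expression.
```

With these, a line such as bind person/@age required:true() readonly:false() would be expected to produce a bind element with ref, required and readonly attributes.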

To distinguish the various types of property in a bind, we have to use keywords like this. However, another approach would be to give them each a separate definition:

 -Model-content: (instance; Bind; Action; submission)*.
          -Bind: Type; Constraint; Relevant; Required; Readonly.
      Type>bind: -"type", s, ref, type.
          @type: name.
Constraint>bind: -"constraint", s, ref, constraint.
    @constraint: Expression.

etc., giving

model
   data people.xml
   type person/@age integer
   constraint person/@age .>0

yielding

<model>
   <instance src='people.xml'/>
   <bind ref='person/@age' type='integer'/>
   <bind ref='person/@age' constraint='.>0'/>

It is worth noting that nearly all XForms applications only have a single model, so an alternative approach is to define models so that in the simple (usual) case you don't have to declare a model at all, only when there is more than one:

               head: title, Style*, Models.
            -Models: Single-model; model+.
 Single-model>model: Model-content.
              model: -"model", s, id?, Model-content.

allowing in the simple case:

XForms Example
style app.css
data people.xml
   type person/@age integer
   constraint person/@age .>0

Container Controls

Some controls can contain other content, and be nested, the simplest case being group:

<group>
   ...controls...
</group>

So we have to design a syntax for this style of control. Options could include

group:
   ...
:group

or

group
   ...
/group

or

group{
   ...
}group

or indeed

group {
   ...
}

It is also worth noting that controls that are not in themselves principally containers may nevertheless also contain content:

<input ref="person/@age">
   <label>Age</label>
   <dispatch name="CHANGED" targetid="m" ev:event="xforms-value-changed"/>
</input>

so it would be good if any syntax we choose is consistent with these cases. For instance:

input person/@age "Age" {
   dispatch CHANGED m xforms-value-changed
}

and

input person/@age "Age" {
   hint "An integer"
}

We can do this by declaring a block:

-Block: -"{", s?, Controls, -"}", s?.

and then define group as:

group: -"group", s?, Common, ref?, label?, Block.

which requires a block, and

input: -"input", s?, Common, ref, label, incremental?, Block?.

where it is optional.
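
As written, Block admits only controls, whereas the examples above also place an action (dispatch) and a hint inside the braces. One possible widening is offered here as a sketch: it assumes the Control rule from the Content section and the Action rule defined in the Actions section below, and the hint rule is hypothetical:

```ixml
-Block: -"{", s?, (Control; Action; hint)*, -"}", s?.
  hint: -"hint", s, -'"', ~['"'; #a]*, -'"', s?.
```

The keywords keep the three alternatives apart, so this widening should not by itself introduce ambiguity.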

For the switch control, it could look like this:

switch {
   case #closed
        trigger ">" {
           toggle open DOMActivate
        }
   case #open
        trigger "<" {
           toggle closed DOMActivate
        }
        repeat item {
           output .
        }
}

Defined like this:

switch: -"switch", s?, Common, Cases.
-Cases: -"{", s?, case+, -"}", s?.
  case: -"case", s, id, Controls.
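
The example also relies on trigger and repeat, for which no rules are given. A possible sketch, assuming the Common, label, ref and Block rules defined earlier, and assuming that Block is allowed to contain actions such as toggle as well as controls:

```ixml
trigger: -"trigger", s?, Common, label, Block?.
 repeat: -"repeat", s?, Common, ref, Block.
```

Note that the line output . in the example has no label, so the output rule would additionally need its label to be optional for the example to parse.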

Actions

XForms actions respond to asynchronous events that may occur. We have already seen a few above, such as toggle and dispatch. These all have various attributes, plus optionally an event that they are responding to. For instance, within a submission, a setvalue might look like this:

<setvalue ref="message" ev:event="xforms-submit-error">Failed</setvalue>

We could represent this directly as

setvalue message "Failed" xforms-submit-error

however, setvalue can also calculate a value

<setvalue ref="count" value=".+1" ev:event="DOMActivate"/>

Luckily these two cases are syntactically distinguishable, so we can define it as

         setvalue: -"setvalue", s, ref, (string; value), event?.
           @value: Expression.
@event>"ev:event": name.
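
Three names used so far remain undefined: string in the rule above, and the dispatch and toggle actions from the earlier examples. The following is a sketch under stated assumptions rather than a definitive definition: the rule names event-name and case-id are hypothetical, renaming is used because name is already taken by a rule, and name is assumed to admit the hyphens that occur in event names such as xforms-value-changed:

```ixml
         -string: -'"', ~['"'; #a]*, -'"', s?.
        dispatch: -"dispatch", s, event-name, targetid, event?.
@event-name>name: name.
       @targetid: name.
          toggle: -"toggle", s, case-id, event?.
   @case-id>case: name.
```

Under these rules, dispatch CHANGED m xforms-value-changed would be expected to serialise as the dispatch element shown in the Container Controls section, and toggle open DOMActivate as a toggle element with case='open'.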

There is a grouping element for several actions, called action:

<action ev:event="xforms-ready">
   <setvalue ref="date" value="local-dateTime()"/>
   <dispatch name="TICK" targetid="clock"/>
</action>

We can treat that in the same way that we treated group earlier:

      action: -"action", s, event, ActionBlock.
-ActionBlock: -"{", s?, Action*, -"}", s?.
     -Action: toggle; setvalue; dispatch; action. {etc}

allowing

action xforms-ready {
   setvalue date local-dateTime()
   dispatch TICK clock
}

However, we are not confined to doing it this way. Another approach would express it as:

xforms-ready? {
   setvalue date local-dateTime()
   dispatch TICK clock
}

defined by:

action: event, -"?", s?, (Action; ActionBlock).

which would also allow:

DOMActivate? setvalue count .+1

Submission

The submission element is the most complex one in XForms for the simple reason that HTTP submission is complicated, and the element tries to cover all cases. Therefore we will only address a subset of its features here.

A typical submission looks like this:

<submission id="save" method="put" ref="instance('data')" resource="data.xml" replace="none">
    <setvalue ref="instance('q')/message" ev:event="xforms-submit-error">Save failed</setvalue>
    <setvalue ref="instance('q')/message" ev:event="xforms-submit-done"/>
</submission>

which could be represented like this:

submission #save put instance('data') data.xml replace:none {
   xforms-submit-error? setvalue instance('q')/message "Save failed" 
   xforms-submit-done? setvalue instance('q')/message ""
}

However, this is such a common pattern that it might be worth enforcing the handling of the return events, something like this:

submission #save put:instance('data') to:data.xml replace:none {
   FAILURE setvalue message "Save failed"
   SUCCESS setvalue message ""
}

along these lines:

        submission: -"submission", s?, Common, Method, resource, replace?, SubBlock.
           -Method: PUT; GET; POST; DELETE; HEAD.
              -PUT: method-put, ref.
              -GET: method-get, ref.
@method-put>method: -"put:", +"PUT".
@method-get>method: -"get:", +"GET".
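
The rules left out here might continue along the following lines. This is a sketch: it assumes that POST follows the same pattern as PUT and GET, that the keywords to: and replace: introduce the resource and replace attributes as in the example input, and that URL, like XPath, consumes its own trailing space:

```ixml
              -POST: method-post, ref.
@method-post>method: -"post:", +"POST".
          @resource: -"to:", URL.
           @replace: -"replace:", name.
```

DELETE and HEAD would presumably be handled in the same way.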

etc., and then define a Submission Block to allow the success and failure parts in either order:

            -SubBlock: -"{", s?, (SUCCESS, FAILURE; FAILURE, SUCCESS), -"}", s?.
       SUCCESS>action: -"SUCCESS", s?, evSuccess, (Action; ActionBlock).
@evSuccess>"ev:event": +"xforms-submit-done".
       FAILURE>action: -"FAILURE", s?, evFailure, (Action; ActionBlock).
@evFailure>"ev:event": +"xforms-submit-error".

giving an output like this:

<submission id='save' method='PUT' ref='instance(&apos;data&apos;)' resource='data.xml' replace='none'>
   <action ev:event='xforms-submit-error'>
      <setvalue ref='message'>Save failed</setvalue>
   </action>
   <action ev:event='xforms-submit-done'>
      <setvalue ref='message'/>
   </action>
</submission>

Embedded XML

Although we can easily embed and recognise other languages such as CSS in our flat XForms, there is an irony that we can't embed raw XML. This is partly because we can't get the names of the elements into the output form (though see [gixml] for an approach), and partly because "<" and ">" characters would be serialised as "&lt;" and "&gt;", even if all we did was copy the embedded XML from input to output.

Conclusion

Designing a textual notation for a given XML document type is an interesting, even fun, exercise. While the overall structure of the document is already established, the designer has a lot of freedom in using keywords, extra characters, or positioning to identify syntactic forms. Perhaps unexpectedly, there is also a lot of freedom in the choice of keywords and the like, which are not required to match the terminology used in the document type.

References

[adv] Steven Pemberton, Advanced Invisible XML (ixml) Tutorial, CWI, 2025, https://cwi.nl/~steven/ixml/advanced/

[css] Håkon Wium Lie et al. (eds.), Cascading Style Sheets level 1, W3C, 1996, https://www.w3.org/TR/CSS1/

[gixml] Steven Pemberton, Generalised Invisible Markup, Proc. Declarative Amsterdam, 2025, https://declarative.amsterdam/article?doi=da.2025.pemberton.generalised-invisible-markup

[ixml] Steven Pemberton (ed.), Invisible XML Specification, Invisible XML Organisation, 2022, https://invisiblexml.org/1.0/

[ixml2] Steven Pemberton (ed.), Invisible XML Specification Community Group Editorial Draft, Invisible XML Organisation, 2026, https://invisiblexml.org/current/

[m12n] Steven Pemberton, Modular ixml, Proc. MarkupUK 2025, pp 6-20, https://markupuk.org/pdf/proceedings-2025-2.pdf

[md] John Gruber, Markdown, Daring Fireball, 2004, https://daringfireball.net/projects/markdown/

[xf] Erik Bruchez et al. (eds.), XForms 2.0, W3C, 2026, https://www.w3.org/community/xformsusers/wiki/XForms_2.0

[xmlns] Tim Bray et al., Namespaces in XML 1.0, W3C, 2009, https://www.w3.org/TR/xml-names/