Steven Pemberton, CWI, Amsterdam
The design of ixml was not about converting text files into particular XML document types, but just converting them to some XML document type for further transformation. However, the other direction is possible: if you have a particular document type, you can design a textual notation for it. This paper treats a particular use case, in order to reveal some of the options available to designers of such a notation.
Keywords: Markup, ixml, Invisible Markup, notations, parsing, XML, design, XForms.
The ixml language [ixml] was originally designed with the principal aim of allowing un-marked-up textual documents to be treated as if they were XML documents with markup.
This can be seen as part of a progression of abstractions being made on documents: originally we had individual documents, with markup to detail the structure, and with embedded presentation details for the styling. Style sheets allowed us to abstract out the presentation into a separate file, and consequently use the style sheet for a whole class of similar documents. In the same way, ixml allows us to abstract the markup out of the documents into a separate file, similarly to be used for a whole class of related documents.
Although ixml was not initially designed to convert textual documents to particular XML document types, but just to get a textual document into an initial XML form that could later be refined as necessary using existing XML tools, it is possible to work in the other direction: if you have an XML document type, you can use ixml to design a textual representation for it. People often seem to prefer authoring flat textual documents because they can see and understand the structure unaided, and find the need to add markup to make it readable for computers a distraction. An example is Markdown [md], or indeed almost any programming language. However ixml supports both approaches.
Markdown is an example of this approach: the target is an HTML document produced from a textual file, and indeed ixml has been used to process Markdown in exactly this way [adv].
In an earlier ixml paper on modularisation [m12n], there was a hint of a similar approach for XForms [xf], which exists as an XML language with no equivalent textual form, but in that paper it served only to demonstrate the use of modularisation on a larger example. In this paper we take this further, and examine the processes you have to go through to design a flat textual notation, and the options you have, using XForms as an example target language.
The most important distinguishing factor in designing a notation for an
existing XML document type is that the structure has already been specified:
there are no decisions to be made on that front. As pointed out in the earlier
example of defining ixml for Markdown, the top-level ixml rules for Markdown
must be html, head, and body,
since they have to match the final target structure.
Similarly in the case of XForms, the overall structure of the rules has already been decided for us, which we can determine directly from the XForms schema, where at the top level we have:
model: (instance; bind; action; submission)*.
-Content: Controls.
-Controls: Core-Controls; group; switch; repeat.
-Core-Controls: input; secret; textarea; output; upload;
range; trigger; submit; select; select1.
(as in the XForms specification, all names with an initial lower-case letter are used for actual elements that will occur in the output, and names with an initial capital for other rules).
Of course, XForms wasn't designed to be a standalone language, but one embedded in other languages, so we need to specify a top-level structure in a host language, in this case XHTML:
html: head, body.
where head contains the models, and body contains
the content. For instance:
head: title, Style*, model+.
body: Content.
There are two approaches to recognising input: either by position, or by adding extra characters to identify what we are dealing with.
For instance, since the title is the first item in our input, we can just require that the first line be the title of our XForm:
title: ~[#a]+, nl.
The rule for nl requires a newline, and allows extra optional
trailing space:
-nl: -#a, s?.
The rule for s is to allow trailing space, but we will also use
it where spacing is required, not just optional:
-s: -[" "; #9; #a]+.
For styling we use extra characters to identify the input, in this case the
word "style"; although it would also be possible to allow embedded CSS, to keep
it simple we will just use html link elements:
-Style: -"style", s, link.
link: href, Style-type, Style-rel.
@href: URL.
@Style-type>type: +"text/css".
@Style-rel>rel: +"stylesheet".
-URL: [L;"0"-"9"; ":/@.~#?"]+. {A simple version for now}
This requires a URL, and adds two other attributes to the output. Note how ixml renaming has been used; although this is not yet officially part of the language, it is in the future specification [ixml2] and in all implementations. So if a flat XForms begins
XForms Example
style app.css
we will get an output that starts
<html>
<head>
<title>XForms Example</title>
<link href='app.css' type='text/css' rel='stylesheet'/>
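To illustrate the difference between the two kinds of recognition, here is a minimal Python sketch that builds the same head by hand. It is not an ixml processor and not part of the notation; the function name parse_head and the use of ElementTree are assumptions made for the example. The title is taken purely by position, the links by their leading keyword.

```python
import xml.etree.ElementTree as ET


def parse_head(text):
    """Build <head> from a flat header: a title line, then `style URL` lines."""
    lines = text.splitlines()
    head = ET.Element("head")
    # Recognised by position: the first line is always the title.
    ET.SubElement(head, "title").text = lines[0]
    for line in lines[1:]:
        words = line.split()
        # Recognised by keyword: only lines introduced by "style" become links.
        if len(words) == 2 and words[0] == "style":
            ET.SubElement(
                head,
                "link",
                {"href": words[1], "type": "text/css", "rel": "stylesheet"},
            )
    return head


head = parse_head("XForms Example\nstyle app.css\n")
print(ET.tostring(head, encoding="unicode"))
```

As in the grammar, the type and rel attributes never appear in the input; they are added on output.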
Although this has been treated elsewhere [adv], it is worth pointing out the best technique for dealing with white space, since it is an easy source of ambiguity.
The first tip is: consume extraneous spaces after recognising a symbol. For instance,
name: [L]+, s?.
In that way, having recognised a name, the parser is positioned
at the next meaningful character, and doesn't have to try lots of different
rules beginning with a space. It also means that extra whitespace at the end of
the document is already dealt with.
Secondly: recognise spaces as early as possible. Do this:
id: -"#", name.
-name: [L]+, s?.
and not this:
id: -"#", name, s?.
-name: [L]+.
and certainly never this:
id: -"#", name, s?.
-name: [L]+, s?.
because in that case, if you had #abc followed by a space, the
parser wouldn't know whether the space was a part of id or
name, in other words, you would get an ambiguous parse.
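The ambiguity can be made concrete by counting parses. The following Python sketch is not an ixml processor: it hand-codes the three id/name variants above and counts the derivations that consume the whole input. The function names are hypothetical, [L] is approximated by str.isalpha, and s is restricted to the space character; all three are assumptions for the sake of a short example.

```python
def letters(text, pos):
    """End positions for `[L]+` at pos: every non-empty prefix of the letter run."""
    ends = []
    i = pos
    while i < len(text) and text[i].isalpha():
        i += 1
        ends.append(i)
    return ends


def opt_space(text, pos):
    """End positions for `s?`: the empty match plus every prefix of the space run."""
    ends = [pos]
    i = pos
    while i < len(text) and text[i] == " ":
        i += 1
        ends.append(i)
    return ends


def count_parses(text, name_eats_space, id_eats_space):
    """Count the derivations of `id` that consume the whole of text."""
    if not text.startswith("#"):
        return 0
    total = 0
    for after_letters in letters(text, 1):
        name_ends = opt_space(text, after_letters) if name_eats_space else [after_letters]
        for after_name in name_ends:
            id_ends = opt_space(text, after_name) if id_eats_space else [after_name]
            total += sum(1 for end in id_ends if end == len(text))
    return total


recommended = count_parses("#abc ", name_eats_space=True, id_eats_space=False)
late_space = count_parses("#abc ", name_eats_space=False, id_eats_space=True)
doubled = count_parses("#abc ", name_eats_space=True, id_eats_space=True)
print(recommended, late_space, doubled)  # 1 1 2
```

The first two variants each give a single parse; they differ only in where the parser is left standing after a name. The third gives two parses for one trailing space, and more as the run of spaces grows.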
This brings us to the sticky question of namespaces; sticky, because at the time of writing, the issue is not yet resolved in the working group.
The XML design group did a clever thing when designing a notation for
namespaces [xmlns]: they designed the namespace
declarations to look like attributes, so that XML documents would be
syntactically compatible with earlier software. Thus although namespace
declarations look like attributes, they have a different semantic
interpretation because they begin with the characters xmlns.
It is this author's opinion that ixml can use the same approach, by
specifying that things that look like attributes should be interpreted as
namespace declarations if the serialisation of the node starts with the letters
xmlns. For implementations that produce textual output, this adds
no extra processing; for implementations that go directly to an XML internal
form, the namespace declarations have to be recognised and handled
appropriately.
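For an implementation that builds an internal XML form, the extra step might look like the following Python sketch. The function name and the (name, value) pair representation are hypothetical, and the test used here (the name is exactly xmlns, or begins with xmlns:) follows the Namespaces in XML convention rather than anything the ixml specification currently prescribes. The lang attribute is only there to show an ordinary attribute passing through.

```python
def split_namespace_declarations(attributes):
    """Separate namespace declarations from ordinary attributes.

    attributes is a list of (name, value) pairs collected for one element.
    Returns (namespaces, ordinary), where namespaces maps a prefix
    ("" for the default namespace) to its URI.
    """
    namespaces = {}
    ordinary = []
    for name, value in attributes:
        if name == "xmlns":
            namespaces[""] = value
        elif name.startswith("xmlns:"):
            namespaces[name[len("xmlns:"):]] = value
        else:
            ordinary.append((name, value))
    return namespaces, ordinary


namespaces, ordinary = split_namespace_declarations([
    ("xmlns", "http://www.w3.org/1999/xhtml"),
    ("xmlns:ev", "http://www.w3.org/2001/xml-events"),
    ("lang", "en"),
])
print(namespaces)
print(ordinary)
```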
Accepting this, we can redefine the html rule to include a
namespace in this way:
html: xhtml-ns, head, body.
@xhtml-ns>xmlns: +"http://www.w3.org/1999/xhtml".
which will give
<html xmlns='http://www.w3.org/1999/xhtml'>
We can use a similar approach to enclose the XForms controls in the body in an element that declares the namespace:
body: Content.
Content>group: xf-ns, Controls.
@xf-ns>xmlns: +"http://www.w3.org/2002/xforms".
-Controls: Control*.
-Control: CoreControl; group; switch; repeat.
-CoreControl: input; secret; textarea; output; upload;
range; trigger; submit; select; select1.
which will give
<body>
<group xmlns='http://www.w3.org/2002/xforms'>
Most controls have a number of required parameters, and a number of optional
ones. For instance, consider input:
<input ref="person/@age">
<label>Age</label>
</input>
We can define this using positioning after a leading keyword:
input person/@age "Age"
like this:
input: -"input", s, ref, label.
@ref: XPath.
label: -'"', ~['"'; #a]*, -'"', s?.
XPath: [L; "0"-"9"; "/:@[]()+-*'><!=."]+, s?. {A simple version for now}
There's one other useful attribute for several controls, and that is
incremental="true" that specifies that the control activates for
every character typed. Since incremental="false" is the default,
we don't have to specify it, so you can write:
input person/@age "Age" incremental
by changing the rule for input to:
input: -"input", s, ref, label, incremental?.
@incremental: -"incremental", +"true", s?.
so that we get
<input ref='person/@age' incremental='true'>
<label>Age</label>
</input>
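The mapping these rules describe can be sketched by hand in Python. This is only an illustration of the positional design, not how an ixml processor works: the regular expression approximates ref as a run of non-space characters (an assumption, and narrower than the XPath rule), and parse_input is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# keyword, ref, quoted label, then the optional bare keyword "incremental"
INPUT_LINE = re.compile(
    r'input\s+(?P<ref>\S+)\s+"(?P<label>[^"\n]*)"\s*(?P<incremental>incremental)?\s*'
)


def parse_input(line):
    """Turn one flat `input` line into the equivalent XForms element."""
    match = INPUT_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"not an input control: {line!r}")
    control = ET.Element("input", {"ref": match.group("ref")})
    if match.group("incremental"):
        # The keyword carries no value in the input; "true" is inserted on output.
        control.set("incremental", "true")
    ET.SubElement(control, "label").text = match.group("label")
    return control


plain = parse_input('input person/@age "Age"')
eager = parse_input('input person/@age "Age" incremental')
print(ET.tostring(plain, encoding="unicode"))
# <input ref="person/@age"><label>Age</label></input>
```

The keyword, the quotes and the spacing are all consumed without appearing in the output, which is the role the "-" marks play in the grammar.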
Nearly all elements in XForms can have certain common attributes, in
particular class for presentation purposes, and id
for identification.
<output class="error" id="out1" ref="message">
<label>Error</label>
</output>
One option would be to give these a keyword to identify them:
output class:error id:out1 message "Error"
but another would be to use the same notation as used in CSS [css]:
output.error #out1 message "Error"
like this:
output: -"output", class?, id?, ref, label.
@class: -".", name.
@id: -"#", name.
-name: [L], [L; "0"-"9"]*, s?.
We can group them together as Common attributes:
-Common: class?, id?.
and use them everywhere:
output: -"output", Common, ref, label.
Going back to the definition of the head
head: title, Style*, model+.
we have to define the model, for instance:
model: -"model", s, id?, Model-content.
-Model-content: (instance; bind; Action; submission)*.
instance: -"data", s, id?, src.
@src: URL.
bind: -"bind", s, ref, Property+.
-Property: type; constraint; relevant; required; readonly.
@type: -"type:", s?, name.
@constraint: -"constraint:", s?, Expression.
-Expression: XPath.
(we'll come back to Action and submission later),
looking like this:
model
data people.xml
bind person/@age type:integer constraint:.>0
As you can see, we are not obliged to use the same keywords in the input as
the elements in the output, so in this case we have replaced the somewhat
technical instance with the more general data.
To distinguish the various types of property in a bind, we have to use keywords like this. However, another approach would be to give each property a separate definition:
-Model-content: (instance; Bind; Action; submission)*.
-Bind: Type; Constraint; Relevant; Required; Readonly.
Type>bind: -"type", s, ref, type.
@type: name.
Constraint>bind: -"constraint", s, ref, constraint.
@constraint: Expression.
etc., giving
model
data people.xml
type person/@age integer
constraint person/@age .>0
yielding
<model>
<instance src='people.xml'/>
<bind ref='person/@age' type='integer'/>
<bind ref='person/@age' constraint='.>0'/>
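The effect of giving each property its own definition, namely one bind element per input line, can be imitated in a few lines of Python. This is a rough sketch under stated assumptions: parse_model is a hypothetical name, it takes only the lines of the model content, and it assumes that the ref contains no spaces.

```python
import xml.etree.ElementTree as ET

PROPERTIES = ("type", "constraint", "relevant", "required", "readonly")


def parse_model(lines):
    """Build <model> from lines in the keyword-first notation."""
    model = ET.Element("model")
    for line in lines:
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "data":
            ET.SubElement(model, "instance", {"src": rest.strip()})
        elif keyword in PROPERTIES:
            # keyword ref value: each line yields its own bind element
            ref, _, value = rest.strip().partition(" ")
            ET.SubElement(model, "bind", {"ref": ref, keyword: value.strip()})
        else:
            raise ValueError(f"unrecognised model line: {line!r}")
    return model


model = parse_model([
    "data people.xml",
    "type person/@age integer",
    "constraint person/@age .>0",
])
binds = [bind.attrib for bind in model.findall("bind")]
print(binds)
```

Note that the input keyword doubles as the output attribute name here, whereas data is mapped to instance, mirroring the freedom the grammar has in both directions.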
It is worth noting that nearly all XForms applications only have a single model, so an alternative approach is to define models so that in the simple (usual) case you don't have to declare a model at all, only when there is more than one:
head: title, Style*, Models.
-Models: Single-model; model+.
Single-model>model: Model-content.
model: -"model", s, id?, Model-content.
allowing in the simple case:
XForms Example
style app.css
data people.xml
type person/@age integer
constraint person/@age .>0
Some controls can contain other content, and be nested, the simplest case
being group:
<group> ...controls... </group>
So we have to design a syntax for this style of control. Options could include
group: ... :group
or
group ... /group
or
group{
...
}group
or indeed
group {
...
}
It is also worth noting that controls that are not in themselves principally containers, may nevertheless also contain content:
<input ref="person/@age">
<label>Age</label>
<dispatch name="CHANGED" targetid="m" ev:event="xforms-value-changed"/>
</input>
so it would be good if any syntax we choose is consistent with these cases. For instance:
input person/@age "Age" {
dispatch CHANGED m xforms-value-changed
}
and
input person/@age "Age" {
hint "An integer"
}
We can do this by declaring a block:
-Block: -"{", s?, Controls, -"}", s?.
and then define group as:
group: -"group", Common, ref?, label?, Block.
which requires a block, and
input: -"input", Common, ref, label, incremental?, Block?.
where it is optional.
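The recursive nature of Block (a block contains controls, each of which may itself end in a block) is the same recursion a hand-written parser would use. The following Python sketch shows this under simplifying assumptions that the grammar itself does not make: every "{" ends its line, every "}" stands on a line of its own, and a control is kept as an unparsed header string. The name parse_block is hypothetical.

```python
def parse_block(lines, start=0):
    """Parse controls from lines[start:] up to the matching "}" or end of input.

    Returns (controls, next_index); each control is a (header, children) pair.
    """
    controls = []
    index = start
    while index < len(lines):
        line = lines[index].strip()
        index += 1
        if line == "}":
            return controls, index  # end of the enclosing block
        if line.endswith("{"):
            header = line[:-1].strip()
            children, index = parse_block(lines, index)  # nested content
            controls.append((header, children))
        elif line:
            controls.append((line, []))  # a control without a block
    return controls, index


flat = """
group {
  input person/@age "Age" {
    hint "An integer"
  }
  output message "Error"
}
""".splitlines()

tree, _ = parse_block(flat)
print(tree)
```

Because the same function handles the body of group and the optional body of input, the two cases stay consistent without further effort, which is what the shared Block rule achieves in the grammar. The sketch does not report unbalanced braces.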
For the switch control, it could look like this:
switch {
case #closed
trigger ">" {
toggle open DOMActivate
}
case #open
trigger "<" {
toggle closed DOMActivate
}
repeat item {
output .
}
}
Defined like this:
switch: -"switch", Common, Cases.
-Cases: -"{", s?, case+, -"}", s?.
case: -"case", s, id, Controls.
XForms actions respond to asynchronous events that may occur. We have
already seen a few above, such as toggle, and
dispatch. These all have various attributes, plus optionally an
event that they are responding to. For instance, within a submission, a
setvalue might look like this:
<setvalue ref="message" ev:event="xforms-submit-error">Failed</setvalue>
We could represent this directly as
setvalue message "Failed" xforms-submit-error
However, setvalue can also calculate a value:
<setvalue ref="count" value=".+1" ev:event="DOMActivate"/>
Luckily these two cases are syntactically distinguishable, so we can define it as
setvalue: -"setvalue", s, ref, (string; value), event.
@value: Expression.
@event>"ev:event": name.
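What makes the two cases distinguishable is their first character: a literal string opens with a quote, an expression does not. A minimal Python sketch of that decision follows; it is not an ixml processor, parse_setvalue is a hypothetical name, and refs, expressions and event names are all approximated as runs of non-space characters, which is an assumption made to keep the example short.

```python
import re
import xml.etree.ElementTree as ET

SETVALUE = re.compile(
    r'setvalue\s+(?P<ref>\S+)\s+'
    # either a quoted string, or an expression that does not start with a quote
    r'(?:"(?P<string>[^"\n]*)"|(?P<value>[^"\s]\S*))'
    r'\s+(?P<event>\S+)\s*'
)


def parse_setvalue(line):
    """Turn one flat `setvalue` line into the equivalent XForms action."""
    match = SETVALUE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a setvalue action: {line!r}")
    action = ET.Element("setvalue", {"ref": match.group("ref")})
    if match.group("string") is not None:
        action.text = match.group("string")        # literal content
    else:
        action.set("value", match.group("value"))  # computed value
    action.set("ev:event", match.group("event"))
    return action


literal = parse_setvalue('setvalue message "Failed" xforms-submit-error')
computed = parse_setvalue('setvalue count .+1 DOMActivate')
print(literal.text, computed.get("value"))
```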
There is a grouping element for several actions, called
action:
<action ev:event="xforms-ready">
<setvalue ref="date" value="local-dateTime()"/>
<dispatch name="TICK" targetid="clock"/>
</action>
We can treat that in the same way that we treated group
earlier:
action: -"action", s, event, ActionBlock.
-ActionBlock: -"{", s?, Action*, -"}", s?.
-Action: toggle; setvalue; dispatch; action. {etc}
allowing
action xforms-ready {
setvalue date local-dateTime()
dispatch TICK clock
}
However, we are not confined to doing it this way. Another approach would express it as:
xforms-ready? {
setvalue date local-dateTime()
dispatch TICK clock
}
defined by:
action: event, -"?", s?, (Action; ActionBlock).
which would also allow:
DOMActivate? setvalue count .+1
The submission element is the most complex one in XForms for
the simple reason that HTTP submission is complicated, and the element tries to
cover all cases. Therefore we will only address a subset of its features
here.
A typical submission looks like this:
<submission id="save" method="put" ref="instance('data')" resource="data.xml" replace="none">
<setvalue ref="instance('q')/message" ev:event="xforms-submit-error">Save failed</setvalue>
<setvalue ref="instance('q')/message" ev:event="xforms-submit-done"/>
</submission>
which could be represented like this:
submission #save put instance('data') data.xml replace:none {
xforms-submit-error? setvalue instance('q')/message "Save failed"
xforms-submit-done? setvalue instance('q')/message ""
}
However, this is such a common pattern, it might be worth enforcing the handling of the return events, something like this:
submission #save put:instance('data') to:data.xml replace:none {
FAILURE setvalue message "Save failed"
SUCCESS setvalue message ""
}
along these lines:
submission: -"submission", Common, s, Method, resource, replace?, SubBlock.
-Method: PUT; GET; POST; DELETE; HEAD.
-PUT: method-put, ref.
-GET: method-get, ref.
@method-put>method: -"put:", +"PUT".
@method-get>method: -"get:", +"GET".
etc., and then define a Submission Block to allow the success and failure parts in either order:
-SubBlock: -"{", s?, (SUCCESS, FAILURE; FAILURE, SUCCESS), -"}", s?.
SUCCESS>action: -"SUCCESS", s?, evSuccess, (Action; ActionBlock).
@evSuccess>"ev:event": +"xforms-submit-done".
FAILURE>action: -"FAILURE", s?, evFailure, (Action; ActionBlock).
@evFailure>"ev:event": +"xforms-submit-error".
giving an output like this:
<submission id='save' method='PUT' ref="instance('data')" resource='data.xml' replace='none'>
<action ev:event='xforms-submit-error'>
<setvalue ref='message'>Save failed</setvalue>
</action>
<action ev:event='xforms-submit-done'>
<setvalue ref='message'/>
</action>
</submission>
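An attribute value such as instance('data') contains quote characters itself, so a serialiser has to choose a delimiter that does not clash, or escape the quotes. How a particular ixml implementation does this is not specified here; as an illustration only, Python's standard xml.sax.saxutils.quoteattr makes the choice like this:

```python
from xml.sax.saxutils import quoteattr

ref = "instance('data')"
quoted = quoteattr(ref)
print("ref=" + quoted)  # ref="instance('data')"

# With both kinds of quote present, the double quotes are escaped instead.
mixed = quoteattr("""concat("a", 'b')""")
print("value=" + mixed)  # value="concat(&quot;a&quot;, 'b')"
```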
Although we can easily embed and recognise other languages such as CSS in our flat XForms, there is an irony that we can't embed raw XML. This is partly because we can't get the names of the elements into the output form (though see [gixml] for an approach), and partly because "<" and ">" characters would be serialised as "&lt;" and "&gt;", even if all we did was copy the embedded XML from input to output.
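The second obstacle is easy to reproduce with any XML serialiser, since text content is always escaped on output. A small Python illustration, using the standard xml.sax.saxutils.escape function and a made-up fragment of embedded markup:

```python
from xml.sax.saxutils import escape

# Hypothetical embedded markup, as it would arrive in the parsed text.
embedded = '<hint class="note">An integer</hint>'
serialised = escape(embedded)
print(serialised)
# &lt;hint class="note"&gt;An integer&lt;/hint&gt;
```

The result is well-formed text content, but it is no longer markup.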
Designing a textual notation for a given XML document type is an interesting, even fun, exercise. While the overall structure of the document is already established, the designer has a lot of freedom in using keywords, extra characters, or positioning to identify syntactic forms. Perhaps unexpectedly, there is also a lot of freedom in the choice of the keywords themselves, which are not required to match the terminology used in the document type.
[adv] Steven Pemberton, Advanced Invisible XML (ixml) Tutorial, CWI, 2025, https://cwi.nl/~steven/ixml/advanced/
[css] Håkon Wium Lie et al. (eds.), Cascading Style Sheets level 1, W3C, 1996, https://www.w3.org/TR/CSS1/
[gixml] Steven Pemberton, Generalised Invisible Markup, Proc. Declarative Amsterdam, 2025, https://declarative.amsterdam/article?doi=da.2025.pemberton.generalised-invisible-markup
[ixml] Steven Pemberton (ed.), Invisible XML Specification, Invisible XML Organisation, 2022, https://invisiblexml.org/1.0/
[ixml2] Steven Pemberton (ed.), Invisible XML Specification Community Group Editorial Draft, Invisible XML Organisation, 2026, https://invisiblexml.org/current/
[m12n] Steven Pemberton, Modular ixml, Proc. MarkupUK 2025, pp 6-20, https://markupuk.org/pdf/proceedings-2025-2.pdf
[md] John Gruber, Markdown, Daring Fireball, 2004, https://daringfireball.net/projects/markdown/
[xf] Erik Bruchez et al. (eds.), XForms 2.0, W3C, 2026, https://www.w3.org/community/xformsusers/wiki/XForms_2.0
[xmlns] Tim Bray et al., Namespaces in XML 1.0, W3C, 2009, https://www.w3.org/TR/xml-names/