Designing a notation using ixml

Steven Pemberton, CWI, Amsterdam

Abstract

Introduction

The ixml language [ixml] was originally designed with the principal aim of allowing un-marked-up textual documents to be treated as if they were XML documents with markup.

This can be seen as part of a progression of abstractions being made on documents: originally we had individual documents, with markup to detail the structure, and with embedded presentation details for the styling. Style sheets allowed us to abstract out the presentation into a separate file, and consequently use the style sheet for a whole class of similar documents. In the same way, ixml allows us to abstract the markup out of the documents into a separate file, similarly to be used for a whole class of related documents.

ixml was not initially designed to convert textual documents to particular XML document types, but just to get a textual document into an initial XML form that could later be refined as necessary using existing XML tools. Nevertheless, it is possible to work in the other direction: if you have an XML document type, you can use ixml to design a textual representation for it. People often seem to prefer authoring flat textual documents because they can see and understand the structure unaided, and find the need to add markup to make it readable for computers a distraction. Examples are Markdown [md], and indeed almost any programming language. However, ixml allows both approaches to be supported.

Markdown is an example of this approach, where the target is an HTML document produced from a textual file, and indeed there is an example of ixml being used to process Markdown [adv] in exactly this way.

In an earlier ixml paper on Modularisation [m12n], there was a hint of a similar approach for XForms [xf], which exists as an XML language with no equivalent textual form, but in that paper it served only to demonstrate the use of modularisation on a larger example. In this paper we take this further, and examine the process you have to go through to design a flat textual notation, and the options you have, using XForms as an example target language.

The Approach

The most important and distinguishing factor in designing a notation for an existing XML document type is that the structure has already been specified: there are no decisions to be made on that front. As pointed out in the earlier example of defining ixml for Markdown, the top-level ixml rules for Markdown have to be html, head, and body, since they have to match the final target structure.

Similarly in the case of XForms, the overall structure of the rules has already been decided for us, which we can determine directly from the XForms schema. At the top level we have:

         model: (instance; bind; action; submission)*.
      -Content: Controls.
     -Controls: Core-Controls; group; switch; repeat.
-Core-Controls: input; secret; textarea; output; upload; range; trigger; submit; select; select1.

(In all the rules, as in the XForms specification, names with an initial lower-case letter are used for actual elements that will occur in the output, and names with an initial capital for other rules.)

Of course, XForms wasn't designed to be a standalone language, but one embedded in other languages, so we have to devise a top-level structure in a host language, in this case XHTML:

html: head, body, s?.

where head contains the models, and body contains the content. For instance:

head: title, Style*, model+.
body: Content.

There are two approaches to recognising input: either by position, or by adding extra characters to identify what sort of input we are dealing with.

For instance, the title is the first item in our input, so we can just require that the first line be the title of our XForm:

title: ~[#a]+, nl.

The rule for nl requires a newline, and allows extra optional trailing space:

-nl: -#a, s?.

The rule for s is to allow trailing space:

-s: -[" "; #9; #a]+.

but we will also use it where spacing is required.

For styling, although it would also be possible to allow embedded CSS, to keep it simple we will just use html link elements:

        -Style: -"style:", s?, link, nl.
          link: href, Css-type, Style-rel.
         @href: url.
@Css-type>type: +"text/css".
@Style-rel>rel: +"stylesheet".
           url: [L;"0"-"9"; ":/@."]+. {A simple version for now}

This requires a URL, and adds two other attributes to the output. Note how ixml renaming has been used; although this is not yet officially part of the language, it is in the future specification [ixml2] and in all implementations. So if a flat XForms document begins

XForms Example
style:app.css

we will get an output that starts

<html>
   <head>
      <title>XForms Example</title>
      <link href='app.css' type='text/css' rel='stylesheet'/>
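
To make the two recognition styles concrete, here is a minimal Python sketch that approximates the title and Style rules by hand. It is not an ixml processor; the function name parse_head is hypothetical, and the URL is simply taken to be the rest of the line rather than checked against the url rule.

```python
import xml.etree.ElementTree as ET


def parse_head(text: str) -> ET.Element:
    """Approximate the title and Style rules by hand (not an ixml processor)."""
    lines = text.split("\n")
    title = lines[0]                       # title: recognised purely by position
    if not title:
        raise ValueError("the first line must be a non-empty title")
    head = ET.Element("head")
    ET.SubElement(head, "title").text = title
    for line in lines[1:]:
        if not line.startswith("style:"):  # Style: recognised by its keyword
            break
        url = line[len("style:"):].strip()
        # The two fixed attributes play the role of the +"..." insertions.
        ET.SubElement(head, "link", href=url, type="text/css", rel="stylesheet")
    return head


head = parse_head("XForms Example\nstyle:app.css\n")
print(ET.tostring(head, encoding="unicode"))
```

The title needs no keyword because nothing else can occupy the first line, whereas the style lines carry a keyword because any number of them, including none, may follow.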

Namespaces

This brings us to the sticky question of namespaces; sticky, because at the time of writing the issue is not yet resolved in the working group.

The XML design group did a clever thing when designing a notation for namespaces: they designed the namespace declarations to look like attributes, so that XML documents would be syntactically compatible with earlier software. Thus although namespace declarations look like attributes, they have a different semantic interpretation because they begin with the characters xmlns.

It is this author's opinion that ixml can use the same approach, by specifying that things that look like attributes should be interpreted as namespace declarations if the serialisation of the node starts with the letters xmlns.

Accepting this, we can redefine the html rule to include a namespace in this way:

           html: xhtml-ns, head, body, s?.
@xhtml-ns>xmlns: +"http://www.w3.org/1999/xhtml".

which will give

<html xmlns='http://www.w3.org/1999/xhtml'>

Content

We can use a similar technique to enclose the XForms controls in an element that declares the namespace:

         body: Content.
Content>group: xf-ns, Controls.
 @xf-ns>xmlns: +"http://www.w3.org/2002/xforms".

which will give

<body>
   <group xmlns='http://www.w3.org/2002/xforms'>

Simple Controls

Most controls have a number of required parameters, and a number of optional ones. For instance, consider input:

<input ref="person/@age">
   <label>Age</label>
</input>

We can define this using positioning after a leading keyword:

input person/@age "Age"

like this:

input: -"input", s, ref, s, label.
 @ref: XPath.
label: -'"', ~['"'; #a]*, -'"', s?.
XPath: ... more ... . {assume this to be defined}

One other attribute is useful for many controls: incremental="true", which specifies that the control activates for every character typed. Since incremental="false" is the default, we don't have to specify it, so you can write:

input person/@age "Age" incremental

by changing the rule for input to:

       input: -"input", s, ref, s, label, incremental?.
@incremental: -"incremental", +"true".

so that we get

<input ref='person/@age' incremental='true'>
   <label>Age</label>
</input>
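
The positional notation for input can be approximated with a regular expression, which may help when experimenting with the design before writing the full grammar. This is only a sketch under simplifying assumptions: the name parse_input is hypothetical, and the XPath is taken to be any run of non-space characters, which real XPath expressions need not be.

```python
import re
import xml.etree.ElementTree as ET

# keyword, then ref and label by position, then an optional trailing keyword
INPUT = re.compile(
    r'input\s+(?P<ref>\S+)\s+"(?P<label>[^"\n]*)"\s*(?P<incremental>incremental)?\s*$'
)


def parse_input(line: str) -> ET.Element:
    """Turn one flat input line into an XForms-like input element."""
    m = INPUT.match(line)
    if m is None:
        raise ValueError(f"not an input control: {line!r}")
    control = ET.Element("input", ref=m.group("ref"))
    if m.group("incremental"):
        # Mirrors -"incremental", +"true": the keyword is dropped, "true" is inserted.
        control.set("incremental", "true")
    ET.SubElement(control, "label").text = m.group("label")
    return control


print(ET.tostring(parse_input('input person/@age "Age" incremental'), encoding="unicode"))
```

When the keyword is absent, no incremental attribute is produced at all, which matches the default of incremental="false".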

Common attributes

Nearly all elements in XForms can have certain common attributes, in particular class for presentation purposes, and id for identification.

<output class="error" id="out1" ref="message">
   <label>Error</label>
</output>

One option would be to give these a keyword to identify them:

output class:error id:out1 message "Error"

but another would be to use the same notation as used in CSS:

output.error #out1 message "Error"

like this:

output: -"output", class?, s, (id, s)?, ref, s, label.
@class: -".", name.
   @id: -"#", name.
 -name: [L], [L; "0"-"9"]*.
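
As a rough check of the CSS-like notation, the following Python sketch recognises an output line with an optional class and id. It is an approximation under stated assumptions: parse_output is a hypothetical name, names are restricted to ASCII letters and digits (the grammar's [L] covers all Unicode letters), and the ref is any run of non-space characters.

```python
import re
import xml.etree.ElementTree as ET

NAME = r"[A-Za-z][A-Za-z0-9]*"
OUTPUT = re.compile(
    rf'output(?:\.(?P<cls>{NAME}))?'   # .error  -> class attribute
    rf'(?:\s+#(?P<ident>{NAME}))?'     # #out1   -> id attribute
    r'\s+(?P<ref>\S+)'                 # message -> ref attribute
    r'\s+"(?P<label>[^"\n]*)"\s*$'     # "Error" -> label element
)


def parse_output(line: str) -> ET.Element:
    """Turn one flat output line into an XForms-like output element."""
    m = OUTPUT.match(line)
    if m is None:
        raise ValueError(f"not an output control: {line!r}")
    control = ET.Element("output")
    if m.group("cls"):
        control.set("class", m.group("cls"))
    if m.group("ident"):
        control.set("id", m.group("ident"))
    control.set("ref", m.group("ref"))
    ET.SubElement(control, "label").text = m.group("label")
    return control


print(ET.tostring(parse_output('output.error #out1 message "Error"'), encoding="unicode"))
```

The leading "." and "#" characters do the work that the class: and id: keywords did in the first option, so both attributes can be left out without any ambiguity about which item is the ref.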

The Model

Going back to the definition of the head

head: title, Style*, model+.

we have to define the model, for instance:

model: -"model", s, id?, Model-content.
-Model-content: (instance; bind; Action; submission)*.
instance: -"data", s, id?, src.
bind: -"bind", s, ref, s, Property+.
-Property: type; constraint; relevant; required; readonly.
@type: -"type:", s?, name.
@constraint: -"constraint:", s?, expression.

We'll come back to Action and submission later. The input then looks like this:

model
   data people.xml
   bind person/@age type:integer constraint: .>0

As you can see, we are not obliged to use the same keywords in the input as the elements in the output, so in this case we have replaced the somewhat technical "instance" with the more general "data".

To distinguish the various types of property in a bind, we have to use keywords like this. However, another approach would be to give each property a separate definition:

 -Model-content: (instance; Bind; Action; submission)*.
          -Bind: Type; Constraint; Relevant; Required; Readonly.
      Type>bind: -"type", s, ref, s, type.
Constraint>bind: -"constraint", s, ref, s, constraint.
          @type: name.
    @constraint: expression.

giving

model
   data people.xml
   type person/@age integer
   constraint person/@age .>0
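
In this second notation each line produces its own bind element, with the leading keyword naming the property. A minimal Python sketch of that mapping follows; parse_bind is a hypothetical name, and the sketch assumes the ref contains no spaces and that everything after it is the property value.

```python
import xml.etree.ElementTree as ET

# The property keywords listed in the grammar.
PROPERTIES = {"type", "constraint", "relevant", "required", "readonly"}


def parse_bind(line: str) -> ET.Element:
    """Turn 'keyword ref value' into <bind ref=... keyword=.../>."""
    parts = line.strip().split(None, 2)   # keyword, ref, rest of the line
    if len(parts) != 3 or parts[0] not in PROPERTIES:
        raise ValueError(f"not a bind property: {line!r}")
    keyword, ref, value = parts
    return ET.Element("bind", {"ref": ref, keyword: value})


for line in ["   type person/@age integer", "   constraint person/@age .>0"]:
    print(ET.tostring(parse_bind(line), encoding="unicode"))
```

The trade-off is visible here: the notation is simpler to read, but two properties on the same ref now give two bind elements rather than one.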

It is worth noting that nearly all XForms have only a single model, so an alternative approach to defining models is this:

head: title, Style*, Models.
-Models: Single-model; model+.
Single-model>model: Model-content.
model: -"model", s, id?, Model-content.

so that in the usual case, you don't have to declare a model at all, only when there is more than one:

XForms Example
style:app.css
data people.xml
   type person/@age integer
   constraint person/@age .>0

Container Controls

switch
  case closed
  /case
  case open
  /case
/switch
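
With explicit closing lines such as /case and /switch, nesting can be recovered with a simple stack. The Python sketch below illustrates this for the notation above; it is an approximation, not the grammar itself. The name parse_blocks, the set of container names, and the wrapper element are all assumptions, as is treating the word after a keyword as the element's id.

```python
import xml.etree.ElementTree as ET

CONTAINERS = {"switch", "case"}   # assumed set of enclosing controls


def parse_blocks(text: str) -> ET.Element:
    """Build a tree from 'name [id]' opening lines and '/name' closing lines."""
    root = ET.Element("group")    # hypothetical wrapper for the top level
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A closer must match the innermost open container.
            if len(stack) == 1 or stack[-1].tag != line[1:]:
                raise ValueError(f"unexpected closer: {line!r}")
            stack.pop()
            continue
        name, _, rest = line.partition(" ")
        element = ET.SubElement(stack[-1], name)
        if rest.strip():
            element.set("id", rest.strip())
        if name in CONTAINERS:
            stack.append(element)
    if len(stack) != 1:
        raise ValueError(f"unclosed container: {stack[-1].tag}")
    return root


tree = parse_blocks("switch\n  case closed\n  /case\n  case open\n  /case\n/switch\n")
print(ET.tostring(tree, encoding="unicode"))
```

Indentation is ignored here; only the closing lines determine the structure.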

switch(
  case #closed(
  )case
  case #open(
  )case
)switch

switch:
  case: #closed
  :case
  case: #open
  :case
:switch

But some elements are only sometimes enclosing, for instance an input with an optional hint:

input a "Age" (
   hint "An integer"
)input

input: person/@age "Age"
   hint "An integer"
:input

Common attributes

id: always with #

submission #send

Extend XPath to allow a shorthand for instances:

output @admin/a => instance('admin')/a
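
One way to picture this shorthand is as a textual rewrite applied to the expression. The Python sketch below is only an illustration under an assumption that the notes above do not settle: a leading @name is treated as an instance reference only when it is followed by a slash, so that an ordinary attribute step such as @age is left alone. The name expand_instance is hypothetical.

```python
import re

# Leading @name immediately followed by "/" is read as instance('name').
INSTANCE_SHORTHAND = re.compile(r"^@([A-Za-z][A-Za-z0-9]*)(?=/)")


def expand_instance(xpath: str) -> str:
    """Rewrite a leading @name/ into instance('name')/; leave other paths as is."""
    return INSTANCE_SHORTHAND.sub(r"instance('\1')", xpath)


print(expand_instance("@admin/a"))
print(expand_instance("person/@age"))
```

A real design would need to decide how this interacts with XPath's own use of @ for attributes.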

Actions

trigger "OK"
   toggle @closed DOMActivate
/trigger

Techniques

input age "Age" incremental

can use:

@incremental: +"true".

to give

<input ref="age" incremental="true">
   <label>Age</label>
</input>

Since non-incremental is the default, we don't need to treat it specially: you just leave it out.

Issues

Problem with embedded XML

Submission

Common
Evaluation
Single Item Binding
value (expression)
nonrelevant ("keep" | "remove" | "empty")
relevant (xs:boolean) [deprecated]
validate (xs:boolean)

resource (xs:anyURI)
action (xs:anyURI) [deprecated]
mode ("asynchronous"|"synchronous")
method ("post" | "get" | "put" | "delete" | "multipart-post" | "form-data-post" | "urlencoded-post" | Any other NCName | PrefixedName)
serialization (xs:string)
mediatype (xs:string)
encoding (xs:string)

separator (";" | "&")

version (xs:NMTOKEN)
indent (xs:boolean)
omit-xml-declaration (xs:boolean)
standalone (xs:boolean)
cdata-section-elements (QNameList)
includenamespaceprefixes (xs:NMTOKENS)

response-mediatype (xs:string)
replace ("all" | "instance" | "text" | "none" | PrefixedName)
instance (xs:IDREF)
targetref ( Single Item Binding)
target ("_self" | "_blank" | xs:string)

Conclusion

References