Evolution of HTML Diamonds

HTML Diamonds is a Ruby library for producing HTML. Here is its story.

Template Interpolation

I first started generating HTML in 1999 by interpolating strings. I guess most web programmers have done this. In Perl it can look like:
<h1>$title</h1>
There are variations, such as templating systems where you use some special HTML-like syntax to indicate variable interpolation:
<h1><== $title ==></h1>
Some systems, like Perl's HTML::Mason, let you define components which can be included in pages or in larger components. This helps manage the complexity of the HTML document.
The template-based approaches seemed good at the time, but in retrospect they have some problems.
First, you have to escape all variables before interpolation, which defeats the apparent convenience of interpolation. Failing to escape means that an innocent '<' can blow up your page, and malicious JavaScript can hijack your users' cookies. But if you build up the page in layers, you also have to avoid re-escaping substrings that your own software has already generated and escaped. In practice, in the real-world code I've seen, the data is often not escaped at all.
Even if you get it right, the software will probably be a house of cards; any future maintainer risks introducing an injection bug or double-escaping something.
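To make the hazard concrete, here is a small Ruby sketch of the three outcomes: no escaping, escaping once, and escaping twice. The helper escape_html and the sample title are made up for illustration; they are not part of any library discussed here.

```ruby
# Hypothetical helper: replace the five characters that are special in HTML.
HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

title = 'Fish & <Chips>'

# No escaping: the raw '<' and '&' land in the markup.
puts "<h1>#{title}</h1>"

# Escaped once: what the browser should receive.
puts "<h1>#{escape_html(title)}</h1>"

# Escaped twice: what happens when an outer layer re-escapes a fragment
# that an inner layer already escaped. The reader sees "&amp;" on the page.
puts "<h1>#{escape_html(escape_html(title))}</h1>"
```

Nothing in the string itself records whether it has been escaped yet, which is why every layer has to know what the layers beneath it did.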
Second, the syntax is clumsy and verbose. Ideally, interpolation should be kicked off by a minimal signal, such as the '$' used in Perl.
Third, the template interpolation approach puts the burden of creating well-formed HTML on humans. Systems like this often have bugs where, for instance, a table element is not closed, and it's hard to find which function or component is responsible.

First improvement - HTML Triples

I tried to solve these problems with a more abstract representation of HTML. Each HTML element is a triple of name, attributes, and kids (the kids can be left off for an empty element):
[ 'img', { src => '/images/icon1.gif', alt => 'New Entry' } ]
and:
[ 'ul', {}, [
    [ 'li', {}, 'Apple' ],
    [ 'li', {}, 'Banana' ]
] ]
If kids is a string, it gets escaped. At first this seemed like a big improvement. Keeping the data as triples works well with componentized page creation. It also fits list-oriented programming:
# ruby this time:

ul = [ 'ul', {}, fruits.map{|f| [ 'li', {}, f ]} ]
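
A writer for such triples can be quite small. The following is only a sketch of the idea, not the code I actually used: render_triple and escape_html are names invented here, and the rule that a missing kids entry means a self-closing element is an assumption.

```ruby
# Hypothetical helper: escape the characters that are special in HTML.
def escape_html(text)
  escapes = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;' }
  text.gsub(/[&<>"']/, escapes)
end

# Render one triple [name, attributes, kids] as an HTML string.
# kids may be absent (self-closing element), a String (escaped here),
# or an Array of further triples (rendered recursively).
def render_triple(triple)
  name, attrs, kids = triple
  attr_text = (attrs || {}).map { |k, v| " #{k}=\"#{escape_html(v.to_s)}\"" }.join
  return "<#{name}#{attr_text} />" if kids.nil?

  inner = if kids.is_a?(String)
            escape_html(kids)
          else
            kids.map { |kid| render_triple(kid) }.join
          end
  "<#{name}#{attr_text}>#{inner}</#{name}>"
end

fruits = ['Apple', 'Banana', 'Peaches & Cream']
ul = ['ul', {}, fruits.map { |f| ['li', {}, f] }]
puts render_triple(ul)
# prints: <ul><li>Apple</li><li>Banana</li><li>Peaches &amp; Cream</li></ul>
```

Escaping happens in exactly one place, when a String kid is written out, so the code that builds the structure never has to think about it.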

Setback - Slippery Triples

The HTML triples did not work out very well. I frequently got mysterious errors when converting the huge, nested structure to HTML.
For example, the html writer would complain of a hashref in the wrong position. In this scheme, a hashref can only be in position 1, the attributes. But where did the structure get corrupted? If a function or expression produced an invalid structure, it was quietly swallowed into a larger structure, hiding its provenance.
HTML Triples were just too slippery. I added a check() function which raises an exception if the given structure is invalid, and became more efficient at squashing this bug by inserting check() here and there. But that was a band-aid.
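A check() along these lines might look like the Ruby sketch below. The original function is not shown in this article, so the exact validity rules, the path argument, and the error messages are all assumptions made for illustration.

```ruby
# Raise ArgumentError unless node is a well-formed triple.
# path describes the position inside the structure, for the error message.
def check(node, path = 'root')
  unless node.is_a?(Array) && (2..3).cover?(node.size)
    raise ArgumentError, "#{path}: expected [name, attrs, kids], got #{node.inspect}"
  end

  name, attrs, kids = node
  raise ArgumentError, "#{path}: element name must be a String" unless name.is_a?(String)
  raise ArgumentError, "#{path}: attributes must be a Hash" unless attrs.is_a?(Hash)

  case kids
  when nil, String
    # nothing further to validate
  when Array
    kids.each_with_index { |kid, i| check(kid, "#{path}/#{name}[#{i}]") }
  else
    raise ArgumentError, "#{path}: kids must be a String or an Array"
  end
  node
end

good = ['ul', {}, [['li', {}, 'Apple']]]
bad  = ['ul', {}, [['li', 'Apple', {}]]] # attributes and kids swapped

check(good)
begin
  check(bad)
rescue ArgumentError => e
  puts e.message
end
```

Even with a helpful message, the error only surfaces where check() happens to be called, not where the bad triple was built. That gap is the slipperiness.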

HTML Diamonds

I needed real objects, not arrays. Objects that had integrity by definition. That means no need for check() - everything is checked as it's created. But I did not want to saddle the user with burdensome OO noise:
## noisy!  no way!
button = new HTMLAnchor(
        new HTMLImage("/images/redpix.gif")
        .setAlt("Erase"), "http://somesys/?action=erase")
At the same time, I really like printf(). I find this:
sprintf("%s/%s: using %d bytes",
    environment.getName(),
    job.getName(),
    job.getMemSize()
)
easier to understand than this:
environment.getName() +
  "/" + job.getName() + ": using"
  + job.getMemSize() + "bytes"
(Incidentally, the second version has a bug - how easy is it to spot?)
Finally, Ruby has the awesome feature of allowing a library to inject a new method into an established class. This is just the thing for enabling low-noise invocation of new functionality.
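As a rough sketch of that mechanism, the code below reopens String and adds one method. It is not the library's code: TinyDiamond and tiny_h are hypothetical names, and the parser understands only the form "tag.class text".

```ruby
# Hypothetical stand-in for a Diamond: one element, optional class, text body.
class TinyDiamond
  ESCAPES = {
    '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
  }.freeze

  def initialize(tag, css_class, text)
    @tag = tag
    @css_class = css_class
    @text = text
  end

  def to_html
    class_attr = @css_class ? " class=\"#{@css_class}\"" : ''
    "<#{@tag}#{class_attr}>#{@text.gsub(/[&<>"']/, ESCAPES)}</#{@tag}>"
  end
end

# Reopen the built-in String class and add a method to it.
class String
  # Interpret the string as "tag.class text" and build a TinyDiamond.
  def tiny_h
    head, text = split(' ', 2)
    tag, css_class = head.split('.', 2)
    TinyDiamond.new(tag, css_class, text.to_s)
  end
end

puts 'p.note hi & bye'.tiny_h.to_html
# prints: <p class="note">hi &amp; bye</p>
```

HTML Diamonds uses the same mechanism to give String its h() method: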
'div.main hello'.h().to_html
produces:
<div class="main">hello</div>
What happened here? Ruby's String class has a new method: h(), which interprets the string as a format for generating an HTML Diamond. h() returned an HTML Diamond, and we invoked to_html. In the next examples I'll skip the to_html step:
fruits = [ 'apple', 'pear', 'peaches & cream' ]
'ul %a'.h(fruits.map{|f| 'li %s'.h(f)})
produces:
<ul>
    <li>apple</li>
    <li>pear</li>
    <li>peaches &amp; cream</li>
</ul>
The map expression produced an array of Diamonds; the %a placeholder expects an array. The resulting Diamond, the ul, is now ready to be used in a bigger structure. A page could be expressed as:
'html (head (title %s) %s) (body %s %s %s)'.h(
    title,
    get_style(),
    get_top_nav(),
    get_main(),
    get_bottom_nav()
)
The HTML Diamonds scheme lets you decide how many levels of nesting to swallow in one expression. When the number of placeholders is confusingly high, you can break an expression into sub-expressions.
The expressions are compact due to the omission of closing tags and the replacement of lengthy strings with placeholders. That makes it easier to understand what the code is doing.
When you interpolate one Diamond into another, the inclusion is logical, not physical. The inner Diamond is not stringified and then pasted in. Therefore there is no risk of double-encoding, nor is there a performance penalty of copying strings.
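Here is a minimal sketch of what logical inclusion means, assuming a node that simply holds references to its kids and escapes text only when rendered. SketchNode is a hypothetical class written for this article, not the library's implementation.

```ruby
# Hypothetical node: keeps its kids as objects and escapes text at render time.
class SketchNode
  ESCAPES = {
    '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
  }.freeze

  def initialize(name, *kids)
    @name = name
    @kids = kids # Strings or other SketchNodes, stored as they are
  end

  def to_html
    inner = @kids.map do |kid|
      # A nested node renders itself; only raw text is escaped, exactly once.
      kid.is_a?(SketchNode) ? kid.to_html : kid.gsub(/[&<>"']/, ESCAPES)
    end.join
    "<#{@name}>#{inner}</#{@name}>"
  end
end

li = SketchNode.new('li', 'peaches & cream')
ul = SketchNode.new('ul', li) # holds a reference to li, not its HTML text
puts ul.to_html
# prints: <ul><li>peaches &amp; cream</li></ul>
```

Because the ul never sees the li as a string, it has no chance to escape the li's markup a second time.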
I could have made one placeholder work for everything, letting %s consume Strings, Arrays, and Diamonds. But that approach worried me after experiencing the slipperiness of HTML Triples. The placeholders in HTML Diamonds are picky about their args.
%s will take a string or a Diamond. %a takes an Array of whatever %s takes. %h takes a Hash, and adds it to the element's attributes. I want mistakes to show up early.
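The rules above can be sketched as a small argument checker. This is only an illustration of the pickiness, not the library's code: SketchDiamond, check_placeholder_arg, and the error text are hypothetical.

```ruby
# Hypothetical stand-in for the library's Diamond class.
SketchDiamond = Struct.new(:name)

# Return arg if it is acceptable for the placeholder, otherwise raise TypeError.
def check_placeholder_arg(placeholder, arg)
  scalar = ->(x) { x.is_a?(String) || x.is_a?(SketchDiamond) }

  ok = case placeholder
       when '%s' then scalar.call(arg)
       when '%a' then arg.is_a?(Array) && arg.all?(&scalar)
       when '%h' then arg.is_a?(Hash)
       else raise ArgumentError, "unknown placeholder #{placeholder}"
       end

  raise TypeError, "#{placeholder} cannot take #{arg.class}: #{arg.inspect}" unless ok
  arg
end

check_placeholder_arg('%s', 'apple')
check_placeholder_arg('%a', ['apple', SketchDiamond.new('li')])
check_placeholder_arg('%h', { 'alt' => 'Erase' })

begin
  check_placeholder_arg('%s', ['apple']) # an Array where %s expects one item
rescue TypeError => e
  puts e.message
end
```

The failure is raised at the call that passed the wrong thing, so the stack trace points at the culprit instead of at the HTML writer.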

Attributes

Attributes can be literal:
'img src=/images/redpix.gif alt="Erase"'.h
produces:
<img src="/images/redpix.gif" alt="Erase" />
Quotes are only needed around attribute values that contain spaces. Attributes can come from a hash:
button = {
    'src' => '/images/redpix.gif',
    'alt' => 'Erase'
}
'img %h'.h(button)
or from string interpolation:
'img src=%s alt=%s'.h('/images/redpix.gif', 'Erase')
or from any mixture - later settings override earlier:
'img %h alt=Remove'.h(button)
produces:
<img src="/images/redpix.gif" alt="Remove" />
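The override rule is the same one Ruby's Hash#merge follows, so it can be sketched in a few lines. The helper names merge_attributes and render_attributes are hypothetical, and this says nothing about how the library stores attributes internally.

```ruby
# Merge attribute sources left to right; with Hash#merge, later keys win.
def merge_attributes(*sources)
  sources.reduce({}) { |merged, attrs| merged.merge(attrs) }
end

# Write the merged attributes in key="value" form.
# (Values are not escaped in this sketch.)
def render_attributes(attrs)
  attrs.map { |key, value| "#{key}=\"#{value}\"" }.join(' ')
end

button = { 'src' => '/images/redpix.gif', 'alt' => 'Erase' }

# Equivalent in spirit to 'img %h alt=Remove': the hash first, the literal after.
attrs = merge_attributes(button, { 'alt' => 'Remove' })
puts render_attributes(attrs)
# prints: src="/images/redpix.gif" alt="Remove"
```

An overridden key keeps its original position, which is why src still comes before alt in the output.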
Here's a key-value table generated from a hash:
bob = {
    'name'    => 'Bob',
    'age'     => '86',
    'state'   => 'Texas'
}

'table.person (tr (td "Key") (td "Value")) %a'.h(
        bob.map{|k,v| 'tr (td %s) (td %s)'.h(k, v)}
)
produces:
<table class="person">
    <tr>
        <td>Key</td>
        <td>Value</td>
    </tr>
    <tr>
        <td>name</td>
        <td>Bob</td>
    </tr>
    <tr>
        <td>age</td>
        <td>86</td>
    </tr>
    <tr>
        <td>state</td>
        <td>Texas</td>
    </tr>
</table>
This expression produces a multiplication table:
n=5
'table %a'.h(
    (1..n).map{|i| 'tr %a'.h(
        (1..n).map{|j| 'td %s'.h((i*j).to_s)}
    )}
)

Download

It's pretty alpha: html-diamonds.tar.gz

asher@wildsparx.com