MutationObservers as Client-side Pseudo-PHP

25Jun14

One of my favourite pieces of technology is WordPress. My main project with it is Computer Science Circles, but I’ve also used it for online course notes, and I first learned of it while helping my colleague Douglas with a student organization website. That was my headfirst introduction to the language PHP.

PHP is capable of a lot of nifty stuff. PHP pages act like little programs whose output is an HTML page that is displayed to the user. This lets you do things like interact with databases or client state (e.g. shopping baskets or logins). But even with stateless pages, one interesting thing about using PHP in place of HTML is that you can define new commands to shorten or clarify repetitive structures.

Here’s some repetitive structure as an example: imagine a page with a lot of links to Wikipedia articles, like this.

<a href='http://en.wikipedia.org/wiki/Cheese'>Cheese</a>
is made from
<a href='http://en.wikipedia.org/wiki/milk'>milk</a>
which comes from
<a href='http://en.wikipedia.org/wiki/mammals'>mammals</a>

Displays as: Cheese is made from milk which comes from mammals

There’s a common pattern repeated 3 times in that HTML fragment: the pattern

<a href='http://en.wikipedia.org/wiki/X'>X</a>

occurs once with X=Cheese, once with X=milk, and once with X=mammals. Were HTML a real programming language, it would be good programming practice to define a function that could handle that pattern, and to call it three times. This can be done in PHP like so:

<?php
function linkish($X) {
  echo "<a href='http://en.wikipedia.org/wiki/$X'>$X</a>";
}
?>
<?php linkish("Cheese"); ?>
is made from
<?php linkish("milk"); ?>
which comes from
<?php linkish("mammals"); ?>

In WordPress, which first made me think of this kind of design pattern, you can do an essentially identical thing called a “shortcode”. It looks a lot like the above PHP: you define and register a shortcode inside of a plugin, and you write [linkish]Cheese[/linkish] to call it. Note that it is almost like you are defining a new HTML element in terms of old ones.

However, it seems like a little bit of overkill to ask the server to do so much thinking, whether with plain PHP or with shortcodes. It’s not serving a truly dynamic page: this content will be the same every time a user looks at it. On top of this, served webpages are allowed to contain function calls and definitions, as long as they are written in JavaScript. Can we get JavaScript to do this work for us? One approach would be to mark these elements and arguments in some way, wait until the client has the page, and then have the client do all of the substitutions.

<a class='linkish'>Cheese</a>
is made from
<a class='linkish'>milk</a>
which comes from
<a class='linkish'>mammals</a>
<script type='text/javascript'>
var elts = document.querySelectorAll('a.linkish');
for (var i=0; i<elts.length; i++)
  elts[i].setAttribute('href',
    'http://en.wikipedia.org/wiki/'+elts[i].innerHTML);
</script>

Placed at the end of the page, this code runs only once everything before it has arrived, then goes through and retroactively modifies the <a class='linkish'>X</a> objects as desired. This is similar to how MathJax is able to insert $\LaTeX$ formulas like $\sin^2(x)+\cos^2(x)=1$ into virtually any webpage, including this one! After the page is loaded, it searches the whole page for expressions like \$\sin^2(x)+\cos^2(x)=1\$ and then re-renders the contents appropriately.

However, this approach still has one downside compared to PHP. Until the full HTML page has been received by the human browsing the site, no replacements will take place. (You can confirm this by trying out MathJax on a long page: no LaTeX appears until everything is loaded.) This is not ideal since it’s jarring for the user to see weird placeholders sit around for so long, and it prevents them from starting to read the page until everything is totally done. This is the opposite of progressive HTML rendering, which refers to the fact that your browser will try to show you the first parts of the page while it’s still receiving and rendering the parts further down off-screen.

You could periodically re-check the page for new <a class='linkish'> or $...$ elements, but this is a pretty hacky solution, and it repeats work on parts of the page that were already processed. What we’d really like is to transform the incoming data on the fly. You could imagine it as a filter: as the HTML data is coming in, we would like to have some kind of stream-replacer that would search for something and replace it appropriately. I don’t think there is any way to literally do stream replacement on incoming HTML, but what I learned this week is that you can use a JavaScript MutationObserver to achieve basically the same effect.

A MutationObserver is a device that lets your code get notified whenever HTML elements on your page are changed, including as the page is being built while it loads. For example, here’s how it gives a new solution to the task mentioned above.

<script type='text/javascript'>
// construct observer _before_ anything is rendered
var mo = new MutationObserver(
  // constructor argument: callback on MutationRecord[]
  function (events) {
    // for each record,
    for (var i=0; i<events.length; i++)
      // MutationRecord has Node "target" and NodeList "addedNodes"
      for (var j=0; j<events[i].addedNodes.length; j++)
        // we'll define "process(parent, child)" below
        process(events[i].target, events[i].addedNodes[j]);
  }
);

// what should we observe, and with what options?
mo.observe(document, {childList: true, subtree: true});

// as promised, the callback
var process = function (parent, child) {
  // skip text nodes and anything that is not <a class='linkish'>
  if (child.nodeType !== 1 || child.tagName !== 'A'
      || !child.classList.contains('linkish')) return;
  child.setAttribute('href',
    'http://en.wikipedia.org/wiki/'+child.innerHTML);
};
</script>
<!-- now the same page as before -->
<a class='linkish'>Cheese</a>
is made from
<a class='linkish'>milk</a>
which comes from <a class='linkish'>mammals</a>
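One detail worth sketching: a MutationRecord's addedNodes lists only the root of each inserted subtree. While the parser builds the page, every node gets its own record, but if a script later inserts a whole fragment at once, a linkish anchor nested inside it never shows up in addedNodes by itself. The variant below is a rough sketch of one way to cover both cases. The helper names (isLinkish, linkify, handleRecords) are made up for illustration, and the choice to stop observing at the load event assumes nothing is added to the page afterwards.

```javascript
// Base URL taken from the example above.
var WIKI = 'http://en.wikipedia.org/wiki/';

// Hypothetical helper: true for an <a class='linkish'> element.
function isLinkish(node) {
  return node.nodeType === 1 && node.tagName === 'A' &&
         node.classList.contains('linkish');
}

// Hypothetical helper: link one element, plus every match inside it.
function linkify(node) {
  if (node.nodeType !== 1) return;   // ignore text nodes and comments
  if (isLinkish(node)) node.setAttribute('href', WIKI + node.textContent);
  var inner = node.querySelectorAll('a.linkish');
  for (var i = 0; i < inner.length; i++)
    inner[i].setAttribute('href', WIKI + inner[i].textContent);
}

// Walk a batch of MutationRecords and hand each added node to `visit`.
function handleRecords(records, visit) {
  for (var i = 0; i < records.length; i++)
    for (var j = 0; j < records[i].addedNodes.length; j++)
      visit(records[i].addedNodes[j]);
}

// Only wire things up when a DOM is actually present.
if (typeof document !== 'undefined' &&
    typeof MutationObserver !== 'undefined') {
  var observer = new MutationObserver(function (records) {
    handleRecords(records, linkify);
  });
  observer.observe(document, {childList: true, subtree: true});
  // Assumption: nothing is added after load, so stop observing then.
  window.addEventListener('load', function () {
    handleRecords(observer.takeRecords(), linkify);  // flush pending records
    observer.disconnect();
  });
}
```

Keeping handleRecords free of DOM calls also means the record-walking logic can be exercised with plain objects, without a browser.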

I was curious to explore a few of the finer details of this technique. What order will elements be added in? HTML has a tree-like structure, but it’s not clear if parents will be added before children or vice-versa. So I built a small test, replacing process() with a simple logging function, and using the content

<div>This is <b>bold</b><i> and italically <span>nested</span></i></div>

(See source of this page, and its console.) The order of events happening was:

1. The <div> was added to its parent
2. “This is” was added to the <div>
3. The <b> was added to the <div>
4. “bold” was added to the <b>
5. The <i> was added to the <div>
6. “and italically” was added to the <i>
7. The <span> was added to the <i>
8. “nested” was added to the <span>

So the nodes arrive in tree preorder: each parent is added before its children. But because the MutationObserver sends you updates in batches, there was one more surprise: in step 3, by the time I looked at the <b> element, it already contained “bold”, and similarly for the other nested elements.
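The two observations above can be reproduced without a browser. The following is only a simulation, not the real parser: it models the test fragment with plain objects (a shape invented for this sketch), inserts nodes in preorder while queuing one record per insertion, and inspects the records only after the whole batch has been built, as an observer callback would. It assumes the entire fragment lands in a single batch.

```javascript
// The test fragment, modeled with plain objects (shapes invented for this
// sketch): elements are {tag, children}, text nodes are strings.
var source = {tag: 'div', children: [
  'This is ',
  {tag: 'b', children: ['bold']},
  {tag: 'i', children: [' and italically ',
                        {tag: 'span', children: ['nested']}]}
]};

// Simulated parser: copy `src` into `parent` in preorder, queuing one
// record per insertion instead of reporting it immediately.
function parseInto(parent, src, queue) {
  var isText = typeof src === 'string';
  var node = isText ? src : {tag: src.tag, children: []};
  parent.children.push(node);
  queue.push({target: parent, added: node});
  if (!isText)
    for (var i = 0; i < src.children.length; i++)
      parseInto(node, src.children[i], queue);
}

// Human-readable name for a node.
function label(node) {
  return typeof node === 'string' ? '"' + node.trim() + '"'
                                  : '<' + node.tag + '>';
}

var body = {tag: 'body', children: []};
var queue = [];
parseInto(body, source, queue);   // the whole batch is "parsed" first...

// ...and only then are the records inspected, as in an observer callback.
var report = queue.map(function (r) {
  var line = label(r.added) + ' was added to ' + label(r.target);
  if (typeof r.added !== 'string')
    line += ' (holds ' + r.added.children.length + ' child node(s) by now)';
  return line;
});
console.log(report.join('\n'));
```

The eight lines come out in the same order as the list above, and the record for the <b> already reports one child, because the record stores a reference to the node rather than a snapshot of it at insertion time.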

This general design goal, of defining a new kind of element in terms of old ones, is something that HTML feels like it sorely needs. If you are making an awesome code editing widget or database slice displayer or simply trying to add “click to pop up” or “click to hide/show” functionality, this is the most natural approach. In fact, it looks like the new Custom Elements API lets us do precisely that!
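For a rough idea of what that could look like for the Wikipedia example, here is a sketch using customElements.define, the form of the Custom Elements API that browsers eventually standardized on (it postdates this post). The element name wiki-link, the term attribute, and the wikiHref helper are all made up for illustration.

```javascript
// Hypothetical helper: build the Wikipedia URL for a term,
// turning spaces into underscores and escaping the rest.
function wikiHref(term) {
  return 'http://en.wikipedia.org/wiki/' +
         encodeURIComponent(term.trim().replace(/\s+/g, '_'));
}

// Register the element only where the Custom Elements API exists.
if (typeof customElements !== 'undefined') {
  customElements.define('wiki-link', class extends HTMLElement {
    // Called each time the element is attached to the document.
    connectedCallback() {
      if (this.querySelector('a')) return;   // already expanded
      var term = this.getAttribute('term') || '';
      var a = document.createElement('a');
      a.setAttribute('href', wikiHref(term));
      a.textContent = term;
      this.appendChild(a);
    }
  });
}
```

With this loaded in the page head, the markup becomes <wiki-link term='Cheese'></wiki-link> is made from <wiki-link term='milk'></wiki-link>, and so on. The term goes in an attribute rather than in the element's content because, while the page is still being parsed, connectedCallback can fire before the element's children have arrived, which is the same timing issue as in the MutationObserver experiment.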