Tuesday, April 18, 2006

Sorting groups in XSL

For some of my recent work I had to generate reports from XML using XSL. One of the techniques I had to use frequently was grouping data using the Muenchian Method.

The referenced web page shows an example of getting a group of contacts and sorting it by surname. That's all well and good if your only computing the group once and processing it straight-forward as shown. But what if the XML data is large so calculating the group is costly (in terms of memory and CPU) and you need to use it multiple times and the order is important because you're using it within the context of processing some larger processing loop (ex. 10 at a time associated with another value)?

For an example, let's just say you need to generate a report that has a cover sheet listing all the unique surnames in your file, then separate statistics for each surname such as how many people share that surname. This is a very contrived example, but at least it will demonstrate what I mean.

Using the example in the referenced web page, we could build a list of unique sorted surnames to use in various places in our XSL via:


<xsl:key name="contacts-by-surname" match="contact" use="surname"/>
<xsl:variable name="surnames" select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]/surname"/>

<xsl:template match="/">
<xsl:call-template name="sortList">
<xsl:with-param name="list" select="$surnames"/>
</xsl:call-template>

<!-- list is now sorted to use whenever necessary -->
<xsl:for-each select="$surnames">
[process sorted surname]
</xsl:for-each>
</xsl:template>


<!-- sort list by the value of its current node -->
<xsl:template name="sortList">
<xsl:param name="list"/>
<xsl:for-each select="$list">
<xsl:sort select="." />
</xsl:for-each>
</xsl:template>

The call to sortList will sort it for all subsequent uses.

So, to see it in action, here is our sample data:

<records>
<contact id="0001">
<title>Mr</title>
<forename>John</forename>
<surname>Smith</surname>
</contact>
<contact id="0002">
<title>Dr</title>
<forename>Amy</forename>
<surname>Jones</surname>
</contact>
<contact id="0003">
<title>Mr</title>
<forename>John</forename>
<surname>Doe</surname>
</contact>
<contact id="0004">
<title>Mr</title>
<forename>Jack</forename>
<surname>Smith</surname>
</contact>
<contact id="0005">
<title>Mr</title>
<forename>Mike</forename>
<surname>Jordan</surname>
</contact>
<contact id="0006">
<title>Mr</title>
<forename>Jim</forename>
<surname>Johnson</surname>
</contact>
<contact id="0007">
<title>Mr</title>
<forename>Randy</forename>
<surname>Jones</surname>
</contact>
<contact id="0008">
<title>Mr</title>
<forename>Bobby</forename>
<surname>Jones</surname>
</contact>
<contact id="0009">
<title>Mr</title>
<forename>Glenn</forename>
<surname>Smith</surname>
</contact>
<contact id="0010">
<title>Mr</title>
<forename>Bill</forename>
<surname>McDonnald</surname>
</contact>

</records>


Here is the complete style sheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no" omit-xml-declaration="yes" />

<xsl:variable name="newline">
<!-- Hex 10 is a line feed, and hex 13 is a carriage return -->
<xsl:value-of select="'&#13;'" />
</xsl:variable>

<!-- Remove all whitespace from the source XML so that it is not echoed into the new report -->
<xsl:strip-space elements="//*" />

<xsl:key name="contacts-by-surname" match="contact" use="surname" />
<xsl:variable name="surnames" select="//contact[count(. | key('contacts-by-surname', surname)[1]) = 1]/surname" />

<xsl:template match="/">
<xsl:text>Full list of surnames:</xsl:text>
<xsl:value-of select="$newline" />
<xsl:for-each select="//surname">
<xsl:value-of select="." />
<xsl:value-of select="' '" />
</xsl:for-each>
<xsl:value-of select="$newline" />
<xsl:value-of select="$newline" />

<xsl:text>List of unique surnames:</xsl:text>
<xsl:value-of select="$newline" />
<xsl:for-each select="$surnames">
<xsl:value-of select="." />
<xsl:value-of select="' '" />
</xsl:for-each>
<xsl:value-of select="$newline" />
<xsl:value-of select="$newline" />

<xsl:call-template name="sortList">
<xsl:with-param name="list" select="$surnames" />
</xsl:call-template>

<!-- list is now sorted to use whenever necessary -->
<xsl:text>List of unique surnames (sorted):</xsl:text>
<xsl:value-of select="$newline" />
<xsl:for-each select="$surnames">
<xsl:value-of select="." />
<xsl:value-of select="' '" />
</xsl:for-each>
<xsl:value-of select="$newline" />
<xsl:value-of select="$newline" />

<xsl:text>People per surname:</xsl:text>
<xsl:value-of select="$newline" />
<xsl:for-each select="$surnames">
<xsl:variable name="surname" select="."/>
<xsl:value-of select="$surname" />
<xsl:value-of select="' ('" />
<xsl:value-of select="count(//contact[surname = $surname])" />
<xsl:value-of select="'):'" />
<xsl:value-of select="$newline" />
<xsl:for-each select="//contact[surname = $surname]">
<xsl:sort select="forname"/>
<xsl:value-of select="' '" />
<xsl:value-of select="forename" />
<xsl:value-of select="$newline" />
</xsl:for-each>
<xsl:value-of select="$newline" />
</xsl:for-each>


</xsl:template>

<!-- sort list by the value of its current node -->
<xsl:template name="sortList">
<xsl:param name="list" />
<xsl:for-each select="$list">
<xsl:sort select="." />
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>


And here is the report:

Full list of surnames:
Smith Jones Doe Smith Jordan Johnson Jones Jones Smith McDonnald

List of unique surnames:
Smith Jones Doe Jordan Johnson McDonnald

List of unique surnames (sorted):
Doe Johnson Jones Jordan McDonnald Smith

People per surname:
Doe (1):
John

Johnson (1):
Jim

Jones (3):
Amy
Randy
Bobby

Jordan (1):
Mike

McDonnald (1):
Bill

Smith (3):
John
Jack
Glenn

No comments: