Documents
I wanted to use XML instead of HTML for storing the content (written text) for the site. There were a couple of reasons for this. One reason is that, despite having almost no layout/presentation instructions, the XHTML includes many tags surrounding the content, and those tags must be nested in the appropriate way. This is because the CSS needs fairly granular control over the display of different items. For example, when I display a box with a list of links, the "heading" in the box has one color, the body of the box another, and both of them can have different text colors for readability. Dividing up the XHTML so that these pieces are easy to target from CSS makes sense, but it makes editing the web site more difficult.
Secondly, I wanted to be able to change the layout of the site more radically, and with less work, than in the past. And a short-term goal was to produce the site in two formats: as XHTML/CSS, and as HTML with no CSS, for older browsers. By pulling out all the site's content into XML, I could just apply a different XSL transformation to the content, without rewriting the content itself.
The basic idea is that I write XML files in a text editor, then apply one or more XSL transformations to produce the target web site. The XSL also prepares things such as the navigation boxes on the left of the screen--linking to the major sections of the site, other pages within this topic, sections within the page, related pages, and external sites.
The XML structure itself is very simple, with only a handful of document types required for the site. The site is built from XML documents, each containing a <page-content> element. Each of these represents one 'page' of information. The page contains one or more <section> elements, each of which has a heading. Each section contains a single piece of content, which can have one or more paragraphs. Each page also has a main title, a topic, a unique key, and a short text used for links.
The XSL does respect any HTML tags embedded in the content; however, in some cases I replaced them with my own tags to have more control over the transformation in XSL. That is, a <p> tag should always be dropped into the XHTML as a <p>, but a <para> tag might have a <div> around it for the CSS to control. In general, though, I allowed myself the freedom to embed HTML in the content so that I could easily create tables when necessary, without having to define some special-case XML/XSL transformation for formatting.
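As a rough illustration of that split, here is a minimal XSLT 1.0 sketch, not the site's actual stylesheet. It assumes a default rule that copies embedded HTML through unchanged, plus a specific rule for <para>. The class name "para" and the exact <div>/<p> wrapping are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default rule: copy embedded HTML (p, table, ...) through
       unchanged, recursing so nested custom tags are still handled. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Custom tag: wrap <para> in a <div> so the CSS has a hook.
       The class name "para" is an assumption for this sketch. -->
  <xsl:template match="para">
    <div class="para">
      <p><xsl:apply-templates/></p>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Because the <para> rule is more specific than the copy-everything rule, it takes precedence for <para> elements, while a plain <p> falls through to the copy rule and lands in the output as-is.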
Linking is handled as a special case. I can link between two pages in the site, or to external sites, using a regular HTML <a> (anchor) tag. However, I decided while writing to use my own <link> tag. A <link> can have an explicit href attribute, like <a>, but it can instead use a key attribute to name another page within the system. This way the XSL transformation resolves the reference to the target page, and pages can be moved around between subdirectories without breaking links. Using <link>, I can also have the target pages declare the preferred text to use when linking to them, for consistency across the site, and I can easily build a list of "see also" references for the current page (and easily distinguish local from external links), and so on. In any case, it acts just like an HTML <a> anchor, but with some special powers.
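One way such a <link> rule could be written in XSLT 1.0 is sketched below. This is a guess at the shape, not the stylesheet actually used. It assumes a hypothetical index file, page-index.xml, containing one <page key="..." path="..." link-text="..."/> entry per page. The file name and the path and link-text attributes are invented for the example. Keys that name external sites are not handled here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical index of all pages, loaded once.
       Assumed shape: <pages><page key="" path="" link-text=""/></pages> -->
  <xsl:variable name="page-index"
      select="document('page-index.xml')/pages"/>

  <!-- Explicit href: behaves like a plain HTML anchor. -->
  <xsl:template match="link[@href]">
    <a href="{@href}"><xsl:apply-templates/></a>
  </xsl:template>

  <!-- Key only: look the target page up in the index. -->
  <xsl:template match="link[@key and not(@href)]">
    <xsl:variable name="target"
        select="$page-index/page[@key = current()/@key]"/>
    <a href="{$target/@path}">
      <xsl:choose>
        <!-- Use the text written inside <link> when there is any... -->
        <xsl:when test="node()"><xsl:apply-templates/></xsl:when>
        <!-- ...otherwise fall back to the target's preferred text. -->
        <xsl:otherwise>
          <xsl:value-of select="$target/@link-text"/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement, moving a page only means updating its path in the index; every <link key="..."> that points at it picks up the new location on the next transformation. A real stylesheet would probably also need to turn the stored path into one relative to the current page.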
The XSL transformation builds a link for each section in the page, so a navigation bar (on the left) allows you to jump to sections within the page--useful if the page is long. Using the page's topic, it also builds links to the other pages within the same topic. Originally, I had written this technical note as one long page; I then split it up into multiple pages, all with the same topic, "technotes". The links to the different parts of the technote are built automatically.
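The in-page part of that navigation could be produced along these lines. This is a minimal XSLT 1.0 sketch, assuming that each section's key attribute doubles as the anchor id and that its short attribute is the label shown in the navigation bar. The mode name, the class name and the <h2> heading level are assumptions for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Navigation box: one entry per <section> in the page.
       The mode name "section-nav" is hypothetical. -->
  <xsl:template match="page-content" mode="section-nav">
    <ul class="section-nav">
      <xsl:for-each select="section">
        <li>
          <a href="#{@key}"><xsl:value-of select="@short"/></a>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

  <!-- Page body: give each section heading the matching id
       so the links above have something to jump to. -->
  <xsl:template match="section">
    <h2 id="{@key}"><xsl:value-of select="@heading"/></h2>
    <xsl:apply-templates select="sec-content"/>
  </xsl:template>

</xsl:stylesheet>
```

The topic-level links would follow the same pattern, iterating over the other pages that share the current page's topic attribute instead of over the sections.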
As I write, there is only one level of categorization allowed: pages belong to topics, and there is one master list of topics across the whole site. Obviously, if the site grows, I will probably want to have nested topics, so that "Politics" can have "American" and "European" as sub-topics. I don't think this will be hard to do, but I will wait until I need it to build that functionality.
Also on the issue of topics: it seems like a useful word and concept when the content on a web site is topical. So--current news is one topic, "my travels" is another, and so on. It's not clear that a contact page is a topic. Right now I'm just squeezing all the content so it belongs to a topic, even if that isn't completely logical. Later I may need to expand the scheme to be more general.
Here is a sample of the XML used to produce this page--cropped a little bit (and spaced to take up less space horizontally on this page):
<?xml version="1.0"
  encoding="UTF-8"?>
<!DOCTYPE page-content
  SYSTEM "../dtd/page-content.dtd" >
<page-content
    topic="technotes"
    key="xhtml-and-css"
    seq-in-topic="1" >
  <title>Building the Web Site: XHTML and CSS</title>
  <link-text>This Site: XHTML/CSS</link-text>
  <section
      heading="XHTML and CSS"
      key="xhtml_css"
      short="XHTML and CSS">
    <sec-content>
      <para>
        XHTML is a standard created
        by the <link key="w3c">W3C</link>
        to replace...
      </para>
    </sec-content>
  </section>
</page-content>
The <para> tags would contain larger blocks of text, and make up the bulk of the XML file in the actual pages. Because I clipped that text out, the sample may be misleading: the surrounding markup looks voluminous. In the actual pages it isn't, and many of the surrounding tags are boilerplate anyway.
Note that the <link> element will be replaced in the output by a hyperlink in HTML, to an external site for the W3C. In this case, because the href is explicit, the link is more or less like using an <a> tag directly. However, in the case of jEdit, I use the alternate version <link key="jedit" /> where the 'key' resolves not to another page on this local site but to an entry in a master list of external sites, kept in XML. That master file is maintained by hand (currently) and again just gives me some central control if I need to replace the link text, change the standard text used when the link is displayed, and so on.
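For illustration only, a master list of external sites might look something like the following. The file name (external-sites.xml), the element names and the attribute names are all assumptions, since the real file is not shown; the two entries simply reuse the keys mentioned on this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical external-sites.xml: one <site> per external target.
     key       = the name used in <link key="..."/>
     href      = where the generated hyperlink should point
     link-text = standard text shown when <link> has no content -->
<external-sites>
  <site key="w3c"   href="http://www.w3.org/"    link-text="W3C"/>
  <site key="jedit" href="http://www.jedit.org/" link-text="jEdit"/>
</external-sites>
```

Given a file of this shape, the XSL could resolve <link key="jedit" /> with a lookup such as document('external-sites.xml')/external-sites/site[@key = 'jedit'], and a change to the address or the standard text would then be made in exactly one place.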
Building this Website with XML and XSL