iamcal.com

Previewing DocBook books in HTML on Windows

By Cal Henderson, July 28th 2009.

Introduction

I wrote my book in DocBook, an XML format for writing structured documents, which makes it easy to convert them into multiple output formats. This is very useful when you want to publish the same book for print, PDFs and on the web (all of which O'Reilly does).

If you're an idiot like me, you'll write your book entirely in a notepad clone you wrote in 2001, creating all the XML by hand. This is great from a control point of view - every tag and every piece of formatting is under my control and I know exactly how my output is going to be styled. No chunk of text wrongly formatted, etc.

This is all very well, but it doesn't allow you to get a good feel for the structure of the book you're writing. It's a book, after all, not a series of XML files, linked via XInclude. Luckily, it's possible to turn DocBook files into HTML in the comfort of your own home. It's a little bit tricky to get up and running, so after the second time going through it, I thought I'd share how it's done.

I'm a Windows user, because I enjoy pain, so these instructions are Windows-centric. However, all of these steps will work on Mac/Linux too.

XSLT - The Magic Glue

Because DocBook is XML, it can be formatted with XSLT, using an XSLT processor. There are a bunch of them, but in theory the fastest one that supports XInclude is libxslt. XInclude allows you to include one XML file in another - when writing a book it allows you to have one file per chapter and a single 'linking' document that brings it all together; this is much more manageable than having a single monolithic file. libxslt is written in C, so you'll need to either download the source (libxslt-git-snapshot.tar.gz is the latest) and compile it, or find binaries for your platform. There are Windows binaries here (libxslt-1.1.24.win32.zip) which work great. Just unzip it somewhere useful, like C:\docbook\libxslt\.
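
To make the one-file-per-chapter setup concrete, here's a rough sketch of what the 'linking' document might look like. The title and the chapter file names (ch01.xml, ch02.xml) are made up for illustration, and the DOCTYPE assumes DocBook 4.5 - swap in whatever version you're actually writing against.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<!-- The xi prefix is bound to the XInclude namespace so the processor recognises the include elements -->
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>My Book</title>

  <!-- Assumed file names: each chapter lives in its own file next to this one -->
  <xi:include href="ch01.xml"/>
  <xi:include href="ch02.xml"/>
</book>
```

Each chapter file is then a standalone XML document with a chapter element as its root. The xi:include elements are only resolved if the processor is told to handle XInclude; otherwise they are left in place and the chapters never get pulled in.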

The next step is getting hold of the XSLT to turn your DocBook files into HTML. Luckily, these are provided by the folks at docbook.org and downloadable on SourceForge. The trick is downloading the right thing - you're looking for a section called docbook-xsl which contains different versions. Pick the latest one (1.75.2 at the time of writing) and download the zip file called something like docbook-xsl-1.75.2.zip. You do not want to download the docbook-xsl-doc package - this does not contain any XSL! Unzip the file you downloaded to somewhere like C:\docbook\xsl\.

Putting it all together

Unfortunately, the documentation for the XSL stylesheets is a little verbose and confusing. If you want to output HTML in 'chunked' format (multiple HTML pages, instead of one monolithic document), you'll want to use html/chunk.xsl as your stylesheet. I like to set up a batch file to do the conversion, so that I don't have to mess with commands each time I make a change. Create compile.bat in the same folder as your book's XML, and put this in it:

c:\docbook\libxslt\bin\xsltproc.exe c:\docbook\xsl\html\chunk.xsl book.xml
pause

The pause command tells the batch file to prompt you to hit return. This is super useful, as it allows you to see any errors that come up during the run - without it, the batch window closes immediately.

When you double-click on the batch file, you'll get ... nothing. It hangs. It turns out you need to pass a few arguments to libxslt to make it work as intended.

--nonet - don't try to fetch DTDs from the web - this is what causes the hanging
--novalid - don't try to validate the XML against the DTD - this slows down processing
--xinclude - process XInclude statements - without this, our book will appear empty
--timing - I like to add this option so that I can see how long it took

If you modify the batch file to add these parameters, it will then output a bunch of HTML files. Success! Unfortunately, by default the XSL will create an HTML document per 1st-level section, which will mean multiple files per chapter. This probably isn't what you want, but luckily there's a parameter you can pass (if you find the magical documentation) called chunk.section.depth. It defaults to 1, but you can in fact set it to 0 to set the chunk level to per-chapter.
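
As an aside, stylesheet parameters like this can also be set in a small customization stylesheet that imports chunk.xsl, rather than on the command line. This is only a sketch, assuming the stylesheets were unzipped to C:\docbook\xsl\ as described above; the file name mybook.xsl is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- mybook.xsl: hypothetical customization layer, not part of the DocBook XSL download -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Pull in the stock chunking stylesheet; the path assumes the unzip location used in this article -->
  <xsl:import href="file:///c:/docbook/xsl/html/chunk.xsl"/>

  <!-- Override the default of 1 so that each chapter becomes a single HTML file -->
  <xsl:param name="chunk.section.depth" select="0"/>

</xsl:stylesheet>
```

You'd then point xsltproc at mybook.xsl instead of chunk.xsl. For a single parameter the command-line route is simpler, and that's the one the batch file here sticks with.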

Our final batch file looks like this:

c:\docbook\libxslt\bin\xsltproc.exe --nonet --stringparam chunk.section.depth 0^
 --novalid --xinclude --timing  c:\docbook\xsl\html\chunk.xsl book.xml
pause

Running this will create a preview of your book, perfect for proof reading. Enjoy!

Post-Script

I'm not a DocBook expert by any means. If you spot any glaring mistakes or omissions, drop me an email and teach me: cal [at] iamcal.com

Copyright © 2009 Cal Henderson.

The text of this article is all rights reserved. No part of these publications shall be reproduced, stored in a retrieval system, or transmitted by any means - electronic, mechanical, photocopying, recording or otherwise - without written permission from the publisher, except for the inclusion of brief quotations in a review or academic work.

All source code in this article is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. That means you can copy it and use it (even commercially), but you can't sell it and you must use attribution.

4 people have commented

libor
# March 19, 2010 - 12:24 pm PST
thanks much for this Cal. Worked like a charm. Been trying to figure this out for a while, with no luck.
sashan
# June 12, 2010 - 6:38 pm PST
I tried this but ended up in dll hell. First xsltproc complained that it couldn't find libxml2.dll. Sorted that out then tried again, and it complained that it couldn't find iconv.dll. Sorted that out and then it complained it couldn't find zlib1.dll .... after which I gave up. I think I'll revert to using docbook under Linux.
Cal
# June 13, 2010 - 10:14 am PST
you can get all of the dependencies here: ftp://ftp.zlatkovic.com/libxml/
Matt
# May 11, 2011 - 2:32 pm PST
Cal I can understand how much pain you took for this. Once again great work man.
- Matt
