The Importance of an Automated Documentation Process

I imagine I’m like many people who dislike writing documentation. Forget the documentation, I often think to myself. Good code should be enough! The temptation to skip documentation in the hope that people will simply read and understand the code is ever-present. However, as a project grows from a few hundred lines of code to tens of thousands, undocumented code eventually becomes unmanageable. It doesn’t take a computer scientist to figure out that a lack of documentation will come back to bite you.

You might be able to get away with such irresponsible behavior in small projects where you’re the sole programmer, but if you plan on expanding and accepting contributions, documentation becomes far more important. I have found that forcing myself to document helps me expose bad design. It’s a bit like re-reading an email and catching all the grammar and spelling mistakes. There’s nothing like trying to explain how something works to make you realize all the flaws you had overlooked. After a while, you discover that there’s a certain degree of satisfaction that comes from designing something well enough to produce clear and beautiful documentation.

However, documentation can also be a double-edged sword. If you over-document, your documentation will constantly fall out of date because your software is constantly changing. In that case, your documentation doesn’t serve to inform; it serves to confuse. The development process never ends, as functionality is always being added or amended. In the face of constant change, how do you keep documentation up to date? Fortunately, this problem has been tackled by the software engineers who created automated documentation tools like Sphinx and Doxygen. These tools scan through your code and automatically pull out embedded comments (docstrings, in Python’s case) to produce your documentation. They rely on certain annotations in your comments and format the documentation accordingly. What’s great about this method is that the documentation lives alongside your code. If you comment your code correctly, you will produce good documentation.
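To make the idea of pulling documentation out of the code a little more concrete, here is a toy sketch using only Python’s standard inspect module. It is not how Sphinx works internally. It simply collects the docstring of each public function in a module. The geometry module and its functions are hypothetical and are built from a string so the example runs on its own.

```python
import inspect
import types


def collect_docs(module):
    """Map each public function defined in *module* to its docstring."""
    docs = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue
        docs[name] = inspect.getdoc(obj) or ""
    return docs


# A hypothetical module, built from source text so the example is self-contained.
SOURCE = '''
def area(width, height):
    """Return the area of a width x height rectangle."""
    return width * height

def _scale(value, factor):
    """Private helper: not part of the public documentation."""
    return value * factor
'''

geometry = types.ModuleType("geometry")
exec(SOURCE, geometry.__dict__)

for name, doc in collect_docs(geometry).items():
    print(f"{name}: {doc}")
```

Real tools do far more (cross-references, classes, formatting), but the core idea is the same: the text lives next to the code and is harvested from it.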

By “commenting correctly” I mean that comments document intent while the code itself documents function. This way, you don’t need to constantly rewrite the documentation as long as the intent hasn’t changed. Relying on the code itself to document function also shapes how you approach programming, from the way you name variables to the way you define functions and design your object models. You will want your naming and design to convey their purpose as transparently as possible, and you will use comments to explain the intent behind the design. So in many ways, laziness can be a virtue in a programmer when paired with a strong desire for elegance and efficiency. But I digress. The purpose of this post isn’t to discuss the finer points of code-craft. It is to discuss why automated documentation is essential and what tools I’m using to achieve it.
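A small, hypothetical illustration of that split: the names say what the code does, and the comment is reserved for why it does it. The upload-retry scenario and all names here are made up for the example.

```python
# Hypothetical example: names carry the "what", the comment carries the "why".
MAX_UPLOAD_ATTEMPTS = 3


def should_retry_upload(attempts_so_far: int, status_code: int) -> bool:
    # Server errors (5xx) are often transient, so retrying is worthwhile.
    # Client errors (4xx) mean the request itself is wrong and a second
    # attempt would fail the same way.
    is_server_error = 500 <= status_code < 600
    return is_server_error and attempts_so_far < MAX_UPLOAD_ATTEMPTS
```

If the retry limit or the status-code rule changes, the code changes with it. The comment only needs rewriting if the reasoning behind the rule changes.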

Sphinx

Sphinx is an automated documentation tool for Python. If you write your docstrings in reStructuredText (reST) format, Sphinx understands the formatting annotations and renders the documentation accordingly. It supports multiple output formats, including HTML and PDF (via LaTeX), among others. It also understands LaTeX math notation, so it can render beautifully typeset formulas if you need them. To produce documentation, I’m using Sphinx to generate HTML pages, which are hosted on GitHub Pages.
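Here is roughly what a reST-formatted docstring looks like. The field lists (:param:, :type:, :returns:, :rtype:, :raises:) and the :math: role are standard Sphinx markup that the autodoc extension can render. The rms function itself is only a hypothetical example.

```python
import math


def rms(values):
    r"""Return the root mean square of *values*.

    Computed as :math:`\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}`.

    :param values: A non-empty sequence of numbers.
    :type values: Sequence[float]
    :returns: The root mean square of the values.
    :rtype: float
    :raises ValueError: If *values* is empty.
    """
    if not values:
        raise ValueError("values must be non-empty")
    return math.sqrt(sum(x * x for x in values) / len(values))
```

The raw string (r"""...""") keeps the backslashes in the LaTeX intact, so Python does not interpret them as escape sequences.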

To begin understanding Sphinx’s templating features, it’s elucidating to look at an existing theme. If you don’t already have Sphinx installed, you can do so using pip:

pip install sphinx

Once it’s installed, you can find where its source files were placed. pip show reports the install path in its Location field:

pip show sphinx
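If you would rather locate a theme from Python, a sketch like the following should work. It assumes Sphinx is installed as a regular package with its built-in themes in a themes subdirectory. importlib.util.find_spec locates the package without importing it. find_theme_dir is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_theme_dir(theme: str = "basic") -> Optional[Path]:
    """Return the directory of a built-in Sphinx theme, or None if Sphinx
    is not installed or no theme by that name exists."""
    spec = importlib.util.find_spec("sphinx")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at sphinx/__init__.py; themes sit beside it.
    theme_dir = Path(spec.origin).parent / "themes" / theme
    return theme_dir if theme_dir.is_dir() else None


if __name__ == "__main__":
    theme_dir = find_theme_dir()
    if theme_dir is None:
        print("Sphinx or the requested theme was not found")
    else:
        for entry in sorted(theme_dir.iterdir()):
            print(entry.name)
```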

On my machine, it was installed to /usr/local/lib/python3.5/site-packages. Within that directory, you will find sphinx/themes, which contains the included themes. The basic theme holds the full set of HTML templates; most of the other themes inherit from it and override only some of those files. For example, here’s what you will find in the basic directory:

Thomas@Ixion:/usr/local/lib/python3.5/site-packages/sphinx/themes/basic$ ls
changes/              genindex-split.html   localtoc.html         search.html           static/
defindex.html         genindex.html         opensearch.xml        searchbox.html        theme.conf
domainindex.html      globaltoc.html        page.html             searchresults.html
genindex-single.html  layout.html           relations.html        sourcelink.html

To get started creating your own template, you can make a copy of this entire directory and put it into a _templates subdirectory within your project’s documentation directory. For example, Quantum’s documentation source can be found in QuantumDocs/src and I put the templates in QuantumDocs/src/_templates.
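For Sphinx to pick up those templates, the documentation’s conf.py has to list the directory in templates_path. Templates found there override theme templates of the same name, so _templates/layout.html replaces the theme’s layout. sphinx-quickstart typically generates this setting already. Below is a minimal sketch; the project name and the choice of the basic theme are assumptions.

```python
# conf.py, which lives in the documentation source directory (e.g. QuantumDocs/src).
project = "Quantum"

extensions = [
    "sphinx.ext.autodoc",  # pull API documentation out of docstrings
]

# Templates in these directories (relative to conf.py) take precedence over
# the theme's templates with the same file name, e.g. _templates/layout.html.
templates_path = ["_templates"]

# The theme whose remaining templates and static files are used.
html_theme = "basic"
```

Inside an overriding template, {% extends "!layout.html" %} lets you extend the theme’s original layout instead of replacing it wholesale. This is handy if you only need to change a few blocks.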

GitHub Pages

GitHub Pages is a free web-hosting service provided by GitHub that uses GitHub’s own Jekyll static site generator. The best I can muster to describe it is that Jekyll is an HTML file generator that traverses your directory structure, looking for certain files and for reserved tags within those files, and uses them to fill the generated HTML files with content. So to work with Jekyll, you need to understand some of these mechanics and the related syntax. Under the hood, Jekyll uses the Liquid templating language, whose syntax closely resembles that of Jinja, the templating engine Sphinx uses. Jekyll’s documentation is excellent, and you should have no trouble getting started if you spend 15 minutes looking over its getting-started guide.
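One of the files Jekyll looks for is a page that starts with YAML front matter: a block between two lines of three dashes. Jekyll processes files that have front matter (applying layouts and evaluating Liquid) and copies files without it to the site unchanged. The sketch below is a simplified, hypothetical Python model of that check. It is not Jekyll’s actual implementation, which is written in Ruby.

```python
from typing import Optional, Tuple


def split_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Split a Jekyll-style page into (front_matter, body).

    Returns (None, text) when the page has no front matter, which in this
    simplified model means Jekyll would copy the file through untouched.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return None, text  # an opening delimiter with no closing one


page = "---\nlayout: default\ntitle: API\n---\n<p>Hello</p>\n"
front_matter, body = split_front_matter(page)
print(front_matter)
print(body)
```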

Jinja is a templating language that allows you to generate documents using variables as placeholders for content. It looks like this:

<html>
  {{ my_content }}
</html>

You might have some static text in the document, like the <html> tags serving as a skeleton, with Jinja variables mixed in wherever you want content filled in. Jinja also supports programmatic logic such as if-statements and loops, which go inside {% ... %} tags rather than the {{ ... }} used to print variables.
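A slightly fuller example shows a variable alongside an if-statement and a loop. It is rendered with the jinja2 Python package, which Sphinx itself depends on. The title and modules variables are made up for illustration. The trim_blocks and lstrip_blocks options only tidy the whitespace around the tags.

```python
from jinja2 import Template

# Static HTML skeleton with Jinja placeholders mixed in.
# {{ ... }} prints an expression; {% ... %} wraps statements such as if/for.
SOURCE = """<html>
  <h1>{{ title }}</h1>
  {% if modules %}
  <ul>
  {% for name in modules %}
    <li>{{ name }}</li>
  {% endfor %}
  </ul>
  {% else %}
  <p>No modules documented yet.</p>
  {% endif %}
</html>"""

template = Template(SOURCE, trim_blocks=True, lstrip_blocks=True)
html = template.render(title="API Reference", modules=["parser", "renderer"])
print(html)
```

Rendering the same template with an empty modules list takes the else branch instead. The template stays the same; only the data changes.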

Although Jekyll’s Liquid and Sphinx’s Jinja look nearly identical, the problem you’ll quickly encounter is that Jekyll is not designed to produce documentation automatically from your code the way Sphinx is. Jekyll’s strength is as a blogging platform. You write your blog posts in Markdown, and they are automatically published to your site when you commit them to your repository. Jekyll is fairly simple in the way it works, and once you understand the basics, it’s super easy to customize your site to your liking. Simply change a couple of HTML templates and you can completely change the way your site looks. I find Jekyll’s templating and design features superior to Sphinx’s.

So in creating Quantum’s project site, I sought to exploit the strengths of both systems. I wanted to use Jekyll to control styling, layout, and overall design, as well as use it for the automated blogging features. At the same time, I wanted Sphinx to generate documentation automatically and do so in a way that would feed into Jekyll’s layout system so that there would be a consistent look and feel throughout the site.

Putting it all together

I knew I had to get Sphinx to generate HTML in a Jekyll-friendly format, that is, with front matter as well as the proper variable references in the pages where appropriate. It’s difficult to find beginner-level information about Sphinx’s templating features. There is documentation on the Sphinx site, but it assumes you know Jinja, which I didn’t. (I ended up learning enough to get by.)

When you first try to put a Sphinx-generated page on a Jekyll site, it will look like a bare HTML page with absolutely no styling: simply a white page with your documentation text. This is because Sphinx puts its assets in special directories whose names begin with an underscore, such as _static. In the Jekyll world, underscored directories are ignored when your pages are built. One way around this is to disable Jekyll processing completely by including an empty file named .nojekyll in your root directory. Once you do that, your Sphinx site will show up with the Sphinx-generated styling, but you will have no access to Jekyll features.
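Here is a simplified Python model for seeing which parts of a Sphinx build Jekyll would ignore. It treats any path with a component beginning with an underscore or a dot as excluded. That matches Jekyll’s default behavior for directories like _static, but it does not model Jekyll’s include and exclude settings. disable_jekyll shows the .nojekyll step, and both function names are hypothetical. Sphinx also ships an extension, sphinx.ext.githubpages, that writes .nojekyll into the HTML output for you.

```python
import tempfile
from pathlib import Path, PurePosixPath


def jekyll_skips(relative_path: str) -> bool:
    """Simplified model: True if any path component starts with '_' or '.'.

    Jekyll's include/exclude configuration is deliberately not modeled.
    """
    return any(part.startswith(("_", ".")) for part in PurePosixPath(relative_path).parts)


def disable_jekyll(site_root: Path) -> Path:
    """Create an empty .nojekyll file so GitHub Pages serves files as-is."""
    marker = site_root / ".nojekyll"
    marker.touch()
    return marker


# Hypothetical Sphinx output paths.
outputs = ["index.html", "api.html", "_static/basic.css", "_sources/index.rst.txt"]
for path in outputs:
    print(f"{path:28} {'skipped' if jekyll_skips(path) else 'published'}")

with tempfile.TemporaryDirectory() as tmp:
    print(disable_jekyll(Path(tmp)).name)
```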

This obviously was not what I wanted, so I had to figure out how to modify the Sphinx templates. I found that most of the changes could be made in the _templates/layout.html file within my QuantumDocs directory. With the templates modified, Sphinx now generated Jekyll-aware HTML files. This means that when the Jekyll processor runs, it picks up the Sphinx-generated files, which include the information Jekyll needs to render them correctly.
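The modified layout.html isn’t reproduced here, so what follows is only a rough sketch of the general idea. It is rendered with the jinja2 package so it can run outside Sphinx. It assumes the Jekyll site has a layout named default and uses a hypothetical include, doc_nav.html. The title and body variables mirror ones Sphinx passes to its templates. Two points should carry over to a real template. First, the page must begin with front matter. Second, any Liquid tags meant for Jekyll have to be wrapped in raw blocks, because Jinja and Liquid share the same {{ }} and {% %} delimiters and Jinja would otherwise try to evaluate them.

```python
from jinja2 import Template

# Hypothetical, stripped-down stand-in for _templates/layout.html.
# Jinja fills in title and body; the Liquid include inside raw/endraw is
# copied through verbatim so that Jekyll can evaluate it later.
LAYOUT = """---
layout: default
title: "{{ title }}"
---
{% raw %}{% include doc_nav.html %}{% endraw %}
{{ body }}
"""

page = Template(LAYOUT).render(title="API Reference", body="<p>Module docs</p>")
print(page)
```

The output starts with a front matter block, so Jekyll treats it as a page to process and wraps it in the default layout instead of copying it through unstyled. A title containing double quotes would need escaping before it is safe in the YAML.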