Treat documentation as source code

3 minute read

Published:

Title: A Collaborative Scientific Document Authoring workflow using hosted Git

Date: 2020-08-26

Hosted by: Code and Supply Pittsburgh Meetups (https://www.meetup.com/Pittsburgh-Code-Supply)

Speaker: Colin Dean

Summary and thoughts:

This was the first time I’ve attened a talk from Code and Supply Pittsburgh. The talk by Colin Dean about collaborative documentation and tools was informative and interesting, as well as particularly relevant. We all are heavily involved in documentation of work and presentation of results (even without publications). Colin Dean presented his experience with documenting a white paper with engineers and data scientisits as the major stakeholders. The goal of the paper was a content-focused documentation that could be understood and used by all - a major requirement for reproducible research. The first part of the talk focused on the need for good documentation, goals of different members of the team and the overall requirements, while the second part discussed the various tools used in detail.

Taking the analogy of a research project, before starting the actual work (i.e. writing), there must be a study design and architecture defined, as well as the user requirements. The same holds for good documentation, which can make or break the project itself. Common requirements for good documentation include a standardized format/style, reviewable, version control, automated rendering, and ease of use (for both humans and computers). I am quite familiar with documentation on Github as I’ve used this extensively for my projects. However, the combination of using git, pandoc (which was new to me) and Drone CI was compelling. The workflow with these tools was clearly defined, with each tool having specific tasks. For instance, version control using git is convenient while the open-source pandoc modules are a good choice of input and output, making it a solid choice for markup. While using 4 different tools just for documentation may seem too much, the speaker addressed the concern by pointing out the haphazard way in which most of our documentation ends up. I believe this is true as even with the best intentions, my documentation ends up in multiple formats (pdf, docx, Google docs, git markdown) and locations (local system, remote system, Github, Box etc.). I can see the benefit to using a set number of tools for collaborative documentation that can involve all people with different functions in the team. Colin Dean also delineated the pain points of the process - namely the dependencies, learning curve, rendering, and conversion between different document formats.

While this approach for collaborative writing seems useful to me, I would be a little concerned if the team involved persons other than engineers and data scientists, such as if the need is not just for documentation of code. It is well known that not everyone is comfortable with LaTeX, much less a collection of tools thrust upon them for writing. However, I believe it is worth exploring these options for future work and I intend to do this - at least enough to figure out if it is worth overcoming the pain points.