I’m currently collaborating on a paper. My collaborator and I are writing the paper using LaTeX and we’re using git to track and share changes to the manuscript. We currently have a shared repository on GitHub.
GitHub has a lot of great features for collaborating on software; after all, that’s why it was developed. The “Issues” feature of a repository is particularly useful. It lets you discuss problems and track their resolution. Text formatting is supported in GitHub Issues using Markdown. In many flavors of Markdown, you can also embed math using LaTeX syntax. Unfortunately, at the time of writing, GitHub Flavored Markdown does not support math. This is probably fine for the vast majority of software projects. However, it is a problem when we’re trying to discuss a mathematical model.
Several people on the internet have suggested various solutions to this shortcoming. Some have suggested using an external engine to render your math as an image, then embedding that image in Markdown. This works, but I think it’s cumbersome.
Several others have suggested using a Jupyter Notebook, which GitHub does actually render. I think that this is a better solution, and this is the solution that I’m planning on using with my collaborator.
In our git repository, I’m creating a folder called issues-open. In
this folder is a set of Jupyter Notebooks, one per issue. Each collaborator
can review these Notebooks, which conveniently get rendered on the
GitHub web interface.
When a collaborator has something to add to the issue, they can fire up
their Jupyter instance and make some changes, either by adding new
cells to the bottom of the notebook or by editing the existing text,
and then commit and push the changes. We’ve adopted the practice of
starting each cell with a heading with the name of the author of that cell.
This way, the Notebook looks a bit like a conversation.
Launching Jupyter Notebooks
We’re using a conda environment for Python so that we’re synced up on the versions of each package we’re using. So, the first step will be creating the conda environment from the environment YAML file. In our case, this would look like this:
conda env create -f environment.yml
This only needs to be done once on each computer. Once that’s been done, you just need to activate the environment, which tells your terminal to use that environment’s version of Python. You can do that as follows (replace the name of the environment with the correct name):
conda activate my-environment
Now you can launch the Jupyter Notebook session from the same terminal; with a standard installation, the command is jupyter notebook. Your web browser should open and let you create new notebooks and edit existing ones.
Collaborating on Issues
The Jupyter Notebook interface is relatively straightforward and doesn’t need much discussion here. Most of the important features are available through the menus. There are also keyboard shortcuts that come in handy; the Help menu lists them.
Jupyter notebooks comprise a set of cells. The basic types of cells are markdown, code and raw. We’ll ignore raw cells here. Markdown cells contain text styled using markdown syntax. Code cells contain executable code. In our case, this will all be Python code.
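To make the cell types concrete, here is a minimal sketch of what a notebook file holds, built with Python’s standard json module. The author name "Alice", the equation, and the numbers are made up for illustration and are not from our repository; the field names follow the version 4 notebook format as I understand it.

```python
import json

# A hypothetical issue notebook with one markdown cell and one code cell.
# "Alice" and the equation are placeholders, not taken from a real issue.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            # A heading with the author's name, then math in LaTeX syntax.
            "source": [
                "## Alice\n",
                "Should the growth term be $r N (1 - N / K)$?",
            ],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # this cell has not been executed yet
            "outputs": [],
            "source": ["r, N, K = 0.5, 10.0, 100.0\n", "r * N * (1 - N / K)"],
        },
    ],
}

# An .ipynb file is this structure serialized as JSON.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because the file is plain JSON, git can track it like any other text file, which is what makes the notebook-per-issue approach workable.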
If there is any code in the notebook, it’s important to realize that it
runs interactively. You execute one code cell at a time. You don’t have to
execute them in order either. So, if the code has side effects — like
changing a global variable — the order that you run the cells in makes a
difference. I think it’s good practice to restart your Python interpreter
and re-run all the cells before committing a notebook in git. To do this, just
select Kernel > Restart & Run All. This ensures that the cells were
run in order from a clean state, so the saved output can be reproduced.
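As a small illustration of why the order matters, the sketch below stands in for two notebook cells with two hypothetical functions that share a global variable. Calling them in a different order leaves a different value behind, just as running cells out of order would. The names are illustrative only.

```python
# Each function plays the role of one notebook cell; `scale` is the shared
# global state that the cells read and modify.
scale = 1.0


def cell_1():
    """Cell 1: set the global to a known value."""
    global scale
    scale = 2.0


def cell_2():
    """Cell 2: modify the global in place (a side effect)."""
    global scale
    scale = scale * 10


def run(cells):
    """Reset the state (like restarting the kernel), then run the cells as given."""
    global scale
    scale = 1.0
    for cell in cells:
        cell()
    return scale


in_order = run([cell_1, cell_2])      # 2.0, then multiplied by 10
out_of_order = run([cell_2, cell_1])  # cell 1 runs last and overwrites the value
print(in_order, out_of_order)
```

The two runs end with different values of the global even though the cells themselves are identical, which is why a restart and a top-to-bottom run is worth doing before a commit.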
The other advantage to restarting the kernel and re-running all the cells before committing is to avoid extraneous changes being tracked by git. The notebook files include a counter indicating the order in which the cells were executed. The first cell to be executed will have a counter value of 1, the second will have a value of 2, and so on. If you execute the first five cells, then execute the first one again, it will now have a counter value of 6. If you’ve been playing around with a notebook for a while, all those counters will be incremented even higher. Even if you make no real changes to the notebook, git will register these counter changes as changes that need to be committed and tracked. You really only want the real changes to be tracked, and the easiest way to do this is to ensure that the code cells are executed in order starting from an execution count of one.
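If you want to check this before committing, something like the following sketch could do it. It reads the execution counters from the notebook’s JSON and confirms they run 1, 2, 3 from top to bottom. The function names and the two tiny hand-written notebooks are my own illustration, and the check assumes every code cell is non-empty (an empty code cell is left without a counter).

```python
import json


def execution_counts(notebook):
    """Return the execution_count of every code cell, in document order."""
    return [
        cell.get("execution_count")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]


def counts_are_sequential(notebook):
    """True if the code cells are numbered 1, 2, 3, ... from top to bottom."""
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


# Two tiny hand-written notebooks for demonstration.
clean = {"cells": [
    {"cell_type": "markdown", "source": ["## Alice"]},
    {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "source": ["x + 1"]},
]}
messy = {"cells": [
    {"cell_type": "code", "execution_count": 6, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "source": ["x + 1"]},
]}

print(counts_are_sequential(clean), counts_are_sequential(messy))

# For a real file, load it first (the path here is only an example):
# with open("issues-open/reorder-model-development.ipynb", encoding="utf-8") as f:
#     print(counts_are_sequential(json.load(f)))
```

The second notebook mirrors the situation described above, where the first cell was run again after the others and picked up a counter of 6.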
Closing an Issue
When it’s time to close an issue, whoever closes the issue simply
moves the Jupyter Notebook discussing the issue to a folder called
issues-closed. This should be done with
git mv so that the history is maintained.
As an example, to close the issue discussed in the Notebook
reorder-model-development.ipynb, the command would be:
git mv issues-open/reorder-model-development.ipynb issues-closed/