kloppenborg.cahttps://www.kloppenborg.ca/2020-07-22T14:20:41-04:00cmstatr: Composite Material Data Statistics in R2020-07-22T14:20:41-04:002020-07-22T14:20:41-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2020-07-22:/2020/07/cmstatr/<p>From what I&#8217;ve seen, a lot of the statistical analysis of data from composite material data is done in <span class="caps">MS</span> Excel. There are a number of very good tools for doing this analysis in <span class="caps">MS</span> Excel: <a href="https://www.niar.wichita.edu/agate/Documents/default.htm"><span class="caps">ASAP</span></a>, <a href="http://www.niar.wichita.edu/coe/NCAMP_Documents/Programs/HYTEQ%20Feb%207%20%202011.xls"><span class="caps">HYTEQ</span></a>, <span class="caps">STAT</span>-17, and more recently, <a href="https://www.cmh17.org/RESOURCES/StatisticsSoftware.aspx"><span class="caps">CMH17</span>-<span class="caps">STATS</span></a>. I expect that the …</p><p>From what I&#8217;ve seen, a lot of the statistical analysis of data from composite material data is done in <span class="caps">MS</span> Excel. There are a number of very good tools for doing this analysis in <span class="caps">MS</span> Excel: <a href="https://www.niar.wichita.edu/agate/Documents/default.htm"><span class="caps">ASAP</span></a>, <a href="http://www.niar.wichita.edu/coe/NCAMP_Documents/Programs/HYTEQ%20Feb%207%20%202011.xls"><span class="caps">HYTEQ</span></a>, <span class="caps">STAT</span>-17, and more recently, <a href="https://www.cmh17.org/RESOURCES/StatisticsSoftware.aspx"><span class="caps">CMH17</span>-<span class="caps">STATS</span></a>. I expect that the reason for the popularity of <span class="caps">MS</span> Excel for this application is that everyone in the industry has <span class="caps">MS</span> Excel installed on their computer and <span class="caps">MS</span> Excel is easy to&nbsp;use.</p> <p>If you&#8217;ve read my blog before, you&#8217;ll know that I think that <a href="https://www.kloppenborg.ca/2019/02/reproducibility/">reproducibility</a> is important for engineering calculations. In my view, this includes statistical analysis. If the analysis isn&#8217;t reproducible, how does a reviewer &#8212; either now or in the future &#8212; know if it&#8217;s&nbsp;right?</p> <p>The current <span class="caps">MS</span> Excel tools are typically password protected so that users can&#8217;t view the macros that perform the calculations. I suspect that this was done with the best of intentions in order to prevent users from changing the code. But it also means that users can&#8217;t verify that the code is correct, or check if there are any unstated assumptions&nbsp;made.</p> <p>To allow statistical analysis of composite material data using open-source software, I&#8217;ve written an package for the <a href="https://www.r-project.org/">R programming language</a> that implements the statistical methods described in <a href="https://www.cmh17.org"><span class="caps">CMH</span>-17-1G</a>. This package, <a href="https://www.cmstatr.net"><code>cmstatr</code></a> has been released on <a href="https://cran.r-project.org/package=cmstatr"><span class="caps">CRAN</span></a>. There is also a brief discussion of this package in a <a href="https://doi.org/10.21105/joss.02265">paper published in the Journal of Open Source Software</a>.</p> <p>This R package allows statistical analysis to be performed using open-source tools &#8212; which can be verified by the user &#8212; and facilitates statistical analysis reports to be written at the same time that the analysis is performed by using <code>R-Notebooks</code> (see my <a href="https://www.kloppenborg.ca/2019/06/pandoc-report-templates/">earlier post</a>).</p> <p>I&#8217;ve tried to write the functions in a consistent manner so that it&#8217;s easier to learn how to use the package. I&#8217;ve also written functions to work well with the <a href="https://www.tidyverse.org/"><code>tidyverse</code></a> set of&nbsp;packages.</p> <p>There are some examples of how to use the <code>cmstatr</code> package in <a href="https://www.cmstatr.net/articles/cmstatr_Tutorial.html">this vignette</a>.</p> <p>I hope that people find this package useful. If you use this package and find a bug, have feedback or would like a feature added, please raise an issue on <a href="https://github.com/ComtekAdvancedStructures/cmstatr/issues">GitHub</a>.</p>Tracking Issues using Jupyter Notebooks2020-04-13T01:38:34-04:002020-04-13T01:38:34-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2020-04-13:/2020/04/tracking-issues/<p>I&#8217;m currently collaborating on a paper. My collaborator and I are writing the paper using LaTeX and we&#8217;re using git to track and share changes to the manuscript. We currently have a shared repository on <a href="https://www.github.com">GitHub</a>.</p> <p>GitHub has a lot of great features for collaborating on software &#8212; after …</p><p>I&#8217;m currently collaborating on a paper. My collaborator and I are writing the paper using LaTeX and we&#8217;re using git to track and share changes to the manuscript. We currently have a shared repository on <a href="https://www.github.com">GitHub</a>.</p> <p>GitHub has a lot of great features for collaborating on software &#8212; after all that&#8217;s why it was developed. The &#8220;Issues&#8221; features in a repository is a particularly useful feature. This allows you to discuss problems, and track the resolution of those problems. Text formatting is supported in GitHub Issues using Markdown. In many flavors of markdown, you can also embed math using LaTeX syntax. Unfortunately, GitHub flavored markdown <a href="https://github.com/github/markup/issues/274">does not support math</a>. This is probably fine for the vast majority of software projects. However, it is a problem when we&#8217;re trying to discuss a mathematical&nbsp;model.</p> <p>Several people on the internet have suggested various solutions to this shortcoming. Some have suggested using an external engine to render your math as an image, then embed that image in markdown. This works, but I think it&#8217;s&nbsp;cumbersome.</p> <p>Several others have suggested using a <a href="https://jupyter.org">Jupyter Notebook</a>, which GitHub does actually render. I think that this is a better solution, and this is the solution that I&#8217;m planning on using with my&nbsp;collaborator.</p> <h1>Implementation&nbsp;Summary</h1> <p>In our git repository, I&#8217;m creating a folder called <code>issues-open</code>. Inside this folder is a set of Jupyter Notebooks, one per issue. Each collaborator can review these Notebooks, which conveniently get rendered on the GitHub web interface. When a collaborator has something to add to the issue, they can fire up their Jupyter instance and make some changes &#8212; either by adding new cells to the bottom of the notebook, or making changes to the existing text &#8212; and committing and pushing the changes. We&#8217;ve adopted the practice of starting each cell with a heading with the name of the author of that cell. This way, the Notebook looks a bit like a&nbsp;conversation.</p> <h1>Launching Jupyter&nbsp;Notebooks</h1> <p>We&#8217;re using a <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html">conda environment</a> for Python so that we&#8217;re synced up on the versions of each package we&#8217;re using. So, the first step will be creating the conda environment from the environment <span class="caps">YAML</span> file. In our case, this would look like&nbsp;this:</p> <div class="highlight"><pre><span></span>conda env create -f environment.yml </pre></div> <p>This only needs to be done once on each computer. Once that&#8217;s been done, you just need to activate the environment. This is basically just telling your terminal that you want to use that version of Python. This can be accomplished like the following (obviously, replace the name of the environment with the correct&nbsp;name):</p> <div class="highlight"><pre><span></span>conda activate my-environment </pre></div> <p>Now, you can launch the Jupyter Notebook session using the following. Your web browser should pop up and allow you to create new notebooks and edit existing notebooks in the browser once you run this&nbsp;command.</p> <div class="highlight"><pre><span></span>jupyter notebook </pre></div> <h1>Collaborating on&nbsp;Issues</h1> <p>The Jupyter Notebook interface is relatively straight forward and doesn&#8217;t need much discussion here. Most of the important features are available through the menus. There are keyboard shortcuts that come in handy, which can be found <a href="https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330">here</a>.</p> <p>Jupyter notebooks comprise a set of cells. The basic types of cells are markdown, code and raw. We&#8217;ll ignore raw cells here. Markdown cells contain text styled using <a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html">markdown syntax</a>. Code cells contain executable code. In our case, this will all be Python&nbsp;code.</p> <p>If there is any code in the notebook, it&#8217;s important to realize that it runs interactively. You execute one code cell at a time. You don&#8217;t have to execute them in order either. So, if the code has side effects &#8212; like changing a global variable &#8212; the order that you run the cells in makes a difference. I think it&#8217;s good practice to restart your Python interpreter and re-run all the cells before committing a notebook in git. To do this, just click <code>Kernel</code> / <code>Restart &amp; Run All</code>. This guarantees that the cells were run in order and have repeatable&nbsp;output.</p> <p>The other advantage to restarting the kernel and re-running all the cells before committing is to avoid extraneous changes being tracked by git. The notebook files include a counter indicating the order in which the cells were executed. The first cell to be executed will have a counter value of 1, the second will have a value of 2, etcetera. If you execute the first five cells, then execute the first one again, it will now have a counter value of 6. If you&#8217;ve been playing around with a notebook for a while, all those counters will be incremented even higher. Even if you make no real changes to the notebook, git will register these counter changes as changes that need to be committed and tracked. You really only want the real changes to be tracked, and the easiest way to do this is to ensure that the code cells are executed in order starting from an execution count of&nbsp;one.</p> <h1>Closing an&nbsp;Issue</h1> <p>When it&#8217;s time to close an issue, whomever closes the issue simply moves the Jupyter Notebook discussing the issue to a folder called <code>issues-closed</code>. This should be a <code>git-mv</code> so that the history is&nbsp;maintained.</p> <p>As an example, to close the issue discussed in the Notebook <code>reorder-model-development.ipynb</code>, the command would&nbsp;be:</p> <div class="highlight"><pre><span></span>git mv issues-open/reorder-model-development.ipynb issues-closed/ </pre></div>Package Adequacy for Engineering Calculations2019-06-29T17:11:00-04:002019-06-29T17:11:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2019-06-29:/2019/06/adequacy/<p>If you do engineering calculations or analysis using a language like <a href="https://www.r-project.org/">R</a> or <a href="https://www.python.org/">Python</a>, chances are that you&#8217;re going to use some packages. Packages are collections of code that someone else has written that you can use in your code. For example, if you need to solve a system …</p><p>If you do engineering calculations or analysis using a language like <a href="https://www.r-project.org/">R</a> or <a href="https://www.python.org/">Python</a>, chances are that you&#8217;re going to use some packages. Packages are collections of code that someone else has written that you can use in your code. For example, if you need to solve a system of linear equations by inverting a matrix and you&#8217;re using Python, you might use <a href="http://www.numpy.org/">numpy</a>. Or if you&#8217;re using R and you need to fit a linear model to some data, you would probably use the <a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html">stats</a>&nbsp;package.</p> <p>If you&#8217;re involved in &#8220;engineering,&#8221; you need a high level of confidence that the results that you&#8217;re getting are correct. Note that in this post &#8212; and my blog in general &#8212; that when I say &#8220;engineering,&#8221; I don&#8217;t mean <em>software engineering:</em> I mean design and analysis of of structures or systems that have an effect on safety. I work in civil aeronautics, mainly dealing with composites, but also dealing with metallic structure regularly. Depending on the particular type of engineering that you&#8217;re engaged in and the particular problem at hand, the consequences of getting the wrong answer could be fairly severe. You better be sure that both the interpreter and the packages are correct. Probably the best way to do this is to validate the results using another method: are there published results for a similar problem that you can use as a benchmark? Perhaps you can do some physical testing? But even if you&#8217;re doing you due diligence and validating the results somehow, you will still waste a lot of your time if there were a problem with either the interpreter or one of the&nbsp;packages.</p> <p>Compiled languages &#8212; like <code>C</code> or <code>FORTRAN</code> &#8212;are compiled into machine code that runs directly on the processor. Interpreted languages, like Python, R or JavaScript, are not compiled into machine code, but instead an interpreter (a piece of software) reads each line of code and figures out how to run it when you run the code (not ahead of time). As far as interpreters go, if you&#8217;re using <code>CPython</code> (the &#8220;standard&#8221; Python interpreter) or <code>GNU-R</code> (the &#8220;standard&#8221; R interpreter), I think there is a rather low risk that there are any errors in the interpreter. These interpreters are written by a bunch of smart people, and both are open source, so the code that makes up the interpreters themselves are read by a much larger group of smart people. Furthermore, both interpreters are widely used and have been around for a while, so it&#8217;s very likely that significant bugs that are likely to change the result of an engineering calculation would have been found by users by now and would have been&nbsp;fixed.</p> <p>Packages are more of a risk than interpreters are. Again, if you&#8217;re using a very widely used package that has been around for a while, like <code>numpy</code> (in Python) or <code>stats</code> (in R), there&#8217;s a pretty good chance that any bugs that would affect your calculations would have been found by now &#8212; and packages like these are maintained by groups of dedicated&nbsp;people.</p> <p>If you&#8217;re using R, chances are that you&#8217;re getting your packages from <a href="https://cran.r-project.org/"><span class="caps">CRAN</span></a>. You should be reading the <span class="caps">CRAN</span> page for the package that you&#8217;re using. You can find an example of such a page <a href="https://cran.r-project.org/package=MASS">here</a>. There are a few things that you should look for to help you evaluate the reliability of the package (in addition to reference manual and any vignettes that explain how to use the package). The first is the priority of the package. Not all packages have a priority, but if the priority is &#8220;base&#8221; or &#8220;recommended,&#8221; the package is maintained by the r-core team and is almost certainly used by a lot of people. You can be fairly comfortable with these&nbsp;packages.</p> <p>The second thing that you should look at on the <span class="caps">CRAN</span> page for a package is the <span class="caps">CRAN</span> Checks. <span class="caps">CRAN</span> will test all the packages every time a new version of R is released and it tests all the packages routinely to determine if a change in one package caused errors in another packages. You can see an example <span class="caps">CRAN</span> Check for my package <code>rde</code> <a href="https://cran.r-project.org/web/checks/check_results_rde.html">here</a>.</p> <p>This practice is called <a href="https://en.m.wikipedia.org/wiki/Continuous_integration">continuous integration</a>. It does all of these checks on several different operating systems &#8212; Windows, <span class="caps">OSX</span>, and several Linux distributions. If you open the <span class="caps">CRAN</span> Checks results for a package, you&#8217;ll see a table of all the various combinations of R version and operating system that have been tested along with the amount of time that it took to run the test and a status for each. If the Status is &#8220;<span class="caps">OK</span>,&#8221; then there were no errors identified. If the Status is &#8220;<span class="caps">NOTE</span>,&#8221; &#8220;<span class="caps">WARNING</span>,&#8221; or &#8220;<span class="caps">ERROR</span>.&#8221; There might be something wrong and it may or may not be serious. If you click on the Status link, you&#8217;ll see details and can evaluate for&nbsp;yourself.</p> <p>I think that these <span class="caps">CRAN</span> checks are actually a very strong point for the R ecosystem. It ensures that package maintainers know when something outside of their package breaks their code. And, it enforces a certain level of quality: package maintainers are given a certain amount of time to fix errors, and if they don&#8217;t the package gets removed from <span class="caps">CRAN</span>.</p> <p>The <span class="caps">CRAN</span> checks do a few things. First, they check that the package can, in fact, be loaded (maybe there&#8217;s an error that prevents you from using it at all). There are a few other things that it does, but the most important in terms of reliability of the package is that the <span class="caps">CRAN</span> checks will run any test created by the package maintainer. These tests are called <a href="https://en.m.wikipedia.org/wiki/Unit_testing">unit tests</a>. They are test that determine if the code in the package actually has the expected behavior. Package maintainers don&#8217;t have to write unit tests, but the good ones do. You can look at what tests the package maintainer has written by downloading the code of the package (you can download it from <span class="caps">CRAN</span>). The test are in a folder called <code>tests</code>. Tests basically work by providing some input to the package&#8217;s functions, and checking that the result is correct. For R packages, the <code>testthat</code> framework is a popular testing framework. For packages that use the <code>testthat</code> framework, you&#8217;ll see a number of statements that use the <code>expect_...</code> family of functions. Some of these tests will likely ensure that the package works at all &#8212; checking things like the return type for functions, or that a function actually does raise certain errors when invalid arguments are passed to it. Some of the tests should also ensure that the package provides correct results. When I write tests for a package, I always write both types of tests. For the tests that ensure that the results are correct, I often either check cases that have closed-form solutions, or check that the code in the package produces results that are approximately equal to example results published in articles or books. You&#8217;ll need to read through the tests to decide if they provide enough assurance that the package is&nbsp;correct.</p> <p>If you decide that the tests for a package are not sufficient, you have three&nbsp;options.</p> <ul> <li>You could choose not the use that package: maybe there is another that does something&nbsp;similar.</li> <li>You can write tests yourself and contribute those tests back to the package maintainer. After all, R packages are open-source and users are encouraged to contribute back to the community. Most package maintainers would be happy to receive a patch that adds more tests: writing tests is not fun, and most people would be grateful if someone else offers to do&nbsp;it.</li> <li>You could also manually test the package. The difficulty here is ensuring that you re-test the package every time you update the version of this package on your&nbsp;system.</li> </ul> <p>In the python world, continuous integration isn&#8217;t as well integrated into the ecosystem. Most packages that you install probably come from PyPI. As far as I know, PyPI doesn&#8217;t do any continuous integration: it&#8217;s up to the package maintainer to run their tests regularly. Package maintainers can do one of two things: they can run the tests on their own machine before releasing a new version to PyPI, or they can use a continuous integration service like Travis-<span class="caps">CI</span> or CircleCI. Many of the continuous integration services provide the service for free for open source projects, so many Python packages do use a continuous integration services. Packages that use a continuous integration service normally advertise it in their <span class="caps">README</span> file. You&#8217;ll still need to assess whether the tests are adequate, and if the package doesn&#8217;t use continuous integration, you&#8217;ll have to either run the test yourself, or trust that the package maintainer&nbsp;did.</p> <p>If you have already written tests for your package, setting up continuous integration using Travis-<span class="caps">CI</span> is quite straight forward. I haven&#8217;t personally used CircleCI, but I would imagine that it&#8217;s similarly easy to use. You can see the continuous integration results from my pcakge <code>rde</code> on Travis-<span class="caps">CI</span> <a href="https://travis-ci.com/kloppen/rde">here</a>.</p> <p>Whether you&#8217;re using Python or R, there are ways of ensuring that the packages you use for engineering calculations are adequate for your needs. Some people seem to be a little bit scared of open source packages and software for engineering calculations, but in a lot of ways, open source software is actually better for this since you have the ability of verifying it yourself and making a decision about whether to use&nbsp;it.</p>Pandoc Report Templates2019-06-21T01:55:33-04:002019-10-29T16:09:56-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2019-06-21:/2019/06/pandoc-report-templates/<p>The main benefit of using Notebooks (R Notebooks or Jupyter Notebooks) is that the document is reproducible: the reader knows exactly how the results of the analysis were obtained. I wrote about the use of Notebooks in an <a href="https://www.kloppenborg.ca/2019/02/reproducibility/">earlier&nbsp;post.</a></p> <p>Most organizations have a certain report format: a certain cover …</p><p>The main benefit of using Notebooks (R Notebooks or Jupyter Notebooks) is that the document is reproducible: the reader knows exactly how the results of the analysis were obtained. I wrote about the use of Notebooks in an <a href="https://www.kloppenborg.ca/2019/02/reproducibility/">earlier&nbsp;post.</a></p> <p>Most organizations have a certain report format: a certain cover sheet layout, a certain font, a log of revisions, etcetera. For the most part, organizations have an <span class="caps">MS</span> Word template for this report format. If you want to use a Notebook for you analysis and to write your report, you have a few&nbsp;options:</p> <ul> <li>You could write front matter in <span class="caps">MS</span> Word using your company&#8217;s report template and then attach the Notebook as an&nbsp;appendix.</li> <li>You could also use <code>Pandoc</code> (more about what this is later) to convert the Notebook into a .docx file and then merge it into the report&nbsp;template.</li> <li>You could create your own <code>Pandoc</code> template to convert a Notebook directly into a <span class="caps">PDF</span> with the correct&nbsp;formatting.</li> </ul> <p>The first option of attaching a Notebook as an appendix to a report otherwise created in <span class="caps">MS</span> Word is effective but is means that you need to maintain two different files: the <span class="caps">MS</span> Word report and the Notebook itself. The second option of exporting the Notebook to <span class="caps">MS</span> Word and merging it into the template is problematic when it comes to document revisions. If the part of the analysis is revised, there is a temptation to change the affected part by either only re-exporting that section from the Notebook into docx, or worse, making the change directly in <span class="caps">MS</span> Word. In both cases, there is the possibility of breaking the reproducibility. For example, let&#8217;s say that in your report you define some constants at the beginning and do some math using these&nbsp;constants:</p> <div class="highlight"><pre><span></span><span class="n">P</span> <span class="o">=</span> <span class="mi">1000</span> <span class="n">A1</span> <span class="o">=</span> <span class="mi">2</span> <span class="n">A2</span> <span class="o">=</span> <span class="mi">4</span> <span class="n">sigma1</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A1</span> <span class="k">print</span><span class="p">(</span><span class="n">sigma1</span><span class="p">)</span> <span class="c1"># 500</span> <span class="n">sigma2</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A2</span> <span class="k">print</span><span class="p">(</span><span class="n">sigma2</span><span class="p">)</span> <span class="c1"># 250</span> </pre></div> <p>Now let&#8217;s say that you ask your new intern to revise the document so that <span class="math">\(P = 1200\)</span>. They just edit the <span class="caps">MS</span> Word version of the report thinking that they will save some time. They don&#8217;t notice that <span class="math">\(P\)</span> is used twice in the calculation and only update the result from the first time it&#8217;s used. Now the report&nbsp;reads:</p> <div class="highlight"><pre><span></span><span class="n">P</span> <span class="o">=</span> <span class="mi">1200</span> <span class="n">A1</span> <span class="o">=</span> <span class="mi">2</span> <span class="n">A2</span> <span class="o">=</span> <span class="mi">4</span> <span class="n">sigma1</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A1</span> <span class="k">print</span><span class="p">(</span><span class="n">sigma1</span><span class="p">)</span> <span class="c1"># 600</span> <span class="n">sigma2</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A2</span> <span class="k">print</span><span class="p">(</span><span class="n">sigma2</span><span class="p">)</span> <span class="c1"># 250</span> </pre></div> <p>The report is now wrong. In a simple case like this, you&#8217;ll probably notice the error when you review your intern&#8217;s work, but if the math was significantly more complex, there is probably a fairly good chance that you wouldn&#8217;t pick up on the newly introduced&nbsp;error.</p> <p>For this reason, I think that the best option is to create a <code>Pandoc</code> template for your company&#8217;s report template. This means that you&#8217;ll be creating a <span class="caps">PDF</span> directly from the Notebook. In order to revise the report, you have to re-run the Notebook &#8212; the whole&nbsp;Notebook.</p> <p>For those unfamiliar with <a href="https://pandoc.org/"><code>Pandoc</code></a>, it is a program for converting between various file formats. It&#8217;s also free and open-source software. Commonly, it&#8217;s used for converting from Markdown into <span class="caps">HTML</span> or <span class="caps">PDF</span> (actually, <code>Pandoc</code> converts to a <a href="https://www.latex-project.org/">LaTeX</a> format and LaTeX converts to <span class="caps">PDF</span>, but this happens transparently). <code>Pandoc</code> can also convert into <span class="caps">MS</span> Word (.docx) and several other&nbsp;formats.</p> <p>When I decided to create a corporate format for use with notebooks, I looked at the types of notebooks that we use. Generally, statistics are done in an <a href="https://bookdown.org/yihui/rmarkdown/notebook.html">R-Notebook</a> and other analysis is done in a <a href="https://jupyter.org/">Jupyter notebook</a>. Unfortunately, R-Notebooks and Jupyter Notebooks use different templates. R-Notebooks use <code>pandoc</code> templates, while Jupyter uses its own template. Fortunately, there is a workaround. Jupyter is able to export to markdown, which can be read by <code>pandoc</code> and translated to <span class="caps">PDF</span> using a pandoc template. Thus, I made the decision to write a <code>pandoc</code> template.</p> <p>When <code>pandoc</code> converts a markdown file to <span class="caps">PDF</span>, it actually uses LaTeX. The <code>pandoc</code> template is actually a template for converting markdown into LaTeX. <code>Pandoc</code> then calls <code>pdflatex</code> to turn this <code>.tex</code> file into a <span class="caps">PDF</span>. </p> <p>When I first started figuring out how to write a template for converting markdown to <span class="caps">PDF</span>, I thought I was going to have to write a LaTeX class or style. I got scared. LaTeX classes are not for the faint of heart. But, I soon realized that I didn&#8217;t actually have to do that. The <code>pandoc</code> template that I needed to write was just a regular LaTeX document that has some parameters that <code>pandoc</code> can fill in. I&#8217;m not sure that I could figure out how to write a LaTeX class in a reasonable amount of time, but I sure can write a document using LaTeX. This is something that I learned to do when I wrote my undergraduate thesis, and while I don&#8217;t write LaTeX often anymore, it&#8217;s really not that&nbsp;hard.</p> <p>A very basic LaTeX file would look something like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">\documentclass</span><span class="nb">{</span>article<span class="nb">}</span> <span class="k">\begin</span><span class="nb">{</span>document<span class="nb">}</span> <span class="k">\title</span><span class="nb">{</span>My Report Title<span class="nb">}</span> <span class="k">\author</span><span class="nb">{</span>A. Student<span class="nb">}</span> <span class="k">\maketitle</span> <span class="k">\section</span><span class="nb">{</span>Introduction<span class="nb">}</span> Some text <span class="k">\end</span><span class="nb">{</span>document<span class="nb">}</span> </pre></div> <p>A <code>pandoc</code> template is just a LaTeX file, but with placeholder for the content that <code>pandoc</code> will insert. These placeholders are just variables surrounded with dollar signs. For example, <code>pandoc</code> has a variable called <code>body</code>. This variable will contain the body of the report. We would simply put <code>$body$</code> in the part of the template where we want <code>pandoc</code> to insert the body of the&nbsp;report.</p> <p><code>Pandoc</code> also supports <code>for</code> and <code>if</code> statements. A common pattern is to check for the existence of a variable and use it if it does exist and use a default value if it does not. The syntax for this would look something&nbsp;like:</p> <div class="highlight"><pre><span></span>$if(myvar)$ $myvar$ $else$ Default text $endif$ </pre></div> <p>I&#8217;ve written the above code on multiple lines for readability, but it could be written on a single line&nbsp;too.</p> <p>Similarly, if a variable is a list, you&#8217;d use a <code>for</code> statement to iterate over the list. We&#8217;ll cover this later when we talk about adding logs of&nbsp;revisions.</p> <h1>Defining New Template&nbsp;Variables</h1> <p><code>Pandoc</code> defines a number of variables by default. However, you&#8217;ll likely need to define some variables of your own. First of all, you&#8217;ll likely need to define a variable for the report number and the&nbsp;revision.</p> <p>To create the variable, it&#8217;s just a matter of defining it in the <a href="http://yaml.org/"><code>YAML</code></a> header of the markdown file. Variables can either have a single value or they can be lists. Elements of a list start with dash at the beginning of the&nbsp;line.</p> <p>Once we add the report number (which we&#8217;ll call <code>report-no</code>) and the revision (which we&#8217;ll call <code>rev</code>) to the <code>YAML</code> header, the <span class="caps">YAML</span> header will look like the&nbsp;following:</p> <div class="highlight"><pre><span></span><span class="nt">title</span><span class="p">:</span> <span class="s">&quot;Report</span><span class="nv"> </span><span class="s">Title&quot;</span> <span class="nt">author</span><span class="p">:</span> <span class="s">&quot;A.</span><span class="nv"> </span><span class="s">Student&quot;</span> <span class="nt">report-no</span><span class="p">:</span> <span class="s">&quot;RPT-001&quot;</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">B</span> </pre></div> <p>(Bonus points if you immediately though of William Sealy Gosset when you read&nbsp;that).</p> <p>We&#8217;ll probably want to add a log of revisions to the report. The contents of this log of revisions will have to come from somewhere, and the <code>YAML</code> header is the most logical place. The log of revisions will be a list with one element of the list corresponding to each revision in the log. Lists can have nested members. In our case, an entry within the log of revisions will have a revision letter, a date and a description. Including the log of revisions, the <code>YAML</code> header will look like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="nt">title</span><span class="p">:</span> <span class="s">&quot;Report</span><span class="nv"> </span><span class="s">Title&quot;</span> <span class="nt">author</span><span class="p">:</span> <span class="s">&quot;A.</span><span class="nv"> </span><span class="s">Student&quot;</span> <span class="nt">report-no</span><span class="p">:</span> <span class="s">&quot;RPT-001&quot;</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">B</span> <span class="nt">rev-log</span><span class="p">:</span> <span class="p p-Indicator">-</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">A</span> <span class="nt">date</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">1-Jun-2019</span> <span class="nt">desc</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">Initial release</span> <span class="p p-Indicator">-</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">B</span> <span class="nt">date</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">18-Jun-2019</span> <span class="nt">desc</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">Updated loads based on fligt test data</span> </pre></div> <p>We can now use these variables in our <code>pandoc</code> template. Using the variables <code>report-no</code> and <code>rev</code> are straight forward and will be just the same as using the default variables (like <code>title</code> and <code>author</code>).</p> <p>Using the list variables will require the use of a <code>for</code> statement. In the case of a log of revisions, each revision will get a row in a LaTeX table. Using the variable <code>rev-log</code>, this table will look like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">\begin</span><span class="nb">{</span>tabular<span class="nb">}{</span>| m<span class="nb">{</span>0.25in<span class="nb">}</span> | m<span class="nb">{</span>0.95in<span class="nb">}</span> | m<span class="nb">{</span>4.0in<span class="nb">}</span> |<span class="nb">}</span> <span class="k">\hline</span> Rev Ltr <span class="nb">&amp;</span> Date <span class="nb">&amp;</span> Description <span class="k">\\</span> <span class="s">$</span><span class="nb">for</span><span class="o">(</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log</span><span class="o">)</span><span class="s">$</span> <span class="k">\hline</span> <span class="s">$</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log.rev</span><span class="s">$</span> <span class="nb">&amp;</span> <span class="s">$</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log.date</span><span class="s">$</span> <span class="nb">&amp;</span> <span class="s">$</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log.desc</span><span class="s">$</span> <span class="k">\\</span> <span class="s">$</span><span class="nb">endfor</span><span class="s">$</span> <span class="k">\hline</span> <span class="k">\end</span><span class="nb">{</span>tabular<span class="nb">}</span> </pre></div> <p>In the above LaTeX code, everything between <code>$for(...)$</code> and <code>$endfor$</code> gets repeated for each item in the list <code>rev-log</code>. We can access the nested members using dot&nbsp;notation.</p> <h1>Using the Pandoc Template from an&nbsp;R-Notebook</h1> <p>RStudio handles a lot of the interface with <code>pandoc</code>. Adding the following to the <code>YAML</code> header of the R-Notebook should cause RStudio to use your new template when it compiles the R-Notebook to <span class="caps">PDF</span>. This <em>should</em> be all you need to&nbsp;do.</p> <div class="highlight"><pre><span></span><span class="nt">output</span><span class="p">:</span> <span class="nt">pdf_document</span><span class="p">:</span> <span class="nt">template</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">my_template_file.tex</span> <span class="nt">toc_depth</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">3</span> <span class="nt">fig_caption</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">true</span> <span class="nt">keep_tex</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">false</span> <span class="nt">df_print</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">kable</span> </pre></div> <h1>Using the Pandoc Template from a Jupyter&nbsp;Notebook</h1> <p>Using your new <code>pandoc</code> template from a Jupyter Notebook is a bit more complicated because Jupyter doesn&#8217;t work directly with <code>pandoc</code>. First of all, we need to tell <code>nbconvert</code> to convert to markdown. I think that it&#8217;s best to re-run the notebook at the same time (to make sure that it is, in fact, fully reproducible. You can do this using <code>nbconvert</code> as&nbsp;follows:</p> <div class="highlight"><pre><span></span>jupyter nbconvert --execute --to markdown my-notebook.ipynb </pre></div> <p>But, Jupyter notebooks don&#8217;t have <code>YAML</code> headers like R-Notebooks do, so we need a place to put all the variables that the template needs. The easiest way to do this is to create a cell at the beginning of the notebook with the cell type set as <code>raw</code>, then enter the <code>YAML</code> header into this cell, including the starting end ending fences (<code>---</code>). This cell would, then, have a content similar to the following. Cells of type <code>raw</code> simply get copied to the output, so this becomes the <code>YAML</code> header in the resulting markdown&nbsp;file.</p> <div class="highlight"><pre><span></span><span class="nn">---</span> <span class="nt">title</span><span class="p">:</span> <span class="s">&quot;Report</span><span class="nv"> </span><span class="s">Title&quot;</span> <span class="nt">author</span><span class="p">:</span> <span class="s">&quot;A.</span><span class="nv"> </span><span class="s">Student&quot;</span> <span class="nt">report-no</span><span class="p">:</span> <span class="s">&quot;RPT-001&quot;</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">B</span> <span class="nt">rev-log</span><span class="p">:</span> <span class="p p-Indicator">-</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">A</span> <span class="nt">date</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">1-Jun-2019</span> <span class="nt">desc</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">Initial release</span> <span class="p p-Indicator">-</span> <span class="nt">rev</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">B</span> <span class="nt">date</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">18-Jun-2019</span> <span class="nt">desc</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">Updated loads based on flight test data</span> <span class="nn">---</span> </pre></div> <p>Once you&#8217;ve used <code>nbconvert</code> to create the markdown file, you can call <code>pandoc</code>. You&#8217;ll have to provide the template as a command-line argument and also specify the output filename (so that <code>pandoc</code> knows you want a pdf) and also give the code highlighting style. The call to <code>pandoc</code> will look something like&nbsp;this.</p> <div class="highlight"><pre><span></span><span class="sb">`</span>pandoc<span class="sb">`</span> my-notebook.md -N --template<span class="o">=</span>my_template_file.tex -o my-notebook.pdf --highlight-style<span class="o">=</span>tango </pre></div> <h1>Documentation of Your&nbsp;Template</h1> <p>A &#8220;trick&#8221; that I&#8217;ve used is to add some documentation about how to use the template inside the template itself. It&#8217;s pretty unlikely that the user will actually open up the template, but it&#8217;s relatively likely that the user will forget one of the variables that the template expects. Since <code>pandoc</code> allows <code>if/else</code> statements, I&#8217;ve added the following to my&nbsp;template:</p> <div class="highlight"><pre><span></span><span class="s">$</span><span class="nb">if</span><span class="o">(</span><span class="nb">abstract</span><span class="o">)</span><span class="s">$</span> <span class="k">\abstract</span><span class="nb">{</span><span class="s">$</span><span class="nb">abstract</span><span class="s">$</span><span class="nb">}</span> <span class="s">$</span><span class="nb">else</span><span class="s">$</span> <span class="k">\abstract</span><span class="nb">{</span> The documentation for using the template goes here <span class="nb">}</span> <span class="s">$</span><span class="nb">endif</span><span class="s">$</span> </pre></div> <p>This means that if the user forgets to define the <code>abstract</code> variable, the cover page of the report (where the abstract normally goes in my case) will contain the documentation for the&nbsp;template.</p> <h1>Change Bars: Future&nbsp;Work</h1> <p>One of the things that I haven&#8217;t yet figured out are change bars. In my organization, we put vertical bars in the margin of reports to indicate what part of a report has been revised. There are LaTeX packages for (manually) inserting <a href="https://www.ctan.org/pkg/changebar">change bars into documents</a>. However, I haven&#8217;t yet figured out how to automatically insert these into a report generated using <code>pandoc</code>. I&#8217;m sure there&#8217;s a way,&nbsp;though.</p> <h1>Conclusion</h1> <p>I hope that this demystifies the process of writing a <code>pandoc</code> template to allow you to create reports directly from Jupyter Notebooks or R-Notebooks in your company&#8217;s report&nbsp;format.</p> <p><em>(Edited to fix a few&nbsp;typos)</em></p> <script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) { var align = "center", indent = "0em", linebreak = "false"; if (false) { align = (screen.width < 768) ? "left" : align; indent = (screen.width < 768) ? "0em" : indent; linebreak = (screen.width < 768) ? 'true' : linebreak; } var mathjaxscript = document.createElement('script'); mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#'; mathjaxscript.type = 'text/javascript'; mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML'; mathjaxscript[(window.opera ? "innerHTML" : "text")] = "MathJax.Hub.Config({" + " config: ['MMLorHTML.js']," + " TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," + " jax: ['input/TeX','input/MathML','output/HTML-CSS']," + " extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," + " displayAlign: '"+ align +"'," + " displayIndent: '"+ indent +"'," + " showMathMenu: true," + " messageStyle: 'normal'," + " tex2jax: { " + " inlineMath: [ ['\\\\(','\\\\)'] ], " + " displayMath: [ ['$$','$$'] ]," + " processEscapes: true," + " preview: 'TeX'," + " }, " + " 'HTML-CSS': { " + " styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," + " linebreaks: { automatic: "+ linebreak +", width: '90% container' }," + " }, " + "}); " + "if ('default' !== 'default') {" + "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" + "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" + "VARIANT['normal'].fonts.unshift('MathJax_default');" + "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" + "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" + "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" + "});" + "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" + "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" + "VARIANT['normal'].fonts.unshift('MathJax_default');" + "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" + "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" + "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" + "});" + "}"; (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript); } </script>Automating Software Validation Reports2019-06-07T04:39:42-04:002019-06-21T01:55:33-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2019-06-07:/2019/06/automating-software-validation-reports/<p>I&#8217;ve been working on a Python package to analyze adhesively bonded joints recently. This package will be used to analyze adhesive joints in certain aircraft structure and will be used to substantiate the design of structural repairs, amongst other uses. Because of this, the output of this package needs …</p><p>I&#8217;ve been working on a Python package to analyze adhesively bonded joints recently. This package will be used to analyze adhesive joints in certain aircraft structure and will be used to substantiate the design of structural repairs, amongst other uses. Because of this, the output of this package needs to be validated against test data. This validation also needs to be documented in an engineering&nbsp;report.</p> <p>I&#8217;ve been thinking about how to do this. On one hand, I&#8217;ve been thinking about the types of (mechanical) tests that we&#8217;ll need to run to validate the model and the various test configurations that we&#8217;ll need to include in the validation test matrix. On the other hand, I&#8217;ve also been thinking about change change management of the package and ensuring that validation report stays up to&nbsp;date.</p> <p>I&#8217;m imagining the scenario where we run the validation testing and find that the model and the test results agree within, say, 10%. Maybe that&#8217;s good enough for the purpose (depending on the direction of the disagreement). We can then write our validation report and type out the sentence <em>&#8220;the test data and the model were found to agree within 10%.&#8221;</em> Then, I&#8217;m imagining that we make a refinement to the model formulation and release a new version of the package that now agrees with the test data within 5%. Now, we have a validation report for the old version of the package, but no report describing the validation of the new version. We&#8217;d need to go back through the validation report, re-run the model for all the validation cases and update the&nbsp;report.</p> <p>When we update the validation report manually, there&#8217;s probably a pretty good chance that some aspect of the update gets missed. Maybe it&#8217;s as simple as a one of the model outputs doesn&#8217;t get updated in the revised validation report. It&#8217;s also potentially rather time consuming to update this report. It would be faster to make this validation report a Jupyter Notebook (which I&#8217;ve <a href="https://www.kloppenborg.ca/2019/02/reproducibility/">previously talked about</a>). I haven&#8217;t yet written about it here, but it is possible to have a Jupyter Notebook render to a <span class="caps">PDF</span> using a corporate report format, so it&#8217;s even possible to make this validation report look like it should <em>(Edit: I&#8217;ve now written about this <a href="https://www.kloppenborg.ca/2019/06/pandoc-report-templates/">here</a>)</em>. We could also set up a test in the package to re-run the Jupyter Notebook, and perhaps integrate it into a continuous integration system so that the Notebook gets re-run every time a change is made to the package. This would mean that the validation report is always up to&nbsp;date.</p> <p>When you write a Jupyter Notebook, it usually has some code that produces a result &#8212; either a numeric result, or a graph &#8212; and then you have some text that you&#8217;ve written which explains the result. The problem is that this text that you&#8217;ve written doesn&#8217;t doesn&#8217;t respond to changes in the result. Sure, there are ways of automatically updating individual numbers inside the text that you&#8217;ve written, but sometimes the way that the result of the code changes warrants a change in the sentiment of the text. Maybe the text needs to change from <em>&#8220;the model shows poor agreement with experimental results and shouldn&#8217;t be used in this case&#8221;</em> to <em>&#8220;the model shows excellent agreement with experimental results and has been validated.&#8221;</em> There&#8217;s no practical way that this type of update to the text could be automated. But if the update to the result of the code in the Notebook has been automated, there&#8217;s a good chance that the text and the results from the code will end up disagreeing &#8212; especially if the report is more than a few&nbsp;pages.</p> <h1>The&nbsp;Solution</h1> <p>So, what can be done to rectify this? We want to have the ease of having the results of the code automatically update, but we want to make sure that those results and the text of the report match. One approach to this problem &#8212; and the approach that I intend to use for the adhesive joint analysis package &#8212; is to add <code>assert</code> statements to the Notebook. This way, if the assertion fails, the Notebook won&#8217;t automatically rebuild and our attention will be drawn to the&nbsp;issue.</p> <p>As an example, if the text says that the model is conservative, meaning that the strain predicted by the model is higher than the strain measured by strain gauges installed on the test articles from the validation testing, we could write the following assert statement in the Jupyter&nbsp;Notebook:</p> <div class="highlight"><pre><span></span><span class="k">assert</span><span class="p">(</span><span class="n">model_strain</span> <span class="o">&gt;</span> <span class="n">experimental_strain</span><span class="p">)</span> </pre></div> <p>Now, if we later make a change to the model that causes it to under-predict strain, we&#8217;ll be alerted to this and prompted to update the validation&nbsp;report.</p> <h1>Implementing the&nbsp;Solution</h1> <p>To run a Jupyter Notebook from code (for example in a test suite), I&#8217;ve use the following code in the past. This code was based on code found on <a href="http://blog.thedataincubator.com/2016/06/testing-jupyter-notebooks/">The Data Incubator&nbsp;Blog</a></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">_notebook_run</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">):</span> <span class="n">kernel_name</span> <span class="o">=</span> <span class="s2">&quot;python{}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">version_info</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="n">file_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span> <span class="n">errors</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">nb</span> <span class="o">=</span> <span class="n">nbformat</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">as_version</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="n">nb</span><span class="o">.</span><span class="n">metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;kernelspec&quot;</span><span class="p">,</span> <span class="p">{})[</span><span class="s2">&quot;name&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">kernel_name</span> <span class="n">ep</span> <span class="o">=</span> <span class="n">ExecutePreprocessor</span><span class="p">(</span><span class="n">kernel_name</span><span class="o">=</span><span class="n">kernel_name</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">3600</span><span class="p">)</span> <span class="k">try</span><span class="p">:</span> <span class="n">ep</span><span class="o">.</span><span class="n">preprocess</span><span class="p">(</span><span class="n">nb</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;metadata&quot;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&quot;path&quot;</span><span class="p">:</span> <span class="n">file_dir</span><span class="p">}})</span> <span class="k">except</span> <span class="n">CellExecutionError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span> <span class="k">if</span> <span class="s2">&quot;SKIP&quot;</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">traceback</span><span class="p">:</span> <span class="n">errors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">traceback</span><span class="p">))</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="n">e</span> <span class="k">return</span> <span class="n">nb</span><span class="p">,</span> <span class="n">errors</span> <span class="n">_notebook_run</span><span class="p">(</span><span class="s2">&quot;file-name-of-my-notebook.ipynb&quot;</span><span class="p">)</span> </pre></div> <p>This code will run the Notebook <code>file-name-of-my-notebook.ipynb</code> and will raise an error if an error is encountered. If this is inside a <code>unittest2</code> or <code>NoseTest</code> test suite, this will cause a test&nbsp;failure.</p> <h1>Conclusion</h1> <p>Validating software used in a way that affects an aircraft design is very important in ensuring the safety of that design. Keeping the validation report up to date can be tedious, but can be automated using Jupyter Notebooks. The conclusions drawn in the validation report need to match the results of the software being validated. One approach to ensuring that this is always true is to add <code>assert</code> statements to the Jupyter Notebook that forms the validation&nbsp;report.</p>Reproducibility of Engineering Calculations2019-02-08T11:21:57-05:002019-06-21T01:55:33-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2019-02-08:/2019/02/reproducibility/<p>Reproducibility in engineering work doesn&#8217;t seem to get the attention that it deserves. I can&#8217;t count the number of times that I&#8217;ve read an old engineering report in search of a particular result, only to find that the calculation that lead to that result is only barely …</p><p>Reproducibility in engineering work doesn&#8217;t seem to get the attention that it deserves. I can&#8217;t count the number of times that I&#8217;ve read an old engineering report in search of a particular result, only to find that the calculation that lead to that result is only barely described, or there is just a screenshot of an Excel workbook with a few input numbers and a final result. When I find things like this, it makes me a little nervous: did the original author use the correct formula when computing this result? What assumptions did the author make and neglect to document? What approximations were made? Was the original review of the report diligent enough to check this particular&nbsp;result?</p> <p>Let&#8217;s take a hypothetical example. For simplicity, let&#8217;s assume that we&#8217;re analyzing some sort of bracket. It&#8217;s 2 inches wide, 0.125 inches thick and 5 inches long. It&#8217;s cantilievered with a load applied 2 inches from the free edge. We care about both the deflection and the maximum stress. The formulae for deflection and stress are given by Roark<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" rel="footnote">1</a></sup>. We&#8217;ll adapt those equations&nbsp;slightly:</p> <div class="math">$$ \delta_a = \frac{-P}{6 E I} (2 L^3 - 3 L^2 a + a^3) $$</div> <div class="math">$$ \sigma = \frac{M_B \frac{t}{2}}{I} = \frac{P (L - a) \frac{t}{2}}{I} $$</div> <p>Given these equations and the data above, we could quite easily do the calculation in an spreadsheet program like <span class="caps">MS</span>-Excel. But, if we want to include our calculation in a report (most likely as a screenshot of the spreadsheet), our report will probably just look like&nbsp;this:</p> <p><img alt="excel1" src="https://www.kloppenborg.ca/2019/02/reproducibility/reproducibility_excel.png"></p> <p>This shows the &#8220;right&#8221; answer, but if you&#8217;re reviewing the report, how do you know that the answer is right? If you&#8217;re reviewing the report before it&#8217;s released, you can probably get a copy of the Excel file and check the formulae in the cells. You&#8217;ll spend a few minutes deciphering the formula to figure out if it&#8217;s correct. But, if you&#8217;re reading the report later, especially if you&#8217;re outside the company that wrote it, good luck. You&#8217;re going to have to get out a pen, paper and your calculator to repeat the calculation and figure out if it&#8217;s right. This problem is even worse if the author of the report hard coded in a few of the input values (i.e. length, width, elastic modulus, etc.) into the&nbsp;formulae.</p> <p>There are a few ways to address this problem of reproducibility. We&#8217;ll explore two of these ways. The first is to use software like <a href="https://www.ptc.com/en/products/mathcad">MathCAD</a>, or it&#8217;s free alternative <a href="https://en.smath.info/view/SMathStudio/summary">SMath-Studio</a>. Both of these products are <a href="https://en.wikipedia.org/wiki/WYSIWYG"><span class="caps">WYSIWYG</span></a> math editors that are unit aware. With either of these, your could do your calculations in the MathCAD or SMath-Studio and paste a screenshot of this into your&nbsp;report. </p> <p><img alt="smath" src="https://www.kloppenborg.ca/2019/02/reproducibility/reproducibility_smath.png"></p> <p>Now, the input data and the formula would be shown directly in the report. The added benefit is that, since these pieces of software are unit aware, you can&#8217;t make simple unit errors &#8212;- if you forget an exponent, the units shown in the result won&#8217;t be what you expect, so you know that you&#8217;ve made a&nbsp;mistake.</p> <p>The other way to approach this problem is to use something called a notebook. If you&#8217;re comfortable enough to write simple code in <a href="https://www.python.org/">Python</a>, you could use a <a href="http://jupyter.org/">jupyter notebook</a>. If you&#8217;re doing some data analysis or statistics, you might prefer to write some code in <a href="https://www.r-project.org/">R</a> (though, you could use <a href="https://pandas.pydata.org/">pandas</a> if you prefer to use Python). While you use R with jupyter notebooks (as well as several other languages), in my opionion R Studio&#8217;s <a href="https://rmarkdown.rstudio.com/r_notebooks.html">R Notebooks</a> are a little bit better to work with. If you were to do the same calculation with a notebook (in this case, we&#8217;ll use a jupyter notebook and Python), it would look like&nbsp;this:</p> <p><img alt="notebook" src="https://www.kloppenborg.ca/2019/02/reproducibility/reproducibility_notebook.png"></p> <p>There are a few advantages of using a notebook. First, you can use a programming language with a little bit more power than MathCAD or SMath-Studio &#8212; if you need to do an iterative calculation or find the root of system of non-linear equations, you can do it with a language like Python or R &#8212; and do so in a way that&#8217;s not too difficult for the reader to understand. The other advantage of using a notebook is that notebooks are intended to mix code, results and text. You could actually write your whole report using a notebook! You could explain your approach to solving the problem, include the code used to solve the problem and then show the results all in the same document. No need to copy-and-paste anything and no need to store multiple files (like a word document and a SMath-Studio&nbsp;file).</p> <p>Text written in a notebook (either a jupyter notebook or an R Notebook) is written using using something called <a href="https://en.m.wikipedia.org/wiki/Markdown">markdown</a>. This is a &#8220;lightweight&#8221; way of formatting text. If you want a bullet list, you just type an asterix at the beginning of each line; if you want a heading, you start the line with a hash symbol (or two for a sub-heading). And, most importantly for engineering reports, you can include formulae using <a href="https://en.m.wikibooks.org/wiki/LaTeX/Mathematics">LaTeX</a> from within markdown just by enclosing the formula with two dollar signs before and after it &#8212; no need to suffer through using the <span class="caps">MS</span>-Word Equation&nbsp;Editor.</p> <p>If you need a corporate format for your report, there are ways to create PDFs from either a jupyter notebook or an R Notebook using a custom format. I plan on writting about this in a later post. Stay tuned. <em>(Edit: I&#8217;ve written about this <a href="https://www.kloppenborg.ca/2019/06/pandoc-report-templates/">here</a>)</em></p> <p>We&#8217;ve explored a few ways of making an engineering report more reproducible. Neither of the solutions explored are idea for every scenario &#8212; some scenarios are more suited to one of the solutions or the other &#8212; but both will improve many engineering&nbsp;reports.</p> <div class="footnote"> <hr> <ol> <li id="fn:1"> <p>W. Young and R. Budynas, Roark&#8217;s Formulas for Stress and Strain, Seventh Edition. New York: McGraw-Hill, 2002.&#160;<a class="footnote-backref" href="#fnref:1" rev="footnote" title="Jump back to footnote 1 in the text">&#8617;</a></p> </li> </ol> </div> <script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) { var align = "center", indent = "0em", linebreak = "false"; if (false) { align = (screen.width < 768) ? "left" : align; indent = (screen.width < 768) ? "0em" : indent; linebreak = (screen.width < 768) ? 'true' : linebreak; } var mathjaxscript = document.createElement('script'); mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#'; mathjaxscript.type = 'text/javascript'; mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML'; mathjaxscript[(window.opera ? "innerHTML" : "text")] = "MathJax.Hub.Config({" + " config: ['MMLorHTML.js']," + " TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," + " jax: ['input/TeX','input/MathML','output/HTML-CSS']," + " extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," + " displayAlign: '"+ align +"'," + " displayIndent: '"+ indent +"'," + " showMathMenu: true," + " messageStyle: 'normal'," + " tex2jax: { " + " inlineMath: [ ['\\\\(','\\\\)'] ], " + " displayMath: [ ['$$','$$'] ]," + " processEscapes: true," + " preview: 'TeX'," + " }, " + " 'HTML-CSS': { " + " styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," + " linebreaks: { automatic: "+ linebreak +", width: '90% container' }," + " }, " + "}); " + "if ('default' !== 'default') {" + "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" + "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" + "VARIANT['normal'].fonts.unshift('MathJax_default');" + "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" + "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" + "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" + "});" + "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" + "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" + "VARIANT['normal'].fonts.unshift('MathJax_default');" + "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" + "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" + "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" + "});" + "}"; (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript); } </script>rde: Now on CRAN2018-07-10T02:46:34-04:002018-07-10T02:46:34-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2018-07-10:/2018/07/rde/<p>For the last couple of years, we&#8217;ve been using the statistical programming language <a href="https://www.r-project.org">R</a> when we do statistical analysis or data visualizations at work. We typically deal with <em>small data</em> &mdash; most of the time, our data sets are high-tens or low-hundreds of rows of&nbsp;data.</p> <p>A lot of the …</p><p>For the last couple of years, we&#8217;ve been using the statistical programming language <a href="https://www.r-project.org">R</a> when we do statistical analysis or data visualizations at work. We typically deal with <em>small data</em> &mdash; most of the time, our data sets are high-tens or low-hundreds of rows of&nbsp;data.</p> <p>A lot of the time, we create <a href="https://rmarkdown.rstudio.com/r_notebooks.html">R Notebooks</a> with our analysis and visualizations. This works well for us: the R Notebook contains the code used to do the analysis, the results of the analysis and the visualizations, all in one place. This eliminates questions like: &#8220;did you remove outliers before making the graph?&#8221; Or, &#8220;did you check that the data are distributed normally before you did that test?&#8221; A reviewer of the R Notebook can see exactly what was&nbsp;done.</p> <p>By default, the R Notebook produces an html file that you can open in your browser. You can email this html file to a colleague, and they can see your results and graphs, as well as exactly how you obtained them. If you made a logical mistake, or an inappropriate assumption, your colleague has the opportunity to find&nbsp;it.</p> <p>There is also a button in the html file that the R Notebook gets exported to that says &#8220;Download Rmd.&#8221; This allows your colleague to open the notebook in <a href="https://www.rstudio.com/">R Studio</a> and run your code. <em>If you sent your&nbsp;data.</em></p> <p>The one problem with just emailing R Notebooks to a colleague is that the R Notebook does not include the data. This might be okay if the data source is a file on a network, or a database that you both have access to, but in a lot of cases &mdash; at least in my work &mdash; the data is a <span class="caps">CSV</span> or Excel file. Now, if I want to send an R Notebook to a colleague to review, I need to remember to send the data file along with&nbsp;it.</p> <p>Enter <code>rde</code>.</p> <p>I wrote the package <a href="https://cran.r-project.org/web/packages/rde/"><code>rde</code></a> (which stands for Reproducible Data Embedding) to tackle this problem. This package allows you to embed data right in your R Notebook (or any other R code). It does so by compressing the data and then <a href="https://en.wikipedia.org/wiki/Base64">base-64 encoding</a> it into an <span class="caps">ASCII</span> string. This string can be pasted into the R Notebook and converted back into the original data when someone re-runs the&nbsp;Notebook.</p> <p>I won&#8217;t go into all the details of how to use the package. If you&#8217;d like to learn more, you can read the package <a href="https://cran.r-project.org/web/packages/rde/vignettes/rde_tutorial.html">vignette</a>.</p> <p>This isn&#8217;t the first R Package that I&#8217;ve written, but it is the first one that I&#8217;ve submitted to <a href="https://cran.r-project.org/"><span class="caps">CRAN</span></a>. When you install an R package using <code>install.packages()</code>, you&#8217;re installing it from <span class="caps">CRAN</span>. I think that <span class="caps">CRAN</span> is one of the best parts of the R ecosystem since it does <a href="https://en.wikipedia.org/wiki/Continuous_integration">continuous integration</a> for all of the packages hosted there. This helps ensure that all the packages continue to work as R is updated and as other packages are updated. I&#8217;ll likely talk about this more in a future blog&nbsp;post.</p> <p>If you&#8217;re an R user and you think that the package <code>rde</code> would help you in your workflow, check it out. You can install it by typing <code>install.packages("rde")</code> in R. If you find a bug, please file an <a href="https://github.com/kloppen/rde/issues">issue on GitHub</a>. And, if you would like to add functionality or improve it in some way, feel free to send me a pull&nbsp;request.</p>Welcome to Kloppenborg.ca2018-06-30T19:29:19-04:002018-06-30T19:29:19-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2018-06-30:/2018/06/welcome/<p>Welcome to&nbsp;kloppenborg.ca</p> <p>I plan to use this website as a blog where I discuss topics related to engineering, technology and whatever else I&#8217;m thinking about at the&nbsp;time.</p> <p>If you find any of the posts here interesting, feel free to share them. If you don&#8217;t feel …</p><p>Welcome to&nbsp;kloppenborg.ca</p> <p>I plan to use this website as a blog where I discuss topics related to engineering, technology and whatever else I&#8217;m thinking about at the&nbsp;time.</p> <p>If you find any of the posts here interesting, feel free to share them. If you don&#8217;t feel free to ignore&nbsp;them.</p>