Package Adequacy for Engineering Calculations

If you do engineering calculations or analysis using a language like R or Python, chances are that you’re going to use some packages. Packages are collections of code that someone else has written that you can use in your code. For example, if you need to solve a system of linear equations by inverting a matrix and you’re using Python, you might use numpy. Or if you’re using R and you need to fit a linear model to some data, you would probably use the stats package.

If you’re involved in “engineering,” you need a high level of confidence that the results that you’re getting are correct. Note that in this post, and on my blog in general, when I say “engineering,” I don’t mean software engineering: I mean design and analysis of structures or systems that have an effect on safety. I work in civil aeronautics, mainly dealing with composites, but also dealing with metallic structure regularly. Depending on the particular type of engineering that you’re engaged in and the particular problem at hand, the consequences of getting the wrong answer could be fairly severe. You had better be sure that both the interpreter and the packages are correct. Probably the best way to do this is to validate the results using another method: are there published results for a similar problem that you can use as a benchmark? Perhaps you can do some physical testing? But even if you’re doing your due diligence and validating the results somehow, you will still waste a lot of your time if there is a problem with either the interpreter or one of the packages.

Compiled languages, like C or Fortran, are compiled into machine code that runs directly on the processor. Interpreted languages, like Python, R or JavaScript, are not compiled into machine code ahead of time; instead, an interpreter (a piece of software) reads the code and figures out how to execute it at the moment you run it. As far as interpreters go, if you’re using CPython (the “standard” Python interpreter) or GNU R (the “standard” R interpreter), I think there is a rather low risk that there are any errors in the interpreter. These interpreters are written by a bunch of smart people, and both are open source, so the code that makes up the interpreters themselves is read by a much larger group of smart people. Furthermore, both interpreters are widely used and have been around for a while, so it’s very likely that any significant bugs that would change the result of an engineering calculation have already been found by users and fixed.

Packages are more of a risk than interpreters are. Again, if you’re using a very widely used package that has been around for a while, like numpy (in Python) or stats (in R), there’s a pretty good chance that any bugs that would affect your calculations would have been found by now — and packages like these are maintained by groups of dedicated people.

If you’re using R, chances are that you’re getting your packages from CRAN. You should be reading the CRAN page for the package that you’re using. You can find an example of such a page here. There are a few things that you should look for to help you evaluate the reliability of the package (in addition to the reference manual and any vignettes that explain how to use the package). The first is the priority of the package. Not all packages have a priority, but if the priority is “base” or “recommended,” the package is maintained by the R Core Team and is almost certainly used by a lot of people. You can be fairly comfortable with these packages.

The second thing that you should look at on the CRAN page for a package is the CRAN Checks. CRAN tests all the packages every time a new version of R is released, and it also tests them routinely to determine whether a change in one package has caused errors in another package. You can see an example CRAN Check for my package rde here.

This practice is called continuous integration. CRAN runs all of these checks on several different operating systems: Windows, macOS, and several Linux distributions. If you open the CRAN Checks results for a package, you’ll see a table of all the various combinations of R version and operating system that have been tested, along with the amount of time that it took to run the test and a status for each. If the Status is “OK,” then there were no errors identified. If the Status is “NOTE,” “WARNING,” or “ERROR,” there might be something wrong, and it may or may not be serious. If you click on the Status link, you’ll see details and can evaluate for yourself.

I think that these CRAN checks are actually a very strong point for the R ecosystem. They ensure that package maintainers know when something outside of their package breaks their code. They also enforce a certain level of quality: package maintainers are given a certain amount of time to fix errors, and if they don’t, the package gets removed from CRAN.

The CRAN checks do a few things. First, they check that the package can, in fact, be loaded (maybe there’s an error that prevents you from using it at all). There are a few other things that they do, but the most important in terms of reliability of the package is that the CRAN checks will run any tests created by the package maintainer. These tests are called unit tests. They are tests that determine if the code in the package actually has the expected behavior. Package maintainers don’t have to write unit tests, but the good ones do. You can look at what tests the package maintainer has written by downloading the code of the package (you can download it from CRAN). The tests are in a folder called tests. Tests basically work by providing some input to the package’s functions, and checking that the result is correct. For R packages, the testthat framework is a popular testing framework. For packages that use the testthat framework, you’ll see a number of statements that use the expect_... family of functions. Some of these tests will likely ensure that the package works at all, checking things like the return type for functions, or that a function actually does raise certain errors when invalid arguments are passed to it. Some of the tests should also ensure that the package provides correct results. When I write tests for a package, I always write both types of tests. For the tests that ensure that the results are correct, I often either check cases that have closed-form solutions, or check that the code in the package produces results that are approximately equal to example results published in articles or books. You’ll need to read through the tests to decide if they provide enough assurance that the package is correct.
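To make the two kinds of tests concrete, here is a minimal sketch written in Python rather than with testthat. The function polygon_ixx is a hypothetical stand-in for something a package might provide (it is not taken from any real package), and the test names are my own invention. The first test checks behavior on invalid input; the second checks the numerical result against a closed-form solution, in this case the second moment of area of a rectangle about its centroidal axis, b*h^3/12.

```python
import math


def polygon_ixx(vertices):
    """Second moment of area about the x-axis for a simple polygon.

    Hypothetical stand-in for a function provided by a package under review.
    Vertices are (x, y) pairs listed counterclockwise.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    total = 0.0
    for i, (x0, y0) in enumerate(vertices):
        # Wrap around so the last vertex connects back to the first.
        x1, y1 = vertices[(i + 1) % len(vertices)]
        total += (y0 * y0 + y0 * y1 + y1 * y1) * (x0 * y1 - x1 * y0)
    return total / 12.0


def test_rejects_invalid_input():
    # Behavior test: bad arguments should raise, not return nonsense.
    try:
        polygon_ixx([(0.0, 0.0), (1.0, 0.0)])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a two-vertex polygon")


def test_matches_closed_form_rectangle():
    # Correctness test: a b-by-h rectangle centered on the x-axis
    # has Ixx = b * h**3 / 12.
    b, h = 2.0, 4.0
    rect = [(-b / 2, -h / 2), (b / 2, -h / 2), (b / 2, h / 2), (-b / 2, h / 2)]
    assert math.isclose(polygon_ixx(rect), b * h**3 / 12, rel_tol=1e-12)


if __name__ == "__main__":
    test_rejects_invalid_input()
    test_matches_closed_form_rectangle()
    print("all tests passed")
```

When you read through a package’s tests, the second kind is the one to look for: a test that only checks return types tells you the function runs, not that its answer is right.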

If you decide that the tests for a package are not sufficient, you have three options.

You could choose not to use that package: maybe there is another that does something similar.
You can write tests yourself and contribute those tests back to the package maintainer. After all, R packages are open-source and users are encouraged to contribute back to the community. Most package maintainers would be happy to receive a patch that adds more tests: writing tests is not fun, and most people would be grateful if someone else offers to do it.
You could also manually test the package. The difficulty here is ensuring that you re-test the package every time you update the version of this package on your system.
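For the third option, one way to avoid forgetting a re-test is to have your analysis script refuse to run when the installed version differs from the one you validated. The following is only a sketch of that idea in Python, using the standard library’s importlib.metadata. The function name check_validated_versions and the injectable installed_version argument are my own assumptions, not part of any package.

```python
from importlib import metadata


def check_validated_versions(validated, installed_version=metadata.version):
    """Return (name, validated, installed) for every package that differs.

    `validated` maps a package name to the version string that was
    manually tested. `installed_version` looks up the installed version;
    it is a parameter so the check can be exercised without the packages
    actually being installed.
    """
    mismatches = []
    for name, expected in validated.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches.append((name, expected, found))
    return mismatches
```

At the top of a calculation script, you would call this with the versions you tested and stop if the returned list is not empty. An empty list means the installed packages match what you validated by hand.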

In the Python world, continuous integration isn’t as well integrated into the ecosystem. Most packages that you install probably come from PyPI. As far as I know, PyPI doesn’t do any continuous integration: it’s up to the package maintainer to run their tests regularly. Package maintainers can do one of two things: they can run the tests on their own machine before releasing a new version to PyPI, or they can use a continuous integration service like Travis-CI or CircleCI. Many of the continuous integration services provide the service for free for open source projects, so many Python packages do use a continuous integration service. Packages that use a continuous integration service normally advertise it in their README file. You’ll still need to assess whether the tests are adequate, and if the package doesn’t use continuous integration, you’ll have to either run the tests yourself, or trust that the package maintainer did.

If you have already written tests for your package, setting up continuous integration using Travis-CI is quite straightforward. I haven’t personally used CircleCI, but I would imagine that it’s similarly easy to use. You can see the continuous integration results from my package rde on Travis-CI here.

Whether you’re using Python or R, there are ways of ensuring that the packages you use for engineering calculations are adequate for your needs. Some people seem to be a little bit scared of open source packages and software for engineering calculations, but in a lot of ways, open source software is actually better for this, since you have the ability to verify it yourself and make a decision about whether to use it.

kloppenborg.ca
