I have a new Python project I would like to share with the community. Actually, this project isn't so new. I developed an initial version about two years before completing my postdoctoral research, and it has undergone various revisions over the past three years. Having finally made time to give it the clean-up it needed,[1] I am excited to share it on GitHub.
pdLSR is a library for performing least squares minimization. It aims to integrate this task seamlessly into a Pandas-focused workflow. Input data are expected in dataframes, and multiple regressions can be performed using functionality similar to Pandas groupby. Results are returned as grouped dataframes and include best-fit parameters, statistics, residuals, and more. The results can be easily visualized using
pdLSR currently uses lmfit, a flexible and powerful library for least squares minimization, which in turn makes use of scipy.optimize.leastsq. I began using lmfit because it is one of the few libraries that support non-linear least squares regression, which is commonly used in the natural sciences. I also like the flexibility it offers for testing different modeling scenarios and the variety of assessment statistics it provides. However, I found myself writing many for loops to perform regressions on groups of data and to aggregate the resulting output. Simplifying this task was my inspiration for writing pdLSR.
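To make the motivation concrete, here is a minimal sketch of the kind of loop-based workflow described above. It calls scipy.optimize.leastsq directly instead of going through lmfit, to keep dependencies light. The exponential-decay model, the column names group, x, and y, the residuals helper, and the synthetic data are all assumptions for illustration; this is not pdLSR's API.

```python
import numpy as np
import pandas as pd
from scipy.optimize import leastsq

# Synthetic example data (made up for illustration): two groups, each an
# exponential decay y = a * exp(-k * x) with a little Gaussian noise.
rng = np.random.default_rng(0)
x = np.tile(np.linspace(0, 5, 20), 2)
group = np.repeat(["A", "B"], 20)
true_k = np.where(group == "A", 0.5, 1.2)
y = 10.0 * np.exp(-true_k * x) + rng.normal(scale=0.05, size=x.size)
data = pd.DataFrame({"group": group, "x": x, "y": y})


def residuals(params, x, y):
    """Hypothetical helper: residual vector (model minus data) minimized by leastsq."""
    a, k = params
    return a * np.exp(-k * x) - y


# The manual pattern: loop over groups, fit each one, then collect the results.
param_rows = []
resid_frames = []
for name, sub in data.groupby("group"):
    xg, yg = sub["x"].to_numpy(), sub["y"].to_numpy()
    best, ier = leastsq(residuals, x0=[1.0, 1.0], args=(xg, yg))
    if ier not in (1, 2, 3, 4):  # leastsq signals success with ier in 1..4
        raise RuntimeError(f"fit for group {name!r} did not converge")
    param_rows.append({"group": name, "a": best[0], "k": best[1]})
    resid_frames.append(sub.assign(residual=residuals(best, xg, yg)))

# Aggregate per-group output into dataframes by hand.
params = pd.DataFrame(param_rows).set_index("group")
resids = pd.concat(resid_frames)
print(params)
```

Every new model, statistic, or output table adds more bookkeeping to this loop, which is the repetition a grouped, dataframe-in/dataframe-out interface is meant to remove.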
pdLSR is related to libraries such as scikit-learn that provide linear regression functions operating on dataframes. However, these libraries don't directly support grouping operations on dataframes.
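Because scikit-learn's fit methods take a single X and y, fitting one model per group means wrapping the fit in a groupby operation yourself. The sketch below shows that pattern for a straight line, using numpy.linalg.lstsq as a stand-in for a scikit-learn estimator. The function name fit_line, the column names, and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_line(sub: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: ordinary least-squares fit of y = slope * x + intercept."""
    # Design matrix with a column of ones for the intercept term.
    A = np.column_stack([sub["x"].to_numpy(), np.ones(len(sub))])
    (slope, intercept), *_ = np.linalg.lstsq(A, sub["y"].to_numpy(), rcond=None)
    return pd.Series({"slope": slope, "intercept": intercept})


# Hypothetical data: two groups lying exactly on two different lines.
data = pd.DataFrame({
    "group": ["A"] * 4 + ["B"] * 4,
    "x": [0.0, 1.0, 2.0, 3.0] * 2,
})
data["y"] = np.where(data["group"] == "A", 2.0 * data["x"] + 1.0, -0.5 * data["x"] + 4.0)

# One fit per group; the returned Series become rows of a dataframe indexed by group.
fits = data.groupby("group")[["x", "y"]].apply(fit_line)
print(fits)
```

Selecting the x and y columns before apply keeps the grouping column out of each sub-frame, so fit_line only sees the data it needs.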
The aggregation of minimization output parameters performed by pdLSR has many similarities to the R library broom, which was written by David Robinson; he and I had an excellent conversation about our two libraries. broom is more general in its ability to accept input from many minimizers, and I think expanding pdLSR in this direction, for example for compatibility with scikit-learn, could be useful in the future.