Thanks @bashtage for introducing me to SUR and linearmodels over at stackoverflow.
I'm having a bit of trouble with using the SUR model in the way I would like to, and I hope you can help. I'm sorry that this issue has become quite long. I'm happy to ask elsewhere, but if you have the time, I'd be very happy to hear your thoughts.
I'm trying to fit a number of linear components to a number of spectra. Each spectrum is a one dimensional array (typically of length 1024) that has been taken by a microscope. The microscope does this across a number of pixels (lets say a grid of 64 by 64) on a sample. So my raw data, my dependent variable has numpy shape (64,64,1024). Later on in this post I'll take the product of the pixels, changing the shape into (64*64, 1024).
My components can be a variety of things, but for the sake of argument, let's say I'm trying to fit a second order polynomial as a background. So we have three components, x**2, x and 1. I compute each one for the spectrum which results in three arrays, or a ndarray of shape (3,1024).
Normally, I would now either (functions timed with jupyter notebook):
A) loop over the 64x64 pixels, and run np.linalg.lstsq() on each spectrum:
%%timeit
rawdata = np.random.random((64,64,1024)).reshape(64*64,1024)
components = np.random.random((3,1024))
for i in range(rawdata.shape[0]):
np.linalg.lstsq(comps.T, rawdata[i])
664 ms ± 25.7 ms per loop
or B) use the "not-really-designed-for-this" OLS:
%%timeit
rawdata = np.random.random((64,64,1024)).reshape(64*64,1024)
components = np.random.random((3,1024))
sm.OLS(rawdata.T, components.T)
57 ms ± 1.78 ms per loop
OLS is much faster, especially when I increase the number of pixels. That's great.
Unfortunately, my problem is ideally a little bit more complicated. In addition to simple components like x**2, we often want to use Gaussians or similar expressions, but where all the parameters are fixed. However, these components will not always be equal in every pixel. In some pixels the component has been pre-fitted in some manner - perhaps the centre of the gaussian is shifted a little to the right or to the left.
This means that I have increased the number of dimensions of my components so my new component shape is now (64,64,3,1024). Around now I think you'll be seeing my problem. Trying to feed my component data as the exog variable, I get the ValueError: exog_0 has too many dims. Maximum is 2, actual is 3
.
This leads me to think that I've got the wrong model, or at least a problem which is not suitable to be solved in this manner.
I should add that I wasn't able to fit the first case (like I do with numpy and statsmodels) with SUR - I'm not sure where I go wrong, but it's definitely my syntax:
from linearmodels import SUR
import numpy as np
rawdata = np.random.random((64,64,1024)).reshape(64*64,1024)
components = np.random.random((3,1024))
equations = {
'test':{
'dependent':rawdata,
'exog':comps
}
}
SUR(equations)
ValueError: Array required to have 4096 obs, has 3
Thanks for reading. Sorry that the post became so long, but I feared giving too little information, and since we work in very different fields and with very different "languages", I thought it better to fully explain. If you have any suggestions or advice, I'll gladly take them.