
starfish's Introduction

Starfish


Starfish is a set of tools used for spectroscopic inference. We designed the package to robustly determine stellar parameters using high resolution spectral models.

Warning!

There have been major, breaking updates since version 0.2.0. If you are used to the old version, please see this page regarding these changes!

Citations

If you use this code or derivative components of this code in your research, please cite our paper as well as the code. See CITATION.bib for a BibTeX formatted reference of this work.

Papers

If you have used Starfish in your work, please let us know and we can add you to this list!

Please bear in mind that this package is under heavy development and features may evolve rapidly. If something doesn't work, please file an issue on this repository. If you would like to contribute to this project (whether with bugfixes, documentation, or new features), please feel free to fork the repository and submit a pull request!

Installation Instructions

Prerequisites

Starfish has several dependencies; however, most of them should be satisfied by an up-to-date scientific Python installation. We highly recommend using the Anaconda Scientific Python Distribution and updating to Python 3.6 or greater. This code makes no attempt to work on the Python 2.x series, and I doubt it will work if you try. This package is tested across Linux, Mac OS X, and Windows.

To make sure you are running the correct version of Python, start a Python interpreter via the system shell; you should see something similar to this:

$ python
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 

If your shell says Python 2.x, try using the python3 command instead of python.

Installation

For the most current stable release of Starfish, use the releases from PyPI

$ pip install astrostarfish

If you want to be on the most up-to-date version (or a development version), install from source via

$ pip install git+https://github.com/iancze/Starfish.git#egg=astrostarfish

To test that you've properly installed Starfish, try doing the following inside of a Python interpreter session

>>> import Starfish
>>> Starfish.__version__
'0.3.0'

If you see any errors, then something went wrong--please file an issue.

Now that you've successfully installed the code, please see the documentation on how to begin using Starfish to solve your spectroscopic inference problem.

Contributing

If you are interested in contributing to Starfish, first off, thank you! We appreciate the time and effort you put into making our project better. To get set up, we highly recommend developing in a virtual environment. We use pipenv (pending a better PEP 517/518 compliant tool) to manage our environments. To get started, clone the repository (we recommend forking us first)

$ git clone https://github.com/<your_fork>/Starfish.git starfish
$ cd starfish

and then create the virtual environment and install all the packages and developer dependencies from the Pipfile with

$ pipenv install -d

and to enter the virtual environment, simply issue

$ pipenv shell

whenever you're in the starfish folder.

We also enforce the black code style. This tool formats everything for you automatically, which is much easier than worrying about it yourself! We have a pre-commit hook that will blacken your code before you commit, so you can avoid failing the CI tests because you forgot to format. To use this, just install the hook with

$ pipenv run pre-commit install

From then on, any commits will format your code before succeeding!

Take a look through the issues if you are looking for a place to start improving Starfish!

Tests

We use pytest for testing; within the virtual environment, run

$ pytest

Note that we use the black code style and our CI testing will check that everything is formatted correctly. To check your code, run

$ pytest --black

although if you follow the instructions for using pre-commit you should have no issues.

Contributors

See CONTRIBUTORS.md for a full list of contributors.


starfish's Issues

Cookbook has deprecated content and typos.

There are a few problems with the cookbook:
grid.py create ➡️ grid.py --create
grid.py plot ➡️ grid.py --plot

splot.py does not have a D3 flag

I could not get star.py --optimize=Phi to work without first running --initPhi (for a run consisting of a single spectral order).

`grid_tools.py` needs some simple API methods

Currently, a lot of the grid_tools codebase has been written in support of the end-goal of spectroscopic inference. However, there are also a lot of useful routines in here that could be extracted out of some of the larger classes (e.g., HDF5Creator) and provided as stand-alone methods, either as module-level methods or methods of RawGridInterface. Specifically, I am thinking of the ability to

  • convolve a spectrum with an arbitrary instrumental broadening width
  • convolve a spectrum with an arbitrary vsini
  • resample a spectrum to a new wavelength grid, provided this grid is finely spaced enough to satisfy the information content of the above two points
  • Doppler shift a spectrum

These are all very simple tasks, but I frequently find myself in the position where I will need to do this for one or two spectra for a plot or just quick playing around, and I don't want to have to go through the hassle of creating a whole grid of spectra when I just need one.
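For illustration, here is a minimal sketch of what such stand-alone helpers could look like. None of these function names exist in grid_tools yet; they are hypothetical, and real implementations would need to worry about kernel truncation and non-uniform wavelength spacing.

import numpy as np
from scipy.ndimage import gaussian_filter1d

C_KMS = 2.99792458e5  # speed of light [km/s]

def doppler_shift(wl, rv):
    # Shift a wavelength grid by a radial velocity rv [km/s].
    return wl * np.sqrt((C_KMS + rv) / (C_KMS - rv))

def resample(wl_new, wl, fl):
    # Linearly resample a spectrum onto a new, sufficiently fine grid.
    return np.interp(wl_new, wl, fl)

def broaden_instrumental(wl, fl, fwhm):
    # Convolve with a Gaussian instrumental kernel of width fwhm [AA],
    # assuming wl is uniformly spaced.
    dw = np.mean(np.diff(wl))
    sigma_pix = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / dw
    return gaussian_filter1d(fl, sigma_pix)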

Mixture model mode

The problem: Many astrophysical spectra are composed of the superposition of two or more individual spectra, for example unresolved multiple systems or starspots.

Suggested solution:
The composite spectra can be modeled as a mixture model, M_mix = f M_B + (1-f) M_A, where f is a filling factor and M is a stellar model as defined in Czekala et al. 2015.
Depending on the application, the model components will share some of the same stellar parameters. For example, if the composite spectra are from starspots, the mixture model should share the same log g, metallicity, vsini, and RV, but if the composite spectrum originates from a multiple system, the mixture model should have different values for the stellar parameters.
Naturally, if the flux ratio is too low, the secondary component will not be detectable. But the MCMC machinery should deliver a robust upper limit to the flux ratio, f, in this case.
One could imagine a solution intermediate to vanilla Starfish and the full mixture model-- one mixture model component is held fixed, and the other component is varied.
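As a minimal sketch (with hypothetical names), assuming flux_A and flux_B are model spectra already on a common wavelength grid:

import numpy as np

def mixture_model(flux_A, flux_B, f):
    # M_mix = f * M_B + (1 - f) * M_A, with filling factor 0 <= f <= 1
    assert 0.0 <= f <= 1.0, "filling factor must lie in [0, 1]"
    return f * np.asarray(flux_B) + (1.0 - f) * np.asarray(flux_A)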

Practical Considerations and Costs:
There are some practical limits to searching for buried signals like this.
Primaries or secondaries that span a very wide range in stellar properties will require a large spectral emulation volume. The spectral emulation procedure is computationally expensive for large parameter ranges, so it might make sense to chop up the spectral emulation into two islands surrounding the expected stellar properties for components A and B, if constraints exist. If these constraints do not exist, then the MCMC machinery might struggle to land on a unique global solution for the secondary, which is fine, but should be anticipated.

support for binary star modelling

Currently the code can model single stars with starspots @gully (with a single temp. and fill factor). We (@tofflemire, @gully and @EdGillen) want to extend that to do full modelling of intrinsic (e.g. logg, Teff) and extrinsic (e.g. vsini) model parameters. This would describe double-lined binary stars.

The main changes would be to update_theta.

Issues running the example WASP14 in OSX 10.9.5

Hi! I finally had time to test the installation and Starfish on my desktop computer (OS X 10.9.5). I followed the instructions (installing Python 3.x using Anaconda and the Starfish installation instructions) and had no issues until trying to follow the example for WASP14.

The error appears when trying to run:
$ star.py --run_index 1

There was an ImportError related to the Starfish.covariance module (it said that it didn't exist).
The problem was solved by installing in develop mode and not in install mode (as suggested in the documentation!)

Then, everything worked out fine while following the WASP14 example, until reaching the following command:

bash-3.2$ pca.py --plot=reconstruct
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on
(THE SAME THING REPEATS OVER AND OVER)

I assume this is an issue with the settings of my computer. I checked online but it was not clear to me what to do.

I also got an ImportError for the 'triangle' module when running this command:

bash-3.2$ pca.py --plot=emcee
Traceback (most recent call last):
File "/Users/babs/Starfish/scripts/pca.py", line 236, in
import triangle
ImportError: No module named ‘triangle'

and then, the same error as above about the CoreFoundation functionality appears for the following command:

bash-3.2$ pca.py --plot=emulator --params=emcee
Using emcee median
Emulator parameters are:
lambda_xi 0.324512928601
[ 77.07873252 396.41612777 1.4809038 0.68124892]
[ 84.08142575 368.33807248 1.33421923 1.10646269]
[ 52.25411952 390.77569538 2.7047259 0.57798053]
[ 114.65852761 296.03821255 2.8677721 2.82354632]
[ 93.51798777 297.57419507 2.70087382 2.11223688]
[ 89.57373729 260.38294681 1.59486175 3.09759106]
[ 79.93869222 256.38805317 3.06567889 3.093641 ]
/Users/babs/Starfish/Starfish/emulator.py:427: RuntimeWarning: covariance is not positive-semidefinite.
weights = np.random.multivariate_normal(mu, sig)
/Users/babs/Starfish/Starfish/emulator.py:427: RuntimeWarning: covariance is not positive-semidefinite.
weights = np.random.multivariate_normal(mu, sig)
/Users/babs/Starfish/Starfish/emulator.py:427: RuntimeWarning: covariance is not positive-semidefinite.
weights = np.random.multivariate_normal(mu, sig)
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on
(THE SAME THING REPEATS OVER AND OVER)

A copy of what appears in the terminal, with comments, is at the following link:
https://www.evernote.com/l/AMZx5WMzrD5AU5D_E132aQ4af2rwEQ7HWf0

I do not use Python, so it gets a bit tricky for me to figure out what to do when I get an error like the ones above. The one about the triangle module worries me a bit, since it seems the installation was not correct or complete.

Generic model grid parameters

Right now the stellar parameters effective temperature, surface gravity, and metallicity are hard-coded in as temp, logg, and Z. To enable other users to utilize more diverse model grids, we should abstract these parameters into a configuration file and only refer to them throughout the code as par0, par1, etc.

cc @soylentdeen

Feature: Expand environment variables in `config.yaml` filename paths.

It'd be useful to support environment variables in the config.yaml files. For example, if users are running jobs on multiple machines, environment variables would let the same config.yaml file work on all of them.

As it stands, the raw filename is evaluated directly from the yaml string, e.g.: base=Starfish.grid["raw_path"]. We could simply add a line like:

base = os.path.expandvars(base)  # requires "import os" at the top of the module

Simultaneously infer telluric correction

The problem: The near-infrared wavelength range is peppered with telluric absorption lines that vary depending on many atmospheric and observation properties. Removal of the telluric lines is usually performed by observing relatively featureless spectra (A stars), and deriving a telluric correction spectrum, which is then divided out of the target star. This process is imperfect and generally leaves artifacts, especially if the wavelength solution is mismatched between the target and telluric standard. Another strategy for removing the telluric lines is to model directly the telluric spectrum with Earth atmospheric models. Kevin Gullikson (@kgullikson88) has done this with his TelFit package, and solutions exist for ESO and elsewhere.

The problem is that TelFit struggles when the continuum is not smooth, i.e. when the spectrum is not relatively featureless like an A star's. As a result, the telluric models derived for M dwarfs are biased, and the telluric correction is poor. Moreover, the uncertainty in the telluric correction is never taken into account, and therefore stellar properties could be biased. The Right Thing To Do is to model the telluric absorption lines and the stellar spectrum simultaneously.

Suggested solution:
Incorporate the TelFit atmospheric model directly into the Starfish Model, treating the atmospheric model parameters as nuisance parameters.

Practical Consideration and Costs:
Let's just clear the air: this idea is very low priority. There are many costs to getting this to work. First, the upfront cost involves splitting out the relevant part of TelFit and incorporating it into the Starfish Model. Once in the Starfish Model, the telluric model had better be fast, otherwise it's going to slow down our likelihood calculation unacceptably. And for all that work, 1) only a small portion of the spectrum is affected, so the impact on derived stellar parameters will probably be low, and 2) we still might not get a good telluric fit anyway. Plus, to really get this right, the user has to do some outside research to acquire the Earth's upper atmosphere profile for the date, time, and location of the observation. In principle this profile could also be inferred, but that's overkill.

Installation in OSX10.9.5

Hello! I'm installing Starfish following the instructions in the README, and following all the recommendations (installing from Anaconda, etc.). There are a few things I want to mention to find out how important they are (maybe others installing on OS X will have similar issues):

  1. While installing the Python packages, I wasn't able to install emcee. I checked anaconda.org for it and tried to install the ones I found, but there was a conflict. E.g.:
    bash-3.2$ conda install --channel https://conda.anaconda.org/timknab emcee
    Fetching package metadata: ......
    Solving package specifications: .........
    Error: Unsatisfiable package specifications.
    Generating hint:
    [ COMPLETE ]|################################################################| 100%
    Hint: the following packages conflict with each other:
  • emcee
  • python 3.5*
    Use 'conda info emcee' etc. to see the dependencies for each package.

Do you know where I can find that package?

  2. While installing Starfish, I got the following warnings (in bold)... should I care?

i) warning: unknown warning option '-Wno-unused-but-set-variable'; did you mean
'-Wno-unused-const-variable'? [-Wunknown-warning-option]

ii) /Users/babs/anaconda/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning:
"Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API
NPY_1_7_API_VERSION" [-W#warnings]

iii) ld: warning: directory not found for option '-L/sw/lib' (this one was not bold, but I've seen this before installing other stuff, so I'm not worried)

  3. I tested Starfish in Python, and I got a warning about a 'config.yaml' file, asking me to not use that one and to make a copy in my current directory. Once I copied the file to the current directory, I got no issues. Is this needed?

bash-3.2$ python
Python 3.5.1 |Anaconda 2.4.1 (x86_64)| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import Starfish
/Users/babs/anaconda/lib/python3.5/site-packages/Starfish-0.1-py3.5-macosx-10.5-x86_64.egg/Starfish/__init__.py:16: UserWarning: Using the default config.yaml file located at /Users/babs/anaconda/lib/python3.5/site-packages/Starfish-0.1-py3.5-macosx-10.5-x86_64.egg/Starfish/config.yaml. This is likely NOT what you want. Please create a similar 'config.yaml' file in your current working directory.
warnings.warn("Using the default config.yaml file located at {0}. This is likely NOT what you want. Please create a similar 'config.yaml' file in your current working directory.".format(default), UserWarning)
Traceback (most recent call last):
File "/Users/babs/anaconda/lib/python3.5/site-packages/Starfish-0.1-py3.5-macosx-10.5-x86_64.egg/Starfish/init.py", line 10, in
f = open("config.yaml")
FileNotFoundError: [Errno 2] No such file or directory: 'config.yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/Users/babs/anaconda/lib/python3.5/site-packages/Starfish-0.1-py3.5-macosx-10.5-x86_64.egg/Starfish/init.py", line 17, in
f = open(default)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/babs/anaconda/lib/python3.5/site-packages/Starfish-0.1-py3.5-macosx-10.5-x86_64.egg/Starfish/config.yaml'

>>> exit()
bash-3.2$ python
Python 3.5.1 |Anaconda 2.4.1 (x86_64)| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Starfish

and I will now try it ;)

Cheers,
Babs

Vacuum/air wavelength conflict resolution

Currently, the code assumes the data are in air wavelengths, and converts the model grid to air, if it isn't already. This default is hardcoded in the RawGridInterface class in grid_tools.py:

class RawGridInterface:
    '''
    A base class to handle interfacing with synthetic spectral libraries.

    ...

    :param air: Are the wavelengths measured in air?
    :type air: bool

    '''
    def __init__(self, name, param_names, points, air=True, wl_range=[3000,13000], base=None):
        self.name = name

        self.param_names = param_names
        self.points = points

        self.air = air

The wavelength conversion is performed in each subclass, for example PHOENIXGridInterface:

if self.air:
  self.wl_full = vacuum_to_air(w_full)
else:
  self.wl_full = w_full

This behavior is as desired if your data are also in air wavelengths.

If your data are in vacuum wavelengths, however, then there are several options:

  1. Ignore the difference. The tiny difference might not matter for low resolution, small bandwidth, and no absolute RV requirement. [Not recommended]
  2. User modifies her/his data. Change the raw vacuum wavelengths to air wavelengths, and feed that into Starfish.
  3. New feature: air-vacuum conflict resolution. Discussed below.

Add air/vacuum as a boolean property of your Instrument class. Then in grid.py, you can add something like:

try:
    if instrument.air and mygrid.air:
        pass  # both already in air
    elif (not instrument.air) and mygrid.air:
        pass  # remake mygrid to have vacuum wavelengths
    elif instrument.air and (not mygrid.air):
        pass  # remake mygrid to have air wavelengths
    else:
        pass  # both are already in vacuum
except AttributeError:
    pass  # fall back to the existing behavior

I don't think this conflict resolution could be put into the HDF5Creator class, so grid.py is the right place for it.
I don't think there is anywhere else in the code that assumes air wavelengths, but if there were such places we would have to correct them.

Plot library problem

I am testing Starfish on a Linux machine but I ran into a library problem. Could you help me?

I have matplotlib 1.5.3 np111py35_1

Here is the output:

$ star.py --run_index 1
Traceback (most recent call last):
File "/home/alanz/Starfish/scripts/star.py", line 7, in <module>
from Starfish import parallel
File "/home/alanz/Starfish/Starfish/parallel.py", line 35, in <module>
from Starfish.model import ThetaParam, PhiParam
File "/home/alanz/Starfish/Starfish/model.py", line 8, in <module>
import matplotlib.pyplot as plt
File "/home/alanz/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py", line 114, in <module>
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/home/alanz/anaconda3/lib/python3.5/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
globals(), locals(), [backend_name], 0)
File "/home/alanz/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt5agg.py", line 16, in <module>
from .backend_qt5 import QtCore
File "/home/alanz/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt5.py", line 31, in <module>
from .qt_compat import QtCore, QtGui, QtWidgets, _getSaveFileName, __version__
File "/home/alanz/anaconda3/lib/python3.5/site-packages/matplotlib/backends/qt_compat.py", line 137, in <module>
from PyQt4 import QtCore, QtGui
ImportError: cannot import name 'QtCore'

Should we support multiple instrument classes?

The grid_tools.py module currently supports TRES, Reticon, KPNO, SPEX, and SPEX_SXD instrument classes. At present, the instrument classes house the spectral resolution and wavelength range specific to that instrument. In principle, more advanced functionality could be added, like whether to use air or vacuum wavelengths, etc.

The question is whether we want to accept user contributions for instrument classes. Supporting new instrument classes has the advantage of potentially increasing the user base by making it easier to apply the code to data from many instruments. It has the disadvantage that we have to maintain multiple possible configuration modes and other instrument-specific complexities. Users may disagree on instrumental properties, leading to confusion.

I think the advantages outweigh the disadvantages, but I wanted that assumption to be out in the open.

The other question that arises here is how or whether to standardize the classes. I think a new class for every mode sounds fine.

In a few moments, I'll submit a pull request for the IGRINS instrument class as a demonstration. I've broken the class up into H- and K- bands.
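For reference, a hedged sketch of what such a class might look like; the attribute names (fwhm, wl_range, air) are illustrative rather than the exact grid_tools API, and the numbers are placeholders:

class Instrument:
    def __init__(self, name, fwhm, wl_range, air=True):
        self.name = name          # human-readable identifier
        self.fwhm = fwhm          # instrumental broadening FWHM [km/s]
        self.wl_range = wl_range  # (low, high) wavelength coverage [AA]
        self.air = air            # proposed extension: air vs. vacuum

class IGRINS_H(Instrument):
    def __init__(self):
        # placeholder values only; consult the instrument documentation
        super().__init__("IGRINS H-band", fwhm=6.7, wl_range=(14500, 18000), air=False)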

Save intermediate results in MCMC

The problem: Users might want to save intermediate samples of the MCMC in case of catastrophic failure, power outage, etc.

Suggested solution: Define an optional flag that saves an mc.hdf5 file every N samples.
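A minimal sketch of what this could look like, assuming an emcee-style sampler object (the interval and filename are illustrative):

import h5py

def run_with_checkpoints(sampler, p0, nsamples, save_every=100, fname="mc.hdf5"):
    # Periodically dump the chain so a crash loses at most save_every samples.
    for i, _ in enumerate(sampler.sample(p0, iterations=nsamples)):
        if (i + 1) % save_every == 0:
            with h5py.File(fname, "w") as f:
                f.create_dataset("samples", data=sampler.chain)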

"Step" or "jump" size tuning

The problem: The MCMC process requires the assignment of "step" or "jump" sizes in the config.yaml file. The step size will affect the acceptance fraction and convergence timescale. Admittedly, I do not have much experience tuning multidimensional correlated MCMC step sizes-- I've been spoiled by emcee, which automagically figures out step sizes and correlations among parameters. My Starfish chains on the other hand have correlated parameters, as evinced by the characteristic flicker noise I see in the chains for some (but not all) parameters.
There appears to be some indication of functionality to grapple with these issues. There is a use_cov option that calculates the covariance of the parameters based on the previous MCMC samples:

# Updating specific covariances to speed mixing
if Starfish.config.get("use_cov", None):
    # Use an empirically determined covariance matrix for the jumps.
    pass

(clipped from ~ line 130 in parallel.py)

There's also the "optimal jumps" functionality, which can be accessed through the --cov argument to chain.py. So for example:

>_ chain.py --files ../data/mc.hdf5 --cov

[...]
'Optimal' jumps
[  7.42251374e+01   7.74280007e-02   9.70897643e-02   9.52720196e-01
   1.02379765e+00   1.15449469e-03]

Suggested solution:
Some of the solution here is that users (myself included) need to read into some of the finer details of MCMC. But the presence of the use_cov and optimal jumps functionality suggests that there are ways to cope with these issues with existing code. Has the use_cov functionality been implemented in the past? What is the best strategy here?

Automatically mask locations of anticipated outliers

The problem: It's conceivable that in some situations, the positions of residual spectrum outliers might be known.

Suggested solution: As it stands, the user can manually edit the regions.json file and/or phi.json file to contain the metadata for the lines. One way to make this even easier would be a small script that takes in a .csv file and formats that info into a regions.json file.

Note-- this is mentioned in Czekala et al. 2015 in the context of setting priors on the locations of local kernels.
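A minimal sketch of such a script, assuming a CSV with columns mu, amplitude, and sigma (one row per line); the column names and JSON layout here are illustrative, not the exact regions.json schema:

import csv
import json

def csv_to_regions(csv_path, json_path):
    regions = []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            regions.append({"mu": float(row["mu"]),
                            "a": float(row["amplitude"]),
                            "sigma": float(row["sigma"])})
    with open(json_path, "w") as f:
        json.dump({"regions": regions}, f, indent=2)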

Strategy for adding more model parameters within the blocked Gibbs Sampling framework by adding more blocks

I had an idea while watching a video on Gibbs sampling (the bit beginning at roughly 1h09m): Maybe we can adapt the Gibbs sampler to deal with correlated parameters after all. I'm not sure how to implement it, but the main procedure would be something like this:

  1. Update the ~6 nuisance parameters, holding everything else fixed
  2. Update the ~6 traditional stellar parameters (à-la Czekala et al. 2015)
  3. (new) Update whatever new stellar model parameters you've added (e.g. veiling, starspots, binary, whatever)

My guess is that this three-level blocked Gibbs sampler would outperform a two-level Gibbs with the new stellar parameters included in block 2.

For example, I witnessed poor convergence when I attempted to fit a mixture model with ~8 stellar parameters (3 of which were strongly correlated) in block 2, as discussed in detail in Issue #35. That tension led me to a major departure from the Gibbs sampler, in which I run emcee to sample all 14 stellar+nuisance parameters simultaneously, but with the major limitation that I had to chunk by spectral order, thereby deriving multiple independent sets of stellar posteriors for ~50+ spectral orders. The extension to the Gibbs framework described here, if it can be implemented, would return to the much better situation of having a single posterior that is consistent with all the data.

This seems obvious in hindsight, so it must be a good idea, right?

logOmega might be confusing

The problem: Currently the code adjusts the absolute mean-level normalization of the model stellar spectrum with a logOmega term. This term may be confusing for users possessing flux-normalized spectra.

Suggested solution: Since some users will have flux calibrated spectra, and will take advantage of the logOmega functionality, I think it's best to leave the behavior unchanged. However, a demo or tutorial about setting logOmega and the c(0) term might stave off some problems for users down the line. Some of this has been accomplished in the recent docs examples, so this might need no further action.

Pedantic Side note--
There's some tacit rationale to the logOmega term. If you have a flux calibrated spectrum, it's probably from a low resolution single order spectrograph observed through either a large slit or no slit. (Right?) In the unlikely situation that you have multiple spectral orders, the code assumes the final order is the correctly flux calibrated spectrum. All other orders are allowed to float, based on the c^(0) term. Hopefully these are all ~1.0 in that case! It's conceivable that the user might want to pick which order to assign as the flux-calibrated order, so that all the other c^(0)'s are less than 1. Since this multi-order-flux-calibrated use case is unlikely, this probably won't be a problem.

In `utils.py`, the method `estimate_covariance()` relies on argparser that does not exist.

I think this is a bug. From line ~289 of utils.py:

def estimate_covariance(flatchain, base):

    if args.ndim:
        d = args.ndim
    else:
        d = flatchain.shape[1]

But args is not accessible from utils.py.

The estimate_covariance() method is called from chain.py line ~90:

if args.cov:
    assert len(flatchainList) == 1, "If estimating covariance, only specify one flatchain"
    utils.estimate_covariance(flatchainList[0], base=args.outdir)

I think a fair solution here would be the following edits:

def estimate_covariance(flatchain, base, ndim=0):

    if ndim == 0:
        d = flatchain.shape[1]
    else:
        d = ndim

and then:

if args.cov:
    assert len(flatchainList) == 1, "If estimating covariance, only specify one flatchain"
    utils.estimate_covariance(flatchainList[0], base=args.outdir, ndim=args.ndim)

which achieves the desired behavior: when called without arguments the covariance is computed over all dimensions.

air=True in grid_tools causing weird wavelength bug

Steps to reproduce:

import numpy as np
import Starfish
from Starfish.grid_tools import PHOENIXGridInterfaceNoAlpha

grid = PHOENIXGridInterfaceNoAlpha(wl_range=[900, 50000], air=False)
wl = grid.wl

print("Air is {}: Maximum change in wavelength {:.3f} AA".format(False, np.max(np.diff(wl))))
print("Air is {}: Minimum change in wavelength {:.3f} AA".format(False, np.min(np.diff(wl))))
print()

grid = PHOENIXGridInterfaceNoAlpha(wl_range=[900, 50000], air=True)
wl = grid.wl

print("Air is {}: Maximum change in wavelength {:.3f} AA".format(True, np.max(np.diff(wl))))
print("Air is {}: Minimum change in wavelength {:.3f} AA".format(True, np.min(np.diff(wl))))

Output

Air is False: Maximum change in wavelength 0.225 AA
Air is False: Minimum change in wavelength 0.006 AA

Air is True: Maximum change in wavelength 743.863 AA
Air is True: Minimum change in wavelength -1293.716 AA

Obviously, the wavelength grid should be monotonic. I'm not really sure what's causing this, but it seems to be related to applying the vacuum_to_air method to a larger range of wavelengths than just the optical.

Adapting Starfish to IGRINS data: Masking NaN's in flux and sigma arrays in the OptimizeTheta step.

One barrier to adapting Starfish to IGRINS data is that IGRINS data has lots of NaN's, Infs, and -Infs in the flux and sigma arrays. The large number of NaN's stems from the unequal number of pixels per order.

In principle there is a masking array ("masks") one could use to deal with NaN's correctly. The 'masks' array is not being used when I try to optimize the Theta and Cheb parameters.

For now I have manually edited the self.fl and self.sigmas in the initialize step of the Order class. My edit is a complete hack: I replace the flux values with 1.0 and the sigma values with 99, so that these regions are downweighted to insignificance. This hack relies on flattened spectra, so that the mean is close to 1.0.

A better solution would be to take advantage of the fact that most of the NaN's are at the ends of the orders. So ideally one would trim the length of the array.
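A minimal sketch of that trimming, assuming 1D wl/fl/sigma arrays for a single order (and that at least one pixel is finite); interior NaNs would still need the mask mechanism:

import numpy as np

def trim_order(wl, fl, sigma):
    # Find the first and last finite pixels and slice everything between.
    good = np.isfinite(fl) & np.isfinite(sigma)
    first = np.argmax(good)
    last = len(good) - np.argmax(good[::-1])
    return wl[first:last], fl[first:last], sigma[first:last]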

Using flux-calibrated models instead of normalized models.

The problem: If we implement the mixture model feature, we will probably need to use flux-calibrated model spectra; in other words, the non-normalized spectra natively produced by the model grid (i.e. PHOENIX provides real flux units for its models). The reason for needing flux-calibrated models should be clear: the relative flux of the mixture model components should scale with different guesses for the effective temperature. There are a few problems that could arise. First, we'll need to toggle the normalization on and off depending on whether you're in mixture-model mode or not (that's easy enough). Second, the PCA procedure in the spectral emulation step might acquire more--or at least different--eigenspectra, since the dominant variance will now come from the absolute flux level and not the spectral lines. (I haven't fully fleshed this out, but I suspect the default of applying normalization is there for a reason.) Lastly, the logOmega term might take on a different meaning, or at least different absolute values.

Suggested solution
Just experiment with it and see how it performs. :)

Create an HDF5 grid at hi-res, NullInstrument.

Hello, I want to create an HDF5 grid at hi-res beginning at 500 A. In the HDF5Creator class, the comments indicate using the NullInstrument; is this an argument? I can't find any class or definition with this name.
What else should I modify to disable the Instrument?
Thanks!

Capitalization-sensitive filenames of filters cause platform-dependent overwriting behavior.

When cloning the Starfish repo to Mac OS X, I found that files in the filters directory already showed as modified in version control, which seemed strange since I had just cloned the repository. It seems that the problem is the following: I.npy and i.npy are seen as identical files by capitalization-insensitive file systems. So one of these files was instantly overwritten, and its contents appeared changed compared to the reference copy.

Should we AutoDoc the Documentation?

The existing documentation and cookbook contains nice written descriptions of how the code works.

I have added some practical examples and figures to the documentation in Pull Request #23. However, every time I perform make html, I find that the new html content differs from the "live" html content in ways I did not expect. Specifically, the pages Grid Tools, Spectrum, Spectral Emulator, and Model contain API info from Autodoc.

As it stands, the new Autodoc API content looks cluttered, making it difficult to follow the descriptive text. I found a way to turn off the API content (by removing sphinx.ext.autodoc from the conf.py file). So I will do this, and then submit the new documentation content from PR #23.

But then the question is, should we have autodoc? I'm fairly agnostic on this point. But in principle, one could arrange it so that the descriptive content is at the top, and then the API class and method info is at the bottom. Or something sane like that.

I'll leave this here in case anyone has feelings either way.

Partial derivatives of model flux with respect to a stellar parameter.

The problem: A user might want to inspect, say, the gravity-sensitivity of a spectral region of interest by plotting the partial derivative of the model flux with respect to log g. Right now I'm accomplishing this by running a model, saving the output spec.json file, and then differencing it with a revised spec.json file.

Suggested solution:
Some of this differencing could be put into a script, and optionally saved inside the spec.json, or some other file. splot.py could be enhanced to include plotting functionality.
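Some of that differencing could look like the following sketch, assuming a callable model(params) -> flux array (a hypothetical interface):

import numpy as np

def partial_derivative(model, params, key="logg", h=0.05):
    # Central finite difference of the model flux w.r.t. one parameter.
    lo, hi = dict(params), dict(params)
    lo[key] -= h
    hi[key] += h
    return (np.asarray(model(hi)) - np.asarray(model(lo))) / (2.0 * h)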

v0.2.0 Roadmap

I think that modest goals for this release are for users to be able to download the package easily, install without problems (on Mac and Linux at least), and successfully run one of the examples. This is also an opportunity to check all of @gully's work into a release, so that users can focus on downloading and installing tagged releases rather than directly from master. Please add more topics to this thread as you think they should be addressed.

  • clean up installation on Mac OSX #25
  • successful tests on travis #19
  • a streamlined cookbook #14 that starts with global kernels for now
  • remove extraneous files that cause conflict #10

Naming scheme `PHOENIXGridInterfaceNoAlpha` prevents consistent import of RawGridInterface

Line 20 of scripts/grid.py defines the grid interfaces by reading in the model grid class from the config.yaml file. I think this solution does not work as it was intended:

mygrid = eval("Starfish.grid_tools." + Starfish.data["grid_name"])()

would evaluate to:

mygrid = Starfish.grid_tools.PHOENIX()

But there is no such class. It should be:

mygrid = Starfish.grid_tools.PHOENIXGridInterface()

Well that's easy enough, you could either

  1. Just label grid name as PHOENIXGridInterface in your config.yaml file, or
  2. Make the change: mygrid = eval("Starfish.grid_tools." + Starfish.data["grid_name"] + "GridInterface")()

Number 1 is not so user-friendly from a choice architecture standpoint. Users want to be able to just write the name of their model grid and not have to remember GridInterface.
Number 2 is better, except that it fails for one scenario: PHOENIXGridInterfaceNoAlpha, in which case the NoAlpha screws things up.

I'm suggesting the following solution:

  1. Put in some try/except or if/then statements to catch the various scenarios. This is probably the Best Way To Go because it would preserve backwards compatibility, at the expense of a few extra lines of code. Seems like a bargain to me.
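A hedged sketch of that approach, using getattr instead of eval and falling back to the bare name when "<name>GridInterface" does not exist:

import Starfish.grid_tools as grid_tools

def resolve_grid(grid_name):
    try:
        cls = getattr(grid_tools, grid_name + "GridInterface")
    except AttributeError:
        # e.g. grid_name == "PHOENIXGridInterfaceNoAlpha"
        cls = getattr(grid_tools, grid_name)
    return cls()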

Has anyone else encountered this problem? I simply hard-coded a fix to it months ago. I'm only now going back and trying to push some of these changes upstream. With all these things, I always feel like I'm missing something, so forgive me if I am. :)

This Issue does touch on a slightly bigger picture problem. Should the repo maintainers support/develop the scripts directory, or are these intended as customizable examples? In either case, I suppose the examples should work for at least one case.

Spectral Emulator Documentation and Examples

While the spectral emulator does a fair amount of heavy lifting inside the code, there isn't much in the documentation about setup and use.

As @gully and @kgullikson88 noted, the documentation between the grid_tools.py and the emulator code is outdated and confusing. 1c6481c is a first effort towards fixing this, but more work is needed, especially regarding the emulator.

Spectra storage format specification

I would like to use this as a discussion point for fleshing out a data format spec for use in Starfish. This also pertains to #21. Spurred on by a recent post heard via @gully, I'd like to devote some time to ironing out this spec sooner rather than later. First, the possible choices (feel free to suggest more):

FITS: While astropy is already a dependency of Starfish, I would like to avoid using the FITS format to store spectra for reading into the code. This is primarily because it seems as though each observatory has a slightly different standard by which they store their data. For many of my spectra, I have been unable to use astropy to read the data reliably and in the end resorted to using Pyraf to convert the FITS file to text.

ASCII: ASCII text files might be a good, simple solution that is easily understood by all and provides a low barrier for users ingesting spectra from new instruments. However, I do worry that future, larger datasets may make this format unwieldy. In particular, it isn't immediately clear to me how we should store echelle data in this format without requiring a separate file for each order.

NPY: Numpy save files. While this certainly would be easy from a Python standpoint, I would like to preserve some degree of cross-language compatibility in the data format, since there are many other routines in other languages that interact with spectra and it would be nice to allow them access. Also, preserving any metadata associated with the spectrum (e.g., barycentric correction) may be difficult.

HDF5: This format is my current preferred format. For the TRES spectra that I have been primarily dealing with, I have the good fortune that all orders contain the same number of pixels, and so I have been storing spectra as 2D arrays with shape (norders, npix). As @gully mentioned in #21, this is unlikely to be the case for all spectrographs. I think I would prefer to move to a ragged array format, whereby there is a separate /00, /01, /02, etc. folder for each order, within which are contained 1D arrays of wl, fl, and sigma. The order names need not necessarily be 0, 1, 2; if there are only a few orders it could also be something more descriptive like blue and red. HDF5 has the added benefit that we can continue to add metadata as we see fit. A minimal sketch of this layout follows.
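The sketch below uses h5py; the group naming and the BCV metadata slot are illustrative, not a fixed spec:

import h5py
import numpy as np

def write_spectrum(fname, orders):
    # orders: list of (wl, fl, sigma) tuples, one per echelle order
    with h5py.File(fname, "w") as f:
        for i, (wl, fl, sigma) in enumerate(orders):
            grp = f.create_group("{:02d}".format(i))
            grp.create_dataset("wl", data=np.asarray(wl))
            grp.create_dataset("fl", data=np.asarray(fl))
            grp.create_dataset("sigma", data=np.asarray(sigma))
        f.attrs["BCV"] = 0.0  # example metadata slot (barycentric correction)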

Masking: on a related note, when should we apply masks to data? If a raw spectrum from the telescope uses NaN or 1e-99 or other numbers to denote bad pixels, should we be propagating these values all the way through the fitting routines? Is it possible to clean these values from the spectrum as we are putting the raw spectrum into one of the above mentioned formats? My hope is that the user could clean these values in regions where it may be obvious (say 200 NaNs in a row at the edge of an order), but also allow a (possibly separate) mechanism by which the user can mask out real regions of the spectrum that cannot be reproduced by the model (e.g., telluric lines, cores of deep absorption lines). This second masking mechanism would likely be iterative, since this is more of a fitting task than a data task.

My current thinking is to continue using HDF5 in the hope that we can make the file format straightforward and easy to maintain, but I am interested in hearing what all of the contributors think!

Argparser in parallel.py prevents importing from Starfish.parallel

The header of parallel.py states the reasoning:

# parallel.py is meant to be run by other modules that import it and use the objects.
# It has an argparser because I think it's the easiest way to consolidate all of the
# parameters into one place, but I'm open to new suggestions.

The one problem I've encountered with the argparser is that I cannot, say, import the Order class from Starfish.parallel, which is useful when inspecting the API in an IPython Notebook for debugging purposes. My workaround was to make a parallel_temp.py and delete the argparser code; then importing works fine. I'm not sure this is a bug, since the vast majority of the time the argparser has the desired behavior.
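One common workaround (a sketch, not necessarily the maintainers' preferred fix) is to build the parser only when the module is run as a script, so that from Starfish.parallel import Order works from a notebook:

import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    # ... register the existing arguments here ...
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()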

problems with setup.py

Hi
After installing Anaconda and all the necessary packages in an environment, I proceeded with the installation, but when I do

$ python setup.py build_ext --inplace
I get this:
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

error: no commands supplied

I get a similar message when running $ sudo python setup.py develop

can you help with this?
thanks, Elisa

"leading minor not positive definite" in Cholesky decomposition for lnprob calculation

Lately I've been hitting a problem in the Cholesky decomposition step during star.py --sample=ThetaPhi

  File "/anaconda/lib/python3.4/site-packages/emcee/sampler.py", line 116, in get_lnprob
    return self.lnprobfn(p, *self.args, **self.kwargs)
  File "/Users/gully/GitHub/Starfish/Starfish/parallel.py", line 696, in lnfunc
    lnp = self.evaluate()
  File "/Users/gully/GitHub/Starfish/Starfish/parallel.py", line 297, in evaluate
    factor, flag = cho_factor(CC)
  File "/anaconda/lib/python3.4/site-packages/scipy/linalg/decomp_cholesky.py", line 132, in cho_factor
    check_finite=check_finite)
  File "/anaconda/lib/python3.4/site-packages/scipy/linalg/decomp_cholesky.py", line 30, in _cholesky
    raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 2-th leading minor not positive definite

At first I thought I had too large a covariance kernel and the fits were going crazy. I looked into it, and I'm guessing it's a subtle type of roundoff error, based on this issue from scikit-learn. The issue and affiliated pull request show how to work around it, I think.

There could be other problems at play, so I am still experimenting. I'll post what I find out.
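For reference, a sketch of the scikit-learn-style workaround: add a small jitter to the diagonal before factoring, escalating if needed (the starting jitter value is a guess):

import numpy as np
from scipy.linalg import cho_factor, LinAlgError

def safe_cho_factor(CC, jitter=1e-10, tries=5):
    # Retry the factorization with progressively larger diagonal jitter.
    for _ in range(tries):
        try:
            return cho_factor(CC + jitter * np.eye(CC.shape[0]))
        except LinAlgError:
            jitter *= 10.0
    raise LinAlgError("matrix not positive definite even with jitter")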

Version 0.3 Roadmap

Version 0.2 is out, so the version 0.3 roadmap is here. 🚀

Simple goals for this release:

  • Air/vacuum instrument class support #57
  • Enhanced terminal output during sampling, and/or options for quiet sampling #41
  • Support for two-component photosphere mixture model #35

Stretch goals for this release:

  • User-defined priors on Theta parameters #32
  • Run from a Docker container or more simply, a guide to selecting gcc versions #25
  • Enhanced Travis CI, perhaps with different platform support, data caching (discussion in #25)
  • Benchmarks or profiling information on various datasets, grids, system architectures, perhaps as part of the documentation.

Housekeeping of folders and files

The repository is currently quite cluttered, which is entirely my fault. Please let me know if anyone objects to the following changes:

  • I propose making an assets directory that will store things like the config.yaml file and other files that will be used to initialize a project directory.
  • I'm conflicted on the data folder. Since the cookbook requires these existing datasets, I think we should make them easy for users to obtain, but it seems unnecessary to have a separate folder just for a couple HDF5 files. We could upload these to figshare and provide a download link in the cookbook. If anyone has any strong feelings on this, please let me know.
  • extern is entirely unused now. This was from when we were wrapping the SuiteSparse library, and it's stayed around as a reminder of the old way of doing the Gibbs update, where we included a loop sampling in the local parameters for an individual line. Now this is done in an independent cycle from the other parameters, which yields nearly the same answer but is much faster and simpler to implement.
  • filters could be moved to assets, but I think the filter profiles themselves will want to be revisited before we tout them as a feature.
  • plots and tex obviously contain the source of the paper. While it's interesting from an open science viewpoint, I'm wondering if these can just be split off into their own repository. I envision most users will just want to read a PDF separately (from the arXiv) and not actually require the tex within the Starfish package.

grid.py --plot hangs and does not make plots

When I try to run grid.py with the --plot option, it gets stuck on the line p.map(plot, par_fluxes) and no plots are created. I am not familiar enough with the multiprocessing package to understand exactly what is supposed to happen here. There are no errors printed until I CTRL-C it. (I let it sit and do nothing for the better part of an hour just in case.) I'm running the latest Astroconda python v3.5.1, and the --create option of grid.py works fine.

Different local kernels

The problem: Not all residual spectrum outliers originate solely from line-strength mismatches. In general, line-width mismatches or line-center shifts will be present in the residual spectrum as non-Gaussian correlated structures. Small mismatches can mostly be absorbed by the global kernel. But large mismatches will have conspicuous P Cygni-type profiles, or double-peaked profiles.

Suggested solution: As noted in Czekala et al. 2015 and elsewhere, different local kernels can be designed and instantiated.

Practical Considerations and Costs:
How will these local kernels be instantiated? Can the instantiation be done automatically? Right now the local kernels are placed with a sigma-clipping strategy. Such a strategy alone would not be able to distinguish between different kernels.

Catalog and interpret spectral line outliers: saving the right metadata

The problem: What's the physical origin of the spectral line outliers? As mentioned in Czekala et al. 2015, the spectral line outliers could be cataloged and interpreted. They could be used to make a semi-empirical spectral atlas, that uses the synthetic spectral models as a baseline.

Suggested solution: One would need to analyze a lot of spectra to actually do this, so it's really more of a research project, not anything that can be done to the code. The one thing that can be done to set the stage is to provide the right metadata that will make this type of project doable. The regions json file is the right step.
A thought-- is the residual line outlier amplitude in pixel units or absolute units? In other words, imagine I go back to catalog all of my region file line positions, widths, and strengths, based on the metadata in the local kernel json files. The line width and the line position have clear physical interpretation. But I think the line amplitude is actually reported in whatever the data units were. So to go from the amplitude listed in the regions file back to the flux units of the original model will require some work. That's fine, and a researcher could do it, but it is potentially a barrier. It might be useful to also encode the amplitude and width as an equivalent width, that can be more easily interpreted.

Verbose and quiet output flags

The problem: The code prints lots of info to standard out. Some users might want more or less info, or different info.

Suggested solution: Provide a flag for verbose or quiet output. The default can be the existing setup, or whatever is deemed optimal.

Strange behavior for the spectral order with fix_c0=True in optimize_Cheb and ChebyshevSpectrum

Background: Since c0 and Omega (solid angle) are degenerate, one order's c0 must be fixed.

Issue: I've been consistently getting poor fits for the Chebyshev polynomial during star.py --optimize=Cheb for the particular spectral order that has fix_c0=True. I think there is something wrong with how fix_c0 is treated. A glance over the relevant packages (listed below) shows that there is some commented-out code. I'm not certain there's a bug here, but something is fishy.

Course of action: refactor/spot-check:
  • ChebyshevSpectrum in spectrum.py
  • update in spectrum.py
  • optimize_Cheb in parallel.py

Priors on stellar parameters

The problem: I suspect that a common use case might resemble the following scenario: a user would like to force the value of a parameter to either a single value or a distribution, for example a mean and variance. Perhaps she has an asteroseismic log g, or a metallicity measured from a companion. Or she might wish to constrain more than one stellar property or calibration parameter. The existing framework currently assumes uniform priors for stellar parameters over the library interval. There exists a "fix_logg" mechanism to fix log g to a single value, but it's labeled as a "durty hack".

Suggested solution:
Add a mechanism for picking among an array of priors, or a user configurable prior. The default prior would remain as stated in Czekala et al. 2015, but through some flag, a user could specify a normal distribution, or whatever.
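A hedged sketch of one possible mechanism; the registry and parameter names are illustrative, not part of the current code:

from scipy import stats

PRIORS = {
    # e.g. an asteroseismic log g constraint; parameters without an
    # entry keep the default uniform prior over the library interval
    "logg": stats.norm(loc=4.3, scale=0.05),
}

def ln_prior(params):
    # Sum the log prior density over all constrained parameters.
    lnp = 0.0
    for key, value in params.items():
        if key in PRIORS:
            lnp += PRIORS[key].logpdf(value)
    return lnp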

Roadmaps for development

This is a discussion stub for organizing the development roadmap. Since we now have the exciting privilege of multiple astronomers interested in the use and development of this package, I think this requires us to more carefully adhere to versioning than we (I--my bad) have been in the past. Right now, the current version is at v0.1.1. I like the scheme put forward in this semantic versioning document.

The question is then, how should we organize development goals for each release? We could use tags (e.g. v0.2.0) to organize issues important to address for the next upcoming releases, and then have one meta-issue that broadly summarizes all of these. This seems like a relatively easy system to start implementing. As these issues are addressed, they would be added to the changelog.md.

Installation problems. sudo change version of python

Hi,

This is more an anaconda problem (I think), but maybe you can help anyway.
When I run

sudo python setup.py develop
Error: Python 3.3 or greater required for Starfish (using 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2])

So, there is something wrong with the version. But, if I run without sudo, this is what I get:

python setup.py develop
running develop
running egg_info
writing dependency_links to Starfish.egg-info/dependency_links.txt
error: Starfish.egg-info/dependency_links.txt: Permission denied

Any ideas how to make sudo run with the right version of Python that I have in my Anaconda environment?

Cheers,
Daniel

LaTeX in plot labels throws exception: missing \textrm{}

In the plot function in utils.py, the line:

labels = [r"$T_\textrm{eff}$ [K]", r"$\log g$ [dex]", r"$Z$ [dex]",
    r"$v_z$ [km/s]", r"$v \sin i$ [km/s]", r"$\log_{10} \Omega$"]

causes an exception on my Mac (OS X 10.10 with a limited LaTeX installation).

Replacing \textrm{eff} with \mathrm{eff} seems to solve the problem.

Unless \textrm{} is preferred for some reason (they have slightly different behaviors for relative/absolute fonts...), I recommend simply using \mathrm{}. If that doesn't work for everyone, we could also do a "try: \textrm, except: \mathrm" fallback.
