sweverett / clustr

Calculates various scaling relations from cluster catalogs.

Python 100.00%

clustr's Issues

SNR flag

I added the SNR flag to cut data with SNR < 9. As a result, 840 clusters are being removed. I am trying to figure out why the RuntimeWarning below is happening.

Removed 9 clusters due to bad_mode flag of <class 'bool'>
Removed 15 clusters due to overlap_r500 flag of <class 'bool'>
Removed 7 clusters due to overlap_r2500 flag of <class 'bool'>
Removed 41 clusters due to edge_r2500 flag of <class 'bool'>
Removed 27 clusters due to overlap_bkgd flag of <class 'bool'>
Removed 52 clusters due to edge_bkgd flag of <class 'bool'>
Removed 39 clusters due to masked flag of <class 'bool'>
/home/paige/anaconda3/lib/python3.7/site-packages/astropy/table/column.py:984: RuntimeWarning: invalid value encountered in less
result = getattr(super(), op)(other)
Removed 840 clusters due to 500_kiloparsecs_SNR flag of <class 'str'>
Removed 0 clusters due to Redshift flag of <class 'str'>

NOTE: Removed counts may be redundant, as some data fail multiple flags.
Accepted 191 data out of 1092

mean x error: 8.974993467153285e+42
mean y error: 4.850821240875912
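
One plausible cause of the RuntimeWarning, offered as a guess rather than a diagnosis: NumPy versions of that era emit "invalid value encountered in less" when `<` is applied to an array that contains NaN. The sketch below uses a made-up `snr` array (not the real catalog) and a hypothetical `cutoff` name, and treats missing values as failing the cut explicitly so that no NaN ever reaches the comparison.

```python
import numpy as np

# Hypothetical SNR values; NaN stands in for clusters with no measurement.
snr = np.array([12.0, 3.5, np.nan, 9.0, 20.1, np.nan])
cutoff = 9.0

# Flag NaNs explicitly, then compare only the finite entries so that
# no NaN is ever passed to the `<` operator.
is_nan = np.isnan(snr)
below = np.zeros(len(snr), dtype=bool)
below[~is_nan] = snr[~is_nan] < cutoff

# Remove a cluster if its SNR is missing or below the cutoff.
remove = is_nan | below
kept = snr[~remove]
print(f"Removed {remove.sum()} clusters, kept {len(kept)}")
```

If many of the 840 removed clusters turn out to have NaN in the SNR column, that would also explain why the count is so large.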

Parameter File Options

To simplify things, the param.config file should set cutoff and range values for optional flags (see #2), which plots are saved to the output file, and default values for certain inputs such as the catalog and variables.

fitter class

Should we use the old file for the fit, or write a new one directly into the Fitter class in clustr.py?

MCMC Chain Input and Verbose

Preferably, the MCMC chain length for the Kelly and Mantz methods should be an input parameter, with the Kelly method defaulting to, say, 5,000 and the Mantz method to 1,000 (NB: even at 1,000, the Mantz method is significantly slower). If possible, interface with linmix and lrgs to modify verbosity levels.
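
A minimal sketch of how the per-method defaults could be resolved, assuming the user setting arrives in a dictionary-like config under a hypothetical `chain_length` key. The names `DEFAULT_CHAIN_LENGTH` and `chain_length` are illustrative and not part of CluStR, linmix, or lrgs.

```python
# Defaults taken from the numbers proposed above.
DEFAULT_CHAIN_LENGTH = {"kelly": 5000, "mantz": 1000}

def chain_length(method, config):
    """Return the user-supplied chain length, else the method's default."""
    user_value = config.get("chain_length")  # hypothetical config key
    if user_value is not None:
        return int(user_value)
    return DEFAULT_CHAIN_LENGTH[method]

print(chain_length("kelly", {}))                       # falls back to default
print(chain_length("mantz", {"chain_length": "250"}))  # user override
```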

Keyerror Raised

It seems that the key x_label is not found in the config file when running the following code:

def get_data(self, config, catalog):
        xlabel = fits_label(config['x_label'])
        ylabel = fits_label(config['y_label'])

From my understanding, the xlabel references the fits_label function while taking in as input config['x_label'] as the axis name but I end up with the following error,

KeyError: 'x_label'

Which I found to mean that I'm trying to access a key that is not in the dictionary. I've also tried using the following,

def get_data(self, config, catalog):
        xlabel = fits_label(Config.__getitem__(x_label))
        ylabel = fits_label(Config.__getitem__(y_label))

I now get the following error,

NameError: name 'x_label' is not defined

Essentially, I'm trying to figure out how to obtain x and y for the Data class.
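
The two errors have different causes, which a small sketch can show. It assumes the config behaves like a plain dictionary and uses made-up contents; the real Config class and param.config may differ, and `require` is a hypothetical helper. `config['x_label']` raises KeyError when the loaded file has no `x_label` entry, whereas `x_label` written without quotes is read as a Python variable name, hence the NameError.

```python
# Hypothetical stand-in for a loaded config in which 'x_label' is missing.
config = {"y_label": "r2500_temperature"}

# Listing the keys shows what was actually loaded from the file.
print(sorted(config.keys()))

def require(config, key):
    """Return config[key], or fail with a message naming the missing key."""
    if key not in config:
        raise KeyError(f"'{key}' is missing; available keys: {sorted(config)}")
    return config[key]

ylabel = require(config, "y_label")  # the key is passed as a quoted string

try:
    require(config, "x_label")
except KeyError as err:
    missing_message = str(err)
    print(missing_message)
```

If the printed keys do not include x_label, the fix belongs in the config file (or in how it is loaded) rather than in get_data().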

Flags for Rewrite

I was thinking of making a function that does not depend on terminal arguments but instead relies on the param.config file, so that the user sets the flags they want applied to True there. Since the param.config file contains many boolean values, the function would need a list of flag column labels to reference. If such a column label is set to True in the config file, the function removes all rows containing [row, col] values we don't want included in our analysis.

For example, suppose we have this data frame df:

|Name       | Merger |
| catalog_1 |    0   |
| catalog_2 |    1   |
| catalog_3 |    0   |

0 means not a merger
1 means yes a merger

If the user sets merger=True in param.config, then we want to remove all rows with df["Merger"] == 1. We would then have a new dataframe df containing only the good rows.

|Name       | Merger |
| catalog_1 |    0   |
| catalog_3 |    0   |

I understand how to reference the config file and check if labels are set to True. I've uploaded a naive approach for doing that but I'm having a hard time figuring out how to cut the rows from the data. I've been referencing the master branch for inspiration but I get lost on how mask = np.zeros(len(data), dtype=bool) and mask |= cut is being used in both the create_cuts(data, flags) function and get_data(options) function.
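
A minimal sketch of the masking pattern, using plain NumPy arrays in place of the real catalog. The `columns` and `config` dictionaries are hypothetical stand-ins. The mask starts as all False ("remove nothing"); each enabled flag produces a boolean `cut` that is True for rows to remove; `mask |= cut` is an in-place OR, so a row ends up True if any flag marked it.

```python
import numpy as np

# Toy data matching the example table above.
names = np.array(["catalog_1", "catalog_2", "catalog_3"])
columns = {"Merger": np.array([0, 1, 0])}

# Hypothetical config: flag column label -> whether the user enabled it.
config = {"Merger": True}

# One False per row: nothing is marked for removal yet.
mask = np.zeros(len(names), dtype=bool)

for label, enabled in config.items():
    if enabled:
        cut = columns[label] == 1  # True where the row is flagged
        mask |= cut                # accumulate removals across flags

good_names = names[~mask]  # keep rows that no flag marked
print(good_names)
```

Indexing with `~mask` keeps only the rows whose mask entry is False, which is the "cut the rows" step.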

Cosmology Check

While the cosmological parameters are now possible inputs, it is not explicitly clear to the user that a cosmology is being assumed. Perhaps we should print a message asking whether the assumed cosmology is acceptable, although this should be skippable for scripting purposes.

Using parts of other people's code?

Can we use parts of Brandon Kelly's code for the fitting class we have? Or is that not allowed?

Check Dependencies

CluStR should automatically check for all dependencies and ask the user if they would like them installed through conda or pip, or for the user to do it themselves. Alternatively, the packages could always be included in the directory if small.
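
The detection half of this could be sketched with the standard library alone, assuming Python 3 and top-level package names only. The helper name `missing_packages` and the package list passed to it are illustrative; the real dependency list would come from the project.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Standard-library modules are used here so the example runs anywhere.
missing = missing_packages(["os", "json", "surely_not_a_real_package_xyz"])
if missing:
    print("Missing packages:", ", ".join(missing))
```

Prompting the user and invoking conda or pip would sit on top of this list; that part is left out here.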

Fix plotting aesthetics

Newer versions of matplotlib change the defaults to make the plots look worse - fix this when there is time!

https://matplotlib.org/users/dflt_style_changes.html

Pickle directory!

Check to see if the /pickle directory is being automatically created if needed. Oops!
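
If the directory turns out not to be created, one way to guarantee it is `os.makedirs` with `exist_ok=True` (Python 3.2 or later). The helper name `ensure_dir` is hypothetical, and the example runs inside a temporary directory so that nothing is left behind.

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` (and any parents) if missing; do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as base:
    pickle_dir = ensure_dir(os.path.join(base, "pickle"))
    ensure_dir(pickle_dir)  # a second call is harmless
    created = os.path.isdir(pickle_dir)

print(created)
```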

run_options

I am getting this error for run_options. I'm not sure what run_options was supposed to refer to, but do you know what I should put in the main for this? thanks!

Traceback (most recent call last):
  File "clustr.py", line 256, in <module>
    main()
  File "clustr.py", line 239, in main
    config = Config(config_filename) #(2)
TypeError: __init__() missing 1 required positional argument: 'run_options'

Complete Readme

The repository needs a complete Readme with:

General description of CluStR
User instructions
Example usage, possibly with Jupyter notebook
Possibly a parameter and makefile

Plotlib.py Rework

Compared to the master branch, we need a plotlib.py that works for clustr.py.

Currently, plotlib.py:
- Includes global variables
- Has no classes

clustr.py in the rewrite branch does not:
- Import plotlib
- Make a plot using plotlib

fit function of fitter class won't run

The fit function will not run in the Fitter class. When main contains "fits = Fitter(viable_data, plot_filename)", the only output is "test1", so it runs the init part and then stops.

import numpy as np
import reglib  # regression helpers from this repository (reglib.py)

class Fitter(object):
    def __init__(self, data, plotting_filename):
        self.viable_data = data
        self.plotting_filename = plotting_filename
        print('test1')

    def fit(self):
        print('test2')
        x_obs = self.viable_data[0]
        y_obs = self.viable_data[1]
        x_err = self.viable_data[2]
        y_err = self.viable_data[3]

        # run linmix
        print("Using Kelly Algorithm...")
        kelly_b, kelly_m, kelly_sig = reglib.run_linmix(x_obs, y_obs, x_err, y_err)
        print(kelly_b)

        # use before plotting
        log_x = np.log(x_obs)
        x_piv = np.median(log_x)
        log_y = np.log(y_obs)

        return [log_x-x_piv, log_y, x_err/x_obs, y_err/y_obs, x_piv]
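
A possible explanation, assuming main contains only the line quoted above: constructing Fitter(...) runs the constructor and nothing else, so fit() has to be called explicitly on the returned object. The sketch below uses a stripped-down stand-in class that only records which methods ran; it is not the real Fitter.

```python
class Fitter:
    """Stripped-down stand-in that only records which methods ran."""

    def __init__(self, data, plotting_filename):
        self.viable_data = data
        self.plotting_filename = plotting_filename
        self.calls = ["init"]

    def fit(self):
        self.calls.append("fit")
        return self.viable_data

fits = Fitter([1.0, 2.0], "plot.pdf")  # runs __init__ only
calls_after_construction = list(fits.calls)

result = fits.fit()                    # fit() runs only when called
print(calls_after_construction, fits.calls)
```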

Don't hard-code flags

There's no reason to hard-code the list of possible flags--the parameter configuration file already tells us everything we need to know about new flags at runtime, so hard-coding the flags just makes the library less extensible.

Check Flagged Data Consistency

We should make sure that the flagged data removal methods are working as intended. It would be instructive to plot the flagged data as well as the data that survives the flagging.

getting rid of R

Where is R being used in the code? Is it only in reglib.py, lines 9-16?

Fix incorrect scatter plot legend

In the legend, xpiv needs to be replaced with e^xpiv.

Censored Data

While linmix can handle censored data, this feature has not yet been implemented in CluStR. This will likely be done with a masking array from the inputted catalog.

LRGS Gaussian Plots

There should be an optional plot that displays lrgs's best guess at the Gaussian mixtures, as well as reporting how many Gaussians were used for most plots.

Flag Handling

CluStR should sort any inputted flag to be a boolean, range, or cutoff type automatically and remove the flagged data accordingly. Each flag type needs its own method:

Boolean
Range
Cutoff
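
One way the automatic sorting could work, sketched under the assumption that the type can be inferred from the value the user gives in the config: True/False for a boolean flag, a single number for a cutoff, and a (low, high) pair for a range. The function name `flag_cut` and the convention that a cutoff removes values below it are illustrative choices, not existing CluStR behavior.

```python
import numpy as np

def flag_cut(values, setting):
    """Return a boolean array that is True for rows to remove."""
    values = np.asarray(values)
    # bool must be tested before numbers, since bool is a subclass of int.
    if isinstance(setting, bool):
        if not setting:
            return np.zeros(len(values), dtype=bool)  # flag disabled
        return values.astype(bool)                    # remove flagged rows
    if isinstance(setting, (tuple, list)) and len(setting) == 2:
        low, high = setting
        return (values < low) | (values > high)       # remove outside range
    if isinstance(setting, (int, float)):
        return values < setting                       # remove below cutoff
    raise TypeError(f"Cannot interpret flag setting: {setting!r}")

masked_cut = flag_cut([True, False, False], True)
redshift_cut = flag_cut([0.1, 0.5, 0.9], (0.2, 0.8))
snr_cut = flag_cut([12.0, 3.0, 9.0], 9)
print(masked_cut, redshift_cut, snr_cut)
```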

LRGS Chain Convergence

Not a priority at the moment, but it would be nice to implement automatic convergence tests for the lrgs method, as there have been convergence issues under certain failure modes. See #7.

Make PEP 8 compatible

Let's bring this repository up to the general Python standards by making it PEP 8 compatible! This can be done by running pylint or flake8 on the repository and making the suggested changes.

Complete first-pass run of updated pipeline

It's hard to keep momentum up when trying to finish all the complicated parts all at once - so let's focus on getting the script to run with minimal functionality like loading a catalog and making a plot of the x vs. y we want to fit. We can try to pass it to the linmix fitter as well!

I'm using the milestone feature here; let's plan on closing this issue by then - a week after our MCMC meeting this Wednesday.

File Name Options

While there is a default naming scheme and optional prefix parameter, it would be nice to have a full file name parameter option in param.config. Could use the current scheme as the default if nothing is specified.

LRGS Failure Modes

Whenever there is time, it would be nice to come up with a notebook displaying some of the failure modes of lrgs that we have found, especially the case of large scatter compared to measurement error.

CluStR rewrite!

Now that CluStR is being used by the group again it's time for some updates! Or rather, let's use all that learned Python/astro knowledge of the past 3 years and write a new, simplified, more extensible, and more general code base that wraps Kelly's linmix package for regression on arbitrary columns in FITS catalogs. We'll still need to implement some cluster-specific features but this can be accomplished with a pre-existing config structure such as YAML.

This will also serve as a summer project for Paige (will link once she has an account!)

Here's a TODO list that we will update as needed:

Combine different cluster files into a single clustr.py (@paigemkelly @jjobel)
Setup new environment file, config file, and IO processing (@sweverett)
Make new main() function w/ object-oriented structure
Implement all new classes:
- Config
- Catalog
- Data
- Fitter
Implement new flag structure used by Config in Catalog or Data to apply cuts (@sweverett)

Add richness value checks

There have been some issues with handling bad cluster data with negative richness values - should add a check for this in get_data(). Can think of other common covariate checks to add as well.
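
A minimal sketch of such a check, using a made-up `richness` array rather than the real catalog column. It treats negative and non-finite values as bad; whether zero should also count as bad is left as an open assumption.

```python
import numpy as np

# Hypothetical richness values, including a negative entry and a NaN.
richness = np.array([35.2, -1.0, 55.0, np.nan, 0.0])

# Non-finite entries are invalid; compare only the finite ones to zero
# so that NaN is never passed to the comparison operator.
finite = np.isfinite(richness)
valid = finite.copy()
valid[finite] = richness[finite] >= 0

n_bad = int((~valid).sum())
print(f"Removed {n_bad} clusters with bad richness values")
good_richness = richness[valid]
```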

Python 3 Compatibility

We should probably convert everything to be Python 3 compatible, just to be forward looking :)

Rewrite branch needs to consolidate code

Right now the rewrite branch has the main code split into two copies - clustr.py and newnewclustr.py. That's no good! We need to consolidate into a single file that we all work off, namely clustr.py (though we can keep some of the new implementations from newnewclustr.py).

Taking a quick look at the two copies, I suggest that we keep the following from each file:

clustr.py:

Config
The additional argparse arguments in the ArgumentParser class, but move them into the correct format in newnewclustr.py
E(z) function (this is a cosmological function)
The get_data() function in the Catalog class (which we will co-opt into the new Data() constructor)

newnewclustr.py:

The new parser structure (at the top of the file)
main() function, as it's using the new OO design
Catalog, as we're restructuring it in the new design

Remember that we're not going to 'lose' anything by consolidating - the main code is still in the master branch! We can still reuse things from the main code if we find them helpful, but we're also not forced to use it.

to do

put the 3 masks into one
move the list of allowed flags (boolean, range, snr) into the config file and access it from there
reformat the config file to yaml format
add at the end of each flag statement something that removes the NaNs
double check error handling with Arya's code
check for mistake that is making b, m, sigma different
make a True/False for boolean flags

Switch flag type to enum

Currently, we compare flag types to strings, which is somewhat error-prone. Once we implement #12, we should switch over to using Enums, which are cleaner and less error-prone.
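
A sketch of what the switch might look like with the standard-library `enum` module. The class name `FlagType`, its members, and the string values are assumptions for illustration; the actual flag type names in the config may differ.

```python
from enum import Enum

class FlagType(Enum):
    BOOLEAN = "bool"
    CUTOFF = "cutoff"
    RANGE = "range"

def parse_flag_type(text):
    """Convert a config string to a FlagType, failing loudly on typos."""
    return FlagType(text.strip().lower())

flag_type = parse_flag_type(" Range ")
print(flag_type is FlagType.RANGE)

try:
    parse_flag_type("rnage")  # a misspelling is caught at parse time
except ValueError as err:
    typo_error = str(err)
    print(typo_error)
```

With plain strings, a misspelled comparison such as `flag_type == 'rnage'` silently evaluates to False; with an Enum the typo surfaces as an error where the config is read.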

Use warnings module

We should use the warnings module to raise our warnings.
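
For reference, a minimal sketch of issuing a warning through the standard-library `warnings` module instead of `print`. The function `check_chain_length` and its threshold are hypothetical examples, not existing CluStR code.

```python
import warnings

def check_chain_length(n_steps, minimum=1000):
    """Warn (without stopping the run) if the chain looks too short."""
    if n_steps < minimum:
        warnings.warn(
            f"Chain length {n_steps} is below {minimum}; results may not converge.",
            UserWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
    return n_steps

# Capture warnings here only to show that exactly one was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_chain_length(100)

print(len(caught), caught[0].category.__name__)
```

Unlike printed messages, warnings issued this way can be filtered, silenced, or turned into errors by the user.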

plotting

Should we build off the plotlib.py code you already wrote, or start fresh with the plotting? Jose and I have only done basic plots, so we were going to take a class or something to figure out how to do that.

Residual Plots

It would be nice for one of the plot options to be a series of residual plots. May add options later, but for now will plot all options.

x axis wrong?

In the plot that Lena made on the plane, the lambda values of at least a couple of clusters seem to be wrong. Most noticeable are two clusters with Tx>10 which appear in the plot with lambda ~ 35, but in the catalog actually have lambda ~ 55.
Noner2500_temperature-lambda.pdf
