sweverett / clustr
Calculates various scaling relations from cluster catalogs.
I added the SNR flag to cut data with SNR < 9, and 840 clusters are being removed. I am trying to figure out why the RuntimeWarning is happening.
Removed 9 clusters due to bad_mode flag of <class 'bool'>
Removed 15 clusters due to overlap_r500 flag of <class 'bool'>
Removed 7 clusters due to overlap_r2500 flag of <class 'bool'>
Removed 41 clusters due to edge_r2500 flag of <class 'bool'>
Removed 27 clusters due to overlap_bkgd flag of <class 'bool'>
Removed 52 clusters due to edge_bkgd flag of <class 'bool'>
Removed 39 clusters due to masked flag of <class 'bool'>
/home/paige/anaconda3/lib/python3.7/site-packages/astropy/table/column.py:984: RuntimeWarning: invalid value encountered in less
  result = getattr(super(), op)(other)
Removed 840 clusters due to 500_kiloparsecs_SNR flag of <class 'str'>
Removed 0 clusters due to Redshift flag of <class 'str'>
NOTE: Removed counts may be redundant, as some data fail multiple flags.
Accepted 191 data out of 1092
mean x error: 8.974993467153285e+42
mean y error: 4.850821240875912
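One possible cause, assuming the 500_kiloparsecs_SNR column contains NaN entries (the catalog contents are not shown here), is that comparing NaN against the cutoff is what triggers the warning. A NaN never compares as less than anything, so those rows need to be handled explicitly. A minimal sketch with made-up values:

```python
import numpy as np

# Made-up SNR values; NaN stands in for a missing measurement.
snr = np.array([12.0, 3.5, np.nan, 9.0, np.nan])
cutoff = 9.0

# Find the rows that actually have a measurement.
has_value = ~np.isnan(snr)

# Compare only the measured rows, so NaN never reaches the `<` operator.
below_cutoff = np.zeros(len(snr), dtype=bool)
below_cutoff[has_value] = snr[has_value] < cutoff

print("missing SNR:", int((~has_value).sum()))
print("below cutoff:", int(below_cutoff.sum()))
```

Whether rows with a missing SNR should be cut or kept is a separate decision. Counting them separately would at least show how many removals come from each case.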
To simplify things, the param.config file should set cutoff and range values for optional flags (see #2), what plots are to be saved to the output file, and default values for certain inputs such as the catalog and variables.
Should we use the old file for the fit, or write a new fit directly into the Fitter class in clustr.py?
Preferably, the Kelly and Mantz method MCMC chain length should be an input parameter, with the Kelly method defaulting to, say, 5,000 and the Mantz method to 1,000 (NB: even at 1,000, the Mantz method is significantly slower). If possible, interface with linmix and lrgs to modify verbose levels.
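A rough sketch of how the defaults could be resolved, assuming the config is available as a dictionary. The key names (kelly_chain_length, mantz_chain_length) and the helper are hypothetical, not existing CluStR options:

```python
# Hypothetical defaults taken from the numbers suggested above.
DEFAULT_CHAIN_LENGTH = {"kelly": 5000, "mantz": 1000}

def chain_length(config, method):
    """Return the MCMC chain length for `method`, falling back to a default."""
    key = f"{method}_chain_length"  # hypothetical config key name
    value = config.get(key)
    if value is None:
        return DEFAULT_CHAIN_LENGTH[method]
    value = int(value)
    if value <= 0:
        raise ValueError(f"{key} must be positive, got {value}")
    return value

print(chain_length({}, "kelly"))
print(chain_length({"mantz_chain_length": 2500}, "mantz"))
```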
It seems that the key x_label is not found in the config file when running the following code:
def get_data(self, config, catalog):
    xlabel = fits_label(config['x_label'])
    ylabel = fits_label(config['y_label'])
From my understanding, xlabel is set by calling the fits_label function with config['x_label'] (the axis name) as input, but I end up with the following error:
KeyError: 'x_label'
I understand this to mean that I'm trying to access a key that is not in the dictionary. I've also tried the following:
def get_data(self, config, catalog):
    xlabel = fits_label(Config.__getitem__(x_label))
    ylabel = fits_label(Config.__getitem__(y_label))
I now get the following error,
NameError: name 'x_label' is not defined
Essentially, I'm trying to figure out how to obtain x and y for the Data class.
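A KeyError means the name is missing from whatever mapping config wraps. The second attempt fails differently because x_label without quotes is read as a variable name rather than a string. A small sketch using a plain dict as a stand-in for the Config object (the real class and its keys may differ, and require is a hypothetical helper):

```python
# Stand-in for the loaded config; the real Config class may behave differently.
config = {"x": "lambda", "y": "r2500_temperature"}

def require(config, key):
    """Look up `key`, listing the available keys if it is missing."""
    try:
        return config[key]
    except KeyError:
        available = ", ".join(sorted(config))
        raise KeyError(f"{key!r} not in config; available keys: {available}") from None

print(require(config, "x"))      # key given as a string, with quotes
try:
    require(config, "x_label")   # not present in this stand-in config
except KeyError as err:
    print(err)
```

Printing the available keys this way is a quick check of what the config file actually defines before get_data() tries to use it.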
I was thinking of making a function that does not depend on terminal arguments but instead relies on the param.config file, such that the user sets the flags they want applied to True in the param.config file. Since the param.config file contains many boolean values, the function would need a list of flag column labels to reference. If such a column label is set to True in the config file, then the function proceeds to remove all rows that contain [row, col] values we don't want included in our analysis.
For example, suppose we have this data frame df:
| Name      | Merger |
| catalog_1 | 0      |
| catalog_2 | 1      |
| catalog_3 | 0      |
0 means not a merger
1 means a merger
If the user sets merger=True in the param.config, then we want to remove all rows with df["merger"] == 1. We would then have a new dataframe df of only good rows.
| Name      | Merger |
| catalog_1 | 0      |
| catalog_3 | 0      |
I understand how to reference the config file and check if labels are set to True. I've uploaded a naive approach for doing that, but I'm having a hard time figuring out how to cut the rows from the data. I've been referencing the master branch for inspiration, but I get lost on how mask = np.zeros(len(data), dtype=bool) and mask |= cut are being used in both the create_cuts(data, flags) function and the get_data(options) function.
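As I read it, mask starts as all False (nothing flagged), and each mask |= cut switches on the rows that one flag wants removed, so a row ends up True if any flag caught it. The surviving rows are then data[~mask]. A minimal sketch of that pattern on the table above, using plain NumPy arrays rather than the functions in master (the names and the flags layout here are illustrative):

```python
import numpy as np

names = np.array(["catalog_1", "catalog_2", "catalog_3"])
columns = {"Merger": np.array([0, 1, 0])}

# Flags the user switched on in param.config (hypothetical layout).
flags = {"Merger": True}

# Start with nothing flagged for removal.
mask = np.zeros(len(names), dtype=bool)

for label, enabled in flags.items():
    if enabled:
        cut = columns[label] == 1   # True where this flag says "remove"
        mask |= cut                 # accumulate: removed if any flag fires

good_names = names[~mask]           # keep only the unflagged rows
print(good_names)
```

Because the cuts are OR-ed together, a row that fails several flags is still removed only once, which is also why the per-flag "Removed" counts can add up to more than the total.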
While the cosmological parameters are now possible inputs, it is not explicitly clear to the user that a cosmology is being assumed. Perhaps we should print a message asking whether the assumed cosmology is acceptable, although this should be skippable for scripting purposes.
Can we use parts of Brandon Kelly's code for the fitting class we have, or is that not allowed?
CluStR should automatically check for all dependencies and ask the user if they would like them installed through conda or pip, or for the user to do it themselves. Alternatively, the packages could always be included in the directory if small.
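One way the check itself could look, using only the standard library. The package list is illustrative (the actual dependencies may differ), missing_packages is a hypothetical helper, and the sketch only reports what is missing rather than installing anything:

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Illustrative list; the actual dependencies may differ.
required = ["numpy", "astropy", "matplotlib", "linmix"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install them with conda or pip, then rerun.")
```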
Newer versions of matplotlib change the defaults to make the plots look worse - fix this when there is time!
Check to see if the /pickle directory is being automatically created if needed. Oops!
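If it turns out the directory is not being created, something along these lines would do it. This is a sketch with a hypothetical ensure_dir helper, shown against a temporary location rather than the real working directory:

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` (and any parents) if it does not already exist."""
    os.makedirs(path, exist_ok=True)
    return path

# Demonstrated in a temporary location instead of the real working directory.
base = tempfile.mkdtemp()
pickle_dir = ensure_dir(os.path.join(base, "pickle"))
ensure_dir(pickle_dir)  # calling it again is harmless
print(os.path.isdir(pickle_dir))
```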
I am getting this error for run_options. I'm not sure what run_options was supposed to refer to, but do you know what I should put in the main for this? thanks!
Traceback (most recent call last):
  File "clustr.py", line 256, in <module>
    main()
  File "clustr.py", line 239, in main
    config = Config(config_filename) #(2)
TypeError: __init__() missing 1 required positional argument: 'run_options'
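The error says the Config constructor is declared with two arguments but is being called with one. What run_options should contain depends on the rewrite's design, which isn't shown here. The sketch below assumes it is the parsed command-line arguments, which is only a guess, and the Config class in it is a toy stand-in:

```python
import argparse

class Config:
    """Toy stand-in for the real Config class."""
    def __init__(self, config_filename, run_options):
        self.filename = config_filename
        self.run_options = run_options

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("config_file")
    run_options = parser.parse_args(argv)
    # Pass both arguments that __init__ asks for.
    return Config(run_options.config_file, run_options)

config = main(["param.config"])
print(config.filename)
```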
The repository needs a complete Readme with:
Compared to the master branch, we need a plotlib.py that works for clustr.py.
Currently, plotlib.py clustr.py in rewrite does not
The fit function will not run in the Fitter class. When the main contains "fits = Fitter(viable_data, plot_filename)", the only output is "test1", so it runs the init part and then stops.
import numpy as np
import reglib  # regression helpers from this repository

class Fitter(object):
    def __init__(self, data, plotting_filename):
        self.viable_data = data
        self.plotting_filename = plotting_filename
        print('test1')

    def fit(self):
        print('test2')
        # unpack the observed values and their errors
        x_obs = self.viable_data[0]
        y_obs = self.viable_data[1]
        x_err = self.viable_data[2]
        y_err = self.viable_data[3]
        # run linmix
        print("Using Kelly Algorithm...")
        kelly_b, kelly_m, kelly_sig = reglib.run_linmix(x_obs, y_obs, x_err, y_err)
        print(kelly_b)
        # use before plotting
        log_x = np.log(x_obs)
        x_piv = np.median(log_x)
        log_y = np.log(y_obs)
        return [log_x-x_piv, log_y, x_err/x_obs, y_err/y_obs, x_piv]
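Creating the object only runs the constructor. fit() is an ordinary method and has to be called explicitly on the instance. A stripped-down stand-in for the class above (no linmix involved, and the calls attribute exists only for this demonstration):

```python
class Fitter(object):
    """Stripped-down stand-in for the class above."""
    def __init__(self, data):
        self.viable_data = data
        self.calls = ["init"]

    def fit(self):
        self.calls.append("fit")
        return self.viable_data

fits = Fitter([1.0, 2.0])   # only __init__ runs here
print(fits.calls)
result = fits.fit()         # fit() runs only when called explicitly
print(fits.calls)
```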
There's no reason to hard-code the list of possible flags. The parameter configuration file already tells us everything we need to know about new flags at runtime, so hard-coding the flags just makes the library less extensible.
We should make sure that the flagged data removal methods are working as intended. It would be instructive to make plots of the flagged data as well as the data that survives the flagging.
Where is R being used in the code? Is it only in reglib.py lines 9-16?
In the legend, xpiv needs to be replaced with e^xpiv.
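Assuming the pivot is stored as a natural log (as in the median of log(x) used for the fit), the conversion for the legend is just an exponential. The value and label format below are made up for illustration:

```python
import math

x_piv = math.log(40.0)   # made-up pivot, stored as a natural log

# Undo the log so the legend shows the pivot in the units of x.
pivot = math.exp(x_piv)
legend_label = "x_piv = {:.1f}".format(pivot)
print(legend_label)
```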
While linmix can handle censored data, this feature has not yet been implemented in CluStR. This will likely be done with a masking array from the inputted catalog.
There should be an optional plot that displays lrgs's best guess at the Gaussian mixtures, as well as reporting how many Gaussians were used for most plots.
CluStR should sort any inputted flag to be a boolean, range, or cutoff type automatically and remove the flagged data accordingly. Each flag type needs its own method:
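A possible shape for the sorting step, under the assumption that the config gives a bool for boolean flags, a (low, high) pair for range flags, and a single number for cutoff flags. That encoding, the function names, and the "remove values below the cutoff" convention are all hypothetical:

```python
def flag_type(value):
    """Classify a config value as 'boolean', 'range', or 'cutoff'."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (tuple, list)) and len(value) == 2:
        return "range"
    if isinstance(value, (int, float)):
        return "cutoff"
    raise ValueError("unrecognized flag value: {!r}".format(value))

def is_removed(kind, value, x):
    """Return True if datum `x` should be removed under this flag."""
    if kind == "boolean":
        return bool(x) and value         # remove flagged rows when the flag is on
    if kind == "range":
        low, high = value
        return not (low <= x <= high)    # remove anything outside the range
    return x < value                     # cutoff (assumed): remove values below it

# Made-up flag settings for illustration.
flags = {"bad_mode": True, "Redshift": (0.1, 0.7), "500_kiloparsecs_SNR": 9}
for name, value in flags.items():
    print(name, flag_type(value))
```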
Not a priority at the moment - but it would be nice to implement automatic convergence tests for the lrgs method, as there have been convergence issues under certain failure modes. See #7.
It's hard to keep momentum up when trying to finish all the complicated parts all at once - so let's focus on getting the script to run with minimal functionality like loading a catalog and making a plot of the x vs. y we want to fit. We can try to pass it to the linmix fitter as well!
I'm using the milestone feature here; let's plan on closing this issue by then - a week after our MCMC meeting this Wednesday.
While there is a default naming scheme and optional prefix parameter, it would be nice to have a full file name parameter option in param.config. Could use the current scheme as the default if nothing is specified.
Whenever there is time, it would be nice to come up with a notebook displaying some of the failure modes of lrgs that we have found, especially the case of large scatter compared to measurement error.
Now that CluStR is being used by the group again it's time for some updates! Or rather, let's use all that learned Python/astro knowledge of the past 3 years and write a new, simplified, more extensible, and more general code base that wraps Kelly's linmix package for regression on arbitrary columns in FITS catalogs. We'll still need to implement some cluster-specific features but this can be accomplished with a pre-existing config structure such as yaml.
This will also serve as a summer project for Paige (will link once she has an account!)
Here's a TODO list that we will update as needed:
- clustr.py (@paigemkelly @jjobel)
  - main() function w/ object-oriented structure
  - Config
  - Catalog
  - Data
  - Fitter
- Config in Catalog or Data to apply cuts (@sweverett)
There have been some issues with handling bad cluster data with negative richness values - should add a check for this in get_data(). Can think of other common covariate checks to add as well.
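A sketch of what such a check might look like, independent of how get_data() is actually written. The function name and the richness values are made up, and treating non-finite entries as bad is an added assumption:

```python
import numpy as np

def bad_richness(values):
    """Return a boolean array marking entries that are negative or not finite."""
    values = np.asarray(values, dtype=float)
    finite = np.isfinite(values)
    bad = ~finite
    # Only compare the finite entries, so NaN never enters the comparison.
    bad[finite] = values[finite] < 0
    return bad

richness = [35.2, -1.0, 55.0, float("nan")]
bad = bad_richness(richness)
print("rows failing the richness check:", np.flatnonzero(bad).tolist())
```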
We should probably convert everything to be Python 3 compatible, just to be forward looking :)
Right now the rewrite branch has the main code split into two copies - clustr.py and newnewclustr.py. That's no good! We need to consolidate into a single file that we all work off, namely clustr.py (though we can keep some of the new implementations from newnewclustr.py).
Taking a quick look at the two copies, I suggest that we keep the following from each file:
clustr.py:
- Config
- argparse arguments in the ArgumentParser class, but move them into the correct format in newnewclustr.py
- get_data() function in the Catalog class (which we will co-opt into the new Data() constructor)
newnewclustr.py:
- parser structure (at the top of the file)
- main() function, as it's using the new OO design
- Catalog, as we're restructuring it in the new design
Remember that we're not going to 'lose' anything by consolidating - the main code is still in the master branch! We can still reuse things from the main code if we find them helpful, but we're also not forced to use it.
We should use the warnings module to raise our warnings.
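For reference, a minimal sketch of the pattern with the standard warnings module. The function name and message text are made up, and the catch_warnings block is only there to show the warning being captured:

```python
import warnings

def report_removed(n_removed, flag):
    """Warn (rather than print) when a flag removes data."""
    if n_removed > 0:
        warnings.warn(
            "Removed {} clusters due to {} flag".format(n_removed, flag),
            UserWarning,
            stacklevel=2,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_removed(9, "bad_mode")

print(caught[0].message)
```

One benefit over plain print calls is that users can filter or silence these messages with warnings.simplefilter when scripting.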
Should we build off the plotlib.py code you already wrote, or start fresh with the plotting? Jose and I have only done basic plots, so we were gonna take a class or something to figure out how to do that.
It would be nice for one of the plot options to be a series of residual plots. May add options later, but for now will plot all options.
In the plot that Lena made on the plane, the lambda values of at least a couple of clusters seem to be wrong. Most noticeable are two clusters with Tx>10 which appear in the plot with lambda ~ 35, but in the catalog actually have lambda ~ 55.
Noner2500_temperature-lambda.pdf