swarchal / morar
Processing phenotypic screening data
i.e.
data = morar.DataFrame(data, metadata_prefix="Image_Metadata")
It should be possible to pass a string or a list of strings naming additional column prefixes that are classed as metadata, e.g. extra=["Nuclei", "Cells"].
No idea how to implement this, but it would be nice.
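One possible approach, as a minimal sketch: the helper `select_metadata_columns` and its `extra` parameter are hypothetical and not part of morar. It relies on `str.startswith` accepting a tuple of prefixes.

```python
def select_metadata_columns(columns, metadata_prefix="Metadata", extra=None):
    """Return the column names treated as metadata.

    Hypothetical helper: `extra` accepts a string or a list of strings
    naming additional prefixes.
    """
    if extra is None:
        extra = []
    elif isinstance(extra, str):
        extra = [extra]
    # str.startswith accepts a tuple and matches any of its entries
    prefixes = tuple([metadata_prefix, *extra])
    return [col for col in columns if col.startswith(prefixes)]


columns = ["Image_Metadata_well", "Nuclei_Area", "Cells_Area", "Cytoplasm_Area"]
metadata_cols = select_metadata_columns(
    columns, metadata_prefix="Image_Metadata", extra=["Nuclei", "Cells"]
)
```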
Read in each object.csv file and create median aggregates by ImageNumber.
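A rough sketch of the aggregation step with pandas. The column names other than ImageNumber and all values are made up for illustration, and an in-memory string stands in for an object.csv file.

```python
import io

import pandas as pd

# Stand-in for the contents of one object-level CSV file (made-up values)
nuclei_csv = io.StringIO(
    "ImageNumber,ObjectNumber,Area,Intensity\n"
    "1,1,10,0.5\n"
    "1,2,20,0.7\n"
    "1,3,30,0.9\n"
    "2,1,40,0.2\n"
    "2,2,60,0.4\n"
)

objects = pd.read_csv(nuclei_csv)

# Collapse object-level rows to one median row per image
per_image = (
    objects.drop(columns="ObjectNumber")
    .groupby("ImageNumber")
    .median()
)
```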
If files are large:
Should be able to pass an argument to utils.get_metadata and utils.get_featuredata.
TypeError: drop() takes 1 positional argument but 2 were given
Calculate PCA, find outliers in PCA space and return indices which can then be dropped in the original dataset.
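A minimal sketch of how this could look using sklearn.decomposition.PCA. The function `find_pca_outliers`, its parameters and the distance-based outlier rule (median plus a multiple of the MAD) are assumptions chosen for illustration, not an existing morar function.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def find_pca_outliers(featuredata, n_components=2, threshold=3.0):
    """Return index labels of rows lying far from the centre of PCA space.

    Hypothetical helper: a row is flagged when its distance from the origin
    of the PCA scores exceeds median + threshold * MAD of all distances.
    """
    scores = PCA(n_components=n_components).fit_transform(featuredata.values)
    distances = np.linalg.norm(scores, axis=1)
    median = np.median(distances)
    mad = np.median(np.abs(distances - median))
    is_outlier = distances > median + threshold * mad
    return featuredata.index[is_outlier]


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
data.loc[50] = [25.0, 25.0, 25.0, 25.0]  # one obvious outlier

outliers = find_pca_outliers(data)
cleaned = data.drop(index=outliers)  # drop flagged rows from the original data
```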
At the moment it will miss columns such as Image_Metadata_compound if the prefix is set to "Metadata".
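To illustrate the problem, assuming the current check behaves like a startswith test (this is a guess, not confirmed from the morar source): matching the prefix as a whole underscore-separated token would also catch Image_Metadata_compound. The helper `is_metadata_column` is hypothetical.

```python
def is_metadata_column(column, metadata_prefix="Metadata"):
    """Hypothetical check: match the prefix as a whole underscore-separated token."""
    return metadata_prefix in column.split("_")


columns = ["Metadata_well", "Image_Metadata_compound", "Cells_Area"]

# startswith-style matching misses the Image_ column
starts_with = [c for c in columns if c.startswith("Metadata")]

# token matching finds both metadata columns
token_match = [c for c in columns if is_metadata_column(c)]
```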
tests/test_utils.py::test_impute
tests/test_utils.py::test_impute_mean
tests/test_utils.py::test_impute_with_metadata
/home/scott/anaconda3/lib/python3.7/site-packages/sklearn/utils/deprecation.py:66:
DeprecationWarning: Class Imputer is deprecated; Imputer was deprecated in version
0.20 and will be removed in 0.22. Import impute.SimpleImputer from sklearn instead.
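As the warning suggests, sklearn.impute.SimpleImputer is the replacement. A small sketch of mean imputation with it; the example data is made up and this is not the code in morar's utils.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

features = pd.DataFrame({"f1": [1.0, np.nan, 3.0], "f2": [4.0, 6.0, np.nan]})

# SimpleImputer fills missing values column by column
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(
    imputer.fit_transform(features),
    columns=features.columns,
    index=features.index,
)
```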
It would be nice to be able to modify the featuredata or metadata using its attribute.
feature_subset = ["feature_1", "feature_2"]
my_data.featuredata = my_data[feature_subset]
or
my_data.featuredata = my_data.featuredata[feature_subset]
Though at the moment, this produces the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-48-3e99f8310ea1> in <module>()
----> 1 df.featuredata = df.featredata[previous_features]
/home/scott/.local/lib/python3.5/site-packages/pandas/core/generic.py in __getattr__(self, name)
2670 if name in self._info_axis:
2671 return self[name]
-> 2672 return object.__getattribute__(self, name)
2673
2674 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'featredata'
In [49]: df.featuredata = df.featuredata[previous_features]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/scott/.local/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
2702 else:
-> 2703 object.__setattr__(self, name, value)
2704 except (AttributeError, TypeError):
AttributeError: can't set attribute
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-49-1c984f1ae98a> in <module>()
----> 1 df.featuredata = df.featuredata[previous_features]
/home/scott/.local/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
2703 object.__setattr__(self, name, value)
2704 except (AttributeError, TypeError):
-> 2705 object.__setattr__(self, name, value)
2706
2707 # ----------------------------------------------------------------------
AttributeError: can't set attribute
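The "can't set attribute" error is what Python raises for a property that has no setter. A sketch of the idea using a plain wrapper class: `ScreenData` is hypothetical and much simpler than morar's DataFrame subclass, so it only shows how a setter could behave.

```python
import pandas as pd


class ScreenData:
    """Hypothetical wrapper, not morar's actual DataFrame class."""

    def __init__(self, df, metadata_prefix="Metadata"):
        self._df = df
        self.metadata_prefix = metadata_prefix

    def _is_metadata(self, col):
        return col.startswith(self.metadata_prefix)

    @property
    def metadata(self):
        cols = [c for c in self._df.columns if self._is_metadata(c)]
        return self._df[cols]

    @property
    def featuredata(self):
        cols = [c for c in self._df.columns if not self._is_metadata(c)]
        return self._df[cols]

    @featuredata.setter
    def featuredata(self, new_features):
        # Keep the metadata columns and swap in the new feature columns
        self._df = pd.concat([self.metadata, new_features], axis=1)


my_data = ScreenData(
    pd.DataFrame(
        {
            "Metadata_well": ["A01", "A02"],
            "feature_1": [1.0, 2.0],
            "feature_2": [3.0, 4.0],
            "feature_3": [5.0, 6.0],
        }
    )
)
feature_subset = ["feature_1", "feature_2"]
my_data.featuredata = my_data.featuredata[feature_subset]
```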
tests/test_stats.py::test_mad_dataframe_row
/home/scott/code/morar/tests/test_stats.py:38: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
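For reference, a short example of the two replacements named in the warning, on a made-up dataframe (not the data used in test_stats.py).

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

by_label = df.loc["b", "y"]    # label-based: row "b", column "y"
by_position = df.iloc[1, 1]    # position-based: second row, second column
```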
~/anaconda3/lib/python3.7/site-packages/morar/dataframe.py in dropna(self, **kwargs)
96 def dropna(self, **kwargs):
97 """dropna via pandas.DataFrame.dropna"""
---> 98 _check_inplace(kwargs)
99 pandas_df = pd.DataFrame(self)
100 result = pandas_df.dropna(**kwargs)
TypeError: _check_inplace() takes 0 positional arguments but 1 was given
Dataframes with multi-indexed columns are unlikely to convert easily to sqlite tables, so there needs to be some option to flatten the column names (simple paste?).
Ideally would like to automatically flatten columns to store as a database table, and convert back to multi-index columns when reading in as a dataframe from the database.
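A minimal sketch of such a round trip with pandas and the standard-library sqlite3 module. The helpers `flatten_columns` and `restore_columns`, the "__" separator and the table name are all assumptions; the separator must not occur in any column name.

```python
import sqlite3

import pandas as pd

SEP = "__"  # assumed separator; must not appear in any column name


def flatten_columns(df, sep=SEP):
    """Join each MultiIndex column tuple into a single string."""
    flat = df.copy()
    flat.columns = [sep.join(col) for col in df.columns]
    return flat


def restore_columns(df, sep=SEP):
    """Split flattened names back into MultiIndex columns."""
    restored = df.copy()
    restored.columns = pd.MultiIndex.from_tuples(
        [tuple(col.split(sep)) for col in df.columns]
    )
    return restored


columns = pd.MultiIndex.from_tuples(
    [("nuclei", "area"), ("nuclei", "count"), ("cells", "area")]
)
original = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

con = sqlite3.connect(":memory:")
flatten_columns(original).to_sql("screen", con, index=False)
restored = restore_columns(pd.read_sql("SELECT * FROM screen", con))
con.close()
```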
Already imports scikit-learn so might as well use sklearn.decomposition.PCA
Likely to have columns of identical names in different objects. Should try to use MultiIndex dataframes in pandas.
i.e.
         object 1        |        object 2
featuredata | metadata   | featuredata | metadata
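One way to get this layout in pandas, as a sketch: pd.concat with a dict of per-object dataframes produces column-wise MultiIndex labels. The object names and values are made up.

```python
import pandas as pd

nuclei = pd.DataFrame({"area": [10, 12], "well": ["A01", "A02"]})
cells = pd.DataFrame({"area": [50, 55], "well": ["A01", "A02"]})

# The dict keys become the outer column level, so identical names no longer clash
combined = pd.concat({"nuclei": nuclei, "cells": cells}, axis=1)

nuclei_area = combined[("nuclei", "area")]
cells_area = combined[("cells", "area")]
```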
Work is done by splitting the dataframe into groups via .groupby() and looping through each group. It should be possible to do this in parallel to speed things up.
e.g., from Stack Overflow:
import multiprocessing

import pandas as pd
from joblib import Parallel, delayed


def tmpFunc(df):
    # Work on a copy so the original group is not modified
    df = df.copy()
    df['c'] = df.a + df.b
    return df


def applyParallel(dfGrouped, func):
    # Run func on each group in its own worker, then recombine the results
    retLst = Parallel(n_jobs=multiprocessing.cpu_count())(
        delayed(func)(group) for name, group in dfGrouped
    )
    return pd.concat(retLst)


if __name__ == '__main__':
    df = pd.DataFrame({'a': [6, 2, 2], 'b': [4, 5, 6]}, index=['g1', 'g1', 'g2'])
    print('parallel version: ')
    print(applyParallel(df.groupby(df.index), tmpFunc))
    print('regular version: ')
    print(df.groupby(df.index).apply(tmpFunc))
    # ideal version (does not work, groupby objects have no applyParallel method):
    # print(df.groupby(df.index).applyParallel(tmpFunc))