
morar's Issues

Stop using deprecated .ix in tests

tests/test_stats.py::test_mad_dataframe_row
  /home/scott/code/morar/tests/test_stats.py:38: DeprecationWarning: 
  .ix is deprecated. Please use
  .loc for label based indexing or
  .iloc for positional indexing
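
A minimal sketch of the replacement. The call on line 38 of tests/test_stats.py is not shown in this issue, so the frame and the lookups below are made up for illustration:

```python
import pandas as pd

# Hypothetical frame; the real test fixture is not shown in this issue.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Previously something like df.ix["y"] or df.ix[1].
row_by_label = df.loc["y"]       # label-based indexing
row_by_position = df.iloc[1]     # positional indexing

# Mixing a row label with a column position needs an explicit conversion,
# because .loc only accepts labels.
value = df.loc["y", df.columns[1]]
```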

Add extra to utils.img_to_metadata

Want to be able to pass a string or a list of strings as an argument giving extra column prefixes that are classed as Metadata, e.g. extra=["Nuclei", "Cells"]
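
One way the prefix check could look. This is only a sketch: the helper name `is_metadata_column`, the default "Metadata" prefix, and the way `extra` is normalised are assumptions, not the current signature of utils.img_to_metadata:

```python
def is_metadata_column(colname, metadata_prefix="Metadata", extra=None):
    """Return True if colname starts with the metadata prefix or any extra prefix.

    Hypothetical helper: `extra` may be None, a single string, or a list of strings.
    """
    if extra is None:
        extra_prefixes = []
    elif isinstance(extra, str):
        extra_prefixes = [extra]
    else:
        extra_prefixes = list(extra)
    # str.startswith accepts a tuple of candidate prefixes.
    return colname.startswith((metadata_prefix, *extra_prefixes))


columns = ["Metadata_well", "Nuclei_ObjectNumber", "Cells_AreaShape_Area", "Intensity_Mean"]
flagged = [c for c in columns if is_metadata_column(c, extra=["Nuclei", "Cells"])]
single = [c for c in columns if is_metadata_column(c, extra="Nuclei")]
```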

Find outliers with PCA

Calculate PCA, find outliers in PCA space and return indices which can then be dropped in the original dataset.
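
A rough sketch of the idea using NumPy only. The function name, the number of components, and the outlier rule (distance from the origin in standardised PCA space above a threshold) are all assumptions; other rules, such as a Mahalanobis cut-off, would fit the same shape:

```python
import numpy as np


def find_pca_outliers(X, n_components=2, threshold=3.0):
    """Return positional row indices of outliers in PCA space (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    # PCA via SVD: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    # Standardise each component so the threshold is in standard deviations.
    std = scores.std(axis=0)
    std[std == 0] = 1.0
    distance = np.sqrt(((scores / std) ** 2).sum(axis=1))
    return np.flatnonzero(distance > threshold)


# Made-up data: 100 ordinary rows plus one extreme row at position 100.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 3)), [[50.0, 50.0, 50.0]]])
outliers = find_pca_outliers(X)
```

The returned indices are positional, so with a dataframe they could be dropped with something like `df.drop(df.index[outliers])`.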

calling morar.utils.drop() results in a TypeError

~/anaconda3/lib/python3.7/site-packages/morar/dataframe.py in dropna(self, **kwargs)
     96     def dropna(self, **kwargs):
     97         """dropna via pandas.DataFrame.dropna"""
---> 98         _check_inplace(kwargs)
     99         pandas_df = pd.DataFrame(self)
    100         result = pandas_df.dropna(**kwargs)

TypeError: _check_inplace() takes 0 positional arguments but 1 was given

Modify dataframe.featuredata or dataframe.metadata

It would be nice to be able to modify the featuredata or metadata through its attribute.

feature_subset = ["feature_1", "feature_2"]
my_data.featuredata = my_data[feature_subset]

or

my_data.featuredata = my_data.featuredata[feature_subset]

Though at the moment, this produces the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-48-3e99f8310ea1> in <module>()
----> 1 df.featuredata = df.featredata[previous_features]

/home/scott/.local/lib/python3.5/site-packages/pandas/core/generic.py in __getattr__(self, name)
   2670             if name in self._info_axis:
   2671                 return self[name]
-> 2672             return object.__getattribute__(self, name)
   2673 
   2674     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'featredata'

In [49]: df.featuredata = df.featuredata[previous_features]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/scott/.local/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   2702                 else:
-> 2703                     object.__setattr__(self, name, value)
   2704             except (AttributeError, TypeError):

AttributeError: can't set attribute

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-49-1c984f1ae98a> in <module>()
----> 1 df.featuredata = df.featuredata[previous_features]

/home/scott/.local/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   2703                     object.__setattr__(self, name, value)
   2704             except (AttributeError, TypeError):
-> 2705                 object.__setattr__(self, name, value)
   2706 
   2707     # ----------------------------------------------------------------------

AttributeError: can't set attribute
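
"can't set attribute" is typically what Python raises when a property has a getter but no setter. A minimal sketch of adding one, assuming featuredata is a property on a pandas.DataFrame subclass and that metadata columns are identified by a "Metadata" prefix. The class name, the prefix rule, and what the setter does are assumptions, not morar's actual implementation:

```python
import pandas as pd


class SketchFrame(pd.DataFrame):
    """Hypothetical stand-in for the morar dataframe class."""

    @property
    def _constructor(self):
        # Keep the subclass when pandas creates new frames from this one.
        return SketchFrame

    @property
    def featurecols(self):
        return [c for c in self.columns if not str(c).startswith("Metadata")]

    @property
    def featuredata(self):
        return self[self.featurecols]

    @featuredata.setter
    def featuredata(self, new_features):
        # Drop feature columns that are absent from the new value,
        # then write the new values column by column.
        keep = list(new_features.columns)
        unwanted = [c for c in self.featurecols if c not in keep]
        self.drop(columns=unwanted, inplace=True)
        for col in keep:
            self[col] = new_features[col]


my_data = SketchFrame({
    "feature_1": [1, 2],
    "feature_2": [3, 4],
    "feature_3": [5, 6],
    "Metadata_well": ["A01", "A02"],
})
feature_subset = ["feature_1", "feature_2"]
my_data.featuredata = my_data.featuredata[feature_subset]
```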

read and aggregate

Read in each object.csv file and create median aggregates by ImageNumber.

If files are large:

  • read in chunks; aggregate by ImageNumber
  • store aggregated intermediates
  • read in intermediates
  • row-bind into single dataframe
  • remove intermediates
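
The chunked steps above could be sketched as follows. One detail the list does not mention: an image's rows can straddle a chunk boundary, so the sketch carries the last (possibly incomplete) image over to the next chunk. It assumes rows for one ImageNumber are contiguous in the file, and it keeps the small aggregated pieces in memory rather than writing intermediates to disk; the function name and defaults are hypothetical:

```python
import io

import pandas as pd


def aggregate_in_chunks(path_or_buffer, chunksize=100_000, by="ImageNumber"):
    """Median-aggregate an object-level CSV by `by`, reading it in chunks."""
    pieces = []
    carry = None  # rows of the last image seen, which may continue in the next chunk
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize):
        if carry is not None:
            chunk = pd.concat([carry, chunk], ignore_index=True)
        last_id = chunk[by].iloc[-1]
        is_last = chunk[by] == last_id
        carry = chunk[is_last]
        complete = chunk[~is_last]
        if not complete.empty:
            pieces.append(complete.groupby(by).median(numeric_only=True))
    if carry is not None and not carry.empty:
        pieces.append(carry.groupby(by).median(numeric_only=True))
    # Row-bind the per-chunk aggregates into a single dataframe.
    return pd.concat(pieces).reset_index()


# Made-up data: image 1 spans two chunks when chunksize=2.
csv_text = "ImageNumber,Area\n1,1\n1,2\n1,9\n2,4\n2,6\n3,7\n"
result = aggregate_in_chunks(io.StringIO(csv_text), chunksize=2)
```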

Stop using deprecated Imputer from sklearn

tests/test_utils.py::test_impute
tests/test_utils.py::test_impute_mean
tests/test_utils.py::test_impute_with_metadata
  /home/scott/anaconda3/lib/python3.7/site-packages/sklearn/utils/deprecation.py:66:
    DeprecationWarning: Class Imputer is deprecated; Imputer was deprecated in version
    0.20 and will be removed in 0.22. Import impute.SimpleImputer from sklearn instead.
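
A minimal sketch of the replacement class on a made-up array. How morar's impute functions currently call Imputer is not shown here, so the strategy and data below are placeholders. Note that SimpleImputer has no `axis` argument and expects `missing_values=np.nan` rather than the string "NaN":

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value per column.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan]])

# SimpleImputer always works column-wise.
imputer = SimpleImputer(missing_values=np.nan, strategy="median")
X_imputed = imputer.fit_transform(X)
```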

parallelise normalise functions

Work is done by splitting the dataframe into groups via .groupby() and looping through each group. The groups could be processed in parallel to speed things up.

e.g., from Stack Overflow:

import multiprocessing

import pandas as pd
from joblib import Parallel, delayed


def tmpFunc(df):
    # Work on a copy so the caller's group is not mutated.
    df = df.copy()
    df['c'] = df.a + df.b
    return df


def applyParallel(dfGrouped, func):
    # Run func on each group in its own worker, then row-bind the results.
    retLst = Parallel(n_jobs=multiprocessing.cpu_count())(
        delayed(func)(group) for name, group in dfGrouped
    )
    return pd.concat(retLst)


if __name__ == '__main__':
    df = pd.DataFrame({'a': [6, 2, 2], 'b': [4, 5, 6]}, index=['g1', 'g1', 'g2'])
    print('parallel version: ')
    print(applyParallel(df.groupby(df.index), tmpFunc))

    print('regular version: ')
    print(df.groupby(df.index).apply(tmpFunc))

    # Ideal version (does not work: groupby objects have no applyParallel method):
    # print(df.groupby(df.index).applyParallel(tmpFunc))

Multi-indexing & databases.

Dataframes with multi-indexed columns are unlikely to convert easily to sqlite tables, so there needs to be some option to flatten the column names (simple paste?).

Ideally would like to automatically flatten columns to store as a database table, and convert back to multi-index columns when reading in as a dataframe from the database.
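
A sketch of the round trip, assuming a simple paste of the column levels with a separator. The separator "__", the helper names, and the table name are hypothetical, and the separator must not occur inside any level name:

```python
import sqlite3

import pandas as pd

SEP = "__"  # hypothetical separator; must not appear in any column level name


def flatten_columns(df, sep=SEP):
    """Join MultiIndex column levels into single strings."""
    out = df.copy()
    out.columns = [sep.join(map(str, col)) for col in df.columns]
    return out


def unflatten_columns(df, sep=SEP):
    """Split flattened column names back into a MultiIndex."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples([tuple(c.split(sep)) for c in df.columns])
    return out


columns = pd.MultiIndex.from_tuples(
    [("Cells", "AreaShape_Area"), ("Nuclei", "AreaShape_Area")]
)
df = pd.DataFrame([[10.0, 4.0], [12.0, 5.0]], columns=columns)

con = sqlite3.connect(":memory:")
flatten_columns(df).to_sql("objects", con, index=False)
restored = unflatten_columns(pd.read_sql("SELECT * FROM objects", con))
con.close()
```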

store object files as MultiIndex DataFrame

Likely to have columns of identical names in different objects. Should try to use MultiIndex dataframes in pandas.

i.e.

              object 1                |                object 2
         featuredata | metadata       |          featuredata | metadata  
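
A minimal sketch of building such a frame with pandas, using the object name as the top column level. The object names and columns are made up, and the featuredata/metadata split from the layout above is not modelled as its own level here:

```python
import pandas as pd

# Two hypothetical object tables that share column names.
nuclei = pd.DataFrame({"AreaShape_Area": [4.0, 5.0], "Metadata_well": ["A01", "A02"]})
cells = pd.DataFrame({"AreaShape_Area": [10.0, 12.0], "Metadata_well": ["A01", "A02"]})

# keys= adds the object name as the outer column level.
combined = pd.concat([nuclei, cells], axis=1, keys=["Nuclei", "Cells"])

# Identically named columns stay distinct because of the outer level.
cell_area = combined[("Cells", "AreaShape_Area")]
nuclei_only = combined["Nuclei"]
```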
