Comments (3)
I'm aware of this.
I have not found good solutions besides:
- assuming the number of column header rows is the same as the number of `sample_attributes` (not a safe assumption, and it would not work if `sample_attributes` is not set)
- reading through the file without a header and setting as column levels every row which is non-numeric (not good because there may be a factor which is entirely numeric for all samples, as in the case above)
- encoding the number of header rows in e.g. the file name (ugly and implicit).
This just shows that CSV is not a good alternative for storing metadata.
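To make the ambiguity concrete, here is a minimal sketch using plain pandas. The matrix, the sample names and the `dose` attribute are made up for illustration and are not taken from the toolkit.

```python
import io

import pandas as pd

# Hypothetical matrix: two samples annotated with a numeric-looking attribute.
columns = pd.MultiIndex.from_tuples(
    [("s1", "100"), ("s2", "200")], names=["sample_name", "dose"]
)
matrix = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=pd.Index(["r1", "r2"], name="region"),
    columns=columns,
)

# The CSV text itself does not record how many of its lines are header rows.
csv_text = matrix.to_csv()

# Naive read: only the first line is taken as the header, so the "dose" row
# (and the index-name row) are silently absorbed as data rows.
naive = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Correct read: only possible when the number of header rows is known.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0, header=[0, 1])
```

Because the `dose` row is entirely numeric, the naive read still yields float columns, which is why scanning for non-numeric rows cannot reliably recover the header.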
What I recommend for now:
- do not use `Analysis.annotate_samples` with `save=True`. This is safe and entirely backward compatible. In addition, functions which need MultiIndex columns can produce them on the fly.
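As a rough sketch of what "on the fly" could look like, the helper below builds MultiIndex columns from a sample annotation table only when they are needed, leaving the stored matrix with plain columns. The function name `annotate_columns`, the `sample_table` frame and its `sample_name` column are assumptions for illustration, not the toolkit's API.

```python
import pandas as pd


def annotate_columns(matrix, sample_table, attributes):
    """Return a copy of `matrix` with MultiIndex columns built from `attributes`.

    Hypothetical helper: `sample_table` is assumed to have one row per sample
    and a "sample_name" column matching the columns of `matrix`.
    """
    samples = list(matrix.columns)
    # Align the annotation rows to the column order of the matrix.
    annotation = sample_table.set_index("sample_name").loc[samples, attributes]
    arrays = [samples] + [annotation[attr].tolist() for attr in attributes]
    annotated = matrix.copy()
    annotated.columns = pd.MultiIndex.from_arrays(
        arrays, names=["sample_name"] + list(attributes)
    )
    return annotated


# Example with made-up data: the matrix on disk keeps single-level columns.
matrix = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]], index=["r1", "r2"], columns=["s2", "s1"]
)
sample_table = pd.DataFrame(
    {"sample_name": ["s1", "s2"], "cell_type": ["A", "B"], "dose": [100, 200]}
)
annotated = annotate_columns(matrix, sample_table, ["cell_type", "dose"])
```

The original `matrix` is left untouched, so it can still be saved and reloaded as an ordinary single-header CSV.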
What the future will likely be:
- using formats for serialization that accommodate large numeric matrices and metadata associated with both samples and features, such as `hdf5` or `h5ad`/`anndata`.
from toolkit.
For more context see pandas-dev/pandas#17086
60b71f6 now assumes the "matrix_norm" key in `Analysis.load_data` to be a non-MultiIndex dataframe.
However, it will check that all columns are of type float (as a `matrix_norm` attribute should be) and throw a detailed message on the likely cause (a MultiIndex CSV) and how to read it in properly by hand: `analysis.matrix_norm = pandas.read_csv('{file}', index_col=0, header=list(range(x)))`, `x` being the number of header rows.
Since `Analysis.annotate_samples` now defaults to `save=False`, I think this should solve the issue.
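For readers who want to see the idea in isolation, here is a minimal sketch of such a check followed by the manual read. The helper `load_float_matrix` and the CSV contents are hypothetical and written for illustration; this is not the code from the commit.

```python
import io

import pandas as pd
from pandas.api.types import is_float_dtype


def load_float_matrix(path_or_buffer, name="matrix_norm"):
    """Read a CSV with a single header row and verify every column is float.

    Hypothetical helper for illustration, not the toolkit's own code.
    """
    matrix = pd.read_csv(path_or_buffer, index_col=0)
    non_float = [
        col for col, dtype in matrix.dtypes.items() if not is_float_dtype(dtype)
    ]
    if non_float:
        raise ValueError(
            f"'{name}' has non-float columns: {non_float}. "
            "The file was likely saved with MultiIndex columns. "
            "Try reading it manually with "
            "pandas.read_csv(file, index_col=0, header=list(range(x))), "
            "x being the number of header rows."
        )
    return matrix


# Made-up CSV contents: one plain matrix, one saved with two header rows.
plain_csv = "region,s1,s2\nr1,1.0,2.0\nr2,3.0,4.0\n"
multi_csv = "sample_name,s1,s2\ncell_type,A,B\nregion,,\nr1,1.0,2.0\nr2,3.0,4.0\n"

matrix = load_float_matrix(io.StringIO(plain_csv))

try:
    load_float_matrix(io.StringIO(multi_csv))
except ValueError as error:
    message = str(error)

# Manual read once the number of header rows (here x = 2) is known.
restored = pd.read_csv(io.StringIO(multi_csv), index_col=0, header=list(range(2)))
```

Note that a check of this kind only catches header rows that contain non-numeric values; a header made up entirely of numbers would still be read as float data.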