Comments (6)
This error suggests that you have multiple rows in your input metadata with the same cell ID (i.e. the `cell_` column).
Can you double-check that your query isn't producing duplicates? For example, is it possible your `pmap`
is binding multiple rows with the same cell ID?
from curatedatlasqueryr.
Sorry I should have tested it before, you were right.
If it's not too annoying we could capture this error with a more informative message.
CuratedAtlasQueryR says: ...... Please check that your input metadata does not include duplicated elements in the `cell_` column. For example, execute `<your input metadata> |> count(cell_, name = "number_of_cell_id_instances") |> filter(number_of_cell_id_instances > 1)`
Would you rather I test the input data frame for duplicates (big performance implications), or just catch errors resulting from the code where I try to set the row names, and throw a better error message?
Would
`input |> pull(cell_) |> duplicated() |> any()`
take long for 100M rows?
There are faster methods here:
https://stackoverflow.com/questions/37148567/fastest-way-to-remove-all-duplicates-in-r
Or, just to check whether duplicates exist, use `anyDuplicated()`.
...
But maybe catching the error is actually the right thing to do, as that is exactly what we are doing: replacing one error with another.
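For reference, a minimal base R sketch of the duplicate check discussed above. The `metadata` data frame and its values are made up for illustration; only the `cell_` column name comes from this thread.

```r
# Hypothetical metadata with one repeated cell ID (illustrative data only).
metadata <- data.frame(
  cell_ = c("cell_1", "cell_2", "cell_2", "cell_3"),
  tissue = c("blood", "lung", "lung", "liver"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns the index of the first duplicated element,
# or 0 if there is none, so it can stop at the first hit.
has_duplicates <- anyDuplicated(metadata$cell_) > 0

# duplicated() flags every repeat after the first occurrence, which is
# useful for reporting the offending IDs back to the user.
duplicated_ids <- unique(metadata$cell_[duplicated(metadata$cell_)])
```

Both calls work on an in-memory vector, so with a database-backed table the `cell_` column would have to be pulled into R first.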
Yeah, the performance hit probably won't be too bad compared to the time it takes to actually download and process the data. I think the best function to use to detect duplicates would be one that dbplyr supports, so it can be run in the database instead of purely in R.
Up to you though.
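A sketch of a duplicate check written only with dplyr verbs, assuming the metadata has a `cell_` column. It is shown on an in-memory tibble with made-up values; `count()` and `filter()` are verbs that dbplyr can generally translate to SQL, so the same pipeline should also run on a lazy database table, though that is not exercised here.

```r
library(dplyr)

# Hypothetical metadata (illustrative data only).
metadata <- tibble(
  cell_ = c("cell_1", "cell_2", "cell_2", "cell_3")
)

# Count rows per cell ID and keep only IDs that occur more than once.
duplicated_cells <- metadata |>
  count(cell_, name = "number_of_cell_id_instances") |>
  filter(number_of_cell_id_instances > 1)
```

On a lazy table, a final `collect()` would bring the (usually small) set of duplicated IDs into R.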
The input could easily be a tibble, in case you manipulate it first.
I think catching the error is the most transparent thing we can do.
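A rough sketch of the error-catching approach, not the package's actual code. The helper `set_cell_row_names()` and the wording of the message are hypothetical; it assumes the failure comes from assigning duplicated values as row names of a data frame.

```r
# Hypothetical helper: set row names from the `cell_` column and replace
# the low-level error with a message that points at the likely cause.
set_cell_row_names <- function(metadata) {
  tryCatch(
    {
      # Fails if `cell_` contains duplicated values.
      rownames(metadata) <- metadata$cell_
      metadata
    },
    error = function(e) {
      stop(
        "Could not set row names (", conditionMessage(e), "). ",
        "Please check that your input metadata does not include ",
        "duplicated elements in the `cell_` column.",
        call. = FALSE
      )
    }
  )
}
```

This adds no cost for valid input, since the duplicate check only happens implicitly when the row names are assigned.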