maximilianh / cellbrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org

Home Page: https://github.com/ucscGenomeBrowser/cellBrowser/

License: GNU General Public License v3.0

Python 34.79% CSS 2.49% JavaScript 61.12% HTML 0.06% Makefile 0.05% AngelScript 0.05% R 1.14% Shell 0.29%

cellbrowser's People

Contributors

braneyboo, brittneydwick, christopherlee1, flying-sheep, gusevfe, inodb, ivirshup, kriemo, matthewspeir, maximilianh, mxposed, pcm32, rachadele, redst4r

cellbrowser's Issues

pbmc3k example problems

rsync -Lavzp genome-test.gi.ucsc.edu::cells/datasets/pbmc3k ./pbmc3k/ --progress
still gets this error
symlink has no referent: "/datasets/pbmc3k/cbScanpy" (in cells)

Also, minor detail: with the download example as is you are creating this directory structure:
cellData/pbmc3k/pbmc3k
The example presumes it is really cellData/pbmc3k. To correct this, the rsync command should probably be:

rsync -Lavzp genome-test.gi.ucsc.edu::cells/datasets/pbmc3k ./ --progress

Or, of course, run it from one directory up and do

rsync -Lavzp genome-test.gi.ucsc.edu::cells/datasets/pbmc3k ./cellData --progress

Conda install fails for 0.4.30 due to missing setup.cfg file

The inclusion of versioneer.py adds an unsatisfied requirement for a setup.cfg file (with content relevant to versioneer, I presume). The file is mentioned throughout the versioneer comments and is required, for instance, here:

https://github.com/maximilianh/cellBrowser/blob/master/versioneer.py#L341

Error on conda looks like:

Downloading https://github.com/maximilianh/cellBrowser/archive/v0.4.30.tar.gz
Success
Extracting download
source tree in: /opt/conda/conda-bld/ucsc-cell-browser_1547827379096/work
Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    version=versioneer.get_version(),
  File "/opt/conda/conda-bld/ucsc-cell-browser_1547827379096/work/versioneer.py", line 1480, in get_version
    return get_versions()["version"]
  File "/opt/conda/conda-bld/ucsc-cell-browser_1547827379096/work/versioneer.py", line 1412, in get_versions
    cfg = get_config_from_root(root)
  File "/opt/conda/conda-bld/ucsc-cell-browser_1547827379096/work/versioneer.py", line 343, in get_config_from_root
    with open(setup_cfg, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/conda-bld/ucsc-cell-browser_1547827379096/work/setup.cfg'

Traceback (most recent call last):
  File "/opt/conda/bin/conda-build", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.6/site-packages/conda_build/cli/main_build.py", line 438, in main
    execute(sys.argv[1:])
  File "/opt/conda/lib/python3.6/site-packages/conda_build/cli/main_build.py", line 429, in execute
    verify=args.verify)
  File "/opt/conda/lib/python3.6/site-packages/conda_build/api.py", line 201, in build
    notest=notest, need_source_download=need_source_download, variants=variants)
  File "/opt/conda/lib/python3.6/site-packages/conda_build/build.py", line 2204, in build_tree
    notest=notest,
  File "/opt/conda/lib/python3.6/site-packages/conda_build/build.py", line 1445, in build
    utils.check_call_env(cmd, env=env, cwd=src_dir, stats=build_stats)
  File "/opt/conda/lib/python3.6/site-packages/conda_build/utils.py", line 313, in check_call_env
    return _func_defaulting_env_to_os_environ('call', *popenargs, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/conda_build/utils.py", line 293, in _func_defaulting_env_to_os_environ
    raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-e', '/opt/conda/conda-bld/ucsc-cell-browser_1547827379096/work/conda_build.sh']' returned non-zero exit status 1.
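
The first traceback fails while opening setup.cfg, so a pre-release check along the following lines might catch the problem before a tarball is tagged. This is only a sketch: the helper name `versioneer_config_problem` is made up for illustration, and it assumes (per the versioneer documentation) that versioneer reads a `[versioneer]` section containing at least a `VCS` entry.

```python
import configparser


def versioneer_config_problem(setup_cfg_text):
    """Return a message describing what is missing, or None if the text looks usable.

    Hypothetical helper: it only checks for a [versioneer] section with a VCS
    entry, which is an assumption about what versioneer needs at minimum.
    """
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    if not parser.has_section("versioneer"):
        return "no [versioneer] section"
    if not parser.has_option("versioneer", "VCS"):
        return "[versioneer] has no VCS entry"
    return None
```

Calling it with the contents of setup.cfg (or an empty string when the file is absent) gives a short message that a release script could print before aborting.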

cbScanpy dies on PAGA

This is version 0.4.23
cbScanpy -e </path/to/my.mtx> -o ipf3_files -n ipf3
gets

Performing UMAP
Performing PHATE
Performing PAGA+ForceAtlas2
Traceback (most recent call last):
  File "/home/cellranger/anaconda3/envs/cbConda/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/home/cellranger/anaconda3/envs/cbConda/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3425, in cbScanpyCli
    adata = cbScanpy(matrixFname, confFname, figDir, logFname)
  File "/home/cellranger/anaconda3/envs/cbConda/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3331, in cbScanpy
    adata.obsm["X_pagaFa"] = adata.obsm["X_draw_graph_fa"]
  File "/home/cellranger/anaconda3/envs/cbConda/lib/python3.6/site-packages/numpy/core/records.py", line 500, in __getitem__
    obj = super(recarray, self).__getitem__(indx)
ValueError: no field of name X_draw_graph_fa

color by multiple genes

Almost everyone wants to color by multiple genes. The current plan is to make these changes:

  • change the gene search box to multi select up to three genes (easy)
  • each gene has one color, red, green, blue
  • show three sliders where you can set a cutoff to binarize expression (hard - there is not a lot of screen space left)
  • mix the colors together based on the expression values (easy, but may be very slow)
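
The mixing step in the last bullet could look roughly like the sketch below. It is not cellBrowser code (the browser itself is JavaScript): the function name `mix_gene_colors`, the per-gene cutoffs, and the scaling of each channel to that gene's maximum are all assumptions made for illustration.

```python
def mix_gene_colors(expr_by_gene, cutoffs=(0.0, 0.0, 0.0)):
    """Map up to three per-cell expression vectors to one RGB hex color per cell.

    expr_by_gene: list of one to three equal-length lists (red, green, blue gene).
    cutoffs: hypothetical per-gene thresholds; values at or below the cutoff
             contribute nothing to that channel.
    """
    if not 1 <= len(expr_by_gene) <= 3:
        raise ValueError("expected one to three genes")
    n_cells = len(expr_by_gene[0])
    channels = []
    for values, cutoff in zip(expr_by_gene, cutoffs):
        top = max(values) if values else 0.0
        span = top - cutoff
        channel = []
        for v in values:
            if v <= cutoff or span <= 0:
                channel.append(0)
            else:
                # Scale linearly between the cutoff and the gene's maximum.
                channel.append(int(round(255 * (v - cutoff) / span)))
        channels.append(channel)
    # Genes that were not selected get an all-zero channel.
    while len(channels) < 3:
        channels.append([0] * n_cells)
    return ["#%02x%02x%02x" % (r, g, b) for r, g, b in zip(*channels)]
```

A cell expressing only the first gene comes out red, one expressing only the second comes out green, and co-expression gives a mix such as yellow.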

rsync to hgwdev gets Connection refused

Following the README:
rsync -Lavzp hgwdev.soe.ucsc.edu::cells/datasets/pbmc3k ./pbmc3k/ --progress

Results in

rsync: failed to connect to hgwdev.soe.ucsc.edu (128.114.198.32): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(125) [Receiver=3.1.2]

`cbBuild -i config-file` fails on latest release version 0.2

When running cbBuild as `cbBuild -i config-file -o output` in a directory with these contents:

ls -l
total 2040
-rw-r--r--    1 root     root           157 Sep 17 14:09 config-file
-rw-r--r--    1 root     root       1802587 Aug  8 12:52 exprMatrix.tsv
-rw-r--r--    1 root     root        181579 Aug  8 12:55 meta.tsv
-rw-r--r--    1 root     root         90896 Aug  8 12:54 tsne.coords.tsv

It fails with ERROR:root:File cellbrowser.conf does not exist.

If I rename the config file to cellbrowser.conf, then it no longer fails with this error. I presume this is because the argument parsing appends config files to a default value, so the build fails whenever the default file is not found.
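
The presumed cause can be reproduced with Python's argparse, where `action="append"` adds user values to a non-empty default list instead of replacing it. Whether cbBuild 0.2 parses its options this way is only the reporter's guess; the sketch below (with a made-up `parse_conf_args` helper) shows one way to avoid the pattern by applying the default after parsing.

```python
import argparse


def parse_conf_args(argv):
    """Collect -i options without letting the default leak into the result."""
    parser = argparse.ArgumentParser()
    # With action="append", a non-empty default list is kept and user values
    # are appended to it, so the default stays None here and is applied below.
    parser.add_argument("-i", "--inConf", action="append", default=None)
    args = parser.parse_args(argv)
    return args.inConf if args.inConf else ["cellbrowser.conf"]
```

With this arrangement `-i config-file` yields only `config-file`, and cellbrowser.conf is used only when no `-i` option is given.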

This is with the recently tagged 0.2 version.

Stable colors

I'd like to be able to keep the same color assigned to the same metadata (like Osteosarcoma always being purple). In the past when we've looked at clustered data repeatedly over the course of days and months, stable colors have been required to remember what we've already learned about the relationships among the data points.

Ideally we'd be able to supply a file assigning specific colors to specific metadata values. It could be part of the existing metadata file. In that case, in addition to the disease column, we'd have something like a disease-color column.
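
As a rough illustration of the second option, the sketch below reads a value-to-color mapping from two columns of a tab-separated metadata table. The column names `disease` and `disease-color` come from the example above; the function name `load_color_map` and the conflict check are assumptions, not existing cellBrowser behavior.

```python
import csv
import io


def load_color_map(tsv_text, value_field="disease", color_field="disease-color"):
    """Return {metadata value: color} from a tab-separated metadata table."""
    colors = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        value, color = row[value_field], row[color_field]
        # The same value must always map to the same color.
        if colors.setdefault(value, color) != color:
            raise ValueError("conflicting colors for %r" % value)
    return colors
```

Because the mapping lives in the input file, rebuilding the dataset months later would give Osteosarcoma the same color again.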

permissions issue when building into webserver directory

Hi, thank you for building and releasing this really beautiful browser. I am attempting to build a cell browser served by Apache on a machine running CentOS 7.5. Running a demo browser using your example data works just fine:

curl -s https://cells.ucsc.edu/downloads/samples/mini.tgz | tar xvz
cd mini
cbBuild -o ~/public_html/cells/ -p 8888
# cellBrowser is viewable at ip_address:8888

However, when I try to build into the webserver's web directory (still working with the example data), I run into a problem. It looks like a permissions problem, although I am running with sudo. Here is the message I see:

-bash-4.2$ sudo cbBuild -o /var/www/
Enter your Domain password:
INFO:root:/home/skelld/mini/summary.html does not exist
INFO:root:/home/skelld/mini/methods.html does not exist
INFO:root:/home/skelld/mini/downloads.html does not exist
INFO:root:/home/skelld/mini/thumb.png does not exist
INFO:root:Getting md5 of /home/skelld/mini/meta.tsv
md5sum: /home/skelld/mini/meta.tsv: Permission denied
Traceback (most recent call last):
  File "/bin/cbBuild", line 11, in <module>
    sys.exit(cbBuildCli())
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2578, in cbBuildCli
    cbBuild(confFnames, outDir, port)
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2545, in cbBuild
    convertDataset(inConf, outConf, datasetDir)
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2296, in convertDataset
    sampleNames, needFilterMatrix, outMeta = convertMeta(inConf, outConf, datasetDir)
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2123, in convertMeta
    outConf["fileVersions"]["inMeta"] = getFileVersion(metaFname)
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2109, in getFileVersion
    hexHash = md5ForFile(fname)
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2211, in md5ForFile
    md5 = getMd5Using("md5sum", fname).split()[0]
  File "/usr/lib/python2.7/site-packages/cellbrowser/cellbrowser.py", line 2204, in getMd5Using
    assert(err==0)
AssertionError

Any idea whether I am doing something wrong? Or could this be a bug in cellBrowser? Thanks!
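
In the traceback, the real cause (md5sum reporting "Permission denied") is followed by a bare AssertionError. A sketch of a checksum helper that reads the file directly and reports why it failed is shown below; `md5_for_file` is a made-up name, and this is not how cellbrowser.py computes checksums.

```python
import hashlib


def md5_for_file(fname, chunk_size=1 << 20):
    """Return the hex MD5 of a file, raising a descriptive error if it cannot be read."""
    digest = hashlib.md5()
    try:
        with open(fname, "rb") as fh:
            # Read in chunks so large matrices do not have to fit in memory.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except (IOError, OSError) as err:
        raise RuntimeError("cannot read %s for md5: %s" % (fname, err))
    return digest.hexdigest()
```

This does not fix the permissions themselves, but the error message would then name the unreadable file and the reason.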

displaying subsets of the data

When working with large single-cell datasets, it is often useful to look at two or more levels. Several papers have done analysis at two or more levels:

  • Level 1: PCA and tSNE on the full set of all cells.
  • Level 2: PCA and tSNE on each major subset of cells (e.g. only T cells, or only B cells, or only fibroblasts).

When browsing the data at Level 2, it's necessary to hide all of the cells except the cells in the chosen subset. Then the user can browse just the different clusters of T cells, for example, without worrying about all the other cell types in an experiment.

It would be nice to support this type of subset-level analysis in cellBrowser.

Thinking about how this might be implemented...

I think you might already be most of the way there, since you support multiple files with cell coordinates:

# tsv files with coordinates of every sample in format <sampleId, x, y>
# first the name of the file, then a human readable description
coords=[
{"file":"tsne.coords.tsv", "shortLabel":"t-SNE on WGCNA"},
]

What happens if one of the coordinate files only lists coordinates for a subset of cells instead of all cells?

(I haven't tried, so I apologize in advance if this is already supported and I'm not aware.) It would be cool if cellBrowser automatically figures out that it should hide the cells that are not listed in a given coordinate file.
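
The hiding rule described here amounts to a set difference between the cell IDs in the metadata and those in a coordinate file. A minimal sketch, with a hypothetical `split_visible_cells` helper and coordinate rows in the <sampleId, x, y> layout quoted above:

```python
def split_visible_cells(meta_cell_ids, coord_rows):
    """Return (visible, hidden) cell IDs for one coordinate file.

    coord_rows: iterable of (cellId, x, y) tuples, possibly covering only a subset.
    """
    with_coords = {cell_id for cell_id, _x, _y in coord_rows}
    # Keep the metadata order so downstream arrays stay aligned.
    visible = [c for c in meta_cell_ids if c in with_coords]
    hidden = [c for c in meta_cell_ids if c not in with_coords]
    return visible, hidden
```

A Level 2 coordinate file listing only T cells would then leave every other cell in the hidden list.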

I wonder if you have thoughts about how to organize and navigate these types of subset-level results?

Typo in writeCellBrowser

def writeCellbrowserConf(name, coordsList, fname, args={}):
    for c in name:
        assert(c.isalnum() or c in ["-", "_"]) # only digits and letters are allowed in dataset names

    metaFname = args.get("meta", "meta.tsv")
    clusterField = args.get("clusterField", "Louvain Cluster")
    coordStr = json.dumps(coordList, indent=4)

Input is coordslist (with an s) not coordlist
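
With the parameter name used consistently, the quoted fragment would read roughly as follows. The rest of the function is not shown in this report, so the return statement and the `args=None` default are additions made only so the fragment can be run on its own.

```python
import json


def writeCellbrowserConf(name, coordsList, fname, args=None):
    args = args or {}
    for c in name:
        # letters, digits, "-" and "_" are accepted in dataset names
        assert c.isalnum() or c in ["-", "_"]

    metaFname = args.get("meta", "meta.tsv")
    clusterField = args.get("clusterField", "Louvain Cluster")
    coordStr = json.dumps(coordsList, indent=4)  # was: coordList
    # Assumption: returned here for illustration; the original presumably
    # goes on to write these values to fname.
    return metaFname, clusterField, coordStr
```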

find cells

From Holly Beale:

  • find cells by wildcard or regex on cellId
  • or by gene expression >, < or = to numeric value
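
Both kinds of search are small filters. The sketch below is written in Python for illustration (the browser itself is JavaScript), and the names `find_by_id` and `find_by_expression` are hypothetical.

```python
import fnmatch
import operator
import re

# Comparison operators offered for expression filters.
_OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def find_by_id(cell_ids, pattern, use_regex=False):
    """Select cell IDs by shell-style wildcard or, optionally, by regular expression."""
    if use_regex:
        rx = re.compile(pattern)
        return [c for c in cell_ids if rx.search(c)]
    return [c for c in cell_ids if fnmatch.fnmatchcase(c, pattern)]


def find_by_expression(expr_by_cell, op, threshold):
    """Select cell IDs whose expression value compares as requested to threshold."""
    compare = _OPS[op]
    return [c for c, v in expr_by_cell.items() if compare(v, threshold)]
```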

typo in README?

cbScanpy -e filtered_gene_bc_matrices/hg19/matrix.mtx -o scanpyout -n pbmc3k cd scanpyout cbBuild -o ~/public_html/cb -p 8888

appears to be missing a ; before cd

Static files are cached between the updates

If you have a CellBrowser installation already running and you want to update it, users who have already visited your installation won't see the new js and css files, because their browsers still hold the old copies in cache.

It would be great to invalidate that cache when a new version gets uploaded.
One way to do it is to include a version number or file signature in the generated index.html when running cbBuild.

Then, the links to static files would carry a caching parameter ?XXX, which stays unchanged while the website is running but changes, and so invalidates the browser cache, whenever the website is updated.
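
A file signature of this kind can be derived from the file contents, so the parameter changes exactly when the file does. A minimal sketch, where the helper name `versioned_url` and the eight-character length are arbitrary choices:

```python
import hashlib


def versioned_url(path, content):
    """Append a short content hash so the URL changes only when the file changes."""
    signature = hashlib.md5(content).hexdigest()[:8]
    return "%s?%s" % (path, signature)
```

A build step could call this for each js and css file and write the resulting links into index.html.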

auto-generate colors more sophisticatedly, taking advantage of lightness & saturation

Right now, the default colors are all 100% brightness, 100% saturated. This is fine when there are a small number of fields but the coloring quickly gets indistinct as the count goes up. Even for 20 Seurat clusters, there are four shades of green and two shades of blue that I can't distinguish; for the 70 or so diseases, there are many that look the same.

If you were to use the full color space, we'd be able to get many more distinct colors at the cost of a somewhat less vibrant visualization...

A couple of different tools I've found that promise this:

https://github.com/internalfx/distinct-colors
http://tools.medialab.sciences-po.fr/iwanthue/
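
As a simpler illustration of the idea (not what either tool above does, and not perceptually uniform), the sketch below spreads colors over several lightness and saturation levels in addition to hue. The function name and the particular levels are arbitrary assumptions.

```python
import colorsys


def distinct_colors(n, lightness_levels=(0.35, 0.5, 0.7), saturation_levels=(1.0, 0.6)):
    """Return n hex colors that vary hue, lightness and saturation."""
    combos = [(l, s) for s in saturation_levels for l in lightness_levels]
    hues_needed = -(-n // len(combos))  # ceiling division
    colors = []
    for i in range(n):
        # Cycle through the hues first, then move to the next lightness/saturation pair.
        hue = (i % hues_needed) / float(hues_needed)
        lightness, saturation = combos[i // hues_needed]
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#%02x%02x%02x" % (
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return colors
```

For 20 clusters this uses only four hues, each at several lightness and saturation levels, instead of 20 fully saturated hues.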

hide selection

From Holly Beale

bring back the "View > hide selection" and "View > show all" menu entries

scanpy example throws error

../../cellBrowser/src/cbScanpy -e filtered_gene_bc_matrices/hg19/matrix.mtx -o myscanpyout -n pbmc3k

INFO:root:Creating myscanpyout
cbScanpy $Id: 1e7c40a1801e210aa974d42ddb5ea5f4d3f225e6 $
Input file: filtered_gene_bc_matrices/hg19/matrix.mtx
Start time: 2018-10-11 21:19:08.077474
scanpy==1.3.2 anndata==0.6.11 numpy==1.14.6 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0
INFO:root:Loading expression matrix: mtx format
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Data has 2700 samples/observations
Data has 32738 genes/variables
Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Remove cells with more than 0.050000 percent of mitochondrial genes
Computing percentage of mitochondrial genes
Remove cells with less than 10 and more than 15000 genes
Filtering cells
After filtering: Data has 2643 samples/observations and 13714 genes/variables
Expression normalization, counts per cell = 10000
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Finding highly variable genes: min_mean=0.012500, max_mean=3.000000, min_disp=0.500000
Number of variable genes identified: 1844
Did log2'ing of data
Performing initial PCA, number of PCs: 100
Using 20 PCs as configured in config
Performing tSNE
Performing Louvain Clustering, using 20 PCs and 5 neighbors
  File "../../cellBrowser/src/cbPyLib/cellbrowser.py", line 3267, in cbScanpyCli
    adata = cbScanpy(matrixFname, confFname, figDir, logFname)
  File "../../cellBrowser/src/cbPyLib/cellbrowser.py", line 3190, in cbScanpy
    sc.tl.louvain(adata, resolution=res)
  File "/home/jeltjessh/venvs/cellBrowser/lib/python3.6/site-packages/scanpy/tools/louvain.py", line 103, in louvain
    g = utils.get_igraph_from_adjacency(adjacency, directed=directed)
  File "/home/jeltjessh/venvs/cellBrowser/lib/python3.6/site-packages/scanpy/utils.py", line 304, in get_igraph_from_adjacency
    import igraph as ig
> /home/jeltjessh/venvs/cellBrowser/lib/python3.6/site-packages/scanpy/utils.py(304)get_igraph_from_adjacency()
-> import igraph as ig
(Pdb)

... and then I'm stuck in the (Pdb) interface.

(Pdb) exit
INFO:root:Writing anndata object to myscanpyout/anndata.h5ad
Traceback (most recent call last):
  File "../../cellBrowser/src/cbScanpy", line 13, in <module>
    cellbrowser.cbScanpyCli()
  File "../../cellBrowser/src/cbPyLib/cellbrowser.py", line 3275, in cbScanpyCli
    adata.write(adFname)
UnboundLocalError: local variable 'adata' referenced before assignment
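
The log shows two separate problems: `import igraph` fails inside scanpy, and the later UnboundLocalError appears because adata was never assigned. One way to surface the first problem early is to check for the required modules before any processing starts. The helper below is a hypothetical sketch, not part of cbScanpy.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

A script could call `missing_modules(["igraph"])` at startup and exit with a clear message and a non-zero status if the list is not empty.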

Double slash // in URL when trying to load geneMatrix.tsv causes 404 on AWS

When I tried to change the genes in the gene box on our cellBrowser hosted on AWS, the "loading" popup hung indefinitely. The console showed this error message:

GET http://tsne.treehouse.gi.ucsc.edu/TH_CompV4//geneMatrix.tsv 404 (Not Found)

And in fact http://tsne.treehouse.gi.ucsc.edu/TH_CompV4//geneMatrix.tsv does respond with a 404, even though http://tsne.treehouse.gi.ucsc.edu/TH_CompV4/geneMatrix.tsv (a very large download) is present.

I removed the leading slash from "/geneMatrix.tsv" on line 1214 on my copy of the code and that resolved the particular problem, but I assume there are other places it may be relevant.
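
A more general fix than editing one line is to build every URL through a helper that normalizes slashes. The browser code is JavaScript; the Python sketch below only illustrates the idea, and `join_url` is a made-up name.

```python
def join_url(base, *parts):
    """Join URL pieces with exactly one slash between them."""
    pieces = [base.rstrip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(pieces)
```

With this, a base URL with or without a trailing slash and a file name with or without a leading slash all produce the same result.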

datasets to add

  • Zeng Tasic 2015 1.5 k brain paper or bioRxiv 2018 Tasic, Yao, Koch, Zeng
  • treehouse
  • SRA, Nuno

Gene box functionality

When we view our data with the browser, the gene boxes are less functional than with the test data. The boxes don't dynamically change color based on the circle selected.

display gene table and heatmap

Right now the table of genes with p-values is only visible after clicking on the name of a cluster.

This means you have to choose which display you want to see, either the map of cells or the table of genes, but not both at the same time.

You might consider an alternative display, as in the Loupe software by 10X Genomics.

Here are two screenshots that show how they do it:

[image: Loupe screenshot 1]

[image: Loupe screenshot 2]

I like the interactive table:

  • When you click on a cluster name in the table header, you get to see the genes ranked by p-value in that cluster.
  • The other clusters that are out-of-focus only show a summarized Log2 Fold-Change, not the p-value.

What do you think?

No cluster markers in example

When I load the example dataset, I can click on the named clusters ('OPC', say) and a new window opens up ('Cluster markers for OPC') but the gene list never fills.
Is there something I should run separately to make that happen?
And is there a server log? Nothing rolls on my screen after Point your internet browser(...)

R fails silently instead of producing an error code

Here

library(argparser, quietly=TRUE)
R fails silently, without producing an error code. Workflow environments and bash pipelines normally rely on the exit code to detect whether a process failed. If the code fails, it should exit with an error code (anything other than zero). In that line, I think the correct package is argparse and not argparser.

Thanks!

Example build fails on MacOS with python2

Similar to #35, running the example build command throws an error. Here's the traceback:

Traceback (most recent call last):
  File "../../src/cbBuild", line 9, in <module>
    cellbrowser.convertAndCopyCli()
  File "../../src/cbPyLib/cellbrowser.py", line 2393, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "../../src/cbPyLib/cellbrowser.py", line 2364, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "../../src/cbPyLib/cellbrowser.py", line 2131, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "../../src/cbPyLib/cellbrowser.py", line 1827, in convertExprMatrix
    copyMatrixTrim(matrixFname, outMatrixFname, metaSampleNames, needFilterMatrix)
  File "../../src/cbPyLib/cellbrowser.py", line 1430, in copyMatrixTrim
    matIter.open(inFname)
  File "../../src/cbPyLib/cellbrowser.py", line 663, in open
    encoding='utf-8',
TypeError: __init__() got an unexpected keyword argument 'encoding'

From what I can tell, this happens because the shebang of cbBuild selects python2, while the code passes encoding as an argument to subprocess.Popen, as introduced by #36. In my python v2.7.15 installation, subprocess.Popen does not have an encoding argument:

>>> inspect.getargs(subprocess.Popen.__init__.__code__)
Arguments(args=['self', 'args', 'bufsize', 'executable', 'stdin', 'stdout', 'stderr', 'preexec_fn', 'close_fds', 'shell', 'cwd', 'env', 'universal_newlines', 'startupinfo', 'creationflags'], varargs=None, keywords=None)

I can get around this error by using:

$ python3 ../../src/cbBuild -o ~/public_html/cb/ -p 8888
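
If both interpreters need to be supported, one option is to leave `encoding` out of the Popen call and decode the pipe separately. The sketch below uses `codecs.getreader`, which exists in Python 2 and 3; the helper name `open_text_pipe` is hypothetical and this is not the code in cellbrowser.py.

```python
import codecs
import subprocess
import sys


def open_text_pipe(cmd):
    """Start cmd and return (process, UTF-8 text reader over its stdout)."""
    # No encoding= argument here, so the call is valid on Python 2.7 as well.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    return proc, codecs.getreader("utf-8")(proc.stdout)
```

The caller reads lines from the returned reader as text and calls `proc.wait()` when done.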

cbBuild v0.2 fails with sample1 dataset

Hi there,

For some reason, the datasets I have been using with cellbrowser for the past month or so have stopped working for us, with the following error when running cbBuild (using v0.2):

2018-09-17T16:34:56.524288956Z INFO:root:Creating /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample
2018-09-17T16:34:56.525043483Z INFO:root:/export/database/job_working_directory/000/31/working/summary.html does not exist
2018-09-17T16:34:56.525102426Z INFO:root:/export/database/job_working_directory/000/31/working/methods.html does not exist
2018-09-17T16:34:56.525220367Z INFO:root:/export/database/job_working_directory/000/31/working/downloads.html does not exist
2018-09-17T16:34:56.525453936Z INFO:root:/export/database/job_working_directory/000/31/working/thumbnail.png does not exist
2018-09-17T16:34:56.525650515Z INFO:root:Getting md5 of /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T16:34:56.589874633Z INFO:root:Creating /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/metaFields
2018-09-17T16:34:56.589895612Z INFO:root:Checking and reordering meta data to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.tsv
2018-09-17T16:34:56.589899082Z INFO:root:Reading sample names from /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T16:34:56.602534967Z INFO:root:Reading headers of file /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T16:34:56.613669673Z WARNING:root:2902 samples names are in the meta data, but not in the expression matrix. Examples: ['S36.C3', 'S92.D5', 'S2.D10', 'S114.H11', 'S114.B6', 'S131.C5', 'S184.G8', 'S94.F8', 'S32.A6', 'S222.B4']
2018-09-17T16:34:56.614792349Z WARNING:root:These samples will be removed from the meta data
2018-09-17T16:34:56.615269706Z INFO:root:Data contains 4261 samples/cells
2018-09-17T16:34:56.633090318Z INFO:root:Converting to numbers and compressing meta data fields
2018-09-17T16:34:56.633108027Z INFO:root:Meta data field index 0: 'Cell'
2018-09-17T16:34:56.69784884Z INFO:root:Type: uniqueString, 4261 different values
2018-09-17T16:34:56.698013476Z INFO:root:Meta data field index 1: 'WGCNAcluster'
2018-09-17T16:34:56.733551043Z INFO:root:Type: enum, 48 different values
2018-09-17T16:34:56.733598447Z INFO:root:Meta data field index 2: 'Name'
2018-09-17T16:34:56.782910478Z INFO:root:Type: enum, 48 different values
2018-09-17T16:34:56.782929858Z INFO:root:Meta data field index 3: 'Age'
2018-09-17T16:34:56.791846888Z INFO:root:Number of values per decile-bin: [289, 899, 720, 407, 459, 307, 289, 404, 417, 70]
2018-09-17T16:34:56.797273961Z INFO:root:Type: float, 29 different values
2018-09-17T16:34:56.797360832Z INFO:root:Meta data field index 4: 'RegionName'
2018-09-17T16:34:56.813704138Z INFO:root:Type: enum, 4 different values
2018-09-17T16:34:56.813722999Z INFO:root:Meta data field index 5: 'Laminae'
2018-09-17T16:34:56.878486659Z INFO:root:Type: enum, 7 different values
2018-09-17T16:34:56.878546838Z INFO:root:Meta data field index 6: 'Area'
2018-09-17T16:34:56.89320577Z INFO:root:Type: enum, 7 different values
2018-09-17T16:34:56.894613807Z INFO:root:Indexing meta file /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.tsv to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.index
2018-09-17T16:34:56.995743217Z INFO:root:Kept 4261 cells present in both meta data file and expression matrix
2018-09-17T16:34:56.995767735Z INFO:root:Getting md5 of /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.tsv
2018-09-17T16:34:56.999302087Z INFO:root:Determining if /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz needs to be created
2018-09-17T16:34:56.99932641Z INFO:root:/export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz does not exist.
2018-09-17T16:34:56.999344021Z INFO:root:Getting md5 of /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T16:34:57.00930947Z INFO:root:Copying+reordering+trimming /export/galaxy-central/database/files/000/dataset_35.dat to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz, keeping only the 4261 columns with a sample name in the meta data
2018-09-17T16:34:57.011749543Z INFO:root:Auto-detecting number type of /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T16:34:57.013617475Z INFO:root:Numbers in matrix are of type 'float'
2018-09-17T16:34:57.979119576Z INFO:root:converting /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.bin and writing index to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.json
2018-09-17T16:34:57.979141387Z INFO:root:Compressing gene expression vectors...
2018-09-17T16:34:57.983720239Z INFO:root:Auto-detecting number type of /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz
2018-09-17T16:34:57.984819571Z INFO:root:Numbers in matrix are of type 'float'
2018-09-17T16:34:58.211085954Z INFO:root:Getting md5 of /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz
2018-09-17T16:34:58.270150678Z INFO:root:Wrote /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/cellbrowser.json.bak
2018-09-17T16:34:58.277730905Z Traceback (most recent call last):
2018-09-17T16:34:58.277756473Z   File "/usr/local/bin/cbBuild", line 10, in <module>
2018-09-17T16:34:58.27775973Z     cellbrowser.convertAndCopyCli()
2018-09-17T16:34:58.277762262Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2246, in convertAndCopyCli
2018-09-17T16:34:58.277764817Z     convertAndCopy(confFnames, outDir, port)
2018-09-17T16:34:58.27776716Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2220, in convertAndCopy
2018-09-17T16:34:58.277769633Z     convertDataset(inConf, outConf, datasetDir)
2018-09-17T16:34:58.277772084Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2062, in convertDataset
2018-09-17T16:34:58.277784502Z     writeConfig(inConf, outConf, datasetDir)
2018-09-17T16:34:58.277787168Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2178, in writeConfig
2018-09-17T16:34:58.277789623Z     json.dump(outConf, descJsonFh, indent=2)
2018-09-17T16:34:58.277791932Z   File "/usr/local/lib/python3.6/json/__init__.py", line 179, in dump
2018-09-17T16:34:58.277794342Z     for chunk in iterable:
2018-09-17T16:34:58.27779669Z   File "/usr/local/lib/python3.6/json/encoder.py", line 430, in _iterencode
2018-09-17T16:34:58.277799165Z     yield from _iterencode_dict(o, _current_indent_level)
2018-09-17T16:34:58.27780191Z   File "/usr/local/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
2018-09-17T16:34:58.277804503Z     yield from chunks
2018-09-17T16:34:58.277806857Z   File "/usr/local/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
2018-09-17T16:34:58.277809216Z     yield from chunks
2018-09-17T16:34:58.277811512Z   File "/usr/local/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
2018-09-17T16:34:58.277813907Z     yield from chunks
2018-09-17T16:34:58.277816211Z   File "/usr/local/lib/python3.6/json/encoder.py", line 437, in _iterencode
2018-09-17T16:34:58.27781863Z     o = _default(o)
2018-09-17T16:34:58.277820957Z   File "/usr/local/lib/python3.6/json/encoder.py", line 180, in default
2018-09-17T16:34:58.277823361Z     o.__class__.__name__)
2018-09-17T16:34:58.277825713Z TypeError: Object of type 'bytes' is not JSON serializable
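
The traceback ends with json.dump rejecting a bytes value somewhere inside outConf. Which field holds the bytes is not visible in the log. As a sketch of one possible workaround (not the fix that was applied upstream), the json module accepts a `default` hook that can decode such values:

```python
import json


def bytes_to_str(obj):
    """json `default` hook: decode bytes values, reject anything else."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    raise TypeError("Object of type %s is not JSON serializable" % type(obj).__name__)


def dump_conf(conf):
    """Serialize a config dict, tolerating bytes values (hypothetical helper)."""
    return json.dumps(conf, indent=2, default=bytes_to_str)
```

The hook only covers values, not dictionary keys, and decoding the value at the point where it is produced would be the cleaner fix.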

I made sure all was fine by re-downloading dataset files (exprMatrix.tsv, meta.tsv, tsne.coords.tsv), but still same issue. Thanks!

For our current functionality (we would like to show this working to someone on the 20/09), would it be possible to tag a20c4d0533a623f6d2b2b357ef94d75a2b87c569 as v0.1.9 or something prior to v0.2? I think that was the last version that worked for us. We could then build a container from it while the issues are sorted out.

Thanks!
Pablo

Example build fails on mac

Hi

I'm on MacOS with python 3.6, and the example command ../../src/cbBuild -o ~/tmp/blah -p8888 in the sampleData/sample1 folder fails with

INFO:root:Copying+reordering+trimming /Users/markov/Documents/Misharin Cell Browser/cellBrowser/sampleData/sample1/exprMatrix.tsv.gz to /Users/markov/tmp/blah/sample/exprMatrix.tsv.gz, keeping only the 4261 columns with a sample name in the meta data
Traceback (most recent call last):
  File "../../src/cbBuild", line 10, in <module>
    cellbrowser.convertAndCopyCli()
  File "../../src/cbPyLib/cellbrowser.py", line 2348, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "../../src/cbPyLib/cellbrowser.py", line 2322, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "../../src/cbPyLib/cellbrowser.py", line 2091, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "../../src/cbPyLib/cellbrowser.py", line 1792, in convertExprMatrix
    copyMatrixTrim(matrixFname, outMatrixFname, metaSampleNames, needFilterMatrix)
  File "../../src/cbPyLib/cellbrowser.py", line 1396, in copyMatrixTrim
    matIter.open(inFname)
  File "../../src/cbPyLib/cellbrowser.py", line 657, in open
    assert(len(self.sampleNames)!=0)
AssertionError

make default color grey

For values like empty or "none", for zoomouts, and for any other situation where there is no color, use "light-grey", not black.

turn off the on-screen labels

Is it possible to turn off the on-screen labels? (Keep the legend labels, but not the cluster labels). Here's an example of unhelpful labels.

[image: example unhelpful labels]

Zoom out past 100%

I'd like to be able to zoom out more than "100%". Viewing the clustering from a farther viewpoint can help with visual pattern identification.

'Find Cells' doesn't work

@maximilianh When I open the 'Find Cells' pop-up from the 'Edit' drop-down, I can't seem to actually find any cells after selecting a field and entering a filter value. If I hit 'enter' on my keyboard or click 'OK' nothing happens. The pop-up doesn't close and no filter is applied.

Steps to reproduce:

  • Open 'cortex-dev' dataset
  • Under 'Edit' drop-down click 'Find Cells'
  • For 'cell annotation field' select 'Age_in_Weeks' and set the next drop-down to be 'is greater than' and the value to be '21.00'
  • Hit 'enter' on the keyboard or click 'ok'
  • Notice that nothing happens

select by cell ID

Allow the user to paste in a few cell IDs to select. I think Tim and Aparna also suggested this before.

Scanpy cbBuild Assertion Error

Hi Max,

I'm receiving an assertion error when attempting to build a browser. I didn't receive errors when I generated the cellbrowser.conf file using a Scanpy object. I also double checked the Scanpy object and the cluster metadata and coordinates are present.

Here is an example of the command and error message I am receiving:

./cbBuild -i ../../single_cell/chi/cellBrowserOut/combined_EF13W3D/cellbrowser.conf -o CBOUT

INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/summary.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/methods.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/downloads.html does not exist
INFO:root:/projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/thumb.png does not exist
INFO:root:Getting md5 of /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv
md5sum: /projects/sysbio/users/apblair/single_cell/chi/cellBrowserOut/combined_EF13W3D/cell_to_cluster.tsv: No such file or directory
Traceback (most recent call last):
  File "./cbBuild", line 9, in <module>
    cellbrowser.convertAndCopyCli()
  File "./cbPyLib/cellbrowser.py", line 2393, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "./cbPyLib/cellbrowser.py", line 2364, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "./cbPyLib/cellbrowser.py", line 2120, in convertDataset
    sampleNames, needFilterMatrix, outMeta = convertMeta(inConf, outConf, datasetDir)
  File "./cbPyLib/cellbrowser.py", line 1954, in convertMeta
    outConf["fileVersions"]["inMeta"] = getFileVersion(metaFname)
  File "./cbPyLib/cellbrowser.py", line 1940, in getFileVersion
    hexHash = md5ForFile(fname).decode("ascii")
  File "./cbPyLib/cellbrowser.py", line 2035, in md5ForFile
    md5 = getMd5Using("md5sum", fname).split()[0]
  File "./cbPyLib/cellbrowser.py", line 2028, in getMd5Using
    assert(err==0)
AssertionError

Thanks again for your help! :)
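
In the log above, `md5sum` reports that `cell_to_cluster.tsv` does not exist, and the assertion fails right after that. A minimal sketch of a checksum helper that reports the missing file by name, using Python's `hashlib` rather than the external `md5sum` command, might look like this. The function name `md5_for_file` is hypothetical and this is not how cellbrowser.py is implemented.

```python
import hashlib
import os

def md5_for_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, failing early with a clear message."""
    if not os.path.isfile(path):
        # Name the missing file instead of failing on a bare assertion.
        raise FileNotFoundError("meta data file not found: %s" % path)
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Checking that the meta data file named in cellbrowser.conf actually exists at the given path is probably the first thing to verify here.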

cbBuild 0.25 fails with sample 1 data

Execution as described in #24 fails with the following error:

2018-09-17T20:44:07.682197743Z INFO:root:/export/database/job_working_directory/000/32/working/summary.html does not exist
2018-09-17T20:44:07.682315009Z INFO:root:/export/database/job_working_directory/000/32/working/methods.html does not exist
2018-09-17T20:44:07.682319046Z INFO:root:/export/database/job_working_directory/000/32/working/downloads.html does not exist
2018-09-17T20:44:07.68232157Z INFO:root:/export/database/job_working_directory/000/32/working/thumb.png does not exist
2018-09-17T20:44:07.682325747Z INFO:root:Getting md5 of /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T20:44:07.686529535Z INFO:root:Checking and reordering meta data to /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.tsv
2018-09-17T20:44:07.686593111Z INFO:root:Reading sample names from /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T20:44:07.695780892Z INFO:root:Reading headers of file /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T20:44:07.699772794Z WARNING:root:2902 samples names are in the meta data, but not in the expression matrix. Examples: ['S49.G5', 'S35.H2', 'S23.C10', 'S49.F10', 'S27.E6', 'S162.C2', 'S21.H2', 'S44.C10', 'S57.H3', 'S49.D11']
2018-09-17T20:44:07.699798257Z WARNING:root:These samples will be removed from the meta data
2018-09-17T20:44:07.700231588Z INFO:root:Data contains 4261 samples/cells
2018-09-17T20:44:07.758384687Z INFO:root:Converting to numbers and compressing meta data fields
2018-09-17T20:44:07.765759419Z INFO:root:Meta data field index 0: 'Cell'
2018-09-17T20:44:07.788543139Z INFO:root:Type: uniqueString, 4261 different values
2018-09-17T20:44:07.788562019Z INFO:root:Meta data field index 1: 'WGCNAcluster'
2018-09-17T20:44:07.804685374Z INFO:root:Type: enum, 48 different values
2018-09-17T20:44:07.804702719Z INFO:root:Meta data field index 2: 'Name'
2018-09-17T20:44:07.87122994Z INFO:root:Type: enum, 48 different values
2018-09-17T20:44:07.871249785Z INFO:root:Meta data field index 3: 'Age'
2018-09-17T20:44:07.87935809Z INFO:root:Number of values per decile-bin: [289, 899, 720, 407, 459, 307, 289, 404, 417, 70]
2018-09-17T20:44:07.884437089Z INFO:root:Type: float, 29 different values
2018-09-17T20:44:07.884466143Z INFO:root:Meta data field index 4: 'RegionName'
2018-09-17T20:44:07.900016901Z INFO:root:Type: enum, 4 different values
2018-09-17T20:44:07.900041781Z INFO:root:Meta data field index 5: 'Laminae'
2018-09-17T20:44:07.967187537Z INFO:root:Type: enum, 7 different values
2018-09-17T20:44:07.967389047Z INFO:root:Meta data field index 6: 'Area'
2018-09-17T20:44:07.982237946Z INFO:root:Type: enum, 7 different values
2018-09-17T20:44:07.98376565Z INFO:root:Indexing meta file /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.tsv to /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.index
2018-09-17T20:44:08.084167907Z INFO:root:Kept 4261 cells present in both meta data file and expression matrix
2018-09-17T20:44:08.084201346Z INFO:root:Getting md5 of /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.tsv
2018-09-17T20:44:08.08741232Z INFO:root:Determining if /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/exprMatrix.tsv.gz needs to be created
2018-09-17T20:44:08.094838318Z INFO:root:Reading headers of file /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/exprMatrix.tsv.gz
2018-09-17T20:44:08.097394389Z INFO:root:current input matrix looks identical to previously processed matrix, same file size, same sample names
2018-09-17T20:44:08.097417985Z INFO:root:Matrix and meta sample names have not changed, not indexing matrix again
2018-09-17T20:44:08.097421065Z INFO:root:Parsing coordinates from /export/galaxy-central/database/files/000/dataset_35.dat. FlipY=False, useTwoBytes=False
2018-09-17T20:44:08.385412545Z Traceback (most recent call last):
2018-09-17T20:44:08.38544231Z   File "/usr/local/bin/cbBuild", line 10, in <module>
2018-09-17T20:44:08.385447578Z     cellbrowser.convertAndCopyCli()
2018-09-17T20:44:08.385451124Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2341, in convertAndCopyCli
2018-09-17T20:44:08.385454678Z     convertAndCopy(confFnames, outDir, port)
2018-09-17T20:44:08.385457992Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2315, in convertAndCopy
2018-09-17T20:44:08.385461543Z     convertDataset(inConf, outConf, datasetDir)
2018-09-17T20:44:08.385464798Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2090, in convertDataset
2018-09-17T20:44:08.385468118Z     convertCoords(inConf, outConf, sampleNames, outMeta, datasetDir)
2018-09-17T20:44:08.385471601Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1828, in convertCoords
2018-09-17T20:44:08.385474808Z     coords = parseScaleCoordsAsDict(coordFname, useTwoBytes, flipY)
2018-09-17T20:44:08.385478054Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1216, in parseScaleCoordsAsDict
2018-09-17T20:44:08.385481712Z     for row in lineFileNextRow(fname):
2018-09-17T20:44:08.385484934Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 181, in lineFileNextRow
2018-09-17T20:44:08.385488389Z     Record = namedtuple('tsvRec', headers)
2018-09-17T20:44:08.385491697Z   File "/usr/local/lib/python3.6/collections/__init__.py", line 429, in namedtuple
2018-09-17T20:44:08.385496089Z     exec(class_definition, namespace)
2018-09-17T20:44:08.38549937Z   File "<string>", line 12
2018-09-17T20:44:08.385502941Z SyntaxError: more than 255 arguments
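
The `SyntaxError: more than 255 arguments` comes from `collections.namedtuple` on Python 3.6, which cannot build a record type with more than 255 fields (the limit was lifted in Python 3.7). That suggests the file parsed as coordinates has more than 255 columns. The log also reads matrix headers from `dataset_35.dat` and later parses coordinates from the same file, so the input mapping may be worth checking. Separately, a row reader that does not depend on `namedtuple` avoids the column limit altogether. The sketch below uses `csv.DictReader`; the function name `iter_tsv_rows` and the column names are hypothetical.

```python
import csv
import io

def iter_tsv_rows(lines):
    """Yield each row of a tab-separated file as a dict keyed by column name.

    Unlike a namedtuple-based reader, this places no limit on the column count.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield row

# Usage with an in-memory table of 300 data columns (hypothetical names).
header = "\t".join(["cellId"] + ["col%d" % i for i in range(300)])
values = "\t".join(["cell1"] + [str(i) for i in range(300)])
rows = list(iter_tsv_rows(io.StringIO(header + "\n" + values + "\n")))
print(len(rows[0]))  # number of columns read for the first row
```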

panning

Add a way to pan when zoomed - medium priority

When generating a browser (or manually afterwards), be able to choose which meta field to color cells on in the default view

Right now, when a new user loads the browser, the cells are colored by "Seurat Cluster".

I would like to be able to configure the browser when it is being set up such that I can choose a different default field to color the cells.

For example, I would like to be able to configure the browser so that the cells are colored by "disease" when a new user loads the browser.

This could be via a parameter to the cpPrep, or by manually editing a file (config.json ? index.html?) after creation, or whatever makes sense.

frequency counts overlie legend labels

In the legend, sometimes the text for a color is in the same location as the number indicating how many items have that value, like breast invasive carcinoma in the attached image.

image

Sample Dataset TypeError: 'NoneType'

Running into an error when attempting to build a viewer with sample dataset in sampleData/sample1.

Traceback (most recent call last):
  File "../../src/cbBuild", line 10, in <module>
    cellbrowser.convertAndCopyCli()
  File "../../src/cbPyLib/cellbrowser.py", line 2145, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "../../src/cbPyLib/cellbrowser.py", line 2119, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "../../src/cbPyLib/cellbrowser.py", line 1962, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "../../src/cbPyLib/cellbrowser.py", line 1696, in convertExprMatrix
    matType = matrixToBin(outMatrixFname, geneToSym, binMat, binMatIndex, discretBinMat, discretMatrixIndex)
  File "../../src/cbPyLib/cellbrowser.py", line 1035, in matrixToBin
    matType, sampleNames = matReader.open(fname)
TypeError: 'NoneType' object is not iterable

color by age

Holly says:

I want to be able to color by a quantitative value like age.

cbScanpy --init doesn't exist

cbScanpy --init
Usage: cbScanpy [options] -e matrixFile -o outDir - run scanpy and output .tsv files
    If exceptions occur, will automatically start the debugger.
cbScanpy: error: no such option: --init

Version:

pip list | grep cell
cellbrowser      0.4.4

Usage statement

cbScanpy -h
Usage: cbScanpy [options] -e matrixFile -o outDir - run scanpy and output .tsv files

    If exceptions occur, will automatically start the debugger.


Options:
  -h, --help            show this help message and exit
  -e EXPRMATRIX, --exprMatrix=EXPRMATRIX
                        gene-cell expression matrix file, possible formats:
                        .h5ad, .csv, .xlsx, .h5, .loom, .mtx, .txt, .tab,
                        .data
  -o OUTDIR, --outDir=OUTDIR
                        output directory
  -c CONFFNAME, --confFname=CONFFNAME
                        config file from which settings are read, default is
                        scanpy.conf
  -s, --samplesOnRows   when reading the expression matrix from a text file,
                        assume that samples are on lines (default behavior is
                        one-gene-per-line, one-sample-per-column)
  -n NAME, --name=NAME  name of dataset in cell browser, default cbScanpy-Data
  --test                run doctests
  -d, --debug           open an iPython shell when an exception occurs. also
                        output debug messages
