eiriniar / cellcnn

Representation Learning for detection of phenotype-associated cell subsets

Home Page: http://www.imsb.ethz.ch/research/claassen/Software/cellcnn.html

License: GNU General Public License v3.0

Python 3.60% Jupyter Notebook 96.40%

cellcnn's Issues

Unable to plot figures using the NK example files

Thank you for developing this very interesting ML tool for CyTOF data analysis. I am new to Python and keen to use your algorithm on my dataset.

However, when I try to run it on the example NK cell data set, I keep getting the error below. Could you advise?

run run_analysis.py -f NK_fcs_samples_with_labels.csv -m NK_markers.csv -i gated_NK/ -o outdir_NK --export_csv --group_a CMV- --group_b CMV+ --verbose 0

2021-02-23 13:44:11 - main:156 - INFO - Samples used for model training: ['a_001_NK.fcs', 'a_002_NK.fcs', 'a_003_NK.fcs', 'a_004_NK.fcs', 'a_006_NK.fcs', 'a_009_NK.fcs', 'a_010_NK.fcs', 'a_011_NK.fcs', 'a_012_NK.fcs', 'a_1a_NK.fcs', 'a_2a_NK.fcs', 'a_2b_NK.fcs', 'a_4a_NK.fcs', 'a_4b_NK.fcs', 'a_5a_NK.fcs']
2021-02-23 13:44:11 - main:157 - INFO - Samples used for validation: ['a_005_NK.fcs', 'a_007_NK.fcs', 'a_3a_NK.fcs', 'a_3b_NK.fcs', 'a_5b_NK.fcs']
2021-02-23 13:44:12 - cellCnn.model:320 - INFO - Generating multi-cell inputs...
2021-02-23 13:44:12 - cellCnn.model:390 - INFO - Done.
2021-02-23 13:44:12 - cellCnn.model:425 - INFO - Number of filters: 3
2021-02-23 13:44:12 - cellCnn.model:431 - INFO - Cells pooled: 1
63/63 [==============================] - 0s 1ms/step - loss: 0.6482 - accuracy: 0.6768
2021-02-23 13:44:16 - cellCnn.model:460 - INFO - Best validation accuracy: 0.68
2021-02-23 13:44:16 - cellCnn.model:425 - INFO - Number of filters: 6
2021-02-23 13:44:16 - cellCnn.model:431 - INFO - Cells pooled: 2
63/63 [==============================] - 0s 1ms/step - loss: 0.4967 - accuracy: 0.7944
2021-02-23 13:44:25 - cellCnn.model:460 - INFO - Best validation accuracy: 0.79
2021-02-23 13:44:25 - cellCnn.model:425 - INFO - Number of filters: 8
2021-02-23 13:44:25 - cellCnn.model:431 - INFO - Cells pooled: 10
63/63 [==============================] - 0s 1ms/step - loss: 0.3701 - accuracy: 0.8989
2021-02-23 13:44:34 - cellCnn.model:460 - INFO - Best validation accuracy: 0.90
2021-02-23 13:44:34 - cellCnn.model:425 - INFO - Number of filters: 4
2021-02-23 13:44:34 - cellCnn.model:431 - INFO - Cells pooled: 40
63/63 [==============================] - 0s 1ms/step - loss: 0.6090 - accuracy: 0.7579
2021-02-23 13:44:38 - cellCnn.model:460 - INFO - Best validation accuracy: 0.76
2021-02-23 13:44:38 - cellCnn.model:425 - INFO - Number of filters: 6
2021-02-23 13:44:38 - cellCnn.model:431 - INFO - Cells pooled: 1
63/63 [==============================] - 0s 1ms/step - loss: 0.5279 - accuracy: 0.7724
2021-02-23 13:44:45 - cellCnn.model:460 - INFO - Best validation accuracy: 0.77
2021-02-23 13:44:45 - cellCnn.model:425 - INFO - Number of filters: 7
2021-02-23 13:44:45 - cellCnn.model:431 - INFO - Cells pooled: 2
63/63 [==============================] - 0s 1ms/step - loss: 0.6276 - accuracy: 0.7114
2021-02-23 13:44:50 - cellCnn.model:460 - INFO - Best validation accuracy: 0.71
2021-02-23 13:44:50 - cellCnn.model:425 - INFO - Number of filters: 9
2021-02-23 13:44:50 - cellCnn.model:431 - INFO - Cells pooled: 10
63/63 [==============================] - 0s 1ms/step - loss: 0.4519 - accuracy: 0.8374
2021-02-23 13:44:58 - cellCnn.model:460 - INFO - Best validation accuracy: 0.84
2021-02-23 13:44:58 - cellCnn.model:425 - INFO - Number of filters: 4
2021-02-23 13:44:58 - cellCnn.model:431 - INFO - Cells pooled: 40
63/63 [==============================] - 0s 1ms/step - loss: 0.3997 - accuracy: 0.9390
2021-02-23 13:45:03 - cellCnn.model:460 - INFO - Best validation accuracy: 0.94
2021-02-23 13:45:03 - cellCnn.model:425 - INFO - Number of filters: 5
2021-02-23 13:45:03 - cellCnn.model:431 - INFO - Cells pooled: 1
63/63 [==============================] - 0s 2ms/step - loss: 0.6324 - accuracy: 0.8139
2021-02-23 13:45:07 - cellCnn.model:460 - INFO - Best validation accuracy: 0.81
2021-02-23 13:45:07 - cellCnn.model:425 - INFO - Number of filters: 8
2021-02-23 13:45:07 - cellCnn.model:431 - INFO - Cells pooled: 2
63/63 [==============================] - 0s 1ms/step - loss: 0.4493 - accuracy: 0.8229
2021-02-23 13:45:17 - cellCnn.model:460 - INFO - Best validation accuracy: 0.82
2021-02-23 13:45:17 - cellCnn.model:425 - INFO - Number of filters: 5
2021-02-23 13:45:17 - cellCnn.model:431 - INFO - Cells pooled: 10
63/63 [==============================] - 0s 1ms/step - loss: 0.6025 - accuracy: 0.7044
2021-02-23 13:45:22 - cellCnn.model:460 - INFO - Best validation accuracy: 0.70
2021-02-23 13:45:22 - cellCnn.model:425 - INFO - Number of filters: 7
2021-02-23 13:45:22 - cellCnn.model:431 - INFO - Cells pooled: 40
63/63 [==============================] - 0s 1ms/step - loss: 0.4441 - accuracy: 0.8574
2021-02-23 13:45:29 - cellCnn.model:460 - INFO - Best validation accuracy: 0.86
2021-02-23 13:45:29 - cellCnn.model:425 - INFO - Number of filters: 7
2021-02-23 13:45:29 - cellCnn.model:431 - INFO - Cells pooled: 1
63/63 [==============================] - 0s 1ms/step - loss: 0.4593 - accuracy: 0.8219
2021-02-23 13:45:37 - cellCnn.model:460 - INFO - Best validation accuracy: 0.82
2021-02-23 13:45:37 - cellCnn.model:425 - INFO - Number of filters: 4
2021-02-23 13:45:37 - cellCnn.model:431 - INFO - Cells pooled: 2
63/63 [==============================] - 0s 1ms/step - loss: 0.5350 - accuracy: 0.7729
2021-02-23 13:45:42 - cellCnn.model:460 - INFO - Best validation accuracy: 0.77
2021-02-23 13:45:42 - cellCnn.model:425 - INFO - Number of filters: 6
2021-02-23 13:45:42 - cellCnn.model:431 - INFO - Cells pooled: 10
63/63 [==============================] - 0s 1ms/step - loss: 0.2961 - accuracy: 0.9450
2021-02-23 13:45:51 - cellCnn.model:460 - INFO - Best validation accuracy: 0.94
2021-02-23 13:45:53 - cellCnn.plotting:144 - INFO - Loading the weights of consensus filters.
2021-02-23 13:45:53 - cellCnn.plotting:168 - INFO - Computing t-SNE projection...
C:\Users\yeong\AppData\Local\Programs\Python\Python37\Scripts\CellCnn-python3\cellCnn\plotting.py:582: MatplotlibDeprecationWarning:
The 'add_all' parameter of __init__() was deprecated in Matplotlib 3.3 and will be removed two minor releases later. If any parameter follows 'add_all', they should be passed as keyword, not positionally.
cbar_pad="5%",

TypeError Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python37\Scripts\CellCnn-python3\run_analysis.py in <module>
240 if __name__ == '__main__':
241 try:
--> 242 main()
243 except KeyboardInterrupt:
244 sys.stderr.write("User interrupt!\n")

~\AppData\Local\Programs\Python\Python37\Scripts\CellCnn-python3\run_analysis.py in main()
209 tsne_ncell=args.tsne_ncell,
210 regression=args.regression,
--> 211 show_filters=False)
212 _v = plot_results(results, valid_samples, valid_phenotypes,
213 marker_names, os.path.join(plotdir, 'validation_plots'),

~\AppData\Local\Programs\Python\Python37\Scripts\CellCnn-python3\cellCnn\plotting.py in plot_results(results, samples, phenotypes, labels, outdir, filter_diff_thres, filter_response_thres, response_grad_cutoff, stat_test, log_yscale, group_a, group_b, group_names, tsne_ncell, regression, show_filters)
176 fig_path = os.path.join(outdir, 'tsne_all_cells')
177 plot_tsne_grid(x_tsne, x_for_tsne, fig_path, labels=labels, fig_size=(20, 20),
--> 178 point_size=5)
179
180 return_filters = []

~\AppData\Local\Programs\Python\Python37\Scripts\CellCnn-python3\cellCnn\plotting.py in plot_tsne_grid(z, x, fig_path, labels, fig_size, g_j, suffix, point_size)
580 cbar_mode="each",
581 cbar_size="8%",
--> 582 cbar_pad="5%",
583 )
584 for seq_index in range(ncol):

c:\users\yeong\appdata\local\programs\python\python37\lib\site-packages\matplotlib\cbook\deprecation.py in wrapper(*inner_args, **inner_kwargs)
409 else deprecation_addendum,
410 **kwargs)
--> 411 return func(*inner_args, **inner_kwargs)
412
413 return wrapper

c:\users\yeong\appdata\local\programs\python\python37\lib\site-packages\mpl_toolkits\axes_grid1\axes_grid.py in __init__(self, fig, rect, nrows_ncols, ngrids, direction, axes_pad, add_all, share_all, aspect, label_mode, cbar_mode, cbar_location, cbar_pad, cbar_size, cbar_set_cax, axes_class)
434 direction=direction, axes_pad=axes_pad,
435 share_all=share_all, share_x=True, share_y=True, aspect=aspect,
--> 436 label_mode=label_mode, axes_class=axes_class)
437 else: # Only show deprecation in that case.
438 super().__init__(

c:\users\yeong\appdata\local\programs\python\python37\lib\site-packages\mpl_toolkits\axes_grid1\axes_grid.py in __init__(self, fig, rect, nrows_ncols, ngrids, direction, axes_pad, add_all, share_all, share_x, share_y, label_mode, axes_class, aspect)
210 if add_all:
211 for ax in self.axes_all:
--> 212 fig.add_axes(ax)
213
214 self.set_label_mode(label_mode)

c:\users\yeong\appdata\local\programs\python\python37\lib\site-packages\matplotlib\figure.py in add_axes(self, *args, **kwargs)
1234 else:
1235 rect = args[0]
-> 1236 if not np.isfinite(rect).all():
1237 raise ValueError('all entries in rect must be finite '
1238 'not {}'.format(rect))

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
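The MatplotlibDeprecationWarning printed just before the traceback comes from the axes_grid1 grid constructor called in `plot_tsne_grid`, and it asks for every parameter after `add_all` to be passed by keyword. One plausible reading, not confirmed here, is that a positionally passed argument in that call is landing in a parameter other than the intended one. The sketch below builds a comparable per-marker grid with every option after `rect` passed by keyword; `plot_embedding_grid` is a hypothetical stand-in, not cellCnn's actual code, and it has not been tested against cellCnn itself.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import ImageGrid


def plot_embedding_grid(z, x, labels, ncol=3, point_size=5):
    """Hypothetical stand-in for a t-SNE grid plot: one panel per marker.

    z: (n_cells, 2) embedding; x: (n_cells, n_markers) marker values;
    labels: marker names. Every ImageGrid option after ``rect`` is passed
    by keyword, as the Matplotlib 3.3 deprecation message asks.
    """
    n_markers = x.shape[1]
    nrow = int(np.ceil(n_markers / ncol))
    fig = plt.figure(figsize=(4 * ncol, 4 * nrow))
    grid = ImageGrid(
        fig,
        111,
        nrows_ncols=(nrow, ncol),
        axes_pad=0.5,
        share_all=True,
        cbar_location="right",
        cbar_mode="each",
        cbar_size="8%",
        cbar_pad="5%",
    )
    for i in range(nrow * ncol):
        if i < n_markers:
            sc = grid[i].scatter(z[:, 0], z[:, 1], c=x[:, i], s=point_size)
            grid[i].set_title(labels[i])
            grid.cbar_axes[i].colorbar(sc)  # one colorbar per panel
        else:
            # Hide unused panels and their colorbar slots.
            grid[i].set_axis_off()
            grid.cbar_axes[i].set_axis_off()
    return fig, grid
```

Passing by keyword keeps the call valid both in Matplotlib 3.3/3.4, where `add_all` is deprecated, and in later releases, where it no longer exists.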

Datasets

Hello,

Can you share datasets on any other external provider, for example Zenodo? https://zenodo.org/
I have already tried to download the data several times without success.

Thank you!

NameError: global name 'l1l2' is not defined in Model.py

Hi eiriniar,

I was hoping for some help with this error. I have tried running from the command line and in IPython, and I get the same error either way. Because of a change in the Keras API, I tried renaming l1l2 to l1_l2 and modifying model.py to use keras.regularizers.l1_l2, yet I still get the error shown below regardless of the change. Have you encountered this error? Thanks in advance!

Generating multi-cell inputs...
Done.
Number of filters: 7
Cells pooled: 1

NameError Traceback (most recent call last)
.
.
.
NameError: global name 'l1l2' is not defined
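Two observations, stated as hedged hints rather than a diagnosis: the wording `global name 'l1l2' is not defined` is how Python 2 reports a missing name, and the regularizer was `l1l2` in Keras 1 but `l1_l2` in Keras 2. If the traceback still mentions `l1l2` after the rename, one possibility is that Python is importing a different copy of `model.py` (for example an installed cellCnn in site-packages) rather than the file that was edited. The stdlib sketch below (Python 3; `module_origin` is a hypothetical helper name) reports which file a module would be loaded from. On Python 2, importing `cellCnn.model` and printing `cellCnn.model.__file__` gives the same information.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would import ``name`` from, or None if not found.

    Handy for checking that a patched module (e.g. "cellCnn.model") is the
    copy that actually gets imported.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # A parent package of a dotted name could not be found.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print(module_origin("cellCnn.model"))
```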

ValueError

Traceback (most recent call last):
File "/Users/shane/Documents/tools/CellCnn/cellCnn/run_analysis.py", line 242, in <module>
main()
File "/Users/shane/Documents/tools/CellCnn/cellCnn/run_analysis.py", line 185, in main
outdir=args.outdir)
File "/Users/shane/Documents/tools/CellCnn/cellCnn/model.py", line 167, in fit
accur_thres=self.accur_thres, verbose=self.verbose)
File "/Users/shane/Documents/tools/CellCnn/cellCnn/model.py", line 484, in train_model
w_best_net = keras_param_vector(best_net)
File "/Users/shane/Documents/tools/CellCnn/cellCnn/utils.py", line 117, in keras_param_vector
W_tot = np.hstack([W.T, b.reshape(-1, 1), W_out])
File "<array_function internals>", line 6, in hstack
File "/Users/shane/.local/share/virtualenvs/CellCnn-MbiXJHLq/lib/python3.7/site-packages/numpy/core/shape_base.py", line 344, in hstack
return _nx.concatenate(arrs, 0)
File "<array_function internals>", line 6, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
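`np.hstack` needs every piece to have the same number of dimensions. The message says the first piece, `W.T`, is 1-D while `b.reshape(-1, 1)` is 2-D, so `W` itself came back one-dimensional (transposing a 1-D array leaves it unchanged). The traceback does not show why `W` is 1-D; if it is because the weights were fetched in a different order (e.g. a bias vector picked up instead of a kernel), reshaping would only hide the real problem. The sketch below is a hypothetical defensive version, `stack_filter_params`, with assumed shapes; it promotes 1-D inputs to columns and then checks that the filter counts agree before stacking.

```python
import numpy as np


def stack_filter_params(W, b, W_out):
    """Hypothetical helper: build one row of parameters per filter.

    Assumed shapes: W (n_markers, n_filters), b (n_filters,),
    W_out (n_filters, n_outputs). A 1-D W is treated as a single filter.
    """
    W = np.asarray(W)
    if W.ndim == 1:
        W = W.reshape(-1, 1)  # (n_markers,) -> (n_markers, 1)
    b = np.asarray(b).reshape(-1, 1)  # (n_filters,) -> (n_filters, 1)
    W_out = np.asarray(W_out)
    if W_out.ndim == 1:
        W_out = W_out.reshape(-1, 1)
    n_filters = b.shape[0]
    if W.shape[1] != n_filters or W_out.shape[0] != n_filters:
        # Fail loudly instead of silently stacking mismatched weights.
        raise ValueError(
            f"shape mismatch: W {W.shape}, b {b.shape}, W_out {W_out.shape}"
        )
    return np.hstack([W.T, b, W_out])
```

Printing `W.shape`, `b.shape` and `W_out.shape` just before the failing line is the quickest way to see which of the two situations applies.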

Data set cannot be downloaded (#6): the site cannot be accessed

Thank you for designing this algorithm for cell data. I'm new to Python and wanted to try it, but I couldn't download the data set.
Could you provide the relevant data sets some other way?
You can email me at [email protected]

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Hi, I'm writing from the University of Colorado, where we're hoping to use CellCnn to understand immune dysregulation in individuals with Down syndrome (http://www.trisome.org/). I've been able to run CellCnn successfully on a small subset of samples (8 with Trisomy 21, 8 without), with each FCS file subsampled down to only 1000 events. I'm encountering the following error when I attempt to run the analysis on our full cohort (n=292 with Trisomy 21, n=96 without Trisomy 21). Our panel includes 34 markers relevant to CD45+/CD66lo cells. Can you help me understand what could trigger this NaN/infinity error message, or recommend some things I should try to fix it? Thanks so much.

(CellCnn) bash-3.2$ python /usr/local/bin/CellCnn/cellCnn/run_analysis.py \
> --seed 1234 \
> -f '/Users/shawjes/Dropbox/EspinosaGroup/ANALYSIS/CyTOF/P4C/Unsupervised_Analysis/CellCNN/P4C_CellCNN_InputFiles/P4C_CyTOF_051121_Samples_with_Labels_for_CellCnn_CD45posCD66lo_Subsample45k_v0.1_JRS.csv' \
> -m '/Users/shawjes/Dropbox/EspinosaGroup/ANALYSIS/CyTOF/P4C/Unsupervised_Analysis/CellCNN/P4C_CellCNN_InputFiles/P4C_CyTOF_051121_Markers_for_CellCnn_among_CD45posCD66lo_Subsample45k_v0.1_JRS.csv' \
> -i '/Users/shawjes/Dropbox/EspinosaGroup/P4C_CyTOF/CellCNN/CyTOF_P4C_P95batch_normalized_FSC_files (PA gates modified flowJo-New Bcells gate) - Gated Populations_CD45+CD66lo/Subsample45k/' \
> -o '/Users/shawjes/Dropbox/EspinosaGroup/ANALYSIS/CyTOF/P4C/Unsupervised_Analysis/CellCNN/051121_Out_AllSamples_CD45posCD66lo_Subsample45kEvents_noarcsinh' \
> --no_arcsinh \
> --export_csv \
> --group_a D21 --group_b T21 \
> --export_csv \
> --stat_test mannwhitneyu \
> --verbose 0
Traceback (most recent call last):
  File "/usr/local/bin/CellCnn/cellCnn/run_analysis.py", line 242, in <module>
    main()
  File "/usr/local/bin/CellCnn/cellCnn/run_analysis.py", line 149, in main
    train, val = next(skf.split(np.zeros((len(phenotypes), 1)), phenotypes))
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 735, in split
    y = check_array(y, ensure_2d=False, dtype=None)
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 646, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 100, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
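The traceback fails inside `StratifiedKFold.split` while scikit-learn validates `phenotypes`, which suggests that at least one sample label is NaN (or infinite) by the time it reaches the split, rather than anything in the FCS event data. Since the 16-sample run worked and the full cohort fails, a blank or non-numeric label cell in the larger samples CSV (for example trailing empty rows from a spreadsheet export) is one plausible cause; that is an assumption, not something the traceback confirms. The stdlib sketch below lists such rows; `find_bad_labels` is a hypothetical helper, and the column name `label` is an assumption to be replaced with the header in your own file.

```python
import csv
import io
import math


def find_bad_labels(lines, label_col):
    """Return (line_number, raw_value) for rows whose label is not a finite number.

    ``lines`` is any iterable of CSV text lines (an open file works);
    ``label_col`` is the phenotype column name (assumed, adjust to your header).
    """
    reader = csv.DictReader(lines)
    if label_col not in (reader.fieldnames or []):
        raise KeyError(f"column {label_col!r} not in header {reader.fieldnames}")
    bad = []
    for row in reader:
        raw = row.get(label_col)  # None if the row is too short
        try:
            ok = math.isfinite(float(raw))
        except (TypeError, ValueError):
            ok = False  # empty cell, missing cell, or text such as "T21"
        if not ok:
            bad.append((reader.line_num, raw))
    return bad


if __name__ == "__main__":
    demo = io.StringIO("fcs_filename,label\na.fcs,0\nb.fcs,\n")
    print(find_bad_labels(demo, "label"))  # -> [(3, '')]
```

For a real file, open it with `newline=""` and pass the file object in place of `demo`.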

Possible error in make-custom-legend-in-matplotlib

Greetings!
I noticed that cellCnn/plotting.py references a StackOverflow post, and a recent comment on that post points out that the approach no longer works for creating multiple dots in a single legend entry. The comment also provides a solution.
I'm trying to understand how useful such small updates on StackOverflow are. Does the comment make sense to you? Would it help improve your code? I understand that such an improvement might not matter in practice. In that case, do you think the comment could help prevent future bugs (for example, when the code is reused somewhere else)?
I'll really appreciate it if you could kindly give me some feedback or suggestions. Thank you very much for your time.
Have a nice day!
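The issue does not say which fix the StackOverflow comment proposes. One approach that current Matplotlib supports directly is to pass a tuple of handles as a single legend entry and render it with `HandlerTuple(ndivide=None)`, which splits the entry's handle area into one slot per handle. The sketch below shows that technique; `legend_with_dot_row` is a hypothetical helper, not code from cellCnn/plotting.py.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerTuple
from matplotlib.lines import Line2D


def legend_with_dot_row(ax, colors, label):
    """Add one legend entry that shows a row of dots, one per color."""
    dots = tuple(
        Line2D([], [], linestyle="none", marker="o", markersize=8, color=c)
        for c in colors
    )
    # A tuple of handles becomes a single entry; ndivide=None gives each
    # handle its own slot instead of drawing them on top of each other.
    return ax.legend(
        [dots], [label], handler_map={tuple: HandlerTuple(ndivide=None)}
    )


if __name__ == "__main__":
    fig, ax = plt.subplots()
    legend_with_dot_row(ax, ["tab:blue", "tab:orange", "tab:green"], "cells")
    fig.canvas.draw()
```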

Trouble running the toy example

Hi, may I get some help working through the tutorial? I keep getting the attached error and I'm not sure how to resolve it. Thanks!

Regarding Model Parameters

Hi, regarding the screenshot below: how is the number of samples in each epoch determined? If the "multi-cell inputs" correspond to the size of each mini-batch, and "n-subsets" corresponds to the number of multi-cell inputs we have, shouldn't the number of samples per epoch be "n-subset"?
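For reference, the training log in the first issue on this page prints progress bars such as `63/63 [...] - 0s 1ms/step`; the `ms/step` suffix indicates that this counter is mini-batches (steps), which is how recent tf.keras versions report `fit`, whereas older Keras versions counted samples. Whether that applies to the screenshot depends on the Keras version used, which the question does not show. Under the steps interpretation, the counter is the number of training inputs divided by the batch size, rounded up; the sketch below shows only that arithmetic, with made-up numbers rather than cellCnn's defaults.

```python
import math


def steps_per_epoch(n_inputs, batch_size):
    """Mini-batches per epoch when the last, smaller batch is kept."""
    if n_inputs <= 0 or batch_size <= 0:
        raise ValueError("n_inputs and batch_size must be positive")
    return math.ceil(n_inputs / batch_size)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 training inputs, batches of 64.
    print(steps_per_epoch(1000, 64))  # -> 16
```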

Support for Python 3

Are there any plans to update the package to support Python 3 now that Python 2 has reached end of life?
