Comments (7)
The FCS files we started with were downloaded from the original CODEX paper (https://www.cell.com/cell/fulltext/S0092-8674(18)30904-8), so I believe they should be the same as the output from the CODEX Processor, though I haven't used it myself. We converted the FCS files directly to CSV, so there shouldn't be any extra information.
The codex_protein object is all of the protein count columns from the CSV.
The codex_spatial object takes the information from the "tile_nr.tile_nr", "X.X", "Y.Y", and "Z.Z" columns to give absolute x, y, z coordinates for each cell. We calculated x (and y analogously from "Y.Y") using the following:
x <- floor((tile_nr.tile_nr - 1) / 9) * max(X.X) + X.X
and z is just Z.Z.
The codex_size object is the column titled "size.size", and the codex_blank object is all columns titled "blank", e.g. "blank_Cy3_cyc15". If your dataset doesn't include these, I would advise you to create dummy matrices (e.g. of all 1s) and later adjust the parameters of FilterCODEX() so that no cells get removed.
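The steps above can be sketched in R. Note the assumptions: the y formula mirroring the given x formula (with the tile index taken modulo 9, matching the division by 9 in the x formula) is my inference, since only the x formula is shown, and the column and protein names here are placeholders for whatever is in your CSV.

```r
# Sketch of building the STvEA inputs from a CODEX-style CSV.
# A tiny synthetic data frame stands in for read.csv("your_codex.csv").
codex_csv <- data.frame(
  tile_nr.tile_nr = c(1, 1, 10, 10),
  X.X = c(100, 250, 100, 250),
  Y.Y = c(50, 60, 50, 60),
  Z.Z = c(1, 2, 1, 2),
  size.size = c(120, 95, 130, 88),
  CD4 = c(5, 0, 12, 3),   # placeholder protein columns
  CD8 = c(0, 7, 1, 9)
)

protein_cols  <- c("CD4", "CD8")          # all protein count columns
codex_protein <- codex_csv[, protein_cols]

tile <- codex_csv$tile_nr.tile_nr
x <- floor((tile - 1) / 9) * max(codex_csv$X.X) + codex_csv$X.X
# Assumed analogue of the x formula -- the original post only shows x:
y <- ((tile - 1) %% 9) * max(codex_csv$Y.Y) + codex_csv$Y.Y
codex_spatial <- data.frame(x = x, y = y, z = codex_csv$Z.Z)

codex_size <- codex_csv$size.size

# No blank channels in this CSV, so create a dummy matrix of all 1s and
# later relax the FilterCODEX() thresholds so no cells are removed.
codex_blank <- matrix(1, nrow = nrow(codex_csv), ncol = 2,
                      dimnames = list(NULL, c("blank1", "blank2")))
```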
Based on issues you and other users have submitted, we will likely implement a clearer data input process, but I hope this information helps until we make those changes.
from stvea.
Thank you for clarifying! That was very helpful. The original data source from that paper (http://welikesharingdata.blob.core.windows.net/forshare/index.html) is no longer available, so it's difficult to confirm. Also, that was an early implementation of the technology, so I wasn't sure if the CODEX Processor is replicating that original workflow or if they introduced some substantial modifications.
from stvea.
I was able to import my own data and run the main functions without any obvious issues. I just have some follow-up questions.
I don't think it's necessary to adjust the x and y coordinates with the current FCS/CSV files. I assume each tile previously had independent coordinates (cells in different tiles could have identical x and y coordinates), but now that is not the case (x and y coordinates are for the entire region).
I also checked the input data. The provided codex_protein data frame includes both positive and negative values, while the current FCS/CSV files contain only positive values.
Do the expression values need to be adjusted?
from stvea.
Thank you for this helpful information. You are correct that previously the xy coordinates were relative to each tile. If the coordinates now span the entire region, there is no need to convert them. You may still wish to convert the units from pixels to nm so that the different spacing of the z slices is taken into account.
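The pixel-to-nm conversion can be sketched as below. The 188 nm/pixel lateral resolution and 900 nm z-step are assumed values taken from the original CODEX dataset; substitute the settings from your own microscope.

```r
# Convert pixel/slice coordinates to nm so that z spacing is on the same
# physical scale as xy. Scale factors below are assumptions -- replace with
# your instrument's actual resolution.
codex_spatial <- data.frame(x = c(100, 350), y = c(50, 50), z = c(1, 2))
codex_spatial_nm <- data.frame(
  x = codex_spatial$x * 188,  # assumed nm per pixel in x
  y = codex_spatial$y * 188,  # assumed nm per pixel in y
  z = codex_spatial$z * 900   # assumed nm per z slice
)
```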
It is good to know that the current FCS output of CODEX does not contain negative values. We believe the negative/positive range of the original FCS files comes from their spillover correction - we did not do any preprocessing that would cause it.
The new distribution might cause the CleanCODEX function in STvEA to produce unexpected results or fail to fit entirely, since it is attempting to fit a Gaussian mixture. We have found that a Gaussian can work fine on some non-negative protein expression, but generally negative binomial mixtures (as in the CleanCITE function) better fit non-negative expression distributions. However, we have not yet implemented an NB mixture in the CODEX functions. If you care to try that on CODEX, you may find the FitNB function helpful. I would be interested in hearing how the Gaussian or NB fits work on your data.
Meanwhile, we will work on adding more options for fitting different distributions and more transparency in the fit success in the CleanCODEX function.
from stvea.
Updating this issue to add that we have added functionality in the CleanCODEX() function for fitting a negative binomial model or applying an arcsinh transformation in case the Gaussian doesn't fit well on non-negative values. This can be selected via the model parameter.
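A hypothetical usage sketch of the model parameter; the exact option strings accepted are assumptions on my part, so check ?CleanCODEX in your installed STvEA version for the real values.

```r
# Hypothetical option strings -- verify against ?CleanCODEX:
stvea_object <- CleanCODEX(stvea_object, model = "nb")       # negative binomial fit
stvea_object <- CleanCODEX(stvea_object, model = "arcsinh")  # arcsinh transform
```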
from stvea.
That is great news. Thank you for the update.
Do you have any suggestions on when to use the negative binomial versus the arcsinh transformation?
from stvea.
The negative binomial is best used to fit count data. It requires non-negative integer expression values, though the CleanCODEX function will take the ceiling of any non-integer data to allow for CyTOF data that has been randomized. It may also be worth trying this distribution on non-negative CODEX values, though I have not tested that thoroughly.
We use arcsinh as a last resort if the data does not fit any probability distribution well. This scales the data for better visualization, similar to a log transformation, but does not distinguish signal from background.
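A small illustration of the two options described above. The cofactor of 5 is an assumption (a common choice for cytometry-style data), not a value from STvEA.

```r
# Example expression values, e.g. randomized CyTOF-style counts:
counts <- c(0, 2.3, 7.8, 41)

# Negative binomial fitting needs non-negative integers, so non-integer
# values are rounded up, as CleanCODEX does internally:
nb_ready <- ceiling(counts)

# arcsinh compresses the dynamic range like a log transform but is defined
# at zero; the cofactor controls how strongly small values are compressed.
cofactor <- 5  # assumed value, common for cytometry data
scaled <- asinh(counts / cofactor)
```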
from stvea.