Comments (7)

govekk commented on June 1, 2024

The FCS files we started with were downloaded from the original CODEX paper (https://www.cell.com/cell/fulltext/S0092-8674(18)30904-8), so I believe they should be the same as the output from the CODEX Processor, though I haven't used that. We converted the FCS files directly to CSV, so there shouldn't be any extra information.

The codex_protein object is all of the protein count columns from the CSV.
The codex_spatial object uses the "tile_nr.tile_nr", "X.X", "Y.Y", and "Z.Z" columns to give absolute x, y, z coordinates for each cell. We calculated the x and y using the following:
x <- floor((tile_nr.tile_nr-1)/9) * max(X.X) + X.X
and the z is just Z.Z.
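
For readers who want to check the arithmetic, here is a minimal Python sketch of the x offset formula above. It only illustrates the formula as written: the function name `absolute_x`, the `tiles_per_group` parameter, and the sample values are hypothetical, the divisor 9 is copied from the formula, and the matching y calculation is not shown in this thread, so it is left out.

```python
import math

def absolute_x(tile_nr, x_rel, tiles_per_group=9):
    """Shift per-tile x coordinates by a whole-tile offset.

    Mirrors the R line: x <- floor((tile_nr - 1) / 9) * max(X.X) + X.X
    """
    tile_width = max(x_rel)  # the largest relative x stands in for the tile width
    return [
        math.floor((t - 1) / tiles_per_group) * tile_width + x
        for t, x in zip(tile_nr, x_rel)
    ]

# Hypothetical cells: tiles 1 and 9 get no offset, tile 10 is shifted by one
# tile width, and tile 19 by two.
tile_nr = [1, 9, 10, 19]
x_rel = [5, 100, 5, 40]
x_abs = absolute_x(tile_nr, x_rel)
```

Cells with the same relative x in different tile groups (the first and third cells here) end up with different absolute x values, which is the point of the conversion.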

The codex_size object is the column titled "size.size", and the codex_blank object is all columns with "blank" in the title, e.g. "blank_Cy3_cyc15". If your dataset doesn't include these, I would advise you to create dummy matrices (e.g. of all 1s) and later adjust the parameters to FilterCODEX() so that no cells get removed.

Based on issues you and other users have submitted, we will likely implement a clearer data input process, but I hope this information helps until we make those changes.

from stvea.

igordot commented on June 1, 2024

Thank you for clarifying! That was very helpful. The original data source from that paper (http://welikesharingdata.blob.core.windows.net/forshare/index.html) is no longer available, so it's difficult to confirm. Also, that was an early implementation of the technology, so I wasn't sure if the CODEX Processor is replicating that original workflow or if they introduced some substantial modifications.


igordot commented on June 1, 2024

I was able to import my own data and run the main functions without any obvious issues. I just have some follow-up questions.

I don't think it's necessary to adjust the x and y coordinates with the current FCS/CSV files. I assume each tile previously had independent coordinates (cells in different tiles could have identical x and y coordinates), but now that is not the case (x and y coordinates are for the entire region).

I also checked the input data. The provided codex_protein data frame includes both positive and negative values.

The current FCS/CSV files have only positive values.

Do the expression values need to be adjusted?


govekk commented on June 1, 2024

Thank you for this helpful information. You are correct that previously the xy coordinates were relative to each tile. If the coordinates are now over the entire region, there is no need to convert them. You may still wish to convert the units from pixels to nm so that the different size of the z slices is taken into account.
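
As a rough illustration of the unit conversion, the Python sketch below scales pixel coordinates and z-slice indices to nanometers. The resolution constants are placeholders rather than values from CODEX or STvEA, and the function name `to_nanometers` is hypothetical; substitute the pixel size and slice spacing of your own acquisition.

```python
# Placeholder resolutions (assumptions, not instrument values).
XY_NM_PER_PIXEL = 200.0
Z_NM_PER_SLICE = 1000.0

def to_nanometers(x_px, y_px, z_slice,
                  xy_res=XY_NM_PER_PIXEL, z_res=Z_NM_PER_SLICE):
    """Convert pixel x/y and a z-slice index to nanometers."""
    return (x_px * xy_res, y_px * xy_res, z_slice * z_res)

# One hypothetical cell at pixel (10, 20) in z slice 3.
cell_nm = to_nanometers(10, 20, 3)
```

Because the z spacing usually differs from the xy pixel size, distances between cells are only comparable across axes after a conversion like this.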

It is good to know that now the FCS output of CODEX does not contain negative values. We believe the negative/positive range of the original FCS files is from their spillover correction - we did not do any preprocessing to cause that.

The new distribution might cause the CleanCODEX function in STvEA to produce unexpected results or fail to fit entirely, since it is attempting to fit a Gaussian mixture. We have found that a Gaussian can work fine on some non-negative protein expression, but generally negative binomial mixtures (as in the CleanCITE function) better fit non-negative expression distributions. However, we have not yet implemented an NB mixture in the CODEX functions. If you care to try that on CODEX, you may find the FitNB function helpful. I would be interested in hearing how the Gaussian or NB fits work on your data.

Meanwhile, we will work on adding more options for fitting different distributions and more transparency in the fit success in the CleanCODEX function.


govekk commented on June 1, 2024

Updating this issue: we have added options to the CleanCODEX() function for fitting a negative binomial model or applying an arcsinh transformation, in case the Gaussian doesn't fit well on non-negative values. The choice is controlled by the model parameter.


igordot commented on June 1, 2024

That is great news. Thank you for the update.

Do you have any suggestions on when to use negative binomial or arcsinh transformation?


govekk commented on June 1, 2024

The negative binomial is best used to fit count data. It requires non-negative integer expression values, though the CleanCODEX function will take the ceiling of any non-integer data to allow for CyTOF data that has been randomized. It may also be worth trying this distribution on non-negative CODEX values, though I have not thoroughly tested that.

We use arcsinh as a last resort if the data does not fit any probability distribution well. This scales the data for better visualization, similar to a log transformation, but does not distinguish signal from background.
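
To make the two operations concrete, here is a small Python sketch using only the standard library. It is an illustration, not the STvEA implementation: the sample values are made up, and whether CleanCODEX applies any scaling before arcsinh is not stated in this thread, so a plain `asinh` is assumed.

```python
import math

# Made-up expression values, including a negative and a zero.
values = [-2.0, 0.0, 0.4, 3.0, 1000.0]

# Taking the ceiling turns non-negative fractional values into integers,
# which is what a count model such as the negative binomial expects.
counts = [math.ceil(v) for v in values if v >= 0]

# arcsinh is defined at zero and for negative values, unlike log, and for
# large x it approaches log(2x), hence the log-like compression.
transformed = [math.asinh(v) for v in values]
large_gap = abs(math.asinh(1000.0) - math.log(2 * 1000.0))
```

The transformation only rescales the values; it does not assign cells to signal or background the way a fitted mixture model does.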
