Comments (7)
The FCS files we started with were downloaded from the original CODEX paper (https://www.cell.com/cell/fulltext/S0092-8674(18)30904-8), so I believe they should be the same as the output from the CODEX Processor, though I haven't used it myself. We converted the FCS files directly to CSV, so there shouldn't be any extra information.
The codex_protein object is all of the protein count columns from the CSV.
The codex_spatial object takes the information from the "tile_nr.tile_nr", "X.X", "Y.Y", and "Z.Z" columns to give absolute x, y, z coordinates for each cell. We calculated x (and y analogously from "Y.Y") using the following:
x <- floor((tile_nr.tile_nr - 1) / 9) * max(X.X) + X.X
and z is just Z.Z.
The codex_size object is the column titled "size.size", and the codex_blank object is all columns titled "blank", e.g. "blank_Cy3_cyc15". If your dataset doesn't include these, I would advise you to create dummy matrices (e.g. of all 1s) and later adjust the parameters of FilterCODEX() so that no cells get removed.
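The steps above can be sketched in R. Note the assumptions: the y formula mirroring the given x formula (with the tile index taken modulo 9, matching the division by 9 in the x formula) is my inference, since only the x formula is shown, and the column and protein names here are placeholders for whatever is in your CSV.

```r
# Sketch of building the STvEA inputs from a CODEX-style CSV.
# A tiny synthetic data frame stands in for read.csv("your_codex.csv").
codex_csv <- data.frame(
  tile_nr.tile_nr = c(1, 1, 10, 10),
  X.X = c(100, 250, 100, 250),
  Y.Y = c(50, 60, 50, 60),
  Z.Z = c(1, 2, 1, 2),
  size.size = c(120, 95, 130, 88),
  CD4 = c(5, 0, 12, 3),   # placeholder protein columns
  CD8 = c(0, 7, 1, 9)
)

protein_cols  <- c("CD4", "CD8")          # all protein count columns
codex_protein <- codex_csv[, protein_cols]

tile <- codex_csv$tile_nr.tile_nr
x <- floor((tile - 1) / 9) * max(codex_csv$X.X) + codex_csv$X.X
# Assumed analogue of the x formula -- the original post only shows x:
y <- ((tile - 1) %% 9) * max(codex_csv$Y.Y) + codex_csv$Y.Y
codex_spatial <- data.frame(x = x, y = y, z = codex_csv$Z.Z)

codex_size <- codex_csv$size.size

# No blank channels in this CSV, so create a dummy matrix of all 1s and
# later relax the FilterCODEX() thresholds so no cells are removed.
codex_blank <- matrix(1, nrow = nrow(codex_csv), ncol = 2,
                      dimnames = list(NULL, c("blank1", "blank2")))
```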
Based on issues you and other users have submitted, we will likely implement a clearer data input process, but I hope this information helps until we make those changes.
from stvea.
Thank you for clarifying! That was very helpful. The original data source from that paper (http://welikesharingdata.blob.core.windows.net/forshare/index.html) is no longer available, so it's difficult to confirm. Also, that was an early implementation of the technology, so I wasn't sure if the CODEX Processor is replicating that original workflow or if they introduced some substantial modifications.
from stvea.
I was able to import my own data and run the main functions without any obvious issues. I just have some follow-up questions.
I don't think it's necessary to adjust the x and y coordinates with the current FCS/CSV files. I assume each tile previously had independent coordinates (cells in different tiles could have identical x and y coordinates), but now that is not the case (x and y coordinates are for the entire region).
I also checked the input data. The provided codex_protein data frame includes both positive and negative values, while the current FCS/CSV files contain only positive values.
Do the expression values need to be adjusted?
from stvea.
Thank you for this helpful information. You are correct that previously the xy coordinates were relative to each tile. If the coordinates now span the entire region, there is no need to convert them. You may still wish to convert the units from pixels to nm so that the different spacing of the z slices is taken into account.
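The pixel-to-nm conversion can be sketched as below. The 188 nm/pixel lateral resolution and 900 nm z-step are assumed values taken from the original CODEX dataset; substitute the settings from your own microscope.

```r
# Convert pixel/slice coordinates to nm so that z spacing is on the same
# physical scale as xy. Scale factors below are assumptions -- replace with
# your instrument's actual resolution.
codex_spatial <- data.frame(x = c(100, 350), y = c(50, 50), z = c(1, 2))
codex_spatial_nm <- data.frame(
  x = codex_spatial$x * 188,  # assumed nm per pixel in x
  y = codex_spatial$y * 188,  # assumed nm per pixel in y
  z = codex_spatial$z * 900   # assumed nm per z slice
)
```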
It is good to know that the current FCS output of CODEX does not contain negative values. We believe the negative/positive range of the original FCS files comes from their spillover correction - we did not do any preprocessing that would cause it.
The new distribution might cause the CleanCODEX function in STvEA to produce unexpected results or fail to fit entirely, since it is attempting to fit a Gaussian mixture. We have found that a Gaussian can work fine on some non-negative protein expression, but generally negative binomial mixtures (as in the CleanCITE function) better fit non-negative expression distributions. However, we have not yet implemented an NB mixture in the CODEX functions. If you care to try that on CODEX, you may find the FitNB function helpful. I would be interested in hearing how the Gaussian or NB fits work on your data.
Meanwhile, we will work on adding more options for fitting different distributions and more transparency in the fit success in the CleanCODEX function.
from stvea.
Updating this issue to add that we have added functionality in the CleanCODEX() function for fitting a negative binomial model or applying an arcsinh transformation in case the Gaussian doesn't fit well on non-negative values. This can be selected via the model parameter.
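A hypothetical usage sketch of the model parameter; the exact option strings accepted are assumptions on my part, so check ?CleanCODEX in your installed STvEA version for the real values.

```r
# Hypothetical option strings -- verify against ?CleanCODEX:
stvea_object <- CleanCODEX(stvea_object, model = "nb")       # negative binomial fit
stvea_object <- CleanCODEX(stvea_object, model = "arcsinh")  # arcsinh transform
```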
from stvea.
That is great news. Thank you for the update.
Do you have any suggestions on when to use the negative binomial versus the arcsinh transformation?
from stvea.
The negative binomial is best used to fit count data. It requires non-negative integer expression values, though the CleanCODEX function will take the ceiling of any non-integer data to allow for CyTOF data that has been randomized. It may also be worth trying this distribution on non-negative CODEX values, though I have not tested that thoroughly.
We use arcsinh as a last resort if the data does not fit any probability distribution well. This scales the data for better visualization, similar to a log transformation, but does not distinguish signal from background.
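A small illustration of the two options described above. The cofactor of 5 is an assumption (a common choice for cytometry-style data), not a value from STvEA.

```r
# Example expression values, e.g. randomized CyTOF-style counts:
counts <- c(0, 2.3, 7.8, 41)

# Negative binomial fitting needs non-negative integers, so non-integer
# values are rounded up, as CleanCODEX does internally:
nb_ready <- ceiling(counts)

# arcsinh compresses the dynamic range like a log transform but is defined
# at zero; the cofactor controls how strongly small values are compressed.
cofactor <- 5  # assumed value, common for cytometry data
scaled <- asinh(counts / cofactor)
```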
from stvea.