
tenet's Introduction

TENET

A tool for reconstructing Transfer Entropy-based causal gene NETwork from pseudo-time ordered single cell transcriptomic data

Citation

Nucleic Acids Research, gkaa1014, https://doi.org/10.1093/nar/gkaa1014

Dependency

python3
openmpi (>4.0)
JPype

1. Run TENET using expression data in a csv file and pseudotime result in a text file

Usage

./TENET [expression_file_name] [number_of_threads] [trajectory_file_name] [cell_select_file_name] [history_length]

example

./TENET expression_data.csv 10 trajectory.txt cell_select.txt 1

Input

(1) expression_file (raw counts are recommended) - a csv file with N cells in the rows and M genes in the columns (the same format as the wishbone pseudotime package).
	GENE_1	GENE_2	GENE_3	...	GENE_M
CELL_1
CELL_2
CELL_3
.
.
.
CELL_N
(2) number_of_threads - the number of threads used for multi-threading. Multi-threading takes a lot of memory, depending on the squared number of genes * the number of cells. If the program fails, reduce this value.
(3) trajectory_file - a text file of pseudotime data with N time points in the same order as the N cells of the expression file.
0.098
0.040
0.023
.
.
.
0.565
(4) cell_select_file - a text file of cell selection data with N Boolean values (1 for selected, 0 for not selected) in the same order as the N cells of the expression file.
1
1
0
.
.
.
1
(5) history_length - the length of history. On the benchmark data, TENET gives the best results when the history length is set to 1.
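The trajectory and cell selection files are both plain text with one value per line, in the same cell order as the expression file. A minimal sketch of writing them from Python lists follows; the function name `write_tenet_inputs` and its arguments are hypothetical and not part of TENET.

```python
from pathlib import Path


def write_tenet_inputs(pseudotime, selected, out_dir="."):
    """Write trajectory.txt and cell_select.txt, one value per line.

    pseudotime: list of floats, one per cell, in expression-file row order.
    selected:   list of booleans, same length and same order.
    (Hypothetical helper for illustration; not part of TENET.)
    """
    if len(pseudotime) != len(selected):
        raise ValueError("pseudotime and selected need one entry per cell")
    out = Path(out_dir)
    traj = out / "trajectory.txt"
    sel = out / "cell_select.txt"
    traj.write_text("".join(f"{t}\n" for t in pseudotime))
    # Selection flags are written as 1 (select) or 0 (non-select).
    sel.write_text("".join(f"{int(bool(s))}\n" for s in selected))
    return traj, sel
```

The length check matters because both files are matched to the expression file by row position only, so a missing line would shift every later cell.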

Output

TE_result_matrix.txt - TEij, an M genes x M genes matrix in which entry (i, j) represents the causal relationship from GENEi to GENEj.

TE	GENE_1	GENE_2	GENE_3	...	GENE_M
GENE_1	0	0.05	0.02	...	0.004
GENE_2	0.01	0	0.04	...	0.12
GENE_3	0.003	0.003	0	...	0.001
.
.
.
GENE_M	0.34	0.012	0.032	...	0
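To inspect the result matrix outside TENET, it can be read into a dictionary keyed by (source, target). The sketch below assumes the file is tab-separated with a header row whose first cell is the `TE` corner label, as in the layout above; `load_te_matrix` and `top_links` are hypothetical helper names.

```python
def load_te_matrix(path):
    """Return (genes, te), where te[(source, target)] is the TE value.

    Assumes a tab-separated file with a header row (hypothetical helper).
    """
    with open(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        genes = header[1:]  # the first cell is the "TE" corner label
        te = {}
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(genes) + 1:
                continue  # skip blank or malformed lines
            source = fields[0]
            for target, value in zip(genes, fields[1:]):
                if source != target:  # the diagonal carries no link
                    te[(source, target)] = float(value)
    return genes, te


def top_links(te, n):
    """Return the n links with the largest TE, strongest first."""
    return sorted(te.items(), key=lambda item: item[1], reverse=True)[:n]
```

Rows are sources and columns are targets, so `te[("GENE_1", "GENE_2")]` is the value in row GENE_1, column GENE_2.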

2. Run TENET with hdf5 file including PAGA pseudotime result

Usage

./TENET4PAGAhdf5 [hdf5_file_name] [number_of_threads] [history_length] [variable_in_adata]

example

./TENET4PAGAhdf5 Data.Tuck/Tuck_PAGA510genes.h5ad 10 1 X
./TENET4PAGAhdf5 Data.Tuck/Tuck_PAGA510genes.h5ad 10 1 raw

Input

(1) hdf5 file stored after running PAGA.
(2) [variable_in_adata]
If the expression matrix is stored in adata.X, choose X. If it is stored in adata.raw.X, choose raw.
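The choice between X and raw can be pictured as the small selector below. It is only a sketch of the rule described here, not the code inside TENET4PAGAhdf5; the function name `expression_matrix` is hypothetical, and `adata` is assumed to be an AnnData-like object exposing `.X` and `.raw.X`.

```python
def expression_matrix(adata, variable_in_adata):
    """Pick the expression matrix named by the [variable_in_adata] argument.

    adata is assumed to expose .X and, optionally, .raw.X
    (hypothetical helper for illustration).
    """
    if variable_in_adata == "X":
        return adata.X
    if variable_in_adata == "raw":
        if adata.raw is None:
            raise ValueError("adata.raw is not set in this file")
        return adata.raw.X
    raise ValueError("variable_in_adata must be 'X' or 'raw'")
```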

3. Run TENET from TF to target using expression data in a csv file and pseudotime result in a text file

Usage

    ./TENET_TF [expression_file_name] [number_of_threads] [trajectory_file_name] [cell_select_file_name] [history_length] [species]

example

    ./TENET_TF expression_data.csv 10 trajectory.txt cell_select.txt 1 mouse

Input

(6) species - [human/mouse/rat]

Output

    TE_result_matrix.txt

4. Run TENET single core version

Usage

python TENETsinglecore [expression_file_name] [trajectory_file_name] [cell_select_file_name] [history_length]

example

python TENETsinglecore expression_data.csv trajectory.txt cell_select.txt 1

Output

TE_result_matrix.txt

5. Downstream analysis

(1) Reconstructing GRN

Usage
python makeGRN.py [cutoff for FDR]
python makeGRNsameNumberOfLinks.py [number of links]
python makeGRNbyTF.py [species] [cutoff for FDR]
python makeGRNbyTFsameNumberOfLinks.py [species] [number of links]
** Note that "TE_result_matrix.txt" should be in the same folder.
Example
python makeGRN.py 0.01
python makeGRNsameNumberOfLinks.py 1000
python makeGRNbyTF.py human 0.01
python makeGRNbyTFsameNumberOfLinks.py human 1000
Output file
TE_result_matrix.fdr0.01.sif
TE_result_matrix.NumberOfLinks1000.sif
TE_result_matrix.byGRN.fdr0.01.sif
TE_result_matrix.byGRN.NumberOflinks1000.sif
Parameter
[cutoff for fdr] - A cutoff value for FDR by z-test
[number of links] - The number of links of the GRN
[species] - User can choose [human/mouse/rat]
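As an illustration of what an FDR cutoff by z-test can mean, the sketch below converts TE values to z-scores, takes one-sided normal p-values, and applies the Benjamini-Hochberg correction. The choice of a one-sided test and of Benjamini-Hochberg are assumptions made for this example, and `fdr_links` is a hypothetical name; the exact procedure in makeGRN.py may differ.

```python
import math


def fdr_links(te, fdr_cutoff):
    """Keep links whose adjusted p-value is below fdr_cutoff.

    te: dict mapping (source, target) -> TE value.
    Returns a list of ((source, target), TE, adjusted_p), strongest first.
    Assumes a one-sided z-test and Benjamini-Hochberg (illustration only).
    """
    pairs = list(te)
    values = [te[p] for p in pairs]
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return []  # all TE values identical: nothing stands out
    # Upper-tail p-value of each z-score under a standard normal.
    pvals = [0.5 * math.erfc((v - mean) / std / math.sqrt(2)) for v in values]
    # Benjamini-Hochberg: walk from the largest p-value down, keeping a
    # running minimum of p * n / rank.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return [(pairs[i], values[i], adjusted[i])
            for i in order if adjusted[i] < fdr_cutoff]
```

A smaller cutoff keeps fewer, more confident links; the `sameNumberOfLinks` scripts instead fix the number of links directly.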

(2) Trimming indirect edges

Usage
python trim_indirect.py [name of GRN] [cutoff]
Example
python trim_indirect.py TE_result_matrix.fdr0.01.sif 0
Output file
TE_result_matrix.fdr0.01.trimIndirect0.0.sif
Parameter
[cutoff] - A cutoff value for trimming indirect edges. Recommended range is -0.1 to 0.1
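The trimming rule itself is not spelled out here. Purely as an illustration, the sketch below assumes one plausible rule: a link A -> C is treated as indirect when some path A -> B -> C exists and TE(A -> C) is smaller than min(TE(A -> B), TE(B -> C)) + cutoff. Both this rule and the name `trim_indirect` are assumptions; the logic in trim_indirect.py may differ.

```python
def trim_indirect(edges, cutoff=0.0):
    """Drop links that look indirect under the assumed rule.

    edges: dict mapping (source, target) -> TE value.
    A link s -> t is dropped if, for some intermediate gene m,
    TE(s, t) < min(TE(s, m), TE(m, t)) + cutoff.
    (Assumed rule for illustration; not taken from trim_indirect.py.)
    """
    targets_of = {}
    for source, target in edges:
        targets_of.setdefault(source, set()).add(target)
    kept = {}
    for (source, target), te in edges.items():
        indirect = False
        for mid in targets_of.get(source, ()):
            if mid == target or (mid, target) not in edges:
                continue  # no two-step path through this gene
            if te < min(edges[(source, mid)], edges[(mid, target)]) + cutoff:
                indirect = True
                break
        if not indirect:
            kept[(source, target)] = te
    return kept
```

Under this reading, a negative cutoff trims fewer links and a positive cutoff trims more.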

(3) Counting out-degree of a given GRN

Usage
python countOutdegree.py [name of GRN]
Example
python countOutdegree.py TE_result_matrix.fdr0.01.sif
Output file
TE_result_matrix.fdr0.01.sif.outdegree.txt
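Counting out-degree amounts to counting how often each gene appears as the source of a link. The sketch below assumes each line of the .sif file has at least three whitespace-separated columns with the source gene first and the target gene last; `count_outdegree` is a hypothetical name, not the script shipped with TENET.

```python
from collections import Counter


def count_outdegree(sif_path):
    """Return [(gene, out_degree), ...] sorted by out-degree, largest first.

    Assumes lines of the form: source <relation> target
    (hypothetical helper for illustration).
    """
    outdegree = Counter()
    with open(sif_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 3:  # skip blank or malformed lines
                outdegree[fields[0]] += 1
    return outdegree.most_common()
```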

tenet's People

Contributors

neocaleb


tenet's Issues

TENET and TENETsinglecore results not matching

As noted in #9, the multicore TENET and TENETsinglecore use different indexes.
However, when converting the list of gene pairs into the TE matrix, TENETsinglecore works the same way as multicore TENET and subtracts 1 from each index.

This causes the singlecore TENET to break, as it puts gene pair [0,1] into [-1,0], yielding different results.

Also, singlecore TENET does not work with a leading comma (,) before the gene names.
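The mismatch is easy to miss because a negative index does not raise an error in Python: index -1 addresses the last row, so a 0-based pair shifted by 1 is written to the wrong cell without any warning. A minimal sketch of a conversion that makes the index base explicit and rejects out-of-range positions follows; `pairs_to_matrix` and `index_base` are hypothetical names, not TENET code.

```python
def pairs_to_matrix(pairs, n_genes, index_base):
    """Fill an n_genes x n_genes matrix from (i, j, te) triples.

    index_base is 0 if the pair list counts genes from 0, or 1 if it
    counts from 1. Out-of-range positions raise an error instead of
    wrapping around to the end of the matrix.
    (Hypothetical helper for illustration.)
    """
    matrix = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j, te in pairs:
        row, col = i - index_base, j - index_base
        if not (0 <= row < n_genes and 0 <= col < n_genes):
            raise IndexError(
                f"pair ({i}, {j}) is outside the matrix for index_base={index_base}"
            )
        matrix[row][col] = te
    return matrix
```

With this check, passing 0-based pairs together with `index_base=1` fails immediately rather than producing a shifted matrix.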


Re: example output wondering

Hello,
I am doing applied research on the reconstruction of gene regulatory networks and am interested in the TENET algorithm. However, when I run the example data, the result matrix in TE_result_matrix.txt is 0 for all pairs of genes. Should this output be all 0 for the example data, or did I make a mistake in running the code? Thank you so much for your help.

unexpected result files

Hi there,
we used the command
./TENET OB_temp.csv 10 OB_PAGApseudotime.txt OB_PAGAcell_select.txt 1

The input file included 3000 genes and 6424 cells.
We expected the matrix output file as usual. However we got
10 files called:
TE_out_aaa.csv
TE_out_aab.csv
...
TE_out_aaj.csv

They all look similar

1, 2, 0.00016435724617081058,0.00017018181705232195
1,3,9.219382988825285e-05,0.001077501635620558
1,4,0.00026165766902638675,0.0008222731935857703
1,5,0.0,2.0344534011729684e-06
1,6,2.1777893709535886e-05,2.2460595471730527e-05
1,7,0.0,0.0
1,8,2.1006674256813516e-07,2.100340090475586e-07
1,9,2.1006674256813426e-07,2.100340090475579e-07
1,10,1.400117626303666e-07,1.4001176263036658e-07
1,11,0.001054885122915543,1.105408979823489e-05
1,12,1.2648365209598076e-06,1.2616793584688755e-06

The last file ends with
2999,3000

It seems like those values depict the indices of the matrix. However, we would then expect only one value for each gene-pair.

Why are there two, and why didn't we get the expected matrix output?

cheers

A few issues

  1. It would be better to explicitly use python3 instead of python, since some computers have python2 as the default python.
  2. It seems that input files must be located within the current working directory. It would be convenient to allow input files in a separate folder for handling multiple data sets.
  3. The openmpi dependency is not clear. At least it didn't work with openmpi3. I guess it's based on openmpi > 4.0?

a few typos

  1. in makeGRNbyTF.py
    Line 8:
    ifile = open("GO_symbol_"+species+"regulation_of_transcription+sequence-specific_DNA_binding_list_list.txt")
    ==> ifile = open("GO_symbol_"+species+"_regulation_of_transcription+sequence-specific_DNA_binding_list.txt")

NumberOfLinks=sys.argv[2]
Line 42:
fdrCutoff=float(sys.argv[1])
==> fdrCutoff=float(sys.argv[2])

  2. in makeGRNbyTFsameNumberOfLinks.py
    Line 6:
    ifile = open("GO_symbol_"+species+"regulation_of_transcription+sequence-specific_DNA_binding_list_list.txt")
    ==> ifile = open("GO_symbol_"+species+"_regulation_of_transcription+sequence-specific_DNA_binding_list.txt")

SINGE benchmarking

I saw your bioRxiv preprint recently, and the work looks very interesting.

I am one of the developers of SINGE (formerly called SCINGE) and am wondering which version of the software you used and what hyperparameters you selected. Are these details in your manuscript or supplement? The SINGE performance was not great in your evaluation, and we'd like to diagnose what happened. Thanks.

Runtime of Tenet with Dyngen dataset

Hello,

Thank you very much for this program. From the paper, it seems like a great tool!

I was trying to run it on a simulated dataset from DYNGEN with 500 cells and 100 genes. However, the program is still running after more than 18 hours. This seems pretty long, and I am afraid something went wrong. Or is it expected to run that long?

I am using 10 threads :)

Thanks for your help
Philipp

Using TENET with real time points

Hi,

I am wondering if it could be possible to use TENET with real time points rather than pseudo time information. In my case the vector of time points will only contain a few integer values that will be repeated for most cells.

Thanks for your help!
Ines

about the history length

A question about the history length: in runTE.py, line 69, its default value is 1, and I noticed that the maximum time point in trajectory.txt is also 1. Can you explain more clearly what the history length actually does? Thanks!

About the variables..?

It is interesting to know the implementation of transfer entropy in the analysis of scrnaseq.

Do I need to use:

  1. A discretization algorithm that converts the expression into 0 and 1? If so, what determines the threshold?
  2. I have two trajectories from wishbone. Can I use the direct output from it?
  3. Cell select txt file may require further description. How can I make the cell select file? Please let me know :)

Thanks!
