morrelllab / bad_mutations
Deleterious mutation prediction pipeline
License: MIT License
BAD_Mutations does not properly find HyPhy. I have HyPhy installed at /usr/bin/hyphymp, but BAD_Mutations cannot find it.
$ which hyphymp # Done in Docker container
/usr/bin/hyphymp
$ sudo docker run -v $(pwd -P):/data bad-mutations -v DEBUG setup -b /data/BAD_Mutation_database/ -t 'Athaliana' -e 0.05 -c /data/BAD_Mutations_Config.txt
===2017-02-06 22:23:36,289 - LRT_Predict===
DEBUG setup subcommand was invoked
===2017-02-06 22:23:36,291 - Setup_Env===
DEBUG Config file in /data/BAD_Mutations_Config.txt
Setting variables:
#define BASE /data/BAD_Mutation_database/
#define TARGET_SPECIES Athaliana
#define EVAL_THRESHOLD 0.05
===2017-02-06 22:23:36,292 - Setup_Env===
DEBUG Setting executable path variables:
#define BASH /bin/bash
#define GZIP /bin/gzip
#define SUM /usr/bin/sum
#define TBLASTX /usr/bin/tblastx
#define PASTA /usr/local/bin/run_pasta.py
#define HYPHY
===2017-02-06 22:23:36,292 - Setup_Env===
WARNING Cannot find HyPhy! Will download
===2017-02-06 22:23:36,292 - Setup_Env===
DEBUG Checking if /data/BAD_Mutations_Config.txt exists.
===2017-02-06 22:23:36,292 - Setup_Env===
WARNING Config file /data/BAD_Mutations_Config.txt already exists. It will be overwritten!
===2017-02-06 22:23:36,313 - Setup_Env===
INFO Wrote configuration into /data/BAD_Mutations_Config.txt
makeblastdb fails when sequence is missing from fasta file. This affects:
Hordeum_vulgare.082214v1.29.cds.all.fa.20151105_makeblastdb_err
Leersia_perrieri.Lperr_V1.4.29.cds.all.fa.20151105_makeblastdb_err
When fetching, it checks for BioPython being installed. On MSI's setup, this breaks the script as it won't fetch on LAB and LOGIN doesn't have BioPython readily available.
Hi,
When I use "Fetch", I get this error:
DEBUG The XML I got was
<title>302 Found</title>The document has moved here.
===2020-01-26 12:56:30,035 - lrt_predict.Fetch.fetch===
DEBUG The XML I got was
The document has moved here.
Traceback (most recent call last):
File "/mnt/data/Tools/BAD_Mutations-Python3/BAD_Mutations/BAD_Mutations.py", line 382, in
main()
File "/mnt/data/Tools/BAD_Mutations-Python3/BAD_Mutations/BAD_Mutations.py", line 348, in main
fetch(arguments_valid, loglevel)
File "/mnt/data/Tools/BAD_Mutations-Python3/BAD_Mutations/BAD_Mutations.py", line 110, in fetch
phy.get_xml_urls()
File "/mnt/data/Tools/BAD_Mutations-Python3/BAD_Mutations/lrt_predict/Fetch/phytozome.py", line 168, in get_xml_urls
xml_tree = ElementTree.fromstring(xml)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1659, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1523, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
I didn't manage to redirect the phytozome.py script to the correct XML. Any suggestions?
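One possible guard, as a minimal sketch: it does not fix the Phytozome URL, it only turns the raw ParseError into a message that shows what the server actually sent. It assumes the response body is available as a string; the function name `parse_xml_or_explain` and the sample payloads are hypothetical.

```python
import xml.etree.ElementTree as ElementTree


def parse_xml_or_explain(text):
    """Parse text as XML; raise a readable error if the server sent something else."""
    try:
        return ElementTree.fromstring(text)
    except ElementTree.ParseError as err:
        # Show the start of the payload so a redirect page is easy to spot.
        preview = text.strip()[:80]
        raise ValueError(
            "Expected XML but could not parse the response (%s). "
            "Response starts with: %r" % (err, preview)
        )


# Hypothetical payloads for illustration only.
good = "<organisms><organism name='Athaliana'/></organisms>"
bad = "<title>302 Found</title>The document has moved here."

print(parse_xml_or_explain(good).tag)
try:
    parse_xml_or_explain(bad)
except ValueError as err:
    print(err)
```

With a message like this, a "302 Found" page is visible in the error itself rather than only in the DEBUG log.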
When TARGET_SPECIES in the config file is set to Athaliana or Athaliana_167_TAIR10, that species ends up in the alignment and in the tree file. The input test.fasta sequence (from Athaliana) does as well, which results in a duplicated species in the alignment and tree.
Installing PASTA requires a specific version of DendroPy (3.12.3).
To install DendroPy for PASTA on MSI:
wget https://github.com/jeetsukumaran/DendroPy/archive/v3.12.3.tar.gz
tar -xvzf v3.12.3.tar.gz
cd DendroPy-3.12.3
python setup.py build -b $(pwd -P)
python setup.py install --user
So I tried to run the sample code in the manual for barley setup, with the only change being that I changed the directories for the blast databases and the dependencies. Code below.
wyant008@labqi046 [~/BAD_Mutations] % ./BAD_Mutations.py -v DEBUG \
setup \
-b /home/morrellp/wyant008/BAD_Mutations_Alpha/BLAST_databases \
-d /home/morrellp/wyant008/BAD_Mutations_Alpha/dependencies \
-t 'Hordeum_vulgare' \
-e 0.05 \
-m 0.2 \
-c BAD_Mutations_Config.txt 2> Setup.log
The file Setup.log that is outputted looks like this:
===2015-12-08 16:51:54,455 - LRT_Predict===
ERROR The species name you provided is not in the list of allowable species.
Setup.log (END)
I changed 'Hordeum_vulgare' to 'hordeum_vulgare' and it outputted a much nicer Setup.log, but still not a usable config file. Make sure to change this in the manual.
I didn't know it wasn't usable until I tried to fetch using the example fetch code in the manual, after making an account with JGI Genome Portal.
wyant008@labqi046 [~/BAD_Mutations] % ./BAD_Mutations.py -v DEBUG fetch -c BAD_Mutations_Config.txt -u '[email protected]'
I get the following errors:
wyant008@labqi046 [~/BAD_Mutations] % ./BAD_Mutations.py -v DEBUG fetch -c BAD_Mutations_Config.txt -u '[email protected]'
===2015-12-08 17:23:40,302 - Configuration_Handler===
ERROR Line 11: Expected three fields, got 2
===2015-12-08 17:23:40,302 - LRT_Predict===
ERROR Config file is not valid!
Here's the config file:
// Generated by 'setup' at 2015-12-08 16:58:39.633427
// Program paths
BAD_Mutations_Config.txt (END)
As you can see, it's missing paths for tBLASTx, PASTA, and HyPhy. Additionally, there's nothing in either of the directories specified in the setup command, so it looks like it didn't download any dependencies or BLAST databases for me.
I hope this was helpful.
Do not align sequences that have ambiguous nucleotides, as they cause HyPhy errors.
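A minimal sketch of such a filter, assuming unaligned nucleotide sequences held as (name, sequence) pairs. Anything other than A, C, G, or T is treated as ambiguous, so gap characters would also be rejected. The function names and sample records are hypothetical and not part of BAD_Mutations.

```python
import re

# Anything other than A, C, G, T (upper or lower case) counts as ambiguous here.
AMBIGUOUS = re.compile(r"[^ACGTacgt]")


def has_ambiguous_bases(sequence):
    """Return True if the nucleotide sequence contains a non-ACGT character."""
    return AMBIGUOUS.search(sequence) is not None


def drop_ambiguous(records):
    """Keep only (name, sequence) pairs whose sequence is unambiguous."""
    return [(name, seq) for name, seq in records if not has_ambiguous_bases(seq)]


# Hypothetical records for illustration.
records = [("seq_a", "ATGCTGCTG"), ("seq_b", "NNNATGCTG"), ("seq_c", "ATGRYTGA")]
print(drop_ambiguous(records))
```

Only `seq_a` survives in this example; whether to drop such sequences or mask the offending codons instead is a separate design decision.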
When I extracted the entries without NOSNP from the prediction output file, I saw that the number of header fields (11) is inconsistent with the number of content fields (18). The first seven fields are repeated after the reference AA. The first 10 lines of the file are shown below:
Position L0 L1 Constraint Chisquared P-value SeqCount Alignment ReferenceAA MaskedConstraint MaskedP-value
417 -114.2504769550768 -109.8429675226595 2.600003815922569 8.815018864834457 0.002987611217884822 37 PEFSSC-FLCTTCCCPTP--SAAHASWWFFFFCCCCHQC--L NA 417 -114.2504769550768 -109.8429675226595 2.600003815922569 8.815018864834457 0.002987611217884822 37 2.600003815922569 0.002987611217884822
432 -96.00793409068631 -95.98883661710803 0.901551054244307 0.03819494715654059 0.8450522026700051 34 AFFFFF-FFFFFTAVW-----FFLLLLLLLLLLLLLLSH--L NA 432 -96.00793409068631 -95.98883661710803 0.901551054244307 0.03819494715654059 0.8450522026700051 34 0.901551054244307 0.8450522026700051
660 -70.25261880236872 -42.59972200094571 0.01476124340506724 55.30579360284601 1.03139718987677e-13 40 VVVVVVVVVVVVVVVVVVVVVVVVVIIIIIIIIIIIVVV--V NA 660 -70.25261880236872 -42.59972200094571 0.01476124340506724 55.30579360284601 1.03139718987677e-13 40 0.01476124340506724 1.03139718987677e-13
933 -17.73616225941901 -17.18405710624295 2.423459124324687 1.104210306352108 0.293343971335091 10 --------------------------VVVVVKVVVV------ NA 933 -17.73616225941901 -17.18405710624295 2.423459124324687 1.104210306352108 0.293343971335091 10 2.423459124324687 0.293343971335091
1011 -76.5375368627051 -65.83106099341637 0.148075067122228 21.41295173857745 3.702615600342796e-06 35 SG-GGGG-GGGGGGGAAAAAS--S-SSSSSSSSSSSGGG--G NA 1011 -76.5375368627051 -65.83106099341637 0.148075067122228 21.41295173857745 3.702615600342796e-06 35 0.148075067122228 3.702615600342796e-06
1128 -88.81552901538274 -88.65171196789275 0.7129923137686337 0.3276340949799703 0.5670555509255972 38 NN-NNNNNNNQQSEQEEEEEESIG-NNNNNNNNNNNQQE--E NA 1128 -88.81552901538274 -88.65171196789275 0.7129923137686337 0.3276340949799703 0.5670555509255972 38 0.7129923137686337 0.5670555509255972
1134 -73.24644758771406 -66.2098928055336 0.222896653989791 14.07310956436092 0.0001758398293251195 37 TT-TTTTTNTTT-TIHPPPPPPPN-RSSSSSSSSSSTTT--A NA 1134 -73.24644758771406 -66.2098928055336 0.222896653989791 14.07310956436092 0.0001758398293251195 37 0.222896653989791 0.0001758398293251195
1182 -61.99178040725806 -47.72236360077187 0.06212813833168374 28.53883361297238 9.183793447942179e-08 36 LL-LLLL-ILLL-LLLLLLLLLLL-IIIIIIIIIIILLL--I NA 1182 -61.99178040725806 -47.72236360077187 0.06212813833168374 28.53883361297238 9.183793447942179e-08 36 0.06212813833168374 9.183793447942179e-08
1233 -72.04357072402931 -44.08226769565381 9.866446284317615e-17 55.92260605675099 7.538414337204813e-14 38 SSSSSSS-SSSSSSSSSSSSSSSS-SSSSSSSSSSSSSS--S NA 1233 -72.04357072402931 -44.08226769565381 9.866446284317615e-17 55.92260605675099 7.538414337204813e-14 38 9.866446284317615e-17 7.538414337204813e-14
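A quick way to find rows like these is to compare each row's field count against the header. This is only a sketch: it splits on whitespace (the real file may be tab-delimited), and the function name and the short example lines are hypothetical.

```python
def mismatched_rows(lines):
    """Return (line_number, field_count) for rows wider or narrower than the header."""
    rows = [line.split() for line in lines]
    if not rows:
        return []
    expected = len(rows[0])
    return [
        (number, len(fields))
        for number, fields in enumerate(rows[1:], start=2)
        if len(fields) != expected
    ]


# Hypothetical three-column file: the last row has two extra fields.
example = ["Position L0 L1", "417 -114.25 -109.84", "432 -96.01 -95.99 432 -96.01"]
print(mismatched_rows(example))
```

Line numbers are 1-based, with the header as line 1, so the example reports line 3 with 5 fields.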
When running without --fetch-only, LRT_Predict.py fails when attempting to unzip and create BLAST databases.
Traceback (most recent call last):
File "./LRT_Predict.py", line 151, in <module>
main()
File "./LRT_Predict.py", line 142, in main
fetch(arguments_valid.base, arguments_valid.user, arguments_valid.password, arguments_valid.fetch_only)
File "./LRT_Predict.py", line 113, in fetch
retval = blast_databases.format_blast(c)
File "/panfs/roc/groups/9/morrellp/hoffmanp/sandbox/LRT_Pipeline/LRT/Fetching/blast_databases.py", line 23, in format_blast
retval = subprocess.call(cmd, shell=False)
File "/soft/python2/2.7.8/lib/python2.7/subprocess.py", line 522, in call
return Popen(*popenargs, **kwargs).wait()
File "/soft/python2/2.7.8/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/soft/python2/2.7.8/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
The compile subcommand in BAD_Mutations only works for 41 aligned genomes.
Check that all required arguments (fasta file, output directory, etc.) are specified, either on command line or in config file. Currently fails messily, since it tries to use NoneType for various operations.
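A minimal sketch of such a check, assuming the command-line and config options have already been merged into one dictionary. The function name and the list of required keys are assumptions for illustration.

```python
def missing_arguments(args, required):
    """Return the names in `required` that are absent from `args` or set to None."""
    return [name for name in required if args.get(name) is None]


# Hypothetical merged options, as a dict of command-line and config values.
args = {"fasta": "test.fasta", "output": None, "config": "BAD_Mutations_Config.txt"}
missing = missing_arguments(args, ["fasta", "output", "substitutions"])
if missing:
    print("Missing required arguments: " + ", ".join(missing))
```

Running this check before any file operations would let the program stop with one clear message instead of failing later on a NoneType.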
Paul and I tried to test in interactive Python with the "_MSA.fasta" file, and it seems the script cannot find the absolute path of this file.
Traceback (most recent call last):
File "/home/morrellp/llei/BAD_Mutations/BAD_Mutations.py", line 335, in <module>
main()
File "/home/morrellp/llei/BAD_Mutations/BAD_Mutations.py", line 315, in main
out = predict(arguments_valid, loglevel)
File "/home/morrellp/llei/BAD_Mutations/BAD_Mutations.py", line 250, in predict
lrt.prepare_hyphy_inputs()
File "/panfs/roc/groups/9/morrellp/llei/BAD_Mutations/lrt_predict/Predict/predict.py", line 102, in prepare_hyphy_inputs
infile.write(os.path.abspath(self.nmsa) + '\n')
File "/soft/python2/2.7.8/lib/python2.7/posixpath.py", line 367, in abspath
if not isabs(path):
File "/soft/python2/2.7.8/lib/python2.7/posixpath.py", line 61, in isabs
return s.startswith('/')
AttributeError: 'MultipleSeqAlignment' object has no attribute 'startswith'
Biopython's Newick parser fails on trees generated by PASTA. Sanitize them using Prepare_HyPhy.sh before copying them into the output directory in the 'align' subcommand.
There are multiple versions of each species in Phytozome and Ensembl Plants now. We should only grab the latest version of each one.
When running the new config branch, the PASTA dependency fails to show up in the finished config file after running setup.
JGI changed their Phytozome login again
Line 151 of Manual_v0.1.md refers to the "Minimum number of gapped (missing) site in the...alignment..." I'm guessing the intention was to set a "maximum" number of missing sites?
Check for the required executables ('tblastx' and 'prank' for now) and print a nice message if they are not in $PATH
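A minimal sketch of that check using `shutil.which`. This assumes Python 3.3 or later; under Python 2.7 a different lookup would be needed. The function name is hypothetical.

```python
import shutil


def missing_executables(names):
    """Return the subset of `names` that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_executables(["tblastx", "prank"])
if missing:
    print("Required executables not found in $PATH: " + ", ".join(missing))
    print("Please install them, or add their directories to $PATH, and try again.")
```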
The hyphy command failed. Not sure why since temporary input files are not kept.
Here is my command:
python BAD_Mutations.py -v DEBUG \
predict \
-c BAD_Mutations_Config.txt \
-f test.fasta \
-s subs.txt
My guess is that hyphy needs alignment to be cleaned up, remove characters from names, remove bootstrap values - which I previously did for Barley before running hyphy.
Here is the error:
PASTA INFO: Total time spent: 36.7152650356s
===2015-11-10 12:35:02,202 - LRT_Predict===
DEBUG stderr:
===2015-11-10 12:35:02,225 - LRT_Predict===
INFO Nucleotide alignment in /tmp/BAD_Mutations_BackTranslated_9rv1TH.fasta
===2015-11-10 12:35:02,225 - LRT_Predict===
INFO Tree in /tmp/pastajob_1447180465.400478.tre
===2015-11-10 12:35:02,226 - LRT_Prediction===
DEBUG Checking if subs.txt exists.
===2015-11-10 12:35:02,226 - LRT_Prediction===
WARNING Variant on line 1 of input file subs.txt does not have an ID. Using the empty string ('') as an ID.
===2015-11-10 12:35:02,226 - LRT_Prediction===
WARNING Variant on line 2 of input file subs.txt does not have an ID. Using the empty string ('') as an ID.
===2015-11-10 12:35:02,226 - LRT_Prediction===
INFO Input file subs.txt contains 2 positions to predict.
===2015-11-10 12:35:02,228 - LRT_Prediction===
DEBUG test is at position 12
===2015-11-10 12:35:02,229 - LRT_Prediction===
DEBUG bash /media/jfay/data1/projects/barley/BAD_Mutations/Shell_Scripts/Prediction.sh /usr/local/bin/HYPHYSP /media/jfay/data1/projects/barley/BAD_Mutations/Shell_Scripts/LRT.hyphy /tmp/BAD_Mutations_HYHPY_In_kGLw8c.txt /tmp/BAD_Mutations_HYPHY_Out_IuvmhL.txt
===2015-11-10 12:35:02,298 - LRT_Prediction===
DEBUG stdout:
===2015-11-10 12:35:02,298 - LRT_Prediction===
DEBUG stderr:
/media/jfay/data1/projects/barley/BAD_Mutations/Shell_Scripts/Prediction.sh: line 15: 14577 Segmentation fault ${HYPHY} ${PREDICTION_SCRIPT} <<< ${INPUT} > ${OUTPUT}
===2015-11-10 12:35:02,298 - LRT_Predict===
INFO Prediction in /media/jfay/data1/projects/barley/BAD_Mutations/test_Predictions.txt
I would suggest telling users to create the directories first, with mkdir -p /scratch/BAD_Mutations_Data /scratch/BAD_Mutations_Deps, before running the command below.
./BAD_Mutations.py -v DEBUG \
setup \
-b /scratch/BAD_Mutations_Data \
-d /scratch/BAD_Mutations_Deps \
-t 'Hordeum_vulgare' \
-e 0.05 \
-m 10 \
-c BAD_Mutations_Config.txt 2> Setup.log
It may be better to add the download command (a git clone or wget link) to the Downloading section of the manual, because users may not be familiar with using git to download.
Hello BAD_Mutations team,
When I run the command `./BAD_Mutations.py -v DEBUG compile -P Predictions_Dir -S Test_Data/CBF3.subs 2> Compile.log`, the following error appears:
DEBUG 168 Error: Traceback (most recent call last):
File "/vol3/agis/wangli_group/sunshichao/soybean/biosoft/BAD_Mutations/BAD_Mutations.py", line 382, in <module>
main()
File "/vol3/agis/wangli_group/sunshichao/soybean/biosoft/BAD_Mutations/BAD_Mutations.py", line 375, in main
compile_preds(arguments_valid, loglevel)
File "/vol3/agis/wangli_group/sunshichao/soybean/biosoft/BAD_Mutations/BAD_Mutations.py", line 266, in compile_preds
parsed_preds = [comp.parse_prediction(rep) for rep in reports]
File "/vol3/agis/wangli_group/sunshichao/soybean/biosoft/BAD_Mutations/lrt_predict/Predict/hyphy_parser.py", line 133, in parse_prediction
geneseq += line.strip().split()[8]
IndexError: list index out of range
I don't know what went wrong. I hope you can give me some advice.
Thank you so much.
In this command:
python BAD_Mutations.py -v DEBUG predict -c LRTPredict_Config.txt -f test.fasta -s subs.txt
there is a bad path error:
PASTA INFO: Writing resulting alignment to /tmp/pastajob_1446831223.000076.marker001.LRTPredict_PastaInput_VFTMuG.aln
PASTA INFO: Writing resulting tree to /tmp/pastajob_1446831223.000076.tre
PASTA INFO: Writing resulting likelihood score to /tmp/pastajob_1446831223.000076.score.txt
PASTA INFO: Total time spent: 42.497729063s
===2015-11-06 11:34:25,634 - LRT_Predict===
DEBUG stderr:
===2015-11-06 11:34:25,658 - LRT_Predict===
INFO Nucleotide alignment in /tmp/LRTPredict_BackTranslated_YWKiqN.fasta
===2015-11-06 11:34:25,658 - LRT_Predict===
INFO Tree in /tmp/pastajob_1446831223.000076.tre
Traceback (most recent call last):
File "BAD_Mutations.py", line 279, in
main()
File "BAD_Mutations.py", line 269, in main
open(new_nuc, 'w').close()
IOError: [Errno 2] No such file or directory: '/Users/tomkono/Data_Disk/tmp/Barley_Anc_Aln/test.fasta'
As discussed, an input format similar to that used by PolyPhred would be preferable. Essentially, a SNP and codon list would be easier to use than a modified FASTA file. There are too many modified FASTA formats already.
Good afternoon! Sorry to leave another response; I have been trying for several days to troubleshoot this on my own with no luck.
When running the sample data, the align subcommand gives me CBF3_MSA.fasta and CBF3.tree instead of CBF3_tree.tree. I checked the .log files and don't see an error. I then tried running the predict subcommand with this CBF3.tree, and it fails with an error saying there are multiple trees in the file. I looked at both the HyPhy/Biopython side and the PASTA side of the problem and noticed an extra semicolon in the tree. I deleted it manually and ran again, but the same error occurred. At this point I assume some step in the PASTA align/tree component is being skipped without reporting an error. I have tried reinstalling and changing the version of each dependency, without any luck. Below are the end portions of the logs:
Align:
PASTA INFO: TreeShrink option has been turned off!
PASTA INFO: Step 4. Realigning with decomposition strategy set to mincluster
PASTA INFO: Step 4. Alignment obtained. Tree inference beginning...
PASTA INFO: realignment accepted despite the score not improving.
PASTA INFO: current score: -17739.33, best score: -17698.421
PASTA INFO: TreeShrink option has been turned off!
PASTA INFO: Writing resulting alignment to /tmp/pastajob_1572975010.129370.marker001.BAD_Mutations_PastaInput_KxK2_1.aln
PASTA INFO: Writing resulting tree to /tmp/pastajob_1572975010.129370.tre
PASTA INFO: Writing resulting likelihood score to /tmp/pastajob_1572975010.129370.score.txt
PASTA INFO: The resulting alignment (with the names in a "safe" form) was first written as the file "/tmp/pastajob_1572975010.129370_temp_iteration_1_seq_alignment.txt"
PASTA INFO: The resulting tree (with the names in a "safe" form) was first written as the file "/tmp/pastajob_1572975010.129370_temp_iteration_1_tree.tre"
PASTA INFO: Total time spent: 70.6753809452s
===2019-11-05 10:31:22,141 - LRT_Predict===
DEBUG stderr:
===2019-11-05 10:31:22,169 - Pasta_Align===
DEBUG Sanitizing alignment in /tmp/BAD_Mutations_BackTranslated_Aw55AW.fasta
===2019-11-05 10:31:22,169 - Pasta_Align===
DEBUG Sanitizing tree in /tmp/pastajob_1572975010.129370.tre
===2019-11-05 10:31:22,243 - LRT_Predict===
INFO Nucleotide alignment in /tmp/BAD_Mutations_BackTranslated_Aw55AW.fasta
===2019-11-05 10:31:22,243 - LRT_Predict===
INFO Tree in /tmp/pastajob_1572975010.129370.tre
===2019-11-05 10:31:22,261 - LRT_Predict===
INFO MSA copied to Output_Dir/CBF3_MSA.fasta
===2019-11-05 10:31:22,261 - LRT_Predict===
INFO Tree copied to Output_Dir/CBF3.tree
Predict:
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable BASE to /scratch/BAD_Mutations_Data
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable TARGET_SPECIES to hordeum_vulgare
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable EVAL_THRESHOLD to 0.05
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable BASH to /bin/bash
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable GZIP to /bin/gzip
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable SUM to /usr/bin/sum
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable TBLASTX to /usr/bin/tblastx
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable PASTA to /usr/local/bin/run_pasta.py
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Setting variable HYPHY to /usr/bin/hyphymp
===2019-11-07 08:44:27,234 - Configuration_Handler===
DEBUG Command line and config options merged. Values: {'sum_path': '/usr/bin/sum', 'target': 'hordeum_vulgare', 'gzip_path': '/bin/gzip', 'loglevel': 'DEBUG', 'config': 'BAD_Mutations_Config.txt', 'evalue': '0.05', 'tree': 'Output_Dir/CBF3.tree', 'substitutions': 'Test_Data/CBF3.subs', 'base': '/scratch/BAD_Mutations_Data', 'hyphy_path': '/usr/bin/hyphymp', 'action': 'predict', 'output': 'Predictions_Dir', 'fasta': 'Test_Data/CBF3.fasta', 'tblastx_path': '/usr/bin/tblastx', 'bash_path': '/bin/bash', 'alignment': 'Output_Dir/CBF3_MSA.fasta', 'pasta_path': '/usr/local/bin/run_pasta.py'}
===2019-11-07 08:44:27,234 - LRT_Predict===
DEBUG Checking if BAD_Mutations_Config.txt exists.
===2019-11-07 08:44:27,235 - LRT_Predict===
DEBUG Checking if Output_Dir/CBF3.tree exists.
Traceback (most recent call last):
File "./BAD_Mutations.py", line 372, in
main()
File "./BAD_Mutations.py", line 330, in main
arguments_valid, msg = parse_args.validate_args(config_opts, loglevel)
File "/home/arizonica/BAD_Mutations/lrt_predict/General/parse_args.py", line 347, in validate_args
if not parse_input.valid_tree(args['tree'], log):
File "/home/arizonica/BAD_Mutations/lrt_predict/General/parse_input.py", line 29, in valid_tree
p = Phylo.read(f, 'newick')
File "/usr/local/lib/python2.7/dist-packages/Bio/Phylo/_io.py", line 73, in read
"There are multiple trees in this file; use parse() instead.")
ValueError: There are multiple trees in this file; use parse() instead.
Thank you again for your assistance, I very much appreciate your time!
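One way to inspect how many trees the file appears to contain before handing it to the Newick parser is sketched below. This is a naive check, not a diagnosis of the report above: it simply splits on semicolons, so quoted labels containing a semicolon would be miscounted. The function name and the example tree are hypothetical.

```python
def split_newick(text):
    """Split Newick text on ';' and return the non-empty tree strings."""
    return [chunk.strip() + ";" for chunk in text.split(";") if chunk.strip()]


# Hypothetical tree text with a stray trailing semicolon.
text = "((A:0.1,B:0.2):0.05,C:0.3);\n;\n"
trees = split_newick(text)
print(len(trees))
print(trees[0])
```

If more than one non-empty chunk comes back, the file really does hold several trees and the stray semicolon is not the only problem.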
Add notes for running on UMN MSI systems
module load python/2.7.8
module load biopython
$ ./BAD_Mutations.py -v DEBUG \
setup \
-b /scratch/BAD_Mutations_Data \
-d /scratch/BAD_Mutations_Deps \
-t 'Hordeum_vulgare' \
-e 0.05 \
-m 10 \
-c BAD_Mutations_Config.txt 2> Setup.log
It may be better to give a hint that running "python /home/morrellp/llei/BAD_Mutations/BAD_Mutations.py -v DEBUG setup --list-species" lists the available species.
Using the command:
python BAD_Mutations.py -v DEBUG \
predict \
--config BAD_Mutations_Config.txt \
--alignment test1/test_MSA.fasta \
--tree test1/test.tree \
--substitutions subs.txt
gives an error:
Traceback (most recent call last):
File "BAD_Mutations.py", line 327, in
main()
File "BAD_Mutations.py", line 266, in main
arguments_valid, msg = parse_args.validate_args(config_opts, loglevel)
File "/media/jfay/data1/projects/barley/BAD_Mutations/lrt_predict/General/parse_args.py", line 359, in validate_args
if not parse_input.valid_fasta(args['fasta'], log):
KeyError: 'fasta'
I think fasta should be alignment, however, that change gave me an error that the input fasta file is not valid.
Tom, I'll send you fasta alignment and tree file by email so you can reproduce.
Despite using module load hyphy/2.2.6_smp, the file path in the config file generated by the setup subroutine is blank (#define HYPHY) instead of #define HYPHY /panfs/roc/itascasoft/hyphy/2.2.6_smp/bin/HYPHYMP.
Not directly an issue, but would it be possible to have the compile function order the results based on the given SNP name instead of the gene ID and aaPOS? I have a rather large dataset, and there are some codons that have missense SNPs in two positions. But none of the outputs have both of them listed, since they are given the same sequence conservation score. Thanks!
HyPhy chokes on sequence names that have non-alphanumeric characters (except underscore). Use re.sub to fix names prior to alignment, so they are HyPhy-safe when it comes time to build trees and make predictions.
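A minimal sketch of the `re.sub` approach. The function name `hyphy_safe` is hypothetical, and replacing each disallowed character with an underscore is an assumption about the desired behavior.

```python
import re


def hyphy_safe(name):
    """Replace every character that is not a letter, digit, or underscore with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(hyphy_safe("HORVU0Hr1G000050.1"))
```

Note that two different names can collapse to the same safe name (for example `gene.1` and `gene_1`), so a real implementation may need to check for collisions.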
Compile the HyPhy results (currently one report per gene) into a table describing all genes. Users can then make predictions from a single file.
The align subcommand seems to execute fine, but does not finish, and tail -f of log file stops at exactly the same place (whether it's my own or the example Test_Data provided in the repo).
The log file has output, e.g.
kate_crosby@kc-sandbox-2:~/BAD_Mutations$ tail adh3align.log
===2016-12-01 17:44:57,617 - Pasta_Align===
DEBUG Cyanidioschyzon_merolae MSSTQGKVIRCKAAVAWKPGQPLSIEEIEVEPPKAGEVRAKVVATGVCHTDAYTLSGADPEGVFPVILGHEGGAIVESVGEGVTSVKPGDHIIPCYIPECGQCKFCRSTKTNLCSAIRVTQGQGLMPDRTTRYSCNGRSLFHYMGCSCFSQYIVLPEIAVAKIRQDAPLDRVCLLGCGITTGIGAVLNTAKVEQGSTVAVFGLGGVGLSVVQGARIAGASRIIGVDTNESKFPLAKQLGATECINPLKFGEKPIQQVLIDMTDGGPDYTFEAIGNVKTMRAALEASHKGWGVSVIIGVAASGEEISTRPFQLVTGRTWKGTAFGGAKSRTQLPELVDMYMKGVINIDDYVTGTYKLDDINRAFEEMHNGRSIRSIILMDDDA
===2016-12-01 17:44:57,618 - Pasta_Align===
DEBUG Fvesca_226_v1 MSSTEGKVICCRAAVAWEAGKPLVIEEVEVAPPQANEVRVKILYTSLCHTDVYFWEAKGQNPLFPRIYGHEAGGIVESVGEGVTDLKAGDHVLPVFTGECKECDHCKSEESNMCDLLRINTDRGVMLSDGESRFSIKGKPIYHFVGTSTFSEYTVTHVGCLAKINPKAPLDKVCVLSCGISTGLGATLNVAKPKKGSTVAVFGLGAVGLAAAEGARMAGASRIIGVDLTSNRFEEAKKFGITEFVNPKDHKKPVQEVIAELTNGGVDRSIECTGNIQSIISAFESVHDGWGVAVLVGLPPKDAVFTTHPMNFLNERTLKGTFFGNYKPRTDIPSVVEKYMNKELELDKFITHQLPFSQINKAFDYMLKGEGIRCIITMEE*
===2016-12-01 17:44:57,618 - Pasta_Align===
DEBUG Acoerulea_322_v3 MASISNTTGQIIRCKAAVAWEAGKPLVIEEVEVAPPQAMEVRVKILFTSLCHTDVYFWEAKGQTPLFPRIFGHEAGGIVESVGSGVTDLKPGDHVLPVFTGECKDCAHCKSEESNMCDLLRINTDRGVMLNDGQSRFSINGKPIYHFVGTSTFSEYTVVHVGCLAKINPAAPLDKVCILSCGISTGLGAALNVAKPKQGSTVAVFGLGAVGLAACEGARIAGAKRIIGVDLNSNRFNEAKNFGVTDFVNPKDHNKPVQEVLAEMTDGGVDRSIECTGSVAAMISAFECVHDGWGVAVLVGVPNKDDAFKTHPMNLLNERTLKGTFFGNYKPRSDIPSVVEKYMNKELELEKFITHEVPFSEINKAFEYMLQGKSIRCIIRMEA*
===2016-12-01 17:44:57,618 - Pasta_Align===
DEBUG Number of species aligned: 45
===2016-12-01 17:44:57,620 - Pasta_Align===
DEBUG bash /home/kate_crosby/BAD_Mutations/Shell_Scripts/Pasta_Align.sh /home/kate_crosby/BAD_Mutations/bm_test/lib/python2.7/site-packages/run_pasta.py /tmp/BAD_Mutations_PastaInput_LXpvLW.fasta /tmp pastajob_1480614297.619982
Basically after that last Pasta align with 'number of species aligned', the job just hangs (there's nothing in the output directory).
Not sure what could be going on here, but my understanding is I need the alignments and trees produced from this step (not as a messy log file or an obscurely named newick file in a tmp directory - where I didn't specify the output to go to).
An inspection of said /tmp directory reveals the following:
**pastajob_1480614297.619982**
pastajob_1480614297.619982.err.txt
pastajob_1480614297.619982.marker001.BAD_Mutations_PastaInput_LXpvLW.aln
pastajob_1480614297.619982.out.txt
pastajob_1480614297.619982.score.txt
pastajob_1480614297.619982_temp_iteration_initialsearch_seq_alignment.txt
pastajob_1480614297.619982_temp_iteration_initialsearch_seq_unmasked_alignment.gz
pastajob_1480614297.619982_temp_iteration_initialsearch_tree.tre
pastajob_1480614297.619982_temp_name_translation.txt
pastajob_1480614297.619982_temp_pasta_config.txt
pastajob_1480614297.619982.tre
The /tmp dir is owned by root (not me) - do I need sudo privileges (I won't get them) + (I'm on an obscure cloud system that is not like U Minn's system) to get pasta to write out from /tmp?
Can I redirect pasta to a local tmp? If so, file(s) + line number(s) would be helpful.
It looks like there might be an off-by-one error in the HyPhy script or report.
Subs file (CBF3.subs):
21 SNP_1
45 SNP_2
Compiled HyPhy report:
GeneID CDSPos AlignedPosition . . .
CBF3_Predictions 22 63 . . .
CBF3_Predictions 46 135 . . .
Note the CDSPos column.
The predict command is not generating results. HyPhy runs, but the tmp file that is supposed to contain the substitutions is empty:
DEBUG bash /media/jfay/data1/projects/barley/BAD_Mutations/Shell_Scripts/Prediction.sh /usr/local/bin/HYPHYSP /media/jfay/data1/projects/barley/BAD_Mutations/Shell_Scripts/LRT.hyphy /tmp/BAD_Mutations_HYHPY_In_cvCQRZ.txt /tmp/BAD_Mutations_HYPHY_Out_F5RFSJ.txt
cat /tmp/BAD_Mutations_HYHPY_In_cvCQRZ.txt
/media/jfay/data1/projects/barley/BAD_Mutations/test1/test_MSA.fasta
/media/jfay/data1/projects/barley/BAD_Mutations/test1/test.tree
/tmp/BAD_Mutations_HYPHY_Subs_372r5B.txt
BAD_Mutations_HYPHY_Subs_372r5B.txt is empty
The original substitution file is
cat subs.txt
24
87
Currently fetches all Angiosperm species. Users can edit the source files to add or remove species for fetching, but this is not clear or intuitive. Add option to specify (maybe a text file, one species per line) with species to fetch.
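A minimal sketch of reading such a species list, one name per line. Skipping blank lines and lines starting with `#` is an assumption, as is the function name. The function takes any iterable of lines, so an open file handle works as well as a list.

```python
def read_species_list(lines):
    """Return species names from an iterable of lines, skipping blanks and '#' comments."""
    species = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            species.append(name)
    return species


# Hypothetical file contents; in practice pass an open file handle instead.
example = ["Athaliana\n", "\n", "# outgroup\n", "Zmays\n"]
print(read_species_list(example))
```

The resulting names would still need to be validated against the list of species the fetch step knows about.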
Use this command line for MSI installation:
pip install requests --user
When LRT_Pipeline.py is run outside the project folder, the script for unzipping downloaded CDS files is not found.
I am having an issue after compiling NCBI tblastx from source and running in a virtualenv, with the config file pointing to the appropriate paths for the dependencies:
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable BASE to /home/kate_crosby/scratch/BAD_Mutations_Data
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable TARGET_SPECIES to Zmays
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable EVAL_THRESHOLD to 0.05
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable BASH to /bin/bash
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable GZIP to /bin/gzip
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable SUM to /usr/bin/sum
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable TBLASTX to /home/kate_crosby/BAD_Mutations/ncbi-blast-2.5.0+-src/c++/ReleaseMT/bin
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable PASTA to /home/kate_crosby/BAD_Mutations/bm_test/lib/python2.7/site-packages
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Setting variable HYPHY to /home/kate_crosby/hyphy
===2016-11-28 23:55:08,889 - Configuration_Handler===
DEBUG Command line and config options merged. Values: {'sum_path': '/usr/bin/sum', 'target': 'Zmays', 'gzip_path': '/bin/gzip', 'loglevel': 'DEBUG', 'config': 'BAD_Mutations_Config.txt', 'evalue': '0.05', 'fetch_only': False, 'action': 'fetch', 'base': '/home/kate_crosby/scratch/BAD_Mutations_Data', 'user': 'BLANK', 'hyphy_path': '/home/kate_crosby/hyphy', 'convert_only': False, 'password': 'blank', 'tblastx_path': '/home/kate_crosby/BAD_Mutations/ncbi-blast-2.5.0+-src/c++/ReleaseMT/bin', 'bash_path': '/bin/bash', 'pasta_path': '/home/kate_crosby/BAD_Mutations/bm_test/lib/python2.7/site-packages'}
===2016-11-28 23:55:08,889 - LRT_Predict===
DEBUG Checking if BAD_Mutations_Config.txt exists.
===2016-11-28 23:55:08,902 - LRT_Predict===
DEBUG fetch subcommand was invoked
===2016-11-28 23:55:08,902 - LRT_Predict===
ERROR Some required executables were not found on your system: makeblastdb
Please install them to continue.
Yet, makeblastdb is in the same dir as tblastx - any help?
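One possible lookup, sketched under assumptions: search the directory that holds tblastx first, then fall back to $PATH. The function name is hypothetical, this is not how BAD_Mutations currently locates makeblastdb, and `shutil.which` requires Python 3.3 or later.

```python
import os
import shutil


def find_beside(name, known_path):
    """Look for `name` in the directory of `known_path`, then fall back to $PATH."""
    # `known_path` may be an executable or, as in the log above, a directory.
    directory = known_path if os.path.isdir(known_path) else os.path.dirname(known_path)
    return shutil.which(name, path=directory) or shutil.which(name)


print(find_beside("makeblastdb", "/usr/bin/tblastx"))
```

The function returns the full path of the executable if it is found, and None otherwise.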
Hello, thanks for writing this great program! I have downloaded the newest code and am working through the test data provided. Everything worked well until I got to the compile function. This is the command I am using:
./BAD_Mutations.py -v DEBUG compile -P Predictions_Dir -S Test_Data/CBF3.subs 2> Compile.log
And I've attached the resulting DEBUG report.
Compile.txt
I'm not sure what the source of this error is, but the -S option is not described in the manual, so I'm not sure if I'm using it correctly. The help page says it should be in the same format as the -s substitutions file, so I just used that file.
Thanks for looking into this! Hopefully it's a quick, easy fix.
Fetching has an issue with relative paths
Example:
./LRT_Predict.py fetch --user USER --base BASE/
does not work. Base is required to be fully written out.
./LRT_Predict.py fetch --user USER --base ~/PATH/BASE/
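A minimal sketch of normalizing the base directory before it is used, so that relative paths and `~` both resolve. The function name is hypothetical.

```python
import os


def normalize_base(path):
    """Expand '~' and turn a relative path into an absolute one."""
    return os.path.abspath(os.path.expanduser(path))


print(normalize_base("BASE/"))
print(normalize_base("~/PATH/BASE/"))
```

Relative paths are resolved against the current working directory at the time of the call, so the normalization should happen before the program changes directories.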
Generate a yes/no prediction using the logistic regression equations developed with a test dataset.
Print an informative error message (loglevel CRITICAL) when there are no BLAST hits that match the user's criteria. Currently PASTA dies uncleanly, which causes BAD_Mutations to die uncleanly.
Seems to be a new bug.
For gene HORVU0Hr1G000050.1:
FASTA source file:
>HORVU0Hr1G000050.1
NNNATGCTGCTGCTGGAGATGGCTGGCGGTAGAAGGAACGCTGATCCAAACATGGGGTCC
TCAAGTCAGGCGTACTATCCGTCATGGGTGTACGACCAGCTGACTCGGGAAGAAGCGGGT
GAGATATCTCCAGTTGTTGCCGACATGCACGAGCTGGAGAAGAAGCTGTGTGTTGTCGGA
TTATGGTGTATTCAGATGAGGTCTCGTGATCGGCCAACGATGAGCGAGGTCATTGAGATT
CTGGAGGCTGGGGTTGATGACCTACAGATGCCTTCAAGGCCGTTTTTTTGTGACGAAGGA
CACATCCATGTGGAGGACTCTTACCATTTCACCTCCGAGCTGACGGCGGTCTCGGAGGAG
GGATTGAGTGTGGTGTCAGAGGAAGACGATGTGTGA
Translated peptide sequence:
>HORVU0Hr1G000050.1
XMLLLEMAGGRRNADPNMGSSSQAYYPSWVYDQLTREEAGEISPVVADMHELEKKLCVVGLWCIQMRSRDRPTMSEVIEILEAGVDDLQMPSRPFFCDEGHIHVEDSYHFTSELTAVSEEGLSVVSEEDDV*
Back-translated sequence after alignment:
>HORVU0Hr1G000050_1
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------ATGACAAAA---------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
---------------------------AGGCAAGGTTCACCTACAGCCTCAGCCACTCTG
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
---------------------TTTACTCTGCTAATCTTTCTACTCACAACC---------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------AAAGTTCTTGCAAGATTCATCCCA---------------CAT
GTTTGC---TCCCCTTCCTCATGT---------GGAGAAATTGAA---ATCGACTACCCT
TTTCGC---CTGAAGACCGATCCTGCTGGCTGTGGCGAACCA---------GACTTCGAA
CTTTCTTGCGAGAAC------AAC------AAGACCATATTAGAACTCCACTCAGGGAAG
TAT---------------------------CTTGTTAAGCGAATTTCCTATGATGTTCAG
---------------------AGACTCCGTGTCGTTGATGTTAACTTA------------
GCTAATGGTACCTGC---AGCCTC------CCGTACAAATCAGTGTCGGTTGATGAGTTT
---------------------------------------------ATGGATAATGATCAC
TACATA---TTAGACGCTACTACT---------TATACAAGTTTTATCAAGTGTTCGAGT
AATTTGAGTGAC------CAAGCTTATAGACTAGTTCCTTGTTTGAGTGGAAAT------
------------GGGACTAGTGTTTATGTTAGTTATGTC---------------------
------ACCTACATAATTTCTAGTCTCCAAGGA------TCTTGCTTG------TTCGTT
TCAAGAGTGCCTACG---------------------------------GTTTATCAGGCT
GTGCTGTTCCCCTCTTATGATAGTATCTTGCAATTGATG------CAAACAGGGTTTGAT
CTTGAATGGTCTGTTGGG------------------------------------------
---------TGCTGGTACTCATCA------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------TATAGCTATGATTGTTACAGATCA
GATTATTATCTTCCC---------------------------------------------
------------------------------------------------------------
------CCATGGTTTGCA---GCCTTCTCTGTGGTTTGGGAT---TTTTTGTCAGTATAT
CTT---------------------------------------------------------
GTT---------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------GGAAGATTCATCCTCGCTCCGATAGGGATATTTGGA
TTC------------CTCATTCACAAGTATATG---------------------------
------------------------------------------------------------
------------------------------ACAACA---AAGAAGGCGTCTGGCAACGAA
GAGATATTTCTG---------------------GTCAATCAGCAACACTTGATG---CCC
AAGAGGTACACTTTCTCTGACATTATTGCAATTACAAACAACTTC---AAAGATAAATTA
GGCCAAGGTGGATTTGGGAATGTATATAAAGGACAACTTCGTGAT------GCGTTTTTA
---GTTGCCGTTAAAATGCTT---GGCAATGCCAAA---TGCAATGACGAGGACTTCATT
AATGAAGTCTCCATAATTGGTAGAATTCATCATGTTAACATAGTACGGCTGGTGGGATTT
TGTTCCGAGGGATCTTACCGAGCTCTTGTATTCGAGTATATGGCTAATGGATCTCTT---
---GATAAGCTTTTATTTTCAAGA---------GAAACAGAACTTCTT---CTAGTTAGT
TGGGAGAAACTCCTTCAGATAGCTGTA------------GGCACAGCTCGAGGGATTGAG
CATCTTCATGGGGGATGCAGCGTATGCATTCTTCATTTGGATATTAAGCCTCACAATGTC
CTGCTAGATAGTAATTTCATCCCAAAAGTTTCAGATTTTGGCCTTGCAAAATTTTATCCC
AGTGAAAAGGATTTTGTATCCATTAGTACTACCAGAGGAACTATAGGGTACTTTGCTCCT
GAAATGATTTCAAGGAACCTCGGAGCTGTTTCT---TGCAAATCAGATGTTTATAGTTTT
GGGATGTTATTGTTGGAAATGGCTGGA------AGAAGAAGGAAGTCT---------AAT
TCAAAGGGAAATTGCTCAAGCGATGTATATTTCCCATCGTGGGTTTATGACCATCTCAGT
GAGGGAGGAGATTTAGAGCTT---------------GAGAATGTCACTGAAATTGAGGCT
GCAATAGCAAGAAAGCTGTGTATCGCTGGGCTGTGGTGCATCCAAAAGGCAGCATCAGAT
CGTCCAACCATGACCAAAGTGGTGGAAATGCTTGGAGCAAACATCGATGATCTGCAATTG
CCTTCCAATGCTCTATCATTT---CCCCAATCTATTTCC---------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------AAAGAA------
------------------CCTCAATCAGATTCCTCAACGGAA------------TCGCTA
ATACCCGAGACAGCG------------------GATCGAAGTTTG---------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------
In particular, read the first 9 non-gap bases of the back-translated alignment: they are ATGACAAAA, while the source sequence begins NNNATGCTG.
This may cause a downstream issue: positions in the HyPhy report would not match those in the original substitutions files.
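The comparison can be scripted. The sketch below pulls the first non-gap bases out of an aligned sequence so they can be compared with the source FASTA; `first_non_gap_bases` is a hypothetical helper, and stripping the leading N run from the source before comparing is an assumption about how the two should line up.

```python
def first_non_gap_bases(aligned, n=9, gap="-"):
    """Return the first n non-gap characters of an aligned sequence."""
    bases = []
    for ch in aligned:
        if ch != gap:
            bases.append(ch)
            if len(bases) == n:
                break
    return "".join(bases)


if __name__ == "__main__":
    # Short excerpts of the sequences quoted in this report.
    source = "NNNATGCTGCTGCTGGAGATGGCT"
    aligned = "------ATGACAAAA---------AGGCAAGGTTCACC"

    observed = first_non_gap_bases(aligned, n=9)
    expected = source.lstrip("N")[:9]  # assumption: ignore the leading N run
    print(observed, expected, observed == expected)
```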
Installing dependencies using the get_dependencies.sh script:
Command:
bash ./get_dependencies.sh
Error:
./get_dependencies.sh: line 18: $1: unbound variable
Some of the names are capitalized and some are not. If you incorrectly capitalize (or don't capitalize) the species you're using, the code doesn't like you.
Species list:
aegilops_tauschii
brassica_oleracea
cyanidioschyzon_merolae
hordeum_vulgare
leersia_perrieri
musa_acuminata
triticum_urartu
Acoerulea
Alyrata
Athaliana
Bdistachyon
BrapaFPsc
Bstricta
Cclementina
Cgrandiflora
Cpapaya
Crubella
Csativus
Csinensis
Egrandis
Esalsugineum
Fvesca
Gmax
Graimondii
Lusitatissimum
Mdomestica
Mesculenta
Mguttatus
Mtruncatula
Osativa
Ppersica
Ptrichocarpa
Pvirgatum
Pvulgaris
Rcommunis
Sbicolor
Sitalica
Slycopersicum
Stuberosum
Tcacao
Vvinifera
Zmays
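One way to make this less fragile is to match the user's input against the known names without regard to case and return the canonical spelling. This is a sketch only; `resolve_species` is a hypothetical helper, and the list passed to it would be whatever list the program already keeps.

```python
def resolve_species(name, known):
    """Return the canonical spelling of `name` from `known`, ignoring case.

    Raises ValueError with the accepted names if there is no match.
    """
    lookup = {species.lower(): species for species in known}
    key = name.strip().lower()
    if key not in lookup:
        raise ValueError(
            "Unknown species %r. Accepted names: %s"
            % (name, ", ".join(sorted(known)))
        )
    return lookup[key]


if __name__ == "__main__":
    known = ["hordeum_vulgare", "Athaliana", "Zmays"]
    print(resolve_species("athaliana", known))
    print(resolve_species("Hordeum_vulgare", known))
```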
PASTA dies when there is an internal stop (*) in the protein sequence of one of the orthologues. It would be better to screen these sequences out before running PASTA rather than end up with a failed alignment.
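The screening step could look roughly like this. It is a sketch that assumes the translated orthologues are available as a mapping from sequence ID to protein string; both function names are hypothetical.

```python
def has_internal_stop(protein, stop="*"):
    """True if a stop symbol occurs anywhere other than the end of the sequence."""
    return stop in protein.rstrip(stop)


def drop_internal_stops(proteins):
    """Split {id: protein} into (kept, dropped_ids) by internal-stop status."""
    kept = {}
    dropped = []
    for seq_id, protein in proteins.items():
        if has_internal_stop(protein):
            dropped.append(seq_id)
        else:
            kept[seq_id] = protein
    return kept, dropped


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    example = {"ortho_a": "MLLLEMAGG*", "ortho_b": "MLL*EMAGG*"}
    kept, dropped = drop_internal_stops(example)
    print(sorted(kept), dropped)
```

Logging the dropped IDs would let the user see which orthologues were excluded from the alignment.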
I am relatively new to the game, so please pardon me if there is an obvious answer that I am missing. When attempting the Fetch command, I repeatedly run into the following error:
Traceback (most recent call last):
File "./BAD_Mutations.py", line 382, in <module>
main()
File "./BAD_Mutations.py", line 348, in main
fetch(arguments_valid, loglevel)
File "./BAD_Mutations.py", line 110, in fetch
phy.get_xml_urls()
File "/home/michaelmckibben/BAD_Mutations/lrt_predict/Fetch/phytozome.py", line 168, in get_xml_urls
xml_tree = ElementTree.fromstring(xml)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1659, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1523, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
I have tried reinstalling all dependencies and BAD_Mutations, and I have searched the internet for similar XML-related issues, to no avail, so I decided to reach out here. Please let me know if there is a known solution to this problem. Thank you.
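A `ParseError` on line 1 usually means the response body was not XML at all (for example, an HTML redirect or login page). The sketch below wraps the parse so the error shows what was actually received; `parse_xml_or_explain` is a hypothetical helper, not part of `phytozome.py`.

```python
import xml.etree.ElementTree as ElementTree


def parse_xml_or_explain(text, preview_chars=200):
    """Parse `text` as XML; on failure, raise an error that shows its start."""
    try:
        return ElementTree.fromstring(text)
    except ElementTree.ParseError as err:
        raise ValueError(
            "Response is not well-formed XML (%s). It starts with: %r"
            % (err, text[:preview_chars])
        )


if __name__ == "__main__":
    try:
        parse_xml_or_explain("<title>302 Found</title>The document has moved here.")
    except ValueError as err:
        print(err)
```

Seeing the first characters of the response makes it easier to tell a server-side change from a local installation problem.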