
chopstitch's People

Contributors

hamzakhanvit, mohamadi

chopstitch's Issues

Exons shorter than k

I've observed some cases where the predicted exons for a transcript include exons shorter than k. Is this expected, and what is their significance? Also, 10 bases (positions 284 to 293) are not included in any predicted exon; they fall between exons 3 and 4. Is this due to the leniency factor?

>gi|239619018|gb|FJ830670.1|_1_53
ATGTTCACCTTGAAGAAATCCCTGTTGCTCCTTTTCTTTCTTGGGACCATCTC
>gi|239619018|gb|FJ830670.1|_54_73
ATTATCTCTCTGTGAGCAAG
>gi|239619018|gb|FJ830670.1|_74_283
AGAGAGATGCCGATGGAGATGAAGGGGAAGTTGAAGAAGTAAAAAGAGGTTTCCTGGATATCATCAAGGATACGGGGAAGGAATTTGCTGTGAAAATTTTGAATAATTTAAAATGTAAATTGGCTGGAGGATGTCCACCCTGAATCAGAGGTCATCTCATGTGGAATATCACTTAGCTAAATCTGTAATGTCTTATTAAAAAATAAAAAT
>gi|239619018|gb|FJ830670.1|_294_321
AGAAAAAAAAAAAAAAAAAAAAAAAAAA

This example is from GenBank. Note that it is 5' incomplete because the PCR primers used by the authors were not designed to capture the 5' UTR; that is, the sequence consists of the CDS and 3' UTR with polyA tail. I built my Bloom filter at k=32.
This is a bullfrog sequence, and alignment to the draft genome indicates that there should be only one exon-exon junction, after the 74th base of the sequence.
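Both observations can be checked directly from the exon headers. The sketch below assumes (this is not confirmed by ChopStitch documentation) that each header ends in `_<start>_<end>` with 1-based, inclusive coordinates; the helper names are hypothetical. Under that reading, the second and fourth exons (20 and 28 bases) are shorter than k=32, and positions 284 to 293 lie between exons 3 and 4.

```python
import re

# Assumption: exon headers end in "_<start>_<end>" (1-based, inclusive).
COORD_RE = re.compile(r"_(\d+)_(\d+)$")


def exon_coords(headers):
    """Parse (start, end) pairs from FASTA headers, sorted by start."""
    coords = []
    for header in headers:
        match = COORD_RE.search(header)
        if match is None:
            raise ValueError(f"no coordinates in header: {header}")
        coords.append((int(match.group(1)), int(match.group(2))))
    return sorted(coords)


def short_exons(coords, k):
    """Return exons whose length (end - start + 1) is below k."""
    return [(s, e) for s, e in coords if e - s + 1 < k]


def uncovered_gaps(coords):
    """Return (first, last) positions not covered between consecutive exons."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


headers = [
    ">gi|239619018|gb|FJ830670.1|_1_53",
    ">gi|239619018|gb|FJ830670.1|_54_73",
    ">gi|239619018|gb|FJ830670.1|_74_283",
    ">gi|239619018|gb|FJ830670.1|_294_321",
]
coords = exon_coords(headers)
print(short_exons(coords, k=32))  # [(54, 73), (294, 321)]
print(uncovered_gaps(coords))     # [(284, 293)]
```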

Option to output transcript to gene mapping?

Would it be possible to add an option to ChopStitch that prints a transcript-to-gene mapping?

That is, each splice graph would be assigned a gene ID, and a tab-delimited file would list the mapping like so:

T1<tab>G1
T2<tab>G1
T3<tab>G1
T4<tab>G2
T5<tab>G2
...
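Until such an option exists, a mapping in this format could be derived from the splice-graph edges. This is a minimal sketch, assuming that each connected group of transcripts counts as one gene and that node names follow a hypothetical `<transcript>_<start>_<end>` convention; real ChopStitch node names may need different parsing. Transcripts are grouped with a union-find structure and numbered G1, G2, ... in order of first appearance.

```python
class DisjointSet:
    """Union-find over hashable items, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # compress the path to the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def transcript_of(node):
    # Hypothetical naming convention: "<transcript>_<start>_<end>".
    return node.rsplit("_", 2)[0]


def transcript_to_gene(edges):
    """Assign G1, G2, ... to connected groups of transcripts."""
    ds = DisjointSet()
    order = []  # transcripts in order of first appearance
    for u, v in edges:
        tu, tv = transcript_of(u), transcript_of(v)
        for t in (tu, tv):
            if t not in ds.parent:
                order.append(t)
                ds.find(t)  # register t as its own set
        ds.union(tu, tv)
    gene_ids, mapping = {}, []
    for t in order:
        root = ds.find(t)
        gene_ids.setdefault(root, f"G{len(gene_ids) + 1}")
        mapping.append((t, gene_ids[root]))
    return mapping


edges = [("T1_1_50", "T2_1_60"), ("T2_61_120", "T3_1_80"), ("T4_1_40", "T5_1_90")]
for transcript, gene in transcript_to_gene(edges):
    print(f"{transcript}\t{gene}")
```

With these toy edges, T1 to T3 map to G1 and T4 and T5 map to G2. Transcripts that appear in no edge are not listed, so single-exon transcripts would need to be added separately.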

transcripts in fasta

How do I reconstruct a transcript FASTA for all possible subgraphs? I assume you would need the DOT file, but I am unsure how to proceed from there.
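One possible approach, sketched under assumptions: treat the splice graph as a directed acyclic graph whose nodes are exons with known sequences (for example, loaded from the exon FASTA) and whose edges are exon-exon junctions, then emit one FASTA record per source-to-sink path. The function names and record headers are hypothetical, and the sketch assumes exon sequences do not overlap, so they can be concatenated directly.

```python
from collections import defaultdict


def source_to_sink_paths(nodes, edges):
    """Yield every source-to-sink path in a directed acyclic graph."""
    succ = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    def walk(node, path):
        path.append(node)
        if not succ[node]:
            yield list(path)  # reached a sink: emit a copy of the path
        for nxt in succ[node]:
            yield from walk(nxt, path)
        path.pop()

    for node in nodes:
        if indegree[node] == 0:  # start only from sources
            yield from walk(node, [])


def paths_to_fasta(exon_seqs, edges):
    """Concatenate exon sequences along each path into FASTA records."""
    records = []
    paths = source_to_sink_paths(list(exon_seqs), edges)
    for i, path in enumerate(paths, start=1):
        seq = "".join(exon_seqs[n] for n in path)
        records.append(f">path{i} {'->'.join(path)}\n{seq}")
    return "\n".join(records)


exon_seqs = {"E1": "AAA", "E2": "CC", "E3": "GG", "E4": "TTT"}
edges = [("E1", "E2"), ("E1", "E3"), ("E2", "E4"), ("E3", "E4")]
print(paths_to_fasta(exon_seqs, edges))
```

The number of paths can grow exponentially with the number of alternative exons, so enumerating every path is only practical for small subgraphs.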

Error running FindSubcomponents.py

Hi, I got the following error when running the FindSubcomponents script:

$ python FindSubcomponents.py -g ./test.dot -m
Warning: syntax error in line 2 near ''
Traceback (most recent call last):
  File "/home/kmnip/programs/chopstitch/ChopStitch/FindSubcomponents.py", line 120, in <module>
    main()
  File "/home/kmnip/programs/chopstitch/ChopStitch/FindSubcomponents.py", line 108, in main
    cc_subgraph = obj.findcc()
  File "/home/kmnip/programs/chopstitch/ChopStitch/FindSubcomponents.py", line 24, in findcc
    G = nx.drawing.nx_agraph.read_dot(self.dotfile)
  File "/home/kmnip/lib/python2.7/site-packages/networkx/drawing/nx_agraph.py", line 197, in read_dot
    A=pygraphviz.AGraph(file=path)
  File "/home/kmnip/lib/python2.7/site-packages/pygraphviz/agraph.py", line 158, in __init__
    self.read(filename)
  File "/home/kmnip/lib/python2.7/site-packages/pygraphviz/agraph.py", line 1228, in read
    raise DotError("Invalid Input")
pygraphviz.agraph.DotError: Invalid Input

Here is my test.dot file to replicate the error:

digraph graphname {
"ENST00000474716.5_364_482" -> "ENST00000426585_5_444_4978_OR_ENST00000340849_8_841_1085_OR_ENST00000646055_1_90_562_OR_ENST00000647790_1_3969_4093_OR_ENST00000635739_1_1_104_OR_ENST00000565723_1_1_77_OR_ENST00000632858_1_228_493_OR_ENST00000527747_5_261_454_OR_ENST00000540522_6_1_183_OR_ENST00000526811_4_1_103_OR_ENST00000554758_1_46_184_OR_ENST00000553138_1_1693_1980_OR_ENST00000368644_5_1058_3825_OR_ENST00000589657_1_704_801_OR_ENST00000642640_1_1_2024_OR_ENST00000497501_5_201_270_OR_ENST00000644470_1_8_213_OR_ENST00000597403_5_30_129_OR_ENST00000641401_1_158_301_OR_ENST00000610928_4_37_160_OR_ENST00000532158_5_51_124_OR_ENST00000527509_6_36_143_OR_ENST00000646244_1_1_102_OR_ENST00000463593_6_1_136_OR_ENST00000416079_2_83_612_OR_ENST00000636351_1_1091_1264_OR_ENST00000488886_1_43_135_OR_ENST00000504366_5_1_1791_OR_ENST00000456398_5_830_1233_OR_ENST00000360019_8_1484_1839_OR_ENST00000588728_5_626_1028_OR_ENST00000616283_4_1_291_OR_ENST00000636228_1_1071_1222_OR_ENST00000544401_2_425_486_OR_ENST00000366159_8_585_1337_OR_ENST00000601938_5_1_104_OR_ENST00000487809_1_573_760_OR_ENST00000622059_4_1439_1622_OR_ENST00000474246_2_1_73_OR_ENST00000457029_3_544_600_OR_ENST00000467705_6_1_57_OR_ENST00000645756_1_1589_1976_OR_ENST00000409351_5_1_285_OR_ENST00000584202_1_49_97_OR_ENST00000629956_2_634_787_OR_ENST00000635958_1_1604_1835_OR_ENST00000637050_1_1586_1682_OR_ENST00000647112_1_1_143_OR_ENST00000360160_8_1258_1322_OR_ENST00000411915_1_421_537_OR_ENST00000355178_8_514_1436_OR_ENST00000644257_1_1_65_OR_ENST00000590940_5_280_760_OR_ENST00000495154_1_512_579_OR_ENST00000565140_5_1105_1223_OR_ENST00000524046_1_1_69_OR_ENST00000650213_1_103_1090_OR_ENST00000398731_3_299_720_OR_ENST00000483607_1_54_606_OR_ENST00000566203_6_505_913_OR_ENST00000547760_1_459_747_OR_ENST00000536355_5_515_569_OR_ENST00000473661_5_1_808_OR_ENST00000574594_5_292_907_OR_ENST00000475446_5_347_555_OR_ENST00000391555_1_2412_2523_OR_ENST00000495523_1_77_704_OR_ENST00000491047_5_458_569_OR_ENST0000064628
3_1_1_66_OR_ENST00000644701_1_1_64_OR_ENST00000572403_5_587_646_OR_ENST00000548240_1_399_1563_OR_ENST00000438121_1_331_997_OR_ENST00000484707_1_212_543_OR_ENST00000580061_5_1_75_OR_ENST00000577200_5_1_71_OR_ENST00000632565_1_1_168_OR_ENST00000635540_1_4220_5654_OR_ENST00000594273_5_485_615_OR_ENST00000548942_1_262_544_OR_ENST00000615242_1_344_443_OR_ENST00000469319_1_54_606_OR_ENST00000549748_2_28_99_OR_ENST00000563078_1_393_579_OR_ENST00000577721_6_1_61_OR_ENST00000459758_5_403_553_OR_ENST00000631857_1_1_292_OR_ENST00000498561_1_54_606_OR_ENST00000506274_1_57_1922_OR_ENST00000510219_2_1_66_OR_ENST00000464010_5_939_1399_OR_ENST00000620086_4_1_113_OR_ENST00000489929_1_54_606_OR_ENST00000467136_5_5773_7985_OR_ENST00000561689_6_1610_1955_OR_ENST00000520052_1_1_138_OR_ENST00000566682_2_1_161_OR_ENST00000426827_5_638_754_OR_ENST00000592119_2_493_563_OR_ENST00000564047_2_767_2891_OR_ENST00000308941_9_1_260_OR_ENST00000441245_5_224_627_OR_ENST00000469615_1_1_97_OR_ENST00000532903_1_643_770_OR_ENST00000519466_1_221_573_OR_ENST00000560878_1_1_305_OR_ENST00000645027_1_125_2351_OR_ENST00000489817_1_1_283_OR_ENST00000593214_5_25_142_OR_ENST00000547738_5_789_2017_OR_ENST00000303155_9_1073_3239_OR_ENST00000458438_2_784_939_OR_ENST00000479642_1_58_574_OR_ENST00000505128_5_19_104_OR_ENST00000594629_1_435_527_OR_ENST00000496650_1_93_473_OR_ENST00000466880_5_45_145_OR_ENST00000560318_1_1_115_OR_ENST00000605096_1_297_629_OR_ENST00000569692_1_3_141_OR_ENST00000369017_5_874_1222_OR_ENST00000432094_6_1_124_OR_ENST00000593004_5_375_570_OR_ENST00000569430_6_3525_3865_OR_ENST00000356156_7_1_389_OR_ENST00000358510_6_1_194_OR_ENST00000529201_1_90_559_OR_ENST00000576991_1_69_540_OR_ENST00000606567_5_1_302_OR_ENST00000422738_5_445_543_OR_ENST00000371004_6_40_125_OR_ENST00000632426_1_292_907_OR_ENST00000641443_1_753_818_OR_ENST00000602110_5_471_543_OR_ENST00000643426_1_1_452_OR_ENST00000575397_5_46_167_OR_ENST00000617111_2_1_118_OR_ENST00000357806_11_1124_1467_OR_ENST00000519807_5_590_1824_OR_EN
ST00000371356_6_26_133_OR_ENST00000631641_2_907_1198_OR_ENST00000573896_1_197_552_OR_ENST00000642459_1_1_73_OR_ENST00000495565_5_1_123_OR_ENST00000583548_1_508_560_OR_ENST00000442418_5_650_746_OR_ENST00000473206_5_464_819_OR_ENST00000645661_1_458_569_OR_ENST00000512808_2_1_303_OR_ENST00000442361_1_711_815_OR_ENST00000587351_1_204_399_OR_ENST00000583007_2_1_180_OR_ENST00000355477_10_1053_1171_OR_ENST00000568493_1_1_300_OR_ENST00000518578_5_1_385_OR_ENST00000527158_2_98_2159_OR_ENST00000488801_5_263_566_OR_ENST00000518680_1_1_374_OR_ENST00000359984_12_1512_1874_OR_ENST00000533048_5_1_121_OR_ENST00000554076_5_296_583_OR_ENST00000575365_1_1_55_OR_ENST00000524996_1_192_581_OR_ENST00000632902_1_43_156_OR_ENST00000591203_5_234_580_OR_ENST00000637871_1_1422_1604_OR_ENST00000448166_6_7_92_OR_ENST00000468258_1_34_148_OR_ENST00000636853_1_2210_2287_OR_ENST00000299540_6_1_1055_OR_ENST00000506169_5_32_117_OR_ENST00000507941_1_96_1352_OR_ENST00000474754_1_1_66_OR_ENST00000460405_5_1_123_OR_ENST00000443070_5_641_2173_OR_ENST00000470442_5_239_703_OR_ENST00000610814_3_834_883_OR_ENST00000513619_5_705_961_OR_ENST00000349438_8_1112_1233_OR_ENST00000489133_6_1_147_OR_ENST00000458062_3_2236_2346_OR_ENST00000474837_5_410_553_OR_ENST00000494519_1_83_535_OR_ENST00000423810_6_1319_1635_OR_ENST00000603691_1_47_132_OR_ENST00000560964_1_1_167_OR_ENST00000413584_1_1_79_OR_ENST00000458301_5_1_90_OR_ENST00000497073_1_1_236_OR_ENST00000644171_1_1_1122_OR_ENST00000637974_1_2521_3028_OR_ENST00000620307_4_172_2284_OR_ENST00000383827_5_1_997_OR_ENST00000594385_1_1_120_OR_ENST00000635749_1_2067_3712_OR_ENST00000562559_5_515_1325_OR_ENST00000544837_5_47_132_OR_ENST00000371007_6_6_91_OR_ENST00000569030_5_989_1107_OR_ENST00000578778_5_486_580_OR_ENST00000643806_1_1_51_OR_ENST00000480616_1_90_562_OR_ENST00000649760_1_1_117_OR_ENST00000615663_2_430_570_OR_ENST00000644527_1_1_59_OR_ENST00000470361_6_599_720_OR_ENST00000626950_1_344_443_OR_ENST00000397683_5_1310_1672_OR_ENST00000504901_2_1_49_OR_ENST000003361
64_8_2519_3256_OR_ENST00000645773_1_1_82_OR_ENST00000587452_5_333_573_OR_ENST00000529068_5_46_153_OR_ENST00000646448_1_1_60_OR_ENST00000487960_1_1_199_OR_ENST00000334342_5_1996_2378_OR_ENST00000434211_1_1_434_OR_ENST00000506595_5_1_93_OR_ENST00000463932_1_385_682_OR_ENST00000392266_7_848_1281_OR_ENST00000333496_14_1257_1606_OR_ENST00000518975_1_448_652_OR_ENST00000643178_1_532_667_OR_ENST00000336079_7_1055_1119_OR_ENST00000579280_1_357_458_OR_ENST00000645886_1_1_88_OR_ENST00000541875_1_471_1635_OR_ENST00000632288_1_48_162_OR_ENST00000646093_1_1_209_OR_ENST00000635861_1_1284_1402_OR_ENST00000540918_2_2766_4115_OR_ENST00000587867_1_570_721_OR_ENST00000521138_1_233_281_OR_ENST00000578194_5_35_282_OR_ENST00000643030_1_1_59_OR_ENST00000471407_1_1_167_OR_ENST00000488828_1_1_574_OR_ENST00000557034_1_140_563_OR_ENST00000647918_1_2180_4399_OR_ENST00000645155_1_1_55_OR_ENST00000566141_5_80_645_OR_ENST00000620945_1_1450_1561_OR_ENST00000576935_5_121_263_OR_ENST00000468371_5_1_2341_OR_ENST00000589676_5_451_568_OR_ENST00000493690_1_1_105_OR_ENST00000511184_5_1003_1764_OR_ENST00000642084_1_272_487_OR_ENST00000615736_4_1449_1549_OR_ENST00000533361_1_46_160_OR_ENST00000550522_5_1_204_OR_ENST00000637923_2_25_144_OR_ENST00000582095_5_441_564_OR_ENST00000647221_1_1_67_OR_ENST00000564113_5_1_168_OR_ENST00000420375_5_793_862_OR_ENST00000349320_7_1568_1808_OR_ENST00000504078_1_419_538_OR_ENST00000515061_1_607_660_OR_ENST00000493660_6_1_69_OR_ENST00000504242_1_82_674_OR_ENST00000372592_7_1_773_OR_ENST00000632755_1_34_136_OR_ENST00000483919_5_367_1140_OR_ENST00000566057_5_811_929_OR_ENST00000454035_5_50_236_OR_ENST00000635518_1_1_86_OR_ENST00000628696_2_834_883_OR_ENST00000574297_1_388_567_OR_ENST00000594245_1_207_1084_OR_ENST00000361948_8_1736_3046_OR_ENST00000644909_1_1_66_OR_ENST00000334384_3_1_228_OR_ENST00000588525_1_1_74_OR_ENST00000586649_2_207_331_OR_ENST00000313566_10_755_837_OR_ENST00000395653_9_1085_1428_OR_ENST00000580654_6_1_61_OR_ENST00000629741_2_834_883_OR_ENST00000642576_1
_1_78_OR_ENST00000642439_1_48_343_OR_ENST00000600821_5_494_566_OR_ENST00000567237_1_1_107_OR_ENST00000474880_5_407_707_OR_ENST00000400593_6_1128_1428_OR_ENST00000475180_5_22_129_OR_ENST00000476494_1_229_578_OR_ENST00000445693_5_1003_1283_OR_ENST00000463788_1_45_167_OR_ENST00000481705_1_506_630_OR_ENST00000639066_1_558_1178_OR_ENST00000318203_9_1_266_OR_ENST00000532110_1_1_190_OR_ENST00000338146_6_7268_10777_OR_ENST00000292144_8_806_1522_OR_ENST00000599261_5_472_571_OR_ENST00000534927_5_1_129_OR_ENST00000553632_1_1_346_OR_ENST00000646871_1_1_147_OR_ENST00000549874_1_394_1505_OR_ENST00000240095_10_1469_1706_OR_ENST00000568195_5_1149_1465_OR_ENST00000358361_7_1436_1979_OR_ENST00000419130_5_45_143_OR_ENST00000565316_6_1146_1316_OR_ENST00000526157_5_559_632_OR_ENST00000419552_5_1_187_OR_ENST00000497292_1_466_822_OR_ENST00000474111_1_54_606_OR_ENST00000377031_7_513_1105_OR_ENST00000557192_1_536_594_OR_ENST00000566371_6_642_738_OR_ENST00000504338_5_46_128_OR_ENST00000514904_5_1_887_OR_ENST00000638429_1_3109_5630_OR_ENST00000633431_1_1_113_OR_ENST00000447024_6_1_69_OR_ENST00000377808_9_1_69_OR_ENST00000646454_1_1_706_OR_ENST00000417504_5_1_118_OR_ENST00000481494_2_1_107_OR_ENST00000474029_5_1_202_OR_ENST00000403197_5_583_1092_OR_ENST00000605331_3_1_68_OR_ENST00000581409_5_440_566_OR_ENST00000583912_1_257_559_OR_ENST00000563294_1_135_547_OR_ENST00000465580_3_38_138_OR_ENST00000425201_5_2010_2113_OR_ENST00000645152_1_1_55_OR_ENST00000602450_1_40_227_OR_ENST00000514798_1_1580_2893_OR_ENST00000492359_6_437_539_OR_ENST00000380338_8_5222_6770_OR_ENST00000564899_1_1_467_OR_ENST00000479069_1_325_755_OR_ENST00000495967_5_466_552_OR_ENST00000402247_5_434_663_OR_ENST00000636907_1_1348_1677_OR_ENST00000449697_7_396_813_OR_ENST00000636172_1_1416_1534_OR_ENST00000565354_5_510_757_OR_ENST00000601467_1_1_395_OR_ENST00000523987_4_324_1795_OR_ENST00000488361_1_1_48_OR_ENST00000484325_1_34_190_OR_ENST00000645208_1_1_233_OR_ENST00000478400_3_452_572_OR_ENST00000518646_5_497_556_OR_ENST00000419
973_1_1_93_OR_ENST00000495658_2_520_920_OR_ENST00000533227_5_3473_5014_OR_ENST00000648153_1_1_95_OR_ENST00000324461_9_1_119_OR_ENST00000536545_5_1061_1786_OR_ENST00000578044_6_1_623_OR_ENST00000591371_5_385_502_OR_ENST00000619286_4_1114_1445_OR_ENST00000638824_1_1_105_OR_ENST00000585283_1_157_277_OR_ENST00000468308_1_600_761_OR_ENST00000366206_2_1_402_OR_ENST00000314282_7_1_1252_OR_ENST00000580118_2_33_150_OR_ENST00000589953_1_301_662_OR_ENST00000439719_5_464_767_OR_ENST00000487364_1_403_498_OR_ENST00000627237_2_465_595_OR_ENST00000482280_1_751_942_OR_ENST00000562447_5_21_161_OR_ENST00000631495_1_27_144_OR_ENST00000533217_1_1_124_OR_ENST00000342055_9_1243_1555_OR_ENST00000596740_5_1_144_OR_ENST00000463032_5_1075_1485_OR_ENST00000486855_1_254_548_OR_ENST00000495497_1_93_847_OR_ENST00000498613_1_1_120_OR_ENST00000437813_7_512_632_OR_ENST00000592362_5_427_569_OR_ENST00000422552_5_1_87_OR_ENST00000648012_1_1202_1464_OR_ENST00000591074_1_334_551_OR_ENST00000493340_5_345_701_OR_ENST00000635234_1_30_115_OR_ENST00000546370_5_322_584_OR_ENST00000639051_1_812_934_OR_ENST00000636839_1_1693_2016_OR_ENST00000417065_5_515_563_OR_ENST00000503424_1_723_840_OR_ENST00000591498_1_1_225_OR_ENST00000647347_1_1_54_OR_ENST00000552696_1_27_121_OR_ENST00000502824_1_924_1610_OR_ENST00000427628_5_248_321_OR_ENST00000620308_1_3411_4979_OR_ENST00000631206_2_323_476_OR_ENST00000642721_1_959_1031_OR_ENST00000574594_5_44_161_OR_ENST00000636147_1_1547_3830_OR_ENST00000563874_5_2725_3079_OR_ENST00000557210_5_1_496_OR_ENST00000562651_5_1_121_OR_ENST00000467678_5_18_65_OR_ENST00000481231_6_1_51_OR_ENST00000577982_1_1_118_OR_ENST00000475165_5_1_170_OR_ENST00000597593_5_525_581_OR_ENST00000577836_1_437_550_OR_ENST00000472612_5_316_941_OR_ENST00000611387_1_1_116_OR_ENST00000546406_1_1_204_OR_ENST00000560717_5_851_1109_OR_ENST00000463204_5_1_122_OR_ENST00000596096_1_208_564_OR_ENST00000417538_6_1015_1331_OR_ENST00000392883_6_515_1177_OR_ENST00000590245_5_404_551_OR_ENST00000614034_4_1904_2599_OR_ENST00000
262432_12_1605_2180_OR_ENST00000434242_2_30_93_OR_ENST00000465726_1_53_433_OR_ENST00000593268_5_481_550_OR_ENST00000597410_1_40_160_OR_ENST00000642651_1_355_569_OR_ENST00000625245_2_834_883_OR_ENST00000568452_6_1428_1614_OR_ENST00000604846_1_115_211_OR_ENST00000428382_2_31_144_OR_ENST00000559947_1_204_842_OR_ENST00000584519_5_447_675_OR_ENST00000401977_6_1_108_OR_ENST00000637107_1_1629_1944_OR_ENST00000513443_5_667_976_OR_ENST00000528192_5_433_559_OR_ENST00000559095_1_127_425_OR_ENST00000497505_5_1431_2370_OR_ENST00000290208_11_1_532_OR_ENST00000461812_1_207_529_OR_ENST00000412438_5_49_140_OR_ENST00000454252_1_1_77_OR_ENST00000646502_1_1317_1868_OR_ENST00000639206_1_243_365_OR_ENST00000637551_2_963_1081_OR_ENST00000344304_3_1_79_OR_ENST00000558500_5_495_559_OR_ENST00000466977_1_215_449_OR_ENST00000491565_1_170_320_OR_ENST00000585208_5_807_985_OR_ENST00000474716_5_485_669_OR_ENST00000555959_1_1_154_OR_ENST00000519274_1_166_824_OR_ENST00000504113_5_526_592_OR_ENST00000644872_1_1_103_OR_ENST00000546428_2_304_618_OR_ENST00000294413_12_1355_1658_OR_ENST00000648671_1_1_66_OR_ENST00000564667_1_104_797_OR_ENST00000602397_5_1_197_OR_ENST00000591618_1_1_481_OR_ENST00000469528_5_403_464_OR_ENST00000471070_1_1_314_OR_ENST00000478349_7_1_55_OR_ENST00000247295_4_1_1241_OR_ENST00000442897_6_371_484_OR_ENST00000568422_6_1053_1166_OR_ENST00000372450_8_579_1189_OR_ENST00000346452_8_780_969_OR_ENST00000544810_5_3_202_OR_ENST00000487587_2_265_1848_OR_ENST00000506042_5_305_915_OR_ENST00000554082_1_239_407_OR_ENST00000475873_2_19_97_OR_ENST00000572974_1_228_493_OR_ENST00000534514_1_215_1140_OR_ENST00000546172_7_23_148_OR_ENST00000529150_1_1_147_OR_ENST00000645533_1_1143_1223_OR_ENST00000508325_5_1_113_OR_ENST00000632234_1_1_117_OR_ENST00000562759_1_1_406_OR_ENST00000628023_3_1019_1181_OR_ENST00000455274_5_1_66_OR_ENST00000649320_1_238_1374_OR_ENST00000463738_5_163_526_OR_ENST00000648538_1_1008_3263_OR_ENST00000632129_1_121_263_OR_ENST00000487588_5_240_480_OR_ENST00000476269_1_1_201_OR_EN
ST00000602046_1_411_535_OR_ENST00000577208_5_517_570_OR_ENST00000514069_6_34_126_OR_ENST00000591245_1_1_112_OR_ENST00000600648_1_313_420_OR_ENST00000525354_6_345_804_OR_ENST00000618977_4_1_189_OR_ENST00000340180_5_517_1436_OR_ENST00000647515_1_1_79_OR_ENST00000477989_1_84_442_OR_ENST00000643318_1_1_881_OR_ENST00000264640_9_1_57_OR_ENST00000512818_5_1_302_OR_ENST00000555911_1_1_210_OR_ENST00000552878_5_471_1050_OR_ENST00000484834_5_1_4293_OR_ENST00000645454_1_538_976_OR_ENST00000431889_6_1_65_OR_ENST00000593264_5_1_90_OR_ENST00000454842_2_1_341_OR_ENST00000645053_1_1_99_OR_ENST00000648273_1_44_185_OR_ENST00000467514_1_1_744_OR_ENST00000601138_5_463_559_OR_ENST00000220058_8_990_2732_OR_ENST00000612974_3_834_883_OR_ENST00000618223_4_258_506_OR_ENST00000452456_1_360_520_OR_ENST00000520734_5_1_138_OR_ENST00000397532_7_1_484_OR_ENST00000593687_5_260_488_OR_ENST00000530212_5_219_826_OR_ENST00000557629_5_45_177_OR_ENST00000635868_1_32_124_OR_ENST00000648904_1_207_1152_OR_ENST00000570445_5_27_144_OR_ENST00000441393_1_1_117_OR_ENST00000375140_7_1_279_OR_ENST00000479784_1_177_727_OR_ENST00000507904_5_37_122_OR_ENST00000532552_2_538_639_OR_ENST00000434453_1_546_2227_OR_ENST00000526842_5_34_136_OR_ENST00000455587_3_1_94_OR_ENST00000591485_5_412_662_OR_ENST00000518487_1_1_62_OR_ENST00000488243_1_61_124_OR_ENST00000567963_6_1249_1450_OR_ENST00000645600_1_540_603_OR_ENST00000646816_1_433_559_OR_ENST00000548404_6_1_105_OR_ENST00000468929_5_702_1166_OR_ENST00000632426_1_44_161_OR_ENST00000595703_5_283_566_OR_ENST00000297933_10_97_213_OR_ENST00000531383_5_1116_2054_OR_ENST00000561770_1_39_154_OR_ENST00000436756_5_219_622_OR_ENST00000587681_5_349_535_OR_ENST00000478572_1_1_1443_OR_ENST00000582909_1_1_75_OR_ENST00000567795_3_586_2876_OR_ENST00000454452_6_989_1301_OR_ENST00000564663_1_242_661_OR_ENST00000633394_1_11_286_OR_ENST00000496246_5_36_402_OR_ENST00000642564_1_1353_1460_OR_ENST00000601760_1_130_227_OR_ENST00000317977_10_1_511_OR_ENST00000534197_5_275_387_OR_ENST00000613539_1_1_11
8_OR_ENST00000374394_7_35_155_OR_ENST00000596237_5_267_679_OR_ENST00000631023_2_1229_1584_OR_ENST00000555425_5_1_174_OR_ENST00000350274_9_2426_3163_OR_ENST00000546875_1_20_548_OR_ENST00000598012_1_282_562_OR_ENST00000561945_2_1_130_OR_ENST00000366304_3_150_878_OR_ENST00000400122_8_1_364_OR_ENST00000458499_5_1_91_OR_ENST00000414501_2_1_341_OR_ENST00000553351_1_429_561_OR_ENST00000524883_1_11_286_OR_ENST00000521214_5_1_237_OR_ENST00000541777_6_1_1321_OR_ENST00000581655_1_362_547_OR_ENST00000491988_1_213_805_OR_ENST00000487013_1_54_606_OR_ENST00000646401_1_1_73_OR_ENST00000356442_4_1_312_OR_ENST00000598024_5_474_569_OR_ENST00000372697_7_387_1026_OR_ENST00000518275_1_344_443_OR_ENST00000304932_5_214_1049_OR_ENST00000647508_1_1_597_OR_ENST00000446170_1_35_129_OR_ENST00000466960_5_1_199_OR_ENST00000415318_2_1_279_OR_ENST00000455814_2_272_448_OR_ENST00000644970_1_24_538_OR_ENST00000568690_1_1_107_OR_ENST00000621489_2_742_1304_OR_ENST00000495825_6_375_556_OR_ENST00000645258_1_1_79_OR_ENST00000600335_5_350_558_OR_ENST00000643813_1_132_252_OR_ENST00000533026_6_699_2061_OR_ENST00000646856_1_1_73_OR_ENST00000383376_9_1246_1573_OR_ENST00000617261_4_51_124_OR_ENST00000520366_1_1_60_OR_ENST00000505634_5_1_71_OR_ENST00000460315_5_537_721_OR_ENST00000435050_1_78_4161_OR_ENST00000498128_1_417_557_OR_ENST00000614349_4_400_514_OR_ENST00000480193_5_496_561_OR_ENST00000488455_5_58_177_OR_ENST00000553632_1_347_473_OR_ENST00000493254_1_248_3032_OR_ENST00000397755_2_314_3244_OR_ENST00000643004_1_1_246_OR_ENST00000561733_5_258_835_OR_ENST00000378581_7_319_918_OR_ENST00000495566_1_246_557_OR_ENST00000467750_5_49_164_OR_ENST00000637578_1_1302_1575_OR_ENST00000541178_1_1688_1975_OR_ENST00000570727_5_48_162_OR_ENST00000550103_2_160_645_OR_ENST00000425645_2_510_602_OR_ENST00000557175_5_1356_1746_OR_ENST00000507518_5_1_62_OR_ENST00000619341_1_138_494_OR_ENST00000649581_1_652_2854_OR_ENST00000477332_5_354_846_OR_ENST00000476438_1_1_290_OR_ENST00000578112_6_1_75_OR_ENST00000551757_1_161_1149_OR_ENST
00000474290_1_416_546_OR_ENST00000568076_6_1626_1966_OR_ENST00000445187_5_1_124_OR_ENST00000628962_1_154_205_OR_ENST00000513116_1_1247_1781_OR_ENST00000619341_1_1_133_OR_ENST00000479793_5_1_120_OR_ENST00000647294_1_1_71_OR_ENST00000594133_3_1_167_OR_ENST00000438970_6_350_489_OR_ENST00000534026_5_946_1118_OR_ENST00000492077_5_231_729_OR_ENST00000505785_5_538_768_OR_ENST00000370353_3_3904_5975_OR_ENST00000645104_1_1_59_OR_ENST00000642378_1_581_1274_OR_ENST00000439578_5_45_163_OR_ENST00000519611_5_43_156_OR_ENST00000472495_5_517_701_OR_ENST00000348513_11_1_1091_OR_ENST00000505737_2_344_443_OR_ENST00000594180_1_77_451_OR_ENST00000621220_4_239_360_OR_ENST00000432564_2_3574_4766_OR_ENST00000506900_1_1_121_OR_ENST00000457428_6_469_562_OR_ENST00000637699_1_1108_1226_OR_ENST00000469334_6_1_51_OR_ENST00000637184_1_1423_1751_OR_ENST00000394826_8_45_161_OR_ENST00000572809_1_1_212_OR_ENST00000597013_5_389_575_OR_ENST00000571298_5_545_622_OR_ENST00000357542_8_1073_1385_OR_ENST00000628247_2_463_559_OR_ENST00000358028_8_986_1755_OR_ENST00000394915_7_1348_1472_OR_ENST00000636977_1_2689_2967_OR_ENST00000646242_1_1_82_OR_ENST00000553874_5_413_545_OR_ENST00000424614_1_329_662_OR_ENST00000625917_2_260_488_OR_ENST00000465896_5_435_536_OR_ENST00000647000_1_626_1028_OR_ENST00000353176_9_1_76_OR_ENST00000479770_1_1657_2243_OR_ENST00000484766_1_50_165_OR_ENST00000597969_5_394_640_OR_ENST00000507659_5_45_120_OR_ENST00000357857_14_1364_1718_OR_ENST00000466715_5_1_121";

}
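As a possible workaround while the cause is investigated, the edges could be read without pygraphviz. The sketch below is not a general DOT parser: it assumes the file contains only simple quoted edge statements like the one in test.dot, and it computes weakly connected components in plain Python instead of going through `nx_agraph.read_dot`. Whether the very long node name is what triggers the `DotError` is not established here.

```python
import re
from collections import defaultdict

# Matches simple quoted edge statements: "A" -> "B"
# Assumption: no escaped quotes, attributes, subgraphs or unquoted IDs.
EDGE_RE = re.compile(r'"([^"]*)"\s*->\s*"([^"]*)"')


def read_simple_dot_edges(text):
    """Return (source, target) pairs for every quoted edge in the text."""
    return EDGE_RE.findall(text)


def weakly_connected_components(edges):
    """Group nodes that are connected when edge direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components


dot = """digraph graphname {
"A_1_10" -> "B_1_20";
"B_1_20" -> "C_5_9";
"D_1_4" -> "E_1_8";
}"""
edges = read_simple_dot_edges(dot)
print(weakly_connected_components(edges))
```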

Very large set of transcripts grouped in one splice graph

Hi,

We are testing ChopStitch with known transcripts.

We are using Homo_sapiens.GRCh38.ensembl_94.cdna, keeping transcripts longer than 200 bp and removing redundancy at 98% identity with cd-hit-est. For the genome, we used NA12878 reads.
This is the output we get from FindSubcomponents.py:
geneMap.tsv.gz

There is a huge cluster of 7429 transcripts grouped together. I have attached the FASTA file with those sequences:
largeCluster.fasta.gz

This looks odd. Almost every ChopStitch run produces one very large cluster that groups together transcripts with very dissimilar sequences.
Any clues as to where this comes from?
Thanks,
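To see how skewed the clustering is, the gene map can be summarized by cluster size. This sketch assumes geneMap.tsv holds two tab-separated columns, transcript ID then gene ID; that layout is an assumption, not something stated in the issue.

```python
import csv
import io
from collections import Counter


def cluster_sizes(tsv_text):
    """Count transcripts per gene ID in transcript<TAB>gene rows."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed rows
            counts[row[1]] += 1
    return counts


example = "T1\tG1\nT2\tG1\nT3\tG1\nT4\tG2\nT5\tG2\n"
sizes = cluster_sizes(example)
print(sizes.most_common(1))  # [('G1', 3)]
```

For the attached file, the text could be read with `gzip.open("geneMap.tsv.gz", "rt").read()` before calling `cluster_sizes`, and `most_common(5)` would then list the largest clusters.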
