eternagame / eternabrain
Deep learning to solve RNA design puzzles
Home Page: https://software.eternagame.org/
License: Other
# this is for only 1 randomly generated sequence
[4 2 4 4 2 2 3 1 2]
[1 2 4 4 2 2 3 1 2] [2 2 4 4 2 2 3 1 2] [3 2 4 4 2 2 3 1 2] [4 2 4 4 2 2 3 1 2]
# calculate reward
[4 2 4 4 2 2 3 1 2]
[4 1 4 4 2 2 3 1 2] [4 2 4 4 2 2 3 1 2] [4 3 4 4 2 2 3 1 2] [4 4 4 4 2 2 3 1 2]
# calculate reward
[4 2 4 4 2 2 3 1 2]
[4 2 1 4 2 2 3 1 2] [4 2 2 4 2 2 3 1 2] [4 2 3 4 2 2 3 1 2] [4 2 4 4 2 2 3 1 2]
# calculate reward
[4 2 4 4 2 2 3 1 2]
[4 2 4 1 2 2 3 1 2] [4 2 4 2 2 2 3 1 2] [4 2 4 3 2 2 3 1 2] [4 2 4 4 2 2 3 1 2]
# calculate reward
Eventually, the entire list becomes 4s.
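The mutate-and-score loop above can be sketched in Python. The `reward` function here is a hypothetical stand-in for the real folding-based score; it simply prefers 4s so that the loop reproduces the "entire list becomes 4s" behavior.

```python
def reward(sequence):
    # Hypothetical stand-in for the real reward (e.g., how closely the
    # folded structure matches the target); here it just counts 4s.
    return sum(1 for base in sequence if base == 4)

def greedy_mutate(sequence):
    # At each position, try all four base codes (1-4) and keep the
    # candidate with the highest reward (the "# calculate reward" step).
    for i in range(len(sequence)):
        candidates = [sequence[:i] + [b] + sequence[i + 1:] for b in (1, 2, 3, 4)]
        sequence = max(candidates, key=reward)
    return sequence

print(greedy_mutate([4, 2, 4, 4, 2, 2, 3, 1, 2]))  # -> [4, 4, 4, 4, 4, 4, 4, 4, 4]
```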
Sequence: GGGAUAACCU Structure: (((....))) Locks: oooxxxxooo
[0,0,1,0],[0,0,1,0],[0,0,1,0],[1,0,0,0],[1,0,0,0],[1,0,0,0],[1,0,0,0],[0,0,0,1],[0,0,0,1],[0,1,0,0]
[0,1,0,0],[0,1,0,0],[0,1,0,0],[1,0,0,0],[1,0,0,0],[1,0,0,0],[1,0,0,0],[0,0,1,0],[0,0,1,0],[0,0,1,0]
[9,0,0,0],[8,0,0,0],[7,0,0,0],[-1,0,0,0],[-1,0,0,0],[-1,0,0,0],[-1,0,0,0],[2,0,0,0],[1,0,0,0],[0,0,0,0]
[0,0,0,0],[0,0,0,0],[0,0,0,0],[1,0,0,0],[0,1,0,0],[1,0,0,0],[1,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]
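The second row (a one-hot of the dot-bracket structure) and the third row (the pairmap) can be generated like this. This is a sketch; the exact encodings EternaBrain uses for the other rows may differ.

```python
def pairmap(structure):
    # Map each index to its paired index, or -1 if unpaired,
    # matching the first entries of the third row above.
    stack, pairs = [], [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs

def one_hot_structure(structure):
    # '.' -> [1,0,0,0], '(' -> [0,1,0,0], ')' -> [0,0,1,0],
    # matching the second row above (the fourth slot is unused here).
    codes = {'.': [1, 0, 0, 0], '(': [0, 1, 0, 0], ')': [0, 0, 1, 0]}
    return [codes[ch] for ch in structure]

print(pairmap('(((....)))'))  # -> [9, 8, 7, -1, -1, -1, -1, 2, 1, 0]
```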
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
Logits and labels not of same size.
net = tflearn.input_data(shape=[None, 4716])
Shape size is incorrect for input matrix.
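Both errors come down to tensor shapes that do not line up: `softmax_cross_entropy_with_logits` requires `logits` and `labels` to have the same `[batch, num_classes]` shape, and `input_data(shape=[None, 4716])` requires every feature vector to be exactly 4716 long. A NumPy sketch of the first constraint (a simplified stand-in, not the TensorFlow implementation):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Both inputs must share the same [batch, num_classes] shape;
    # a mismatch is what triggers the size error above.
    assert logits.shape == labels.shape, "logits/labels shape mismatch"
    shifted = logits - logits.max(axis=1, keepdims=True)  # for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1, 0.1]])  # one prediction over 4 bases
labels = np.array([[1.0, 0.0, 0.0, 0.0]])  # one-hot target, same shape
print(softmax_cross_entropy(logits, labels))  # roughly [0.51]
```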
Add target energy to list of features
Hi Rohan,
Here are the getting started puzzles I was talking about: [6892343, 6892344, 6892345, 6892346, 6892347, 6892348, 7254756, 7254757, 7254758, 7254759, 7254760, 7254761]
I uploaded the problems file to the Dropbox folder - GitHub didn't like its size when I tried to commit it directly. User data will be there too. Hope everything went well with the SAT tests!
Can look at EteRNAbot rules for more strategies.
for i in pairmap: get index of last pair before -1, get index of paired base, and change bases at those indices to G-C
if num_unpaired_bases_in_a_row >= 3 then boost with G
for every paired base: if '(' has one '.' following it and its complementary ')' has one '.' preceding it, then G-G boost
for every paired base: if '(' has two '.'s following it and its complementary ')' has two '.'s preceding it, then UGUG boost
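The first rule above might be sketched like this. The helper name and the letter-based sequence are illustrative assumptions; the real EternaBot strategies are more involved.

```python
def close_loops_with_gc(sequence, structure):
    # Find each '(' that is the last pair before a run of unpaired
    # bases ('.') and set that pair and its partner to G-C.
    seq = list(sequence)
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            pairs[stack.pop()] = i
    for i, j in pairs.items():
        if structure[i + 1] == '.':  # innermost pair closing a loop
            seq[i], seq[j] = 'G', 'C'
    return ''.join(seq)

print(close_loops_with_gc('AAAAUAACUU', '(((....)))'))  # -> 'AAGAUAACUU'
```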
Resets base sequence to starting sequence each time rather than adding previous base changes to overall sequence
Use BEAR notation as additional structural feature
For comparisons, use BEAR instead of dot-bracket notation
[[sequence],[current structure],[target structure],[energy]]
structure_and_energy_at_current_time
Currently predicting only the base.
Options:
num_locations x num_bases matrix (or just 85 x 4)
Eterna uses Vienna1 for calculating folding; EternaBrain is currently using Vienna2.
After a certain number of epochs, the Keras model's loss becomes NaN and accuracy drops to 0.2698 (roughly 1/4)
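A NaN loss usually means the arithmetic inside the loss overflowed or hit log(0); once the weights are NaN, the model degenerates to chance, which over four bases is about 1/4, matching the 0.2698 accuracy. A small NumPy illustration of the overflow and the standard log-sum-exp fix (other common causes, such as a too-high learning rate, are not shown):

```python
import numpy as np

logits = np.array([1000.0, 999.0, 998.0, 997.0])  # large unnormalized outputs

with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(logits) / np.exp(logits).sum()  # exp overflows: inf/inf -> NaN

shifted = np.exp(logits - logits.max())  # log-sum-exp shift
stable = shifted / shifted.sum()         # finite, sums to 1

print(np.isnan(naive).any(), np.isnan(stable).any())  # -> True False
```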
Currently, structure_and_energy_at_current_time works with only one puzzle ID. Supporting multiple IDs would reduce the number of pickles and the amount of time spent unpickling when training.
Instead of copying and pasting structures from the Eterna website, read them from a .txt file.
Occurring only on puzzles 6892343, 7254758.
Encode location similar to bases
[0,0,0,0,0,0,0,0,0,1,0,0,0,0,0] # location 10; number of bases = 15
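A sketch of that location encoding, assuming 1-indexed locations as in the example:

```python
def one_hot_location(location, num_bases):
    # A 1 at the location's slot in a vector of length num_bases,
    # e.g. location 10 of 15 -> a 1 at the tenth slot.
    vec = [0] * num_bases
    vec[location - 1] = 1
    return vec

print(one_hot_location(10, 15))  # -> [0,0,0,0,0,0,0,0,0,1,0,0,0,0,0]
```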
For multi-state puzzles, the features should look like this:
[sequence, current_structure, target_structure, current_energy,
target_energy, reporter, A, B, C] # A, B, C are other oligos
Some puzzles have bases which cannot be mutated; this constraint might also need to be encoded as a feature.
I was wondering whether the training, validation, and test datasets for EternaBrain are publicly available. In particular, I am looking for a dataset of RNA sequences and contact maps. I don't really need the specific player moves.
Thank you for your help, and sorry if this is off topic.
Checks if base prediction already matches existing base in that location.
Example:
base_sequence = [1,1,1,1]
Prediction = [1630.97,1630.88,1630.56,1630.30]
argmax(Prediction) == 0  # index 0, i.e. base A (code 1)
location = 3
if base_sequence[location] == argmax(Prediction) + 1:  # compare base codes, not indices
    SmartPredictor()  # takes the 2nd-highest probability and uses that as the base
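Putting the check together, a minimal sketch; the `smart_predict` helper is hypothetical, and it assumes prediction indices 0-3 map to base codes 1-4.

```python
def smart_predict(prediction, base_sequence, location):
    # Rank base indices by predicted score, highest first, and return the
    # best base that differs from the one already at `location`; this
    # yields the 2nd-highest choice when the top choice is already placed.
    ranked = sorted(range(len(prediction)), key=lambda i: prediction[i], reverse=True)
    for idx in ranked:
        if idx + 1 != base_sequence[location]:  # index 0..3 -> base code 1..4
            return idx + 1
    return ranked[0] + 1  # fallback, unreachable with four candidates

print(smart_predict([1630.97, 1630.88, 1630.56, 1630.30], [1, 1, 1, 1], 3))  # -> 2
```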
Came across the recent bioRxiv posting of this work - awesome stuff! I should probably comment through the official Disqus comment thread, but whatever.
It was interesting to note how you narrowed down the input data and implemented a couple of regularization strategies to improve the accuracy of the tandem-CNN model. I'm curious whether you have experimented with different loss functions? Though the predicted probability vectors are short, some probability theory can still be applied, I think. See this for some insight on choosing a loss function. If you know, or can reason out, the distribution of the noise, a tailored loss function may improve the accuracy.
Some other off-the-shelf loss functions: https://www.tensorflow.org/api_docs/python/tf/losses.
Best of luck!