mas,mslehre

Sequence simulation

Simulate S sequences of length L over the alphabet {A,C,G,T} and store them in a file.

Possible distributions:

iid
master + mutations

The simulation should take place in a separate program with its own main-function.
The sequences are stored in multi-FASTA format.

sAt1 final opponent music

compose music that introduces the end fight

coding style in State.cc

white space at line end:

MAS/src/alignment/State.cc

Line 46 in f441ea5

}

also: indentation in constructors

sHt2 linear regression with random parameters

An algorithm that computes new coordinates for nodes on their horizontal axis. Design a criterion for the placement such that the game position is clear, e.g. long near-horizontal edges are avoided. The criterion could be force-directed, e.g. chosen edges pull their connected nodes together and nodes push each other away. A possible solution is using Graphviz, possibly as a library.

Main parts:

research existing tools/libraries or own solutions
decide which of the researched methods/libraries to use
implement this idea successfully in a raw form (e.g. with dot, as described below the checkboxes)
if not done already, check consistency with the doxygen and coding conventions

suggestion:

dot: conversion of graph and output coordinates

sAt3 final opponent scene

the arena for the end fight

sP UnitTest graph construction_edges

sHt1 learn about PyTorch learning framework/library

sDt4 SimplifyGraphRenderer

Rewrite graph renderer and simplify its usage.
3 methods:

update(float delta), where delta is the time in seconds elapsed since the last update call
handleInput(sf::Event event), which is called inside the event polling loop for each polled event
render(sf::RenderWindow& window), which renders the graph in its current state

Alignment state + actions

Suppose the set E of edges in the graph defined in #3 is known (it corresponds to the set of possible k-mer alignments between subsequent sequences).

Let a state S denote the currently selected, consistent subset of chosen edges. Write a class that encapsulates such a state. Add a "possibleActions" method that outputs a list of valid moves given the current state.

sDt1 fix segmentation faults

sJt5 add softmax

Add softmax and argmax as options for LearnedPolicy.
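For reference, a small sketch of the two operations on a plain vector of action scores. How they would be wired into LearnedPolicy is not shown; the function names and the use of `std::vector<double>` are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Turns arbitrary scores into probabilities that sum to 1. Subtracting the
// maximum first avoids overflow in exp() for large scores.
std::vector<double> softmax(const std::vector<double>& scores) {
    std::vector<double> probs(scores.size());
    if (scores.empty()) {
        return probs;
    }
    const double maxScore = *std::max_element(scores.begin(), scores.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        probs[i] = std::exp(scores[i] - maxScore);
        sum += probs[i];
    }
    for (std::size_t i = 0; i < probs.size(); ++i) {
        probs[i] /= sum;
    }
    return probs;
}

// Index of the largest score (the first one on ties, 0 for an empty input).
std::size_t argmax(const std::vector<double>& scores) {
    return static_cast<std::size_t>(std::distance(
        scores.begin(), std::max_element(scores.begin(), scores.end())));
}
```

With softmax the policy can sample an action from the resulting probabilities, whereas argmax always picks the highest-scoring action deterministically.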

sN UnitTest consistency check

step: 2

sDt4 fix graph view (scaling for big graphs)

The scaling for the graph when it's drawn should depend on the number of sequences (y dimension) and number of k-mers per sequence (x dimension).

You can use sf::View for this purpose: https://www.sfml-dev.org/tutorials/2.5/graphics-view.php

sAt2 score connected components based on their size

sLt4 calculate and display current score

Add the score to GraphRenderer.cc and update it whenever a new edge is set.

sF animated transition between placements

Given two placements (coordinates) of all nodes in a graph, make a smooth animation (with acceleration and slowing down) to go from the current one to the new one. This will e.g. be used when the player adds an edge to her edge choice, then a new placement (sE) is chosen and the transition should be animated rather than instantaneous.
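One possible way to get the acceleration and slowing down is a smoothstep easing function applied to a linear interpolation. The sketch below is independent of SFML; the `Point` struct and the function names are assumptions, and both placements are assumed to list the same nodes in the same order.

```cpp
#include <cstddef>
#include <vector>

struct Point {
    float x;
    float y;
};

// Smoothstep easing: maps progress t in [0,1] to [0,1] with zero slope at
// both ends, so the motion starts slowly, speeds up, and slows down again.
float ease(float t) {
    if (t <= 0.0f) return 0.0f;
    if (t >= 1.0f) return 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

// Placement at progress t between the placements `from` and `to`.
// Assumes from.size() == to.size().
std::vector<Point> interpolate(const std::vector<Point>& from,
                               const std::vector<Point>& to, float t) {
    const float s = ease(t);
    std::vector<Point> current(from.size());
    for (std::size_t i = 0; i < from.size(); ++i) {
        current[i].x = from[i].x + s * (to[i].x - from[i].x);
        current[i].y = from[i].y + s * (to[i].y - from[i].y);
    }
    return current;
}
```

In a render loop, t would be the elapsed animation time divided by the total animation duration.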

step 4

sM UnitTest scoring

Write a test for the scoring function.

sBt1 Node property map

In addition to the graph, create "property maps" (inspired by boost: https://www.boost.org/doc/libs/1_55_0/libs/graph/doc/using_property_maps.html) that map node indices to certain properties (2D coordinates and sf::Color values).

sJt3 monitoring loss

Compute the loss on a batch or on training data (set of pairs of x,y). The loss should be output from time to time during the optimization, so one can monitor the progress of the optimization.

Fix documentation

The documentation (https://mslehre.github.io/MAS/) still needs some work here and there. I will assign you to this issue, if your code is related.

Please keep doxygen comments in header files only.
Please do not "overdocument". Class- and public member/method descriptions are important for others as well as comments for function arguments if they are not self-explanatory. Typically you don't include private member variables or local variables in your doxygen documentation.
Test code for your features typically does not need doxygen comments.

CPU hogging

When the game is idle, it requires ~50% of CPU time on my laptop. Maybe reconfigure the wait in the main event loop?

random number generation

I just tested the sequence simulation code (at home under Windows). It produces the same sequence every time I run the program. Also, the sequences are not mutated at all. I suspect that this is a seed initialization problem. Do you get different sequences for each execution of the program on your system? Do mutations work?

I researched a bit and found that std::random_device is deterministic under MinGW/Windows but non-deterministic under Linux, which is an ugly thing, as we want the code to be portable and work the same on every system.

As a solution I would try to not use std::random_device and set the seed using the current system time instead:

#include <chrono>
#include <random>

std::mt19937 rng(std::chrono::high_resolution_clock::now().time_since_epoch().count());

(and remove the line with the random device)

The issue with the mutations not working might be unrelated to this. Since you generate random doubles there, you should use std::uniform_real_distribution
(https://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution) for the random mutation probability, not the random device as you currently do.
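Put together, the two suggestions could look roughly like this. The helper names `makeRng` and `mutationOccurs` are hypothetical and only serve the illustration.

```cpp
#include <chrono>
#include <random>

// Creates a Mersenne Twister seeded from the current clock value instead of
// std::random_device, so that different runs get different seeds.
std::mt19937 makeRng() {
    const auto ticks =
        std::chrono::high_resolution_clock::now().time_since_epoch().count();
    return std::mt19937(static_cast<std::mt19937::result_type>(ticks));
}

// Decides whether a mutation happens: draws a double uniformly from [0,1)
// and compares it with the mutation probability.
bool mutationOccurs(std::mt19937& rng, double mutationProb) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(rng) < mutationProb;
}
```

For reproducible test runs it can still be useful to construct `std::mt19937` from a fixed seed instead of calling `makeRng`.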

sJt2 stochastic gradient descent

Write the code that iterates over batches of training data x and corresponding outputs y. In the example code mnist.cpp this is the for loop over the epochs and it calls optimizer.step().

sDt3 Add edges to graph

sDt5 Tweaks on GraphRenderer and export of some functionality

Move functionality out of GraphRenderer:

- State
- separate event handler
- implement DrawNode; in particular, render the window in terms of the current DrawNodes

sCt3 agent efficiency changes

Refactoring of Agent according to comments in sH branch pull request.
Efficiency changes to getEpisode, setting LearnedPolicy as default and changing LearnedPolicy accordingly.

sL GUI

Figure out how to poll mouse-events with SFML, create a simple GUI with buttons like "load game" or "close game" that perform an action when clicked.

Graph construction

Write a class that:

reads multiple fasta files
splits the sequences into k-mers
computes exact matches between k-mers of subsequent sequences
eventually builds a graph data structure whose nodes are all k-mers with at least one match and whose edges each represent an exact match of two k-mers
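The splitting and matching steps can be sketched as follows. Reading the FASTA files and building the final graph class are left out. The sketch assumes non-overlapping k-mers (a trailing remainder shorter than k is dropped) and matches only between sequence s and sequence s+1; both choices, as well as the names `splitIntoKmers`, `Match`, and `exactMatches`, are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a sequence into consecutive, non-overlapping k-mers.
std::vector<std::string> splitIntoKmers(const std::string& seq, std::size_t k) {
    std::vector<std::string> kmers;
    if (k == 0) {
        return kmers;
    }
    for (std::size_t pos = 0; pos + k <= seq.size(); pos += k) {
        kmers.push_back(seq.substr(pos, k));
    }
    return kmers;
}

// k-mer number `left` of sequence `seq` equals k-mer number `right` of
// sequence `seq + 1`.
struct Match {
    std::size_t seq;
    std::size_t left;
    std::size_t right;
};

// All exact k-mer matches between each pair of subsequent sequences.
std::vector<Match> exactMatches(const std::vector<std::string>& seqs,
                                std::size_t k) {
    std::vector<Match> matches;
    for (std::size_t s = 0; s + 1 < seqs.size(); ++s) {
        const std::vector<std::string> a = splitIntoKmers(seqs[s], k);
        const std::vector<std::string> b = splitIntoKmers(seqs[s + 1], k);
        for (std::size_t i = 0; i < a.size(); ++i) {
            for (std::size_t j = 0; j < b.size(); ++j) {
                if (a[i] == b[j]) {
                    matches.push_back(Match{s, i, j});
                }
            }
        }
    }
    return matches;
}
```

Each `Match` corresponds to one edge; the nodes of the graph are then the k-mers that occur in at least one match.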

More precisely:
Let for denote the number of k-mers with at least one match in string j.

Consider the graph such that and .

Visualization

Use the SFML-Library to write code that:

opens a window
renders a graph as described in #3 (make sure interfaces match)

You may want to start with an empty window, proceed with a single rectangle that optionally contains a string and then end up in plotting the whole k-mer graph.

This issue will define the main-function for the core program.

sAt1 find connected components

sC RL: simple agent

Create a class Agent with a member Policy. Implement a simple random policy, that makes valid moves (=adding an edge) until no more admissible edge remains.
The agent can utilize an instance of the "State" class already implemented.
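The interplay of agent, policy, and state could look roughly like the sketch below. The `State` class shown here is a simplified, hypothetical stand-in for the project's State class: it only handles edges between a single pair of sequences and treats two edges as consistent if they neither cross nor share a node. `RandomPolicy::choose`, `Agent::play`, and `State::select` are likewise assumed names.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical edge between position leftPos of one sequence and position
// rightPos of the next sequence.
struct Edge {
    int leftPos;
    int rightPos;
};

// Simplified stand-in for the State class.
class State {
public:
    explicit State(std::vector<Edge> allEdges)
        : edges_(std::move(allEdges)), selected_(edges_.size(), false) {}

    // Indices of unselected edges that are consistent with every selected edge.
    std::vector<std::size_t> possibleActions() const {
        std::vector<std::size_t> actions;
        for (std::size_t i = 0; i < edges_.size(); ++i) {
            if (selected_[i]) continue;
            bool ok = true;
            for (std::size_t k = 0; k < edges_.size() && ok; ++k) {
                if (selected_[k] && !consistent(edges_[i], edges_[k])) ok = false;
            }
            if (ok) actions.push_back(i);
        }
        return actions;
    }

    void select(std::size_t i) { selected_[i] = true; }

private:
    static bool consistent(const Edge& e, const Edge& f) {
        return (e.leftPos < f.leftPos && e.rightPos < f.rightPos) ||
               (f.leftPos < e.leftPos && f.rightPos < e.rightPos);
    }
    std::vector<Edge> edges_;
    std::vector<bool> selected_;
};

// Picks one of the valid actions uniformly at random.
struct RandomPolicy {
    std::mt19937 rng;
    std::size_t choose(const std::vector<std::size_t>& actions) {
        std::uniform_int_distribution<std::size_t> pick(0, actions.size() - 1);
        return actions[pick(rng)];
    }
};

class Agent {
public:
    explicit Agent(RandomPolicy policy) : policy_(std::move(policy)) {}

    // Adds edges until no admissible edge remains; returns the chosen indices.
    std::vector<std::size_t> play(State& state) {
        std::vector<std::size_t> moves;
        for (std::vector<std::size_t> actions = state.possibleActions();
             !actions.empty(); actions = state.possibleActions()) {
            const std::size_t a = policy_.choose(actions);
            state.select(a);
            moves.push_back(a);
        }
        return moves;
    }

private:
    RandomPolicy policy_;
};
```

Keeping the policy as a separate member makes it possible to swap the random policy for a learned one later without touching the episode loop.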

step: 2

sJt1 loss and gradient

Define the sum-of-squares loss we use for regression:
the sum of squares of the differences between predicted and sampled values.
Implement the part that tells PyTorch to run the backward pass, i.e. to compute the gradient of the loss with respect to the parameters.
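Written out for a linear model, the loss and the gradient that the backward pass has to produce are as follows. The linear form of the prediction and the symbols $w$, $b$, $x_i$, $y_i$, $n$ are assumptions for illustration; the issue itself does not fix the model.

```latex
\begin{align*}
L(w, b) &= \sum_{i=1}^{n} \bigl(\hat{y}_i - y_i\bigr)^2,
  \qquad \hat{y}_i = w^\top x_i + b, \\
\nabla_w L &= 2 \sum_{i=1}^{n} \bigl(\hat{y}_i - y_i\bigr)\, x_i, \\
\frac{\partial L}{\partial b} &= 2 \sum_{i=1}^{n} \bigl(\hat{y}_i - y_i\bigr).
\end{align*}
```

In PyTorch these derivatives are not coded by hand: calling backward() on the loss tensor lets autograd compute them for all parameters that require gradients.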

sLt2 - Gamemaster

We need a class Gamemaster. This class creates a Graph, a State and a DrawNodes vector for the game.

Latex in issue test

sLt3 - choose parameter

Choose new parameters for the game.

sAt2 final opponent graphics

3d model of final monster

compiler warnings "comparison between signed and unsigned integer expressions"

MAS/src/visualization/GraphRenderer.cpp

Line 175 in d8bb90d

while(rects.size()!=i+1) {

Please fix the warnings from make visualization, e.g. with a cast.
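A sketch of the cast-based fix on a stand-in loop. `growTo` and the element type `int` are hypothetical and do not reproduce the code in GraphRenderer.cpp; the sketch also uses `<` instead of `!=` so that the loop terminates even if the vector is already larger, and it assumes i is not negative.

```cpp
#include <cstddef>
#include <vector>

// Appends elements until the vector holds at least i + 1 entries.
void growTo(std::vector<int>& rects, int i) {
    // rects.size() is an unsigned std::size_t. Casting the signed index makes
    // both operands unsigned, so the comparison no longer triggers the warning.
    while (rects.size() < static_cast<std::size_t>(i) + 1) {
        rects.push_back(0);
    }
}
```

An alternative that avoids the cast altogether is to declare the index as std::size_t in the first place.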

sF smooth animation of replacement

Given two placements (coordinates) of all nodes in a graph, make a smooth animation (with acceleration and slowing down) to go from the current one to the new one. This will e.g. be used when the player adds an edge to her edge choice, then a new placement (sE) is chosen and the transition should be animated rather than instantaneous.

step:

sLt1 finish Button class

Finish the Button class with a function handler.
"The button class could have a general "setFunction" method that assigns a method that is invoked on button click."

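One way to realize such a "setFunction" method is to store a std::function callback. The sketch below leaves out all SFML drawing; the rectangle-based hit test and the `handleClick` method are assumptions.

```cpp
#include <functional>
#include <utility>

class Button {
public:
    Button(float x, float y, float width, float height)
        : x_(x), y_(y), width_(width), height_(height) {}

    // Assigns the function that is invoked when the button is clicked.
    void setFunction(std::function<void()> callback) {
        onClick_ = std::move(callback);
    }

    // True if the point (px, py) lies inside the button rectangle.
    bool contains(float px, float py) const {
        return px >= x_ && px <= x_ + width_ && py >= y_ && py <= y_ + height_;
    }

    // Invokes the assigned function if the click hits the button.
    // Returns whether the click was inside the button.
    bool handleClick(float px, float py) {
        if (!contains(px, py)) {
            return false;
        }
        if (onClick_) {
            onClick_();
        }
        return true;
    }

private:
    float x_;
    float y_;
    float width_;
    float height_;
    std::function<void()> onClick_;
};
```

Because std::function accepts lambdas, free functions, and bound member functions alike, the same Button class can serve "load game", "close game", and any later actions.
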
bug in edge consistency

The boundary case is wrong: no two selected edges may share a node. In other words: each equivalence class of characters aligned with each other can contain at most one character from each sequence.

MAS/src/alignment/State.cc

Line 33 in f441ea5

    
           if((edges[left].first.j<edges[i].first.j && edges[left].second.j>edges[i].second.j)

This is a matter of changing > to >= or likewise.
Also: please make the consistency check for a pair of edges a separate function, e.g.
bool consistent(Edge &e, Edge &f);
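A possible shape for such a function is sketched below. The `Node` and `Edge` layouts are guessed from the member accesses in the quoted line (`first.j`, `second.j`); the sequence index `i` is an assumption. The sketch only compares edges between the same pair of sequences and returns true otherwise, so it does not replace a global consistency check across several sequences.

```cpp
// Hypothetical layout: i is the sequence index, j the position within it.
struct Node {
    int i;
    int j;
};

struct Edge {
    Node first;
    Node second;
};

// Two edges between the same pair of sequences are consistent if one lies
// strictly before the other on both sides. The strict comparisons reject
// crossing edges as well as edges that share a node.
bool consistent(const Edge& e, const Edge& f) {
    if (e.first.i != f.first.i || e.second.i != f.second.i) {
        return true;  // different sequence pairs are not checked here
    }
    const bool eBeforeF = e.first.j < f.first.j && e.second.j < f.second.j;
    const bool fBeforeE = f.first.j < e.first.j && f.second.j < e.second.j;
    return eBeforeF || fBeforeE;
}
```

Taking the edges by const reference signals that the check does not modify them and lets it be called on temporaries.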

coding style in StaticGraph/main.cpp

Please

correct the style, e.g. the indentation here:

MAS/src/visualization/StaticGraph/main.cpp

Line 12 in f441ea5

// Get Graph Infos
put spaces after if (https://github.com/mslehre/MAS/wiki/Coding-Conventions); the spacing is inconsistent within the file itself.
check the arguments: the program crashes if argv[2] is not present.
Please do such housekeeping before opening a pull request.

Otherwise, works well already!

sP UnitTest graph construction_nodes

sLt5 - layout improvement

If we start a game, the graph is not displayed correctly. We must increase the line index by one.

k-mer color palette

Define a mapping k-mer -> color such that the selected colors are as distinguishable as possible to the user's eye.

You may want to start by looking at the SFML Color class: https://www.sfml-dev.org/documentation/2.5.1/classsf_1_1Color.php
Then think about how to map an arbitrary k-mer to a triple (r,g,b).
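One simple candidate mapping, given here as a sketch: read the k-mer as a base-4 number and spread the 4^k possible values evenly over the hue circle at full saturation and brightness. This is only a heuristic, since neighboring hues become hard to tell apart for larger k. The `Rgb` struct and all function names are assumptions, and characters other than A, C, G, T are treated like A.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Reads the k-mer as a base-4 number with A=0, C=1, G=2, T=3.
// Intended for k <= 31 so that the index fits into 64 bits.
unsigned long long kmerIndex(const std::string& kmer) {
    unsigned long long index = 0;
    for (char c : kmer) {
        unsigned digit = 0;  // 'A' and any unexpected character
        if (c == 'C') digit = 1;
        else if (c == 'G') digit = 2;
        else if (c == 'T') digit = 3;
        index = index * 4 + digit;
    }
    return index;
}

// Converts a hue in [0, 360) to RGB at full saturation and brightness.
Rgb hueToRgb(double hue) {
    const double h = hue / 60.0;
    const double x = 1.0 - std::fabs(std::fmod(h, 2.0) - 1.0);
    double r = 0.0, g = 0.0, b = 0.0;
    switch (static_cast<int>(h)) {
        case 0: r = 1.0; g = x; break;
        case 1: r = x; g = 1.0; break;
        case 2: g = 1.0; b = x; break;
        case 3: g = x; b = 1.0; break;
        case 4: r = x; b = 1.0; break;
        default: r = 1.0; b = x; break;
    }
    return Rgb{static_cast<std::uint8_t>(std::lround(r * 255.0)),
               static_cast<std::uint8_t>(std::lround(g * 255.0)),
               static_cast<std::uint8_t>(std::lround(b * 255.0))};
}

// Maps a k-mer to a color by spacing all 4^k k-mers evenly on the hue circle.
Rgb kmerColor(const std::string& kmer) {
    const double numKmers = std::pow(4.0, static_cast<double>(kmer.size()));
    return hueToRgb(360.0 * static_cast<double>(kmerIndex(kmer)) / numKmers);
}
```

The resulting triple can be passed to the sf::Color constructor that takes red, green, and blue components.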

sK RL: learning of v-values

Create a training set for the ML method from sJ:

sample trajectories from the current policy.
create a training set and learn new parameters of ML (sJ)
learning tuples (s,r), where r is the cumulative reward (paid only at end) and s is any state on the policy episode (rollout).
goto 1.

step: 5

sJt4 training performance test

Test the performance of the training algorithm to achieve its objective: minimization of loss.
For the case of linear regression this could be done by comparing to the closed-form solution or by reporting the maximum absolute value of the gradient after the training loop has finished.
