Introduction to building a small CRISPR on-target efficiency model in TensorFlow / Keras and training it on real data.
Deep learning models do not readily reveal feature importance. In CRISPR, for example, one might want to know how much the first nucleotide of the NGG PAM contributes to the efficiency of the guide. In this exercise we will look at a way around this problem: masking out parts of the input sequence, or the energy parameter, from the model input.
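The masking idea can be sketched in a few lines of NumPy. This is only one possible reading of the approach (masking at prediction time; the exercise itself may instead retrain the model on masked inputs). The helper names `one_hot`, `mask_positions` and `masking_importance`, the example sequence, and the `toy_predict` stand-in for a trained model are all assumptions made for illustration.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot array."""
    idx = np.array([NUCLEOTIDES.index(base) for base in seq])
    return np.eye(4, dtype=np.float32)[idx]


def mask_positions(x, positions):
    """Return a copy of x with all four channels zeroed at the given positions."""
    masked = x.copy()
    masked[list(positions), :] = 0.0
    return masked


def masking_importance(predict, x):
    """Drop in the prediction when each position is masked in turn.

    `predict` takes a batch of shape (n, length, 4) and returns shape (n,).
    """
    baseline = predict(x[np.newaxis])[0]
    scores = np.zeros(len(x), dtype=np.float32)
    for pos in range(len(x)):
        masked = mask_positions(x, [pos])
        scores[pos] = baseline - predict(masked[np.newaxis])[0]
    return scores


def toy_predict(batch):
    """Hypothetical stand-in for a trained model: rewards G at the last two positions."""
    g = NUCLEOTIDES.index("G")
    return batch[:, -1, g] + batch[:, -2, g]


x = one_hot("ACGTACGTAGG")  # made-up sequence ending in an AGG PAM
importance = masking_importance(toy_predict, x)
print(importance)
```

With a Keras model, `predict` could be something like `lambda b: model.predict(b, verbose=0)[:, 0]`, assuming the model takes a single one-hot input and returns one value per guide. Positions with a large score are those the model relies on most.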
In this exercise we will take a look at the actual outputs of the convolutions over the on-target sequence in the deep learning model.
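To make concrete what a convolution output is, here is a minimal NumPy sketch of a 1D convolution over a one-hot encoded sequence. It assumes 'valid' padding, stride 1 and a kernel laid out as (width, channels, filters), which is the layout Keras uses for `Conv1D` weights. The hand-made "GG" filter and the example sequence are invented for illustration and are not taken from the course model.

```python
import numpy as np


def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA string as a (len(seq), 4) one-hot array."""
    idx = np.array([alphabet.index(base) for base in seq])
    return np.eye(len(alphabet), dtype=np.float32)[idx]


def conv1d_valid(x, kernels, bias=None):
    """Pre-activation 1D convolution with 'valid' padding and stride 1.

    x:       (length, channels)
    kernels: (width, channels, filters)
    returns: (length - width + 1, filters)
    """
    width, _, n_filters = kernels.shape
    out_len = x.shape[0] - width + 1
    out = np.zeros((out_len, n_filters), dtype=np.float32)
    for i in range(out_len):
        window = x[i:i + width]  # (width, channels)
        # Sum over window position and channel, keep the filter axis.
        out[i] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    if bias is not None:
        out += bias
    return out


# Hand-made filter of width 2 that responds to G followed by G.
gg_filter = np.zeros((2, 4, 1), dtype=np.float32)
gg_filter[0, 2, 0] = 1.0  # G at the first window position
gg_filter[1, 2, 0] = 1.0  # G at the second window position

x = one_hot("ACGTAGG")
feature_map = np.maximum(conv1d_valid(x, gg_filter), 0.0)  # ReLU activation
print(feature_map[:, 0])
```

Each row of `feature_map` is the filter's response at one window position, so the largest value marks where the sequence best matches the filter. For a trained Keras model, the kernels could be read from the convolutional layer with `layer.get_weights()` and passed to the same function.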
Create a machine learning model to replace the simple model used in these exercises. The only conditions are:
- Replace the one-hot encoding with an embedding layer
- Do not use any convolutional layers
- It should be trainable in less than approximately 5 minutes
This cannot be completed in the given time by anyone but an expert, so feel free to use your favorite Python code-generating LLM. You could, for example, try a bidirectional LSTM model. Can you make a model that performs better on the provided test (validation) data than the model built in Exercise 1?
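As a possible starting point, the sketch below builds a model that meets the first two conditions: an embedding layer on integer-encoded nucleotides and a bidirectional LSTM, with no convolutional layers. The sequence length of 30, the layer sizes, the `encode` and `build_bilstm_model` names, and the omission of the energy input are all assumptions. Nothing here has been tuned or compared against the Exercise 1 model.

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 30    # assumed length of the on-target sequence, adjust to the data
VOCAB_SIZE = 4  # A, C, G, T


def encode(seq, alphabet="ACGT"):
    """Integer-encode a DNA string (A=0, C=1, G=2, T=3) for the embedding layer."""
    return np.array([alphabet.index(base) for base in seq], dtype=np.int32)


def build_bilstm_model(seq_len=SEQ_LEN, vocab_size=VOCAB_SIZE):
    """Embedding + bidirectional LSTM regression model without convolutions."""
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    x = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=8)(inputs)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32))(x)
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)  # predicted efficiency
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_bilstm_model()
model.summary()
```

Training would then be a call such as `model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=..., batch_size=...)`, where `X_train` and `X_val` are hypothetical arrays of integer-encoded sequences. Whether this fits in the 5-minute budget depends on the data size and hardware, so the number of epochs and the LSTM width may need to be reduced.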