Code Monkey home page Code Monkey logo

evo's Introduction

Evo: DNA foundation modeling from molecular to genome scale

Evo

Evo is a biological foundation model capable of long-context modeling and design. Evo uses the StripedHyena architecture to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on OpenGenome, a prokaryotic whole-genome dataset containing ~300 billion tokens.

We describe Evo in the the paper “Sequence modeling and design from molecular to genome scale with Evo” and in the accompanying blog post.

We provide the following model checkpoints:

Checkpoint Name Description
evo-1-8k-base A model pretrained with 8,192 context. We use this model as the base model for molecular-scale finetuning tasks.
evo-1-131k-base A model pretrained with 131,072 context using evo-1-8k-base as the base model. We use this model to reason about and generate sequences at the genome scale.

Contents

Setup

Requirements

Evo is based on StripedHyena.

Evo uses FlashAttention-2, which may not work on all GPU architectures. Please consult the FlashAttention GitHub repository for the current list of supported GPUs.

Make sure to install the correct PyTorch version on your system.

Installation

You can install Evo using pip

pip install evo-model

or directly from the GitHub source

git clone https://github.com/evo-design/evo.git
cd evo/
pip install .

We recommend that you install the PyTorch library first, before installing all other dependencies (due to dependency issues of the flash-attn library; see, e.g., this issue).

One of our example scripts, demonstrating how to go from generating sequences with Evo to folding proteins (scripts/generation_to_folding.py), further requires the installation of prodigal. We have created an environment.yml file for this:

conda env create -f environment.yml
conda activate evo-design

Usage

Below is an example of how to download Evo and use it locally through the Python API.

from evo import Evo
import torch

device = 'cuda:0'

evo_model = Evo('evo-1-131k-base')
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()

sequence = 'ACGT'
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)
logits, _ = model(input_ids) # (batch, length, vocab)

print('Logits: ', logits)
print('Shape (batch, length, vocab): ', logits.shape)

An example of batched inference can be found in scripts/example_inference.py.

We provide an example script for how to prompt the model and sample a set of sequences given the prompt.

python -m scripts.generate \
    --model-name 'evo-1-131k-base' \
    --prompt ACGT \
    --n-samples 10 \
    --n-tokens 100 \
    --temperature 1. \
    --top-k 4 \
    --device cuda:0

We also provide an example script for using the model to score the log-likelihoods of a set of sequences.

python -m scripts.score \
    --input-fasta examples/example_seqs.fasta \
    --output-tsv scores.tsv \
    --model-name 'evo-1-131k-base' \
    --device cuda:0

HuggingFace

Evo is integrated with HuggingFace.

from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'togethercomputer/evo-1-8k-base'

model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model_config.use_cache = True

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=model_config,
    trust_remote_code=True,
)

Together API

Evo will also be soon available via an API by TogetherAI.

import openai
import os

# Fill in your API information here.
client = openai.OpenAI(
  api_key=TOGETHER_API_KEY,
  base_url='https://api.together.xyz',
)

chat_completion = client.chat.completions.create(
  messages=[
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "ACGT", # Prompt the model with a sequence.
    }
  ],
  model="togethercomputer/evo-1-131k-base",
  max_tokens=128, # Sample some number of new tokens.
  logprobs=True
)
print(
    chat_completion.choices[0].logprobs.token_logprobs,
    chat_completion.choices[0].message.content
)

Citation

Please cite the following preprint when referencing Evo.

@article {nguyen2024sequence,
	author = {Eric Nguyen and Michael Poli and Matthew G Durrant and Armin W Thomas and Brian Kang and Jeremy Sullivan and Madelena Y Ng and Ashley Lewis and Aman Patel and Aaron Lou and Stefano Ermon and Stephen A Baccus and Tina Hernandez-Boussard and Christopher Ré and Patrick D Hsu and Brian L Hie},
	title = {Sequence modeling and design from molecular to genome scale with Evo},
	year = {2024},
	doi = {10.1101/2024.02.27.582234},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/02/27/2024.02.27.582234},
	journal = {bioRxiv}
}

evo's People

Contributors

artificial-paul avatar athms avatar brianhie avatar zymrael avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.