biogpt.cpp's Introduction

About me

  • I work as a Data Scientist at AI biotech Owkin.
  • Previously, I interned at INRIA Parietal on solving neuroscience (M/EEG) inverse problems.
  • I graduated from Ecole Polytechnique and HEC Paris with a double major in data science and management.

In 2022, I co-created skglm, a fast sklearn-compatible solver for sparse generalized linear models. More recently, I've become interested in fast inference for large language models. I have implemented Bark.cpp, a port of SunoAI's Bark model in C/C++, as well as specialized models like BioGPT.cpp.

Cool open-source projects I contributed to

  • MNE-Python, a toolkit for exploring neurophysiological data in Python
  • Linfa, the leading crate for machine learning and data analysis in Rust
  • Benchopt, a benchmarking suite for optimization algorithms

Other projects I worked on

  • Encodec.cpp, Meta's neural codec model ported to C++
  • SparseGLM, a fast coordinate descent solver in Rust
  • Nanograd, a lightweight deep learning framework built around Numpy arrays
  • NarrateMate.ai, a Next.js web app for practicing listening comprehension with YouTube videos

biogpt.cpp's People

Contributors

hffqyd, pabannier

biogpt.cpp's Issues

Confused about running the main executable

Thanks for taking the time to build this! Awesome initiative.

I'm stuck. Here is what I did:

mkdir build && cd build
cmake ..
cmake --build . --config Release

I then went back to the project root, downloaded the weights into a weights folder, and ran the convert script:

python convert.py --dir-model ./weights/ --out-dir ./ggml_weights

and all is well. I get the ggml_weights folder.

This is now my directory structure:

.
├── biogpt.cpp
├── biogpt.h
├── bpe.cpp
├── bpe.h
├── build
│   ├── bin
│   ├── CMakeCache.txt
│   ├── CMakeFiles
│   ├── cmake_install.cmake
│   ├── compile_commands.json
│   ├── examples
│   ├── ggml
│   └── Makefile
├── CMakeLists.txt
├── convert.py
├── data
│   ├── nonbreaking_prefixes
│   └── perluniprops
├── examples
│   ├── CMakeLists.txt
│   ├── main
│   └── quantize
├── ggml
│   ├── build.zig
│   ├── ci
│   ├── cmake
│   ├── CMakeLists.txt
│   ├── examples
│   ├── ggml.pc.in
│   ├── include
│   ├── LICENSE
│   ├── README.md
│   ├── requirements.txt
│   ├── scripts
│   ├── src
│   └── tests
├── ggml_weights
│   └── ggml-model.bin
├── mosestokenizer.cpp
├── mosestokenizer.h
├── README.md
└── weights
    ├── config.json
    ├── merges.txt
    ├── pytorch_model.bin
    ├── README.md
    └── vocab.json

Then I go to

cd build/bin
./main -p "trastuzumab"
terminate called after throwing an instance of 'std::runtime_error'
  what():  Perl Uniprops file not available.
fish: Job 1, './main -p "trastuzumab"' terminated by signal SIGABRT (Abort)

For some reason the executable aborts and reports that the Perl Uniprops file is not available, even though the perluniprops folder is present in the repository's data folder.

What am I doing wrong?
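
One possible cause, offered as an assumption since nothing in the thread confirms it: if the tokenizer opens data/perluniprops through a path relative to the current working directory, then launching ./main from build/bin would look for build/bin/data/perluniprops, which does not exist in the tree above. A minimal C++17 sketch of a lookup that walks up the parent directories instead. The name `find_upwards` is hypothetical and not part of biogpt.cpp, and the project itself is built with -std=c++11, so this would need a newer standard flag:

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not from biogpt.cpp): starting at `start`, walk up
// through the parent directories until `relative` exists underneath one of
// them. Returns the full path that was found, or an empty path otherwise.
fs::path find_upwards(const fs::path& start, const fs::path& relative) {
    assert(relative.is_relative());  // the target must be a relative path

    fs::path dir = fs::absolute(start);
    while (true) {
        const fs::path candidate = dir / relative;
        std::error_code ec;  // non-throwing overload: errors count as "not found"
        if (fs::exists(candidate, ec)) {
            return candidate;
        }
        const fs::path parent = dir.parent_path();
        if (parent == dir) {
            return fs::path();  // reached the filesystem root
        }
        dir = parent;
    }
}
```

Under the same assumption, running the binary from the project root (./build/bin/main -p "trastuzumab") would be a quicker check than changing any code.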

runtime_error: Perl Uniprops file not available.

Thanks for your great tool.

I've compiled biogpt and converted the model to ggml successfully, but I cannot run it. Running ./bin/biogpt -m path/to/model, or even just ./bin/biogpt -h, throws an error: libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Perl Uniprops file not available.

I checked that the perluniprops folder is in the data directory.

I am on macOS 10.15 with an Intel CPU. What should I do to fix this?

Thanks for your help.

compile failed

compiling using:

make CC=gcc-11 CPP=g++-11 CXX=g++-11 LD=g++-1

The build did not create a biogpt executable; it only created the file main.

log info:

I biogpt.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC:
I CXX:

gcc-11 -I. -O3 -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c mosestokenizer.cpp -o mosestokenizer.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c bpe.cpp -o bpe.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c biogpt.cpp -o biogpt.o
biogpt.cpp: In function ‘bool biogpt_model_load(const string&, biogpt_model&, biogpt_vocab&, uint8_t)’:
biogpt.cpp:210:13: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
210 | .mem_size = ctx_size,
| ^
biogpt.cpp:211:13: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
211 | .mem_buffer = NULL,
| ^
biogpt.cpp:212:13: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
212 | .no_alloc = false,
| ^
biogpt.cpp:364:89: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
364 | fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
365 | func, name.data(), tensor->ne[0], tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
biogpt.cpp:364:95: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 6 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
364 | fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
365 | func, name.data(), tensor->ne[0], tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
biogpt.cpp: In function ‘bool biogpt_eval(const biogpt_model&, int, int, const std::vector&, std::vector&, size_t&)’:
biogpt.cpp:596:9: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
596 | .mem_size = buf_size,
| ^
biogpt.cpp:597:9: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
597 | .mem_buffer = buf,
| ^
biogpt.cpp:598:9: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
598 | .no_alloc = false,
| ^
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c main.cpp -o main.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native main.o biogpt.o mosestokenizer.o bpe.o ggml.o -o main
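
The log above ends with a successful link step, so the warnings are not what stops a biogpt file from appearing. For anyone who wants to silence them anyway, here is a rough C++11 sketch of the two usual workarounds. The struct `init_params` is a hypothetical stand-in: only its three field names come from the log, the field types are assumed, and `make_params` and `format_shape` are illustrative names that do not exist in biogpt.cpp.

```cpp
#include <cassert>
#include <cinttypes>  // PRId64
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the struct initialized at biogpt.cpp:210-212.
// Field names are taken from the log; the types are assumptions.
struct init_params {
    std::size_t mem_size;
    void*       mem_buffer;
    bool        no_alloc;
};

// Designated initializers (.mem_size = ...) are only standard from C++20.
// Under -std=c++11, value-initialize and assign each member by name instead.
init_params make_params(std::size_t ctx_size) {
    init_params params = init_params();
    params.mem_size   = ctx_size;
    params.mem_buffer = nullptr;
    params.no_alloc   = false;
    return params;
}

// "%lld" only matches int64_t where int64_t is long long. PRId64 expands to
// the correct conversion specifier for int64_t on the platform being built.
std::string format_shape(std::int64_t ne0, std::int64_t ne1) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof(buf),
                                "[%" PRId64 ", %" PRId64 "]", ne0, ne1);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    (void)n;  // avoid an unused-variable warning when NDEBUG is defined
    return std::string(buf);
}
```

Casting each argument to long long and keeping "%lld" would also work; PRId64 just avoids the cast.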

No biogpt executable after running make

Hi Pierre,

Awesome idea to do this project. I am trying to get it running, but there is no biogpt executable being generated after running make; there is just the main executable. If I try to run ./main -p "trastuzumab" I get an error:

libc++abi: terminating with uncaught exception of type char const*
zsh: abort ./main -p "trastuzumab"

Was wondering what the right way to run this is.

Thanks!
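
The message "uncaught exception of type char const*" means that something threw a C string (as in `throw "some text";`) rather than an object derived from std::exception, which is why the runtime prints no explanation. A small sketch of how such a throw can be caught so that its text becomes visible. The helper `describe_failure` is hypothetical, and where the throw originates in this project is not something the thread establishes:

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs `fn` and returns the message of whatever it
// throws, or an empty string if it returns normally.
std::string describe_failure(const std::function<void()>& fn) {
    try {
        fn();
    } catch (const std::exception& e) {
        return e.what();  // std::runtime_error and friends
    } catch (const char* msg) {
        return msg ? msg : "(null)";  // a thrown string literal lands here
    } catch (...) {
        return "unknown exception type";
    }
    return "";
}

// Usage check: a thrown string literal is reported by its text.
void self_check() {
    assert(describe_failure([] { throw "boom"; }) == "boom");
}
```

Wrapping the body of main in handlers like these and printing the returned text to stderr would show the actual message instead of a bare abort.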
