ridgerchu / spikegpt

Implementation of "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks"

License: BSD 2-Clause "Simplified" License

Cuda 49.15% C++ 0.13% Python 50.72%

spikegpt's Introduction

Hello. I'm Ridger (Rui-Jie) Zhu.

👀 Interested in spiking neural networks and natural language processing
🌱 Currently a first-year Ph.D. student at UC Santa Cruz, supervised by Prof. Jason Eshraghian
📫 How to reach me [email protected]

spikegpt's Issues

SynOps Calculation

Hi @ridgerchu, first of all, congratulations on your work; it is amazing. I would like to know exactly how you calculate the SynOps numbers reported in your paper, as I do not get the same results. I look forward to hearing from you.

TypeError: object of type 'NoneType' has no len()

I get this error at the end of training, when the training progress bar reaches 100%:

TypeError: object of type 'NoneType' has no len()

Full error message

Traceback (most recent call last):
  File "path/pycharm-community-2021.3.3/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "path/pycharm-community-2021.3.3/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "path/PycharmProjects/LLM/train.py", line 137, in <module>
    trainer.train()
  File "path/PycharmProjects/LLM/src/trainer.py", line 183, in train
    run_epoch('valid')
  File "path/PycharmProjects/LLM/src/trainer.py", line 116, in run_epoch
    num_steps = len(loader)
  File "path/python3_venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 489, in __len__
    return len(self._index_sampler)
  File "path/python3_venv/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 265, in __len__
    return (len(self.sampler) + self.batch_size - 1) // self.batch_size  # type: ignore[arg-type]
  File "path/python3_venv/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 79, in __len__
    return len(self.data_source)
TypeError: object of type 'NoneType' has no len()
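Reading the traceback, the failure happens when `run_epoch('valid')` calls `len(loader)` and the loader's underlying data source is `None`, which is consistent with training being run without a validation dataset. The sketch below is not the repository's code; `count_steps` and `run_epoch` are hypothetical stand-ins that only illustrate one possible guard, using the same ceiling division that appears in the traceback.

```python
from typing import Optional, Sequence


def count_steps(dataset: Optional[Sequence], batch_size: int) -> int:
    """Return the number of batches, or 0 when no dataset was supplied."""
    if dataset is None:
        # Hypothetical guard: a missing validation set yields zero steps
        # instead of raising "object of type 'NoneType' has no len()".
        return 0
    # Same ceiling division as the batch sampler line in the traceback.
    return (len(dataset) + batch_size - 1) // batch_size


def run_epoch(split: str, dataset: Optional[Sequence], batch_size: int) -> str:
    """Report how many steps a split would run, skipping empty splits."""
    steps = count_steps(dataset, batch_size)
    if steps == 0:
        return f"{split}: skipped (no data)"
    return f"{split}: {steps} steps"


if __name__ == "__main__":
    print(run_epoch("train", list(range(10)), batch_size=4))
    print(run_epoch("valid", None, batch_size=4))
```

If this diagnosis applies, the alternative is to pass an actual validation dataset to the trainer rather than skipping the validation pass.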

RuntimeError: Error building extension 'wkv'

Is there a batch inference file?

I wonder whether there is any batch inference code.
It seems that I can only input one context at a time.
It would be nice if you could provide a file like batch_run.py or something similar. Thanks!
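Until such a file exists, a simple workaround is to loop over the contexts and call the single-context path once per prompt. The sketch below assumes a hypothetical callable `generate_one` that maps one prompt string to one completion; it is not part of the repository, and this loop runs prompts sequentially rather than as a true batched forward pass.

```python
from typing import Callable, Iterable, List


def batch_run(prompts: Iterable[str], generate_one: Callable[[str], str]) -> List[str]:
    """Apply a single-context generation function to each prompt, in order."""
    results = []
    for prompt in prompts:
        results.append(generate_one(prompt))
    return results


if __name__ == "__main__":
    # Stand-in generator for illustration only; replace it with a wrapper
    # around whatever single-context inference call you already use.
    def fake_generate(prompt: str) -> str:
        return prompt + " ..."

    for completion in batch_run(["Hello", "Spiking networks are"], fake_generate):
        print(completion)
```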

Dataset

How was the dataset made?

The model has been talking nonsense all along

Training setup

Hi, I'm attempting to replicate the training runs on all the different datasets. Could you provide some insight into the configuration that you used to train on each of the three datasets mentioned in the paper?

Thanks in advance!

Using [Vit] with SpikeGPT model

Hello everyone,
It is fantastic to see the great work you have done.
Is it possible to use a ViT model (or any other vision model) for image understanding together with SpikeGPT in a sequence-to-sequence image captioning task?

Linking paper and code

Hi, I have a couple of questions about the paper and how it links to the code here.

Do you have any materials on how you derived Eq. 10 in the paper from Eq. 4?
I'm also a little unclear on how the CUDA function "kernel_forward" in wkv_cuda.cu implements Eq. 10. Could you provide some pointers on that, please?

Thanks!
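As a starting point for the second question, the following single-channel Python sketch assumes that `kernel_forward` follows the RWKV-style WKV recurrence: an exponentially decayed weighted average of past values, computed in one pass with a running maximum exponent for numerical stability. That assumption, the symbol names `w` (decay, negative), `u` (bonus for the current token), `k`, and `v`, and any correspondence to Eq. 10 are not confirmed in this thread. The naive function evaluates the sum directly so the two can be compared.

```python
import math


def wkv_naive(w, u, k, v):
    """Directly evaluate the decayed weighted average for one channel."""
    out = []
    for t in range(len(k)):
        # Current token gets the bonus u instead of a decay factor.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # A past token i is decayed by w for each step between i and t-1.
            weight = math.exp((t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(w, u, k, v):
    """Compute the same quantity in one pass with a shared running exponent."""
    # p, q: numerator and denominator of the past, both scaled by exp(-o).
    p, q, o = 0.0, 0.0, -1e38
    out = []
    for kt, vt in zip(k, v):
        # Output: combine the stored past with the current token.
        no = max(o, u + kt)
        a = math.exp(o - no)
        b = math.exp(u + kt - no)
        out.append((a * p + b * vt) / (a * q + b))
        # State update: decay the past by w, then add the current token.
        no = max(w + o, kt)
        a = math.exp(w + o - no)
        b = math.exp(kt - no)
        p = a * p + b * vt
        q = a * q + b
        o = no
    return out


if __name__ == "__main__":
    k = [0.1, -0.2, 0.4, 1.0]
    v = [1.0, 2.0, -1.0, 0.5]
    print(wkv_naive(-0.5, 0.3, k, v))
    print(wkv_recurrent(-0.5, 0.3, k, v))
```

Subtracting the running maximum `o` before each `exp` keeps the intermediate values bounded, which is why the recurrent form carries three state variables instead of two.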

Access to downstream task finetuned models

Hi, have you open-sourced the models that you used for the perplexity values quoted in the paper (https://arxiv.org/abs/2302.13939)? For instance, are the wikitext-2 and wikitext-103 models available anywhere?

Alternatively, in order to create a custom model to reproduce those results, should I start with the provided 216M model trained on OpenWebText and finetune it on wikitext using the provided train.py script?

Thanks!

Adversarial attack

Hello, I want to study adversarial attacks on SNN image classification, and I would like to use SpikeGPT as the attacked network architecture. My data has shape (T, N, C, H, W), where T is the number of frames, N is the batch size, C is the number of channels, H is the height, and W is the width. If this is possible, what would I need to change? Thank you very much.