The tinygrad version is very slow compared to the PyTorch version, probably because I am missing some detail about tinygrad. Even though it uses CUDA, there seems to be room for optimization here.
surajhanchinal / spearenet
Torch and tinygrad implementation of Karpathy's nanoGPT, based on Shakespeare's text