Jiao, Wenxiang's Projects
Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Fast inference from large language models via speculative decoding
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
Code for visualizing the loss landscape of neural nets
Approximating neural network loss landscapes in low-dimensional parameter subspaces for PyTorch
MASS: Masked Sequence to Sequence Pre-training for Language Generation
MediaWiki API wrapper in Python: http://pymediawiki.readthedocs.io/en/latest/
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
Code for the paper "Balancing Training for Multilingual Neural Machine Translation" (ACL 2020)
MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning
A detailed description of how to extract and align text, audio, and video features at the word level.
Add noise to your text; can be used to improve synthetic training corpora for neural machine translation
Conversion between Traditional and Simplified Chinese
Download, extract, parse, and tokenize the OpenSubtitles dataset with this script
The ParroT framework to enhance and regulate translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.
Implementation of our paper "Exploiting Unsupervised Data for Emotion Recognition in Conversations" in the Findings of EMNLP-2020.
Notes on proxy settings for conda, pip, and Python scripts.
Python client for the unofficial ChatGPT API with auto token regeneration, conversation tracking, proxy support and more.
Temporal video features extracted from ImageNet pre-trained ResNet-152.
RIBES is an automatic evaluation metric for machine translation.
Language modeling and instruction tuning for Russian
Implementation of ICLR 2020 paper "Revisiting Self-Training for Neural Sequence Generation"
A fully customisable language detection pipeline for spaCy
Staged Training for Transformer Language Models
Stopwords collection for all languages
All kinds of text classification models and more with deep learning