jaaack-wang
Name: Zhengxiang Wang
Type: User
Company: Stony Brook University
Bio: PhD student at Stony Brook University (since Fall 2022), specializing in NLP & ML/DL.
Twitter: ZhengXian9_Wang
Location: Stony Brook, New York
This repository stores Python scripts created for the BASE silent pause project.
CCNC: A Comprehensive Chinese Name Corpus (3.65M name samples).
A large corpus of Chinese fixed phrases and idioms scraped from a reputable educational website (30,310 instances).
A large, high-quality corpus of Chinese synonyms.
Mandarin Chinese n-gram counts from large-scale corpora.
Chinese NLP datasets, collected as material for everyday experiments. Additions and pull requests are welcome.
Competition records of mine, mostly related to NLP.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Clustering Document Parts: Detecting and Characterizing Influence Campaigns from Documents
Notes on PaddleNLP, a state-of-the-art deep-learning-based NLP toolkit.
Predicting the gender of given Chinese names (93~99% test set accuracy).
Auto-aggregating academic profiles of researchers on Google Scholar.
Hands-on gradient derivations for common supervised machine learning and deep learning loss functions.
Historical English Language Processing Toolkit: An efficient toolkit and a general framework for early modern & modern English Language Processing in XML and much more. With just a few lines of code and a few minutes, it can tokenize, normalize & annotate a normal XML corpus of a few million tokens. Besides, it is also easy to adapt.
A corpus-linguistic tool to extract and search for linguistic features
Source code, data, and results for my paper "Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching".
Python implementation of the L* (Lstar) algorithm by Angluin (1987).
Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities
Notes for Stanford CS224N: Natural Language Processing with Deep Learning.
Simple PyTorch Tutorial for a guest lecture I gave. Suitable for beginners.
Re-implementations (code & results) of the paper "Subregular Complexity and Deep Learning", which disagree with the original observations.
Revised Easy Data Augmentation: a set of general token-level text augmentation techniques
RNN seq2seq models learning transductions and alignments
Using RNN seq2seq models in modelling transduction tasks. Customized training and inference pipelines are provided.
Using RNNs in modelling transduction tasks. Customized training and inference pipelines are provided.
Common approaches to text augmentation, from random text-editing perturbations, back translation, to model-based transformations.
Building and training deep learning models for text classification tasks from scratch using paddle, PyTorch, and TensorFlow.