toolkits for NLP

intent

A toolkit I built to make it easier to learn and understand NLP techniques and to try out some of my own ideas.

Update info:

2021.1.13 Added a demo reproducing SBERT; code: sbert-stsb
2020.11.26 Added a pretrain + fine-tuning example; code: classification tnew pretrain before fine-tuning
2020.11.10 Added external_embedding_weights to NEZHA; this parameter lets you fuse other information into the NEZHA token embedding. Usage:

from toolkit4nlp.models import build_transformer_model
# Build embeddings_matrix yourself; its rows must correspond to the vocabulary
config_path = ''
checkpoint_path = ''
embeddings_matrix = None
nezha = build_transformer_model(
  config_path=config_path,  # was mistakenly set to checkpoint_path
  checkpoint_path=checkpoint_path,
  model='nezha',
  external_embedding_size=100,
  external_embedding_weights=embeddings_matrix,
)
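The snippet above leaves embeddings_matrix as None. As a rough illustration of what "corresponds to the vocabulary" could mean, the sketch below builds one row per vocabulary token, in vocabulary-id order, and fills tokens without an external vector with zeros. The helper name build_embedding_matrix, the toy vocabulary, and the zero-fill policy are all assumptions for illustration; the exact type and shape the library expects (most likely an array of shape vocab_size x external_embedding_size) is not confirmed here.

```python
def build_embedding_matrix(vocab_tokens, pretrained_vectors, dim):
    """Return one row per vocabulary token, in vocabulary-id order.

    Hypothetical helper: tokens missing from pretrained_vectors get a zero row.
    """
    matrix = []
    for token in vocab_tokens:
        vector = pretrained_vectors.get(token)
        if vector is None:
            vector = [0.0] * dim  # no external information for this token
        elif len(vector) != dim:
            raise ValueError('vector for %r has size %d, expected %d'
                             % (token, len(vector), dim))
        matrix.append(list(vector))
    return matrix


# Toy data; a real vocabulary would be read from the model's vocab file.
vocab_tokens = ['[PAD]', '[UNK]', '我', '爱']
pretrained_vectors = {'我': [0.1, 0.2, 0.3], '爱': [0.4, 0.5, 0.6]}
embeddings_matrix = build_embedding_matrix(vocab_tokens, pretrained_vectors, dim=3)
```

In this sketch dim plays the role of external_embedding_size, so the two values would need to match when the matrix is passed to the model.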

2020.11.3 Added CCF 2020 QA match baselines: ccf_2020_qa_match_pair and ccf_2020_qa_match_point
2020.10.19 Added the AdaBelief optimizer and a matching example; code: classification use AdaBelief
2020.10.16 Added focal loss and a matching example; code: classification_focal_loss
2020.09.27 Added a NEZHA implementation. Usage:

from toolkit4nlp.models import build_transformer_model
config_path = '/home/mingming.xu/pretrain/NLP/chinese_nezha_base/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_nezha_base/model_base.ckpt'

model = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='nezha')
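For readers unfamiliar with the focal loss mentioned in the 2020.10.16 entry, here is a minimal pure-Python sketch of the standard binary form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not taken from classification_focal_loss, which may use a multi-class or Keras-based variant; the function name and the default gamma and alpha values are assumptions.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one example.

    p: predicted probability of the positive class, y: label (0 or 1).
    """
    p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction is down-weighted far more than a wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With gamma=0 and alpha=1 the expression reduces to ordinary cross-entropy, which is a quick way to sanity-check an implementation.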

2020.09.22 Added a FastBERT implementation; code: classification ifytek with FastBERT
2020.09.15 Added two experiments that try to improve classification performance by constructing an additional task; code: classification ifytek with similarity and classification ifytek with seq2seq
2020.09.10 Added a Knowledge Distillation BERT example; code: distilling knowledge bert
2020.08.24 Added an example of question-answer generation with UniLM; code: qa question answer generation
2020.08.20 Added an example of question generation with UniLM; code: qa question generation
2020.08.20 Added the UniLM and LM models. Usage:

from toolkit4nlp.models import build_transformer_model
config_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/electra_base.ckpt'


# lm
model = build_transformer_model(
  config_path=config_path,
  checkpoint_path=checkpoint_path,
  application='lm'
)

# unilm
model = build_transformer_model(
  config_path=config_path,
  checkpoint_path=checkpoint_path,
  application='unilm'
)
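The difference between application='lm' and application='unilm' comes down to the attention mask. The sketch below shows the seq2seq-style mask usually associated with UniLM, assuming the toolkit follows the bert4keras convention of deriving it from segment ids (0 for the source segment, 1 for the target segment): source tokens attend to the whole source, and each target token attends to the source plus the target tokens up to and including itself. The function name is hypothetical and the code is pure Python for illustration only.

```python
def unilm_attention_mask(segment_ids):
    """Return mask[i][j] = 1 if position i may attend to position j."""
    # The running sum of segment ids is constant over the source segment
    # and strictly increasing over the target segment.
    cumulative = []
    total = 0
    for segment_id in segment_ids:
        total += segment_id
        cumulative.append(total)
    n = len(segment_ids)
    return [
        [1 if cumulative[j] <= cumulative[i] else 0 for j in range(n)]
        for i in range(n)
    ]


# Two source tokens followed by two target tokens.
mask = unilm_attention_mask([0, 0, 1, 1])
# mask == [[1, 1, 0, 0],
#          [1, 1, 0, 0],
#          [1, 1, 1, 0],
#          [1, 1, 1, 1]]
```

A plain language-model mask would instead be lower-triangular over the whole sequence, so the source tokens would not see each other bidirectionally.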

2020.08.19 Added the ELECTRA model. Usage:

from toolkit4nlp.models import build_transformer_model


config_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/electra_base.ckpt'

model = build_transformer_model(
  config_path=config_path,
  checkpoint_path=checkpoint_path,
  model='electra',
)

2020.08.17 Added a two-stage fine-tuning experiment to check whether the theseus_model in bert-of-theseus is necessary; code: two_stage_fine_tuning
2020.08.14 Added bert-of-theseus experiments for NER; code: sequence_labeling_ner_bert_of_theseus
2020.08.11 Added bert-of-theseus experiments for text classification; code: classification_ifytek_bert_of_theseus
2020.08.06 Added a cws-crf example; code: cws_crf_example
2020.08.05 Added an ner-crf example; code: ner_crf_example
2020.08.01 Added BERT + DGCNN for the QA task; code: qa_dgcnn_example
2020.07.27 Added pretraining; see pretraining/README.md for usage
2020.07.18 Added the tokenizer. Usage:

from toolkit4nlp.tokenizers import Tokenizer
vocab = ''
tokenizer = Tokenizer(vocab, do_lower_case=True)
tokenizer.encode('我爱你**')
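To give a feel for what a BERT-style tokenizer does to a single word, here is a minimal sketch of greedy longest-match-first WordPiece splitting. It assumes the toolkit's Tokenizer follows the BERT convention (continuation pieces prefixed with ##, unknown words mapped to [UNK]); the function name and the toy vocabulary are hypothetical, and the real Tokenizer also handles lower-casing, punctuation, and special tokens.

```python
def wordpiece_tokenize(word, vocab, unk_token='[UNK]'):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        current = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = '##' + candidate  # mark a continuation piece
            if candidate in vocab:
                current = candidate
                break
            end -= 1
        if current is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        pieces.append(current)
        start = end
    return pieces


toy_vocab = {'un', '##aff', '##able'}
pieces = wordpiece_tokenize('unaffable', toy_vocab)
# pieces == ['un', '##aff', '##able']
```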

2020.07.16 BERT can now load pretrained weights. Usage:

from toolkit4nlp.models import build_transformer_model

config_path = ''
checkpoints_path = ''
model = build_transformer_model(config_path, checkpoints_path)

Mainly based on bert, bert4keras, and keras_bert.

haojiepan1 / toolkit4nlp
