Code Monkey home page Code Monkey logo

bebujoie / paddlenlp Goto Github PK

View Code? Open in Web Editor NEW

This project forked from paddlepaddle/paddlenlp

0.0 0.0 0.0 70.3 MB

Easy-to-use and powerful NLP library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction and Sentiment Analysis end-to-end system.

Home Page: https://paddlenlp.readthedocs.io

License: Apache License 2.0

Shell 1.91% C++ 10.15% Python 82.10% C 0.04% Cuda 4.96% Makefile 0.02% CMake 0.81% Dockerfile 0.01%

paddlenlp's Introduction

简体中文🀄 | English🌎


PaddleNLP is an easy-to-use and powerful NLP library with Awesome pre-trained model zoo, supporting wide-range of NLP tasks from research to industrial applications.

News 📢

  • 🔥 2022.5.18-19 We will introduce UIE (Universal Information Extraction) and ERNIE 3.0 light-weight model on bilibili. Welcome to join us!

  • 🔥 2022.5.16 PaddleNLP v2.3 Released!🎉

    • 💎 Release UIE (Universal Information Extraction) technique, which single model can support NER, Relation Extraction, Event Extraction and Sentiment Analysis simultaneously.
    • 😊 Release ERNIE 3.0 light-weight model achieved better results compared to ERNIE 2.0 on CLUE, also including 🗜️lossless model compression and ⚙️end-to-end deployment.
    • 🏥 Release ERNIE-Health, a SOTA biomedical pretrained model on CBLUE.
    • 💬 Release PLATO-XL with ⚡FasterGeneration⚡, the 11B open-domain SOTA chatbot model can be deployed on multi-GPU and do parallel inference easily.

Features

Out-of-Box NLP Toolset

Taskflow aims to provide off-the-shelf NLP pre-built task covering NLU and NLG technique, in the meanwhile with extreamly fast infernece satisfying industrial scenario.

taskflow1

For more usage please refer to Taskflow Docs.

Awesome Chinese Model Zoo

🀄 Comprehensive Chinese Transformer Models

We provide 45+ network architectures and over 500+ pretrained models. Not only includes all the SOTA model like ERNIE, PLATO and SKEP released by Baidu, but also integrates most of the high-quality Chinese pretrained model developed by other organizations. Use AutoModel API to ⚡SUPER FAST⚡ download pretrained mdoels of different architecture. We welcome all developers to contribute your Transformer models to PaddleNLP!

from paddlenlp.transformers import *

ernie = AutoModel.from_pretrained('ernie-3.0-medium-zh')
bert = AutoModel.from_pretrained('bert-wwm-chinese')
albert = AutoModel.from_pretrained('albert-chinese-tiny')
roberta = AutoModel.from_pretrained('roberta-wwm-ext')
electra = AutoModel.from_pretrained('chinese-electra-small')
gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')

Unified API experience for NLP task like semantic representation, text classification, sentence matching, sequence labeling, question answering, etc.

import paddle
from paddlenlp.transformers import *

tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
text = tokenizer('natural language processing')

# Semantic Representation
model = AutoModel.from_pretrained('ernie-3.0-medium-zh')
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
# Text Classificaiton and Matching
model = AutoModelForSequenceClassification.from_pretrained('ernie-3.0-medium-zh')
# Sequence Labeling
model = AutoModelForTokenClassification.from_pretrained('ernie-3.0-medium-zh')
# Question Answering
model = AutoModelForQuestionAnswering.from_pretrained('ernie-3.0-medium-zh')

Wide-range NLP Task Support

PaddleNLP provides rich examples covering mainstream NLP task to help developers accelerate problem solving. You can find our powerful transformer Model Zoo, and wide-range NLP application exmaples with detailed instructions.

Also you can run our interactive Notebook tutorial on AI Studio, a powerful platform with FREE computing resource.

PaddleNLP Transformer model summary (click to show details)
Model Sequence Classification Token Classification Question Answering Text Generation Multiple Choice
ALBERT
BART
BERT
BigBird
BlenderBot
ChineseBERT
ConvBERT
CTRL
DistilBERT
ELECTRA
ERNIE
ERNIE-CTM
ERNIE-Doc
ERNIE-GEN
ERNIE-Gram
ERNIE-M
FNet
Funnel-Transformer
GPT
LayoutLM
LayoutLMv2
LayoutXLM
LUKE
mBART
MegatronBERT
MobileBERT
MPNet
NEZHA
PP-MiniLM
ProphetNet
Reformer
RemBERT
RoBERTa
RoFormer
SKEP
SqueezeBERT
T5
TinyBERT
UnifiedTransformer
XLNet

For more pretrained model usage, please refer to Transformer API Docs.

Industrial End-to-end System

We provide high value scenarios including information extraction, semantic retrieval, questionn answering high-value.

For more details industial cases please refer to Applications.

🧠 Neural Search System

For more details please refer to Neural Search.

❓ Question Answering System

We provide question answering pipeline which can support FAQ system, Document-level Visual Question answering system based on 🚀RocketQA.

For more details please refer to Question Answering and Document VQA.

💌 Opinion Extraction and Sentiment Analysis

We build an opinion extraction system for product review and fine-grained sentiment analysis based on SKEP Model.

For more details please refer to Sentiment Analysis.

🎙️ Speech Command Analysis

Integrated ASR Model, Information Extraction, we provide a speech command analysis pipeline that show how to use PaddleNLP and PaddleSpeech to solve Speech + NLP real scenarios.

For more details please refer to Speech Command Analysis.

High Performance Distributed Training and Inference

🚀 Fleet API: 4D Hybrid Distributed Training

For more super large-scale model pre-training details please refer to GPT-3.

⚡ FasterTokenizers: High Performance Text Preprocessing Library

AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_faster=True)

Set use_faster=True to use C++ Tokenizer kernel to achieve 100x faster on text pre-processing. For more usage please refer to FasterTokenizers.

⚡ FasterGeneration: High Perforance Generation Utilities

model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
...
outputs, _ = model.generate(
    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
    use_faster=True)

Set use_faster=True to achieve 5x speedup for Transformer, GPT, BART, PLATO, UniLM text generation. For more usage please refer to FasterGeneration.

Installation

Prerequisites

  • python >= 3.6
  • paddlepaddle >= 2.2

More information about PaddlePaddle installation please refer to PaddlePaddle's Website.

Python pip Installation

pip install --upgrade paddlenlp

Quick Start

Taskflow aims to provide off-the-shelf NLP pre-built task covering NLU and NLG scenario, in the meanwhile with extreamly fast infernece satisfying industrial applications.

from paddlenlp import Taskflow

# Chinese Word Segmentation
seg = Taskflow("word_segmentation")
seg("第十四届全运会在西安举办")
>>> ['第十四届', '全运会', '在', '西安', '举办']

# POS Tagging
tag = Taskflow("pos_tagging")
tag("第十四届全运会在西安举办")
>>> [('第十四届', 'm'), ('全运会', 'nz'), ('在', 'p'), ('西安', 'LOC'), ('举办', 'v')]

# Named Entity Recognition
ner = Taskflow("ner")
ner("《孤女》是2010年九州出版社出版的小说,作者是余兼羽")
>>> [('《', 'w'), ('孤女', '作品类_实体'), ('》', 'w'), ('是', '肯定词'), ('2010年', '时间类'), ('九州出版社', '组织机构类'), ('出版', '场景事件'), ('的', '助词'), ('小说', '作品类_概念'), (',', 'w'), ('作者', '人物类_概念'), ('是', '肯定词'), ('余兼羽', '人物类_实体')]

# Dependency Parsing
ddp = Taskflow("dependency_parsing")
ddp("9月9日上午纳达尔在亚瑟·阿什球场击败俄罗斯球员梅德韦杰夫")
>>> [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]

# Sentiment Analysis
senta = Taskflow("sentiment_analysis")
senta("这个产品用起来真的很流畅,我非常喜欢")
>>> [{'text': '这个产品用起来真的很流畅,我非常喜欢', 'label': 'positive', 'score': 0.9938690066337585}]

API Reference

  • Support luge.ai dataset loading and compatible with Hugging Face Datasets. For more details please refer to Dataset API.
  • Using Hugging Face style API to load 500+ selected transformer models and download with fast speed. For more information please refer to Transformers API.
  • One-line of code to load pre-trained word embedding. For more usage please refer to Embedding API.

Please find all PaddleNLP API Reference from our readthedocs.

Community

Special Interest Group (SIG)

Welcome to join PaddleNLP SIG for contribution, eg. Dataset, Models and Toolkit.

Slack

To connect with other users and contributors, welcome to join our Slack channel.

WeChat

Scan the QR code below with your Wechat⬇️. You can access to official technical exchange group. Look forward to your participation.

Citation

If you find PaddleNLP useful in your research, please consider cite

@misc{=paddlenlp,
    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
    author={PaddleNLP Contributors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
    year={2021}
}

Acknowledge

We have borrowed from Hugging Face's Transformer🤗 excellent design on pretrained models usage, and we would like to express our gratitude to the authors of Hugging Face and its open source community.

License

PaddleNLP is provided under the Apache-2.0 License.

paddlenlp's People

Contributors

smallv0221 avatar liuchiachi avatar zeyuchen avatar linjieccc avatar wawltor avatar frostml avatar zhui avatar joey12300 avatar yingyibiao avatar lemonnoel avatar gongel avatar guoshengcs avatar steffy-zxf avatar kinghuin avatar huhuiwen99 avatar xiemoyuan avatar moebius21 avatar junnyu avatar chenxiaozeng avatar leeyy2020 avatar w5688414 avatar luyaojie avatar hysunflower avatar 1649759610 avatar mawenjie8731 avatar yeliang2258 avatar ceci3 avatar baibaifan avatar forfishes avatar fiyen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.