

OpenDialog

A test interface is now available: search for the WeChat Official Account "OpenDialog" to try it.

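OpenDialog is built on the PyTorch-based transformers library. It provides a series of transformer-based Chinese open-domain dialogue (chit-chat) models, gathers existing data resources, and continuously adds Chinese dialogue datasets, with the goal of building an open-source Chinese chit-chat dialogue platform.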

Latest updates:

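  • 2020.8.20: finished the service interface for the LCCC-GPT-Large generative open-domain pre-trained model. Run the following command to start the corresponding service: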

    ./run_flask.sh lccc <gpu_id>
  • 2020.10.26: finished a batch of bi-encoder retrieval-based dialogue models (bert-bi-encoder, polyencoder, etc.)

  • ...
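The bi-encoder retrieval models listed above encode the dialogue context and each candidate response independently and score a pair by the dot product of their embeddings, which lets candidate embeddings be precomputed. Below is a minimal sketch of that ranking step only, assuming the embeddings already come out of an encoder such as BERT (the encoder is omitted, the vectors are toy values, and `rank_candidates` is a hypothetical helper, not part of OpenDialog):

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_candidates(
    context_vec: Sequence[float],
    candidate_vecs: List[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Score every candidate against the context by dot product and
    return (candidate_index, score) pairs sorted best-first."""
    scores = [(i, dot(context_vec, c)) for i, c in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings standing in for encoder outputs.
context = [0.2, 0.9, 0.1]
candidates = [
    [0.1, 0.1, 0.9],  # off-topic reply
    [0.3, 0.8, 0.0],  # closely matching reply
    [0.5, 0.4, 0.2],  # partially matching reply
]
print(rank_candidates(context, candidates))
```

A poly-encoder keeps the candidate side independent in the same way; the difference is that the final context vector is obtained by letting the candidate embedding attend over several context codes before the same dot product is taken.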

Tutorial

1. Project structure and file overview

Core OpenDialog files and directories:

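  • data: datasets, configuration files, vocabularies, word vectors, and dataset processing scripts
  • models: dialogue models
  • metrics: evaluation metrics
  • multiview: multi-view re-ranking models that re-order the candidate responses retrieved for a dialogue
  • ckpt: saved trained models (checkpoints)
  • rest: TensorBoard logs and the result files generated during the test phase
  • utils: utility functions
  • dataloader.py: dataset loading script
  • main.py: main entry point
  • header.py: shared package imports
  • eval.py: evaluation script that applies the metrics in metrics to the result files generated in rest
  • run.sh: batch script for running jobs
  • run_flask.sh: loads a model and starts the service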
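The metrics directory and eval.py are only described at this level of detail. As an illustration of a metric commonly used for open-domain chit-chat, here is a minimal sketch of Distinct-n (unique n-grams divided by total n-grams over a set of generated responses). Whether OpenDialog implements this exact metric, and how it tokenizes Chinese text, are assumptions; this sketch simply treats each character as a token.

```python
from typing import Iterable, List


def ngrams(tokens: List[str], n: int) -> List[tuple]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(responses: Iterable[str], n: int) -> float:
    """Distinct-n over a corpus of responses, tokenized per character.

    Returns unique n-grams / total n-grams, or 0.0 if there are no n-grams.
    """
    total = 0
    unique = set()
    for response in responses:
        grams = ngrams(list(response), n)
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


generated = ["我很好", "我很好", "你好呀"]
print(distinct_n(generated, 1))
print(distinct_n(generated, 2))
```

Repeated responses such as the duplicated "我很好" add to the total count but not to the unique set, which is why Distinct-n drops when a model keeps producing the same generic reply.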

2. Environment setup

  1. Base system: Linux/Ubuntu 16.04+, Python 3.6+, GPU (1080 Ti by default)

  2. Install the Python dependencies

    pip install -r requirements.txt
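  3. Install Elasticsearch

    The retrieval-based dialogue system first uses Elasticsearch for coarse candidate filtering. To support Chinese word segmentation in this coarse retrieval stage, you also need to download and install a Chinese tokenizer.

  4. Install MongoDB

    Once the service is running, MongoDB is used to store session history and other necessary data.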
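For the coarse retrieval stage, Elasticsearch needs an index whose text field is analyzed with a Chinese tokenizer. The section does not name the tokenizer; the sketch below assumes the widely used IK analysis plugin (analyzers `ik_max_word` and `ik_smart`), Elasticsearch 7.x-style typeless mappings, and a hypothetical index layout with `context` and `response` fields. It only builds the JSON bodies, so it runs without a cluster; they would be sent with `PUT /<index>` (mapping) and `POST /<index>/_search` (query).

```python
import json


def index_mapping(analyzer: str = "ik_max_word",
                  search_analyzer: str = "ik_smart") -> dict:
    """Mapping for a hypothetical dialogue index: `context` is analyzed with
    a Chinese analyzer (IK assumed); `response` is stored but not indexed."""
    return {
        "mappings": {
            "properties": {
                "context": {
                    "type": "text",
                    "analyzer": analyzer,
                    "search_analyzer": search_analyzer,
                },
                "response": {"type": "text", "index": False},
            }
        }
    }


def coarse_query(user_utterance: str, size: int = 100) -> dict:
    """Full-text match query returning the top `size` candidate pairs,
    which a fine-ranking model would then re-rank."""
    return {
        "query": {"match": {"context": user_utterance}},
        "size": size,
    }


print(json.dumps(index_mapping(), ensure_ascii=False, indent=2))
print(json.dumps(coarse_query("今天天气怎么样"), ensure_ascii=False))
```

Using a finer-grained analyzer at index time and a coarser one at search time is a common IK setup that favors recall, which suits a coarse filter whose output is re-ranked later.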

3. Data preparation

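  1. Dataset download (Baidu Cloud): https://pan.baidu.com/s/1xJibJmOOCGIzmJVC6CZ39Q; extraction code: vmua
  2. Put each data file in its corresponding subdirectory under data; the word-vector files chinese_w2v.txt and english_w2v.bin go directly under data.
  3. For dataset details and preprocessing, see data/README.md.
  4. Available datasets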
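The word-vector files sit directly under data. Assuming chinese_w2v.txt uses the standard word2vec text format (a header line `<vocab_size> <dim>`, then one word followed by its vector per line), a minimal dependency-free loader could look like the sketch below; `load_w2v_text` is a hypothetical helper. english_w2v.bin is binary and needs a different reader, for example gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`.

```python
import io
from typing import Dict, List, TextIO


def load_w2v_text(stream: TextIO) -> Dict[str, List[float]]:
    """Read word2vec text format: header '<vocab_size> <dim>', then
    '<word> <v1> ... <v_dim>' per line. Malformed lines are skipped."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip truncated lines or words containing whitespace
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != vocab_size:
        print(f"warning: header says {vocab_size} words, loaded {len(vectors)}")
    return vectors


sample = io.StringIO("2 3\n你好 0.1 0.2 0.3\n世界 0.4 0.5 0.6\n")
w2v = load_w2v_text(sample)
print(len(w2v), w2v["世界"])
```

On the real file you would pass `open("data/chinese_w2v.txt", encoding="utf-8")` instead of the in-memory sample; for a large vocabulary, a NumPy array indexed by a word-to-id dict is more memory-efficient than Python lists.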

5. Training models

  • Training supports multi-GPU parallelism: pass several GPU ids in <gpu_ids>, for example 0,1,2,3.
  • The <dataset> name must match the name of its directory under data.
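| Model | CMD | Type | Details | Refer | Pre-train Model |
|---|---|---|---|---|---|
| bertretrieval | ./run.sh train <dataset> bertretrieval <gpu_ids> | retrieval | BERT-based fine-ranking (re-ranking) model, fine-tuned | Paper | |
| gpt2 | ./run.sh train <dataset> gpt2 <gpu_ids> | generative | GPT2 generative dialogue model | Code | |
| gpt2gan | ./run.sh train <dataset> gpt2gan <gpu_ids> | generative | GAN-based dialogue model: the generator is GPT2 and the discriminator is a BERT binary classifier | Paper | |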
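<gpu_ids> is a comma-separated list such as 0,1,2,3. The sketch below validates such a string and turns it into the run.sh argument list plus a `CUDA_VISIBLE_DEVICES` value, the usual way to restrict PyTorch to specific GPUs. This section does not say whether run.sh sets `CUDA_VISIBLE_DEVICES` itself, so `build_train_command` is a hypothetical wrapper and "mydataset" a placeholder dataset name; the code only builds the command and does not run it.

```python
from typing import Dict, List, Tuple

# The three models listed in the training table.
MODELS = {"bertretrieval", "gpt2", "gpt2gan"}


def parse_gpu_ids(gpu_ids: str) -> List[int]:
    """Parse '0,1,2,3' into [0, 1, 2, 3], rejecting empty or duplicate ids."""
    ids = [int(tok) for tok in gpu_ids.split(",") if tok.strip()]
    if not ids or len(set(ids)) != len(ids):
        raise ValueError(f"invalid gpu id list: {gpu_ids!r}")
    return ids


def build_train_command(dataset: str, model: str,
                        gpu_ids: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for `./run.sh train <dataset> <model> <gpu_ids>`."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    id_str = ",".join(str(i) for i in parse_gpu_ids(gpu_ids))
    argv = ["./run.sh", "train", dataset, model, id_str]
    env = {"CUDA_VISIBLE_DEVICES": id_str}
    return argv, env


argv, env = build_train_command("mydataset", "gpt2", "0,1,2,3")
print(" ".join(argv))
print(env)
```

To actually launch the job, the pair could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.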

6. Experimental results

7. Starting the Flask service

  1. Start the Flask service

    ./run_flask.sh <model_name> <gpu_id>
    
  2. Call the API

    • WeChat Official Account
    • Postman
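Besides the WeChat account and Postman, the service can be called from any HTTP client. The route, port, and JSON field names are not documented in this section, so `http://localhost:8080`, the `/chat` path, and the `msg`/`user_id` fields below are placeholders to replace with whatever run_flask.sh actually exposes. The sketch uses only the Python standard library to build a JSON POST request; the line that sends it is commented out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder: host/port of your Flask service
CHAT_PATH = "/chat"                 # placeholder route, not confirmed by the README


def build_chat_request(message: str, user_id: str) -> urllib.request.Request:
    """Build a JSON POST request carrying one user utterance."""
    payload = {"msg": message, "user_id": user_id}  # hypothetical field names
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CHAT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request("你好", user_id="test-user")
print(req.get_method(), req.full_url)
# print(send(req))  # uncomment once the service is running
```

Sending a stable `user_id` with every turn is what would let the server look up the session history it keeps in MongoDB, assuming the real API keys sessions this way.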


Contributors

gmftby
