
make_datafiles_for_pgn

The original code is from https://github.com/becxer/cnn-dailymail/

Instructions

This code processes your test data into the binary format expected by the TensorFlow code for the model used in the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks.
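As far as I can tell from the upstream pointer-generator code, each .bin file is a sequence of serialized tf.train.Example records (with "article" and "abstract" features), and each record is preceded by its length packed as an 8-byte integer. The sketch below shows only that length-prefixed framing. Plain bytes stand in for the serialized tf.Example so it runs without TensorFlow, and the function names are hypothetical.

```python
import struct


def write_records(path, payloads):
    """Write each payload as an 8-byte length prefix followed by its raw bytes."""
    with open(path, "wb") as writer:
        for payload in payloads:
            # 'q' packs a signed 64-bit integer (8 bytes) in native byte order
            writer.write(struct.pack("q", len(payload)))
            writer.write(payload)


def read_records(path):
    """Yield the payloads back, in order, from a file written by write_records."""
    with open(path, "rb") as reader:
        while True:
            len_bytes = reader.read(8)
            if not len_bytes:
                break  # clean end of file
            (str_len,) = struct.unpack("q", len_bytes)
            yield reader.read(str_len)
```

In the real pipeline each payload would be something like `tf_example.SerializeToString()` rather than hand-written bytes.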

Environment

Item              Detail
OS                Windows 10 64-bit
Python            3.5
TensorFlow        1.2.1
CUDA              CUDA® Toolkit 8.0
cuDNN             v5.1
Stanford CoreNLP  stanford-corenlp-3.9.1

How to use?

1. Download Stanford CoreNLP

We will need Stanford CoreNLP to tokenize the data. Download it here and unzip it.
Then add the CoreNLP jar to your CLASSPATH environment variable. (The original instructions mention stanford-corenlp-3.7.0.jar; for the release used here, the file is stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1.jar.)
In my case, I added the following:

D:\data\tensorflow\pgn\stanford-corenlp-full-2018-02-27\stanford-corenlp-3.9.1.jar
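To confirm from Python that the jar is actually on CLASSPATH in the current shell, a small helper like the one below can list matching entries. It is a hypothetical convenience, not part of this repository, and it only inspects file names, so it cannot tell whether the jar itself is valid.

```python
import os


def find_corenlp_jars(classpath=None):
    """Return CLASSPATH entries whose file name looks like a Stanford CoreNLP jar."""
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    # os.pathsep is ';' on Windows and ':' on Linux/macOS
    entries = [e for e in classpath.split(os.pathsep) if e]
    return [e for e in entries
            if os.path.basename(e).startswith("stanford-corenlp-")
            and e.endswith(".jar")]
```

Calling `print(find_corenlp_jars())` should show the jar you added; an empty list suggests CLASSPATH is not set in the shell you are using.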


You can check that it works by running:

echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer

You should see something like:

Please
tokenize
this
text
.
PTBTokenizer tokenized 5 tokens at 68.97 tokens per second.
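The check above pipes a single sentence through PTBTokenizer. For a whole directory of stories, the upstream script (as far as I know) writes a mapping file with one "input \t output" pair per line and passes it to the tokenizer with -ioFileList and -preserveLines. A minimal sketch, with hypothetical function names:

```python
import os


def write_mapping_file(stories_dir, tokenized_dir, mapping_path):
    """Write one 'input \t output' line per story file; return the file count."""
    stories = sorted(os.listdir(stories_dir))
    with open(mapping_path, "w") as f:
        for s in stories:
            f.write("%s \t %s\n" % (os.path.join(stories_dir, s),
                                    os.path.join(tokenized_dir, s)))
    return len(stories)


def tokenizer_command(mapping_path):
    """Command line that tokenizes every file listed in the mapping file."""
    return ["java", "edu.stanford.nlp.process.PTBTokenizer",
            "-ioFileList", "-preserveLines", mapping_path]
```

With CLASSPATH set as in step 1, `subprocess.call(tokenizer_command("mapping.txt"))` would then tokenize every listed file. Create the output directory first.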

2. Process your test data into .bin files

Usage: python make_datafiles.py <stories_dir> <out_dir>

For example:

d:
cd make_datafiles_dondon
python make_datafiles.py ./stories ./output
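This README does not show how make_datafiles.py reads each story. Assuming it follows the upstream CNN/Daily Mail script, a tokenized .story file holds article lines followed by "@highlight" markers, each followed by one summary sentence; the highlights become the abstract, with every sentence wrapped in <s> and </s>. The simplified sketch below reflects that assumption; the function names and the END_TOKENS list are mine.

```python
SENTENCE_START = "<s>"
SENTENCE_END = "</s>"
# Simplified set of tokens treated as ending a sentence (an assumption)
END_TOKENS = (".", "!", "?", "'", "`", '"', ")")


def fix_missing_period(line):
    """Append ' .' to non-empty, non-marker lines that lack sentence-final punctuation."""
    if "@highlight" in line or line == "" or line.endswith(END_TOKENS):
        return line
    return line + " ."


def get_article_abstract(text):
    """Split a tokenized .story text into (article, abstract) strings."""
    lines = [fix_missing_period(l.strip().lower()) for l in text.splitlines()]
    article_lines, highlights = [], []
    next_is_highlight = False
    for line in lines:
        if line == "":
            continue
        elif line.startswith("@highlight"):
            next_is_highlight = True
        elif next_is_highlight:
            highlights.append(line)  # every line after the first marker is a highlight
        else:
            article_lines.append(line)
    article = " ".join(article_lines)
    abstract = " ".join("%s %s %s" % (SENTENCE_START, s, SENTENCE_END)
                        for s in highlights)
    return article, abstract
```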

3. Download Pointer-generator Networks

4. Download the pretrained model

5. Download the processed data

User @JafferWilson has provided the processed data, which you can download here.

6. Summarize your test data

python run_summarization.py --mode=decode --data_path=C:\\tmp\\data\\finished_files\\chunked\\test_*  --vocab_path=D:\\data\\tensorflow\\pgn\\CNN_Daily_Mail\\finished_files\\vocab --log_root=D:\\data\\tensorflow\\pgn --exp_name=pretrained_model --max_enc_steps=500 --max_dec_steps=40 --coverage=1 --single_pass=1
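The --data_path value above ends in a glob pattern (test_*). As I understand the upstream data reader, it expands the pattern with glob.glob and reads every matching chunk file. The sketch below splits a length-prefixed test.bin into test_000.bin, test_001.bin, and so on. The chunk size of 1000 is my assumption about the upstream default, and the function names are hypothetical.

```python
import os
import struct


def read_records(path):
    """Read all length-prefixed payloads from a .bin file into a list."""
    records = []
    with open(path, "rb") as reader:
        while True:
            len_bytes = reader.read(8)
            if not len_bytes:
                break
            (n,) = struct.unpack("q", len_bytes)
            records.append(reader.read(n))
    return records


def chunk_bin_file(in_file, out_dir, set_name, chunk_size=1000):
    """Split in_file into <set_name>_000.bin, _001.bin, ...; return the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    records = read_records(in_file)
    paths = []
    for chunk, start in enumerate(range(0, len(records), chunk_size)):
        path = os.path.join(out_dir, "%s_%03d.bin" % (set_name, chunk))
        with open(path, "wb") as writer:
            for payload in records[start:start + chunk_size]:
                writer.write(struct.pack("q", len(payload)))
                writer.write(payload)
        paths.append(path)
    return paths
```

Reading everything into memory keeps the sketch short; for very large files, streaming records chunk by chunk would be the safer design.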

Contributors

then0ob, dondon2475848
