amr4i / query-biased-multi-document-abstractive-summarisation


Implementation for multi-document query-based abstractive summarisation

Python 0.73% Makefile 0.01% q 99.25% Perl 0.01% Shell 0.01%

query-biased-multi-document-abstractive-summarisation's Introduction

InfoRetProject

####################################################################

Topic: Query-biased Multi-Document Abstractive Summarisation

Authors: Amrit Singhal and Akshat Jindal

####################################################################

This repository implements every stage of the pipeline proposed in our project report, which is included in the repo.

Pre-requisites

PyLucene
NLTK
TensorFlow

Creating Data Required

Download the CNN news dataset from here. Only the stories are required. Place them in a directory, which will act as your <data_directory>.
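As an illustration of what "only the stories" means in practice, here is a minimal sketch for loading the story text from <data_directory>. It assumes the files use the `.story` extension and mark their summary bullets with `@highlight`, as in the commonly distributed CNN dataset; the function names `read_story` and `load_stories` are hypothetical and do not come from this repo.

```python
from pathlib import Path


def read_story(path):
    """Return the article body of one story file.

    Assumption: everything from the first "@highlight" marker onwards
    is a reference summary, not part of the article body.
    """
    text = Path(path).read_text(encoding="utf-8")
    body = text.split("@highlight", 1)[0]
    return body.strip()


def load_stories(data_directory):
    """Map each file name to its article body for every *.story file found."""
    stories = {}
    for path in sorted(Path(data_directory).rglob("*.story")):
        stories[path.name] = read_story(path)
    return stories
```

Calling `load_stories("<data_directory>")` would then give a dictionary of article bodies that later stages could consume.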
To generate query strings, run the following commands:

	python QueryLDA/CreateQueries.py

In CreateQueries.py, set the directory variable on line 73 to your <data_directory>. This creates a Queries.txt file in the QueryLDA directory.

	python QueryLDA/TopicGen.py

This creates LDA_500.pickle in the QueryLDA directory. We have provided the file already for direct use. This file has 50 queries, each representing a different topic.

Usage

Build the index for the corpus.

	./buildIndex <data_directory>
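
PyLucene is listed as a prerequisite, so the index is presumably a Lucene index. To show what indexing buys the later query step, here is a minimal pure-Python sketch of an inverted index with a simple term-frequency ranking. It is a stand-in for illustration only: it does not use the PyLucene API, and `tokenize`, `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build term -> {doc_id: term frequency} from a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return doc ids ranked by the summed frequency of the query terms."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A real Lucene index uses a more refined scoring function, but the lookup structure follows the same idea: each term points to the documents that contain it.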

Run the extractive summarisation process:

	./QueryMultiDocSummarisation <Query_string> <paragraph_extraction_type>

Paragraph_extraction_type

Parameter_Type	Extraction_Process
1	TextTiling
2	Vector Space Method
3	TfIdf Method
4	Luhn Clusters
5	Query Biased LSA
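
To make option 3 in the table above more concrete, here is a minimal sketch of query-biased paragraph ranking with TF-IDF weights and cosine similarity. It is a generic textbook formulation, not the code in this repo: the smoothed IDF formula, the `top_k` parameter and the function name `rank_paragraphs` are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_paragraphs(paragraphs, query, top_k=3):
    """Return up to top_k paragraphs ordered by TF-IDF cosine similarity to the query."""
    tokenized = [tokenize(p) for p in paragraphs]
    n = len(paragraphs)
    # Document frequency: number of paragraphs containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption) so that common terms keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vector(tokens):
        tf = Counter(tokens)
        # Terms never seen in the paragraphs are dropped from the vector.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    q = vector(tokenize(query))
    scored = [(cosine(q, vector(tokens)), i) for i, tokens in enumerate(tokenized)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Paragraphs with no overlap with the query are left out.
    return [paragraphs[i] for score, i in scored[:top_k] if score > 0]
```

The paragraphs returned by such a ranking are the kind of material that would be concatenated into a query-focused document for the abstractive stage.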

This creates a file RelevantDoc.txt in the main directory of the repo. This is the SuperDoc mentioned in the report.
Next, abstractive summarisation is performed on this SuperDoc. The model we chose is the pointer-generator network, an implementation of which can be found here.
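
For readers unfamiliar with pointer-generator networks, the core idea is that each output word's probability mixes generating from a fixed vocabulary with copying from the source text, weighted by a generation probability p_gen. The sketch below illustrates that mixing step for a single decoder step in plain Python. It is not the linked implementation; the function name `final_distribution` and the dictionary-based representation are assumptions made for readability.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities for one decoder step.

    p_gen:         scalar in [0, 1], probability of generating from the vocabulary
    vocab_dist:    dict word -> probability over the fixed vocabulary
    attention:     list of attention weights over the source positions
    source_tokens: list of source words, aligned with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")

    # Generation part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}

    # Copy part: add (1 - p_gen) times the attention mass of every source
    # position holding the word, which lets out-of-vocabulary source words
    # receive probability too.
    for weight, token in zip(attention, source_tokens):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final
```

If both input distributions sum to one, the mixed distribution also sums to one, and a source word such as a rare proper noun can be produced even when it is missing from the vocabulary.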

Samples

The Samples directory contains a sample input query, its SuperDoc, and the final abstractive summary generated from it.

Future Work

We also aim to improve the abstractive part by adding a query bias to it.
