Summarization

This guideline concerns the analysis of text summarization models used to improve the digital reading of scientific papers. Three language models for abstractive summarization were selected and tested, with the aim of choosing the one that best summarises the content of each section of a paper, i.e. the one that is able to extract the essential information and render the general sense of the source text coherently, generating new sentences correctly from the original text.

Models

Two types of summaries are generated by the scripts in this repository: keyword bigrams, extracted using the KeyBERT language model, and abstractive summaries, generated by the T5 transformer model.
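
As an illustration of the keyword half, the following is a minimal sketch of bigram extraction with the `keybert` package. It is not taken from the repository's scripts: the function name `extract_bigrams`, the `top_n` default, the English stop-word list and the use of KeyBERT's default embedding model are all assumptions made for the example.

```python
from typing import List, Tuple


def extract_bigrams(text: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Return the top_n two-word keyphrases of `text` with their scores."""
    # Imported lazily so that this module loads even where keybert is absent.
    from keybert import KeyBERT

    # Assumption: the library's default sentence-embedding model is used.
    model = KeyBERT()
    return model.extract_keywords(
        text,
        keyphrase_ngram_range=(2, 2),  # restrict candidates to bigrams
        stop_words="english",
        top_n=top_n,
    )
```

Calling `extract_bigrams(section_text)` on the text of one paper section should return a list of `(phrase, score)` pairs, with higher scores indicating phrases closer to the section as a whole.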

Data

The selected corpus consists of 130 Open Access scientific papers in English, in the fields of communication and education, from the Spanish journal "Comunicar: Scientific Journal of Media Education" [https://doi.org/10.3916/comunicar]. All the articles follow the IMRaD structure (Introduction, Methods, Results and Discussion) and can be downloaded in XML format directly from the journal website. The journal was selected on the basis of its active indexing in 2022: it is representative of national journals ranked in the first quartile (Q1) among scientific journals in the Social Sciences: Communication, Education and Cultural Studies.
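
Since summaries are produced per section, the XML files have to be split into sections first. The sketch below shows one way to do that with the standard library. The XML schema is an assumption: it supposes JATS-like `<sec>` elements, each with a `<title>` and `<p>` children and no XML namespaces, which may not match the journal's actual files. The sample document and the function name `extract_sections` are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def extract_sections(xml_text: str) -> Dict[str, str]:
    """Map each section title to the text of its direct paragraphs."""
    root = ET.fromstring(xml_text)
    sections: Dict[str, str] = {}
    for sec in root.iter("sec"):  # assumed tag name for a section
        title_el = sec.find("title")
        if title_el is None:
            continue  # skip untitled sections
        title = "".join(title_el.itertext()).strip()
        # itertext() flattens inline markup such as <italic>;
        # split/join normalises runs of whitespace.
        paragraphs = [
            " ".join("".join(p.itertext()).split()) for p in sec.findall("p")
        ]
        sections[title] = "\n".join(paragraphs)
    return sections


# Invented sample, only to show the assumed structure.
SAMPLE = """<article><body>
<sec><title>Introduction</title><p>Media education matters.</p></sec>
<sec><title>Methods</title>
  <p>We surveyed <italic>students</italic> twice.</p>
  <p>Data were coded.</p>
</sec>
</body></article>"""

sections = extract_sections(SAMPLE)
```

Each value of `sections` can then be passed to a summarization model on its own, which keeps inputs short and aligned with the IMRaD structure.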

Language models for summary generation

Three language models, based on the abstractive summarization approach, have been analysed in the study: BART (Lewis et al., 2019), PEGASUS (Zhang et al., 2020) and T5 (Raffel et al., 2020).
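
A comparison of this kind can be sketched with the Hugging Face `transformers` library, running the same section text through each model. The checkpoint names below are assumptions (the study does not say which checkpoints were used), as are the generation settings and the helper names `build_input`, `summarize` and `summarize_with_all`. The sketch also assumes a PyTorch backend and the `sentencepiece` package, which the PEGASUS and T5 tokenizers need.

```python
from typing import Dict, Tuple

# name -> (assumed Hugging Face checkpoint, input prefix the model expects)
CANDIDATES: Dict[str, Tuple[str, str]] = {
    "BART": ("facebook/bart-large-cnn", ""),
    "PEGASUS": ("google/pegasus-xsum", ""),
    "T5": ("t5-base", "summarize: "),
}


def build_input(name: str, text: str) -> str:
    """Prepend the task prefix a model needs (only T5 uses one here)."""
    _, prefix = CANDIDATES[name]
    return prefix + text


def summarize(name: str, text: str, max_new_tokens: int = 120) -> str:
    """Generate one abstractive summary with the chosen candidate model."""
    # Lazy import: transformers is a heavy dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint, _ = CANDIDATES[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    # Truncate long sections so the input fits every model's context window.
    inputs = tokenizer(
        build_input(name, text),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=4
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def summarize_with_all(text: str) -> Dict[str, str]:
    """Summarise the same text with every candidate, keyed by model name."""
    return {name: summarize(name, text) for name in CANDIDATES}
```

Keeping the three candidates in one table makes it easy to present their outputs side by side to human evaluators.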

Human evaluation of summaries

To evaluate the generated summaries, we defined three groups of intrinsic text-quality criteria, reflecting what we consider a proper assessment of a text:

  1. Content:
    • The content of the summary is relevant and coherent
    • The summary is complete: the main ideas have been selected
  2. Spelling and morphological correctness:
    • The spelling is correct
    • The morphosyntax is correct
    • The punctuation is appropriate
  3. Vocabulary and style:
    • The summary is not identical to the original text; rather, it introduces new data (sentences)
    • The length of the summary is suitable

Based on these criteria, the human evaluations were conducted using five rating levels according to the Mean Opinion Score (MOS) scale (Streijl et al., 2016; Iskender et al., 2021):

  1. Bad
  2. Poor
  3. Fair
  4. Good
  5. Excellent
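
Ratings on this scale are usually aggregated by taking their arithmetic mean. The sketch below shows that aggregation and a simple ranking of models by mean score. Ranking by the plain mean is an assumption about how the comparison could be done, not a description of the study's procedure, and the function names are invented for the example.

```python
from statistics import mean
from typing import Dict, List, Tuple

MOS_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings: List[int]) -> float:
    """Average a list of 1-5 ratings, rejecting anything off the scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in MOS_LABELS:
            raise ValueError(f"rating {rating!r} is not on the 1-5 MOS scale")
    return mean(ratings)


def rank_models(ratings_by_model: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Return (model, MOS) pairs sorted from best to worst."""
    scores = {
        model: mean_opinion_score(ratings)
        for model, ratings in ratings_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For instance, `mean_opinion_score([4, 5, 3])` gives 4, which corresponds to "Good" on the scale above.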

Interface

https://obtic.sorbonne-universite.fr/summary/

Contributors

obtic-sorbonne, oussamajomaa, valentina-fed, ludovicamastrobattista
