This guideline concerns the analysis of text summarization models used to improve the digital reading of scientific papers. Three language models based on the abstractive summarization approach have been selected and tested, with the aim of choosing the one that best summarises the content of each section of a paper, i.e. the one that is able to extract the essential information, render the general sense of the source text coherently, and generate new text that correctly reflects the original.
The scripts in this repository generate two types of summaries: keyword bigrams, extracted with KeyBERT, and abstractive summaries, generated with the T5 transformer model.
The selected corpus consists of 130 Open Access scientific papers in English, in the fields of communication and education, from the Spanish journal "Comunicar: Scientific Journal of Media Education" [https://doi.org/10.3916/comunicar]. All the articles follow the IMRaD structure and can be downloaded in XML format directly from the journal website. The journal was selected on the basis of its active indexations in 2022: it is representative of national journals ranked in the first quartile (Q1) among Social Sciences journals in Communication, Education and Cultural Studies.
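Because each section of a paper is summarised separately, the XML articles first need to be split into their IMRaD sections. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming JATS-style markup in which each top-level `<sec>` inside `<body>` carries a `<title>` and `<p>` paragraphs. The actual schema of the Comunicar XML files is an assumption here and may differ; `extract_sections` and `SAMPLE` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS-style markup: each top-level <sec> in <body> holds a <title>
# and one or more <p> paragraphs. The real Comunicar XML may be structured differently.
SAMPLE = """<article>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p><p>Second paragraph.</p></sec>
    <sec><title>Methods</title><p>How the study was run.</p></sec>
  </body>
</article>"""


def extract_sections(xml_text: str) -> dict[str, str]:
    """Map each top-level section title to its concatenated paragraph text."""
    root = ET.fromstring(xml_text)
    sections: dict[str, str] = {}
    body = root.find("body")
    if body is None:
        return sections
    for sec in body.findall("sec"):
        title = (sec.findtext("title") or "").strip()
        # iter("p") also visits paragraphs of nested subsections;
        # itertext() keeps text inside inline children such as <italic>.
        paragraphs = ["".join(p.itertext()).strip() for p in sec.iter("p")]
        sections[title] = " ".join(p for p in paragraphs if p)
    return sections


if __name__ == "__main__":
    for title, text in extract_sections(SAMPLE).items():
        print(f"{title}: {text}")
```

Each section's text can then be passed to the summarization model on its own, so that one summary is produced per section.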
Three language models, based on the abstractive summarization approach, have been analysed in the study: BART (Lewis et al., 2019), PEGASUS (Zhang et al., 2020) and T5 (Raffel et al., 2020).
To evaluate the generated summaries, we defined three dimensions of text quality, based on what we consider a proper text assessment. Each dimension consists of the following intrinsic criteria:
- Content:
  - The content of the summary is relevant and coherent.
  - The summary is complete: the main ideas have been selected.
- Spelling and Morphological Correctness:
  - The spelling is correct.
  - The morphosyntax is correct.
  - The punctuation is appropriate.
- Vocabulary and Style:
  - The summary is not identical to the original text; rather, it introduces new sentences.
  - The length of the summary is suitable.
Based on these criteria, human evaluation was conducted using the five rating levels of the Mean Opinion Score (MOS) scale (Streijl et al., 2016; Iskender et al., 2021):
- Bad
- Poor
- Fair
- Good
- Excellent
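A Mean Opinion Score is the arithmetic mean of the numeric ratings given by evaluators. The sketch below assumes the usual absolute category rating convention (Bad = 1 up to Excellent = 5) and computes one MOS per model and per evaluation dimension. How the ratings were actually aggregated in this study is not stated above, so the structure of `ratings`, the helper names `mean_opinion_score` and `score_models`, and the example ratings are all hypothetical.

```python
from statistics import mean

# Assumed numeric mapping of the five MOS levels (absolute category rating convention).
MOS_SCALE = {"Bad": 1, "Poor": 2, "Fair": 3, "Good": 4, "Excellent": 5}


def mean_opinion_score(labels: list[str]) -> float:
    """Average the numeric values of a list of MOS labels."""
    if not labels:
        raise ValueError("at least one rating is required")
    return mean(MOS_SCALE[label] for label in labels)


def score_models(ratings: dict[str, dict[str, list[str]]]) -> dict[str, dict[str, float]]:
    """Compute a MOS per model and per evaluation dimension.

    `ratings` maps model name -> dimension -> labels given by the evaluators
    (a hypothetical layout chosen for this sketch).
    """
    return {
        model: {dim: mean_opinion_score(labels) for dim, labels in dims.items()}
        for model, dims in ratings.items()
    }


if __name__ == "__main__":
    # Hypothetical ratings from three evaluators, for illustration only.
    example = {
        "T5": {"Content": ["Good", "Excellent", "Good"],
               "Vocabulary and Style": ["Fair", "Good", "Good"]},
        "BART": {"Content": ["Fair", "Good", "Poor"],
                 "Vocabulary and Style": ["Fair", "Fair", "Good"]},
    }
    for model, scores in score_models(example).items():
        print(model, {dim: round(s, 2) for dim, s in scores.items()})
```

Keeping the scores per dimension, rather than collapsing them into a single number, shows whether a model is weak in content, in correctness, or in style.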