
compauth's Introduction

Computational authorship analysis with language modelling

In prior work, we modeled Homeric language and reported empirical findings on the authorship of the 48 books of the Iliad and the Odyssey. Following this line of work, and considering current philological views and trends, we break the two poems down further into smaller portions. Using language modeling, we identify outlying passages that show reduced linguistic affinity with the main body of the two works and, by extension, may point to different authorship.

At book resolution, the greatest proximity (lowest perplexity, PPL) in the Iliad was observed for Books 11 and 17, and the least proximity (highest PPL) for Books 4 and 9. In the Odyssey, Books 9 and 12 were the least linguistically associated with the remaining books, while Books 1 and 16 were the most strongly associated.
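To make the book-level scoring concrete, here is a minimal sketch of leave-one-out perplexity. It assumes a character bigram model with add-one smoothing, which is an illustrative stand-in and not necessarily the model used in the paper or the notebook. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def train_char_bigram(text):
    """Count character bigrams and their left contexts (illustrative model only)."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    vocab = set(text)
    return bigrams, contexts, vocab

def perplexity(model, text):
    """Per-character perplexity of `text` under an add-one smoothed bigram model."""
    bigrams, contexts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    pairs = list(zip(text, text[1:]))
    if not pairs:
        raise ValueError("text must contain at least two characters")
    log_prob = 0.0
    for a, b in pairs:
        log_prob += math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
    return math.exp(-log_prob / len(pairs))

def leave_one_out_ppl(books):
    """Score each book with a model trained on all the other books."""
    scores = {}
    for name, text in books.items():
        rest = " ".join(t for other, t in books.items() if other != name)
        scores[name] = perplexity(train_char_bigram(rest), text)
    return scores

# Toy stand-ins for books (hypothetical data, not Homeric text).
toy_books = {
    "A": "abababababababab",
    "B": "babababababababa",
    "C": "xyzxyzxyzxyzxyzx",
}
scores = leave_one_out_ppl(toy_books)
```

In this toy setting, book "C" shares no characters with the other two and therefore receives the highest PPL, mirroring how a high PPL signals low linguistic proximity to the rest of a poem.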

At passage resolution, 47 excerpts of 600 characters each scored above the upper 95% bound relative to all other excerpts: 25 from the Iliad and 22 from the Odyssey.
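The passage-level step can be sketched as follows. This is a rough illustration that assumes consecutive, non-overlapping 600-character excerpts and reads the threshold as the 95th percentile of all excerpt scores. Both choices are assumptions made here, and the helper names are hypothetical.

```python
import numpy as np

def split_into_excerpts(text, size=600):
    """Cut `text` into consecutive, non-overlapping excerpts of `size` characters.

    A trailing remainder shorter than `size` is dropped (an assumption of this sketch).
    """
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]

def flag_outliers(scores, percentile=95.0):
    """Return the indices of scores strictly above the given percentile, plus the threshold."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.percentile(scores, percentile)
    return np.flatnonzero(scores > threshold), threshold

# Hypothetical PPL scores: nineteen typical excerpts and one outlier.
example_scores = [1.0] * 19 + [10.0]
outlier_idx, threshold = flag_outliers(example_scores)
```

Each excerpt would be scored with a language model before calling `flag_outliers`; the scores above are made up for illustration.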

Further investigation showed that some of the passages isolated as outliers by the language models were also identified as such by human researchers. Please read our paper for more details: https://link.springer.com/article/10.1007/s42803-022-00046-7

If you find this work interesting, please cite us:

@article{Pavlopoulos2022,
  author = {Pavlopoulos, John and Konstantinidou, Maria},
  doi = {10.1007/s42803-022-00046-7},
  month = {7},
  title = {Computational authorship analysis of the homeric poems},
  journal = {International Journal of Digital Humanities},
  year = {2022},
}

In this repository you will find:

  • analysing_iliad_and_odyssey.ipynb, containing the code,
  • scored_excerpts.xlsx, containing the PPL of each 600-character excerpt of the Iliad and the Odyssey,
  • unforeseen_excerpts.xlsx, containing the fragments of I.2 and O.11 that exceeded the upper bound of the 95% confidence interval, defined by randomly sampling excerpts from the remainder of each poem.
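
The random-sampling bound mentioned for unforeseen_excerpts.xlsx could look roughly like the sketch below. It assumes that the interval is the central 95% range of PPL scores from randomly sampled reference excerpts, estimated with a simple empirical quantile. The notebook may define the interval differently, and all names here are hypothetical.

```python
import random

def sample_excerpts(text, n_samples, size=600, seed=0):
    """Draw `n_samples` excerpts of `size` characters at random start offsets."""
    last_start = len(text) - size
    if last_start < 0:
        raise ValueError("text is shorter than one excerpt")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    starts = [rng.randint(0, last_start) for _ in range(n_samples)]
    return [text[s:s + size] for s in starts]

def upper_bound(reference_scores, level=0.95):
    """Upper end of the central `level` interval of the reference scores."""
    ordered = sorted(reference_scores)
    q = 1 - (1 - level) / 2  # 0.975 for level=0.95
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]

def exceeds(candidate_scores, reference_scores, level=0.95):
    """Mark each candidate score that lies strictly above the reference upper bound."""
    bound = upper_bound(reference_scores, level)
    return [s > bound for s in candidate_scores]

# Hypothetical reference scores 0..99 and two candidate fragments.
reference = list(range(100))
flags = exceeds([50, 99], reference)
```

Here the reference scores would come from scoring the sampled excerpts of the remainder of a poem, and the candidates from the fragments of the book under examination.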
