- The zipped file "textclassification" contains four Perl scripts numbered 0, 1, 2 and 3. Run them in numerical order to create twenty subcorpora that contain only full-length articles from the corpus of the Royal Society.
- The remaining three zipped archives are used to calculate relative entropy and linguistic concreteness. Each archive handles one of three measures (concreteness, lemma and POS trigram).
- Each zipped archive includes a script and texts. Because the full set of PTRS texts is large, only a few samples are uploaded here. The scripts are written in Linux shell and R.
- After you run the scripts, the corresponding texts and data will be generated.
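
As a rough illustration of the relative entropy measure mentioned above, the sketch below computes D(P||Q) = sum over x of p(x) * log2(p(x) / q(x)) between two count files. It is not the code shipped in the archives: the function name `kl_divergence`, the input format (one `<item> <count>` pair per line) and the add-one smoothing are all assumptions made for the example.

```shell
# Hypothetical helper, not part of the repository's scripts.
# Usage: kl_divergence P_COUNTS Q_COUNTS
# Each input file is assumed to hold one "<item> <count>" pair per line,
# e.g. a lemma or a POS trigram followed by its frequency.
kl_divergence() {
  awk -v alpha=1 '
    # First file: counts for distribution P.
    NR == FNR { p[$1] = $2; vocab[$1] = 1; ptotal += $2; next }
    # Second file: counts for distribution Q.
    { q[$1] = $2; vocab[$1] = 1; qtotal += $2 }
    END {
      v = 0
      for (t in vocab) v++
      d = 0
      for (t in vocab) {
        # Add-alpha smoothing (an assumption) keeps q(x) above zero
        # for items that occur in only one of the two files.
        pp = (p[t] + alpha) / (ptotal + alpha * v)
        qq = (q[t] + alpha) / (qtotal + alpha * v)
        d += pp * log(pp / qq) / log(2)   # divide by log(2) to get bits
      }
      printf "%.4f\n", d
    }' "$1" "$2"
}
```

Calling `kl_divergence period_a.txt period_b.txt` (hypothetical file names) prints the divergence in bits. Two identical count files give `0.0000`, and the value grows as the two distributions move apart. Note that relative entropy is not symmetric, so swapping the arguments generally changes the result.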
fivehills / corpus-of-royal-society: relative entropy and linguistic concreteness