wordcounter's Introduction

Wordcounter with word type tagging & json output

Tags words and counts their frequency across multiple text files, then saves the output to a JSON file.

Requirements

  • Python 2.x
  • TreeTagger
  • TreeTagger language parameter files

Installation

Run it

To run the script on English text files and get plain output (one array of word objects per file), run:

cd wordcounter
python word_counter.py

Output (language default: 'en'):

[
  [
    {
      "count": 30,
      "tag": "KON",
      "word": "and"
    }, 
    {..}
  ],
  [..]
]
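
The per-file structure above can be reproduced with a short sketch. It assumes the tagger has already produced (word, tag) pairs for one file. The function name `count_tagged_words`, the sample data, and the lowercasing of words are assumptions made for illustration and are not taken from word_counter.py.

```python
import json
from collections import Counter


def count_tagged_words(tagged_words):
    """Count (word, tag) pairs and return a list of dicts, most frequent first."""
    # Lowercasing is an assumption; the real script may treat case differently.
    counts = Counter((word.lower(), tag) for word, tag in tagged_words)
    return [
        {"count": count, "tag": tag, "word": word}
        for (word, tag), count in counts.most_common()
    ]


# Hypothetical tagger output for one file (the tags are illustrative only).
sample = [("and", "KON"), ("house", "NN"), ("And", "KON")]
per_file = count_tagged_words(sample)

# One inner list per input file, matching the plain output layout.
print(json.dumps([per_file], indent=2, sort_keys=True))
```

Keying the counter on the (word, tag) pair keeps the same word apart when it receives different tags.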

Please note: make sure the TreeTagger parameter file for the language is installed!

To run the script with support for another language, pass the language code as an argument:

cd wordcounter
python word_counter.py de

Output (language: 'de'):

{
  "list": [
    {
      "text": [
        {
          "count": 30,
          "tag": "KON",
          "word": "und"
        },
        {..}
      ]
    },
    {..}
  ]
}

To add information parsed from the filenames to the output tree, run it with:

cd wordcounter
python word_counter.py en filename

Output (language: en, filename data enrichment)

For this to work, each filename must follow the naming convention prefix_lastName_firstName_year_month_day.txt. For example, take a file named Germany_Schneider_Anton_2018_07_21.txt.

Given that, the JSON file will be structured as follows:

{
  "list": [
    {
      "firstName": "Anton",
      "lastName": "Schneider",
      "month": "07",
      "year": 2018,
      "day": "21",
      "party": "Germany",
      "words":[{
        "count": 30,
        "tag": "KON",
        "word": "and"
      }, 
      {..}]
    }
  ]
}
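
The mapping from filename to metadata fields can be sketched as follows. This is a minimal illustration of the naming convention, not the script's actual code: `parse_filename` is a hypothetical helper, and the field types (numeric year, string month and day) simply mirror the example output.

```python
import os


def parse_filename(path):
    """Split prefix_lastName_firstName_year_month_day.txt into metadata fields."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("_")
    if len(parts) != 6:
        raise ValueError("unexpected filename format: %r" % path)
    prefix, last_name, first_name, year, month, day = parts
    return {
        "party": prefix,      # the prefix appears as "party" in the example
        "lastName": last_name,
        "firstName": first_name,
        "year": int(year),    # shown as a number in the example output
        "month": month,       # kept as zero-padded strings, as in the example
        "day": day,
    }


info = parse_filename("Germany_Schneider_Anton_2018_07_21.txt")
print(info)
```

A name that itself contains an underscore would break this simple split, so such files would need a different separator or a stricter pattern.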

Result

The result is written to the file wordcounter/wordcounter/data.json.
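
Once the file exists, it can be read back with the standard json module. The sketch below assumes the plain layout (an array of per-file arrays) and uses an inline string as a stand-in for the file contents. `top_words` is a hypothetical helper and the sample entries are made up.

```python
import json


def top_words(words, n=3):
    """Return the n most frequent entries from a list of word dicts."""
    return sorted(words, key=lambda entry: entry["count"], reverse=True)[:n]


# Stand-in for the contents of data.json in the plain layout.
raw = (
    '[[{"count": 5, "tag": "NN", "word": "house"},'
    ' {"count": 30, "tag": "KON", "word": "and"}]]'
)
data = json.loads(raw)

for file_words in data:
    for entry in top_words(file_words, n=1):
        print(entry["word"], entry["tag"], entry["count"])
```

To work on the real output, replace `json.loads(raw)` with `json.load` on the opened data.json file.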
