Wikidata

Wikidata is a CLI program that compares contributions to Wikipedia pages. Give it the titles of two Wikipedia pages, and it downloads information about each page's contributions and contributors.

Wikidata can also pull the pages in different languages and compare them. Pass it a single language abbreviation (for example, en) to analyze one language, or a set of abbreviations for multi-language analysis.

Wikidata will display 3 types of graphs:

  1. Number of contributions by month for each page.
  2. Number of contributions by type (anonymous or not) for each page.
  3. The intersection between contributors from both pages.
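
The data behind these three graphs can be sketched in a few lines of Python. This is a minimal illustration, not Wikidata's actual code: the `Revision` tuple, the helper names, the sample revisions, and the rule that an anonymous contributor is one whose name parses as an IP address are all assumptions made for the example.

```python
from collections import Counter
from datetime import date
from ipaddress import ip_address
from typing import NamedTuple


class Revision(NamedTuple):
    """Hypothetical record for one page revision (not Wikidata's own type)."""
    contributor: str
    day: date


def is_anonymous(name: str) -> bool:
    """Assumption: anonymous edits are attributed to an IP address."""
    try:
        ip_address(name)
        return True
    except ValueError:
        return False


def by_month(revisions):
    """Graph 1: number of contributions per (year, month)."""
    return Counter((r.day.year, r.day.month) for r in revisions)


def by_type(revisions):
    """Graph 2: number of anonymous vs. registered contributions."""
    return Counter(
        "anonymous" if is_anonymous(r.contributor) else "registered"
        for r in revisions
    )


def shared_contributors(page1, page2):
    """Graph 3: contributors who edited both pages."""
    return {r.contributor for r in page1} & {r.contributor for r in page2}


# Made-up sample data for two pages.
page1 = [
    Revision("Alice", date(2023, 1, 5)),
    Revision("192.0.2.7", date(2023, 1, 20)),
    Revision("Bob", date(2023, 2, 1)),
]
page2 = [
    Revision("Bob", date(2023, 2, 3)),
    Revision("Carol", date(2023, 3, 9)),
]

print(by_month(page1))
print(by_type(page1))
print(shared_contributors(page1, page2))
```

Counting by `(year, month)` and intersecting two sets of names keeps each graph's input independent of how the revisions were downloaded.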

Usage

usage: Wikidata [-h] [-c CONTRIBUTIONS]
                [-l LANGUAGES [LANGUAGES ...]] [-o OUTPUT]
                [--graphical | --no-graphical]
                [--csv-data | --no-csv-data]
                page1 page2

Extracts and compares data about wikipedia pages

positional arguments:
  page1                 Name of Wikipedia page
  page2                 Name of Wikipedia page

options:
  -h, --help            show this help message and exit
  -c CONTRIBUTIONS, --contributions CONTRIBUTIONS
                        Number of contributions to retrieve
  -l LANGUAGES [LANGUAGES ...], --languages LANGUAGES [LANGUAGES ...]
                        Set of languages (default: en)
  -o OUTPUT, --output OUTPUT
                        Name of output file for the graphs
  --graphical, --no-graphical
                        Display data graphically (default: True)
  --csv-data, --no-csv-data
                        Compile contributors information per page in
                        a csv file (default: False)
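
For readers who want to see how such an interface can be declared, here is a hedged sketch of an `argparse` parser that accepts the same arguments as the help text above. It is a reconstruction, not the project's source: the `build_parser` name, the `int` type for `-c`, and the use of `argparse.BooleanOptionalAction` (Python 3.9+) for the paired `--x | --no-x` flags are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="Wikidata",
        description="Extracts and compares data about wikipedia pages",
    )
    parser.add_argument("page1", help="Name of Wikipedia page")
    parser.add_argument("page2", help="Name of Wikipedia page")
    # Assumption: the number of contributions is parsed as an integer.
    parser.add_argument("-c", "--contributions", type=int,
                        help="Number of contributions to retrieve")
    parser.add_argument("-l", "--languages", nargs="+", default=["en"],
                        help="Set of languages")
    parser.add_argument("-o", "--output",
                        help="Name of output file for the graphs")
    # BooleanOptionalAction generates both --graphical and --no-graphical.
    parser.add_argument("--graphical", default=True,
                        action=argparse.BooleanOptionalAction,
                        help="Display data graphically")
    parser.add_argument("--csv-data", default=False,
                        action=argparse.BooleanOptionalAction,
                        help="Compile contributors information per page in a csv file")
    return parser


args = build_parser().parse_args(
    ["Albert_Camus", "David_Graeber", "-c", "100", "--no-graphical", "--csv-data"]
)
print(args.contributions, args.languages, args.graphical, args.csv_data)
```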

Example

The simplest way to run wikidata is to pass it the titles of the two pages.

python wikidata.py Mikhail_Bakunin Errico_Malatesta

The command below compares the Wikipedia pages Albert_Camus and David_Graeber over their last 100 revisions. No graphs are displayed (they are still saved as a PNG image), and a CSV file of contributions is generated for each page.

python wikidata.py Albert_Camus David_Graeber -c 100 --no-graphical --csv-data

The command below compares the Wikipedia pages Albert_Camus and David_Graeber over their last 1000 revisions in three languages (en, fr, de).

python wikidata.py Albert_Camus David_Graeber -c 1000 -l en fr de

CSV data structure

The CSV file generated for each (meta)page has this format:

contributor, contributions, language
Fabienamnet,1,fr
Le sourcier de la colline,1,fr
Rita2008,27,de
Tsor,2,de
Invisigoth67,5,de
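
A file in this format can be loaded with Python's standard `csv` module. The sketch below is illustrative only: the `contributions_per_language` helper is hypothetical, and the sample rows are the ones shown above. Because the header row has a space after each comma, the reader is created with `skipinitialspace=True` so the column names come out as `contributions` and `language` without a leading space.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the format shown above.
SAMPLE = """\
contributor, contributions, language
Fabienamnet,1,fr
Le sourcier de la colline,1,fr
Rita2008,27,de
Tsor,2,de
Invisigoth67,5,de
"""


def contributions_per_language(lines):
    """Sum the contributions column for each language (hypothetical helper)."""
    # skipinitialspace drops the space that follows each comma in the header.
    reader = csv.DictReader(lines, skipinitialspace=True)
    totals = defaultdict(int)
    for row in reader:
        totals[row["language"]] += int(row["contributions"])
    return dict(totals)


totals = contributions_per_language(io.StringIO(SAMPLE))
print(totals)  # {'fr': 2, 'de': 34}
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper.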

Installation

To install wikidata, follow the instructions below.

# Clone the git repository.
git clone https://github.com/zeddo123/wikidata
cd wikidata
# Install the requirements.
poetry install  # or: pip install -r requirements.txt

Dependencies

  • beautifulsoup4: for web scraping the Wikipedia pages.
  • matplotlib: for generating the plots from the data.
  • matplotlib-venn: helper library used to generate the Venn diagrams.
  • aiohttp: async library used to speed up the pulling of the Wikipedia pages.
  • tqdm: used for progress bars.
  • dateparser: used to parse the dates from any date format. Crucial to get the date objects from different languages.
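
Why asynchronous fetching speeds things up can be illustrated with the standard library alone. In this sketch `fake_fetch` is a hypothetical stand-in for an aiohttp request: it sleeps instead of touching the network, and the HTML it returns is made up. The point is only that `asyncio.gather` starts all requests at once, so the total wait is roughly that of the slowest request rather than the sum of all of them.

```python
import asyncio


async def fake_fetch(title: str, delay: float = 0.01) -> str:
    """Stand-in for an HTTP request; sleeps instead of hitting the network."""
    await asyncio.sleep(delay)
    return f"<html>{title}</html>"


async def fetch_all(titles):
    """Start every fetch at once and wait for all of them together."""
    return await asyncio.gather(*(fake_fetch(t) for t in titles))


pages = asyncio.run(fetch_all(["Albert_Camus", "David_Graeber"]))
print(len(pages))
```

`asyncio.gather` returns results in the order the coroutines were passed in, so each page stays matched to its title.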
