
wiki-text-nlp

Extract 'Did you know?' facts from Wikipedia articles

This is a small Python command line tool and a Jupyter Notebook based on Adam Geitgey's excellent blog post on Natural Language Processing in Python.

When you run the command like this:

$ python3 wiki-text-nlp.py Pikachu

It will show you 'interesting facts' extracted from the corresponding article on the English Wikipedia:

Did you know that Pikachu...
...is a central character in the Pokémon anime series?
...was one of several different Pokémon designs conceived by Game Freak's character development team?
...were the first "Electric-type" Pokémon created, their design intended to revolve around the concept of electricity?
...is one of the sixteen starters and ten partners in the Pokémon Mystery Dungeon games?
...is an amiibo character?
...would be happier living in a colony of wild Pikachu?
...is one of the main Pokémon used in many of the Pokémon manga series?
...is real, out next week in Japan?
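Each line of the output above is a statement about the subject, rendered as "...<verb> <rest of the sentence>?". The sketch below shows only that formatting step. It assumes, as a loose model of how the tool may work, that the NLP step has already produced (entity, cue, fragment) triples as plain strings, similar in shape to what textacy's `semistructured_statements` extractor yields once its spans are turned into text. `format_facts` is a hypothetical helper, not the tool's actual code, and the deduplication is a choice made for this sketch.

```python
from typing import Iterable, Tuple

# Hypothetical statement shape: (entity, cue, fragment) as plain strings,
# e.g. ("Pikachu", "is", "an amiibo character").
Statement = Tuple[str, str, str]


def format_facts(subject: str, statements: Iterable[Statement]) -> str:
    """Render statements as a 'Did you know that ...' block (sketch)."""
    lines = [f"Did you know that {subject}..."]
    seen = set()
    for _entity, cue, fragment in statements:
        # Join verb and fragment, then drop trailing spaces/full stops before adding '?'.
        fact = f"{cue} {fragment}".strip().rstrip(" .")
        if fact and fact not in seen:  # skip empty and duplicate facts
            seen.add(fact)
            lines.append(f"...{fact}?")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        ("Pikachu", "is", "an amiibo character"),
        ("Pikachu", "is", "an amiibo character."),  # duplicate, dropped
    ]
    print(format_facts("Pikachu", example))
```

With the example input this prints the header line followed by a single "...is an amiibo character?" line, because the second triple differs only by its trailing full stop.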

Dependencies

If you've followed Adam's tutorial you'll have all the dependencies except two: bs4 (BeautifulSoup) and requests.

If you haven't followed that tutorial, this command installs everything:

pip3 install spacy textacy bs4 requests

You also need to install the 'small' English model for spaCy:

python3 -m spacy download en_core_web_sm
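To check that the installs above worked before running the tool, a small standard-library script can look up each module without importing it. This is a minimal sketch, not part of the project: `missing_modules` and `REQUIRED` are hypothetical names, and it relies on the spaCy model being installed as an importable package called `en_core_web_sm`, which is how `spacy download` installs it.

```python
import importlib.util
from typing import List, Sequence

# Import names for the packages installed above, plus the spaCy model package.
REQUIRED = ["spacy", "textacy", "bs4", "requests", "en_core_web_sm"]


def missing_modules(names: Sequence[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

`find_spec` only locates a module; it does not import it, so a package that is present but broken will still be reported as found.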

I probably should be making a requirements.txt or a Pipfile, but I'm lazy.

Credits

This code was based on this little gist by Adam Geitgey. The rest was done by Hay Kranen.

The Wikipedia content is retrieved from the excellent Wikimedia REST API, which more people should use.
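As a rough illustration of the retrieval step, the sketch below builds a Wikimedia REST API URL for an article's rendered HTML and pulls the paragraph text out of that HTML. The README does not say which endpoint the tool calls, so `page/html/{title}` is an assumption, and the standard library's `html.parser` is swapped in for BeautifulSoup so the example has no dependencies. `article_html_url` and `paragraphs_from_html` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import quote

# Assumed endpoint: the REST API's rendered-HTML route for a page title.
REST_HTML_URL = "https://{lang}.wikipedia.org/api/rest_v1/page/html/{title}"


def article_html_url(title: str, lang: str = "en") -> str:
    """Build the REST URL; spaces become underscores, '/' is percent-encoded."""
    return REST_HTML_URL.format(lang=lang, title=quote(title.replace(" ", "_"), safe=""))


class ParagraphText(HTMLParser):
    """Collect the text of each <p> element, ignoring all other markup."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def paragraphs_from_html(html: str) -> List[str]:
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

In practice the HTML would come from something like `requests.get(article_html_url("Pikachu")).text`, and the joined paragraphs would then be handed to spaCy for parsing.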

Given that this is mostly a textbook example, this code is in the public domain.
