abinashmeher999 / pymediawiki
(Earlier known as wikipedia-category) A package to extract the properties of a wiki page. Uses the MediaWiki API. Aspires to cover more page properties.
License: MIT License
Rearrange the source files into an appropriate folder structure. For best practices, refer to http://docs.python-guide.org/en/latest/writing/structure/
This aims to minimize calls to the API for repeated requests. The idea is to reuse information already obtained from the API when the same page is queried a second time. There are further concerns, such as how long before the cache expires, which can be discussed here if anyone is interested.
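One way the caching idea could look is sketched below. It is only an illustration: the PageCache class, its ttl_seconds parameter, and the fetch callable are hypothetical names, not part of the existing code, and the five-minute expiry is an arbitrary placeholder for the open question of cache lifetime.

```python
import time


class PageCache:
    """Tiny in-memory cache keyed by page id, with a time-to-live (hypothetical helper)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._store = {}             # pageid -> (time stored, data)

    def get_or_fetch(self, pageid, fetch):
        """Return cached data for pageid, calling fetch(pageid) only on a miss or after expiry."""
        now = self._clock()
        entry = self._store.get(pageid)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        data = fetch(pageid)
        self._store[pageid] = (now, data)
        return data
```

Here fetch stands for whatever function actually calls the API, so a second query for the same page within the expiry window never reaches the network.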
Package the code so that it can be published on PyPI and installed through pip. Making it available as a conda package is optional.
Refer to http://docs.python-guide.org/en/latest/shipping/packaging/
Checklist of all API properties for reference:
If possible, we can try making asynchronous GET requests for faster outputs.
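A rough sketch of overlapping several requests is shown below. It uses a thread pool from the standard library rather than an async HTTP client, which is a substitution made to keep the example dependency-free. The names fetch_many and fetch_one are hypothetical, and the worker count is kept small on the assumption that the API should not be flooded with parallel requests.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(pageids, fetch_one, max_workers=4):
    """Call fetch_one(pageid) for every page id concurrently.

    fetch_one is assumed to perform one blocking GET request.
    Results are returned in the same order as pageids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, pageids))
```

Because the requests spend most of their time waiting on the network, threads are enough to overlap them, and the calling code stays synchronous.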
Currently the code takes only the page id as input, which is not very user-friendly. This would involve adding more options, like querying by Wikipedia link or page title.
See https://www.mediawiki.org/wiki/API:Query for more information on what input the MediaWiki query takes.
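As a sketch of how the extra input options might be normalized, the helper below turns a page id, a title, or a wiki link into the pageids or titles parameter that the query module accepts. The function name page_selector is hypothetical, and the link handling assumes the common /wiki/<title> URL layout.

```python
from urllib.parse import unquote, urlparse


def page_selector(pageid=None, title=None, url=None):
    """Return the query parameter that selects a page (hypothetical helper)."""
    given = [v for v in (pageid, title, url) if v is not None]
    if len(given) != 1:
        raise ValueError("give exactly one of pageid, title, url")
    if pageid is not None:
        return {"pageids": str(pageid)}
    if url is not None:
        # Assumes links of the form https://<host>/wiki/<Title_with_underscores>
        path = urlparse(url).path
        prefix = "/wiki/"
        if not path.startswith(prefix):
            raise ValueError("expected a /wiki/<title> link")
        title = unquote(path[len(prefix):]).replace("_", " ")
    return {"titles": title}
```

The returned dictionary can be merged into the rest of the request parameters, so the code that sends the request does not need to know which kind of input the user supplied.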
Use a testing framework and set up simple tests that can be run on Travis CI.
There are many things that can be improved. Currently the WikiCatQuery class needs an object to be instantiated before making any queries. This makes it closed to extension: by this logic, a new class would have to be added for every new feature, which is wrong. Make an interface with a WikiPage class where you only add a method for a new feature. (Resolved in #19)
Now that the project involves more than just categories, I would like to hear suggestions on what it should be named.
An isolated environment to which developers can switch and then start contributing. I would prefer a Python 3 environment. This might also require minor refactoring of the code.
While getting the categories for a wiki page, we can specify whether we need hidden categories or not, as mentioned here. This needs to be implemented in the existing code at wiki_cat.py#L14. It should be passed as a parameter to the get_cat function. For example,
def get_cat(self, pageid=None, include_hidden=False)
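A possible shape for the request parameters behind such a flag is sketched below. The standalone function category_params is hypothetical and is not the body of get_cat. It relies on the clshow parameter of the categories property, whose !hidden value excludes hidden categories.

```python
def category_params(pageid, include_hidden=False):
    """Build query parameters for a page's categories (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "categories",
        "pageids": str(pageid),
        "cllimit": "max",
    }
    if not include_hidden:
        # Ask the API to leave hidden categories out of the response.
        params["clshow"] = "!hidden"
    return params
```

With include_hidden=True the filter is simply omitted, so both hidden and visible categories come back.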
As mentioned in the MediaWiki API:Etiquette.
This would involve the following:
Refer to http://docs.python-guide.org/en/latest/writing/documentation/ for details on how to do the above.
The response of linkshere is huge, and it blocks execution until it returns. So instead of setting the limit to max, it would be good to set it to something reasonable and wrap it in a generator.
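The generator idea could be sketched roughly as follows. The names iter_linkshere and fetch are hypothetical, where fetch is assumed to send one request with the given parameters and return the decoded JSON. The limit of 50 is an arbitrary example of "something reasonable", and the loop assumes the default JSON layout, where pages are keyed by page id and a continue object is returned while more results remain.

```python
def iter_linkshere(pageid, fetch, limit=50):
    """Yield pages linking to pageid, one small batch per request (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "linkshere",
        "pageids": str(pageid),
        "lhlimit": str(limit),
    }
    while True:
        data = fetch(params)
        pages = data.get("query", {}).get("pages", {})
        for page in pages.values():
            for link in page.get("linkshere", []):
                yield link
        cont = data.get("continue")
        if not cont:
            break
        # Merge the continuation values into the next request.
        params = {**params, **cont}
```

A caller that only needs the first few links can stop iterating early, and no further requests are made.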