mediawiktory's People

Contributors

smostovoy, zverok

mediawiktory's Issues

Usability: shorter and wiser action construction

Current example:

response = client.
  query.
  generator(categorymembers: {title: 'Category:Countries_in_South_America', limit: 30}).
  prop(revisions: {prop: :content}).
  perform

Imaginary example:

response = client.
  query.
  generator(categorymembers: 'Category:Countries_in_South_America'). # singular values -- sets first key of hash
  limit(30). # limit is common and propagated to all sub-modules having limit parameters
  prop(revisions: :content). # again -- singular value by default is value for first key
  perform

This approach may become fragile at some point, but is it worth trying? Or is there a better approach?
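
A minimal sketch of the "singular value sets the first key" idea, assuming each submodule has a known first parameter. The `FIRST_KEYS` table and the `expand_shorthand` helper are hypothetical and not part of MediaWiktory; in a real implementation the table would presumably come from the generated module definitions.

```ruby
# Hypothetical lookup: the "first" parameter of each submodule (assumed values).
FIRST_KEYS = {
  categorymembers: :title,
  revisions: :prop
}.freeze

# Expands {categorymembers: 'Category:X'} into {categorymembers: {title: 'Category:X'}}.
# Values that are already hashes are passed through unchanged.
def expand_shorthand(modules)
  modules.to_h do |name, value|
    next [name, value] if value.is_a?(Hash)

    first_key = FIRST_KEYS.fetch(name) do
      raise ArgumentError, "no default key known for #{name}"
    end
    [name, { first_key => value }]
  end
end
```

Raising on an unknown submodule keeps the shorthand from silently guessing, which is one way to limit the fragility mentioned above.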

Client: more useful shortcuts

  • Shortcuts for popular wikis (copy from infoboxer):
MediaWiktory.wikipedia.query...
MediaWiktory.wikitravel('ru').query....
MediaWiktory.wikia('tardis', 'fr').query...
  • Default the path to /w/api.php (it is not always there, so this can't be done unconditionally), like MediaWiktory::Client.new('https://en.wikipedia.org'), with no need to specify the path.

Deal with redirects

As per these docs

The most reasonable default behavior, I assume, is to receive the content (follow redirects), but there should be an opt-out (for writing bots and other research software that targets EXACTLY the redirect logic).

YARD docs

...for everything.

YARD allows "fake" docs for auto-generated methods, so we just need to write API docs for all modules that resemble the original clearly but concisely.

Usability: runtime docs

It's not hard to generate some more code to allow things like:

client.query.params
# => [:titles, :list, :meta, :generator....]
puts client.query.help
# Prints something like this page: https://en.wikipedia.org/w/api.php?action=help&modules=query

With the multitude of actions, modules and submodules, this can be REALLY helpful.
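
One way this could look, as a rough sketch: the generator emits a parameter table next to each action, and `params`/`help` simply read from it. The `QueryAction` class and the description strings below are placeholders for illustration, not the gem's actual generated code.

```ruby
# Hypothetical generated action carrying its own parameter metadata.
class QueryAction
  # Placeholder descriptions; real ones would be taken from the API help page.
  PARAMS = {
    titles: 'A list of titles to work on.',
    list: 'Which lists to get.',
    meta: 'Which metadata to get.'
  }.freeze

  # Names of all parameters this action accepts.
  def params
    PARAMS.keys
  end

  # Human-readable help text, one parameter per line.
  def help
    PARAMS.map { |name, description| "#{name}: #{description}" }.join("\n")
  end
end
```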

Client: raw Faraday builder exposure

Something like

c = MediaWiktory::Wikipedia::Api.new do |b|
  # b is Faraday::Builder, allowing to change underlying web client, add middleware and so on
end

Client: support cookies

Cookies are important.
Need to find the cookie details in the MediaWiki docs, then implement! Use the faraday-cookie_jar gem or something.

Support multiple versions of MediaWiki

Many popular sites run a MediaWiki version lower than the latest. Some of them are even as old as 1.19 (from 2012!).
Differences between versions can be both in the available modules and in core logic (like token management).

Ideal support for multiple versions should consist of:

  • multiple generated modules lists, like lib/generated/<version>/*.rb (currently, code is generated from online help at mediawiki.org/w/api.php?action=help; for multiple versions it should be generated from MediaWiki sources, using tags?)
  • multiple clients, like MediaWiki::Client_v1_19, loading different modules
  • transparent API version discovery on client creation.

A sub-ideal solution should at least tag modules with the minimal version they require, and warn the user on an attempt to use those modules on wikis with a lower version.

NB: In older versions of MediaWiki (even in slightly older ones) query continuation logic is much more complex and inconsistent than now.
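
The "sub-ideal" variant (tagging modules with a minimal version and warning) could be sketched roughly as below. The `check_version` helper, the module name and the version numbers in the usage are made up for illustration; `Gem::Version` is used only because it already knows how to compare dotted version strings.

```ruby
# Hypothetical check: warn when a module needs a newer MediaWiki than the wiki runs.
# min_versions maps module name => minimal MediaWiki version (assumed data).
def check_version(mod, wiki_version, min_versions)
  required = min_versions[mod]
  return true if required.nil? # module is not tagged, nothing to check
  return true if Gem::Version.new(wiki_version) >= Gem::Version.new(required)

  warn "Module #{mod} requires MediaWiki #{required}+, but this wiki runs #{wiki_version}"
  false
end
```

For example, `check_version(:some_module, '1.19', some_module: '1.24')` would print a warning and return `false`.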

Specialized hi-level clients

Only draft thoughts currently, but I can imagine, e.g., some special QueryClient, which is more "semantic" than a thin API wrapper with its query, revisions, prop and so on:

c = QueryClient.new('https://en.wikipedia.org')
c.page('Argentina').wikitext
c.pages('Argentina', 'Bolivia')
c.category('South America').count
c.category('South America').pages
c.search(prefix: 'List of').count
# ...and so on

In the same way, a specialized EditClient or AdminWatchClient could be implemented.

Client: default params

According to this page, many params, like "assert", "origin", "maxage", and so on, should be passed at client creation (and can optionally be overridden on each request).

Like this:

client = MediaWiktory::Client.
  new('https://en.wikipedia.org', assert: 'user') # default value for each request

client.query.
  ...
  perform(assert: 'bot') # overriding for a specific action
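
Internally this is probably just a layered merge: client defaults first, then the action's own params, then per-request overrides. A small sketch, where `SketchClient` and `request_params` are hypothetical names and not the gem's API:

```ruby
# Hypothetical client that remembers default params given at creation.
class SketchClient
  def initialize(url, defaults = {})
    @url = url
    @defaults = defaults
  end

  # Later hashes win: defaults < action params < per-request overrides.
  def request_params(action_params, overrides = {})
    @defaults.merge(action_params).merge(overrides)
  end
end

client = SketchClient.new('https://en.wikipedia.org', assert: 'user')
client.request_params({ action: 'query' })                # assert stays 'user'
client.request_params({ action: 'query' }, assert: 'bot') # assert becomes 'bot'
```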

Merging for keys other than pages

For example, wikipedia.query.list(prefixsearch: {search: 'List of cities in'}) will return the key query.prefixsearch, and this key's content should be merged on subsequent (continued) calls.
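
A generic way to do this, sketched under the assumption that the raw responses are plain nested hashes and that list-like values arrive as arrays: merge recursively and concatenate arrays. The `merge_continued` helper is hypothetical.

```ruby
# Hypothetical deep merge of a continued response into the first one.
# Nested hashes are merged recursively, arrays are concatenated,
# and for any other value the continued response wins.
def merge_continued(first, continued)
  first.merge(continued) do |_key, old, new|
    if old.is_a?(Hash) && new.is_a?(Hash)
      merge_continued(old, new)
    elsif old.is_a?(Array) && new.is_a?(Array)
      old + new
    else
      new
    end
  end
end
```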

How to get a list of song search links?

Hi,
Not an issue but a question. I'm doing my final project for a coding school and making an app with Rails backend/React frontend. I would like to allow a user to search a song title and return matches based on that search from Mediawiki API. I've been playing with some of the parameters but can't quite hone in on that specific search query. Would also be great if it returned an image with the link. Can anyone point me in the right direction? I've also been messing with the Mediawiki sandbox but also having trouble honing in with query search.
Thanks,
D

Bug with pages merging

w = MediaWiktory::Client.new('https://en.wikipedia.org/w/api.php')
res = w.query.generator(prefixsearch: {search: 'Guardians', limit: 100}).prop(revisions: {prop: :content}, info: {prop: :url}).redirects(true).perform
res.continue!
# ArgumentError: Unknown key in continued page: revisions
#      from /media/storage/work/gems/mediawiktory/lib/mediawiktory/page.rb:32:in `block in merge!'

If you look at the raw structure (page 1, page), you'll notice some strange things: for example, one page is described only by pageid+title, and its content is on the continuation page. So the current merging algorithm is not sufficient.
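
A more tolerant merge could accept keys that first appear on a continuation page instead of raising. The sketch below assumes pages arrive as a hash keyed by page id, with each page a plain hash; `merge_pages` and `merge_page` are hypothetical helpers, not the code in page.rb.

```ruby
# Hypothetical: merge one page's data from a continuation into what we have.
# New keys (like 'revisions') are simply added; arrays are concatenated.
def merge_page(old, new)
  old.merge(new) { |_key, a, b| a.is_a?(Array) && b.is_a?(Array) ? a + b : b }
end

# Hypothetical: merge all continued pages into the already known pages.
def merge_pages(pages, continued_pages)
  continued_pages.each_with_object(pages.dup) do |(id, page), merged|
    merged[id] = merged.key?(id) ? merge_page(merged[id], page) : page
  end
end
```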

Generator: `--only action1,action2,action3`

Many clients do not require all 100+ actions, so the generator could be "optimized" for them with a command-line argument selecting which actions to generate. Less code, fewer classes required, less memory consumption and so on.
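
Parsing such an option is straightforward with Ruby's standard OptionParser, which can split comma-separated lists itself. A sketch, where `parse_generator_options` is a hypothetical name and the generator's real entry point may look different:

```ruby
require 'optparse'

# Hypothetical option parsing for the generator script.
# Returns {only: nil} when --only is absent, meaning "generate everything".
def parse_generator_options(argv)
  options = { only: nil }
  OptionParser.new do |opts|
    opts.on('--only a,b,c', Array, 'Generate code only for the listed actions') do |list|
      options[:only] = list.map(&:to_sym)
    end
  end.parse(argv)
  options
end
```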

Better errors processing

Follow-up of #8

  • Don't lose information from the API:
    • API response: {"error"=>{"code"=>"gcmmissingparam", "info"=>"One of the parameters cmtitle, cmpageid is required", "*"=>"See https://en.wikipedia.org/w/api.php for API usage"}}
    • Response::Error currently preserves only the "info" field
  • Maybe try to investigate/parse the errors; e.g., here we have the gcm prefix in the error code and cm-something params in the message. In the best case, MediaWiktory could derive from it something like "params are missing for generator categorymembers, param list: title, pageid" and return this information in structured form. Or maybe it is completely superfluous :)
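
For the first point, a sketch of an error class that keeps everything the API sent. `ApiError` is a hypothetical name (the gem's own class is Response::Error), and the input is assumed to be the hash found under the "error" key.

```ruby
# Hypothetical error class preserving the whole "error" hash from the API.
class ApiError < StandardError
  attr_reader :code, :info, :details

  def initialize(error)
    @code = error['code']
    @info = error['info']
    # Everything besides code/info (like the "*" usage hint) is kept as-is.
    @details = error.reject { |key, _| %w[code info].include?(key) }
    super("#{@code}: #{@info}")
  end
end
```

With this, a `rescue ApiError => e` block can branch on `e.code` instead of matching on the message text.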

Usability: better support for namespaces

Currently, we have:

  • in request: query.generator(:search).namespace(0)
  • in response, page description: {..., namespace: 0, ...}

It is not that humane. Potentially, Api should introspect which namespaces the current wiki has and provide some constants or shortcuts for more obvious work with them, instead of their ids.

Introspection could be done at the code generation stage.
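
The result of such introspection could be as simple as a generated lookup table. In the sketch below the table is written by hand with a few standard MediaWiki namespace ids; the `NAMESPACES` constant and `namespace_id` helper are hypothetical, and a real table would be generated per wiki.

```ruby
# Hand-written sample of what a generated namespace table could look like.
NAMESPACES = {
  main: 0,
  talk: 1,
  user: 2,
  file: 6,
  category: 14
}.freeze

# Accepts either a numeric id or a symbolic name; raises KeyError on unknown names.
def namespace_id(namespace)
  return namespace if namespace.is_a?(Integer)

  NAMESPACES.fetch(namespace)
end
```

A request could then read `query.generator(:search).namespace(namespace_id(:category))`, keeping the numeric form available for custom namespaces.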

Gem quality: badges

Minimum:

  • gem version;
  • Travis CI.

Better:

  • code climate;
  • gemnasium (whether dependencies are up to date);
  • test coverage;
  • docs coverage (? with InchCI).

Of course, the values in the "better" section badges should become reasonable before publishing!

See also https://github.com/badges/shields for a list of useful badges.

Token management

All data-modifying actions require some dance with "tokens". It should be done transparently for the user.
See https://www.mediawiki.org/wiki/API:Tokens

Note that token management behaves differently in:

  • versions < 1.20
  • versions 1.20-1.23
  • versions 1.24+

See also #9 about version compatibility.

NB: The current way of doing "manual" token management is:

token = api.query.meta(:tokens).response.dig('tokens', 'csrftoken')
response = api.edit.title('Wikipedia:Sandbox').text("Test '''me''', MediaWiktory!").token(token).response

It is, in fact, not that bad, and probably not worth over-automating at all.

Params: validations

An action should have a method for parameter validation, and call it automatically before perform. Possible validations:

  • parameter types;
  • parameter limitations (like "50 or less" for numerical params), including complex ones, like "50 for users, 500 for bots" (and the MediaWiktory user can explicitly say "my code has bot rights");
  • enforcing required params and param combinations, including complex ones (param A or B is required; params A and B are both required; param A is required if param B is present, and prohibited otherwise).
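
Two of these checks, sketched as standalone functions. The names `validate_limit` and `validate_one_of` and the `LIMITS` table are hypothetical; the 50/500 figures are the example from the list above, not values read from the API.

```ruby
# Example limits from the text: 50 for regular users, 500 for bots.
LIMITS = { user: 50, bot: 500 }.freeze

# Type and range check for a numeric limit; `rights` is declared by the caller.
def validate_limit(value, rights: :user)
  max = LIMITS.fetch(rights)
  raise ArgumentError, 'limit must be an Integer' unless value.is_a?(Integer)
  raise ArgumentError, "limit must be between 1 and #{max}" unless (1..max).cover?(value)

  value
end

# "Param A or B is required": at least one of the keys must be present.
def validate_one_of(params, *keys)
  present = keys.select { |key| params.key?(key) }
  raise ArgumentError, "one of #{keys.join(', ')} is required" if present.empty?

  present
end
```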

Response: cleaner access to deeply buried values

For example, if you want page content, currently you'll do something like this: pages.first.revisions.first.content['*'], which is kinda awful.

I haven't figured out yet what could be done here.
Things should become simpler, but magic should be avoided.

Response: errors and warning processing

If the response contains an errors key, an error should be raised.
If the response contains warnings, do nothing by default, but allow easy access to the warnings (and maybe have a mode for printing the warnings?).

Editing: protection against conflicts

From here:

If you want to protect against edit conflicts (which is wise), you also need to get the timestamp of the last revision

MediaWiktory could automate it (check that you are still editing the last revision), with a possible opt-out.

Usage examples

Comprehensive and "real" usage examples should be written for several use cases:

  • useful bot (watching pages; fixing pages in batches);
  • information extraction (gathering pages and information about them, or images, or...);
  • research (gathering complicated statistical data like "how many edits were made in the last month by unregistered users, split by major categories, size of edit and time of day");
  • alternative client ("casual" usage like "look for a page, occasionally make small edits")
    ...

Usability: Page class

Page is one of the main concepts in MediaWiki content, and having it as a Hashie is a bit of an under-implementation!

Better warnings processing

Follow-up of #8

  • Response#warnings? (or Response#has_warnings?)
  • Option to print warnings while performing actions: both for the entire client and for a specific action
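
A sketch of both points together. It assumes warnings arrive as a hash of module name to a hash with a "*" message, which should be checked against the actual API output; `SketchResponse` and its `print_warnings` option are hypothetical.

```ruby
# Hypothetical response wrapper exposing warnings.
class SketchResponse
  attr_reader :warnings

  # raw is the parsed API response; print_warnings mimics the proposed option.
  def initialize(raw, print_warnings: false)
    @raw = raw
    # Assumed shape: {"warnings" => {"query" => {"*" => "message"}}}
    @warnings = raw.fetch('warnings', {}).map { |mod, body| "#{mod}: #{body['*']}" }
    @warnings.each { |warning| warn(warning) } if print_warnings
  end

  def warnings?
    !@warnings.empty?
  end
end
```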

Usability: inspect

There should be a pretty inspect for both Action and Response: informative yet concise.

Generator: Enhanced API parser/validator

Examples of uncovered cases:

  • To specify all values, use *.
  • ...A "csrf" token retrieved from action=query&meta=tokens
    • should probably be converted to "Retrieved by api.query.meta(:tokens)"
  • Flags: read/write rights
  • This module is deprecated in favor of action=query&meta=tokens
  • Sortkey to start listing from, as returned by cmprop=sortkey. Can only be used with cmsort=sortkey.

It would also be nice to:

  • have links in each module to the source docs
  • have enhanced docs (and probably even shorter APIs) for parameter-less modules; there are a few.

Client: follow HTTP redirects

Currently MediaWiktory will fail on an attempt to get something from http://en.wikipedia.org because of the redirect to HTTPS. It should be handled transparently.

Params: cleaner work with Enums

  • Enum values are always converted to symbols
  • A parameter expecting a list of enums should accept a single symbol, like prop: :categories (currently, the developer is required to write prop: [:categories], which is dumb)
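
Both rules fit in one small normalization step using Ruby's built-in `Array()` conversion. The `normalize_enum_list` helper is a hypothetical name for wherever the param handling would do this.

```ruby
# Hypothetical normalization for list-of-enum parameters:
# accepts a single value or an array, always returns an array of symbols.
def normalize_enum_list(value)
  Array(value).map(&:to_sym)
end

normalize_enum_list(:categories)        # single symbol is wrapped
normalize_enum_list(%w[categories info]) # strings are converted to symbols
```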
