
metacpan-api's Issues

Github watchers

Track # of watchers of Github repos for distributions. Possibly also track changes over time.

MetaCPAN Status Page

It would be helpful to have a status page right on www.metacpan.org or at www.metacpan.org/status. We could include stats on what is currently in the index (# of dists, authors etc.). We could also list our module coverage, whether the API is currently online etc.

Perhaps also a few sample stats on how many PAUSE authors in the index have listed their Github accounts etc.

CPANTS

Pass/Fail/Unknown data needs to be added to dist info.

fix ::Plack::Source to use Archive::Any

::Plack::Source is used to extract single files from a tarball and return the source. Right now it does that only for tar.gz files.
We might want to consider extracting the whole tarball and keeping it around instead of extracting only a single file. We can then replace http://cpansearch.perl.org/src/ with our service too (i.e. directory browsing). We won't have to extract all CPAN tarballs, only those that are requested.

Dist Comments

Commenting on dists (with version #) could be added once the Twitter auth etc is stable.

Bookmarking

You should be able to bookmark modules/dists for viewing later. This is not the same as marking as a favourite.

Github Issues

The index should keep current data on # of open issues for repositories referred to in META.yml files. Probably open issues and issues tagged as "bug".

Store user searches in ElasticSearch

curl -XPUT http://api.metacpan.org/search/release/latest -d '{"query":{"match_all":{}},"filter":{"and":[{"term":{"status":"latest"}},{"term":{"release.distribution.raw":"$1"}}]}}'
curl -XGET http://api.metacpan.org/search/release/latest/Net-FreshBooks-API

This API allows users to store custom searches in MetaCPAN and execute them with parameters.
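
As a rough illustration of the idea, here is a minimal Python sketch of how a stored search might be filled in before execution. It assumes that "$1" in the stored query is a positional placeholder replaced by the first path segment after the search name; the function name render_stored_search is hypothetical and not part of the API.

```python
import json
import re


def render_stored_search(template, params):
    """Fill $1, $2, ... placeholders in a stored JSON query.

    Substitution happens on the parsed structure (not the raw string),
    so a parameter cannot break out of its JSON string value.
    """
    def fill(match):
        index = int(match.group(1))
        if not 1 <= index <= len(params):
            raise ValueError("no value supplied for $%d" % index)
        return params[index - 1]

    def walk(node):
        if isinstance(node, str):
            return re.sub(r"\$(\d+)", fill, node)
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        return node

    return walk(json.loads(template))


# The query stored by the PUT request above, then executed for one dist.
stored = ('{"query":{"match_all":{}},"filter":{"and":['
          '{"term":{"status":"latest"}},'
          '{"term":{"release.distribution.raw":"$1"}}]}}')
query = render_stored_search(stored, ["Net-FreshBooks-API"])
```

The rendered query is what would then be sent to ElasticSearch.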

Package Download and Page View Analytics

We can easily provide page view stats as an indicator of popular modules. We can do the same for dists downloaded via HTTP from our CPAN mirror.

We may want to look at letting other trusted sources report download statistics so that we can aggregate this information. One way would be to encourage people to configure their favourite command line installer to use our CPAN mirror via HTTP. It would be a very easy (and painless) way for people to contribute information back to the system.

POD Translation

Now that POD is in the index, it would be helpful to have a translation layer in the API. For example, /pod/Moose should return straight POD. /pod2html/Moose would return nicely formatted HTML and /pod2textile/Moose would return textile etc.
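
One possible way to route these URLs is sketched below in Python. The FORMATS table and the parse_pod_route function are hypothetical names for illustration; only the three URL prefixes come from the description above, and the actual rendering step is left out.

```python
import re

# Hypothetical mapping from URL prefix to output format.
FORMATS = {
    "pod": "pod",
    "pod2html": "html",
    "pod2textile": "textile",
}


def parse_pod_route(path):
    """Split '/pod2html/Moose' into ('html', 'Moose').

    Returns None for paths that do not name a known translation.
    """
    match = re.fullmatch(r"/(pod(?:2[a-z]+)?)/(.+)", path)
    if match is None or match.group(1) not in FORMATS:
        return None
    return FORMATS[match.group(1)], match.group(2)
```

Adding another output format would then only mean adding an entry to the table and a renderer for it.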

Author Updating via Web App

Once the auth system is finished, we'll need to set up a site where anyone can create an account via a Twitter login. If the user has a PAUSE id, they can request one or more author roles to be added to their account. We will send an authentication email to [email protected]. Once authenticated, they'll be able to use the web app to update their author info.

Going forward authors could add metadata on modules and dists as well. Dists could be marked as:

Deprecated
Unloved (In need of new maintainer)
Looking for co-maintainers

Faster Index Updates

Frepan seems to be able to index dists within a few minutes of release. We need to explore how to do this as well. Frequent rsyncs would be overkill. We're probably fine with the daily rsync, but if we can regularly process a feed of released dists and then fetch those dists manually for indexing, that would likely get us there.

Proxy bug reports through metacpan

Have a centralized endpoint to submit bugs to the appropriate place.

Use cases:

  • Module on github

    • Reporter on github:

      Report the bug directly to the github repo.

    • Reporter not on github:

      Report the bug using the metacpan identity on github and send the reporter a link to the issue if they supply an email address.

  • Module not on github

    Report to RT by sending an email (using the reporter's email address as sender) or redirect the user to RT (if they wish).
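
The use cases above boil down to a small decision table. Here is a minimal Python sketch of it; the function name and the returned labels are made up for illustration, and nothing here talks to github or RT.

```python
def route_bug_report(module_on_github, reporter_on_github,
                     reporter_wants_rt_site=False):
    """Pick where a bug report should go, following the use cases above.

    The returned labels are hypothetical; a real endpoint would map
    them to the github API, an outgoing email, or an HTTP redirect.
    """
    if module_on_github:
        if reporter_on_github:
            # File the issue under the reporter's own github identity.
            return "github-as-reporter"
        # File under the metacpan identity; link is mailed if an
        # email address was supplied.
        return "github-as-metacpan"
    if reporter_wants_rt_site:
        return "rt-redirect"
    # Default for non-github modules: email RT on the reporter's behalf.
    return "rt-email"
```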

Up for discussion :-)

Which modules are in core?

Would be helpful to tag which modules are actually in core. Also helpful to know which modules are "dual-life".

perldoc

Looks like those files can be found in something like: local/lib/perl5/5.12.2/pods. They should be added to the index after the actual module POD is also available.

Documentation vs. Module vs. Package

Some ideas on how to differentiate between a module and documentation.

These are some examples that caused me some headache. Please feel free to comment and look for inconsistencies on the CPAN.

Example:

http://cpansearch.perl.org/src/RJBS/perl-5.12.3/pod/perltoot.pod

  • .pod extension but no package declaration
    • set name to perltoot (as mentioned in the NAME section) and do not mark as module

http://cpansearch.perl.org/src/DOY/Moose-2.0000/lib/Moose/Manual.pod

  • .pod extension with package declaration Moose::Manual
    • set name to Moose::Manual but do not mark as module since it has a .pod extension

http://search.cpan.org/~perler/MooseX-Attribute-Deflator-2.1.2/README.pod

  • .pod extension with no package declaration and no NAME section
    • set name to README.pod (i.e. full path inside the tarball)

http://cpansearch.perl.org/src/MLEHMANN/AnyEvent-5.31/lib/AnyEvent.pm

  • multiple package declarations in one file (AnyEvent, AE, ...)
  • NAME section which says "AnyEvent"
    • set name to AnyEvent, AE, ..., thus users who are looking for AE still find the correct module

http://cpansearch.perl.org/src/MLEHMANN/AnyEvent-5.31/lib/AE.pm

  • NAME section says "AE"
  • contains a package AE declaration
    • set name to AE, users who search for AE will receive both files (AnyEvent.pm and AE.pm)
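
The rules proposed in the examples above could be sketched roughly like this in Python. The function classify_file and its arguments are hypothetical; it assumes the package declarations and the NAME section have already been parsed out of the file. The fallback for a non-.pod file without any package declaration is not covered by the examples and is an assumption.

```python
import os


def classify_file(path, packages, name_section):
    """Return (names, is_module) for one file of a distribution.

    path         -- path of the file inside the tarball
    packages     -- package names declared in the file, in order
    name_section -- name from the POD NAME section, or None
    """
    if os.path.splitext(path)[1] == ".pod":
        # .pod files are never marked as modules.
        if packages:
            return [packages[0]], False
        if name_section:
            return [name_section], False
        # No package and no NAME: fall back to the path in the tarball.
        return [path], False
    if packages:
        # Index the file under every package it declares, so a search
        # for AE finds both AnyEvent.pm and AE.pm.
        return list(packages), True
    # Assumption: treat anything else as documentation.
    return [name_section or path], False
```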

.bz2 tarballs are not being indexed

This seems to be a problem with Archive::Any:

$ bin/metacpan release ~/CPAN/authors/id/R/RJ/RJBS/perl-5.12.3.tar.bz2
2011/04/25 12:08:03 I release: Processing /Users/mo/CPAN/authors/id/R/RJ/RJBS/perl-5.12.3.tar.bz2
No handler available for type 'application/x-bzip2' at /Users/mo/perl5/perlbrew/perls/perl-5.12.3/lib/site_perl/5.12.3/Archive/Any.pm line 179.
2011/04/25 12:08:03 F release: Can't call method "is_naughty" on an undefined value at /Users/mo/Documents/workspace/cpan-api/bin/../lib/MetaCPAN/Script/Release.pm line 129.

Module Dependents

Add data on dependents to the index, much as cpan-mangler uses this information.
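
Dependents are the inverse of the dependency data that is already known per dist. A minimal Python sketch of that inversion follows; the function name and the input shape (a mapping from dist name to the list of dists it depends on) are assumptions for illustration.

```python
from collections import defaultdict


def build_dependents(dependencies):
    """Invert {dist: [dists it depends on]} into {dist: [dependents]}."""
    dependents = defaultdict(set)
    for dist, requires in dependencies.items():
        for required in requires:
            dependents[required].add(dist)
    # Sorted lists make the result stable and easy to store.
    return {dist: sorted(users) for dist, users in dependents.items()}
```

For example, if hypothetical dists A and D both depend on B, then B's entry in the result lists A and D.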

Monitor Services (Nagios?)

We should set up Nagios or something similar to send alerts if/when services go down, like the API, the CPAN mirror (if, for example, there's an issue with the Middleware) and cpanvote.

Add Complex ElasticSearch Queries to API Docs

The REST API is documented well enough, but we don't have any examples of how people can run more complex queries on the index. This could take the form of a blog post or a page in the wiki.

MetaCPAN Road Map

We need to map out where the project is going in the coming months so that we have something to focus on and also so that we have a clear direction for anyone who wishes to contribute. The wiki would probably be a good home for this type of document.

search.metacpan.org

Basic Info: a sample search page as proof of concept for users to use the api.

Implementation: my idea is to create a single-page javascript engine for using the api, pulling data via ajax and managing the display of the search results/pod/source etc. through some js magic. No server required for running it.

Fix /source endpoint to extract whole archive and support bz2

See https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Plack/Source.pm

Currently, we open the tarball, extract the requested file, store it in a temp directory and serve it. Subsequent requests for that file are handled directly from the temp directory.

What we should do instead is extract the whole tarball, so we let users browse the directory and can do diffs more easily.

Also, currently we only extract .tar.gz files. Use Archive::Any instead.
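
To illustrate the extract-once approach, here is a minimal sketch in Python rather than Perl. It uses Python's tarfile module in place of Archive::Any, since tarfile detects gzip and bzip2 compression on its own. The function name extract_once and the cache layout are assumptions, not the existing implementation.

```python
import os
import tarfile


def extract_once(archive_path, cache_root):
    """Extract a whole tarball into a cache directory, only once.

    Returns the directory holding the extracted files. Later calls for
    the same archive return the cached directory without re-extracting.
    """
    dest = os.path.join(cache_root, os.path.basename(archive_path) + ".d")
    if os.path.isdir(dest):
        return dest

    partial = dest + ".partial"
    os.makedirs(partial, exist_ok=True)
    root = os.path.realpath(partial)

    # mode "r:*" lets tarfile detect .tar.gz and .tar.bz2 transparently.
    with tarfile.open(archive_path, mode="r:*") as tar:
        # Only regular files and directories; links are skipped here.
        members = [m for m in tar.getmembers() if m.isfile() or m.isdir()]
        for member in members:
            target = os.path.realpath(os.path.join(root, member.name))
            if target != root and not target.startswith(root + os.sep):
                raise ValueError("unsafe path in archive: " + member.name)
        tar.extractall(partial, members=members)

    # Rename last, so a half-extracted directory is never served.
    os.rename(partial, dest)
    return dest
```

With the whole tree on disk, directory browsing and diffs between releases become plain filesystem operations.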

IRC Channel

We need an IRC channel for general connectedness re: entire project.

Document general architecture

rafl:

i was wondering. do you guys have docs giving a brief architectural overview of things?

moonk: feel free to just explain things to me over beer. i'll be happy to volunteer to write it down

Don't forget to mention:

  • How are the identifiers built?
  • What role does ElasticSearchX::Model play? (MetaCPAN::Model namespace)
  • MetaCPAN::Script:: namespace
  • MetaCPAN::Plack:: namespace
  • $ bin/metacpan

StackOverflow

Integrate data available from the StackOverflow API. Questions/threads about modules etc.

Here are JSON results for a search against their API for "Data::Dumper":

curl --compressed -XGET "http://api.stackoverflow.com/1.0/search?intitle=Data::Dumper&key=EByZgXQ_-U-DuGKS38yYjA"

The 'key' argument is an API key that I've set up specifically for CPAN-API. Here's more info on their API keys if you're interested (we'll probably want to implement something similar):

http://stackapps.com/questions/67/how-api-keys-work

And here's the full API documentation:

http://stackapps.com/questions/1/api-documentation-and-help

SSL Cert

Free SSL certs at http://cert.startcom.org/ (work in most browsers and even iOS)

We should enable SSL for the API and encourage its use, especially if we start personalization (i.e. session cookies etc. should be encrypted).

Tagging

Dists and modules should be taggable. Perhaps authors as well. There needs to be discussion on the UI and about standard and user-defined tags.

Favourite Modules/Dists

Favourites are not the same as bookmarks. Bookmarks indicate you want to revisit the docs. Favourites indicate some satisfaction with the product.
