Issues for metacpan-api, from metacpan

POD parse module/release abstract

Some abstracts (e.g. http://search.cpan.org/perldoc?MooseX::App::Cmd) contain POD in the abstract.
Not sure how we should handle this. Either we add a property "abstract_html" or we remove the POD from the abstract altogether. IMHO, the abstract property should not contain any markup (neither POD nor HTML).
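
If we go with removing the markup, a minimal sketch of the idea could look like the following. It is written in Python purely for illustration, the function name strip_pod is made up, and it only handles simple single-bracket formatting codes; E<...> escapes and doubled brackets such as C<< ... >> are left untouched.

```python
import re

# Matches an innermost single-bracket POD formatting code such as C<text>.
# Assumption: only B, C, F, I, L and S codes are stripped; E<...> escapes and
# doubled brackets (C<< text >>) are deliberately not handled in this sketch.
_POD_CODE = re.compile(r"[BCFILS]<([^<>]*)>")


def strip_pod(abstract: str) -> str:
    """Return the abstract with simple POD formatting codes removed."""
    previous = None
    # Repeat until nothing changes so that nested codes are unwrapped too.
    while previous != abstract:
        previous = abstract
        abstract = _POD_CODE.sub(r"\1", abstract)
    return abstract
```

Working from the innermost code outwards keeps the regular expression simple and still copes with nesting such as B<bold I<both>>.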

Github watchers

Track # of watchers of Github repos for distributions. Possibly also track changes over time.

MetaCPAN Status Page

It would be helpful to have a status page right on www.metacpan.org or www.metacpan.org/status. We could include stats on what is currently in the index (# of dists, authors etc.). We could also list our module coverage, whether the API is currently online etc.

Perhaps also a few sample stats on how many PAUSE authors in the index have listed their Github accounts etc.

Match search.cpan.org api

http://search.cpan.org/faq.html#Is_there_a_API?

Would help if anyone ever wants to migrate.

CPANTS

Pass/Fail/Unknown data needs to be added to dist info.

Automate tagging

Use ElasticSearch's percolate feature to automate the tagging of modules:

http://www.elasticsearch.org/guide/reference/api/percolate.html

A release document is sent to the ES server and ES returns a list of matching tags.

Examples:

abstract contains "deprecated" => tag as deprecated
dependency contains XSLoader => tag as XS code
dependency contains Moose ...
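
To make the examples above concrete, here is a small local stand-in for the matching step. It does not call the ElasticSearch percolate API; it only mimics "document in, list of tags out". The shape of the release document (an "abstract" string and a "dependency" list of module names) is an assumption for illustration.

```python
from typing import Callable

# Each rule pairs a tag with a predicate over a release document.
# Assumption: a release is a dict with an "abstract" string and a
# "dependency" list of module names; the real mapping may differ.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("deprecated", lambda r: "deprecated" in (r.get("abstract") or "").lower()),
    ("XS code", lambda r: "XSLoader" in (r.get("dependency") or [])),
]


def matching_tags(release: dict) -> list[str]:
    """Return the tags whose rule matches the given release document."""
    return [tag for tag, matches in RULES if matches(release)]
```

With percolation the rules would live in ES as registered queries instead of Python lambdas, but the contract for the indexer would stay the same.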

fix ::Plack::Source to use Archive::Any

::Plack::Source is used to extract single files from a tarball and return the source. Right now it does that only for tar.gz files.
We might want to consider extracting the whole tarball and keeping that around instead of extracting only a single file. We can then replace http://cpansearch.perl.org/src/ with our service too (i.e. directory browsing). We won't have to extract all CPAN tarballs, only those that are requested.

Dist Comments

Commenting on dists (with version #) could be added once the Twitter auth etc is stable.

Bookmarking

You should be able to bookmark modules/dists for viewing later. This is not the same as marking as a favourite.

Index CPAN scripts

See http://www.cpan.org/scripts/ and http://www.cpan.org/scripts/submitting.html.

Scripts are single Perl files that can be uploaded to PAUSE. search.cpan.org doesn't index them and they don't seem to be widely used.

Schwartz Factor

Something like this could also be included in the index: http://babyl.dyndns.org/techblog/entry/schwartz-factor

Github Issues

The index should keep current data on # of open issues for repositories referred to in META.yml files. Probably open issues and issues tagged as "bug".
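
A rough sketch of the first step, turning the repository URL from META.yml into a GitHub API URL, is below. The helper names are made up. As far as I know, the repository resource returned by GET /repos/{owner}/{repo} carries an open_issues_count field (which also counts pull requests); issues tagged "bug" would need a separate query that is not shown here.

```python
import re

# Captures owner and repo from URLs such as
# git://github.com/owner/repo.git or https://github.com/owner/repo
_GITHUB_REPO = re.compile(r"github\.com[:/]([^/\s]+)/([^/\s]+?)(?:\.git)?/?$")


def github_api_url(repository_url: str):
    """Map a META.yml repository URL to the GitHub API repo URL, or None."""
    match = _GITHUB_REPO.search(repository_url)
    if match is None:
        return None  # not a GitHub repository (or an unexpected URL shape)
    owner, repo = match.groups()
    return f"https://api.github.com/repos/{owner}/{repo}"


def open_issue_count(repo_json: dict) -> int:
    """Pull the open issue count out of a parsed repository response."""
    return int(repo_json["open_issues_count"])
```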

Store user searches in ElasticSearch

curl -XPUT http://api.metacpan.org/search/release/latest -d '{"query":{"match_all":{}},"filter":{"and":[{"term":{"status":"latest"}},{"term":{"release.distribution.raw":"$1"}}]}}'
curl -XGET http://api.metacpan.org/search/release/latest/Net-FreshBooks-API

This API allows users to store custom searches in MetaCPAN and execute them with parameters.
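
One way the parameter step could work is sketched below: the stored query is parsed as JSON and positional placeholders ($1, $2, ...) are filled in on the parsed structure. The function name and the decision to substitute after parsing are assumptions, not a description of an existing implementation.

```python
import json
import re

_PLACEHOLDER = re.compile(r"\$(\d+)")


def fill_placeholders(node, args):
    """Return a copy of the stored query with $1, $2, ... replaced by args."""
    if isinstance(node, dict):
        return {key: fill_placeholders(value, args) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(value, args) for value in node]
    if isinstance(node, str):
        # $1 maps to args[0], $2 to args[1], and so on.
        return _PLACEHOLDER.sub(lambda m: args[int(m.group(1)) - 1], node)
    return node


# The stored search from the curl -XPUT example above.
STORED = json.loads(
    '{"query":{"match_all":{}},"filter":{"and":[{"term":{"status":"latest"}},'
    '{"term":{"release.distribution.raw":"$1"}}]}}'
)
query = fill_placeholders(STORED, ["Net-FreshBooks-API"])
```

Substituting into the parsed structure rather than into the raw JSON string means a parameter containing quotes or braces cannot change the shape of the query.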

Fix YAML parsing

CPAN::Meta fails to parse some META.yml files

02packages for specific date

So tools could be built to say:

"install DBIx::Simple with the knowledge of 2001-02-12"

Package Download and Page View Analytics

We can easily provide page view stats as an indicator of popular modules. We can do the same for dists downloaded via HTTP from our CPAN mirror.

We may want to look at letting other trusted sources report download statistics so that we can aggregate this information. One way would be to encourage people to configure their favourite command line installer to use our CPAN mirror via HTTP. It would be a very easy (and painless) way for people to contribute information back to the system.

Add PerlMongers groups to index

Would be very nice to have the PerlMongers group info in the ES index.

http://www.pm.org/groups/perl_mongers.xm

That would mean we'd only need the .pm group name in the author info rather than accompanying links etc.

MetaCPAN::Consumer::...

A Perl API on top of the HTTP API, for testing and also for authors of consumers.

It might also help mask API incompatibilities, at least in Perl space.

http://search.cpan.org/~xsawyerx/MetaCPAN-API-0.02/ does exist - but might be an idea to have an official integrated version?

Some email addresses are marked as CENSORED

01mailrc.txt.gz contains CENSORED email addresses.
We should probably replace them with [email protected] or null them

POD Translation

Now that POD is in the index, it would be helpful to have a translation layer in the API. For example, /pod/Moose should return straight POD. /pod2html/Moose would return nicely formatted HTML and /pod2textile/Moose would return textile etc.
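
The routing part of this could be as small as the sketch below, which only maps a path to an output format and a module name. The conversion itself is not shown, and the FORMATS table and function name are hypothetical.

```python
# Assumption: the first path segment selects the output format.
FORMATS = {"pod": "pod", "pod2html": "html", "pod2textile": "textile"}


def parse_pod_route(path: str):
    """Split /pod2html/Moose into ("html", "Moose"); return None if unknown."""
    parts = path.strip("/").split("/", 1)
    if len(parts) != 2 or parts[0] not in FORMATS or not parts[1]:
        return None
    return FORMATS[parts[0]], parts[1]
```

Adding another output format would then only mean adding an entry to the table and a converter behind it.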

Author Updating via Web App

Once the auth system is finished, we'll need to set up a site where anyone can create an account via a Twitter login. If the user has a PAUSE id, they can request one or more author roles to be added to their account. We will send an authentication email to [email protected]. Once authenticated, they'll be able to use the web app to update their author info.

Going forward authors could add metadata on modules and dists as well. Dists could be marked as:

Deprecated
Unloved (In need of new maintainer)
Looking for co-maintainers

Authentication System for Gathering Metadata from Humans

Ideally the index will eventually be expanded by allowing users to log in and tag modules, upvote, downvote etc. The architecture of this system is, as of yet, undefined.

Faster Index Updates

Frepan seems to be able to index dists within a few minutes of release. We need to explore how to do this as well. Frequent rsyncs would be overkill. We're probably fine with the daily rsync, but if we can regularly process a feed of released dists and then fetch those dists manually for indexing, that would likely get us there.

Proxy bug reports through metacpan

Have a centralized endpoint to submit bugs to the appropriate place.

Use cases:

Module on github
- Reporter on github:
Report the bug directly to the github repo
- Reporter not on github:
Report the bug using the metacpan identity on github and send the reporter a link to the issue if they supply an email address

Module not on github

Report to RT by sending an email (using the reporter's email address as sender) or redirect the user to RT (if they wish).
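
The use cases above boil down to a small decision table. A sketch, with made-up function and return value names:

```python
def route_bug_report(module_on_github: bool,
                     reporter_on_github: bool,
                     wants_rt_redirect: bool = False) -> str:
    """Decide where a bug report should go, following the use cases above."""
    if module_on_github:
        if reporter_on_github:
            # The reporter files the issue under their own github account.
            return "github_direct"
        # We file it with the metacpan identity and mail the reporter a link.
        return "github_via_metacpan_identity"
    # Module not on github: RT, either by redirect or by email on their behalf.
    return "rt_redirect" if wants_rt_redirect else "rt_email"
```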

Up for discussion :-)

Which modules are in core?

Would be helpful to tag which modules are actually in core. Also helpful to know which modules are "dual-life".

perldoc

Looks like those files can be found in something like: local/lib/perl5/5.12.2/pods. They should be added to the index after the actual module POD is also available.

search for last name of author that should match never returns

A search for stosberg gives me a progress-bar-of-death:

http://search.metacpan.org/#/author/STOSBERG

But a search for "MARKSTOS" brings up a page for me quickly, including showing that my name is "Mark Stosberg"

http://search.metacpan.org/#/author/MARKSTOS

Mark

Documentation vs. Module vs. Package

Some ideas how to differentiate between a Module and Documentation

These are some examples that caused me some headache. Please feel free to comment and look for inconsistencies on the CPAN.

Example:

http://cpansearch.perl.org/src/RJBS/perl-5.12.3/pod/perltoot.pod

.pod extension but no package declaration
- set name to perltoot (as mentioned in the NAME section) and do not mark as module

http://cpansearch.perl.org/src/DOY/Moose-2.0000/lib/Moose/Manual.pod

.pod extension with package declaration Moose::Manual
- set name to Moose::Manual but do not mark as module since it has a .pod extension

http://search.cpan.org/~perler/MooseX-Attribute-Deflator-2.1.2/README.pod

.pod extension with no package declaration and no NAME section
- set name to README.pod (i.e. full path inside the tarball)

http://cpansearch.perl.org/src/MLEHMANN/AnyEvent-5.31/lib/AnyEvent.pm

multiple package declarations in one file (AnyEvent, AE, ...)
NAME section which says "AnyEvent"
- set name to AnyEvent, AE, ..., thus users who are looking for AE still find the correct module

http://cpansearch.perl.org/src/MLEHMANN/AnyEvent-5.31/lib/AE.pm

NAME section says "AE"
contains a package AE declaration
- set name to AE, users who search for AE will receive both files (AnyEvent.pm and AE.pm)
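
The rules from these examples can be written down as a single function. This is only a sketch of my reading of them: names come from the package declarations if there are any, otherwise from the NAME section, otherwise from the path inside the tarball, and a file counts as a module only if it declares a package and does not have a .pod extension. The behaviour for a .pm file without any package declaration is not covered by the examples and is an assumption here.

```python
def classify(path: str, packages: list, name_section=None):
    """Return (names, is_module) for a file inside a release tarball."""
    if packages:
        names = list(packages)   # every declared package finds this file
    elif name_section:
        names = [name_section]   # fall back to the NAME section
    else:
        names = [path]           # last resort: full path inside the tarball
    is_module = bool(packages) and not path.endswith(".pod")
    return names, is_module
```

For example, classify("lib/AnyEvent.pm", ["AnyEvent", "AE"], "AnyEvent") yields both names and marks the file as a module, so a search for AE finds AnyEvent.pm as well as AE.pm.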

.bz2 tarballs are not being indexed

This seems to be a problem with Archive::Any:

$ bin/metacpan release ~/CPAN/authors/id/R/RJ/RJBS/perl-5.12.3.tar.bz2
2011/04/25 12:08:03 I release: Processing /Users/mo/CPAN/authors/id/R/RJ/RJBS/perl-5.12.3.tar.bz2
No handler available for type 'application/x-bzip2' at /Users/mo/perl5/perlbrew/perls/perl-5.12.3/lib/site_perl/5.12.3/Archive/Any.pm line 179.
2011/04/25 12:08:03 F release: Can't call method "is_naughty" on an undefined value at /Users/mo/Documents/workspace/cpan-api/bin/../lib/MetaCPAN/Script/Release.pm line 129.

Update author mapping and author.json files

Write a better mapping for the author meta data. Need to update all existing .json files and index them.

Module Dependents

Add data on dependents to the index, much like cpan-mangler uses this information.

Monitor Services (Nagios?)

We should set up Nagios or something similar to send alerts if/when services go down, like the API, the CPAN mirror (if, for example, there's an issue with the Middleware) and cpanvote.

Patch Module::Metadata to use Safe.pm for version evaling

Right now, our indexer uses alarm to kill Module::Metadata when it parses modules like Acme::BadExample that do a while(1) loop in the version block. But there could be more destructive code.

The code to fix this is already there:

https://github.com/andk/pause/blob/master/lib/PAUSE/mldistwatch.pm#L2560

https://github.com/andk/pause/blob/master/lib/PAUSE/mldistwatch.pm#L2688

Add Complex ElasticSearch Queries to API Docs

The REST API is documented well enough, but we don't have any examples of how people can run more complex queries on the index. This could take the form of a blog post or a page in the wiki.

MetaCPAN Road Map

We need to map out where the project is going in the coming months so that we have something to focus on and also so that we have a clear direction for anyone who wishes to contribute. The wiki would probably be a good home for this type of document.

search.metacpan.org

Basic Info: a sample search page as proof of concept for users to use the api.

Implementation: my idea is to create a single-page javascript engine for using the api, pulling data via ajax and managing the display of the search results/pod/source etc. through some js magic. No server required for running it.

Fix /source endpoint to extract whole archive and support bz2

See https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Plack/Source.pm

Currently, we open the tarball, extract the requested file, store it in a temp directory and serve it. Subsequent requests for that file are handled directly from the temp directory.

What we should do instead is extract the whole tarball, so we let users browse the directory and can do diffs more easily.

Also, currently we only extract .tar.gz files. Use Archive::Any instead.
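
For illustration, here is what "extract the whole archive once, on first request, whatever the compression" could look like. It is a Python sketch using the standard tarfile module rather than Archive::Any, and the cache layout (one directory per archive file name under cache_root) is an assumption.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, cache_root) -> Path:
    """Extract a whole .tar.gz/.tar.bz2 once and return the cached directory."""
    archive = Path(archive)
    dest = Path(cache_root) / archive.name
    if dest.is_dir():
        return dest  # already extracted by an earlier request

    # "r:*" lets tarfile detect gzip, bzip2 or no compression by itself.
    with tarfile.open(archive, mode="r:*") as tar:
        root = dest.resolve()
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Refuse members that would be written outside the cache directory.
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the stricter "data" extraction filter where tarfile supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return dest
```

Only requested tarballs ever get extracted, and directory browsing or diffs can then work on plain files under the returned directory.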

Configure ES proxy to allow params in GET requests

https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Plack/Base.pm#L64

via IRC: http://irclog.perlgeek.de/metacpan/2011-05-07#i_3689099

-> you would basically add something there
-> elsif PATH_INFO =~ _search and METHOD eq GET or something
-> and then you need to Plack::App::Proxy to the es server
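
The condition from the IRC log, spelled out as a predicate over a PSGI/WSGI-style environment. This is a Python sketch of the check only; forwarding the request to the ES server (via Plack::App::Proxy in our case) is not shown, and the function name is made up.

```python
import re


def should_proxy_to_es(environ: dict) -> bool:
    """True for GET requests whose path mentions _search."""
    return (
        environ.get("REQUEST_METHOD") == "GET"
        and re.search(r"_search", environ.get("PATH_INFO", "")) is not None
    )
```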

IRC Channel

We need an IRC channel for general connectedness re: entire project.

Document general architecture

rafl:

i was wondering. do you guys have docs giving a brief architectural overview of things?

moonk: feel free to just explain things to me over beer. i'll be happy to volunteer to write it down

Don't forget to mention:

How are the identifiers built?
What role does ElasticSearchX::Model play? (MetaCPAN::Model namespace)
MetaCPAN::Script:: namespace
MetaCPAN::Plack:: namespace
$ bin/metacpan

Implement no_index of packages and namespaces

See http://search.cpan.org/~dagolden/CPAN-Meta-2.110580/lib/CPAN/Meta/Spec.pm#no_index

API versioning

rafl:

just go with http://api.metacpan.org/$n/, where $n isa Int

but spec out that metacpan consumers should follow redirects

Each API version has its own index in ElasticSearch.
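
A sketch of how the version prefix could be resolved: a path with a leading integer maps to that version's index, and a path without one is redirected to the latest version, which is why consumers should follow redirects. The index naming scheme (cpan_v1, cpan_v2, ...) and the LATEST constant are made up for illustration.

```python
import re

_VERSIONED = re.compile(r"^/(\d+)(/.*)?$")
LATEST = 1  # hypothetical: the newest API version we serve


def resolve(path: str):
    """Map a request path to an ES index, or to a redirect target."""
    match = _VERSIONED.match(path)
    if match is None:
        # Unversioned request: send the client to the latest version.
        return ("redirect", f"/{LATEST}{path}")
    version = int(match.group(1))
    # Assumed naming scheme: one index per API version.
    return ("index", f"cpan_v{version}", match.group(2) or "/")
```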

RT Issues

Integrate RT issue counts on a per-distribution basis: http://rt.cpan.org/Public/bugs-per-dist.tsv. This obviously goes hand in hand with adding Github issue counts.

StackOverflow

Integrate data available from the StackOverflow API. Questions/threads about modules etc.

Here are JSON results for a search against their API for "Data::Dumper":

curl --compressed -XGET "http://api.stackoverflow.com/1.0/search?intitle=Data::Dumper&key=EByZgXQ_-U-DuGKS38yYjA"

The 'key' argument is an API key that I've set up specifically for CPAN-API. Here's more info on their API keys if you're interested (we'll probably want to implement something similar):

http://stackapps.com/questions/67/how-api-keys-work

And here's the full API documentation:

http://stackapps.com/questions/1/api-documentation-and-help

search.cpan.org gets them from the cpanstats SQLite database available from the development site: http://devel.cpantesters.org. The DB is updated every 6 hours, and typically search.cpan.org takes a copy once a day.

CPAN-API/search-metacpan-org#27

Favourite Modules/Dists

Favourites are not the same as bookmarks. Bookmarks indicate you want to revisit the docs. Favourites indicate some satisfaction with the product.
