metacpan / metacpan-api
A free, open API for everything you want to know about CPAN
Home Page: http://www.metacpan.org/
License: Other
Some abstracts (e.g. http://search.cpan.org/perldoc?MooseX::App::Cmd) contain POD in the abstract.
Not sure how we should handle this. Either we add a property "abstract_html" or remove the POD altogether from the abstract. IMHO, the abstract property should not contain any markup (neither pod nor html).
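The "remove the POD altogether" option could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `strip_pod` is made up here, and it handles only single-bracket formatting codes such as `C<...>` and `L<text|target>`, not `E<...>` escapes or multi-bracket forms like `C<< ... >>`.

```python
import re

# Innermost single-bracket POD formatting code, e.g. C<code> or L<Some::Module>.
_POD_CODE = re.compile(r"([A-Z])<([^<>]*)>")


def _unwrap(match: "re.Match[str]") -> str:
    code, body = match.group(1), match.group(2)
    # For links written as L<text|target>, keep only the visible text.
    if code == "L" and "|" in body:
        return body.split("|", 1)[0]
    return body


def strip_pod(abstract: str) -> str:
    """Return the abstract with simple POD formatting codes removed (hypothetical helper)."""
    previous = None
    # Repeat until nothing changes so that nested codes are unwrapped inside-out.
    while previous != abstract:
        previous = abstract
        abstract = _POD_CODE.sub(_unwrap, abstract)
    return abstract
```

With this approach the `abstract` property stays plain text, and an optional `abstract_html` could still be generated separately from the original value.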
Track # of watchers of Github repos for distributions. Possibly also track changes over time.
It would be helpful to have a status page right on www.metacpan.org or www.metacpan.org/status. We could include stats on what is currently in the index (# of dists, authors etc.). We could also list our module coverage, whether the API is currently online etc.
Perhaps also a few sample stats on how many PAUSE authors in the index have listed their Github accounts etc.
http://search.cpan.org/faq.html#Is_there_a_API?
Would help if anyone ever wants to migrate.
Pass/Fail/Unknown data needs to be added to dist info.
Use ElasticSearch's percolate feature to automate the tagging of modules:
http://www.elasticsearch.org/guide/reference/api/percolate.html
A release document is sent to the ES server and ES returns a list of matching tags.
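The idea of percolation (store the queries, then match each incoming document against them) can be sketched without a server. The snippet below is not the ElasticSearch API; it only mimics the flow in plain Python. The tag names, the rules, and the `distribution` and `dependencies` fields are assumptions made up for illustration.

```python
from typing import Callable, Dict, List

Release = Dict[str, object]

# Hypothetical tag rules. In ElasticSearch these would be registered queries;
# here each one is just a predicate over the release document.
TAG_QUERIES: Dict[str, Callable[[Release], bool]] = {
    "web": lambda r: "Plack" in r.get("dependencies", []),
    "moose": lambda r: str(r.get("distribution", "")).startswith("MooseX-"),
}


def percolate(release: Release) -> List[str]:
    """Return the tags whose stored query matches the given release document."""
    return sorted(tag for tag, matches in TAG_QUERIES.items() if matches(release))
```

Adding a new tag then means registering one more query; already indexed releases would have to be percolated again to pick it up.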
Examples:
::Plack::Source is used to extract single files from a tarball and return the source. Right now it does that only for tar.gz files.
We might want to consider extracting the whole tarball and keeping it around instead of extracting only a single file. We could then replace http://cpansearch.perl.org/src/ with our service too (i.e. directory browsing). We wouldn't have to extract all CPAN tarballs, only those that are requested.
Commenting on dists (with version #) could be added once the Twitter auth etc is stable.
You should be able to bookmark modules/dists for viewing later. This is not the same as marking as a favourite.
See http://www.cpan.org/scripts/ and http://www.cpan.org/scripts/submitting.html.
Scripts are single Perl files that can be uploaded to PAUSE. search.cpan.org doesn't index them and they don't seem to be widely used.
Something like this could also be included in the index: http://babyl.dyndns.org/techblog/entry/schwartz-factor
The index should keep current data on # of open issues for repositories referred to in META.yml files. Probably open issues and issues tagged as "bug".
curl -XPUT http://api.metacpan.org/search/release/latest -d '{"query":{"match_all":{}},"filter":{"and":[{"term":{"status":"latest"}},{"term":{"release.distribution.raw":"$1"}}]}}'
curl -XGET http://api.metacpan.org/search/release/latest/Net-FreshBooks-API
This API allows users to store custom searches in MetaCPAN and execute them with parameters.
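How the `$1` placeholder in the stored query might be filled in is sketched below. This is an assumption about the mechanism, not the actual server code: the function name `render` is hypothetical, and the sketch substitutes positional parameters only where a JSON string value is exactly `$N`.

```python
import json
import re
from typing import Any

# The stored search from the curl example above.
STORED = (
    '{"query":{"match_all":{}},"filter":{"and":['
    '{"term":{"status":"latest"}},'
    '{"term":{"release.distribution.raw":"$1"}}]}}'
)

_PLACEHOLDER = re.compile(r"\$(\d+)")


def render(template: str, *args: str) -> Any:
    """Parse the stored query and replace "$1", "$2", ... with positional arguments."""

    def substitute(node: Any) -> Any:
        if isinstance(node, dict):
            return {key: substitute(value) for key, value in node.items()}
        if isinstance(node, list):
            return [substitute(value) for value in node]
        if isinstance(node, str):
            match = _PLACEHOLDER.fullmatch(node)
            if match:
                # Placeholders are 1-based; a missing argument raises IndexError.
                return args[int(match.group(1)) - 1]
        return node

    return substitute(json.loads(template))
```

Substituting after parsing, rather than in the raw JSON text, means a parameter containing quotes or braces cannot change the structure of the query.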
CPAN::Meta fails to parse some META.yml files
So tools could be built to say:
"install DBIx::Simple with the knowledge of 2001-02-12"
We can easily provide page view stats as an indicator of popular modules. We can do the same for dists downloaded via HTTP from our CPAN mirror.
We may want to look at letting other trusted sources report download statistics so that we can aggregate this information. One way would be to encourage people to configure their favourite command line installer to use our CPAN mirror via HTTP. It would be a very easy (and painless) way for people to contribute information back to the system.
Would be very nice to have the PerlMongers group info in the ES index.
http://www.pm.org/groups/perl_mongers.xm
That would mean we'd only need the .pm group name in the author info rather than accompanying links etc.
A Perl API on top of the HTTP API, for testing and also for consumer authors.
It might also help mask API incompatibilities, at least in Perl space.
http://search.cpan.org/~xsawyerx/MetaCPAN-API-0.02/ does exist - but might be an idea to have an official integrated version?
01mailrc.txt.gz contains CENSORED email addresses.
We should probably replace them with [email protected] or null them
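A minimal sketch of the "null them" option follows. It assumes each line of 01mailrc.txt has the form `alias PAUSEID "Full Name <address>"`; the function name and the returned keys are hypothetical, and the author used in the usage example is made up.

```python
import re
from typing import Dict, Optional

# Assumed line format: alias PAUSEID "Full Name <address>"
_ALIAS = re.compile(r'^alias\s+(\S+)\s+"(.*) <([^<>]*)>"$')


def parse_mailrc_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Parse one alias line; a CENSORED address becomes None instead of a string."""
    match = _ALIAS.match(line.strip())
    if match is None:
        return None
    pauseid, name, email = match.groups()
    return {
        "pauseid": pauseid,
        "name": name,
        "email": None if email == "CENSORED" else email,
    }
```

For example, `parse_mailrc_line('alias JDOE "Jane Doe <CENSORED>"')` yields an entry whose `email` is `None`, so the literal string CENSORED never reaches the index.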
Now that POD is in the index, it would be helpful to have a translation layer in the API. For example, /pod/Moose should return straight POD. /pod2html/Moose would return nicely formatted HTML and /pod2textile/Moose would return textile etc.
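The URL scheme above could be dispatched along these lines. This is only a routing sketch: the mapping table and the function name are assumptions, and the actual POD conversion is left out.

```python
from typing import Optional, Tuple

# Hypothetical mapping from URL prefix to output format.
FORMATS = {"pod": "pod", "pod2html": "html", "pod2textile": "textile"}


def parse_pod_path(path: str) -> Optional[Tuple[str, str]]:
    """Split a path such as /pod2html/Moose into (output format, module name)."""
    parts = path.strip("/").split("/", 1)
    if len(parts) != 2 or parts[0] not in FORMATS or not parts[1]:
        return None
    return FORMATS[parts[0]], parts[1]
```

Supporting a further output format would then only need a new entry in the table plus a converter for it.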
Once the auth system is finished, we'll need to set up a site where anyone can create an account via a Twitter login. If the user has a PAUSE id, they can request one or more author roles to be added to their account. We will send an authentication email to [email protected]. Once authenticated, they'll be able to use the web app to update their author info.
Going forward authors could add metadata on modules and dists as well. Dists could be marked as:
Deprecated
Unloved (In need of new maintainer)
Looking for co-maintainers
Ideally the index will eventually be expanded by allowing users to log in and tag modules, upvote, downvote etc. The architecture of this system is, as of yet, undefined.
Frepan seems to be able to index dists within a few minutes of release. We need to explore how to do this as well. Frequent rsyncs would be overkill. We're probably fine with the daily rsync, but if we can regularly process a feed of released dists and then fetch those dists manually for indexing, that would likely get us there.
Have a centralized endpoint to submit bugs to the appropriate place.
Use cases:
Module on github
Report the bug directly to the GitHub repo
Report the bug using the MetaCPAN identity on GitHub and send the reporter a link to the issue if they supply an email address
Report to RT by sending an email (using the reporter's email address as sender) or redirect the user to RT (if they wish).
Up for discussion :-)
Would be helpful to tag which modules are actually in core. Also helpful to know which modules are "dual-life".
Looks like those files can be found in something like: local/lib/perl5/5.12.2/pods. They should be added to the index after the actual module POD is also available.
A search for stosberg gives me a progress-bar-of-death:
http://search.metacpan.org/#/author/STOSBERG
But a search for "MARKSTOS" brings up a page for me quickly, including showing that my name is "Mark Stosberg"
http://search.metacpan.org/#/author/MARKSTOS
Mark
Some ideas how to differentiate between a Module and Documentation
These are some examples that caused me some headache. Please feel free to comment and look for inconsistencies on the CPAN.
Example:
http://cpansearch.perl.org/src/RJBS/perl-5.12.3/pod/perltoot.pod
http://cpansearch.perl.org/src/DOY/Moose-2.0000/lib/Moose/Manual.pod
http://search.cpan.org/~perler/MooseX-Attribute-Deflator-2.1.2/README.pod
http://cpansearch.perl.org/src/MLEHMANN/AnyEvent-5.31/lib/AnyEvent.pm
http://cpansearch.perl.org/src/MLEHMANN/AnyEvent-5.31/lib/AE.pm
This seems to be a problem with Archive::Any:
$ bin/metacpan release ~/CPAN/authors/id/R/RJ/RJBS/perl-5.12.3.tar.bz2
2011/04/25 12:08:03 I release: Processing /Users/mo/CPAN/authors/id/R/RJ/RJBS/perl-5.12.3.tar.bz2
No handler available for type 'application/x-bzip2' at /Users/mo/perl5/perlbrew/perls/perl-5.12.3/lib/site_perl/5.12.3/Archive/Any.pm line 179.
2011/04/25 12:08:03 F release: Can't call method "is_naughty" on an undefined value at /Users/mo/Documents/workspace/cpan-api/bin/../lib/MetaCPAN/Script/Release.pm line 129.
Write a better mapping for the author meta data. Need to update all existing .json files and index them.
Add data on dependents to index. Much like cpan-mangler uses this information.
We should set up Nagios or something similar to send alerts if/when services go down, like the API, the CPAN mirror (if, for example, there's an issue with the Middleware) and cpanvote.
Right now, our indexer uses alarm to kill Module::Metadata when it parses modules like Acme::BadExample that do a while(1) loop in the version block. But there could be more destructive code.
The code to fix this is already there:
https://github.com/andk/pause/blob/master/lib/PAUSE/mldistwatch.pm#L2560
https://github.com/andk/pause/blob/master/lib/PAUSE/mldistwatch.pm#L2688
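For comparison, the general pattern of evaluating untrusted code in a separate process with a time limit looks roughly like this in Python. It is not a port of the PAUSE code linked above, the function name is made up, and a timeout alone does not protect against destructive code; that would still need real sandboxing.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(code: str, timeout: float) -> Optional[str]:
    """Run code in a child interpreter; return its stdout, or None on timeout or failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Because the work happens in a child process, an endless loop like the one in Acme::BadExample only costs the timeout, and the indexer itself keeps running.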
The REST API is documented well enough, but we don't have any examples of how people can run more complex queries on the index. This could take the form of a blog post or a page in the wiki.
We need to map out where the project is going in the coming months so that we have something to focus on and also so that we have a clear direction for anyone who wishes to contribute. The wiki would probably be a good home for this type of document.
Basic Info: a sample search page as proof of concept for users to use the api.
Implementation: my idea is to create a single-page javascript engine for using the api, pulling data via ajax and managing the display of the search results/pod/source etc. through some js magic. No server required for running it.
See https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Plack/Source.pm
Currently, we open the tarball, extract the requested file, store it in a temp directory and serve it. Subsequent requests for that file are handled directly from the temp directory.
What we should do instead is extract the whole tarball, so we let users browse the directory and can do diffs more easily.
Also, currently we only extract .tar.gz files. Use Archive::Any instead.
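The "extract the whole tarball on first request, then serve from disk" approach could be sketched as follows. This is an illustration in Python rather than the Plack code; the function name and the cache layout (one directory per archive file name) are assumptions.

```python
import tarfile
from pathlib import Path


def extracted_dir(archive: Path, cache_root: Path) -> Path:
    """Extract the whole archive once and return the directory holding its contents."""
    target = cache_root / archive.name
    if target.is_dir():
        # Already extracted by an earlier request.
        return target
    # tarfile.open detects gzip and bzip2 compression on its own.
    with tarfile.open(archive) as tar:
        root = target.resolve()
        for member in tar.getmembers():
            # Refuse members that would be written outside the target directory.
            if not (target / member.name).resolve().is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        target.mkdir(parents=True)
        tar.extractall(target)
    return target
```

Only requested distributions are ever extracted, and once the directory exists, directory browsing and diffs can work on plain files.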
https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Plack/Base.pm#L64
via IRC: http://irclog.perlgeek.de/metacpan/2011-05-07#i_3689099
-> you would basically add something there
-> elsif PATH_INFO =~ _search and METHOD eq GET or someting
-> and then you need to Plack::App::Proxy to the es server
We need an IRC channel for general connectedness re: entire project.
rafl:
i was wondering. do you guys have docs giving a brief architectural overview of things?
moonk: feel free to just explain things to me over beer. i'll be happy to volunteer to write it down
Don't forget to mention:
$ bin/metacpan
rafl:
just go with http://api.metacpan.org/$n/, where $n isa Int
but spec out that metacpan consumers should follow redirects
Each API version has its own index in ElasticSearch.
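One way to read the scheme above is sketched below. The index naming (`v1`, `v2`, ...), the `LATEST_VERSION` constant, and the rule that unversioned paths redirect to the latest version are all assumptions for illustration; the discussion itself only says that consumers should follow redirects.

```python
import re
from typing import Tuple

LATEST_VERSION = 1  # hypothetical: the newest API version currently served

_VERSIONED = re.compile(r"^/(\d+)(/.*)?$")


def route(path: str) -> Tuple[str, str]:
    """Map a request path to ("index", index name) or ("redirect", versioned path)."""
    match = _VERSIONED.match(path)
    if match is None:
        # Unversioned request: point the client at the latest version.
        return ("redirect", f"/{LATEST_VERSION}{path}")
    return ("index", f"v{int(match.group(1))}")
```

Here `route("/1/release/Moose")` selects the index for version 1, while `route("/release/Moose")` answers with a redirect to `/1/release/Moose`.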
Integrate RT issue counts on a per-distribution basis: http://rt.cpan.org/Public/bugs-per-dist.tsv. This obviously goes hand in hand with adding Github issue counts.
Integrate data available from the StackOverflow API. Questions/threads about modules etc.
Here are JSON results for a search against their API for "Data::Dumper":
curl --compressed -XGET "http://api.stackoverflow.com/1.0/search?intitle=Data::Dumper&key=EByZgXQ_-U-DuGKS38yYjA"
The 'key' argument is an API key that I've set up specifically for CPAN-API. Here's more info on their API keys if you're interested (we'll probably want to implement something similar):
http://stackapps.com/questions/67/how-api-keys-work
And here's the full API documentation:
Free SSL certs at http://cert.startcom.org/ (work in most browsers and even iOS)
We should enable SSL for the API and encourage its use, especially if we start personalization (i.e. session cookies etc. should be encrypted).
We need to set up some init scripts to ensure that all required services come back online after a reboot.
Dists and modules should be taggable. Perhaps authors as well. There needs to be discussion on the UI and about standard and user-defined tags.
compare: http://search.metacpan.org/#/dist/CGI-Application (detects 3.31 as the latest version)
with: http://search.cpan.org/dist/CGI-Application/ (detects 4.31 as the latest version)
barbie:
search.cpan.org gets them from the cpanstats SQLite database available from the development site: http://devel.cpantesters.org. The DB is updated every 6 hours, and typically search.cpan.org takes a copy once a day.
CPAN-API/search-metacpan-org#27
Favourites are not the same as bookmarks. Bookmarks indicate you want to revisit the docs. Favourites indicate some satisfaction with the product.