book / backpan-index
Provide an index of BACKPAN
Home Page: https://metacpan.org/release/BackPAN-Index
License: Other
NAME
    BackPAN::Index - An interface to the BackPAN index

SYNOPSIS
        use BackPAN::Index;
        my $backpan = BackPAN::Index->new;

        # These are all DBIx::Class::ResultSet's
        my $files    = $backpan->files;
        my $dists    = $backpan->dists;
        my $releases = $backpan->releases("Acme-Pony");

        # Use DBIx::Class::ResultSet methods on them
        my $release = $releases->single({ version => '1.23' });

        my $dist = $backpan->dist("Test-Simple");
        my $releases = $dist->releases;

DESCRIPTION
    This downloads, caches and parses the BackPAN index into a local
    database for efficient querying.

    It's a pretty thin wrapper around DBIx::Class that returns
    DBIx::Class::ResultSet objects, which makes it efficient and flexible.

    The Comprehensive Perl Archive Network (CPAN) is a very useful
    collection of Perl code. However, in order to keep CPAN relatively
    small, authors of modules can delete older versions of modules to only
    let CPAN have the latest version of a module. BackPAN is where these
    deleted modules are backed up. It's more like a full CPAN mirror, only
    without the deletions. This module provides an index of BackPAN and
    some handy methods.

METHODS
  new
        my $backpan = BackPAN::Index->new(\%options);

    Create a new object representing the BackPAN index.

    It will, if necessary, download the BackPAN index and compile it into a
    database for efficient storage. Initial creation is slow, but it will
    be cached.

    new() takes some options:

    update
        Because it is rather large, BackPAN::Index caches a copy of the
        BackPAN index and builds a local database to speed access. This
        flag controls if the local index is updated.

        If true, forces an update of the BACKPAN index.

        If false, the index will never be updated even if the cache is
        expired. It will always create a new index if one does not exist.

        By default the index is cached and checked for updates according to
        $backpan->cache_ttl.

    cache_ttl
        How many seconds before checking for an updated index.

        Defaults to an hour.

    debug
        If true, debug messages will be printed.

        Defaults to false.

    releases_only_from_authors
        If true, only files in the "authors" directory will be considered
        as releases. If false, any file in the index may be considered for
        a release.

        Defaults to true.

    cache_dir
        Location of the cache directory.

        Defaults to whatever App::Cache does.

    backpan_index_url
        URL to the BackPAN index.

        Defaults to a sensible location.

  files
        my $files = $backpan->files;

    Returns a ResultSet representing all the files on BackPAN.

  files_by
        my $files = $backpan->files_by($cpanid);
        my @files = $backpan->files_by($cpanid);

    Returns all the files by a given $cpanid.

    Returns either a list of BackPAN::Index::Files or a ResultSet,
    depending on the calling context.

  dists
        my $dists = $backpan->dists;

    Returns a ResultSet representing all the distributions on BackPAN.

  dist
        my $dist = $backpan->dist($dist_name);

    Returns a single BackPAN::Index::Dist object for $dist_name.

  dists_by
        my $dists = $backpan->dists_by($cpanid);
        my @dists = $backpan->dists_by($cpanid);

    Returns the dists which contain at least one release by the given
    $cpanid.

    Returns either a ResultSet or a list of the Dists, depending on the
    calling context.

  dists_changed_since
        my $dists = $backpan->dists_changed_since($time);

    Returns a ResultSet of distributions which have had releases at or
    after $time.

  releases
        my $all_releases  = $backpan->releases();
        my $dist_releases = $backpan->releases($dist_name);

    Returns a ResultSet representing all the releases on BackPAN. If a
    $dist_name is given it returns the releases of just one distribution.

  release
        my $release = $backpan->release($dist_name, $version);

    Returns a single BackPAN::Index::Release object for the given
    $dist_name and $version.

  releases_by
        my $releases = $backpan->releases_by($cpanid);
        my @releases = $backpan->releases_by($cpanid);

    Returns all the releases of a single author.

    Returns either a list of Releases or a ResultSet representing those
    releases, depending on the calling context.

  releases_since
        my $releases = $backpan->releases_since($time);

    Returns a ResultSet of releases which were released at or after $time.

EXAMPLES
    The real power of BackPAN::Index comes from DBIx::Class::ResultSet.
    It's very flexible and very powerful, but it's not always obvious how
    to get it to do things. Here are some examples.

        # How many files are on BackPAN?
        my $count = $backpan->files->count;

        # How big is BackPAN?
        my $size = $backpan->files->get_column("size")->sum;

        # What are the names of all the distributions?
        my @names = $backpan->dists->get_column("name")->all;

        # What path contains this release?
        my $path = $backpan->release("Acme-Pony", 1.01)->path;

        # Get all the releases of Moose ordered by version
        my @releases = $backpan->dist("Moose")->releases
                               ->search(undef, { order_by => "version" });

AUTHOR
    Michael G Schwern

COPYRIGHT
    Copyright 2009, Michael G Schwern

LICENSE
    This module is free software; you can redistribute it or modify it
    under the same terms as Perl itself.

SEE ALSO
    DBIx::Class::ResultSet, BackPAN::Index::File, BackPAN::Index::Release,
    BackPAN::Index::Dist

    Repository: <http://github.com/acme/parse-backpan-packages>
    Bugs: <http://rt.cpan.org/Public/Dist/Display.html?Name=Parse-BACKPAN-Packages>
Since the files table contains the date, not the release, you can't do a simple order_by.
Might be nice also to have Distribution->releases order by date automatically.
Hey @book, would you turn on Travis for this repo? I don't think I can do it, I'm not an admin for this repo.
You'll want to merge the associated pull request first so there's a travis configuration file.
Right now it's based on the file time, which is problematic (just look at the code).
Instead, store it in an extra table in the DB. This can also be set only AFTER a complete update is done which eliminates the need for an empty database check.
my $file = $release->file;
my $release = $file->release;
A method to list all the distributions by a particular author.
Both on github and in the tarball
Force a deletion of the database if the schemas don't match.
If they do, rather than deleting the database on update delete all the rows and then insert. This allows a failed update to rollback and leave the user with a usable database.
The root BackPAN URL used to make the url used to fetch a file is hard coded in BackPAN::Index::File.
That second bit will be tricky because DBIx::Class is the thing usually creating BackPAN::Index::File objects and changing how it does that initialization might be hard.
It's kind of non-obvious. releases_by() and files_by() would be nice, too.
dynamic_config is set to 1, so the presence of the META files is probably not necessary in the repository.
Regards, Slaven
In fact, it returns everything but the prefix. cpan_path?
For File, Release and Dist document what fields are available to the ResultSet for searching.
CC @neilbowers
The slowest part of BackPAN::Index is building the database. The whole thing has to be downloaded, read and rebuilt on every change.
If we had "recent" indexes like on CPAN, this could be done much faster.
BackPAN::Index::Create would be changed to...
BackPAN::Index would be changed to...
What do you think?
You do not want to dump a DBIx::Class object.
People get confused about the difference between a distribution, a module and a release.
Note this in the PBP docs.
Release: distvname
File: prefix
Distribution: dist name
They're incomplete.
Before releasing BackPAN::Index, review the APIs of Release, File and Distribution to see if they could benefit from any changes. For example, prefix() is not the right name.
So two instances of BackPAN::Index don't fight each other doing the update.
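One way to keep two processes from updating at the same time is an advisory lock on a file next to the cache. The sketch below is only an illustration: `with_update_lock` and the lock file name are hypothetical and not part of BackPAN::Index. `flock` locks are advisory, so this only helps if every updater takes the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive lock on $lockfile.
sub with_update_lock {
    my ($lockfile, $code) = @_;

    open my $fh, '>>', $lockfile
        or die "Cannot open $lockfile: $!";
    flock($fh, LOCK_EX)
        or die "Cannot lock $lockfile: $!";

    # Trap errors so the lock is released before they are re-thrown.
    my $result = eval { $code->() };
    my $error  = $@;

    close $fh;    # closing the handle releases the lock
    die $error if $error;
    return $result;
}

# Usage: a temporary directory stands in for the real cache directory.
my $dir      = tempdir(CLEANUP => 1);
my $lockfile = File::Spec->catfile($dir, "update.lock");
my $status   = with_update_lock($lockfile, sub { return "updated" });
print "$status\n";
```

A second process calling `with_update_lock` on the same file would block in `flock` until the first one finishes, which is the behavior this issue asks for.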
update If true, force an update. If false, never update.
ttl Time for the cache to live.
debug Turn on debugging
releases_only_from_authors Only the authors directory is considered for releases
cache_dir The directory to put the cache
backpan_index_url Where to get the BACKPAN index file
List all the authors of a distribution.
The repo description still points at Parse-Backpan-Packages. I'd recommend changing it to https://metacpan.org/release/BackPAN-Index
Hey @book, are you still interested in admining and participating in this project? I can move it to evalEmpire to share the administration.
DBIx::Class objects are not conducive to simple Data::Dumper'ing. Write a method, maybe called as_hash() to output the data in a File, Release or Dist object as a hash.
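A minimal sketch of what such a method could look like. It relies on `get_columns`, which DBIx::Class::Row provides; `as_hash` itself and the `Fake::Row` stand-in class are hypothetical and exist only so the example runs without a database.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for a DBIx::Class row so the sketch runs without a database.
# Like DBIx::Class::Row, it returns its columns as a flat key/value list.
package Fake::Row {
    sub new {
        my ($class, %cols) = @_;
        return bless { cols => {%cols} }, $class;
    }
    sub get_columns {
        my ($self) = @_;
        return %{ $self->{cols} };
    }
}

# Hypothetical as_hash(): copy the column data into a plain hash reference.
sub as_hash {
    my ($row) = @_;
    my %data = $row->get_columns;
    return \%data;
}

my $release = Fake::Row->new(dist => "Acme-Pony", version => "1.01");

# Dump the plain hash instead of the object itself.
local $Data::Dumper::Sortkeys = 1;
print Dumper(as_hash($release));
```

In the real classes this would presumably be a method on File, Release and Dist, calling `$self->get_columns` instead of taking the row as an argument.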
Right now it touches the decompressed index file, which is kinda yuck and has to be done in two places. Instead, just touch a semaphore file.
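The semaphore idea could look roughly like this. Both function names and the file name `last_update` are made up for illustration; the module may end up tracking freshness differently.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical: record "the index was just updated" by touching an empty file.
sub touch_semaphore {
    my ($file) = @_;
    open my $fh, '>>', $file or die "Cannot open $file: $!";
    close $fh;
    my $now = time;
    utime $now, $now, $file or die "Cannot touch $file: $!";
    return;
}

# Hypothetical: the cache is fresh if the semaphore is younger than $ttl seconds.
sub cache_is_fresh {
    my ($file, $ttl) = @_;
    my $mtime = (stat $file)[9];
    return 0 unless defined $mtime;    # no semaphore yet, so never updated
    return (time - $mtime) < $ttl ? 1 : 0;
}

# Usage: a temporary directory stands in for the real cache directory.
my $dir       = tempdir(CLEANUP => 1);
my $semaphore = File::Spec->catfile($dir, "last_update");

print cache_is_fresh($semaphore, 3600) ? "fresh\n" : "stale\n";
touch_semaphore($semaphore);
print cache_is_fresh($semaphore, 3600) ? "fresh\n" : "stale\n";
```

Because only the semaphore's modification time matters, the decompressed index file no longer has to be touched at all.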
For server-side reasons, the check in BackPAN::Index::IndexFile->index_url_mtime was failing and defaulting to 0. This meant the cache quietly never invalidated.
BackPAN::Index::Release->path is too much for a lot of purposes. authors/id/S/SO/SOMEONE/Foo-Bar-1.23.tar.gz is useful if you want to build a URL to fetch the file, but if all you want is a canonical name for a release archive, SOMEONE/Foo-Bar-1.23.tar.gz is enough.
I've been calling this "short_path". I have some reservations that the name isn't descriptive enough.
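As a rough illustration of the proposed behavior, here is a standalone function that derives the short form from a full path. It is a sketch, not the module's implementation, and it assumes the usual authors/id/X/XY/CPANID/ layout; paths that don't match are returned unchanged.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the proposed short_path().
sub short_path {
    my ($path) = @_;

    # Strip "authors/id/S/SO/" so that "SOMEONE/Foo-Bar-1.23.tar.gz" remains.
    if ($path =~ m{^authors/id/[^/]/[^/]{2}/(.+)$}) {
        return $1;
    }

    # Anything outside authors/id is returned as-is.
    return $path;
}

print short_path("authors/id/S/SO/SOMEONE/Foo-Bar-1.23.tar.gz"), "\n";
# prints "SOMEONE/Foo-Bar-1.23.tar.gz"
```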
Right now we're building the DBIC schema each time at runtime. Not only is this slow, but it's hard to debug, and the DBIx::Class::Schema::Loader documentation recommends against it.
Switch to building the schema files at module build time. This will require separating the database creation from BackPAN::Index, which it should be anyway.
The feature/schema_at_build branch exists for this issue.
This will also help debug rt.cpan.org 82107.