hirevo / alexandrie
An alternative crate registry, implemented in Rust.
Home Page: https://hirevo.github.io/alexandrie/
License: Apache License 2.0
Currently, the username & password are hard-coded into the database URL in alexandrie.toml.
Would it be possible to allow the user to specify a credentials file for the password?
For example:
[database]
url = "mysql://127.0.0.1:3306/alexandrie"
user = "alex"
password_file = "~/db_creds"
The credentials file could likely be a single line just containing the password.
This change would mostly open up Docker options. mysql and postgres docker containers already exist, and passwords can be passed to them via a file mounted as a secret. This would allow mounting that same password file in the Alexandrie container, avoiding writing the password in multiple places.
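As a sketch of how that could work (the function and the URL-splicing are illustrative assumptions, not Alexandrie's actual code), the server could read the file at startup and splice the credentials into the URL:

```rust
use std::fs;
use std::io;

// Hypothetical sketch: resolve a `password_file` entry from alexandrie.toml
// into a full database URL. The file is assumed to contain only the password.
fn build_database_url(base: &str, user: &str, password_file: &str) -> io::Result<String> {
    let password = fs::read_to_string(password_file)?;
    let password = password.trim_end(); // drop the trailing newline, if any
    // Splice the credentials into the URL:
    // "mysql://host:port/db" -> "mysql://user:password@host:port/db"
    let rest = base.strip_prefix("mysql://").unwrap_or(base);
    Ok(format!("mysql://{}:{}@{}", user, password, rest))
}
```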
I'm far from being a database expert, so I'm dropping this error here to hopefully get help with it or to look at later:
❯ diesel database setup --migration-dir migrations/mysql/
Creating database: alexandrie
Running migration 2019-10-12-193526_initialise
Executing migration script migrations/mysql/2019-10-12-193526_initialise/up.sql
Failed with: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'concat(utc_date(), " ", utc_time()),
updated_at varchar(25) not null default' at line 5
This error is generated when trying to set up a new MySQL database.
MySQL version: 8.0.19
I believe that in the current implementation, it is an error to upload a crate if there is a newer version in the index. Is that correct? Does crates.io behave the same way?
This precludes backporting fixes to older versions.
Consider this use case: a crate has versions 1.0.0 and 2.0.0, and a fix needs to be published as 1.0.1.
Similarly for backporting features (which would be a minor release on an older major release).
Currently, I think it would be impossible to publish the 1.0.1 version in the above example.
It is not always possible to force library users to upgrade to a new major version.
It seems to me that the correct logic would be to allow publishing any version of a crate provided it is the latest version for a given major release (as opposed to the globally latest release).
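A minimal sketch of that rule (versions simplified to (major, minor, patch) triples; not Alexandrie's actual code):

```rust
// A version is (major, minor, patch); tuples already order lexicographically.
type Version = (u64, u64, u64);

/// Proposed rule: a candidate may be published if it is strictly newer than
/// every existing version within its own major release line, regardless of
/// newer versions on other major lines.
fn can_publish(existing: &[Version], candidate: Version) -> bool {
    existing
        .iter()
        .filter(|v| v.0 == candidate.0) // same major release line only
        .all(|v| *v < candidate)
}
```

Under this rule, publishing 1.0.1 succeeds even though 2.0.0 already exists, while re-publishing an existing version is still rejected.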
Hi, me again!
I managed to build a Docker image with Alexandrie (Dockerfile here), but I'm still wondering how some things are supposed to work.
I understand we need a git repo to store the crate index, and when looking at the crates.io one, there are a lot of things in that GitHub repo.
What do I need to do? Is only the config.json file needed, and Alexandrie will populate our private GitHub repo?
And in the [index] section, what other type values are available?
Hi,
I'm looking into Alexandrie because we need a private registry at work, and I noticed it needs Rust nightly to build. Why is that?
Hello,
The UID and GID set to 1000 can conflict with a user existing on the host running the containers. It could be handy to have control over their values.
Since these settings are managed by some build args, we could use the same mechanism as for the DATABASE arg and replace this block:
build:
  context: .
  args:
    - DATABASE=${DATABASE}
by:
build:
  context: .
  args:
    - DATABASE=${DATABASE}
    - GROUP_ID=${GROUP_ID}
    - USER_ID=${USER_ID}
Thanks,
Hugo
Cloning, building, and running all worked. I even created my account and logged in on my own Alexandrie server at 127.0.0.1:3000, all within a few minutes.
But for hours I could not figure out from the provided documentation how to complete the configuration of this server, and how to configure a simple hello app (cargo new hello) to be able to publish to this server with cargo. Even though every parameter has a descriptive name in the config file, I still don't know how to determine their values.
Please provide a specific working example.
The "Replace the '<...>' placeholders by the real ones" instructions in https://hirevo.github.io/alexandrie/getting-started.html are not helpful.
Otherwise, congrats on the nice and very important work!
For some reason the master I pulled on March 30th builds just fine. The latest - no such luck:
--> src/db/mod.rs:2:5
|
2 | use url::Url;
| ^^^ help: a similar path exists: cookie::url
In running some builds that reference many crates against an Alexandrie+SQLite setup, I started getting a 500 error from cargo, where the Alexandrie logs showed:
SqlError("Database is locked")
To reproduce:
It seems SQLite doesn't allow multiple writes at the same time.
To address it, I had to set SQLite to write-ahead-log (WAL) mode:
sqlite3 alexandrie.db 'PRAGMA journal_mode=WAL;'
and limit the number of connections in the connection pool to 1:
let mut builder = r2d2::Builder::default();
builder = builder.max_size(1);
(seemingly because only 1 connection is allowed to hold onto the corresponding alexandrie.wal file that gets created)
I modified DatabaseConfig in this commit to accept an optional connection_pool_max_size. These 2 fixes, plus restarting the app, solved the "Database is locked" problem I was seeing.
Let me know what you think, or other suggestions to fix this or extend the database configuration options to accommodate SQLite. I can open a PR for this if you think this is an appropriate change.
The Indexer trait, in its current state, has a number of problems that needlessly restrict what its implementors can do.
Here is an initial and non-exhaustive list of problems we might want to solve:
- Methods take &self rather than &mut self: allowing &mut self would make mutation explicit and open the doors for fancier index managers.
- Every method returns Result<_, Error> (_ being the type of the successful value). With fn latest_record(&self, name: &str) -> Result<CrateVersion, Error>, for instance, distinguishing between whether the crate was not found (something that can happen and is not a real failure of the system) or something more critical is harder than needed (we must match and cannot just use ? to handle the critical ones). As in Crate-Index, we could change some of these return types into Result<Option<_>, Error> or something like Result<Result<_, RetrievalError>, Error>. That way, the ? operator could take care of the outer, more critical, error and we can easily know if the lookup found anything or not.
Feel free to propose designs or share another problem that you think should be addressed by the new revision.
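To illustrate the proposed return-type change, here is a minimal sketch (the types and in-memory lookup are stand-ins, not Alexandrie's actual API):

```rust
#[derive(Debug, Clone)]
pub struct CrateVersion {
    pub name: String,
    pub vers: String,
}

#[derive(Debug)]
pub struct Error(pub String);

/// The outer `Result` carries critical failures (usable with `?`);
/// the inner `Option` expresses "crate not found" without an error.
pub fn latest_record(index: &[CrateVersion], name: &str) -> Result<Option<CrateVersion>, Error> {
    // A networked implementation could fail here and return `Err`;
    // this in-memory lookup cannot, so it always returns `Ok`.
    Ok(index.iter().rev().find(|v| v.name == name).cloned())
}
```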
This issue tracks whether or not we can compile Alexandrie on the stable channel of the Rust toolchain.
Currently, the feature gates used inside of Alexandrie itself are:
In order to move to the stable channel, every dependency has to compile on stable too, so we need to wait for these crates to all work on stable.
The most notable awaited crates are:
- futures (0.3.0 has support for async-await and now works on stable)
- tokio (0.2.0 was released on Nov. 26, 2019)
- tide (0.4.0 was released on Nov. 27, 2019)
- runtime (deprecated, and we should remove it as a dependency whenever possible)
I just made a demo to publish a crate named h2-demo.
Running cargo publish --registry=mycrates for version 0.1.0 succeeded.
Running cargo publish --registry=mycrates again, the publish responded with an error; the message is: error: failed to get a 200 OK response, got 500
headers:
HTTP/1.1 100 Continue
HTTP/1.1 500 Internal Server Error
content-length: 0
date: Sun, 26 Jul 2020 10:42:46 GMT
body:
Is it a bug that the response message body is empty?
Currently, crate searching, both in the frontend and the programmatic API, has limitations:
- searching for "serde json" won't yield "serde_json" as a match
This issue will serve as the place where improvements to the search mechanism, like addressing these limitations, can be discussed.
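One possible direction (an illustration, not the actual implementation): normalize separators before comparing, so that spaces and hyphens match underscores:

```rust
// Map spaces and hyphens to underscores and lowercase everything, so that
// "serde json", "serde-json", and "serde_json" all compare equal.
fn normalize(s: &str) -> String {
    s.chars()
        .map(|c| match c {
            ' ' | '-' => '_',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

fn search_matches(query: &str, crate_name: &str) -> bool {
    normalize(crate_name).contains(&normalize(query))
}
```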
Currently, only MySQL is supported as the backing database of Alexandrie.
To make Alexandrie deployable in more environments, we should offer different database options.
The options to support I am thinking about are:
Related things to also decide:
Hey, just a user report from the wild for you to consider.
I was setting up Alexandrie behind a reverse proxy and ran into a puzzling error.
It turned out to be this issue.
the alexandrie logs displayed this:
Feb 24 11:48:20.085 ERRO async-h1 error, version: 0.1.0
I tried but was unable to identify where in the source code that message is generated. It would have been better if there were additional details provided, to facilitate debugging the problem. thanks!
This crate has two 'index' implementations which manage the git repository in different ways.
These index implementations also manage the filesystem and directory structure of the index, and this is entirely duplicated between the two.
I suggest that the filesystem management and repository management be split into separate concerns.
You might like to take a look at my index library - https://github.com/Rust-Bucket/Crate-Index - for reference.
You might even like to consider using it instead, though there are a couple of small design choices which may not fit with Alexandrie, such as an assumption that the user will handle git authentication outside of the library using SSH keys.
the structure would then become
this way you're only swapping the git management part, not duplicating the entire index implementation.
Regardless of whether you adopt my crate, or refactor Alexandrie, you may wish to consider splitting the Index module into a separate crate (probably in the same repo, with a workspace). Other libraries in future will want a low-level and battle-tested index managing library for their own development, so it makes sense for this to be published separately and shareably.
There are a lot of unwraps in the git2 index.
Internally, the git2::Repository is held in a mutex, which is unlocked and unwrapped in every method. It's not clear what purpose the mutex serves in this module.
I suspect that the mutex is added so that the index itself is Sync, meaning it can be used "naked" in the tide::Server state. Is that correct?
It might be better to move this into the calling code. I think you're kind of cheating by placing a mutex here.
The Indexer trait only has immutable references in the interface, which gives the appearance (as far as Rust's type system is concerned) that types which implement Indexer can safely be used across threads from inside an Arc. But since these objects are actually mutating the filesystem, you can end up with all sorts of race conditions. Neither the CLI nor the git2 implementation is actually thread safe. (You might consider explicitly marking the Tree as !Sync!)
For example, you have this code in api/publish.rs:
state.index.add_record(crate_desc)?;
state.index.commit_and_push(commit_msg.as_str())?;
since there are no guarantees there is only one thread running this code at a time in your server, you could find yourself in a position where the commit contains more (or less!) than 1 change to the index. You'd have to be pretty unlucky to run into this, but the likelihood will scale with the number of users.
The other problem with this (as I'm sure you've discovered!) is that having only immutable references in the Indexer trait precludes you from caching any information about your index. You have to do a round trip to the filesystem for every type of query.
The simplest solution would be to wrap the config::State object in a Mutex. This seems to be a pretty common way to handle state in Tide.
While that won't hurt, this is maybe a little coarse for you, since there are other objects in your State struct that are already thread safe (like the database connection pool).
Alternatively, you could wrap just the Index object in the State struct. Since an Arc<Mutex> can take a trait object, this is a nice way to make your app slightly more composable. Instead of Arc<Mutex<Index>>, you could simply have an Arc<Mutex<dyn Indexer>> (i.e. you no longer care what concrete type the index object is).
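A minimal sketch of that shape (the trait and types here are simplified stand-ins, not Alexandrie's actual definitions):

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-in for Alexandrie's Indexer trait, with `&mut self`
// on the mutating method so the type system reflects the mutation.
trait Indexer: Send {
    fn add_record(&mut self, record: &str);
    fn records(&self) -> Vec<String>;
}

struct GitIndex {
    records: Vec<String>,
}

impl Indexer for GitIndex {
    fn add_record(&mut self, record: &str) {
        self.records.push(record.to_string());
    }
    fn records(&self) -> Vec<String> {
        self.records.clone()
    }
}

// The concrete index type is erased behind the trait object: callers only
// see `Arc<Mutex<dyn Indexer>>`, and the mutex serializes all mutation.
fn shared_index() -> Arc<Mutex<dyn Indexer>> {
    Arc::new(Mutex::new(GitIndex { records: Vec::new() }))
}
```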
If you want to get really clever, you could look at using a RwLock to only lock access on the methods that take &mut self. My gut feeling is this might not work, since RwLock<T> is only Sync if T is Sync, and you're back to square 1. A simple mutex is probably fine until the number of concurrent users scales by 3 or 4 orders of magnitude.
Either way, you'll then be able to remove the mutex from the git2 implementation (since enforcing single access to the git2::Repository is only part of the problem anyway!)
Architecturally it's cleaner also: the Index is responsible for accessing and mutating the filesystem only, and it is the responsibility of the caller to ensure only one Index is doing its thing at any one time.
This crate currently has zero test coverage.
This makes refactoring and continuous integration impossible.
It also makes it extremely difficult to support community contributions.
Perhaps an incremental approach is best? Start tracking test coverage, then insist that each pull request going forward either improves test coverage, or at least does not significantly degrade it.
Hello,
On Crates.io there is a default crate upload size limit of 10MB, but this value is configurable per-crate (and can be raised by the crates.io team on request).
However, I'm running an instance of Alexandrie where I need to upload large crates, and it seems that the PUT /api/v1/crates/new endpoint hard-codes a 10MB request size limit:
// In alexandrie/src/api/crates/publish.rs: async fn put
let mut bytes = Vec::new();
(&mut req).take(10_000_000).read_to_end(&mut bytes).await?;
This causes upload requests to fail with mysterious "failed to fill whole buffer" HTTP errors.
Would it be possible to make this size limit configurable instead of hard-coded, and ideally return an explicit error message if the request is too large?
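As a sketch of the requested behaviour (the configuration field name `max_crate_size` is an assumption, not an existing Alexandrie option):

```rust
// Default cap, matching the currently hard-coded 10_000_000 bytes.
const DEFAULT_MAX_CRATE_SIZE: u64 = 10_000_000;

// Hypothetical: the value would come from a field like `max_crate_size`
// in alexandrie.toml, falling back to the default when unset.
fn effective_limit(configured: Option<u64>) -> u64 {
    configured.unwrap_or(DEFAULT_MAX_CRATE_SIZE)
}

// Reject oversized uploads with an explicit message instead of letting the
// truncated read surface as "failed to fill whole buffer".
fn check_upload_size(body_len: u64, limit: u64) -> Result<(), String> {
    if body_len > limit {
        Err(format!("crate exceeds the maximum upload size of {} bytes", limit))
    } else {
        Ok(())
    }
}
```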
Thank you! =)
I don't know if this is the forum for questions like this.
I like that this crate is attempting to be very extensible, and Rust provides some nice ways to support this through traits. But I wonder if there are a couple of places where this is not adding value. For example, this crate supports two indexing strategies out of the box.
I can't see why you would ever need both.
I see that you want to add a 'remote index' implementation. I think this has questionable value (since you can always use a remote index as your upstream git repository!), but it's cool that the option exists. But I can't see why you would ever want two different strategies to manage a local index, unless each one represented a different set of trade-offs. If one is simply better than the other, why support both?
This is a tracking issue for additional Store implementations.
Here is a non-exhaustive list of potential candidates for a Store implementation:
Currently, there are actions that can only be performed through the frontend (like generating tokens).
We should make sure that anything the frontend can do, the API can do too (as the frontend is optional).
Things that are currently in the frontend and need to be mirrored in the API:
Hello,
While performing the migrations for PostgreSQL, the following error occurs:
Failed with: Invalid migration directory, the directory's name should be <timestamp>_<name_of_migration>, and it should only contain up.sql and down.sql
It seems to come from the following line:
elif [ "${DATABASE}" = "postgresql" ]; then
which should be:
elif [ "${DATABASE}" = "postgres" ]; then
I also noticed some kind of confusion between the existing directory docker/postgres and docker/postgresql, which doesn't exist. In the documentation, the docker/postgresql/rootpass.txt path is mentioned, but IMHO it should be docker/postgres/rootpass.txt.
Same remark for these lines:
Since I don't have the privilege to open a PR, I duplicated the repository and I created a PR to show you the changes: HugoPuntos/alexandrie#2.
Let me know if you want me to open a PR.
Currently, most of the flows of the registry (like crate publication) are quite hard to test.
We should look into having a way to do end-to-end testing (most likely dockerized), both for integration tests in CI and for local testing.
Basically, a way to run a (dockerized) instance against which a client can run the commands expected from a user (like cargo publish or cargo search); the client could also be containerized to avoid conflicts with the existing cargo configuration on the host machine.
The API is something that should never be broken or experience feature regressions.
Having integration tests on API endpoints would help to ensure this.
This is a feature request to enable external user management.
We do almost everything through a private GitLab server. We would love to host a private cargo registry with Alexandrie for internal development, because a GitLab-hosted registry seems far off. However, we would like to use the user management from GitLab itself. Would it be possible to integrate something like OAuth to facilitate this?
Currently, the only way to get data about a specific version of a crate (or even just to know what the latest version of a crate is) is to query the Indexer implementation.
But if an alternative Indexer implementation relies on the network and needs to download entire crate version records every time we want to know what the latest version of a crate is, it could end up being quite slow and wasting bandwidth.
So, we may duplicate version information in the database, which, I suspect, will almost always be faster to access and more efficient than the Indexer implementation.
Here is the schema for how I could see it being shaped:
-- MySQL version (needs to be adjusted for other backends)
create table versions (
    id bigint not null auto_increment unique primary key,
    crate_id bigint not null,
    num varchar(128) not null,
    downloads bigint not null default 0,
    created_at varchar(25) not null default (concat(utc_date(), ' ', utc_time())),
    updated_at varchar(25) not null default (concat(utc_date(), ' ', utc_time())),
    yanked bool not null default false,
    published_by bigint unsigned not null,
    foreign key (crate_id) references crates(id) on update cascade on delete cascade,
    foreign key (published_by) references authors(id) on update cascade on delete cascade
);
Things I have doubts about:
- The published_by column (worried about what should happen to one of these rows if its related publisher gets deleted from the authors table).
- The type used to store dates (varchar).
As more and more features get added to Alexandrie, it will become harder and harder to document them without making the README overwhelming.
So, we should look into solutions for making documentation more scalable and easier to both browse and maintain.
One such solution that currently crosses my mind is GitBook, but maybe there are other better ones.
AF_UNIX uses filesystem paths instead of address+port pairs for socket addresses which makes it much easier to avoid conflicts if you run multiple services on the same machine.
tide 0.12.0 seems to support it with the http+unix:///path scheme: https://docs.rs/tide/0.12.0/tide/listener/trait.ToListener.html#strings-supported-only-on-cfgunix-platforms
I am planning to run alexandrie behind a reverse-proxy (which also does TLS termination) so this would be a useful feature for me.
Alexandrie deliberately omits the deps and features fields in the crate index when there are no dependencies or features. For example, this JSON is generated by Alexandrie:
{"name":"tmp42","vers":"0.1.0","cksum":"7181406a66629d77cba073cc0b7e6966d46bed72d31251da1813f3166e737c5d","yanked":false}
cargo doesn't recognize such entries in the index (I set CARGO_LOG=info for internal logging):
Updating `ssh://...` index
[2020-04-24T10:17:39Z INFO cargo::sources::registry::index] failed to parse "tm/p4/tmp42" registry package: missing field `deps` at line 1 column 121
error: no matching package named `tmp42` found
location searched: registry `ssh://...`
required by package `tmp43 v0.1.0 (...)`
If I manually update the JSON like this, cargo recognizes it:
{"name":"tmp42","vers":"0.1.0","deps":[],"cksum":"7181406a66629d77cba073cc0b7e6966d46bed72d31251da1813f3166e737c5d","yanked":false,"features":{}}
The RFC doesn't mention anything about optional fields.
The fix would be to simply remove the skip_serializing_if attributes.
cargo --version: cargo 1.43.0 (3532cf738 2020-03-17)
Diesel has requirements on system packages, depending on which features are enabled. These are the drivers for the different databases.
The fact that these dependencies must be installed (or you get linker errors during build) should be mentioned.
For example, libpq (package libpq-dev) is a required system dependency when using the diesel/postgres feature, which is enabled by default.
The libgit2-based git operations have some problems (with SSH).
There is a workaround for this problem: setting the CARGO_NET_GIT_FETCH_WITH_CLI environment variable to true.
If the CLI-based git access has no known problems, then why not make the CLI the default option?
This default could be overridden by setting the CARGO_NET_GIT_FETCH_WITH_CLI environment variable to false, in case some users don't like the more sensible new default.
This was listed as a "thing yet to do" item in the project's README for quite some time already, and implementation work for it has started, so here is its proper tracking issue.
I ran into an issue where shutting down the server with ctrl-c seems to have left the SQLite database in a "locked" state that Alexandrie could not recover from on restart.
The log message that displayed is:
Feb 24 13:55:31.950 INFO --> GET /api/v1/crates/my-crate/0.3.15/download 200 1ms , SQL error: database is locked, version: 0.1.0
I recovered by running the command
echo '.dump' | sqlite3 old.db | sqlite3 new.db
and switching to new.db in my config file.
I don't know exactly what happened to leave the SQLite file in a locked state; however, I had just restarted the Alexandrie server immediately prior to this, and had not opened the SQLite db manually or otherwise tinkered with the file.
I just read the doc for alternate registries, but something was wrong. I just ran cargo publish and got a failure message:
➜ fuzz git:(master) ✗ cargo publish --registry=git-baoyachi --allow-dirty
Updating `ssh://[email protected]/crates.git` index
warning: manifest has no description, license, license-file, documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
Packaging fuzz v0.1.0 (/Users/baoyachi/git_project/fuzz)
Verifying fuzz v0.1.0 (/Users/baoyachi/git_project/fuzz)
Compiling fuzz v0.1.0 (/Users/baoyachi/git_project/fuzz/target/package/fuzz-0.1.0)
Finished dev [unoptimized + debuginfo] target(s) in 1.14s
Uploading fuzz v0.1.0 (/Users/baoyachi/git_project/fuzz)
error: failed to get a 200 OK response, got 404
headers:
HTTP/2 404
server: nginx
date: Fri, 24 Jul 2020 03:38:10 GMT
content-type: application/json; charset=utf-8
content-length: 34
x-request-id: xcvRDF0Ae15
x-runtime: 0.018661
body:
{"status":404,"error":"Not Found"}
Many pages in Alexandrie consist of lists for different use-cases:
and most likely other ones...
To make these pages possible, we should have a pagination implementation which:
I have some questions about the deployment, specifically whether you've looked at containerisation tools such as Docker?
The use case would be to have a docker-compose file that could be used to spin up a functional, load-balanced instance of the registry server, including the database, with a single command. This would ease the friction of getting a server up and running for potential end-users.
I'd love to help, if I can! Making adoption easier would be a big step towards getting companies and individuals to start using private registries.
Just like this link: https://www.jfrog.com/confluence/display/JFROG/LDAP
Hello,
Thanks for this initiative, a very useful project!!!
I want to deploy it in my company, but I have some issues deploying it as a private crate repository.
It may be possible for us to contribute.
Thank you a lot,
Marc-Antoine
This is a tracking issue for additional Indexer implementations.
Here is a non-exhaustive list of potential candidates for an Indexer implementation:
Hi again,
I'm on Windows 7 at work and am trying to dockerize Alexandrie.
Before doing anything, I tried to build it and ended up with:
= note: LINK : fatal error LNK1181: cannot open input file 'sqlite3.lib'
error: aborting due to previous error
error: Could not compile `migrations_macros`.
I understand it needs the SQLite library, but I thought it would be pulled in as a dependency of the package?
There's an issue on Safari (both macOS and iOS) which causes the hash received by the server to include the Unicode Byte Order Mark (BOM, '\u{feff}'), which breaks the hex::decode call (which crashes the process, because its output is currently unwrapped).
This problem shows up in both the 'Login' and 'Register' pages, and only manifests in the processed password fields.
This leads me to suspect that this issue is from the WebAssembly module and not a form parsing bug in Tide.
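A defensive sketch of a server-side mitigation (illustrative only; the hex decoding is hand-rolled here to keep the snippet dependency-free): strip a leading BOM before decoding, and return an error instead of unwrapping:

```rust
// Strip a leading BOM, then hex-decode; any failure becomes an `Err`
// instead of a process-crashing unwrap.
fn decode_hashed_password(input: &str) -> Result<Vec<u8>, String> {
    let cleaned = input.strip_prefix('\u{feff}').unwrap_or(input);
    if cleaned.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    (0..cleaned.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&cleaned[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}
```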
This is a bit of a design question, I guess: is there a reason the front end and back end are in a single project? I suppose this means that you don't have to have an API layer between the two, but it makes them tightly coupled. Would it not be preferable for the backend to serve some API (REST/gRPC/whatever) for the frontend to consume? More work to set up, perhaps, but possibly a little more flexible and maintainable in the long run. Essentially, it would make the frontend swappable.
I've been trying to get Alexandrie set up using a GitLab private index, and am confused about the configuration I'd need.
I adjusted the Dockerfile to do an ssh-keyscan of gitlab.com, and that helped get me a bit further. I noticed that GitLab locked down the master branch, so I set it as 'unprotected', so in theory I can push to it. But I'm wondering if there's anything else I need to set up so that this will work as expected.
Anyone have any luck going down this path?
My personal code is located in the "FlashFunk" dir.
This is the roadmap:
FlashFunk
crate-index
How can I edit alexandrie.toml to provide FlashFunk? I have run it successfully, but it does not provide any crate.
Recently I manually updated my crate index repo (to change the README). When I went to publish a crate, it appeared to succeed, but I couldn't get the new version via cargo update. What happened is, when the Alexandrie server went to push the crate index changes, the push was "rejected" because the repo needed to pull my README changes first:
[master 4a58405] Updating crate `markets#0.3.1`
1 file changed, 1 insertion(+)
To git.jstrong.dev:jstrong/crate-index.git
! [rejected] master -> master (fetch first)
error: failed to push some refs to '[email protected]:jstrong/crate-index.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
^ that's the error I found logged by alexandrie when I ssh'd to the server to figure it out.
This is not a huge problem, I imagine manual crate index repo changes are probably not a frequent occurrence. However, it does seem like it would be an improvement to try a git pull if the push fails, and see if that fixes things. Perhaps there's some reason I'm not thinking of why doing so on failure would be a problem, but if not, it might help someone avoid a confusing situation where the crate version seemingly was uploaded successfully to the registry, but isn't in the index.
Another approach would be to confirm the crate index changes have been pushed before cargo publish is permitted to succeed, and before Alexandrie's state about the crate is changed.
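The pull-then-retry idea can be sketched as a small control-flow helper (illustrative; a real fix would call the git operations Alexandrie already has):

```rust
/// On a rejected push, integrate the remote changes once (pull) and
/// retry the push before reporting failure to the publisher.
fn push_with_retry(
    push: &mut dyn FnMut() -> Result<(), String>,
    pull: &mut dyn FnMut() -> Result<(), String>,
) -> Result<(), String> {
    match push() {
        Ok(()) => Ok(()),
        Err(_) => {
            // The remote likely has commits we don't have: fetch and merge
            // them first, then try the push one more time.
            pull()?;
            push()
        }
    }
}
```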