Code Monkey home page Code Monkey logo

Comments (19)

arkpar avatar arkpar commented on June 25, 2024 3

If I understand correctly this only happens when there are multuple tests running? The error message here means that the database is already open. This could happen if mutiple tests use the same temp dir.

In any case, these upgrade tests are probably obsolete. I'd check with the polkadot team if it even makes sense to still maintain the upgrade code.
cc @ordian @sandreim

from polkadot.

ordian avatar ordian commented on June 25, 2024 2

If I understand correctly this only happens when there are multuple tests running? The error message here means that the database is already open. This could happen if mutiple tests use the same temp dir.

In any case, these upgrade tests are probably obsolete. I'd check with the polkadot team if it even makes sense to still maintain the upgrade code. cc @ordian @sandreim

Indeed, the test can be removed now.

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024 2

Yeah I'll go bisect the branch. Partly out of morbid curiosity and partly because I'm afraid there is a deeper issue that could reappear later.

from polkadot.

ggwpez avatar ggwpez commented on June 25, 2024 1

Yea we may need to comment those if they impede the development workflow.

from polkadot.

altonen avatar altonen commented on June 25, 2024 1

I've confirmed locally the issue was introduced by #7337. Weird.

It is weird. Since the branch is still available, it would be possible to go through the entire history and check which commit caused the regression but if the tests can be removed, I'll create a PR

from polkadot.

arkpar avatar arkpar commented on June 25, 2024 1

Question: is this is expected behavior in paritydb or a bug that could manifest in production? I guess in production there is only ever one instance of paritydb? Maybe we should at least catch this error and throw a more descriptive one for the benefit of future devs? @ordian

Mutiple database instances should not interfere unless they use the same path. Each test must use a unique temp dir. No serial_test or mutex hacks should be required.

from polkadot.

ggwpez avatar ggwpez commented on June 25, 2024 1

Someone sent me this log output form a parachain DB migration:

Aug 11 13:14:21 region2-collator-us-1 systemd[1]: Started watr Systemd Service.
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: CLI parameter `--execution` has no effect anymore and will be removed in the future!
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.648  INFO main sc_cli::runner: Watr Collator
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.649  INFO main sc_cli::runner: ✌️   version 1.0.0-6ec9229d97c
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.649  INFO main sc_cli::runner: ❤️  by Parity Technologies <[email protected]>, 2023-2023
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.649  INFO main sc_cli::runner: 📋 Chain specification: Watr Network
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.649  INFO main sc_cli::runner: 🏷   Node name: region2-collator-us-1
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.649  INFO main sc_cli::runner: 👤 Role: AUTHORITY
Aug 11 13:14:21 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:21.649  INFO main sc_cli::runner: 💾 Database: RocksDb at /home/polkadot/.local/share/polkadot/chains/watr_network/db/full
Aug 11 13:14:23 region2-collator-us-1 watr[2020990]: CLI parameter `--execution` has no effect anymore and will be removed in the future!
Aug 11 13:14:23 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:23.804  INFO main watr_node::command: Parachain id: Id(2058)
Aug 11 13:14:23 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:23.804  INFO main watr_node::command: Parachain Account: 5Ec4AhNoGMRn8j7rpcumbokVMzWeTNdqw42r7pcH8dtqH29G
Aug 11 13:14:23 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:23.804  INFO main watr_node::command: Parachain genesis state: 0x00000000000000000000000000000000000000000000000000000000000000000032d52a557fc0b1e3b6ce56981091c11c652646eb3b39733928b143f2993fe47003170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
Aug 11 13:14:23 region2-collator-us-1 watr[2020990]: 2023-08-11 13:14:23.804  INFO main watr_node::command: Is collating: yes
Aug 11 13:15:36 region2-collator-us-1 watr[2020990]: 2023-08-11 13:15:36.803  INFO main sub-libp2p: [Relaychain] 🏷   Local node identity is: 12D3KooWPELBXukq5cju5VKoKaFcRe5r1ti8jBTXeQAEoXZTjHyk
Aug 11 13:15:36 region2-collator-us-1 watr[2020990]: 2023-08-11 13:15:36.818  INFO main sub-libp2p: [Relaychain] 🔍 Discovered new external address for our node: /ip4/3.93.142.105/tcp/30333/p2p/12D3KooWPELBXukq5cju5VKoKaFcRe5r1ti8jBTXeQAEoXZTjHyk
Aug 11 13:15:36 region2-collator-us-1 watr[2020990]: 2023-08-11 13:15:36.821  INFO main parachain::db: [Relaychain] Migrating parachains db from version 1 to version 2 ...
Aug 11 13:15:50 region2-collator-us-1 watr[2020990]: 2023-08-11 13:15:50.022  INFO main parachain::db: [Relaychain] Migration complete!
Aug 11 13:15:50 region2-collator-us-1 watr[2020990]: 2023-08-11 13:15:50.025  WARN tokio-runtime-worker sc_service::builder: [Relaychain] The NetworkStart returned as part of `build_network` has been silently dropped
Aug 11 13:15:50 region2-collator-us-1 watr[2020990]: Error: Service(Application(Application(Io(Custom { kind: Other, error: Error { message: "Invalid argument: Column families not opened: col5, col4, col3, col2, col1, col0" } }))))
Aug 11 13:15:50 region2-collator-us-1 systemd[1]: watr.service: Main process exited, code=exited, status=1/FAILURE
Aug 11 13:15:50 region2-collator-us-1 systemd[1]: watr.service: Failed with result 'exit-code'.

There is a Q in the General elements chat from Peter W. about it. Could this be related?

from polkadot.

arkpar avatar arkpar commented on June 25, 2024 1

There is a Q in the General elements chat from Peter W. about it. Could this be related?

I don't think so. This is rocksdb error. If this can be reproduced please file a separate issue.

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

Trying to reproduce but unsuccessful so far. What command did you run? I'm doing

while cargo ltest test_paritydb_migrate_2_to_3; do :; done

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

OK I get it if I just do while cargo ltest; [...]. Weird.

from polkadot.

altonen avatar altonen commented on June 25, 2024

Trying to reproduce but unsuccessful so far. What command did you run? I'm doing

while cargo ltest test_paritydb_migrate_2_to_3; do :; done

running

while true; do cargo test || break; done

in node/service/src/parachain_db worked for me

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

I'm flummoxed, I don't see how these tests can be related to #7337. The tests and the error seem quite specific to paritydb.

called `Result::unwrap()` on an `Err` value: Io(Custom { kind: Other, error: "Error removing COL_SESSION_WINDOW_DATA Locked(Os { code: 35, kind: WouldBlock, message: \"Resource temporarily unavailable\" })" })

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

I've confirmed locally the issue was introduced by #7337. Weird.

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

The test failures were introduced in dc5d6a0 where we added some other tests in the same crate. The new tests could not run at the same time as each other1, so I had added some code like this:

// Ensure tests run one at a time so they don't interfere with each other.
static OVERRIDES_LOCK: Mutex<()> = Mutex::new(());
let _lock = OVERRIDES_LOCK.lock()?;

in the next commit this was replaced with the serial_test crate which does the same thing and the test failures persisted.

My guess is that this affected the order that tests are run in, in such a way that caused paritydb tests to step on each other's toes.

Question: is this is expected behavior in paritydb or a bug that could manifest in production? I guess in production there is only ever one instance of paritydb? Maybe we should at least catch this error and throw a more descriptive one for the benefit of future devs? @ordian

Footnotes

  1. though in retrospect there is a slight tweak that could be made so they don't step on each other's toes.

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

@arkpar If I understand correctly, the paritydb tests which have now been removed were simply not written correctly, so there's no actual bug there? BTW the hacks were for tests related to #7337 (totally unrelated to paritydb), and on further inspection they do actually need to be there, using a unique temp dir is not enough.

from polkadot.

arkpar avatar arkpar commented on June 25, 2024

I don't see anything wrong with the tests. They should have worked. I've looked at the tempfile crate to see if it could produce the same path if called simulatneously from two different threads, but it seems to be allright.

I've tried reproducing the issue with while true; do cargo test || break; done, on macos but was unable to. So maybe the issue is platform dependent. I'd add some logs to see if the paths are indeeed different. setting RUST_LOG=parity-db=debug and running cargo test --release -- --nocapture could also shed some light.

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

@arkpar I was able to reproduce on M1 Mac as well as Linux, but it is flaky. Try cargo clean first, otherwise try also the next commit 6ef14fe.

from polkadot.

mrcnski avatar mrcnski commented on June 25, 2024

@ggwpez No idea if there is a connection but IMO should be investigated. Reopening.

from polkadot.

arkpar avatar arkpar commented on June 25, 2024

Added an explicit DB unlock in paritytech/parity-db#218 which resolves this. Normally file locks should be removed automatically once the file handle is closed. For some reason when running tests the lock may take some time to disappear on its own.

from polkadot.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.