Comments (19)

secana avatar secana commented on July 22, 2024 1

New version of Kellnr 5.3.2 is out with improved SHA256 computation. This error should be fixed with that version.

from kellnr.

secana avatar secana commented on July 22, 2024

Hi @alexthe2! Thanks for reporting the issue. Does this happen only with one specific crate or under specific circumstances? I'll try to reproduce the issue.

alexthe2 avatar alexthe2 commented on July 22, 2024

It happens with two crates for us (from what I can see right now), called v12-data and v12-terra_converters. v12-terra_converters is dependent on data.

secana avatar secana commented on July 22, 2024

Can you provide logs from Kellnr in "trace" mode when the issue occurs? I am trying to replicate the wrongly computed hash, but so far without any success.

alexthe2 avatar alexthe2 commented on July 22, 2024

Should I set KELLNR_LOG__LEVEL or KELLNR_LOG__LEVEL_WEB_SERVER?

secana avatar secana commented on July 22, 2024

KELLNR_LOG__LEVEL should be enough.

alexthe2 avatar alexthe2 commented on July 22, 2024

It's not producing any logs 😢. I verified that the level is really set to trace (updated the helm chart and deleted the pod to force a restart).

secana avatar secana commented on July 22, 2024

I am still trying to debug the issue but have no idea why the sha256 is computed incorrectly. Can you try to disable the cache, so I know that it is not a caching issue?

KELLNR_REGISTRY__CACHE_SIZE=0

asymmetry avatar asymmetry commented on July 22, 2024

Hi @secana, we are experiencing the same issue. I could provide one more data point.

Compare the sha256sum of a corrupted crate in the db with the actual value:
In db: [image]
Actual: [image]

And I am already running with KELLNR_REGISTRY__CACHE_SIZE=0:
[image]

I am running Kellnr 5.2.2 with the released docker image.

Thanks!

asymmetry avatar asymmetry commented on July 22, 2024

If I delete the corrupted version from the web UI as an admin user, restart the docker container, and then publish the same crate again, that can fix the problem.

alexthe2 avatar alexthe2 commented on July 22, 2024

secana avatar secana commented on July 22, 2024

Thanks for the input. So far, the issue seems to be the computation of the sha256, as the crate itself seems to be fine on disk. I released a debug version of Kellnr with much more debug output for this specific issue. Would you be so kind as to run it and provide the logs here?

Kellnr version: 5.2.3-debug-311
Helm chart version: 3.2.3-debug-311

All logs are at the debug level and prefixed with #311 to be easily identifiable. For the debug version, I added an additional crate that computes the sha256, so its result can be compared with the current implementation. Hopefully the debug output points us in the right direction to finally find and fix the issue.

asymmetry avatar asymmetry commented on July 22, 2024

Thanks @secana! I have deployed this test version with logs enabled. I tried to publish a test crate 10 times (with a slight modification each time) and the logs look good to me. My team will continue to use this version and I will post the logs here if the issue happens again.

Thanks for the help!

alexthe2 avatar alexthe2 commented on July 22, 2024

Also a final update from our side: we got the debug version running two days ago and have seen no issues yet. We'll now switch to the new 5.3.2.

asymmetry avatar asymmetry commented on July 22, 2024

Hi @secana, my team has had the debug version running for almost 10 days and we captured this issue again. Here is the log:
[image]

It seems that the error happens when reading the crate saved to disk back into memory for the sha256 calculation: it only reads 4096 bytes.
However, it seems you have already switched to using the in-memory data to calculate the checksum, so I think this is not going to happen again. We will switch to the new 5.3.2.
Thanks a lot for the help!

asymmetry avatar asymmetry commented on July 22, 2024

Hi @secana, sorry to keep posting on a closed issue, but I did some digging into the actual root cause.

I was able to reproduce the read issue with this minimal reproducible example:

use std::path::Path;

use anyhow::Result;
use tokio::{
    fs::File,
    io::{AsyncReadExt, AsyncWriteExt},
};

#[tokio::main]
async fn main() -> Result<()> {
    let path = Path::new("test/source.crate");

    let mut file = File::open(path).await?;

    let mut source = Vec::new();
    let _ = file.read_to_end(&mut source).await?;

    let source_length = source.len();

    println!("source length: {}", source_length);

    let mut test = 1;

    loop {
        let path = Path::new("test/test.crate");

        let mut file = File::create(&path).await?;
        file.write_all(&source).await?;

        // if we comment out this line, we can reproduce the read issue
        file.flush().await?;

        let mut buf: Vec<u8> = vec![];

        let mut file = File::open(&path).await?;
        let bytes_read = file.read_to_end(&mut buf).await?;
        let real_length = file.metadata().await?.len();

        if bytes_read != source_length || real_length != source_length as u64 {
            println!(
                "test {}, bytes read: {}, real length: {}",
                test, bytes_read, real_length
            );
            break;
        }

        test += 1;
    }

    Ok(())
}

As shown in the comment, it seems that we should flush the I/O buffer after the write_all() call, otherwise we will occasionally trigger this issue. There is a useful comment from the maintainer of Tokio: https://users.rust-lang.org/t/async-write-all-sometimes-silently-fails/68195/4.

Not sure if it is still valid for the new version but maybe a flush() call is helpful.

secana avatar secana commented on July 22, 2024

@asymmetry Thank you so much! With the in-memory computation I worked around the issue, but knowing what went wrong initially gives me peace of mind. I'll check if I use the same code somewhere else in Kellnr and make sure that I flush the buffer.

plied avatar plied commented on July 22, 2024

You guys mean it's fixed in version 5.2.3, I guess. Version 5.3.2 does not exist yet.

secana avatar secana commented on July 22, 2024

Yes, you are right. That's a typo. 5.2.3 is the right version.
