
cas's People

Contributors

dennwc

cas's Issues

cas pull "corrupting" internal store

Hi,

Sorry if I misunderstood something. I'm trying to use cas to store downloaded artifacts, version them, and later use them in other software (as a more powerful replacement for the typical ad-hoc downloader shell scripts you see everywhere).

What works on the first run:

% cas pull floodgate-spigot.jar https://download.geysermc.org/v2/projects/floodgate/versions/latest/builds/latest/downloads/spigot
% cas sync floodgate-spigot.jar
% cas checkout floodgate-spigot.jar ./floodgate-spigot.jar

But after running the same commands a second time, cas checkout no longer works:

% cas pull floodgate-spigot.jar https://download.geysermc.org/v2/projects/floodgate/versions/latest/builds/latest/downloads/spigot
floodgate-spigot.jar = sha256:d6f3fb960861d6560259f894bd514fca37195c086d7f2c6800c4783d8cde2216
% cas sync floodgate-spigot.jar
floodgate-spigot.jar -> sha256:d6f3fb960861d6560259f894bd514fca37195c086d7f2c6800c4783d8cde2216 (up-to-date)
% cas checkout floodgate-spigot.jar ./floodgate-spigot.jar
Error: blob: invalid ref
<help msg>
2024/03/17 19:54:41 blob: invalid ref
%

The problem seems to be that a pull where nothing has changed creates a blob of @type cas:WebContent with an empty ref:

% find .cas/blobs -type f -size -500 |xargs -n1 grep .
...
{
 "@type": "cas:WebContent",
 "url": "https://download.geysermc.org/v2/projects/floodgate/versions/latest/builds/latest/downloads/spigot",
 "ref": "sha256:4aca4a66a2641967dcc4b895dd1a7453f76b47c239e139f494a80c69066e55f1",
 "size": 11235940,
 "etag": "09b0c6b5cc19a1618c0b30ad13327890c",
 "ts": "2024-02-18T14:41:25Z"
}
{
 "@type": "cas:WebContent",
 "url": "https://download.geysermc.org/v2/projects/floodgate/versions/latest/builds/latest/downloads/spigot",
 "ref": "",
 "etag": "09b0c6b5cc19a1618c0b30ad13327890c",
 "ts": "2024-02-18T14:41:25Z"
}

Environment

Go (upstream) installed via godeb install 1.22.1:
% go version
go version go1.22.1 linux/amd64
cas installed via:
% go install github.com/dennwc/cas/cmd/cas@latest

Questions

Besides that, is there a way in cas to see the history/log of which file versions a pin has had (to quickly restore an old state in case something broke)? Or is the only option to grep/jq through the index objects?

Support for "content-formatting-lenient" hashes?

Hashes such as the SHA-256 of a file's raw bytes are highly sensitive to any change in the file, even changes that are practically inconsequential. An alternative strategy is to use hashes that are aware of the file's formatting structure and hash only the meaningful content, ignoring the formatting.

Here are some examples:

  • The sum command of seqkit, which produces a content-lenient hash of FASTA-format files: https://bioinf.shenwei.me/seqkit/usage/#sum
  • (One could imagine something similar for .docx files that extracts the plain text and ignores the formatting, but I'm not aware of a ready-made tool for this off the top of my head.)

Could cas support such content-formatting-lenient hashes?

git-annex support instead of git LFS?

Hi there,

Sorry that I keep dropping by your repo with questions. I'm wondering whether git-annex support could be on the cas roadmap, rather than git LFS.

My only reason is that git-annex is a widely used alternative to git LFS. The git-annex homepage is here:
https://git-annex.branchable.com

See also DataLad, an end-to-end scientific data-management wrapper around git and git-annex, which might inspire some ideas for cas:
https://handbook.datalad.org/en/latest/basics/101-180-FAQ.html

Workaround for filesystems without xattr?

Hi there,

I'm working on a Linux HPC cluster that doesn't support xattrs. Can cas still function without extended attributes, or is there a workaround?

2023/02/21 10:14:51 xattr.FSet hisat2_12B1-RiboZero.merged.bam user.cas.size: operation not supported

Notably, setting xattrs does seem to work on macOS with the APFS filesystem:

user.cas.hash: sha256:b99ea5e9a6e80b9e3cd9f5df62bf6a6324ee79e6529384ae44099a92b630f58f
user.cas.mtime:
0000   5C 89 30 FF 0F E9 45 17                            ..0...E.

user.cas.size:
0000   19 C6 3B 65 00 00 00 00                            ..;e....


Resolving the path of the file from the cas hash?

Hi there,

Is it possible to resolve a file's full path from its cas hash (i.e., something analogous to cas blob, but returning the local file path instead)?

The use case I have in mind is keeping better track of large files that are identical in remote and local storage but may have different paths or move around.

Feel free to tell me if this is a misunderstanding of how content-addressable storage can or should work.
