
finalfusion-utils's People

Contributors

danieldk, dependabot-preview[bot], realnicolasbourbaki, sebpuetz

finalfusion-utils's Issues

finalfusion on Apple Silicon

finalfusion seems to work on Apple Silicon. I am using an x86_64 toolchain under Rosetta 2 because the dependencies of some other projects (e.g. libtorch) are not yet available for Apple Silicon. I then cross-compile for Apple Silicon:

$ rustup target add aarch64-apple-darwin
$ cargo build --target=aarch64-apple-darwin --release
$ file target/aarch64-apple-darwin/release/finalfusion
target/aarch64-apple-darwin/release/finalfusion: Mach-O 64-bit executable arm64
$ target/aarch64-apple-darwin/release/finalfusion
zsh: killed     target/aarch64-apple-darwin/release/finalfusion

It turns out that this is a known code-signing problem with cross-compiled binaries. Ad-hoc signing the binary fixes it:

# Sign with ad-hoc signing
$ codesign -s - target/aarch64-apple-darwin/release/finalfusion
$ target/aarch64-apple-darwin/release/finalfusion             
finalfusion 

USAGE:
    finalfusion <SUBCOMMAND>

Open questions:

  • Can we easily generate a universal binary?
  • Maybe we should hook this up to CI to build macOS releases. I think we could even cross-compile with the current x86_64 builders as long as Xcode is new enough.

Refactor CL arguments

There is quite a bit of repetition in the setup of the command-line parser. We might be able to factor out some of the common arguments, such as the input format or the input path.
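A minimal, dependency-free sketch of the idea: parse the shared arguments once into a struct that every subcommand reuses, and hand the remaining arguments to the subcommand. The flag names (-f/--input-format), the format names, and the rule that the first positional argument is the input path are assumptions for illustration, not the tool's actual interface. If the parser is built with clap, the same idea would amount to one helper that builds the shared argument definitions.

```rust
/// Input formats shared by several subcommands.
/// Hypothetical list, chosen for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputFormat {
    FinalFusion,
    Word2Vec,
    Text,
}

impl InputFormat {
    fn parse(s: &str) -> Result<Self, String> {
        match s {
            "finalfusion" => Ok(InputFormat::FinalFusion),
            "word2vec" => Ok(InputFormat::Word2Vec),
            "text" => Ok(InputFormat::Text),
            other => Err(format!("unknown input format: {}", other)),
        }
    }
}

/// Arguments common to many subcommands, parsed in one place.
#[derive(Debug, PartialEq, Eq)]
struct CommonArgs {
    input_format: InputFormat,
    /// None means "read from stdin".
    input_path: Option<String>,
    /// Subcommand-specific arguments, passed through untouched.
    rest: Vec<String>,
}

/// Hypothetical shared parser: extracts the input format flag and the
/// first positional argument (the input path), collecting everything else.
fn parse_common_args(args: &[&str]) -> Result<CommonArgs, String> {
    let mut input_format = InputFormat::FinalFusion; // assumed default
    let mut input_path = None;
    let mut rest = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "-f" | "--input-format" => {
                let value = iter.next().ok_or("--input-format requires a value")?;
                input_format = InputFormat::parse(value)?;
            }
            // The first non-flag argument is taken as the input path.
            _ if input_path.is_none() && !arg.starts_with('-') => {
                input_path = Some(arg.to_string());
            }
            _ => rest.push(arg.to_string()),
        }
    }
    Ok(CommonArgs { input_format, input_path, rest })
}
```

Each subcommand would then only parse its own options from `rest`, so adding a new input format touches one function instead of every subcommand.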

Amend compute-accuracy docs

Amend the README (and maybe the command's help text?) to state that compute-accuracy results cannot be compared unless the embeddings have identical vocabularies.
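A toy illustration of why the numbers diverge, assuming (as in word2vec's compute-accuracy) that questions containing an out-of-vocabulary word are skipped, so each vocabulary is scored on a different subset of questions. The questions and the per-question `correct` flags below are made-up stand-ins for running real analogy queries.

```rust
use std::collections::HashSet;

/// Hypothetical analogy question "a : b :: c : d" plus whether the model's
/// top answer was correct (a stand-in for running the actual query).
struct Question {
    words: [&'static str; 4],
    correct: bool,
}

/// Returns (correct, answered, total). Questions with any word missing from
/// `vocab` are skipped, which is the assumption being illustrated.
fn accuracy(questions: &[Question], vocab: &HashSet<&str>) -> (usize, usize, usize) {
    let answered: Vec<&Question> = questions
        .iter()
        .filter(|q| q.words.iter().all(|w| vocab.contains(*w)))
        .collect();
    let correct = answered.iter().filter(|q| q.correct).count();
    (correct, answered.len(), questions.len())
}
```

With the same per-question behavior, a smaller vocabulary that skips one question scores 1 of 2 answered (50%), while a larger one scores 2 of 3 (about 67%): the denominators differ, so the percentages are not comparable.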

Use an MKL-based prebuilt version?

The MKL software license has permitted redistribution for a while now: https://software.intel.com/content/dam/develop/external/us/en/documents/pdf/intel-simplified-software-license.pdf

Maybe we should use MKL in the precompiled version:

  • We can enable all functionality that requires LAPACK.
  • Typically faster than ndarray's native Rust routines and than OpenBLAS.
  • We can link MKL statically, so that the user does not need to install MKL.

I guess this only means that we have to switch from a musl build to a glibc build, but that's not a problem.

I can do the implementation work, just wanted to check that you are ok with this, @sebpuetz.

Provide a `finalfusion reconstruct` subcommand

This subcommand should do the opposite of finalfusion quantize: reconstruct an embedding matrix from a quantized matrix. This would make it possible to evaluate quantized matrices with compute-accuracy. It is motivated by @NianhengWu's observation in finalfusion/finalfusion-rust#82 that it is currently difficult to do an intrinsic evaluation of quantized embedding matrices.
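A minimal sketch of the reconstruction step, assuming plain product quantization: each row is stored as one centroid index per subquantizer, and the approximate row is the concatenation of the selected centroids. The `ProductQuantized` layout is hypothetical and does not mirror finalfusion's actual storage types. If the quantizer also applied a projection (as in optimized PQ) or stored vector norms separately, a real reconstruct would have to undo those as well; this sketch ignores them.

```rust
/// Hypothetical, simplified product-quantized matrix.
struct ProductQuantized {
    /// centroids[s][c]: centroid `c` of subquantizer `s` (length = sub-vector dims).
    centroids: Vec<Vec<Vec<f32>>>,
    /// codes[row][s]: centroid index chosen for subquantizer `s` of `row`.
    codes: Vec<Vec<u8>>,
}

impl ProductQuantized {
    /// Approximates one row by concatenating its selected centroids.
    fn reconstruct_row(&self, row: usize) -> Vec<f32> {
        self.codes[row]
            .iter()
            .enumerate()
            .flat_map(move |(s, &code)| self.centroids[s][code as usize].iter().copied())
            .collect()
    }

    /// Reconstructs the full (approximate) embedding matrix, row by row.
    fn reconstruct(&self) -> Vec<Vec<f32>> {
        (0..self.codes.len()).map(|row| self.reconstruct_row(row)).collect()
    }
}
```

The reconstructed matrix could then be written out in a regular (non-quantized) format, so compute-accuracy can evaluate it unchanged.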

ff-analogy utility

I just realized that we do not have an ff-analogy utility. It should read lines from standard input, each consisting of three words separated by a space. If the words are A B C, it should return words that qualify as D in "A is to B as C is to D".
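A sketch of the input handling described above. The function names are hypothetical, and malformed lines are reported on stderr and skipped, which is an assumed policy rather than anything decided in the issue. Splitting on arbitrary whitespace (rather than exactly one space) is a small leniency added here.

```rust
use std::io::BufRead;

/// Splits a line into the words A, B and C of "A is to B as C is to ?".
/// Returns None unless the line has exactly three words.
fn parse_analogy_query(line: &str) -> Option<[&str; 3]> {
    let mut words = line.split_whitespace();
    let query = [words.next()?, words.next()?, words.next()?];
    if words.next().is_some() {
        return None; // more than three words
    }
    Some(query)
}

/// Hypothetical reader loop: collects valid queries, warns about bad lines.
fn read_queries<R: BufRead>(reader: R) -> std::io::Result<Vec<[String; 3]>> {
    let mut queries = Vec::new();
    for (lineno, line) in reader.lines().enumerate() {
        let line = line?;
        match parse_analogy_query(&line) {
            Some([a, b, c]) => queries.push([a.to_string(), b.to_string(), c.to_string()]),
            None => eprintln!("line {}: expected three words, skipping", lineno + 1),
        }
    }
    Ok(queries)
}
```

In the utility itself, `read_queries(std::io::stdin().lock())` would read from standard input.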

Analogy queries are already supported by the finalfusion crate. So, this utility would be largely a copy of ff-similar, but reading three words rather than one word and using the analogy method.
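To make the arithmetic behind such a query concrete, here is a dependency-free sketch of 3CosAdd, a common analogy formulation: rank every word D (other than A, B and C) by the cosine similarity between D and B - A + C. It operates on a plain `HashMap` rather than the finalfusion crate's types, and whether the crate's analogy method scores candidates exactly this way is an assumption; the utility itself should simply call the crate.

```rust
use std::collections::HashMap;

/// Scales a vector to unit length (zero vectors are returned unchanged).
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        v.to_vec()
    } else {
        v.iter().map(|x| x / norm).collect()
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// 3CosAdd: the best D for "a is to b as c is to D" maximizes cos(D, b - a + c).
/// Returns None if any query word is unknown. Illustrative helper only.
fn analogy<'a>(
    embeddings: &'a HashMap<String, Vec<f32>>,
    a: &str,
    b: &str,
    c: &str,
    limit: usize,
) -> Option<Vec<(&'a str, f32)>> {
    let va = normalize(embeddings.get(a)?);
    let vb = normalize(embeddings.get(b)?);
    let vc = normalize(embeddings.get(c)?);
    let target: Vec<f32> = va.iter().zip(&vb).zip(&vc).map(|((x, y), z)| y - x + z).collect();
    let target = normalize(&target);

    let mut scored: Vec<(&'a str, f32)> = embeddings
        .iter()
        .filter(|(w, _)| ![a, b, c].contains(&w.as_str())) // exclude query words
        .map(|(w, v)| (w.as_str(), dot(&normalize(v), &target)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1)); // highest similarity first
    scored.truncate(limit);
    Some(scored)
}
```

Excluding A, B and C matters in practice, because B or C is often the vector closest to B - A + C.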

@NianhengWu: maybe this is something that you could do besides the finalfrontier experiments? Seems like little work for a lot of gratification ;). You can get embeddings to test with at:

https://finalfusion.github.io/pretrained

Precompiled finalfusion utilities

Continuing the discussion in #26. I think I have found a nice solution to this: with Nix I can build a single binary with all the dependencies. It is not a static binary, but the binary will actually contain a tarball of the transitive closure of dependencies. It's self-extracting and uses user namespaces to pivot root (since the libraries are not relocatable).

I think it has nice properties, such as that we have complete control over the dependencies, down to the C library. But it requires that we have a single binary rather than several binaries as we have now.

So, I fear I have to use subcommands, which you know I am not a fan of ;). I'll implement a single finalfusion command with subcommands and see if I can live with it.
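A minimal sketch of how a single binary might dispatch on its first argument. The subcommand list reuses names mentioned in these issues and is illustrative, not the final set; a real implementation would more likely rely on the argument parser's own subcommand support.

```rust
/// Subcommands replacing the former separate binaries (illustrative list).
const SUBCOMMANDS: &[&str] = &["analogy", "compute-accuracy", "convert", "quantize", "similar"];

#[derive(Debug, PartialEq)]
enum Dispatch<'a> {
    /// Run `subcommand` with the remaining arguments.
    Run { subcommand: &'a str, args: &'a [String] },
    /// No subcommand given: print usage.
    Usage,
    /// Unrecognized subcommand name.
    Unknown(&'a str),
}

/// Hypothetical dispatcher over argv (argv[0] is the program name).
fn dispatch(argv: &[String]) -> Dispatch<'_> {
    match argv.get(1) {
        None => Dispatch::Usage,
        Some(name) if SUBCOMMANDS.contains(&name.as_str()) => Dispatch::Run {
            subcommand: name.as_str(),
            args: &argv[2..],
        },
        Some(name) => Dispatch::Unknown(name.as_str()),
    }
}
```

In `main`, this would be called as `dispatch(&std::env::args().collect::<Vec<_>>())`, with each `Run` variant forwarded to the former binary's entry point.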

Documentation / README

We're lacking some basic documentation here. Especially after merging everything into a single binary, giving some examples might be nice.

Handling missing input stream/file

We have a number of tools that just quietly do nothing if we forget to pass an input. Maybe printing a status message to stderr, such as "Reading input from stdin" or "Reading input from path/to/file", would allow users to catch that.

What's your opinion?
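A sketch of the proposal, with hypothetical helper names: announce the input source on stderr before reading. As an extra (assumed) refinement, it also warns when stdin is an interactive terminal, which is exactly the case where a tool appears to hang. `IsTerminal` is in the standard library since Rust 1.70.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, IsTerminal};

/// Status line announcing where input is read from (wording from the issue).
fn input_message(path: Option<&str>) -> String {
    match path {
        Some(path) => format!("Reading input from {}", path),
        None => "Reading input from stdin".to_string(),
    }
}

/// Hypothetical helper: prints the status line to stderr, then opens the
/// file, or falls back to stdin when no path was given.
fn open_input(path: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    eprintln!("{}", input_message(path));
    match path {
        Some(path) => Ok(Box::new(BufReader::new(File::open(path)?))),
        None => {
            let stdin = io::stdin();
            if stdin.is_terminal() {
                // Nothing is piped in, so the tool would otherwise wait silently.
                eprintln!("(stdin is a terminal; waiting for input)");
            }
            Ok(Box::new(BufReader::new(stdin)))
        }
    }
}
```

Writing the message to stderr keeps stdout clean, so piping the tool's output into another program is unaffected.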

Mismatch in toml versions

Master currently doesn't build, probably because of a mismatch between the toml versions (0.4 and 0.5) used by finalfusion-utils and finalfusion-rust:

error[E0631]: type mismatch in function arguments
  --> src/bin/ff-convert.rs:99:64
   |
99 |     let metadata = config.metadata_filename.map(read_metadata).map(Metadata);
   |                                                                ^^^
   |                                                                |
   |                                                                expected signature of `fn(toml::value::Value) -> _`
   |                                                                found signature of `fn(toml::value::Value) -> _`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0631`.
error: Could not compile `finalfusion-utils`.
warning: build failed, waiting for other jobs to finish...
error: build failed
