conditional-knn

Source code for the paper Sparse Cholesky factorization by greedy conditional selection.

Installing

Install dependencies from environment.yml with conda or mamba:

conda env create --prefix ./venv --file environment.yml

or from a non-explicit spec file (platform may need to match):

conda create --prefix ./venv --file linux-64-spec-list.txt

or from an explicit spec file (platform must match):

conda create --prefix ./venv --file linux-64-explicit-spec-list.txt

See managing environments for more information.

Activate conda environment:

conda activate ./venv

Build Cython extensions:

python setup.py build_ext --inplace

Intel oneMKL with conda

We rely on the Intel oneMKL library to provide fast numerical routines.

Make sure that numpy and scipy also use the MKL for BLAS and LAPACK by checking the output of

python -c "import numpy; numpy.__config__.show()"

which should show something like

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['.../venv/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.../venv/include']
...
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['.../venv/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.../venv/include']
...

and similarly for

python -c "import scipy; scipy.__config__.show()"
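The same check can be scripted rather than read by eye. The following is a minimal sketch, not part of the repository: the helper names `mentions_mkl` and `captured_config` are hypothetical, and the substring test for "mkl" is only a heuristic on the printed configuration, so a positive result should still be confirmed against the output shown above.

```python
import contextlib
import importlib
import io


def mentions_mkl(config_text: str) -> bool:
    """Heuristic: return True if the configuration text refers to MKL."""
    return "mkl" in config_text.lower()


def captured_config(module_name: str) -> str:
    """Import a module by name and capture what its __config__.show() prints."""
    module = importlib.import_module(module_name)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        module.__config__.show()
    return buffer.getvalue()


if __name__ == "__main__":
    for name in ("numpy", "scipy"):
        try:
            text = captured_config(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        status = "MKL found" if mentions_mkl(text) else "MKL not found"
        print(f"{name}: {status}")
```

Run it inside the activated environment so that it inspects the same numpy and scipy builds the experiments will use.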

Running conda install numpy from the defaults or anaconda channel (not conda-forge) should work, but it sometimes conflicts with installing mkl-devel. It's easiest to just use the intel channel.

Downloading datasets

We use datasets from the SuiteSparse Matrix Collection, the UCI Machine Learning Repository, LIBSVM, and the book Gaussian Processes for Machine Learning. Download the datasets with the provided fish script:

chmod +x get_datasets
./get_datasets

OCO-2 data

Downloading the dataset

Navigate to the OCO-2 solar induced fluorescence (SIF) dataset. Note that the (current) latest version of the dataset is 11r, but this might change in the future. If the link doesn't work, search directly for the OCO2_L2_Lite_SIF dataset.

Click on the blue "Online Archive" button on the right and then on the 2017 folder. Each file is a different day.

Note that in order to download files, an Earthdata account must be created.

Post-processing

First install R and NetCDF using your preferred package manager. For example, on Arch Linux:

sudo pacman -S r netcdf

In order to install R packages locally, follow the instructions here to create the default R_LIBS_USER.

mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.2/

Be sure to replace x86_64-pc-linux-gnu and 4.2 with your specific platform and R version, respectively. Running the command R --version should show you something like the below.

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
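The directory name can be derived from that output instead of being typed by hand. The sketch below is only an illustration: the function name `r_libs_user_dir` is hypothetical, and it assumes the `R --version` output follows the format shown above (a "R version X.Y.Z" line and a "Platform:" line) and that the library path uses the `~/R/<platform>-library/<major>.<minor>` layout from the mkdir command.

```python
import re


def r_libs_user_dir(version_output: str) -> str:
    """Build the per-user R library path from the text of `R --version`."""
    version = re.search(r"R version (\d+)\.(\d+)", version_output)
    platform = re.search(r"Platform: (\S+)", version_output)
    if version is None or platform is None:
        raise ValueError("unrecognized `R --version` output")
    major, minor = version.groups()
    # Only the major.minor part of the version appears in the directory name.
    return f"~/R/{platform.group(1)}-library/{major}.{minor}"


sample = (
    'R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"\n'
    "Copyright (C) 2023 The R Foundation for Statistical Computing\n"
    "Platform: x86_64-pc-linux-gnu (64-bit)\n"
)
print(r_libs_user_dir(sample))  # ~/R/x86_64-pc-linux-gnu-library/4.2
```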

Next, start R and enter the following commands into the REPL to install the packages.

> install.packages("renv", repos = "https://cloud.r-project.org")
> renv::restore()

The data can now be compiled with

R --file=compile_fluorescence_data.R

The compile_fluorescence_data.R script is due to Joe Guinness.

Running

Files can be run as modules:

python -m experiments.cholesky
python -m figures.factor
python -m tests.cknn_tests
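To run several modules in sequence and stop at the first failure, a small driver can wrap the same commands. This is a sketch rather than something shipped with the repository: `MODULES`, `module_command`, and `run_all` are hypothetical names, and it assumes the script is started from the repository root with the conda environment active.

```python
import subprocess
import sys

# The three modules listed above, in the order they are run.
MODULES = ["experiments.cholesky", "figures.factor", "tests.cknn_tests"]


def module_command(module: str) -> list[str]:
    """Return the argv equivalent of `python -m <module>` for this interpreter."""
    return [sys.executable, "-m", module]


def run_all(modules: list[str] = MODULES) -> None:
    """Run each module in turn; check=True raises on a nonzero exit code."""
    for module in modules:
        subprocess.run(module_command(module), check=True)
```

Calling `run_all()` raises `subprocess.CalledProcessError` as soon as one module exits with an error, so later modules do not run on top of a failed step.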
