
Comments (4)

chrisdane commented on May 26, 2024

Yes I think so =)
Generally I am looking for data manipulation/arithmetic/subsetting as lazy as possible, i.e. applying as many methods as possible without using the memory. Ideally, this would include subsetting, but maybe this is technically nonsense ^_^:

methods(class="FileArray")
 [1] $          $<-        [          [<-        [[         apply     
 [7] as.array   coerce     dim        dimnames   dimnames<- fwhich    
[13] initialize length     mapreduce  max        min        range     
[19] show       subset     sum        typeof

Btw, would it be possible to lazy-load a netcdf file as FileArray? This seems possible with the stars package (called "proxy" there):

library(stars)
proxy <- stars::read_stars(system.file("nc/reduced.nc", package = "stars"), proxy = TRUE) # lazy-load nc file
message("proxy obj needs ", format(utils::object.size(proxy), units="auto"))
#proxy obj needs 12.3 Kb
stars <- stars::st_as_stars(proxy) # convert to accessible data = use memory
message("stars obj needs ", format(utils::object.size(stars), units="auto"))
#stars obj needs 519.3 Kb
methods(class="stars_proxy")
 [1] Math            Ops             [               [<-            
 [5] [[<-            adrop           aggregate       aperm          
 [9] as.data.frame   c               coerce          dim            
[13] droplevels      filter          hist            initialize     
[17] is.na           merge           plot            predict        
[21] print           show            slotsFromS3     split          
[25] st_apply        st_as_sf        st_as_stars     st_crop        
[29] st_dimensions<- st_downsample   st_mosaic       st_redimension 
[33] st_sample       st_set_bbox     write_stars

Thanks a lot for your great work!

from filearray.

dipterix commented on May 26, 2024

> Generally I am looking for data manipulation/arithmetic/subsetting as lazy as possible, i.e. applying as many methods as possible without using the memory. Ideally, this would include subsetting

That sounds like a good idea. There will be some limitations on the types of methods available. Point-wise operators such as +, -, *, /, > and < should be the easiest. Indexing by a lazy condition (e.g. arr[arr > 0.5]) could be a little tricky, but it is doable. Other methods (such as tensor decomposition or matrix multiplication) will not be implemented at this time.
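
To make the idea of lazy point-wise operators concrete, here is a minimal base-R sketch. It is not how filearray implements the feature; `lazy_array` and `collect_lazy` are hypothetical names invented for illustration, and the sketch only handles binary operators between a lazy object and a plain number. Each operator call is recorded rather than computed, and the recorded steps are replayed chunk by chunk only when the result is requested.

```r
# Hypothetical helpers for illustration only; not part of filearray.
lazy_array <- function(data) {
  structure(list(data = data, ops = list()), class = "lazy_array")
}

# S3 group generic: record the operation instead of evaluating it.
# Only "lazy op number" and "number op lazy" are supported in this sketch.
Ops.lazy_array <- function(e1, e2) {
  f <- match.fun(.Generic)
  if (inherits(e1, "lazy_array")) {
    other <- e2
    step <- function(chunk) f(chunk, other)
    out <- e1
  } else {
    other <- e1
    step <- function(chunk) f(other, chunk)
    out <- e2
  }
  out$ops <- c(out$ops, list(step))
  out
}

# Replay the recorded steps on one chunk at a time, then reassemble.
collect_lazy <- function(x, chunk_size = 5L) {
  n <- length(x$data)
  starts <- seq(1L, n, by = chunk_size)
  pieces <- lapply(starts, function(s) {
    chunk <- x$data[s:min(s + chunk_size - 1L, n)]
    for (step in x$ops) chunk <- step(chunk)
    chunk
  })
  result <- unlist(pieces)
  dim(result) <- dim(x$data)
  result
}

m <- matrix(1:24, nrow = 4)
y <- (lazy_array(m) - 1) * 2 > 10   # nothing is computed yet; three steps are recorded
stopifnot(identical(collect_lazy(y), (m - 1) * 2 > 10))
```

In a file-backed setting the chunks would be read from disk rather than sliced from an in-memory matrix, so only one chunk at a time needs to be held in memory.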

> but maybe this is technically nonsense ^_^:

No, you are good. Glad you brought up this feature request.

> Btw, would it be possible to lazy-load a netcdf file as FileArray?

Not natively. I think you can convert the arrays, though. I'm not very familiar with the low-level implementation of netCDF... From what I have read, it seems netCDF and HDF5 files are often chunked for random access.
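
One way the conversion mentioned above might look is sketched below. This is an assumption-laden sketch, not a tested recipe from the filearray author: it assumes the ncdf4 package is installed, that the variable is 3-D and numeric, and that `filebase` does not exist yet. `nc_to_filearray` is a hypothetical helper name. The variable is copied one slab at a time along its last dimension so the whole array never has to sit in memory.

```r
library(ncdf4)
library(filearray)

# Hypothetical helper; not part of filearray or ncdf4.
# Assumes `varname` names a 3-D numeric variable in the netCDF file.
nc_to_filearray <- function(nc_path, varname, filebase) {
  nc <- nc_open(nc_path)
  on.exit(nc_close(nc), add = TRUE)

  dims <- nc$var[[varname]]$varsize
  stopifnot(length(dims) == 3L)

  # Create an empty on-disk array with the same shape.
  arr <- filearray_create(filebase, dimension = dims, type = "double")

  # Copy one slab per index of the last dimension.
  for (i in seq_len(dims[3])) {
    slab <- ncvar_get(nc, varname,
                      start = c(1, 1, i),
                      count = c(dims[1], dims[2], 1))
    arr[, , i] <- slab
  }
  arr
}
```

Writing along the last dimension matches the sequential layout described in this thread; the netCDF side may still be read in a less favorable order, depending on how the file is chunked.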

FileArray has its own format. The file IO of filearray is written from scratch to make sure sequential IO is as fast as possible on NVMe SSDs (2-4 GB/s on Mac M1/M2, or about 1 GB/s on an average Windows machine).

The performance comes at a cost. For example, random access is relatively slow. filearray does not use a universal file format that can be read by other programs. The data array is only expandable along the last margin... If you are OK with these disadvantages, or have ways to work around them, filearray should be a great tool for out-of-memory analyses (my project often needs to analyze 200x200x300x300+ data arrays within seconds).


dipterix commented on May 26, 2024

Hi @chrisdane, I have added this experimental feature to the lazyeval branch: https://github.com/dipterix/filearray/tree/lazyeval

Would you mind helping me check this branch to see whether there are any methods you would like supported? Also, please let me know if you find any bugs :)

You can install and compile this dev branch via

remotes::install_github("dipterix/filearray@lazyeval")

If you are on Windows, Rtools is needed to compile. On macOS, please run xcode-select --install in a terminal to install the build tools.

Here's a sanity test:

> x <- as_filearray(1:24, dimension = c(4,6))

> y <- (2^(x - 1) + log(x)) > 10000 | x <= 2

> print(y)
Reference class object of class "FileArrayProxy"
Mode: readwrite 
UUID: 0005-640eaaf8-c6e7-4f55-aa6e-2956a872155c (depth=5)
Dimension: 4x6 
Partition count: 6 
Partition size: 1 
Data type: logical 
Internal type: integer 
Location: $TEMPDIR/tmpfilearray11ef51b065fe9.farr 

> x[y]
 [1]  1  2 15 16 17 18 19 20 21 22 23 24

> # Sanity check
> x[][(2^(x[] - 1) + log(x[])) > 10000 | x[] <= 2]
 [1]  1  2 15 16 17 18 19 20 21 22 23 24


dipterix commented on May 26, 2024

Added as of 0.1.6

