
Comments (10)

polakowo avatar polakowo commented on September 15, 2024 1

@wukan1986 thanks, seems like I need to further optimize column mapping to reduce memory footprint. I will release a fix soon.

Edit: @roelio just figured out how to allocate more virtual memory on Linux. The problem with Windows is that it does not overcommit memory, so you cannot allocate more memory than the RAM size plus the current pagefile size, as I found here. This prevents vectorbt from defining arbitrarily large empty arrays.

from vectorbt.

polakowo avatar polakowo commented on September 15, 2024

I cannot reproduce the example, it works fine both on my machine and Colab.


wukan1986 avatar wukan1986 commented on September 15, 2024

Change num_tests = 2000 to num_tests = 20000. Otherwise, maybe it only crashes on Windows.


polakowo avatar polakowo commented on September 15, 2024

On my machine (MacBook Air 2017, 2.2 GHz Dual-Core Intel Core i7, 8 GB RAM), num_tests = 2000, num_tests = 20000, and even num_tests = 200000 run fine (although the last one runs pretty slowly because of memory swapping). If your machine runs 32-bit Windows, its address space is relatively small, which is why you might get errors rather than poor performance. See https://serverfault.com/a/75027 for general info. If that's the case, I have no other tip apart from running memory-expensive code on Colab, as high memory consumption is one of the general cons of vectorbt. You may also want to disable caching globally or run your code in chunks, as illustrated in my article.
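As a rough illustration of the "run your code in chunks" suggestion, here is a minimal sketch in plain Python. It is not the code from the article mentioned above; `iter_chunks` and `run_test` are hypothetical names, and `run_test` is only a stand-in for one expensive backtest.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def run_test(test_id: int) -> int:
    # Hypothetical stand-in for one expensive backtest; returns a dummy result.
    return test_id * 2


num_tests = 20000
results = []
for chunk in iter_chunks(range(num_tests), chunk_size=2000):
    # Only one chunk's intermediate data needs to be alive at a time.
    chunk_results = [run_test(i) for i in chunk]
    results.extend(chunk_results)
```

The idea is that peak memory is bounded by the chunk size rather than by the total number of tests, at the cost of keeping only the reduced results from each chunk.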


wukan1986 avatar wukan1986 commented on September 15, 2024

my computer:

AMD Ryzen 7 4800U with Radeon Graphics 1.80 GHz
16.0 GB RAM
Windows 10 Home x64

numba 0.51.2
numpy 1.19.3
pandas 1.0.3
python 3.8.3


polakowo avatar polakowo commented on September 15, 2024

Can you check whether you can run the following snippet:

import numpy as np
from numba import njit

@njit
def col_map_nb(col_arr, n_cols):
    # Dense map: one row per column, wide enough to hold every record index.
    col_idxs_out = np.empty((n_cols, len(col_arr)), dtype=np.int_)
    print(col_idxs_out.shape)
    # Number of records seen so far in each column.
    col_ns_out = np.full((n_cols,), 0, dtype=np.int_)

    for r in range(col_arr.shape[0]):
        col = col_arr[r]
        col_idxs_out[col, col_ns_out[col]] = r
        col_ns_out[col] += 1
    # Trim the unused trailing slots.
    return col_idxs_out[:, :np.max(col_ns_out)], col_ns_out

# 10000 columns with 50 records each.
col_map_nb(np.repeat(np.arange(10000), 50), 10000)
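The snippet above allocates a dense array of shape (n_cols, len(col_arr)), which is where the memory goes. One possible way to avoid that, sketched below in plain NumPy, is to keep a flat array of record indices grouped by column plus a per-column count. This is an assumption about how such a map could be stored, not necessarily the fix that was released; `compact_col_map` is a hypothetical name and the input is assumed to contain only valid column indices.

```python
import numpy as np


def compact_col_map(col_arr: np.ndarray, n_cols: int):
    """Return (col_idxs, col_lens): record indices grouped by column, plus counts."""
    col_arr = np.asarray(col_arr)
    # Stable sort keeps record indices in ascending order within each column.
    col_idxs = np.argsort(col_arr, kind="stable")
    # Number of records per column, including columns with no records.
    col_lens = np.bincount(col_arr, minlength=n_cols)
    return col_idxs, col_lens


col_arr = np.repeat(np.arange(10000), 50)
col_idxs, col_lens = compact_col_map(col_arr, 10000)

# Records of column c live in col_idxs[starts[c]:starts[c] + col_lens[c]].
starts = np.cumsum(col_lens) - col_lens
c = 3
print(col_idxs[starts[c]:starts[c] + col_lens[c]][:5])
```

With this layout the storage grows with len(col_arr) + n_cols instead of their product.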


wukan1986 avatar wukan1986 commented on September 15, 2024

no crash

col_map_nb(np.repeat(np.arange(10000), 50), 10000)

(10000, 500000)

Process finished with exit code 0

crash

col_map_nb(np.repeat(np.arange(10000), 50), 1)

(1, 500000)

Process finished with exit code -1073741819 (0xC0000005)

With Numba disabled:

Traceback (most recent call last):
  File "D:/test_vectorbt/check_numba.py", line 23, in <module>
    col_map_nb(np.repeat(np.arange(10000), 50), 1)
  File "D:/test_vectorbt/check_numba.py", line 17, in col_map_nb
(1, 500000)
    col_idxs_out[col, col_ns_out[col]] = r
IndexError: index 1 is out of bounds for axis 0 with size 1
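The second call passes n_cols = 1 while col_arr holds column indices up to 9999, so the IndexError in pure Python is expected. Numba does not check array bounds by default, which is consistent with the same out-of-bounds write showing up as an access violation (0xC0000005) instead of an exception. A minimal sketch of a check that could be run before calling the compiled function follows; `validate_col_arr` is a hypothetical helper, not part of vectorbt.

```python
import numpy as np


def validate_col_arr(col_arr: np.ndarray, n_cols: int) -> None:
    """Raise IndexError if any column index falls outside [0, n_cols)."""
    col_arr = np.asarray(col_arr)
    if col_arr.size == 0:
        return
    lo, hi = int(col_arr.min()), int(col_arr.max())
    if lo < 0 or hi >= n_cols:
        raise IndexError(
            f"column indices span [{lo}, {hi}] but n_cols is {n_cols}"
        )


col_arr = np.repeat(np.arange(10000), 50)
validate_col_arr(col_arr, 10000)  # passes silently

try:
    validate_col_arr(col_arr, 1)  # same arguments as the crashing call
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```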


wukan1986 avatar wukan1986 commented on September 15, 2024

With Numba disabled:

iloc needs a lot of memory.

Traceback (most recent call last):
  File "D:/test_vectorbt/demo_crash.py", line 71, in <module>
    rb_portfolio.iloc[0]
  File "D:\Users\Kan\miniconda3\envs\py38_vectorbt\lib\site-packages\vectorbt\base\indexing.py", line 24, in __getitem__
    return self._indexing_func(lambda x: x.iloc.__getitem__(key), **self._indexing_kwargs)
  File "D:\Users\Kan\miniconda3\envs\py38_vectorbt\lib\site-packages\vectorbt\portfolio\base.py", line 435, in _indexing_func
    new_order_records = self._orders._col_idxs_records(col_idxs)
  File "D:\Users\Kan\miniconda3\envs\py38_vectorbt\lib\site-packages\vectorbt\records\base.py", line 300, in _col_idxs_records
    self.values, self.col_mapper.col_map, to_1d(col_idxs))
  File "D:\Users\Kan\miniconda3\envs\py38_vectorbt\lib\site-packages\vectorbt\utils\decorators.py", line 215, in __get__
    val = self.func(instance)
  File "D:\Users\Kan\miniconda3\envs\py38_vectorbt\lib\site-packages\vectorbt\records\col_mapper.py", line 67, in col_map
    return nb.col_map_nb(self.col_arr, len(self.wrapper.columns))
  File "D:\Users\Kan\miniconda3\envs\py38_vectorbt\lib\site-packages\vectorbt\records\nb.py", line 99, in col_map_nb
    col_idxs_out = np.empty((n_cols, len(col_arr)), dtype=np.int_)
MemoryError: Unable to allocate 21.6 GiB for an array with shape (11000, 527993) and data type int32
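The 21.6 GiB figure follows directly from the shape and dtype in the error message. A quick sketch of the arithmetic, using the 16 GB of RAM reported earlier in the thread for comparison (`dense_array_gib` is a hypothetical helper name):

```python
def dense_array_gib(n_rows: int, n_cols: int, itemsize: int) -> float:
    """Size in GiB of a dense n_rows x n_cols array with `itemsize` bytes per element."""
    return n_rows * n_cols * itemsize / 2**30


# Shape and dtype taken from the MemoryError above; int32 is 4 bytes.
requested = dense_array_gib(11000, 527993, 4)
print(f"{requested:.1f} GiB")  # prints "21.6 GiB"

# The reporting machine has 16 GB of RAM, so the request exceeds physical memory.
print(requested > 16)
```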


polakowo avatar polakowo commented on September 15, 2024

Check whether it can run now.

Apart from core optimizations, Portfolio.from_* methods have two additional arguments: max_orders and max_logs. If you run into memory errors and any of them is caused by initializing order_records or log_records arrays, you can set the maximum number of records you expect the simulation to generate.

By default, if you have data of shape (1000, 1000), vectorbt will generate 1000 * 1000 = 1000000 empty records. But most of the time, you need only a fraction of that (for example, if buy and hold, you need only 1000 records - one per column), so this fraction can now be specified to consume less memory. If you don't need logging and you use Portfolio.from_order_func, set max_logs to 0.
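To get a feel for how much a record cap can save, here is a small sketch that compares the two record counts from the example above. The record layout `order_dt` is hypothetical and used only for illustration; vectorbt's actual record dtype may have different fields and a different size.

```python
import numpy as np

# Hypothetical record layout for illustration only; vectorbt's real dtype may differ.
order_dt = np.dtype([
    ("col", np.int64),
    ("idx", np.int64),
    ("size", np.float64),
    ("price", np.float64),
])


def records_mib(n_records: int, dtype: np.dtype) -> float:
    """Memory in MiB needed for `n_records` records of `dtype`."""
    return n_records * dtype.itemsize / 2**20


n_rows, n_cols = 1000, 1000
default_mib = records_mib(n_rows * n_cols, order_dt)  # one slot per data element
capped_mib = records_mib(n_cols, order_dt)            # buy and hold: one record per column
print(f"default: {default_mib:.1f} MiB, capped: {capped_mib:.3f} MiB")
```

Whatever the real record size is, the ratio between the two allocations is the same factor of 1000 as in the example.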


wukan1986 avatar wukan1986 commented on September 15, 2024

Thank you very much!

