General: Operating System: Linux

Memory usage increases across multiple `parallel_apply` (pandarallel issue, open)

hogan-roblox commented on June 10, 2024

Comments (2)

hogan-roblox commented on June 10, 2024

An update on this: it seems that `pandarallel.initialize(progress_bar=True, nb_workers=120)` has to be re-run before each new DataFrame. Is this expected?

The updated code below works around the issue for me, although I don't know why.

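```python
import pandas as pd
from pandarallel import pandarallel
# file_paths and SOME_FUNCTION are placeholders for the reporter's own inputs.
for file_path in file_paths:
    # Workaround: re-initialize pandarallel before processing each file.
    pandarallel.initialize(progress_bar=True, nb_workers=120)
    df = pd.read_csv(file_path)
    df = pd.DataFrame.from_dict(
        df.sample(frac=1.0).parallel_apply(SOME_FUNCTION, axis=1).to_dict(),
        orient="columns",
    )
```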

This is no longer a blocker for me, but I'd like to leave the issue open for a while to see whether anyone else runs into it and whether this behavior is expected.
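To check whether memory really grows from one iteration to the next, and to attach numbers when reporting it, a small measurement helper can help. The sketch below is not part of pandarallel, and `traced_memory_per_iteration` and `mean_growth` are hypothetical names. It relies on the standard-library `tracemalloc` module, so it only counts allocations made through Python's allocators in the current process: memory held by pandarallel's worker processes is not visible to it, and an OS-level view such as `top` is still needed for that.

```python
import gc
import tracemalloc
from typing import Callable, List


def traced_memory_per_iteration(step: Callable[[], object], n_iter: int = 5) -> List[int]:
    """Call `step` n_iter times and return the bytes still traced after each call.

    Hypothetical helper: only allocations made through Python's allocators in
    this process are traced; worker processes are invisible to tracemalloc.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        samples: List[int] = []
        for _ in range(n_iter):
            step()
            gc.collect()  # free unreachable objects so only retained memory is sampled
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        if started_here:
            tracemalloc.stop()  # leave tracing as the caller had it


def mean_growth(samples: List[int]) -> float:
    """Average change in traced bytes per iteration (first sample to last)."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)
```

To compare the two patterns in this thread, wrap the body of the loop above (reading one file and calling `parallel_apply`) in a zero-argument function and pass it as `step`: once with `pandarallel.initialize` called a single time before measuring, and once with it called inside the function. A `mean_growth` that stays well above zero in the first case but not in the second would support the report.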

shermansiu commented on June 10, 2024

Could you please attach a sample CSV and the simplest `SOME_FUNCTION` with which you can reproduce the issue?

I'm unable to reproduce the memory growth with the following setup:

Python: 3.10.13
Pandarallel: 1.6.5
Pandas: 2.2.0

```python
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize(progress_bar=True, nb_workers=120)

# Initialize once, then run parallel_apply on ten fresh DataFrames.
for _ in range(10):
    df = pd.DataFrame({"foo": range(100_000)})
    df = pd.DataFrame.from_dict(
        df.sample(frac=1.0).parallel_apply(lambda x: x + 1, axis=1).to_dict(),
        orient="columns",
    )
```

Since you mentioned this is no longer a blocker for you, the issue should probably be closed if there is no reply in a while.
