Comments (2)
Converting from Arrow is not always zero-copy. Polars uses a different string representation than most existing Arrow implementations, so string data has to be converted on import. The behavior here is expected.
from polars.
Hi, @stinodego,
I still have the following questions:
- If that's the case, shouldn't the documentation say so explicitly? Strings are a very common type, and I think most people would assume the conversion is zero-copy.
- In this example it looks like the whole dataframe is copied, because the memory required is double the dataframe size. If the cause is the string representation, shouldn't only the string columns be copied?
- Is there anything we can do to work around this, such as using a particular data type with pyarrow?
- Will something similar happen with categories? Or is converting to categories first a good alternative when the number of distinct values in the string columns is small?
Thanks in advance,