Comments (2)
@stinodego the CSV is a red herring. This is a stack overflow in the streaming engine. The Parquet sink shares the append
union, which is currently implemented recursively per pipeline: 17K in this case.
Even if this is not possible, I expect a clear error not a seg fault. (I thought it was theoretically impossible to get a seg fault with Rust.)
No, it is not. A stack overflow also isn't handled gracefully, which is why you don't get a nice error.
from polars.
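To illustrate why a recursive implementation runs into trouble at 17K inputs, here is a rough Python analogy. It is not the actual Polars code (which is Rust); `Node`, `build_chain`, `depth_recursive` and `depth_iterative` are hypothetical names made up for this sketch. The recursive walk needs one stack frame per chained input, while the iterative walk uses constant stack space.

```python
from typing import Optional


class Node:
    """One input in a hypothetical chain of unioned pipelines."""

    def __init__(self, child: Optional["Node"] = None) -> None:
        self.child = child


def build_chain(n: int) -> Optional[Node]:
    """Build a linear chain of n nodes without using recursion."""
    head: Optional[Node] = None
    for _ in range(n):
        head = Node(child=head)
    return head


def depth_recursive(node: Optional[Node]) -> int:
    """Walk the chain recursively: one stack frame per node."""
    if node is None:
        return 0
    return 1 + depth_recursive(node.child)


def depth_iterative(node: Optional[Node]) -> int:
    """Walk the same chain with a loop: stack usage stays constant."""
    depth = 0
    while node is not None:
        depth += 1
        node = node.child
    return depth
```

With CPython's default recursion limit, `depth_recursive(build_chain(17_000))` raises `RecursionError`, whereas `depth_iterative` handles the same chain without issue. Python catches the overflow and raises an exception; a native stack overflow, as described above, has no such safety net, so the process simply crashes.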
Ah ok, that makes sense.
My workaround for now is to process the files in batches: convert the 17k CSV files to 17 Parquet files in 17 batches of 1,000, then combine those 17 Parquet files into one Parquet file in a separate step. For other datasets, though, the limit is lower than 17k.
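A minimal sketch of that batching workaround, assuming all CSV files share a schema. The helper names (`batched`, `csvs_to_parquet`), the output file naming and the default batch size of 1,000 are assumptions for illustration, not taken from the original report; the batch size may need lowering for other datasets.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def batched(paths: Sequence[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of `paths`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def csvs_to_parquet(
    csv_paths: Sequence[Path],
    out_dir: Path,
    final_path: Path,
    batch_size: int = 1000,  # assumed value; tune per dataset
) -> None:
    """Step 1: one Parquet file per batch. Step 2: combine the batch files."""
    import polars as pl  # imported here so `batched` works without polars

    out_dir.mkdir(parents=True, exist_ok=True)
    for i, batch in enumerate(batched(csv_paths, batch_size)):
        # Each batch stays well below the number of inputs that overflowed.
        lazy = pl.concat([pl.scan_csv(p) for p in batch])
        lazy.sink_parquet(out_dir / f"batch_{i:04d}.parquet")

    # Combine the intermediate Parquet files into a single file.
    pl.scan_parquet(str(out_dir / "batch_*.parquet")).sink_parquet(final_path)
```

For 17,000 input files and a batch size of 1,000, `batched` yields 17 batches, matching the 17 intermediate Parquet files described above.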