You can control the way polars interprets 2D data input with the orient keyword:
import polars as pl

x2 = [[1, 2], [3, 4]]
df1 = pl.DataFrame(x2, schema=['c0', 'c1'], orient="row")
Note that for a square matrix like yours, both orientations are valid and make sense. However, the default orientation in polars (orient="col") differs here from that of pandas, which may confuse users coming from pandas.
Yup, this is not a bug. Polars/Arrow are column-oriented by design, so when there is ambiguity (the same number of rows and columns, schema types that don't help, and no explicit "orient" parameter), "col" will be the default.
This is detailed in the DataFrame docstring:
orient : {'col', 'row'}, default None
Whether to interpret two-dimensional data as columns or as rows. If None,
the orientation is inferred by matching the columns and data dimensions.
If this does not yield conclusive results, column orientation is used.
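The inference rule quoted above can be sketched in plain Python. This is a hypothetical helper (infer_orient is not a polars function) and it assumes the check is purely about shape: the outer length is compared with the number of schema columns (column reading) and the inner length with the same number (row reading). The dtype hints mentioned earlier in the thread are ignored, and raising an error when neither reading fits is an assumption of this sketch. The fallback parameter is also hypothetical; setting it to "row" mimics the default proposed later in the thread.

```python
from collections.abc import Sequence


def infer_orient(data: Sequence[Sequence[object]], n_columns: int, fallback: str = "col") -> str:
    """Hypothetical helper: guess whether `data` holds columns or rows (shape only)."""
    n_outer = len(data)
    n_inner = len(data[0]) if n_outer else 0
    looks_like_cols = n_outer == n_columns  # one inner sequence per column
    looks_like_rows = n_inner == n_columns  # one value per column in each inner sequence
    if looks_like_cols and not looks_like_rows:
        return "col"
    if looks_like_rows and not looks_like_cols:
        return "row"
    if looks_like_cols and looks_like_rows:
        # Square input: both readings fit, so the result is inconclusive.
        return fallback
    raise ValueError("data shape does not match the number of schema columns")


print(infer_orient([[1, 2], [3, 4]], 2))        # col
print(infer_orient([[1, 2, 3], [4, 5, 6]], 3))  # row
```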
Note that, if you have the option, column data will load more efficiently; otherwise, set orient="row" explicitly to avoid the need for any data-level inference 👍
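If the data starts out as rows and you still want to hand polars column-oriented input, one option is to transpose it into a dict of columns first. The sketch below uses a hypothetical helper, rows_to_columns, written in plain Python; it only removes the orientation ambiguity and makes no claim about overall speed, since the transposition itself has a cost.

```python
from collections.abc import Sequence


def rows_to_columns(rows: Sequence[Sequence[object]], schema: Sequence[str]) -> dict[str, list[object]]:
    """Hypothetical helper: transpose row-oriented data into {column_name: values}."""
    if any(len(row) != len(schema) for row in rows):
        raise ValueError("every row must have one value per schema column")
    # zip(*rows) yields one tuple per column; with no rows it yields nothing,
    # so empty columns are created explicitly in that case.
    columns = [list(col) for col in zip(*rows)] if len(rows) else [[] for _ in schema]
    return dict(zip(schema, columns))


cols = rows_to_columns([[1, 2], [3, 4]], ["c0", "c1"])
print(cols)  # {'c0': [1, 3], 'c1': [2, 4]}
```

A dict like cols can then be passed as pl.DataFrame(cols); because a dict maps names to columns, no orientation has to be inferred.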
@alexander-beedie Another (related) point of confusion for a polars newbie: when the 2D list is converted to a NumPy array, its default orientation becomes "row", and this change in behavior does not seem to be documented:
import numpy as np
import polars as pl

x2 = [[1, 2], [3, 4]]
pl.DataFrame(x2, schema=['c0', 'c1'])
# shape: (2, 2)
# ┌─────┬─────┐
# │ c0 ┆ c1 │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 3 │
# │ 2 ┆ 4 │
# └─────┴─────┘
pl.DataFrame(np.asarray(x2), schema=['c0', 'c1'])
# shape: (2, 2)
# ┌─────┬─────┐
# │ c0 ┆ c1 │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 2 │
# │ 3 ┆ 4 │
# └─────┴─────┘
If I may make a suggestion: a silent change in behavior like this is often dangerous (as a full-time ML engineer, I have spent far more time debugging silent behavior changes than dealing with a noisy spam of warnings; the silent killer is the true evil...). The behavior should be consistent across input types, or at least well documented.
Thank you both for clarifying! I'm leaving the issue open due to @cjackal's observation about the inconsistency. Once that is resolved, anyone is welcome to close the issue.
People bump their heads on this one all the time. This must be the 5th issue with this exact complaint.
I think it's time to flip the switch and use row orientation by default for sequences of sequences (unless we can infer that the data should be column-oriented). It just makes sense to parse these as rows: we have the dict format for column-oriented input, and we already do the same for NumPy inputs.
@ritchie46 What do you think?