Comments (4)
Just trying to understand the example here:
In the Pandas example, you're sorting before log / diff:
# pandas
df.sort_values('date').groupby('code').close.transform(np.log).diff()
But in the Polars example, you're sorting after log / diff:
# polars
pldf.with_columns(
    pl.col('close').log().diff().sort_by('date').over('code').alias("log_return1")
)
Shouldn't the sort come before log/diff in order to be equivalent to the Pandas example?
pldf.with_columns(
    pl.col('close').sort_by('date').log().diff().over('code').alias("log_return1")
)
@cmdlineluser I thought pl.Expr is a lazy function, like in Spark. Something like:
w = Window.partitionBy(['store_id', 'product_id', 'date']).orderBy(col('time_create').desc())
F.avg('sale_count').over(w)
Yes, but if you sort after .diff(), won't it produce different results?
import polars as pl
import numpy as np
df = pl.from_repr("""
┌─────────────────────┬──────┬───────┐
│ date                ┆ code ┆ close │
│ ---                 ┆ ---  ┆ ---   │
│ datetime[ns]        ┆ str  ┆ i64   │
╞═════════════════════╪══════╪═══════╡
│ 2021-01-01 00:04:00 ┆ 0001 ┆ 17    │
│ 2021-01-01 00:01:00 ┆ 0001 ┆ 18    │
│ 2021-01-01 00:02:00 ┆ 0001 ┆ 3     │
│ 2021-01-01 00:06:00 ┆ 0001 ┆ 3     │
│ 2021-01-01 00:05:00 ┆ 0001 ┆ 14    │
│ 2021-01-01 00:03:00 ┆ 0001 ┆ 7     │
│ 2021-01-01 00:09:00 ┆ 0001 ┆ 2     │
│ 2021-01-01 00:00:00 ┆ 0001 ┆ 12    │
│ 2021-01-01 00:08:00 ┆ 0001 ┆ 14    │
│ 2021-01-01 00:07:00 ┆ 0001 ┆ 2     │
└─────────────────────┴──────┴───────┘
""")
Your pandas example:
(df.to_pandas()
   .sort_values('date')
   .groupby('code')
   .close.transform(np.log).diff()
)
# 7 NaN
# 1 0.405465
# 2 -1.791759
# 5 0.847298
# 0 0.887303
# 4 -0.194156
# 3 -1.540445
# 9 -0.405465
# 8 1.945910
# 6 -1.945910
# Name: close, dtype: float64
Sorting before/after diff:
df.select(
    yes = pl.col('close').sort_by('date').log().diff().over('code'),
    no = pl.col('close').log().diff().sort_by('date').over('code')
)
# shape: (10, 2)
# ┌───────────┬───────────┐
# │ yes       ┆ no        │
# │ ---       ┆ ---       │
# │ f64       ┆ f64       │
# ╞═══════════╪═══════════╡
# │ null      ┆ 1.791759  │
# │ 0.405465  ┆ 0.057158  │
# │ -1.791759 ┆ -1.791759 │
# │ 0.847298  ┆ -0.693147 │
# │ 0.887303  ┆ null      │
# │ -0.194156 ┆ 1.540445  │
# │ -1.540445 ┆ 0.0       │
# │ -0.405465 ┆ -1.94591  │
# │ 1.94591   ┆ 0.154151  │
# │ -1.94591  ┆ -1.252763 │
# └───────────┴───────────┘
Oh, I understand. diff differs from other aggregation functions in that it depends on row order. The new values do not remember the date positions they came from.
Thank you for the clarification.
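
The order dependence discussed above can be reproduced without either library. What follows is a minimal sketch in plain Python, not how Polars or pandas implement it: the `rows` list reuses the minute and close values from the example frame, and the `log_diff` helper is a hypothetical name introduced here for illustration.

```python
import math

# (minute, close) pairs in the unsorted row order of the example frame above.
rows = [(4, 17), (1, 18), (2, 3), (6, 3), (5, 14),
        (3, 7), (9, 2), (0, 12), (8, 14), (7, 2)]


def log_diff(values):
    """Return log(v[i]) - log(v[i-1]) for each i, with None in the first slot."""
    logs = [math.log(v) for v in values]
    return [None] + [b - a for a, b in zip(logs, logs[1:])]


# Sort first, then diff: every difference is taken between consecutive dates.
by_date = sorted(rows)
sort_then_diff = log_diff([close for _, close in by_date])

# Diff first, then sort: every difference is taken between rows that merely
# happen to be adjacent in storage order; sorting afterwards only reorders
# those already-computed values.
diffs_in_row_order = log_diff([close for _, close in rows])
order = sorted(range(len(rows)), key=lambda i: rows[i][0])
diff_then_sort = [diffs_in_row_order[i] for i in order]

for label, values in (("sort then diff", sort_then_diff),
                      ("diff then sort", diff_then_sort)):
    print(label, [None if v is None else round(v, 6) for v in values])
```

The first list should line up with the `yes` column and the second with the `no` column, which is why the sort has to come before `diff` when the goal is a return between consecutive dates.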