Comments (7)
It looks like “where” is already a method in Pandas. It does not behave as a SQL equivalent, however.
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#the-where-method-and-masking
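To make the difference concrete, here is a small sketch on a made-up DataFrame (the column names and values are only for illustration). The built-in where keeps every row and blanks out the ones that fail the condition, whereas boolean indexing drops them, which is closer to what SQL's WHERE does:

```python
import pandas as pd

# Toy data, purely illustrative.
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

# DataFrame.where keeps the original shape and replaces the rows where the
# condition is False with NaN.
masked = df.where(df["a"] < df["b"])

# Boolean indexing removes the non-matching rows, like a SQL WHERE clause.
filtered = df[df["a"] < df["b"]]

print(masked)
print(filtered)
```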
from pyjanitor.
Nice! How will this add to the API beyond merely doing df[df.a < df.b], though?
What I find nice about the above syntax is that since the comparison evaluates to a boolean Series (True, True, False, False, etc.), you can do the logic outside of the brackets and apply it later, like this:
my_logic = df.a < df.b
df[my_logic]
I feel that when working with strings, it's harder to work out more complex logic than just simple situations.
But I could be mistaken if there's a specific situation you're thinking about.
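As a rough illustration of the point about building the logic outside the brackets, the masks can be named, combined, and applied later. The DataFrame and the mask names below are invented for the example:

```python
import pandas as pd

# Toy data, purely illustrative.
df = pd.DataFrame(
    {"a": [1, 5, 3, 8], "b": [4, 2, 6, 9], "c": ["x", "y", "x", "y"]}
)

# Each comparison yields a boolean Series aligned with df's index.
a_below_b = df["a"] < df["b"]
is_x = df["c"] == "x"

# Masks compose with &, | and ~ before they are ever used for indexing.
combined = a_below_b & is_x
result = df[combined]

print(result)
```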
I think the API can be helpful for R users who are new to pandas-land.
Riffing off your example, originally, we would write this kind of pandas code:
df = pd.DataFrame(...)
df = df[df['a'] < df['b']] # or df[df.a < df.b]
Our current implementation of filter_on doesn't do any kind of symbolic evaluation, which means there is a break in the method chain: the dataframe has to be assigned to a name first so the filter expression can refer to it.
df = pd.DataFrame(...)
df = df.filter_on(df['a'] < df['b'])
By wrapping the .query() API, we now get:
df = pd.DataFrame(...).filter_on('a < b')
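A minimal sketch of what such a wrapper could look like, assuming it simply forwards the string to DataFrame.query. This is not pyjanitor's actual implementation; the function body and the use of .pipe() to keep the chain unbroken are assumptions for illustration:

```python
import pandas as pd


def filter_on(df: pd.DataFrame, criteria: str) -> pd.DataFrame:
    """Hypothetical wrapper: filter rows using a query string."""
    # DataFrame.query evaluates the string against the column names,
    # so the caller never has to reference the dataframe by name.
    return df.query(criteria)


# .pipe() lets a plain function take part in a method chain.
result = (
    pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})
    .pipe(filter_on, "a < b")
    .reset_index(drop=True)
)

print(result)
```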
The query API also provides boolean operators (&, |, ~), as well as comparison operators (<, ==, >). I doubt that 80% of our users will end up doing complex logic, though. Most likely it'll be "filter for this" and then "filter for that" (which are equivalent to logical ANDs, basically).
Naturally, the selector way of filtering a dataframe is not precluded!
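To sketch the "filter for this, then filter for that" point with plain pandas (the data and column names are made up), two chained query calls select the same rows as a single query joined with &:

```python
import pandas as pd

# Toy data, purely illustrative.
df = pd.DataFrame(
    {"a": [1, 5, 3, 8], "b": [4, 2, 6, 9], "c": [10, 20, 30, 40]}
)

# Two successive filters...
chained = df.query("a < b").query("c > 15")

# ...select the same rows as one filter joined with a logical AND.
single = df.query("(a < b) & (c > 15)")

# The other boolean operators work inside the string as well.
negated_or = df.query("~(a < b) | (c > 15)")

print(chained.equals(single))
```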
Re-opening, so it shows up in the issues tracker. We can continue the discussion regardless!
I know this is implemented already, but to add to my SQL campaign, I would vote to change the name to where.
I think
df = pd.DataFrame(...).where('a < b')
reads a bit easier.
And I think all you need to do is add where = filter_on in the functions.py file to add the alternate name without breaking legacy code. (Just need to check whether pandas_flavor changes that at all, but I don't think so.)
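For what the aliasing idea amounts to in plain Python, here is a small sketch. The filter_on body is a hypothetical stand-in, not the real functions.py code, and the alias is kept as a module-level function rather than registered on the DataFrame, because pandas already defines DataFrame.where and a registered method of that name could conflict with it:

```python
import pandas as pd


def filter_on(df: pd.DataFrame, criteria: str) -> pd.DataFrame:
    """Hypothetical stand-in for the existing filter function."""
    return df.query(criteria)


# A second name bound to the same function object; existing callers of
# filter_on are unaffected.
where = filter_on

df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})
via_alias = where(df, "a < b")
via_original = filter_on(df, "a < b")

print(via_alias.equals(via_original))
```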
I vote in favour of the alias! Particularly because that just happens to be how I'm thinking about it as well. I'm in the midst of some heavy work this morning; would you be open to putting in the PR, @szuckerman?
Good catch, @jcmkk3!