ajcr / 100-pandas-puzzles
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
License: MIT License
It would be nice if these exercises had a license, so that people know under which conditions they can use them.
I don't have any particular license in mind, and of course it's not my call to make, though in the interest of reducing license proliferation I would suggest using the same license as pandas itself: https://github.com/pandas-dev/pandas/blob/master/LICENSE
The solution below (Q29, second solution) outputs the wrong answer when the first value of the input DataFrame is zero.
x = (df['X'] != 0).cumsum()
y = x != x.shift()
df['Y'] = y.groupby((y != y.shift()).cumsum()).cumsum()
In this code, the Series y should be True where the value of X is non-zero and False otherwise. However, the first value of y is always True, whatever the input.
For example:
import pandas as pd
df1 = pd.DataFrame({'X': [0, 2, 0, 3]})
df2 = pd.DataFrame({'X': [1, 2, 0, 3]})
x = (df1['X'] != 0).cumsum()
y = x != x.shift()
print(y[0])
x = (df2['X'] != 0).cumsum()
y = x != x.shift()
print(y[0])
outputs:
True
True
This bug can be fixed by replacing the first two lines with y = df['X'] != 0.
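The root cause is that shift() puts NaN in the first position, and NaN compares unequal to every value, so x != x.shift() is True there no matter what X holds. A minimal check on the df1 values from the example above:

```python
import pandas as pd

# x is the cumsum of (X != 0) for X = [0, 2, 0, 3]
x = pd.Series([0, 1, 1, 2])

shifted = x.shift()          # first element becomes NaN
print(shifted.tolist())      # [nan, 0.0, 1.0, 1.0]

y = x != shifted             # 0 != NaN evaluates to True
print(y.tolist())            # [True, True, False, True]

# Comparing X directly against zero gives the intended mask.
X = pd.Series([0, 2, 0, 3])
print((X != 0).tolist())     # [False, True, False, True]
```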
Here's code comparing the results of solution 1, solution 2, and the modified solution 2:
import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [0, 2, 0, 3, 4, 2, 5, 0, 3, 4]})
def solution1(df):
    izero = np.r_[-1, np.flatnonzero(df['X'].to_numpy() == 0)]  # indices of zeros
    idx = np.arange(len(df))
    return pd.Series(idx - izero[np.searchsorted(izero - 1, idx) - 1])

def solution2(df):
    x = (df['X'] != 0).cumsum()
    y = x != x.shift()  # first element is always True
    return y.groupby((y != y.shift()).cumsum()).cumsum()

def solution2_modified(df):
    y = df['X'] != 0  # True exactly where X is non-zero
    return y.groupby((y != y.shift()).cumsum()).cumsum()

check_df = pd.concat([df, solution1(df), solution2(df), solution2_modified(df)], axis=1)
check_df.columns = ['input_df', 'solution1', 'solution2', 'solution2_modified']
print(check_df)  # or display(check_df) in Jupyter
input_df | solution1 | solution2 | solution2_modified |
---|---|---|---|
0 | 0 | 1 | 0 |
2 | 1 | 2 | 1 |
0 | 0 | 0 | 0 |
3 | 1 | 1 | 1 |
4 | 2 | 2 | 2 |
2 | 3 | 3 | 3 |
5 | 4 | 4 | 4 |
0 | 0 | 0 | 0 |
3 | 1 | 1 | 1 |
4 | 2 | 2 | 2 |
I ran this code with Python 3.6.7 and pandas 0.24.0.
Thanks for the project.
When working on question #21 with pandas 1.2.4, I found I needed to call fillna first:
df['age'] = df['age'].fillna(0)
df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')
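A small sketch with made-up data (not the puzzle's own frame) showing what the fillna(0) step changes: the 'mean' aggregation in pivot_table already skips NaN, whereas filling with 0 counts a missing age as zero and so lowers the affected mean.

```python
import numpy as np
import pandas as pd

# Made-up data: one dog has a missing age.
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog'],
    'age': [2.0, 4.0, np.nan, 6.0],
    'visits': [1, 1, 2, 2],
})

# Without filling: the mean skips the NaN.
raw = df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')

# With fillna(0): the missing age is averaged in as 0.
filled = df.assign(age=df['age'].fillna(0)).pivot_table(
    index='animal', columns='visits', values='age', aggfunc='mean')

print(raw.loc['dog', 2], filled.loc['dog', 2])  # 6.0 3.0
```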
df[.loc[df['A'].shift() != df['A']]]
raises a SyntaxError. Instead,
df[df['A'].shift() != df['A']]
works (as does df.loc[df['A'].shift() != df['A']]).
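A minimal, self-contained version of the working expression; the values in column A are example data chosen so that some integers repeat consecutively.

```python
import pandas as pd

# Example data: 2, 5 and 7 repeat in consecutive rows.
df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})

# Keep a row only if it differs from the row immediately above it.
# The first row is always kept because shift() puts NaN there.
filtered = df.loc[df['A'].shift() != df['A']]
print(filtered['A'].tolist())  # [1, 2, 3, 4, 5, 6, 7]
```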
df.loc['k'] = [5.5, 'dog', 'no', 2]
I think it would be better to order the values of the new row according to the columns, as follows:
df.loc['k'] = ['dog', 5.5, 2, 'no']
since the data looks like this:
animal | age | visits | priority |
---|---|---|---|
cat | 2.5 | 1 | yes |
cat | 3.0 | 3 | yes |
snake | 0.5 | 2 | no |
dog | NaN | 3 | yes |
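Assigning a Series instead of a list sidesteps the ordering question, since pandas aligns a Series on its index (here, the column labels) when enlarging a frame with .loc. A sketch using only the four rows shown above; the index labels 'a' to 'd' are assumed for illustration.

```python
import numpy as np
import pandas as pd

# The four rows shown above; index labels 'a'..'d' are assumed.
df = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog'],
     'age': [2.5, 3.0, 0.5, np.nan],
     'visits': [1, 3, 2, 3],
     'priority': ['yes', 'yes', 'no', 'yes']},
    index=['a', 'b', 'c', 'd'],
)

# The Series is matched to columns by label, so the order in which
# the values are written here does not matter.
df.loc['k'] = pd.Series({'age': 5.5, 'animal': 'dog', 'priority': 'no', 'visits': 2})
print(df.loc['k'])
```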
I was able to get the correct result with the following:
df.groupby('grps')['vals'].nlargest(3).groupby('grps').sum()
but I'm sure there's a more elegant way to do it than calling the groupby method twice in one line.
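One single-groupby alternative, sketched on made-up data, is to apply a function that sums nlargest(3) within each group. The two-step form from the comment also works when the second grouping is done on the outer index level, which holds the group keys.

```python
import pandas as pd

# Made-up example data (not the puzzle's own values).
df = pd.DataFrame({'grps': list('aaaabbb'), 'vals': [1, 5, 3, 4, 10, 2, 7]})

# Option 1: one groupby, summing the three largest values in each group.
top3_apply = df.groupby('grps')['vals'].apply(lambda s: s.nlargest(3).sum())

# Option 2: nlargest keeps the group key as the outer index level,
# so the second groupby can use level=0.
top3_level = df.groupby('grps')['vals'].nlargest(3).groupby(level=0).sum()

print(top3_apply.to_dict())  # {'a': 12, 'b': 19}
```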
The following code raises an error with pandas 1.5.3:
df['adjacent'] = (counts - mine_grid).ravel('F')
reporting that a pandas DataFrame has no method named ravel.
How about correcting it as:
df['adjacent'] = (counts - mine_grid).values.ravel('F')
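A minimal illustration on a hypothetical 2x3 frame standing in for counts - mine_grid: the frame is first converted to a NumPy array, whose ravel('F') flattens it column by column. to_numpy() is used here; .values behaves the same for a plain numeric frame.

```python
import pandas as pd

# Hypothetical grid standing in for (counts - mine_grid).
grid = pd.DataFrame([[0, 1, 2], [3, 4, 5]])

# Convert to a NumPy array, then flatten in column-major ('F') order.
flat = grid.to_numpy().ravel('F')
print(flat.tolist())  # [0, 3, 1, 4, 2, 5]
```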
The built-in drop_duplicates makes this easier to accomplish :)
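One caveat worth checking, assuming the comment refers to the puzzle about dropping rows that repeat the row immediately above: drop_duplicates removes every later repeat of a value, not only consecutive ones, so the two approaches differ when a value reappears later. A sketch with illustrative data:

```python
import pandas as pd

# Illustrative data in which the value 1 reappears after other values.
df = pd.DataFrame({'A': [1, 1, 2, 2, 1, 3]})

# Drops only rows equal to the row immediately above.
consecutive = df.loc[df['A'].shift() != df['A']]

# Drops every repeat after the first occurrence anywhere in the column.
global_unique = df.drop_duplicates(subset='A')

print(consecutive['A'].tolist())    # [1, 2, 1, 3]
print(global_unique['A'].tolist())  # [1, 2, 3]
```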
- A DataFrame has a column of groups 'grps' and a column of numbers 'vals'. For each group, find the sum of the three greatest values.
The solution starts by sorting the 'vals' column, but this is not needed: the nlargest method selects the three greatest values irrespective of the order of the elements.
Suggestion: delete the sorting; the second line of code alone provides the solution.
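A quick check on made-up data that sorting first does not change the result; the helper name top3_sum is hypothetical.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({'grps': list('aaaabbb'), 'vals': [1, 5, 3, 4, 10, 2, 7]})

def top3_sum(frame):
    # nlargest picks the three greatest values per group whatever the row order.
    return frame.groupby('grps')['vals'].nlargest(3).groupby(level=0).sum()

unsorted_result = top3_sum(df)
sorted_result = top3_sum(df.sort_values('vals'))
print(unsorted_result.equals(sorted_result))  # True
```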
Pretty much what the title says
Hi Alex,
As part of a quest to learn pandas better, I created a series of exercises in a different format from yours.
I would like to know whether you might be interested in contributing to my repo in any way, or whether I may use your exercises.
Thanks