
100-pandas-puzzles's Issues

Please add a license file

It would be nice if these exercises would have a license, so one knows under which conditions one can make use of them.

I don't have any particular license in mind myself, and of course that's not my call to make, though in the name of reducing license proliferation I would suggest using the same license as pandas itself: https://github.com/pandas-dev/pandas/blob/master/LICENSE .

The second solution to Q29 does not work properly.

The solution below (Q29-2) outputs a wrong answer when the input DataFrame's values start with zero.

x = (df['X'] != 0).cumsum()
y = x != x.shift()
df['Y'] = y.groupby((y != y.shift()).cumsum()).cumsum()

In this code, Series y must be True where the value of X is nonzero and False otherwise.
However, the first value of y is always True.

e.g.

df1 = pd.DataFrame({'X': [0, 2, 0, 3]})
df2 = pd.DataFrame({'X': [1, 2, 0, 3]})

x = (df1['X'] != 0).cumsum()
y = x != x.shift()
print(y[0])

x = (df2['X'] != 0).cumsum()
y = x != x.shift()
print(y[0])

outputs

True
True
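
A minimal sketch of why this happens, using the df1 values from above: x.shift() puts NaN in the first position, and NaN != value evaluates to True, so y[0] is True regardless of the first value of X. The names shifted and fixed are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'X': [0, 2, 0, 3]})

x = (df1['X'] != 0).cumsum()   # running count of nonzero values: 0, 1, 1, 2
shifted = x.shift()            # the first element becomes NaN
y = x != shifted               # NaN != 0 is True, so y[0] is always True

fixed = df1['X'] != 0          # direct mask: False wherever X is zero

print(y.tolist())
print(fixed.tolist())
```

The direct mask differs from y only in the first position here, which is exactly where the wrong answer appears.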

This bug can be fixed by replacing the first two lines with y = df['X'] != 0

Here's the code to compare the results of solution 1, solution 2, and the modified solution 2.

import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [0, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

def solution1(df):
    # indices of zeros (Series.nonzero() was removed in pandas 1.0, so go through NumPy)
    izero = np.r_[-1, (df['X'] == 0).to_numpy().nonzero()[0]]
    idx = np.arange(len(df))
    return pd.Series(idx - izero[np.searchsorted(izero - 1, idx) - 1])

def solution2(df):
    x = (df['X'] != 0).cumsum()
    y = x != x.shift()
    return y.groupby((y != y.shift()).cumsum()).cumsum()

def solution2_modified(df):
    y = df['X'] != 0
    return y.groupby((y != y.shift()).cumsum()).cumsum()

check_df = pd.concat([df, solution1(df), solution2(df), solution2_modified(df)], axis=1)
check_df.columns = ['input_df', 'solution1', 'solution2', 'solution2_modified']
display(check_df)

outputs

input_df  solution1  solution2  solution2_modified
       0          0          1                   0
       2          1          2                   1
       0          0          0                   0
       3          1          1                   1
       4          2          2                   2
       2          3          3                   3
       5          4          4                   4
       0          0          0                   0
       3          1          1                   1
       4          2          2                   2

I ran this code with Python 3.6.7 and pandas 0.24.0.

NaN problem with question 21

Thanks for the project.
When working on question 21 with pandas 1.2.4, I had to fill the NaN values in the age column first:

df['age'] = df['age'].fillna(0)
df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')
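
One caveat with this workaround, sketched under assumptions: filling with 0 makes the missing ages count as zeros in the mean, whereas aggfunc='mean' skips NaN values when they are left in place. The small DataFrame below is made up for illustration and is not the puzzle's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one cat has a missing age.
df = pd.DataFrame({'animal': ['cat', 'cat', 'dog', 'dog'],
                   'visits': [1, 1, 2, 2],
                   'age': [2.0, np.nan, 4.0, 6.0]})

# NaN left in place: the mean is taken over the non-missing ages only.
plain = df.pivot_table(index='animal', columns='visits',
                       values='age', aggfunc='mean')

# NaN replaced by 0: the zero is averaged in and pulls the mean down.
filled = df.assign(age=df['age'].fillna(0)).pivot_table(
    index='animal', columns='visits', values='age', aggfunc='mean')

print(plain)
print(filled)
```

So whether fillna(0) is appropriate depends on whether a missing age should really be treated as zero.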

Correction for Q16 (part A)

  1. Append a new row 'k' to df with your choice of values for each column.
df.loc['k'] = [5.5, 'dog', 'no', 2] 

I think it would be better to add the new row according to the column order, as follows:

df.loc['k'] = ['dog', 5.5, 2, 'no']

since the data looks like this:

animal  age  visits  priority
cat     2.5  1       yes
cat     3.0  3       yes
snake   0.5  2       no
dog     NaN  3       yes
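
Since a list assigned through df.loc is matched to the columns by position, one way to avoid depending on the column order is to build the list from a mapping keyed by column name. This is only a sketch: the three-row DataFrame and the name new_row are assumptions for illustration, not the puzzle's code.

```python
import numpy as np
import pandas as pd

# Small stand-in for the puzzle's DataFrame, with the column order shown above.
df = pd.DataFrame(
    {'animal': ['cat', 'snake', 'dog'],
     'age': [2.5, 0.5, np.nan],
     'visits': [1, 2, 3],
     'priority': ['yes', 'no', 'yes']},
    columns=['animal', 'age', 'visits', 'priority'],
    index=['a', 'b', 'c'])

# Values keyed by column name, so the order written here does not matter.
new_row = {'age': 5.5, 'animal': 'dog', 'priority': 'no', 'visits': 2}

# Build the list in whatever order the DataFrame's columns actually have.
df.loc['k'] = [new_row[col] for col in df.columns]

print(df.loc['k'])
```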

Solution to Question 27 is no longer supported.

I was able to get the correct result with the following:

df.groupby('grps')['vals'].nlargest(3).groupby('grps').sum()

but I'm sure there's a more elegant way to do it than by using the groupby method twice in one line.
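
One possible alternative that uses a single groupby is to apply a small function to each group. This is a sketch on made-up data, not the repository's solution, and it may be slower than the version above on large inputs.

```python
import pandas as pd

# Hypothetical data: two groups with four values each.
df = pd.DataFrame({'grps': list('aaaabbbb'),
                   'vals': [1, 5, 3, 4, 10, 2, 8, 7]})

# For each group, take the three largest values and add them up.
top3_sum = df.groupby('grps')['vals'].apply(lambda s: s.nlargest(3).sum())

print(top3_sum)
```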

Correction Request for solution to Q53:

The following code raises an error with pandas 1.5.3:

df['adjacent'] = (counts - mine_grid).ravel('F')

The error reports that a pandas DataFrame has no method named ravel.

How about correcting it to:

df['adjacent'] = (counts - mine_grid).values.ravel('F')
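
A small sketch of the difference, using a made-up 2x2 grid rather than the puzzle's counts and mine_grid: the DataFrame has no ravel method, while the underlying NumPy array does. The sketch uses to_numpy(), which returns the same array as .values here.

```python
import pandas as pd

# Hypothetical 2x2 grid standing in for (counts - mine_grid).
grid = pd.DataFrame([[1, 2],
                     [3, 4]])

has_ravel = hasattr(grid, 'ravel')   # the DataFrame itself has no ravel()

# Convert to a NumPy array first, then flatten in column-major ('F') order.
flat = grid.to_numpy().ravel('F')

print(has_ravel)
print(flat)
```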

Sorting not needed in solution to question 27.

  1. A DataFrame has a column of groups 'grps' and a column of numbers 'vals'. For each group, find the sum of the three greatest values.

The solution starts by sorting the 'vals' column, which is not needed. The nlargest method selects the three greatest values irrespective of the order of the elements.

Suggestion: delete the sorting; the second line of code alone provides the solution.
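
A quick check of this claim on made-up data: the helper top3_sum below is hypothetical and stands in for the second line of the solution, and it gives the same result whether or not the frame is sorted first.

```python
import pandas as pd

# Hypothetical data with interleaved groups and unsorted values.
df = pd.DataFrame({'grps': list('abababab'),
                   'vals': [4, 10, 1, 2, 5, 8, 3, 7]})

def top3_sum(frame):
    # Three largest values per group, then summed per group.
    return frame.groupby('grps')['vals'].nlargest(3).groupby(level=0).sum()

unsorted_result = top3_sum(df)
sorted_result = top3_sum(df.sort_values('vals'))

print(unsorted_result.equals(sorted_result))
```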

Join forces?

Hi Alex,

In an effort to learn pandas better, I created a series of exercises in a different form than yours.
I would like to know whether you might be interested in contributing to my repo in any way, or whether I can use your exercises.

Thanks
