learnr's Introduction

Hi there, I'm Ashish 👋

⚡ I love applied maths, programming, data science, and books

  • 🌱 I’m addicted to learning and growing every day

  • 🌍 I am currently sharing a little of my knowledge with the world through my blog.

  • ✏️ I am currently working on mixed data clustering

learnr's People

Contributors

duttashi

learnr's Issues

How to group factor levels?

This question was originally asked on Stack Overflow (SO). I am reproducing it here for reference:

Suppose a dataset has a factor column with values like these:

> mydata                    
   question id           value
1         1  1      not likely
2         2  1      not likely
3         3  1      not likely
4         4  1      not likely
5         5  1 slightly likely
6         1  2     very likely
7         2  2 slightly likely
8         3  2 slightly likely
9         4  2      not likely
10        5  2     very likely

How do I group the levels of the factor variable value into, say, two levels?
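One possible approach in base R is to assign a named list to levels(), where each name is a new level and each element lists the old levels it absorbs. The sketch below rebuilds the data shown above; the new level names unlikely and likely, and the column name value2, are made up for illustration.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  question = rep(1:5, times = 2),
  id = rep(1:2, each = 5),
  value = factor(c("not likely", "not likely", "not likely", "not likely",
                   "slightly likely", "very likely", "slightly likely",
                   "slightly likely", "not likely", "very likely"))
)

# Copy the factor, then collapse three levels into two.
# The names on the left ("unlikely", "likely") are illustrative only.
mydata$value2 <- mydata$value
levels(mydata$value2) <- list(
  unlikely = "not likely",
  likely   = c("slightly likely", "very likely")
)

table(mydata$value2)
```

The forcats package offers fct_collapse() for the same task if you already work in the tidyverse.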

How to replace multiple summarize statements by a custom function?

This question was originally asked on SO. I am reproducing it here for reference only.

A minimal example:

library(tidyverse)
col1 <- c("UK", "US", "UK", "US")
col2 <- c("Tech", "Social", "Social", "Tech")
col3 <- c("0-5years", "6-10years", "0-5years", "0-5years")
col4 <- 1:4
col5 <- 5:8

df <- data.frame(col1, col2, col3, col4, col5)

result1 <- df %>% 
  group_by(col1, col2) %>% 
  summarize(sum1 = sum(col4, col5))

result2 <- df %>% 
  group_by(col2, col3) %>% 
  summarize(sum1 = sum(col4, col5))

result3 <- df %>% 
  group_by(col1, col3) %>% 
  summarize(sum1 = sum(col4, col5))
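The three pipelines differ only in their grouping columns, so one way to remove the repetition is a small wrapper that forwards the grouping columns through the dots. This is a sketch, not the accepted SO answer; the function name sum_by is hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  col1 = c("UK", "US", "UK", "US"),
  col2 = c("Tech", "Social", "Social", "Tech"),
  col3 = c("0-5years", "6-10years", "0-5years", "0-5years"),
  col4 = 1:4,
  col5 = 5:8
)

# Hypothetical helper: group by whatever columns are passed in `...`
# and compute the same sum as the original summarize() calls.
sum_by <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarize(sum1 = sum(col4, col5), .groups = "drop")
}

result1 <- sum_by(df, col1, col2)
result2 <- sum_by(df, col2, col3)
result3 <- sum_by(df, col1, col3)
```

Each call returns an ungrouped tibble; drop .groups = "drop" if the grouped result is preferred.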

Set up library path location for installed packages, remove installed packages (clean up) and install only required packages

Last evening I decided to update my installed packages. During the update, a prompt asked, "Do you want to install from sources the packages which need compilation?" I chose yes, and it broke everything in RStudio.

Lesson learnt: if such a message pops up, choose No. See this post for reference.

Another problem was that I had never set up a library path for the installed packages, so I needed to set one. Finally, I needed to tweak the Rprofile.site file. On Windows it is located in C:\Program Files\R\R-3.5.0\etc
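A minimal sketch of the library path step, assuming a dedicated directory is wanted for user-installed packages. The directory lib_dir is hypothetical and placed under tempdir() here so the example has no lasting effect; for a permanent setup the same .libPaths() call would go into the site profile file. The last lines only list the packages that are not part of base R, which is a starting point for a clean-up; nothing is removed.

```r
# Hypothetical library directory (temporary, for illustration only)
lib_dir <- file.path(tempdir(), "R-library")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the new directory first so install.packages() uses it by default
.libPaths(c(lib_dir, .libPaths()))
.libPaths()

# List user-installed packages: those without a base/recommended priority
ip <- installed.packages()
user_pkgs <- rownames(ip)[is.na(ip[, "Priority"])]

# To actually remove them, one could run (left commented out on purpose):
# remove.packages(user_pkgs)
```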

"warning message: position_dodge requires non-overlapping x intervals", when plotting a boxplot

> str(data_balanced)
'data.frame':	610 obs. of  10 variables:
 $ VisitRsrc  : int  11 22 91 90 41 64 25 61 25 80 ...
 $ raisedhands: int  2 20 90 80 27 62 8 7 15 20 ...

> ggplot(data = data_balanced, aes(x=VisitRsrc, y=raisedhands, fill=gender)) +
+   geom_boxplot()+
+   coord_flip()+
+   scale_fill_discrete(name="Gender")+
+   facet_grid(~Relation)
Warning messages:
1: position_dodge requires non-overlapping x intervals 
2: position_dodge requires non-overlapping x intervals 

Issue: the warning "position_dodge requires non-overlapping x intervals" is generated when drawing the box plot. It appears when continuous variables are plotted on both the x and y axes, because geom_boxplot() expects a discrete x variable (or an explicit grouping).
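One possible workaround is to bin the continuous x variable into a factor before plotting, so that each box has its own non-overlapping position. The data below is made up (data_balanced is not available here), including the gender and Relation values; the bin edges and the column name VisitBin are assumptions chosen for illustration.

```r
library(ggplot2)

# Made-up stand-in for data_balanced
set.seed(1)
toy <- data.frame(
  VisitRsrc   = sample(0:99, 60, replace = TRUE),
  raisedhands = sample(0:100, 60, replace = TRUE),
  gender      = rep(c("F", "M"), times = 30),
  Relation    = rep(c("Father", "Mum"), each = 30)
)

# Bin the continuous x variable into four discrete intervals
toy$VisitBin <- cut(toy$VisitRsrc,
                    breaks = c(0, 25, 50, 75, 100),
                    include.lowest = TRUE)

# Same plot as above, but with a discrete x axis
p <- ggplot(toy, aes(x = VisitBin, y = raisedhands, fill = gender)) +
  geom_boxplot() +
  coord_flip() +
  scale_fill_discrete(name = "Gender") +
  facet_grid(~Relation)
```

If binning is not meaningful for the analysis, a scatter plot (geom_point()) is the more natural choice for two continuous variables.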

Summarize every column in a data frame and ignore the missing values

Suppose a given dataframe contains missing values.

# inject random NA values in the mtcars dataset
mtcarsNew <- as.data.frame(lapply(mtcars, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]
}))
# total missing values
R> sum(is.na(mtcarsNew))
[1] 44
R> colnames(is.na(mtcarsNew))
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

How do I summarize every column in a data frame so that the missing values are ignored?
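A base R sketch: most summary functions accept na.rm = TRUE, and sapply() can pass that argument to every column. The seed is added only to make the example reproducible, and the choice of mean, sd and a missing-value count is illustrative.

```r
# Inject random NA values into mtcars, as above (seed added for reproducibility)
set.seed(42)
mtcarsNew <- as.data.frame(lapply(mtcars, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]
}))

# One statistic per column, ignoring NA
col_means <- sapply(mtcarsNew, mean, na.rm = TRUE)

# Several statistics per column: rows are statistics, columns are variables
col_summary <- sapply(mtcarsNew, function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    n_missing = sum(is.na(x)))
})

round(col_summary, 2)
```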

How to rename single or multiple column names in a data frame?

In a data analysis project, you often need to rename one or more columns of a data frame. An example is given below:

# create a data frame
> df<- data.frame(sample(4,size = 4, replace = TRUE),
                sample(4,size = 4, replace = TRUE),
                sample(4,size = 4, replace = TRUE),
                sample(4,size = 4, replace = TRUE)
                )
# show the column names
> colnames(df)
[1] "sample.4..size...4..replace...TRUE."  
[2] "sample.4..size...4..replace...TRUE..1"
[3] "sample.4..size...4..replace...TRUE..2"
[4] "sample.4..size...4..replace...TRUE..3"

As you can see, the column names are unwieldy and need to be made more meaningful. How do I do this?
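A base R sketch of three common cases: renaming all columns at once, renaming one column by position, and renaming one column by its current name. The new names (v1, second, and so on) are placeholders.

```r
# Create a data frame with auto-generated column names, as above
df <- data.frame(sample(4, size = 4, replace = TRUE),
                 sample(4, size = 4, replace = TRUE),
                 sample(4, size = 4, replace = TRUE),
                 sample(4, size = 4, replace = TRUE))

# Rename all columns at once
colnames(df) <- c("v1", "v2", "v3", "v4")

# Rename a single column by position
colnames(df)[2] <- "second"

# Rename a single column by its current name
names(df)[names(df) == "v3"] <- "third"

colnames(df)
```

With dplyr loaded, rename(df, new_name = old_name) is an alternative for the by-name case.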

System error Rterm: missing libatk-1.0-0.dll

Yesterday, when I launched RStudio on my computer, I was greeted with the error message below. To the best of my knowledge, I had not changed anything. I'm running RStudio version 1.0.136 and R version 3.3.3.

Rterm.exe - System Error. 
  The program can't start because libatk-1.0-0.dll is missing from your computer. 
  Try reinstalling the program to fix this problem. 

Clicking the OK button on the error message does not close it, and RStudio no longer works.

How to separate Date into year, month and date?

The variable looks like the following:

> str(dengue.train$week_start_date)
 Factor w/ 1049 levels "1990-04-30","1990-05-07",..: 1 2 3 4 5 6 7 8 9 10 ...

As we can see, the variable week_start_date is read in as a factor. How do I convert it to numeric year, month and day values?
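A minimal base R sketch: convert the factor to character, parse it as a Date, then pull out each component with format(). Only the two level values visible in the str() output are used here; the data frame name parts is an assumption.

```r
# Stand-in for dengue.train$week_start_date, using the two visible levels
week_start_date <- factor(c("1990-04-30", "1990-05-07"))

# Factor -> character -> Date (as.character avoids using the integer codes)
d <- as.Date(as.character(week_start_date), format = "%Y-%m-%d")

# Extract numeric year, month and day
parts <- data.frame(
  year  = as.integer(format(d, "%Y")),
  month = as.integer(format(d, "%m")),
  day   = as.integer(format(d, "%d"))
)

parts
```

The lubridate functions year(), month() and day() do the same job if that package is available.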

How to collapse rows with same identifier and retain non-empty column values?

This question was originally asked on SO.

Question

How to collapse (or merge?) rows with the same identifier and retain the non-empty (here, any nonzero values) values in each column?

Data

df = data.frame(produce = c("apples","apples", "bananas","bananas"),
                grocery1=c(0,1,1,1),
                grocery2=c(1,0,1,1),
                grocery3=c(0,0,1,1))

Desired output

 shopping grocery1 grocery2 grocery3
1   apples        1        1        0
2  bananas        1        1        1
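Since the values are 0/1 indicators, one way to keep any nonzero value per group is to take the column-wise maximum within each identifier. This sketch uses aggregate() from base R; note that it keeps the identifier column name produce rather than the shopping header shown in the desired output.

```r
df <- data.frame(produce  = c("apples", "apples", "bananas", "bananas"),
                 grocery1 = c(0, 1, 1, 1),
                 grocery2 = c(1, 0, 1, 1),
                 grocery3 = c(0, 0, 1, 1))

# For every grocery column, take the maximum within each produce group
collapsed <- aggregate(. ~ produce, data = df, FUN = max)

collapsed
```

Taking the maximum assumes that "non-empty" means nonzero and that larger values should win; a different rule would need a different FUN.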

How to sort a dataframe by column(s)?

I want to sort a data.frame by multiple columns. For example, with the data.frame below I would like to sort by column z (descending) then by column b (ascending):

dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), 
      levels = c("Low", "Med", "Hi"), ordered = TRUE),
      x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
      z = c(1, 1, 1, 2))
dd
    b x y z
1  Hi A 8 1
2 Med D 3 1
3  Hi A 9 1
4 Low C 9 2
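A base R sketch using order(): negating the numeric column z gives descending order, and the ordered factor b sorts ascending by its level order (Low, Med, Hi). The result name sorted is an assumption.

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"),
                 y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# Sort by z descending, then by b ascending
sorted <- dd[order(-dd$z, dd$b), ]

sorted
```

The minus sign only works for numeric columns; for a character or factor column, order(..., decreasing = TRUE) or dplyr::arrange(desc(...)) can be used instead.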

Extracting multiple variables from multiple dataframes?

This question was originally asked on SO.

Question: Suppose there are n data frames (in this case 3). How do I extract the variables that appear in all n data frames?

Dataset

df1 <- structure(list(Variable = c("a", "g", "e"), Val = c(0.9, 0.3, 
0.1)), class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(Variable = c("h", "a", "e"), Val = c(0.2, 0.7, 
0.9)), class = "data.frame", row.names = c(NA, -3L))

df3 <- structure(list(Variable = c("z", "a", "e"), Val = c(0.5, 0.7, 
0.9)), class = "data.frame", row.names = c(NA, -3L))
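One way to do this in base R is to fold intersect() over the Variable columns with Reduce(), which scales to any number of data frames. The names dfs, common and common_rows are assumptions.

```r
df1 <- data.frame(Variable = c("a", "g", "e"), Val = c(0.9, 0.3, 0.1))
df2 <- data.frame(Variable = c("h", "a", "e"), Val = c(0.2, 0.7, 0.9))
df3 <- data.frame(Variable = c("z", "a", "e"), Val = c(0.5, 0.7, 0.9))

# Collect the data frames in a list so the code works for any n
dfs <- list(df1, df2, df3)

# Variables present in every data frame
common <- Reduce(intersect, lapply(dfs, function(d) d$Variable))
common

# Keep only the rows for those variables in each data frame
common_rows <- lapply(dfs, function(d) d[d$Variable %in% common, ])
```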
