stat157 / questionnaire

Stat 157 Questionnaire Data Wrangling

Python 100.00%

questionnaire's Issues

Editing example.cfg

I tried the 'cp example.cfg' command, but it returns 'no such file or directory'.
So I guess I need to save example.cfg locally first, and I saved it to my Project1 folder. I then tried the command again locally, without running the virtual machine, and it works! But it does not work when I run it inside the virtual machine; it still returns 'no such file or directory'. How should I fix this?

And also, for the bConnected Key, do we make up some numbers there? What is the point of doing this? (Downloading the file to our own computer and editing it through the virtual machine, but putting a fake password there?)
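A note on the error itself: 'no such file or directory' means cp could not find example.cfg relative to the directory the command was run from. The sketch below shows the same copy step in Python with an explicit check, so a missing file produces a clearer message. The function name copy_config and the directory layout are assumptions for illustration, not part of the course materials.

```python
import shutil
from pathlib import Path


def copy_config(repo_dir, destination):
    """Copy example.cfg from a cloned repository to the given destination.

    repo_dir is assumed to be the folder that contains example.cfg.
    """
    source = Path(repo_dir) / "example.cfg"
    if not source.is_file():
        # Same situation that makes `cp` report "No such file or directory".
        raise FileNotFoundError(f"{source} not found; is the repository cloned here?")
    return Path(shutil.copy(source, destination))


# Hypothetical usage inside the VM, after cloning the repository:
# copy_config("questionnaire", Path.home() / "stat157.cfg")
```

If the function raises FileNotFoundError, the file is not in that directory on the machine where the code is running, which matches the behaviour described above for the virtual machine.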

Visualizer-Analyzer Collaboration

In the workflow specified in class, the analyzer does her work, and then the visualizer. However, once the data curator has finished her task, shouldn't the analyzer and visualizer be able to start their work at the same time? We should know the format of the visualizations before starting, just not the specific numbers in the visualization. If anything, isn't it easier to do analysis after the visualizer has finished her work?

Cannot find the spreadsheet in my bConnect drive

Hi,

I have no problem with gspread.login but client.open_by_key complains about SpreadsheetNotFound. I also tried to find it on bConnect drive but cannot find any spreadsheet shared with me there. I'm not sure if I'm allowed to access it. Could you please take a look?

Thanks
Qi

What should a presenter choose for the means of presentation?

MS powerpoint? Prezi?... HTML5?

Is there any chosen format that a presenter is required to use?

Installing SciPy/Matplotlib/etc Stack

Mac and Linux Users:
http://www.scipy.org/install.html

Google calendar (B-cal) scheduling separate office hours

Our group would like to schedule a time with you Saturday night, but we noticed that we cannot make edits to the calendar that you shared with us.
Help!
What would you suggest as the best way for us to set up a time with you?

How deep should our analysis of the column we chose be?

Our team chose the column regarding previous courses taken. Although we could do some analysis on the courses taken, we haven't thought of too many more ways to look at the data.

Our team's question was: is it okay to just include simple analyses (e.g., just look at totals, means or medians) in our assignment?

Thanks.
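For reference, the simple analyses mentioned above (totals, means, medians) can be sketched with Python's standard library. The course lists below are made-up placeholders, not questionnaire data, and the variable names are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical cleaned data: one list of previously taken courses per student.
courses_per_student = [
    ["Stat 133", "Stat 134"],
    ["Stat 134"],
    ["CS 61A", "Stat 133", "Stat 135"],
    [],
]

# Number of courses each student reported.
counts = [len(courses) for courses in courses_per_student]

summary = {
    "total": sum(counts),
    "mean": mean(counts),
    "median": median(counts),
}

# How many students reported each individual course.
course_totals = Counter(
    course for courses in courses_per_student for course in courses
)
```

Whether summaries at this level are sufficient for the assignment is a question for the instructors; the sketch only shows what such summaries look like in code.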

Data Cleaning: Specific Columns

The assignment states that we need to analyze two columns of the data: the learning style plus one additional column. Do we need to clean up the entire data or just the columns we decide to use?

How do I edit a file opened with vi stat.cfg?

I have an issue regarding the editing of a cfg file.

Omitting Data

I have cleaned up the learning styles column so that I have each student's visual, aural, read/write, and kinesthetic scores. However, out of the 48 students, there are 17 who did not provide their scores on the questionnaire. Is it OK to omit those students' data?
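If omitting is allowed, one way to keep the step reproducible is to filter in code and report how many rows were dropped. This is a minimal sketch with invented records; the field names id and scores are assumptions, not the actual spreadsheet columns.

```python
# Hypothetical records; None marks a student who gave no scores.
students = [
    {"id": 1, "scores": {"V": 7, "A": 5, "R": 9, "K": 4}},
    {"id": 2, "scores": None},
    {"id": 3, "scores": {"V": 3, "A": 8, "R": 6, "K": 10}},
]

# Keep only students with scores, and record how many were left out.
complete = [s for s in students if s["scores"] is not None]
omitted = len(students) - len(complete)

print(f"Kept {len(complete)} of {len(students)} students; "
      f"omitted {omitted} with no scores.")
```

Reporting the omitted count alongside the results lets a reader see how much of the sample the analysis actually covers.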

Bug: screen isn't responsive

So I typed in "ipython notebook --no-browser --ip=0.0.0.0"

But it led me to a screen that isn't particularly responsive to any commands anymore.
The format is:
"
IPython Notebook

Notebook
Clusters

To import a notebook, drag a file onto the listing below or click here
"
I'm really new to Python, and this screen does not respond to normal commands. What is the significance of this screen, and how do I go about loading example.ipynb?

How to pull up previous ipython file after reopening VM

I had saved an IPython file inside my repository before, but now my IPython has no files in it. Do I need to type in a set of commands to get my IPython notebook into a certain repository?

Starting the process

Preliminary Setup Steps:
"sudo pip install gspread"

When I run this step, it gives me the error:
sudo: pip: command not found

What am I doing wrong...?

CSV File

NOTE: Do NOT simply copy & paste or export a CSV file from the document. You must use the provided example IPython Notebook to start with to access the data via the Google API.

Since our group is using R for our data cleaning and analysis, we used read.csv to access the data. Do we need to find an alternative way of importing the data even though we are using R, given that the cleanup is still done in R code?
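One possible workflow, assuming the instructors accept it, is to fetch the data in the notebook via the API and then write it out in code so that R can read it with read.csv. The sketch below covers only the writing step, using the standard csv module. The rows are invented stand-ins for whatever the notebook retrieved, and write_csv is a hypothetical helper name.

```python
import csv
import io

# Hypothetical rows standing in for data the notebook fetched via the API.
rows = [
    ["Timestamp", "Learning style"],
    ["9/10/2013 10:01:22", "V=7, A=5, R=9, K=4"],
]


def write_csv(rows, handle):
    """Write rows to an open text handle, quoting fields that contain commas."""
    writer = csv.writer(handle)
    writer.writerows(rows)


# Written to an in-memory buffer here; a real run would use
# open("questionnaire.csv", "w", newline="") instead (file name is an assumption).
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
```

Because the export happens in code rather than through a manual download, the step can be rerun by anyone with access to the spreadsheet.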

Other Languages instead of Python

When analyzing the data, can we use languages besides Python that we might have more experience in, like R, before giving our analysis to the visualizers and presenters?

How to contribute before actually getting data?

If I am a presenter, how can I help my group before receiving their share of the work? Should I leverage my knowledge of R to play an overseer role in addition to being the presenter?

1 What is the meaning/purpose of reproducibility for this project?

As stated in the description of this project, the objectives are to visualize data from our questionnaires, to better understand us (Cal Fall 2013 Stat 157 students), and to make our project process reproducible by others. However, I am not sure I understand the meaning of reproducibility in this case. If we use the same method/code to examine the same sample group again (the same data), our results should be expected to be the same, right? But if we are talking about reproducing the project process on a different sample group, then because we start the project by looking through all our samples and determine our process/methods/code based on those samples, our results should be different, right? I am just not sure what reproducibility means, or what its purpose is, in this case.

Importing the Google Spreadsheet into Python via gspread

I'm a bit unsure about how to use gspread within the IPython notebook. Is there any helpful guide or video demonstration that can be used?

I did independent research on this and found the command gspread.login and tried implementing it as such:

gc = gspread.login('[email protected]','*****')

NameError Traceback (most recent call last)
in ()
----> 1 gc = gspread.login('[email protected]','*****')

NameError: name 'gspread' is not defined

I imported the gspread package, so I don't know what I'm doing wrong. I'm also confused about how the ~/stat157.cfg file is used.
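A NameError like the one above means the name gspread is not defined in the running session, which usually happens when the import cell was not executed in that session or the package is not installed for that interpreter. The sketch below checks for the package before importing it. It assumes Python 3.4 or later, and is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can find the named module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("gspread"):
    # The import must run in the same notebook session, before any gspread call.
    import gspread
else:
    print("gspread was not found in this environment")
```

If the check fails inside the notebook but the package was installed from a terminal, the notebook may be running a different Python installation than the one pip installed into.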

Documentation of Process?

Since a portion of our grade is based on reproducibility of our process, does that mean every member of our group (every role) must submit the source code for their portion? And include step-by-step descriptions for tasks done manually? Is there a particular format for this?

Visualization Tools

For visualizers, are we allowed to use tools like R to display the information? Or are we expected to use specific tools?

Data Set

Hi, the data set is small enough that just using Excel to clean the data is faster than figuring out how to clean it with, say, regular expressions or some other more programmatic method. Is it OK if we just clean the data manually in Excel?
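For comparison, a programmatic cleanup with regular expressions can be fairly short. The sketch below pulls the four learning-style scores out of free-text answers. The answer formats, the pattern, and the helper name parse_vark are all assumptions, since the real responses may be written differently.

```python
import re

# Matches a standalone letter V, A, R or K, an optional ":" or "=", then a number.
SCORE_PATTERN = re.compile(r"\b([VARK])\s*[:=]?\s*(\d+)", re.IGNORECASE)


def parse_vark(text):
    """Return a dict of the four scores, or None if any score is missing."""
    scores = {
        letter.upper(): int(value)
        for letter, value in SCORE_PATTERN.findall(text)
    }
    return scores if set(scores) == set("VARK") else None


# Hypothetical free-text answers in three different styles.
answers = ["V: 7, A: 5, R: 9, K: 4", "v=3 a=8 r=6 k=10", "mostly visual"]
cleaned = [parse_vark(answer) for answer in answers]
```

Answers that do not match come back as None, so they can be reviewed by hand instead of being silently dropped, and the rest of the cleanup stays rerunnable.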

To what extent do Presenters need to know the details of our group members' methodology?

I'm wondering whether or not the Presenter has to know inside and out all the code and such that our partners generate.

Password for sudo/gspread?

I am trying to install gspread, but it keeps asking me for a password. I have no idea what this password is or could be, considering this is the first time I'm issuing the command.

Using R code within Python.

Part of my group is unable to work in Python and must work in R. Is there any way to merge R code into Python?

Accessing data through Google API issue

Regarding the note in the Data category in the assignment page:

NOTE: Do NOT simply copy & paste or export a CSV file from the document. You must use the provided example IPython Notebook to start with to access the data via the Google API.

To be more specific, is that referring to the Google Spreadsheet API? I looked at the documentation page, and it appears to be available for Java or .NET. I'm not sure if there will be a Python version of the example. So is it possible to do it in Java, or to export the CSV file so that it can be processed in R? Which method would you suggest? Thanks!

Help! The original example code in ipython isn't working for me anymore

The original example code in IPython isn't working for me anymore (it was working earlier this week). The 4th code box and all subsequent code boxes have a star marked on them, like In[ * ].

I had not touched any of the code in the box, so I'm confused as to why this is happening.

To try to solve this, my teammate and I tried a number of things, including restarting the machine, and trying to reclone the repository (which works for my other teammates). I'm getting this error:

I am using the wifi I have in my apartment, and it was working fine before.

My role is the visualizer and I have succeeded in visualizing one graph, but I am unable to finish my second graph because of this problem. If I am unable to fulfill my duty as visualizer by tomorrow night because of this, I was wondering if we could just present what we have, and discuss some of the problems we had in trying to get the second graph to work?

Thanks,
Christina

Trouble getting curator's code to work.

I'm trying to run my partner's code for the data curation, but I'm running into an issue saying that my id doesn't work.

I followed the steps to set the stat157 config file, so I'm not sure what the issue is.

Using Git to collaborate on files

Is anyone familiar with how to use Git on the command line to collaborate on files and send requests? Anyone using git effectively to collaborate with their teams?

How do I undelete a file after deleting git repository and pushing to github?

Spreadsheet Data not shared with me

When I clicked on the link for the spreadsheet, it said that the file is not shared with me. My email is [email protected] . I was logged in when I clicked on the link.

-He Ma

How to get file to my virtual machine

I want to copy the example file to my virtual machine.
I tried typing cp example.cfg ~/stat157.cfg in my VM, but it didn't work because there was no file called example.cfg. Since the file is not big, I thought it would be much easier to just make a new file called stat157.cfg in the VM. So I typed vi stat157.cfg; inside the text editor I typed i, and then I could type the contents. After I finished, I pressed Esc and typed :x to save and exit. Be careful with this step: the command appears at the bottom of the screen, so you need to scroll down to find it and press Enter. (I wasted a long time figuring out where my command was.)
But I still need to transfer example.ipynb to my VM.
I tried typing git clone http://github.com/MYACCOUNT/questionarie.git, but it shows an error: couldn't resolve the host 'github.com' while accessing... This reminds me that maybe I can't connect from home, because when I tried to install gspread at home yesterday it didn't work, but it worked at school.
I tried looking it up online at http://stackoverflow.com/questions/11890740/git-error-couldnt-connect-to-host-while-accessing, but I still haven't figured it out.
I also tried gspread.login('email','password'), and it showed an error too.
Does anybody know how to do this? Thank you!

Authentication Error of the Google Spreadsheet from the config file

I met with Chris earlier today because I failed to run some basic code in the example IPython notebook (e.g., print "cat"). Later he discovered that nbformat needs to be 2 instead of 3 to work with the apt-get version. To do that, go to your questionnaire directory, type "vi example.ipynb", and change the nbformat from 3 to 2. (Press the 'i' key to switch to insert mode, press 'Esc' to return to command mode, and type :x to save and exit.)
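The same edit can be scripted, since an .ipynb file is JSON with a top-level "nbformat" field. The sketch below only rewrites that number, exactly like the manual vi edit; it does not convert the notebook's structure between format versions. The function name set_nbformat and the sample notebook text are assumptions for illustration.

```python
import json


def set_nbformat(notebook_text, version):
    """Return the notebook JSON text with its top-level nbformat replaced."""
    notebook = json.loads(notebook_text)
    notebook["nbformat"] = version
    return json.dumps(notebook, indent=1)


# Minimal made-up notebook text; a real file would be read from example.ipynb.
sample = (
    '{"metadata": {"name": "example"}, "nbformat": 3, '
    '"nbformat_minor": 0, "worksheets": []}'
)
converted = set_nbformat(sample, 2)
```

Keeping a copy of the original file before overwriting it is a sensible precaution, because the older IPython may still fail to read parts of a notebook written in the newer format.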

After that, when I reload the example IPython notebook, I am able to see the code and markdown cells. However, I am not able to read the docid of the Google Spreadsheet from the config file. The authentication error is "Incorrect username or password". However, these are my correct username and password. I noticed there is a space after username: (e.g., username: [email protected]). I tried deleting it, but there is still an authentication error.

Please give me some insight into how to fix the problem. Thanks.
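On the space after "username:": if the notebook reads the file with Python's standard config parser, leading and trailing whitespace around values is stripped, so that space is unlikely to be the cause. The sketch below demonstrates this with the Python 3 configparser module. The section names, option names, and values are placeholders, because the real layout of stat157.cfg is not shown here.

```python
import configparser

# Hypothetical config text; section and option names are assumptions.
SAMPLE_CFG = """\
[google]
username:    student@example.com
password: not-a-real-password

[questionnaire]
docid: HYPOTHETICAL_DOC_ID
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

# Values come back without the extra spaces that follow the colon.
username = parser.get("google", "username")
docid = parser.get("questionnaire", "docid")
```

If the values print correctly with a check like this, the problem is more likely on the account side than in the file's spacing.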

How to submit the project

My group has pushed everything to our group repository. Should we send a pull request to stat157 to submit the project? Thanks.

iPython examples?

Are there any useful examples on the web of using IPython as a presentation tool? I think it would really help some of us understand what our end product should look like and give us an idea of the level of detail required for the presentation.

Clarification on the Visualizer's Role

We discussed in class that the visualizer's role is to digest the cleaned data from the other two roles and make it accessible for the presenter and the audience to easily understand. Does that role simply amount to making graphs, charts, etc (in R, or other programs), or would the visualizers be heavily involved in creating the actual presentation itself (for the presenter?) Or is this something flexible that the vertical group members can decide amongst themselves what the visualizer should do?

I'm aware that tristantao has asked a related question, but I am specifically asking about the visualizer's role.

R to clean data?

Is it okay to use R to clean up the data that we are going to use?

Submitting the project and git workflow

The instructions say we should turn in the homework by doing a git push. However, which repository should we push to?

Also, is there a suggested team collaboration pattern for this project? Should one of our members fork from stat157/questionnaire and let other people fork from this member, so that this member behaves like an administrator of the group? This method seems to be pretty advanced and I don't think we can master it in a day. (http://stackoverflow.com/questions/7244321/how-to-update-github-forked-repository).

Otherwise, can we just fork one copy of stat157/questionnaire and let everyone work on it? I'm not even sure we are familiar enough with git to deal with merge conflicts and to commit/push the project properly. So the version control might take us more time than the project itself.

Learning these Tools

What is the best way to learn some of these tools outside of R, for instance D3, SQL, etc.? Online tutorials or guides similar to try.github.com would be good, or (online) classes where we could learn.

How do you git push on the virtual machine?

I am trying to push my new IPython notebook file onto my github repo.

What should analyzers do?

I'm not really sure what the analyzers need to do. It seems like everything is up to the visualizer because the project is about visualizing them.

Groups Clarification

Are we working in the designated 'vertical groups' (from Kristina's email) or the original groups we formed during class a few weeks ago?

Omitting Responses & Quantitative vs. Qualitative Data

As we saw in class, the visualization represented quantitative data. For our project we have qualitative data within the "What is your learning style?" portion of our questionnaire.

How should we handle insufficient responses and the lack of quantitative data within the responses?

Will there be a standardization between groups, or will this be left to the discretion of each group?

Suggested Workflow: Shared Repo?

Goal: Efficient Workflow

I'm looking for suggestions from classmates/instructors on how our group should collaborate and share code. We will of course be pushing our code to GitHub, but what are the merits of contributing to a shared repo vs submitting pull requests to one member?

Group Control vs Administrator

Currently, one of our group members created a repo and is the sole contributor (the only one with write access). We fork, clone, modify, push, and submit a pull request. It's a lot of work, and it might make merging somewhat more difficult, but it's nice to have someone who can review your code before approving the pull request.

I'm thinking that a shared repo might be better because we can push to it in real-time without waiting for the approval of a pull request; that is, we all have write access to the repo, we each edit code on branches, then merge those branches to master once we've completed a task. The control is distributed, though, so it seems like it might be easier for one group member to make mistakes.

Thoughts?

2 columns

When visualizing data from 2 columns, do these need to be related and put into 1 visualization? Or can we make 2 separate analyses and 2 different visualizations?

Saving the edit from vi

I understand how to get to vi ~/stat157.cfg, but I can't figure out how to save and get back. I've looked online and have read multiple times that the command :q saves and exits. However, whenever I type that, :q is just added to the text. What is the correct command to use?

Presentation format

This is a question (I guess of lesser importance) on presentation and the project.
The assignment specified to use IPython Notebook or an HTML5-based presentation.
Does that mean we cannot use another type of program to clean our data?
Also, if we are not familiar with IPython Notebook/Python, could you recommend a place to look for help?
This is a University of Toronto online tutorial on Python: https://class.coursera.org/programming1-002/lecture/index

Data Analysis

How detailed should our data analysis be? Do we need to conduct statistical tests such as ANOVA or t-tests on our data, or will explanations of what we find using histograms or other graphs suffice?

Don't know how to create slideshow with IPython?

I think this blog post can help. Let me know if this helps:

http://www.damian.oquanta.info/posts/make-your-slides-with-ipython.html

How important is it for us to adhere to our role?

I am the curator of our team. I understand the role that I need to take. However, I was wondering if it was possible for us to interchange a little bit between our roles. I know I can consult other curators for help, but can I help my team member (who is an analyst)?

What I'm trying to ask is the following: are we allowed to step out of our roles in contribution? If so, to what degree?
