Stat 157 Questionnaire Data Wrangling
I tried this 'cp example.cfg' command but it returns 'no such file or directory'.
So I guess I need to save example.cfg locally first, so I saved it to my Project1 folder. I then tried the command again locally, without running the virtual machine, and it works! But it does not work when I run it inside the virtual machine: it still returns 'no such file or directory'. How should I fix this?
And also, for the bConnected Key, do we make up some numbers there? What is the point of doing this? (downloading to own computer? and edit it through virtual machine? but putting fake password there?)
In the workflow specified in class, the analyzer does her work, and then the visualizer. However, once the data curator has finished her task, shouldn't the analyzer and visualizer be able to start their work at the same time? We should know the format of the visualizations before starting, just not the specific numbers in the visualization. If anything, isn't it easier to do analysis after the visualizer has finished her work?
Hi,
I have no problem with gspread.login but client.open_by_key complains about SpreadsheetNotFound. I also tried to find it on bConnect drive but cannot find any spreadsheet shared with me there. I'm not sure if I'm allowed to access it. Could you please take a look?
Thanks
Qi
MS powerpoint? Prezi?... HTML5?
Is there a required format that presenters must use?
Mac and Linux Users:
http://www.scipy.org/install.html
Our group would like to schedule a time with you Saturday night but noticed that we can not make edits to the calendar that you shared with us.
Help!
What would you suggest as the best way for us to set up a time with you?
Our team chose the column regarding previous courses taken. Although we could do some analysis on the courses taken, we haven't thought of too many more ways to look at the data.
Our team's question was: is it okay to just include simple analyses (e.g., just look at totals, means or medians) in our assignment?
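For reference, totals, means, and medians of this kind take only a few lines in the notebook. The sketch below uses made-up course lists (the data and variable names are hypothetical, not taken from the questionnaire) and assumes the column has already been cleaned into one list of courses per student.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical cleaned responses: one list of previously taken courses per student.
courses_taken = [
    ["Stat 133", "Stat 134"],
    ["Stat 134"],
    ["Stat 133", "Stat 135", "CS 61A"],
    [],
]

# Total number of students who reported each course.
course_totals = Counter(course for student in courses_taken for course in student)

# Number of courses reported per student, with its mean and median.
courses_per_student = [len(student) for student in courses_taken]
mean_courses = mean(courses_per_student)
median_courses = median(courses_per_student)

print(course_totals.most_common())
print(mean_courses, median_courses)
```

Whether summaries at this level are enough for the assignment is a question for the instructors; the sketch only shows that they are cheap to compute.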
Thanks.
The assignment states that we need to analyze two columns of the data: the learning style plus one additional column. Do we need to clean up the entire data or just the columns we decide to use?
I have an issue regarding the editing of a cfg file
I have cleaned up the learning styles column so that I have each student's visual, aural, read/write, and kinesthetic scores. However, out of the 48 students, there are 17 who did not provide their scores on the questionnaire. Is it ok to omit those students' data?
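If students are omitted, it may help to do so explicitly in code and report how many were dropped, so that the step is reproducible. A minimal sketch, assuming missing responses are stored as None (the scores and key names below are invented for illustration):

```python
from statistics import mean

# Hypothetical VARK scores; None marks a student who left the question blank.
scores = [
    {"visual": 8, "aural": 5, "read_write": 7, "kinesthetic": 4},
    None,
    {"visual": 6, "aural": 9, "read_write": 3, "kinesthetic": 10},
    None,
]

# Keep only the students who answered, and record how many were left out.
answered = [s for s in scores if s is not None]
n_total = len(scores)
n_missing = n_total - len(answered)

# Average each score over the students who answered only.
averages = {key: mean(s[key] for s in answered) for key in answered[0]}

print(f"{len(answered)} of {n_total} students answered; {n_missing} omitted")
print(averages)
```

Stating the omitted count next to every average lets the audience judge how much of the class the numbers represent.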
So I typed in "ipython notebook --no-browser --ip=0.0.0.0"
But it led me to a screen that isn't particularly responsive to any commands anymore.
The format is:
"
IPython Notebook
Notebook
Clusters
To import a notebook, drag a file onto the listing below or click here
"
I'm really new to Python and this screen is not responsive to normal commands. What's the significance of this screen and how do I go about loading example.ipynb?
I had saved an IPython file inside my repository before, but now my IPython has no files in it. Do I need to type in a set of commands to get my IPython notebook into a certain repository?
Preliminary Setup Steps:
"sudo pip install gspread"
When I try to type this step, it gives me the error
sudo: pip: command not found
What am I doing wrong...?
NOTE: Do NOT simply copy & paste or export a CSV file from the document. You must use the provided example IPython Notebook to start with to access the data via the Google API.
Since our group is using R to do our data cleaning and analysis, we used read.csv to access the data. Do we need to find an alternative way of importing the data even though we are using R, given that the cleanup is still done in R code?
When analyzing the data, can we use languages besides python that we might have more experience in, like R, before giving our analysis to the visualizers and presenters?
If I am a presenter, how can I help my group before receiving their share of the work? Should I leverage my knowledge of R to plan an overseer role in addition to being the presenter?
As stated in the description of this project, the objectives are to visualize data from our questionnaires, to better understand us (Cal Fall 2013 Stat 157 students), and to make our project process reproducible by others. However, I am not sure I understand the meaning of reproducibility in this case. If we use the same method/code to examine the same sample group again (the same data), our results should be expected to be the same, right? But if we are talking about reproducibility of the project process in a different sample group, our results should be different, right? After all, we start the project by looking through all our samples, and we determine our process/methods/code based on those samples. I am just not sure what reproducibility means, or what its purpose is, in this case.
I'm a bit unsure about how to use gspread within the IPython notebook. Is there any helpful guide or video demonstration that can be used?
I did independent research on this and found the command gspread.login and tried implementing it as such:
NameError Traceback (most recent call last)
in ()
----> 1 gc = gspread.login('[email protected]','*****')
NameError: name 'gspread' is not defined
I imported the gspread package so idk what I'm doing wrong. I'm also confused on the application of the ~/stat157.cfg file.
Since a portion of our grade is based on reproducibility of our process, does that mean every member of our group (every role) must submit the source code for their portion? And include step-by-step descriptions for tasks done manually? Is there a particular format for this?
For Visualizers, are we allowed to use tools like R to display the information? Or are we expected to use specific tools.
Hi, the data set is small enough that just using Excel to clean the data is faster than figuring out how to clean it with, say, regular expressions or some other more programmatic method. Is it ok if we just manually clean the data in Excel?
I'm wondering whether or not the Presenter has to know inside and out all the code and such that our partners generate.
Part of my group is unable to work in python and must work in R, is there any way to merge within python??
Regarding the note in the Data category in the assignment page:
NOTE: Do NOT simply copy & paste or export a CSV file from the document. You must use the provided example IPython Notebook to start with to access the data via the Google API.
To be more particular, is that referring to the Google spreadsheet API? I looked up at the documentation page and it is available in Java or .NET. I'm not sure if there will be a python version example. So is it possible to do it in Java or export the CSV file so that it can be processed in R. Which method would you suggest? Thanks!
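One possible route for R users, offered as a sketch rather than a confirmed reading of the assignment note: fetch the data in the notebook as instructed, then write the rows to a local CSV file that R can load with read.csv. The rows below are hypothetical stand-ins for values already fetched in Python; the column names and file name are assumptions.

```python
import csv
import io

# Hypothetical rows standing in for values already fetched in the notebook;
# the first row holds the column headers.
rows = [
    ["timestamp", "learning_style", "previous_courses"],
    ["9/10/2013 10:02", "V:8 A:5 R:7 K:4", "Stat 133, Stat 134"],
    ["9/10/2013 10:05", "", "Stat 134"],
]

def rows_to_csv(rows):
    """Return the rows as CSV text; fields containing commas are quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(rows)
print(csv_text)

# To hand the data to R, write the text to a file (the name is arbitrary):
# with open("questionnaire.csv", "w", newline="") as f:
#     f.write(csv_text)
```

The csv module quotes fields that contain commas, so a free-text answer such as a list of courses stays in one column when R reads the file back.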
The original example code in IPython isn't working for me anymore (it was working earlier this week). The 4th code box and all subsequent code boxes have a star marked on them, like In[ * ].
I had not touched any of the code in the box, so I'm confused as to why this is happening.
To try to solve this, my teammate and I tried a number of things, including restarting the machine, and trying to reclone the repository (which works for my other teammates). I'm getting this error:
I am using the wifi I have in my apartment, and it was working fine before.
My role is the visualizer and I have succeeded in visualizing one graph, but I am unable to finish my second graph because of this problem. If I am unable to fulfill my duty as visualizer by tomorrow night because of this, I was wondering if we could just present what we have and discuss some of the problems we had in trying to get the second graph to work?
Thanks,
Christina
Is anyone familiar with how to use Git on the command line to collaborate on files and send requests? Anyone using git effectively to collaborate with their teams?
When I clicked on the link for the spread sheet, it says that the file is not shared with me. My email is [email protected] . And I logged in when I clicked on the link.
-He Ma
I met with Chris earlier today because I could not run even basic code in the example IPython notebook (e.g., print "cat"). He discovered that nbformat needs to be 2 instead of 3 to work with the apt-get version. To change it, go to your questionnaire directory, type "vi example.ipynb", and change the nbformat from 3 to 2. (Press the 'i' key to switch to insert mode, press 'Esc' to return to command mode, and type :x to save and exit.)
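For anyone uncomfortable with vi, the same edit can be sketched in Python, since a notebook file is JSON with a top-level "nbformat" field. The helper name set_nbformat is hypothetical, and like the manual edit it only changes the version label; it does not convert the notebook's contents.

```python
import json
import os
import tempfile

def set_nbformat(path, version):
    """Rewrite the top-level "nbformat" field of a notebook file in place."""
    with open(path) as f:
        notebook = json.load(f)
    notebook["nbformat"] = version
    with open(path, "w") as f:
        json.dump(notebook, f, indent=1)

# Demonstration on a throwaway file rather than the real example.ipynb.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.ipynb")
    with open(demo_path, "w") as f:
        json.dump({"nbformat": 3, "nbformat_minor": 0, "worksheets": []}, f)
    set_nbformat(demo_path, 2)
    with open(demo_path) as f:
        demo_result = json.load(f)["nbformat"]

print(demo_result)

# To apply it to the course notebook, run from the questionnaire directory:
# set_nbformat("example.ipynb", 2)
```

Keeping a copy of the original file before running this is a sensible precaution.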
So after that, when I reload the example IPython notebook, I am able to see the code and markdown cells. However, I am not able to read the docid of the Google Spreadsheet from the config file. The authentication error is "Incorrect username or password", even though my username and password are correct. I noticed there is a space after "username:" (e.g., username: [email protected]). I tried deleting it, but I still get the authentication error.
Please give me some insight into how to fix the problem. Thanks.
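One way to check what is actually being read from the config file is to parse it separately and print the values with repr. The sketch below uses Python 3's standard configparser and assumes the file follows the usual "key: value" layout under a section header; the section name, key names, and values are hypothetical, so match them to the provided example.cfg. It has not been confirmed that the example notebook reads the file this way.

```python
import configparser

# Hypothetical contents of ~/stat157.cfg, with extra spaces around the username.
sample_cfg = "\n".join([
    "[google]",
    "username:   someone@example.com   ",
    "password: not-a-real-password",
    "docid: hypothetical-spreadsheet-key",
])

# interpolation=None keeps a '%' in a password from being treated specially.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(sample_cfg)

username = parser.get("google", "username")
docid = parser.get("google", "docid")

# repr() makes any stray whitespace visible.
print(repr(username))
print(repr(docid))

# For the real file, replace read_string with:
# parser.read(os.path.expanduser("~/stat157.cfg"))   # needs "import os"
```

With this parser, spaces after the colon and at the end of the line are stripped, so under these assumptions the space after "username:" is unlikely to be the cause; a mistyped password or a different account would be the next things to rule out.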
My group has pushed everything to our group repository. Should we send a pull request to stat157 to submit the project? Thanks.
Are there any useful examples on the web of using IPython as a presentation tool? I think it would really help some of us understand what our end product should look like and give us an idea of the level of detail required for the presentation.
We discussed in class that the visualizer's role is to digest the cleaned data from the other two roles and make it accessible for the presenter and the audience to easily understand. Does that role simply amount to making graphs, charts, etc (in R, or other programs), or would the visualizers be heavily involved in creating the actual presentation itself (for the presenter?) Or is this something flexible that the vertical group members can decide amongst themselves what the visualizer should do?
I'm aware that tristantao has asked a related question, but I am specifically asking about the visualizer's role.
Is it okay to use R to clean up the data that we are going to use?
In the instruction, it says we should turn in the homework by doing a git push. However, which repository should we push to?
Also, is there a suggested team collaboration pattern for this project? Should one of our members fork from stat157/questionnaire and let the others fork from that member, so that this member acts as an administrator of the group? This method seems pretty advanced and I don't think we can master it in a day. (http://stackoverflow.com/questions/7244321/how-to-update-github-forked-repository).
Otherwise, can we just fork one copy of stat157/questionnaire and let everyone work on it? I'm not even sure if we are familiar with git enough to deal with merging conflicts and committing/pushing the project properly. So the version control might take us more time than the project itself.
What is the best way to learn some of these tools outside of R? For instance D3, SQL, etc. online tutorials or guides similar to try.github.com would be good. Or classes (online) where we could learn.
I am trying to push my new IPython notebook file onto my github repo.
I'm not really sure what the analyzers need to do. It seems like everything is up to the visualizer because the project is about visualizing them.
Are we working in the designated 'vertical groups' (from Kristina's email) or the original groups we formed during class a few weeks ago?
As we looked in class, the visualization represented quantitative data. For our project we have qualitative data within the "What is your learning style?" portion of our questionnaire.
How should we handle insufficient responses and the lack of quantitative data within the responses?
Will there be a standard across groups, or will this be left to the discretion of each group?
I'm looking for suggestions from classmates/instructors on how our group should collaborate and share code. We will of course be pushing our code to GitHub, but what are the merits of contributing to a shared repo vs submitting pull requests to one member?
Currently, one of our group members created a repo and is the sole contributor (the only one with write access). We fork, clone, modify, push, and submit a pull request. It's a lot of work, and it might make merging somewhat more difficult, but it's nice to have someone who can review your code before approving the pull request.
I'm thinking that a shared repo might be better because we can push to it in real-time without waiting for the approval of a pull request; that is, we all have write access to the repo, we each edit code on branches, then merge those branches to master once we've completed a task. The control is distributed, though, so it seems like it might be easier for one group member to make mistakes.
Thoughts?
When visualizing data from 2 columns, do these need to be related and put into 1 visualization? Or can we make 2 separate analyses and 2 different visualizations?
I understand how to get to vi ~/stat157.cfg, but I can't figure out how to save and get back. I've looked online and I have read multiple times that the command :q to save and exit. However, whenever I type that, :q is just added into the text. What is the correct command to use?
This is a question (I guess of lesser importance) on presentation and the project.
The assignment specified to use IPython Notebook or an HTML5-based presentation.
Does that mean we can not use another type of program to clean our data?
Also, if we are not familiar with ipython notebook/python, could you recommend us a place to look for help?
This is a University of Toronto online tutorial on Python: https://class.coursera.org/programming1-002/lecture/index
How detailed should our data analysis be? Do we need to conduct statistical tests such as ANOVA or T-tests on our data or will explanations of what we find using histograms or other graphs suffice?
I think this blog post can help. Let me know if this helps:
http://www.damian.oquanta.info/posts/make-your-slides-with-ipython.html
I am the curator of our team. I understand the role that I need to take. However, I was wondering if it was possible for us to interchange a little bit between our roles. I know I can consult other curators for help, but can I help my team member (who is an analyst)?
What I'm trying to ask is the following: are we allowed to step out of our roles in contribution? If so, to what degree?