choen51 / istm-6212
GWU MSBA Data Management
Hey Daniel,
Great job on project 2! This is my review of your work. I was able to run your file successfully on my computer. The list below is what I think you did very well.
Overall, you did very nice work!
Thank you!
Zezhi
Hey Dan!
First, very cool and very relevant project. I enjoyed reading through the analysis. Listed below are a few points in particular that I thought you did a great job on:
My only suggestions would be as follows:
http://www.datasciencecentral.com/profiles/blogs/10-resources-to-help-you-stop-doing-pie-charts
Overall, very nice work on project 3!
Hi Lee,
You did a superb job on Project 1: You were even able to answer the bonus question. The introductory paragraph at the beginning of each section is very helpful: It gives other programmers a clear overview of your approach to the solutions. For the Python filter, the combined use of "!chmod +x _______.py" in one line and "| ./ _______.py" in the piped line shortens the piped line and makes it easier to follow. I definitely picked up a few coding tricks from reviewing your programs, which are, again, well documented and written in a clean and concise manner. -Dan
Hi Daniel
The notebook runs smoothly and is neat.
First of all, we don’t have the same results. The code I used for this problem is "!grep -oE '\w{{2,}}' women.txt | grep -wc "Jo" women.txt". With this code, I first put each word on its own line, then count the word Jo. This gave me the answer 1355 for the name “Jo”.
I guess the difference between our answers is that counting with !grep -w only gives the number of lines in which “Jo” appears, not the number of times it appears.
But the code is really neat. With the explanations you wrote, the code is easy to understand.
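To make the difference between counting lines and counting occurrences concrete, here is a minimal Python sketch. It is not the code from either notebook: the sample text is made up for illustration (the real input is women.txt), and the `\bJo\b` pattern is only an approximation of what grep -w matches.

```python
import re

# Hypothetical sample text; the real input in this review is women.txt.
sample = "Jo said hello to Jo.\nMeg waved.\nJo laughed.\n"

# Whole-word pattern, roughly analogous to grep -w "Jo".
pattern = re.compile(r"\bJo\b")

# Number of lines containing at least one match (what a line count reports).
line_count = sum(1 for line in sample.splitlines() if pattern.search(line))

# Number of individual occurrences (what you get with one word per line).
occurrence_count = len(pattern.findall(sample))

print(line_count, occurrence_count)
```

The two numbers diverge as soon as any line contains the name more than once, which is probably why our answers differ.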
The code runs smoothly and, most importantly, we share the same results!
Problem 2
For Problem 2, we basically had the same idea for solving the problem, so we share the same answer.
In the Split Filter, we used different code. The code I used is "compile('\w+')" to find the words that match the regular expression. But your code gives the same result as mine. I learned from it.
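For reference, a split filter built around a compiled `\w+` pattern might look roughly like the sketch below. The function name `split_words` and the sample lines are assumptions made for illustration; this is not the code from either notebook.

```python
import re

# Compiled once and reused for every line.
WORD = re.compile(r"\w+")

def split_words(lines):
    """Yield every word found in the given lines, one word at a time."""
    for line in lines:
        for word in WORD.findall(line):
            yield word

# Hypothetical input; in a piped filter, `lines` would be sys.stdin.
words = list(split_words(["Hello, world!", "A second line."]))
print("\n".join(words))
```

Printing one word per line is what lets the next command in the pipe count or filter individual words.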
I was really impressed by the StopWords Filter you wrote. Since the function takes two input values, people can pass in whatever stop word list they want, which is so nice!
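The idea of a stop word filter with two inputs can be sketched as follows. This is only an illustration, not the function from the notebook: the name `remove_stop_words`, the sample words, and the case-insensitive comparison are all assumptions.

```python
def remove_stop_words(words, stop_words):
    """Return the words that are not in the caller-supplied stop word list."""
    # Assumption: comparison ignores case, so "The" and "the" are both dropped.
    stop = {w.lower() for w in stop_words}
    return [w for w in words if w.lower() not in stop]

# Hypothetical inputs; the caller is free to swap in any stop word list.
filtered = remove_stop_words(["The", "cat", "and", "the", "hat"], ["the", "and"])
print(filtered)
```

Because the stop word list is a parameter rather than a hard-coded constant, the same function can be reused with different lists.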
Finally, I have a suggestion. Since you didn’t include women.txt and some other files in the package, it is inconvenient for us to run your code without !wget. So I think it would be better to use !wget in the code next time.
Overall you did a great job!
Good Luck,
Qinhui
Hi Daniel,
You did a great job on project 1! Your code is efficient! I really should learn from you to make my code less tedious.
In Prob 1, you used "^ Jul." to filter out speaking lines in Part B. It's brilliant and taught me a new way to count speaking lines. Mine is "Jul. ". Your other outputs are almost the same as mine. In Part A, you got the right answer, but your code actually counts the number of lines containing the name. If the name occurs more than once in a line, your code would still count it as one match.
I would also like to bring up some suggestions. I had a little difficulty reproducing your work. The txt and csv files were not on my computer, so I had to download them before running your code. Also, the file path in your assert statement is specific to your computer, which causes a "no such file or directory" error on mine. So I suggest using "!wget" to download the required files first so that the work can be reproduced. Another thing is the Python split filter in Prob 3 Part 4. Your filter retains single-letter words while "\w{{2,}}" drops them, so the two outputs are different. But I don't think it's a big deal.
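The difference between the two patterns can be seen in a small Python sketch. The sample line is made up for illustration, and I am assuming the doubled braces in "\w{{2,}}" are the notebook's way of escaping a literal `{2,}` inside a `!` shell command.

```python
import re

# Hypothetical sample line containing two single-letter words.
line = "I saw a cat"

# \w+ keeps every word, including single letters.
all_words = re.findall(r"\w+", line)

# \w{2,} only matches runs of two or more word characters.
long_words = re.findall(r"\w{2,}", line)

print(all_words)
print(long_words)
```

Words such as "I" and "a" appear only in the first list, which accounts for the different outputs.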
Thanks,
Qinya
Hi Jingning,
It seems that you have all the pieces but were unable to put them together in time. It is unfortunate. You completed Problem 1 well! For the Python split filter, the piped commands ought to be in Jupyter. The Python program performs only the task of re-formatting lines into one word per line. For the other 2 Python filter programs, you have the correct logic and just need to put it into proper syntax. If you need any help going forward, feel free to reach out to your classmates or the professor.
Hi Daniel,
Great job on completing project 2. I was able to execute all your code on datanotebook.org, and the outcome looks good.
For Part 1, I found it really interesting that you guys showed an error in the raw data on purpose and then explained how you came up with adding "latin" to remove the error message. I can see there is a large number of variables, and you managed to explain them in a short but effective way.
For Part 2, the whole process is well structured and I can follow the flow very easily. Although you only had two queries for this part, I like how you analyzed your data and your thoughts on the results. It might be helpful to add some graphs, such as a histogram or pie chart, to make the results more visually appealing, but that is just my personal preference.
For Part 3, I believe there is a way to incorporate your star schema into your notebook using "PATH" rather than just putting it in a separate file. The structure of your star schema, dimension tables, and fact table looks good. I think you can use limit 20 or limit 15 when you test-run your table.
For Part 4, I like all your queries and the way you analyzed the results. I also like that you used bullet points to state your findings, which is really clear.
Good answer on the bonus.
Overall, great job.
Yi