choen51 / istm-6212

GWU MSBA Data Management

Jupyter Notebook 99.63% Python 0.37%

istm-6212's Introduction

ISTM-6212

istm-6212's Issues

Project2 Review

Hey Daniel,
Great job on Project 2! This is my review of your work. I ran your file successfully on my computer. Listed below is what I think you did very well.

I liked that you showed the difficulties you encountered at first and told us how you solved them. The way you handled them taught me a lot and showed that you tried your best to finish the project.
It’s great that you use very specific markdown to show the steps, which helps me easily follow how you thought about the problem. Also, the quick validation of each table makes your tables easy to understand. You finished the project by telling a story and showing matplotlib plots, which is interesting and attractive.
For the coding part, I learned the types “REAL” and “SET” from your work, which I can use in the future. For the bonus part, I learned that we can reduce data entry errors by placing limits on valid values for all new data. I hadn’t thought of that before.
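
As a small illustration of placing limits on valid values, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module, and the `contracts` table, its `amount` column, and the non-negative rule are all hypothetical; the project's actual database and schema may differ.

```python
import sqlite3

# Hypothetical table: the names and the "amount >= 0" rule are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contracts (
        id INTEGER PRIMARY KEY,
        amount REAL NOT NULL CHECK (amount >= 0)
    )
    """
)


def try_insert(amount):
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO contracts (amount) VALUES (?)", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1250.50)  # valid value, stored
rejected = try_insert(-10.0)    # violates the CHECK constraint, not stored
row_count = conn.execute("SELECT COUNT(*) FROM contracts").fetchone()[0]
print(accepted, rejected, row_count)
```

Putting the rule in the table definition means every new row is checked by the database itself, no matter which script or notebook cell does the insert.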

Overall, you did very nice work!

Thank you!
Zezhi

Review Final Project (Project #3)

Project 3 Review

Hey Dan!

First, very cool and very relevant project. I enjoyed reading through the analysis. Listed below are a few points in particular that I thought you did a great job on:

The context that is provided through each step of the process is great. While reading through the project, I never felt that I did not understand why you were taking a particular step.
The introduction was great. I liked that you guys were able to introduce a very broad, macro-level topic (government spending) and then focus the conversation on the particular aspect that your analysis addressed.
Very clean implementation of the star schema; nice work here.
Interesting questions for analysis! I especially appreciate the level of grain that you go into. For example, you look not just at which departments are issuing contracts, but also at which companies are receiving the contracts and specifically what the contracts are for.

My only suggestions would be as follows:

Perhaps articulate a few research questions up front, rather than just casually exploring the data set.
Perhaps use an alternative visualization plot rather than pie charts. Listed below is a fun link that may help change your feelings about pie charts!

http://www.datasciencecentral.com/profiles/blogs/10-resources-to-help-you-stop-doing-pie-charts

Overall, very nice work on project 3!

Review for leeeyler

You did a superb job on Project 1: You were even able to answer the bonus question. The introductory paragraph at the beginning of each section is very helpful: It gives other programmers a clear overview of your approach to solutions. For the Python filter, the combined use of "!chmod +x _______.py" in one line and "| ./ _______.py" in the piped line shortens the piped line and makes it easier to follow. I definitely picked up a few coding tricks from reviewing your programs, which are, again, well documented and written in a clean and concise manner. -Dan

Review of Project 1

Hi Daniel
The notebook runs smoothly and is neat.

Problem1 Part A

First of all, we don’t have the same results. The code I used for this problem is "!grep -oE '\w{{2,}}' women.txt | grep -wc "Jo" women.txt". With this code, I first put each word on its own line, then I count the word Jo. Using this code, I got the answer 1355 for the name “Jo”.
I guess the difference between our answers is that !grep -w only gives the count of the lines in which “Jo” appears.
But the code is really neat. With the explanations you wrote, it is easy to understand the code.
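
The gap between counting matching lines and counting every occurrence can be shown with a small sketch. The sample text is made up (it is not from women.txt), and the comparison to grep options is approximate.

```python
import re

# Made-up sample text; not taken from women.txt.
text = "Jo laughed, and Jo ran.\nMeg saw Jo.\nJohn is not counted.\n"

# Whole-word match, similar in spirit to grep's -w option.
word = re.compile(r"\bJo\b")

# Lines that contain at least one whole-word match (what a line count reports).
lines_with_match = sum(1 for line in text.splitlines() if word.search(line))

# Every whole-word occurrence, including repeats on the same line.
total_occurrences = len(word.findall(text))

print(lines_with_match)   # 2
print(total_occurrences)  # 3
```

The first line of the sample contains the name twice, so the two counts differ by one.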

Problem1 Part B

The code runs smoothly and, most importantly, we got the same results!

Problem2

For Problem 2, we basically had the same idea for solving the problem, so we got the same answer.

Problem3

In the Split Filter, we used different code. The code I used is "compile('\w+')" to find the words that match the regular expression. But your code gives the same result as mine. I learned from it.
I was really impressed by the StopWords Filter you applied. Since the function process takes two input values, people can change the stop words list they pass in. This is so nice!
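
A stop-words filter that takes the stop list as a second argument might look like the sketch below. The function name `remove_stop_words`, the case-insensitive matching, and the sample words are assumptions for illustration, not the code under review.

```python
def remove_stop_words(words, stop_words):
    """Yield the words that are not in stop_words (compared case-insensitively)."""
    blocked = {w.lower() for w in stop_words}
    for w in words:
        if w.lower() not in blocked:
            yield w


# Made-up input; the same words filtered with two different stop lists.
words = ["The", "march", "sisters", "and", "the", "piano"]
default_result = list(remove_stop_words(words, ["the", "and"]))
custom_result = list(remove_stop_words(words, ["the", "and", "piano"]))

print(default_result)  # ['march', 'sisters', 'piano']
print(custom_result)   # ['march', 'sisters']
```

Because the stop list is a parameter, the caller can swap it without touching the filter itself.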

Finally, I have a suggestion. Since you didn’t include women.txt and some other files in the package, it is inconvenient for us to run your code without !wget. So I think it would be better to use !wget in the code next time.
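
One way to make the data files reproducible from inside the notebook, as an alternative to !wget, is a small download-if-missing helper. This is only a sketch: `ensure_file` is a hypothetical helper, and the URL in the usage comment is a placeholder, not the real location of the course data.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_file(path, url):
    """Download url to path unless the file already exists; return the path."""
    target = Path(path)
    if not target.exists():
        urlretrieve(url, str(target))
    return target


# Usage (placeholder URL, replace with the real source of the file):
# ensure_file("women.txt", "https://example.com/women.txt")
```

Using a relative path such as "women.txt" also avoids hard-coding a directory that exists on only one machine.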

Overall you did a great job!
Good Luck,
Qinhui

review for project 1

Hi Daniel,

You did a great job on Project 1! Your code is efficient! I really should learn from you to make my code less tedious.

In Prob 1, you used "^ Jul." to filter out speaking lines in Part B. It's brilliant and taught me a new way to count speaking lines. Mine is "Jul. ". Your other outputs are almost the same as mine. In Part A, you got the right answer, but your code there actually counts the number of lines containing the name. If the name occurs more than once on a line, your code still counts it as one match.

I also have some suggestions. I had a little difficulty reproducing your work. The txt and csv files were not in my folder, so I had to download them before running your code. Also, the file path in your assert is specific to your computer, which causes a "no such file or directory" error on mine. So I suggest using "!wget" to download the required files first so the work can be reproduced. Another thing is the split Python filter in Prob 3 Part 4. Your filter retains single-letter lines while "\w{{2,}}" drops them, so the two outputs are different. But I don't think it's a big deal.
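
The difference between the two split patterns can be seen in a short sketch with a made-up sentence. In plain Python the quantifier is written with single braces; the doubled braces in the notebook commands are presumably there to escape literal braces in a ! shell line.

```python
import re

# Made-up sample line with two single-letter words.
line = "I am a reader"

all_words = re.findall(r"\w+", line)      # keeps single-letter words
long_words = re.findall(r"\w{2,}", line)  # requires at least two word characters

print(all_words)   # ['I', 'am', 'a', 'reader']
print(long_words)  # ['am', 'reader']
```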

Thanks,
Qinya

Review for JingningLi

Hi Jingning,

It seems that you have all the pieces but were unable to put them together in time, which is unfortunate. You completed Problem 1 well! For the Python split filter, the piped commands ought to be in Jupyter. The Python program performs only the task of re-formatting lines into one word per line. For the other 2 Python filter programs, you have the correct logic and just need to put it into proper syntax. If you need any help going forward, feel free to reach out to your classmates or the professor.

Project2 Review

Hi Daniel,

Great job on completing Project 2. I was able to execute all your code on datanotebook.org and the outcome looks good.

For Part 1, I found it really interesting that you guys showed an error in the raw data on purpose and then explained how you came up with adding "latin" to remove the error message. I can see there are a great many variables, and you guys managed to explain them in a short but effective way.

For Part 2, the whole process is well structured and I can follow the flow very easily. Although you only had two queries for this part, I like how you analyze your data and your thoughts on the results. It might be helpful to add some graphs, such as a histogram or pie chart, to make the results more visually appealing, though that is just my personal preference.

For Part 3, I believe there is a way to incorporate your star schema into your notebook using "PATH" rather than just putting it in a separate file. The structure of your star schema, dimension tables, and fact table looks good. I think you could use limit 20 or limit 15 when you test-run your tables.

For Part 4, I like all your queries and the way you analyze those results. I also like that you use bullet points to state your findings, which is really clear.

Good answer on bonus.

Overall, great job.
