pwp-capstones's Issues
Don't forget to add the value to appearances if the key is not in table2
def frequency_comparison(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    comparison = mutual_appearances / appearances
    return comparison
Really nice job with this function! We're just missing one small thing: currently, if a key in table1 does not exist in table2, the function will not capture that key's value. To correct for this, we have to add an else statement that adds the table1 value to appearances. Simply put, we just have to add an else that lines up with this if statement:
if i in table2:
Here is an example:
def frequency_comparison(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]
        else:  # Add key to appearances if it doesn't show up in table2
            appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    comparison = mutual_appearances / appearances
    return comparison
This is a very small change, but it can make a big difference, especially if table1 has many keys that do not show up in table2. Nevertheless, very good job with this function!
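To see how much the missing else can matter, here is a small side-by-side sketch. The function names frequency_comparison_original and frequency_comparison_fixed and the two sample tables are made up for illustration; the function bodies follow the two versions discussed above.

```python
def frequency_comparison_original(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    return mutual_appearances / appearances


def frequency_comparison_fixed(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]
        else:  # key only in table1: still counts toward appearances
            appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    return mutual_appearances / appearances


# Hypothetical sample data: "cat" and "sat" appear only in table1.
sample1 = {"the": 3, "cat": 2, "sat": 1}
sample2 = {"the": 2, "dog": 4}

original_score = frequency_comparison_original(sample1, sample2)
fixed_score = frequency_comparison_fixed(sample1, sample2)

print(original_score)  # 2 / 7: "cat" and "sat" are ignored
print(fixed_score)     # 2 / 10: every key is counted
```

With these tables the original version reports a noticeably higher similarity, because the three appearances that exist only in table1 never reach the denominator.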
P.S. One other small optimization we could have made to this function is saving the values from both tables in variables so that we could reuse them throughout the function. Here is an example:
def frequency_comparison(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        value1 = table1[i]  # Look this up first so both branches can use it
        if i in table2:
            value2 = table2[i]
            if value1 > value2:
                appearances += value1
                mutual_appearances += value2
            else:
                appearances += value2
                mutual_appearances += value1
        else:  # Add key to appearances if it doesn't show up in table2
            appearances += value1
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    comparison = mutual_appearances / appearances
    return comparison
All this does is allow us to reuse a variable instead of constantly making a call to our dictionaries. It also provides a useful way to read this function a bit more easily -- simply, it makes it easier to distinguish which value we are adding and under what conditions. Of course, this makes no difference to functionality, so it's really a matter of preference. Just wanted to point it out!
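Taking the same idea one step further, here is a rough sketch of a more compact variant. It is not part of the project solution: the name frequency_comparison_compact is made up, and it assumes every count in both tables is a non-negative number. Under that assumption, a missing key can be treated as a count of 0, so min and max replace the if/else branches.

```python
def frequency_comparison_compact(table1, table2):
    # Hypothetical variant; assumes all counts are non-negative.
    appearances = 0
    mutual_appearances = 0
    # Visit every key that appears in either table exactly once.
    for key in set(table1) | set(table2):
        value1 = table1.get(key, 0)  # 0 when the key is missing
        value2 = table2.get(key, 0)
        appearances += max(value1, value2)
        mutual_appearances += min(value1, value2)
    return mutual_appearances / appearances


# Hypothetical sample data.
compact_score = frequency_comparison_compact(
    {"the": 3, "cat": 2, "sat": 1},
    {"the": 2, "dog": 4},
)
print(compact_score)  # 2 / 10
```

Whether this reads better than the explicit branches is, again, a matter of preference.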
Bit of a different way to filter out non-sentences from get_average_sentence_length
def get_average_sentence_length(text):
    #standardize the sentence endings, to facilitate turning them into strings
    remove_exclamations = text.replace("!", ".")
    remove_questions = remove_exclamations.replace("?", ".")
    #splits each sentence into a string
    text_in_strings = remove_questions.split(".")
    #count the number of sentences in each statement, subtracting one as the final period counts as an extra sentence
    num_sentences = len(text_in_strings)
    num_sentences = (num_sentences-1)
    #split the sentences into words
    words_in_strings = text.split(" ")
    num_words = len(words_in_strings)
    #find the average number of words per sentence
    average_words_per_sentence = num_words / num_sentences
    print(average_words_per_sentence)
    return average_words_per_sentence
Great job with this function! It returns the exact output we want, and it corrects for the extra piece that split(".") leaves after the final sentence ending (an empty string, or just a trailing space). However, the correction we have (always subtracting one from our sentence count) does not handle two cases: 1) if the split leaves more than one empty or whitespace-only piece, and 2) if the text does not end with a sentence ending, so there is no extra piece to remove. In both of those cases, our function will no longer calculate the correct count: it will either overcount or undercount the sentences.
So, I just wanted to show a slightly different way to achieve the same goal:
stripped = [sentence for sentence in text_in_strings if sentence.strip()]
num_sentences = len(stripped)
Okay, let's break this down. The line [sentence for sentence in text_in_strings if sentence.strip()] can be rewritten like so:
stripped = []
for sentence in text_in_strings:
    if sentence.strip():
        stripped.append(sentence)
The way I have written it above is called a list comprehension: it is simply a way to write a for loop in a single line that builds a list from any iterable. There are a few benefits to using list comprehensions; but, for all intents and purposes, it is simply a single-line for loop.
Basically, all the list comprehension is doing is getting rid of any string that is empty or contains only white space. It does this by combining an if statement with strip(). Note that strip() removes any extra white space at the beginning and end of a string; thus, if the string only has white space, strip() returns an empty string, which Python treats as false. As such, we can use an if statement to test whether sentence really has characters.
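As a quick check of how strip() behaves in an if test, here is a tiny sketch. The sample pieces are made up; they stand in for what split(".") might return.

```python
# Hypothetical pieces, like the ones split(".") can produce.
pieces = ["Hello there", " How are you", "", "   ", "\n"]

for piece in pieces:
    stripped_piece = piece.strip()
    # An empty string is falsy, so bool() shows what the if test sees.
    print(repr(piece), "->", repr(stripped_piece), bool(stripped_piece))

# Only pieces with real characters survive the filter.
kept = [piece for piece in pieces if piece.strip()]
print(kept)
```

Note that the filter keeps the original strings (leading space and all); strip() is only used to decide which pieces to keep.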
An advantage of implementing it this way is that it will catch every piece that is empty or only white space, no matter how many there are. Also, this method allows us to further tighten our implementation by combining the len function with the list comprehension:
num_sentences = len([sentence for sentence in text_in_strings if sentence.strip()])
In any case, for the purposes of this project, the original implementation is perfectly fine! In fact, it is absolutely perfect for our use case. This is merely a suggestion and an introduction to list comprehension.
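For anyone curious how the two counting approaches differ in practice, here is a small comparison sketch. The helper names count_sentences_subtract_one and count_sentences_filtered and the sample texts are made up for illustration; only the sentence count is compared, not the full average.

```python
def split_into_pieces(text):
    # Standardize the sentence endings, then split on periods.
    return text.replace("!", ".").replace("?", ".").split(".")


def count_sentences_subtract_one(text):
    # The approach from the submission: always drop one piece.
    return len(split_into_pieces(text)) - 1


def count_sentences_filtered(text):
    # The approach suggested here: keep only pieces with real characters.
    return len([piece for piece in split_into_pieces(text) if piece.strip()])


# Hypothetical sample texts.
samples = [
    "One here. Two here.",     # ends with a period
    "One here. Two here",      # no final sentence ending
    "Wait... What?! Really.",  # repeated sentence endings
]

for sample in samples:
    print(
        repr(sample),
        count_sentences_subtract_one(sample),
        count_sentences_filtered(sample),
    )
```

The two approaches agree on the first text and disagree on the other two, where only the filtered count matches the number of sentences a reader would see.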
P.S. If you want to learn more about list comprehension, I recommend reading this article.
Summary
Rubric Score
Criteria 1: Valid Python Code
- Score Level: 4 (Exceeds Expectations)
- Comment(s): The code in the Jupyter notebook runs without any errors.
Criteria 2: Implementation of Project Requirements
- Score Level: 4 (Exceeds Expectations)
- Comment(s): The code produces the suite of functions and classes required of it and calls them in an appropriate order. Some of my favorite implementations are the build_frequency_table function (great use of the count function!), the find_text_similarity function (great use of the TextSample object!), and the percent_difference function (really nice formatting!). The only thing that was missing was an else statement in the frequency_comparison function (covered here); otherwise, everything was perfect. Nice job!
Criteria 3: Software Architecture
- Score Level: 4 (Exceeds Expectations)
- Comment(s): The code is separated into distinct classes and functions, each of which is invoked for its own purpose. Again, great use of the TextSample object to calculate the text similarity between samples! Also, nice job making sure each function is accomplishing its own distinct goal with pristine accuracy!
Criteria 4: Uses Python Language Features
- Score Level: 4 (Exceeds Expectations)
- Comment(s): The code uses language features appropriately. If a task can be solved with a Python language feature, it is. This is especially apparent in the build_frequency_table function and the ngram_creator function. Both made excellent use of Python's built-in functions!
Criteria 5: Produces Accurate Output
- Score Level: 4 (Exceeds Expectations)
- Comment(s): The code produces its output properly, and the output is accurate.
Overall Score: 20/20
Really nice job with this project! The only thing I might focus on in the future is capturing all possible cases in our calculations. For example, making sure to capture all table1 values in the frequency_comparison function (covered here), and/or considering if a sentence is input without a trailing space or with multiple trailing spaces in the get_average_sentence_length function (covered here).
Other than that, this project was very well done. Excellent job!