cgseitz / pwp-capstones

Jupyter Notebook 100.00%

pwp-capstones's Issues

Don't forget to add the value to appearances if key is not in table 2

def frequency_comparison(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]    
    for i in table2:
        if i not in table1:
            appearances += table2[i]  
    comparison = mutual_appearances / appearances
    return comparison

Really nice job with this function! We're just missing one small thing: currently, if a key in table1 does not exist in table2, the function never adds that key's value to appearances. To correct for this, we add an else branch that adds the table1 value to appearances. Simply put, we just have to add an else that lines up with this if statement:

if i in table2:

Here is an example:

def frequency_comparison(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]    
        else: # Add value to appearances if key doesn't show up in table2
            appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]  
    comparison = mutual_appearances / appearances
    return comparison

This is a very small change, but it can make a big difference, especially if table1 has many keys that do not show up in table2. Nevertheless, very good job with this function!
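
To get a feel for how much the missing else can matter, here is a small sketch that runs both versions on two made-up frequency tables. The names frequency_comparison_original and frequency_comparison_fixed, and the sample tables, are hypothetical and only used for illustration.

```python
def frequency_comparison_original(table1, table2):
    # Version from the submission: keys found only in table1 are skipped.
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    return mutual_appearances / appearances


def frequency_comparison_fixed(table1, table2):
    # Same logic, plus the else branch for keys found only in table1.
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        if i in table2:
            if table1[i] > table2[i]:
                appearances += table1[i]
                mutual_appearances += table2[i]
            else:
                appearances += table2[i]
                mutual_appearances += table1[i]
        else:
            appearances += table1[i]
    for i in table2:
        if i not in table1:
            appearances += table2[i]
    return mutual_appearances / appearances


# Made-up sample data: "cat" and "sat" appear only in table1.
table1 = {"the": 3, "cat": 2, "sat": 1}
table2 = {"the": 2, "dog": 4}

original_score = frequency_comparison_original(table1, table2)  # 2 / 7, about 0.286
fixed_score = frequency_comparison_fixed(table1, table2)        # 2 / 10 = 0.2
print(original_score, fixed_score)
```

Without the else, the three appearances of "cat" and "sat" never reach the denominator, so the two tables look more similar than they are.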

P.S. One other small optimization we could have made to this function is saving the values looked up from both tables in variables so that we can reuse them throughout the loop. Here is an example:

def frequency_comparison(table1, table2):
    appearances = 0
    mutual_appearances = 0
    for i in table1:
        value1 = table1[i]  # look up once; used in both branches below
        if i in table2:
            value2 = table2[i]
            if value1 > value2:
                appearances += value1
                mutual_appearances += value2
            else:
                appearances += value2
                mutual_appearances += value1
        else: # Add value to appearances if key doesn't show up in table2
            appearances += value1
    for i in table2:
        if i not in table1:
            appearances += table2[i]  
    comparison = mutual_appearances / appearances
    return comparison

All this does is allow us to reuse a variable instead of constantly making a call to our dictionaries. It also provides a useful way to read this function a bit more easily -- simply, it makes it easier to distinguish which value we are adding and under what conditions. Of course, this makes no difference to functionality, so it's really a matter of preference. Just wanted to point it out!
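
As a side note, when all counts are non-negative the same ratio can also be written with min and max over the union of the keys. This is only an alternative sketch, not the approach used in the project; the name frequency_comparison_compact and the sample tables are made up here.

```python
def frequency_comparison_compact(table1, table2):
    # Every key that appears in either table.
    all_keys = set(table1) | set(table2)
    # A key missing from a table counts as 0 appearances in that table,
    # so the smaller count goes to mutual_appearances and the larger to appearances.
    mutual_appearances = sum(min(table1.get(k, 0), table2.get(k, 0)) for k in all_keys)
    appearances = sum(max(table1.get(k, 0), table2.get(k, 0)) for k in all_keys)
    return mutual_appearances / appearances


compact_score = frequency_comparison_compact(
    {"the": 3, "cat": 2, "sat": 1},
    {"the": 2, "dog": 4},
)
print(compact_score)  # 2 / 10 = 0.2
```

Because keys missing from either table are handled by the default of 0, there is no separate branch to forget. Like the versions above, it would raise ZeroDivisionError if both tables were empty.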

Bit of a different way to filter out non-sentences from get_average_sentence_length

def get_average_sentence_length(text):
    #standardize the sentence endings, to facilitate turning them into strings
    remove_exclamations = text.replace("!", ".")
    remove_questions = remove_exclamations.replace("?", ".")
    
    #splits each sentence into a string
    text_in_strings = remove_questions.split(".")
    
    #count the number of sentences in each statement, subtracting one as the final period counts as an extra sentence
    num_sentences = len(text_in_strings)
    num_sentences = (num_sentences-1)
    
    #split the sentences into words
    words_in_strings = text.split(" ")
    num_words = len(words_in_strings)
    
    #find the average number of words per sentence
    average_words_per_sentence = num_words / num_sentences
    print(average_words_per_sentence)
    return average_words_per_sentence

Great job with this function! It returns the exact output we want, and it corrects for the extra empty (or whitespace-only) string that split(".") leaves after the final period. However, the correction we have (always subtracting one from the sentence count) does not handle two cases: 1) if the split leaves more than one such leftover string, for example after "..." or "?!", and 2) if the text does not end with a sentence terminator, so there is no leftover string to subtract. In both of those cases, our function will no longer calculate the correct count -- it will either overcount or undercount the sentences.
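
A quick way to see where always subtracting one breaks down is to run just the counting step on a few made-up strings. The helper name count_by_subtracting_one and the sample texts are hypothetical; the logic mirrors the counting lines of the function above.

```python
def count_by_subtracting_one(text):
    # Counting step from the submission: normalize endings, split, subtract one.
    normalized = text.replace("!", ".").replace("?", ".")
    return len(normalized.split(".")) - 1


samples = [
    "I ran. I slept.",     # ends with exactly one terminator
    "I ran. I slept",      # no terminator at the very end
    "I ran... I slept?!",  # repeated terminators
]
counts = [count_by_subtracting_one(sample) for sample in samples]
print(counts)  # [2, 1, 5], although each sample has two sentences
```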

So, I just wanted to show a slightly different way to achieve the same goal:

stripped = [sentence for sentence in text_in_strings if sentence.strip()]
num_sentences = len(stripped)

Okay, let's break this down. So, the line [sentence for sentence in text_in_strings if sentence.strip()] can be rewritten like so:

stripped = list()
for sentence in text_in_strings:
    if sentence.strip():
        stripped.append(sentence)

The way I have written it above is called a list comprehension -- a way to build a list from a for loop in a single expression (similar syntax exists for sets, dictionaries, and generators). There are a few benefits to using a list comprehension; but, for all intents and purposes, it is simply a single-line for loop that collects its results into a list.

Basically, all the list comprehension is doing is getting rid of any string that contains nothing but white space. It does this by combining an if statement with strip(). Note that strip() removes any white space at the beginning and end of a string; thus, if the string is empty or only has white space, strip() returns an empty string, which Python treats as False. As such, we can use an if statement to test whether sentence really has characters.
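
Here is a tiny illustration of that filter on a made-up list of pieces (the variable names pieces and kept are just for this example):

```python
pieces = ["I ran", " I slept", "", "   ", "\n"]

# strip() returns a string; an empty result is falsy, so the piece is dropped.
kept = [piece for piece in pieces if piece.strip()]

print(repr("   ".strip()))  # ''
print(kept)                 # ['I ran', ' I slept']
```

Note that the filter only uses strip() as a test; the pieces that are kept still have their original leading white space.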

An advantage of implementing it this way is that it discards every empty or whitespace-only piece, no matter how many there are or where they occur. This method also lets us tighten our implementation by combining the len function with the list comprehension:

num_sentences = len([sentence for sentence in text_in_strings if sentence.strip()])
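
Putting it together, a sketch of the whole function with this change might look like the following. It keeps the original word-counting step as is; leaving out the print call is my own simplification, and the sample sentences are made up.

```python
def get_average_sentence_length(text):
    # Standardize the sentence endings so a single split handles all of them.
    normalized = text.replace("!", ".").replace("?", ".")
    text_in_strings = normalized.split(".")

    # Count only the pieces that still contain something after stripping white space.
    num_sentences = len([sentence for sentence in text_in_strings if sentence.strip()])

    # Count words as in the original: split the raw text on single spaces.
    num_words = len(text.split(" "))

    return num_words / num_sentences


with_period = get_average_sentence_length("I ran home. I slept well.")
without_period = get_average_sentence_length("I ran home. I slept well")
print(with_period, without_period)  # 3.0 3.0
```

Both calls give the same average because the sentence count no longer depends on how the text ends. Text with no sentences at all would still raise ZeroDivisionError; guarding against that is left out of this sketch.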

In any case, for the purposes of this project, the original implementation is perfectly fine! In fact, it is absolutely perfect for our use case. This is merely a suggestion and an introduction to list comprehension.

P.S. If you want to learn more about list comprehension, I recommend reading this article.

Summary

Rubric Score

Criteria 1: Valid Python Code

Score Level: 4 (Exceeds Expectations)
Comment(s): The code in the Jupyter notebook runs without any errors.

Criteria 2: Implementation of Project Requirements

Score Level: 4 (Exceeds Expectations)
Comment(s): The code produces the suite of functions and classes required of it and calls them in an appropriate order. Some of my favorite implementations are the build_frequency_table function (great use of the count function!), the find_text_similarity function (great use of the TextSample object!), and the percent_difference function (really nice formatting!). The only thing that was missing was an else statement in the frequency_comparison function (covered here); otherwise, everything was perfect. Nice job!

Criteria 3: Software Architecture

Score Level: 4 (Exceeds Expectations)
Comment(s): The code is separated into distinct classes and functions, each of which is invoked for its own purpose. Again, great use of the TextSample object to calculate the text similarity between samples! Also, nice job making sure each function is accomplishing its own distinct goal with pristine accuracy!

Criteria 4: Uses Python Language Features

Score Level: 4 (Exceeds Expectations)
Comment(s): The code uses language features appropriately. If a task can be solved with a Python language feature, it is. This is especially apparent in the build_frequency_table function and the ngram_creator functions. Both made excellent use of Python native functions!

Criteria 5: Produces Accurate Output

Score Level: 4 (Exceeds Expectations)
Comment(s): The code's output is properly formatted and accurate.

Overall Score: 20/20

Really nice job with this project! The only thing I might focus on in the future is capturing all possible cases in our calculations. For example, making sure to capture all table1 values in the frequency_comparison function (covered here), and/or considering if a sentence is input without a trailing space or with multiple trailing spaces in the get_average_sentence_length function (covered here).

Other than that, this project was very well done. Excellent job!
