udacity / machine-learning


Content for Udacity's Machine Learning curriculum

Python 2.88% Jupyter Notebook 97.12%

machine-learning's Introduction

Deprecated Repository

This repository is deprecated. Currently enrolled learners, if any, can:

Utilize the https://knowledge.udacity.com/ forum to seek help on content-specific issues.
Submit a support ticket if (learners are) blocked due to other reasons.

machine-learning

Content for Udacity's Machine Learning curriculum, which includes projects and their descriptions.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Please refer to Udacity Terms of Service for further information.


machine-learning's Issues

Various typos on Smartcab Project

File smartcab.ipynb

Question 3

Given that the agent is driving randomly, does the rate of reliabilty make sense?
Should be
Given that the agent is driving randomly, does the rate of reliability make sense?

Question 5

Given what you know about the evironment and how it is simulated,
Should be
Given what you know about the environment and how it is simulated,

Improve Q-Learning Driving Agent

(the default threshold is 0.01)
Should be
(the default threshold is 0.05) - as written in line 111 of smartcab/simulator.py

When improving on your Q-Learning implementation, consider the impliciations it creates
Should be
When improving on your Q-Learning implementation, consider the implications it creates

Optional: Future Rewards - Discount Factor gamma

Including future rewards in the algorithm is used to aid in propogating positive rewards
Should be
Including future rewards in the algorithm is used to aid in propagating positive rewards

File smartcab/agent.py

def learn()

line 112
receives an award
should be
receives a reward

Numpy and Pandas Tutorial: Some coding quiz submit answer result in import error

Submitting Answer results in import error, because it imports the wrong function.

For example, in Quiz 11, Average Bronze Medals:

Traceback (most recent call last):
  File "vm_main.py", line 33, in <module>
    import main
  File "/tmp/vmuser_bjdwrarirz/main.py", line 2, in <module>
    import aiMain
  File "/tmp/vmuser_bjdwrarirz/aiMain.py", line 2, in <module>
    from student import avg_medal_count as student_code
ImportError: cannot import name avg_medal_count

However, the function that is provided is named avg_bronze_medal_count

Similar problem in Quiz 14, Olympics Medal Points

Typo for Epsilon in Simulator

In the simulator, line 298 reads:
print "espilon = {:.4f}; alpha = {:.4f}".format(a.epsilon, a.alpha)
but it should be:
print "epsilon = {:.4f}; alpha = {:.4f}".format(a.epsilon, a.alpha)

Epsilon was misspelled.

Error in tester code for robot motion planning

I think there's an error in the tester for the provided capstone project (robot motion planning). Line 105 (checking if the robot has reached its goal) reads:

if robot_pos['location'][0] in goal_bounds and robot_pos['location'][1] in goal_bounds:

Which means position (6, 6) will evaluate to True even if the goal is (6, 7), because both elements in the location tuple are found in the goal_bounds tuple. I believe the line should be simply:

if robot_pos['location'] == goal_bounds:

Is it alright if I test this and make a pull request to fix the issue?
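
The difference between the two checks can be sketched in a few lines. This is a minimal illustration that assumes, as the report does, that goal_bounds holds the coordinates of a single goal cell; the helper names are hypothetical and not taken from the tester. If goal_bounds instead lists the row and column indices covered by a larger goal area, the membership check may well be intended.

```python
def reached_goal_membership(location, goal_bounds):
    # Mirrors the reported line: each coordinate is tested for membership separately.
    return location[0] in goal_bounds and location[1] in goal_bounds


def reached_goal_equality(location, goal_bounds):
    # Mirrors the proposed fix: the whole location must equal the goal cell.
    return tuple(location) == tuple(goal_bounds)


goal = (6, 7)        # assumed single goal cell
position = (6, 6)    # not the goal cell

membership_result = reached_goal_membership(position, goal)
equality_result = reached_goal_equality(position, goal)
print(membership_result, equality_result)
```

Under that assumption the membership check accepts (6, 6) because 6 appears in the goal tuple, while the equality check rejects it.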

The metrics in the Naive Bayes tutorial are not reported properly, because precision_score, recall_score and f1_score assume [1, 0] as the positive and negative labels. The calls should pass the positive label explicitly, as pos_label="spam".
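
To illustrate why the choice of positive label matters, here is a dependency-free sketch of precision with an explicit pos_label argument. The function name precision and the sample labels are hypothetical; scikit-learn's precision_score, recall_score and f1_score accept a pos_label parameter that plays the same role, although their handling of a label that never occurs may differ from this sketch (for example by raising an error).

```python
def precision(y_true, y_pred, pos_label=1):
    """Fraction of predictions equal to pos_label that are actually pos_label."""
    truths_for_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == pos_label]
    if not truths_for_predicted_pos:
        # Nothing was predicted as the positive class.
        return 0.0
    true_pos = sum(1 for t in truths_for_predicted_pos if t == pos_label)
    return true_pos / float(len(truths_for_predicted_pos))


# Made-up labels for illustration only.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "spam", "ham"]

default_score = precision(y_true, y_pred)                 # positive label 1 never occurs
spam_score = precision(y_true, y_pred, pos_label="spam")  # 2 of 3 spam predictions are right
print(default_score, spam_score)
```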

Boston Housing Error

Getting the following error in the complexity curves section when running the code.

/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-56bd0b56ae45> in <module>()
----> 1 vs.ModelComplexity(X_train, y_train)

/Users/emi862/Workspaces/Projects/Machine-Learning/udacity-machine-learning-projects/boston_housing/visuals.pyc in ModelComplexity(X, y)
     80     # Calculate the training and testing scores
     81     train_scores, test_scores = curves.validation_curve(DecisionTreeRegressor(), X, y, \
---> 82         param_name = "max_depth", param_range = max_depth, cv = cv, scoring = 'r2')
     83 
     84     # Find the mean and standard deviation for smoothing

/usr/local/lib/python2.7/site-packages/sklearn/learning_curve.pyc in validation_curve(estimator, X, y, param_name, param_range, cv, scoring, n_jobs, pre_dispatch, verbose)
    352         estimator, X, y, scorer, train, test, verbose,
    353         parameters={param_name: v}, fit_params=None, return_train_score=True)
--> 354         for train, test in cv for v in param_range)
    355 
    356     out = np.asarray(out)[:, :2]

/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610 

/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573 

/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327 
    328     def get(self):

/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
   1663             estimator.fit(X_train, **fit_params)
   1664         else:
-> 1665             estimator.fit(X_train, y_train, **fit_params)
   1666 
   1667     except Exception as e:

/usr/local/lib/python2.7/site-packages/sklearn/tree/tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
   1027             sample_weight=sample_weight,
   1028             check_input=check_input,
-> 1029             X_idx_sorted=X_idx_sorted)
   1030         return self
   1031 

/usr/local/lib/python2.7/site-packages/sklearn/tree/tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    238         if len(y) != n_samples:
    239             raise ValueError("Number of labels=%d does not match "
--> 240                              "number of samples=%d" % (len(y), n_samples))
    241         if not 0 <= self.min_weight_fraction_leaf <= 0.5:
    242             raise ValueError("min_weight_fraction_leaf must in [0, 0.5]")

ValueError: Number of labels=312 does not match number of samples=1

Project 3: function rs.cluster_results(...) legend mislabelled

The labels on the clusters do not correspond to the labels in the legend. For example, on the scatterplot cluster 1 is colored red, while the legend labels it yellow.

Unable to open project using Jupyter - see error

Here is the error that pops up when trying to open the file "titanic_survival_exploration.ipynb" from the browser session in Jupyter:

Unreadable Notebook: C:\Users\magnus\Google Drive\Machine Learning\Udacity NanoDegree\Projects\02 - Titanic\titanic_survival_exploration.ipynb NotJSONError("Notebook does not appear to be JSON: u'\n\n\n\n\n\n\n<html lan...",)

error: boston_housing

I'm getting the following error. Could you please let me know what is missing here?

print "'{}' is not a feature of the Titanic data. Did you spell something wrong?".format(key)

does anyone know why this is happening? I don't know what the warning means.

sklearn.cross_validation deprecated in favor of the model_selection module

Full trace:

anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

This module is used in section 3.1

Spam example - maketrans

Hi,
In the spam filter example, the use of maketrans raises an error.
First: type object 'str' has no attribute 'maketrans'
After I changed str to string, then: Maketrans() takes exactly 2 arguments

Following is the code in the sample.

sans_punctuation_documents = []
import string

for i in lower_case_documents:
    sans_punctuation_documents.append(i.translate(str.maketrans('', '', string.punctuation)))
print(sans_punctuation_documents)
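
str.maketrans only exists in Python 3, which would explain the first error on a Python 2 kernel. One version-independent alternative, offered as a sketch rather than as the tutorial's intended solution, filters the characters directly. strip_punctuation is a hypothetical helper name and the sample documents are made up.

```python
import string


def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed."""
    punctuation = set(string.punctuation)
    return "".join(ch for ch in text if ch not in punctuation)


lower_case_documents = ["hello, how are you!", "win money, win from home."]
sans_punctuation_documents = [strip_punctuation(doc) for doc in lower_case_documents]
print(sans_punctuation_documents)
```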

Smartcab environment reports car on the left when it's actually behind.

The pink car is about to turn left and it is behind the red car. The inputs report it as being on the left.

LearningAgent.update(): deadline = -37, inputs = {'light': 'red', 'oncoming': None, 'right': None, 'left': 'left'}, action = right, reward = 2.0

Typos in finding_donors project Evaluating Model Performance Section

In section Evaluating Model Performance, subsection Metrics and the Naive Predictor there are some typos.

The pseudo-company CharityML is referred to as UdacityML.
"would is appropriate." should read "would be appropriate."
"Supverised Learning Models" should read "Supervised Learning Models".
"Note: Dependent on which algorithm you chose," should read "Note: Depending on which algorithm you chose,".

testing_data is not defined

"testing_data" variable is not defined.

At machine-learning/projects/smartcab/visuals.py file,
at calculate_safety function, line 37,

if minor >= len(testing_data)/2:

"testing_data" variable is not defined.

Image-classification One Hot Encoding test passing with wrong solution

Hello,
I found that the provided unit test for one_hot_encode passes with the following code:

def one_hot_encode(x):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample Labels
    : return: Numpy array of one-hot encoded labels
    """
    # TODO: Implement Function
    y = np.zeros((len(x), 10))
    return y

It seems that the third test: assert np.array_equal(enc_labels, new_enc_labels) is always passing.
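
A stricter check could reject the all-zeros output. The following is only a sketch of such a check, written with plain lists so that it has no dependencies; is_valid_one_hot is a hypothetical name and is not part of the project's unit tests. It also works on any iterable of rows, such as a 2-D NumPy array.

```python
def is_valid_one_hot(encoded, n_classes=10):
    """Return True if every row has n_classes entries: exactly one 1 and the rest 0."""
    expected = [0.0] * (n_classes - 1) + [1.0]
    for row in encoded:
        if sorted(float(v) for v in row) != expected:
            return False
    return True


# The output of the implementation quoted above: every row is all zeros.
all_zeros = [[0] * 10 for _ in range(3)]
# A correct encoding of the labels 0, 3 and 9.
correct = [[1 if i == label else 0 for i in range(10)] for label in (0, 3, 9)]

print(is_valid_one_hot(all_zeros), is_valid_one_hot(correct))
```

This only verifies the shape of each row; a complete test would also compare the position of the 1 against the label.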

Feature Scaling Formula Quiz 3: values mixed up

xmax and xmin are mixed up. xmax is 175, not 115, xmin is 115, not 175.
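
For reference, min-max feature scaling computes (x - xmin) / (xmax - xmin). The sketch below uses the corrected values from this report (xmin = 115, xmax = 175); the input value 140 and the function name rescale are illustrative assumptions.

```python
def rescale(x, x_min, x_max):
    """Min-max scaling: maps x_min to 0.0 and x_max to 1.0."""
    return (x - x_min) / float(x_max - x_min)


x_min, x_max = 115, 175
scaled = rescale(140, x_min, x_max)  # (140 - 115) / (175 - 115) = 25 / 60
print(scaled)
```

With the two values swapped, the denominator becomes negative and the endpoints map to the wrong ends of the range, which is why the mix-up matters.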
This also applies to the solution video

Spelling mistake

Ctrl + f "classificatio)" in https://github.com/udacity/machine-learning/blob/master/projects/practice_projects/naive_bayes_tutorial/Naive_Bayes_tutorial.ipynb

change it to classification

boston housing README file - num data pts

Hi,
under the 'Data' section of the readme, it says 490 data points but actually 506 - 16 - 1 = 489 exist.
Lmk, I can submit a PR.
thank you.

Shuffle split needs to be updated for Sklearn 0.18 (boston housing)

Project: boston_housing

In visuals.py, both the ModelLearning and ModelComplexity code need to be updated for ShuffleSplit.

n_iter is no longer supported.

cv = ShuffleSplit(n_splits=10, test_size=0.2,random_state = 0)
cv.get_n_splits(X.shape[0])

Boston housing project's "fitting a model" code is broken

Commit 890eb59 changed the n_iter argument to n_splits in the ShuffleSplit class constructor. The scikit-learn version currently recommended throughout the course is 0.17.x, which does not have n_splits yet. Since the course materials also recommend always keeping an up-to-date revision of this repository, it contains broken code out of the box. Moreover, the ShuffleSplit constructor in scikit-learn 0.18 has a different signature altogether, so this change will not work there either.

Please consider reverting the mentioned commit. Thanks!

Error initializing GUI objects;

For the smartcab project, I encountered an error when initializing the GUI.

Simulator.init(): Error initializing GUI objects; display disabled.
error: File is not a Windows BMP file
Simulator.run(): Trial 0

Pygame and libpng are successfully installed.

tests for smartcab/agent.py

When working on smartcab, it's difficult to figure out when my smartcab is having trouble because of the learning rate I am working with, or whether I messed up the implementation of some other function.

It would be great if we had some quick tests for those functions.
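
As a rough idea of what such a quick test could look like, the sketch below checks a standalone Q-value update against hand-computed values. It assumes the update rule is a simple blend of the old value and the reward with learning rate alpha and no discount factor; q_update is a hypothetical helper, not a function from agent.py.

```python
def q_update(old_value, reward, alpha):
    """Blend the old Q-value with the new reward using learning rate alpha."""
    return (1 - alpha) * old_value + alpha * reward


def run_quick_checks():
    # alpha = 0 ignores the reward entirely.
    assert q_update(1.5, 2.0, 0.0) == 1.5
    # alpha = 1 replaces the old value with the reward.
    assert q_update(1.5, 2.0, 1.0) == 2.0
    # alpha = 0.5 lands halfway between the two.
    assert q_update(0.0, 2.0, 0.5) == 1.0
    return True


checks_passed = run_quick_checks()
print(checks_passed)
```

Checks like these separate implementation mistakes from a poorly chosen learning rate, because the expected values do not depend on the simulation.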

Finding-Donors: Precision and Recall

Hi!

It seems the problem statement needs to be revised where Precision and Recall are defined. The descriptions are the same.

Thanks

logs folder?

I found there is no logs folder which is required for course submission. I tried to create one in home folder but simulation did not result in any log files. Any idea to fix?

[smartcab] Incomplete reward system

While working on this project I detected a small flaw in the reward system.

File environment.py lines 328-341

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and inputs['oncoming'] != 'left': # No oncoming traffic
                violation = 1 # Minor violation

       # (...)

        # Did the agent attempt a valid move?
        if violation == 0:
            if action == agent.get_next_waypoint(): # Was it the correct action?
                reward += 2 - penalty # (2, 1)
            elif action == None and light != 'green': # Was the agent stuck at a red light?
                reward += 2 - penalty # (2, 1)
            else: # Valid but incorrect
                reward += 1 - penalty # (1, 0)

This doesn't cover the case where the agent wants to turn left and there is oncoming traffic forward or right. The optimal policy would be to take no action (None), but None, right or forward would result in the same reward being applied: Valid but incorrect case.

The solution would be to expand:

        # Agent wants to perform no action:
        elif action == None:
            if light == 'green' and (inputs['oncoming'] != 'left' or waypoint != 'left'): # No oncoming traffic
                violation = 1 # Minor violation

to:

elif action == None and light != 'green':
    reward += 2 - penalty # (2, 1)
elif action == None and light == 'green' and inputs['oncoming'] in ['forward', 'right']:
    reward += 2 - penalty # (2, 1)

and expand:

elif action == None and light != 'green':
    reward += 2 - penalty # (2, 1)

to:

elif action == None and light != 'green':
    reward += 2 - penalty # (2, 1)
elif action == None and light == 'green' and inputs['oncoming'] in ['forward', 'right']:
    reward += 2 - penalty # (2, 1)
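
The rule proposed in this report can be stated as a small predicate. This is a sketch of the reporter's suggestion, not the code in environment.py; idle_is_correct and its parameters are hypothetical names, and the waypoint condition follows the prose description above (the agent wants to turn left).

```python
def idle_is_correct(light, oncoming, waypoint):
    """Return True when taking no action should count as the correct move."""
    if light != 'green':
        # Waiting at a red light is already treated as correct.
        return True
    # Proposed addition: yield when turning left against oncoming traffic.
    return waypoint == 'left' and oncoming in ('forward', 'right')


red_light_case = idle_is_correct('red', None, 'forward')
yield_case = idle_is_correct('green', 'forward', 'left')
free_road_case = idle_is_correct('green', None, 'left')
print(red_light_case, yield_case, free_road_case)
```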

Error Importing "import visuals as vs"

I am having an issue running the first cell. I think there is an error importing the visuals.py file.

Smart cab simulator records success for failed trip

It appears the smart cab simulator.py recorded success in the sim_improved-learning.csv file for the last testing trial (10 of 10) when the python output showed the agent ran out of time and did not reach the destination.
Python output shows
Trial Aborted!
Agent did not reach the destination.

Simulation ended. . .

However the csv file shows a 1 in the success column.
This may be tricky to recreate. I can provide the files or code to recreate the issue.

Error in argument type for function visuals.survival_stats

In notebook titanic_survival_exploration.ipynb, there is a TypeError when calling the function visuals.survival_stats:
TypeError: Cannot concatenate list of ['DataFrame', 'Series']

This could be fixed by transforming outcomes from pandas.Series to pandas.DataFrame by means of the function pandas.Series.to_frame():
outcomes.to_frame()

Student_Admissions.ipynb notebook for Student Admissions mini-project is already completed

Hi all.

The other day I went to complete the Student Admissions mini-project presented in the Deep Learning module (Deep Neural Networks section, part 32) and found that the notebook checked into both this repo at projects/practice_projects/imdb/Student_Admissions.ipynb and the https://github.com/udacity/aind2-dl repo has already been completed.

smartcab project missing folder

In the smartcab project, students are asked to set 'log_metrics' to True to record logs to /logs/. However, this folder is missing and students will run into an error like:

No such file or directory: 'logs/sim_no-learning.csv'

One quick fix is to add the logs folder manually.
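
The folder could also be created from code before the simulation writes to it. This is a minimal sketch that works on both Python 2 and 3; ensure_log_dir is a hypothetical helper and is not part of the project's simulator.

```python
import os


def ensure_log_dir(path="logs"):
    """Create the log directory if it does not exist yet and return its path."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

Calling ensure_log_dir() from the project directory before running the simulation would create logs/ next to agent.py, assuming the simulator resolves 'logs/sim_no-learning.csv' relative to the working directory.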

PS. The README.md states that there is a logs folder, but there is none.

Typos in Creating Customer Segments (Unsupervised Learning Project)

Question 2

Which feature did you attempt to predict? What was the reported prediction score? Is this feature is necessary for identifying customers' spending habits?

Question 5

Hint: (...) The rate of increase or decrease is based on the indivdual feature weights.

Implementation: Dimensionality Reduction

(...) Additionally, if a signifiant amount of variance is explained by only two or three dimensions, the reduced data can be visualized afterwards.

"Answer" is missing.

At machine-learning/projects/smartcab/smartcab.ipynb,
Question 3,
There is no "Answer" section.

There is a typo

There is a typo after in[13]:
Examining the survival statistics, the majority of males younger then 10 survived the ship sinking, whereas most males age 10 or older did not survive the ship sinking.

It should be 'than' instead of then.

Converting the code from Python2 to Python3

As the scientific libraries of Python are slowly stopping the support for Python2 (including Jupyter), maybe it'd be a good idea if we have our code in Python3 rather than Python2.

Naive_Bayes_tutorial.ipynb Step 2: Removing all punctuations fails

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-56c5cbdbde76> in <module>()
      6 
      7 for i in lower_case_documents:
----> 8     sans_punctuation_documents.append(i.translate(str.maketrans('', '', string.punctuation)))
      9 print(sans_punctuation_documents)

AttributeError: type object 'str' has no attribute 'maketrans'

Line 8 should be:

sans_punctuation_documents.append(string.translate(i, table=None, deletions=string.punctuation))

or similar.

Comment bug in agent.py

In the LearningAgent class, the following comment block exists in two places (in build_state, lines 59-61, and in createQ, lines 88-90):

    # When learning, check if the state is in the Q-table
    #   If it is not, create a dictionary in the Q-table for the current 'state'
    #   For each action, set the Q-value for the state-action pair to 0

I believe this comment is correct in createQ method, but not the build_state method. It definitely shouldn't be in both places. We've had several students understandably confused by this, so I promised to see if we could get it corrected in the github repository.


capstone project link is incorrect

The capstone project link is actually the capstone proposal link. I will send a pull request shortly.

Deprecated function is used, and causing warning messages.

At machine-learning/projects/smartcab/visuals.py file,
pd.rolling_mean() is used, and it is causing unnecessary warnings.
The warnings should be suppressed or the function should be replaced with another function.

visuals.py:74: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
data['average_reward'] = pd.rolling_mean(data['net_reward'] / (data['initial_deadline'] - data['final_deadline']), 10)
visuals.py:75: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
data['reliability_rate'] = pd.rolling_mean(data['success']*100, 10) # compute avg. net reward with window=10
visuals.py:78: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
(data['initial_deadline'] - data['final_deadline']), 10)
visuals.py:80: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
(data['initial_deadline'] - data['final_deadline']), 10)
visuals.py:82: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
(data['initial_deadline'] - data['final_deadline']), 10)
visuals.py:84: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
(data['initial_deadline'] - data['final_deadline']), 10)
visuals.py:86: FutureWarning: pd.rolling_mean is deprecated for Series and will be removed in a future version, replace with
Series.rolling(window=10,center=False).mean()
(data['initial_deadline'] - data['final_deadline']), 10)

Error in lecture video?

In Lesson 1: Deep Neural Networks, Section 28. Neural Network Architecture., in the second video, at the 1’08" mark, he combines the weights into a more compact form. The weights from x1 are 5 and -2 and from x2 are 7 and -3. Shouldn’t the weights from x1 be 5 and 7 and from x2, -2 and -3? It seems the “inner” weights should be swapped, because when I write the equations down they make sense, but not in the form as presented in the lecture video.

Link to thread in discussion forum.

Typo in titanic_survival_exploration.ipynb: titanic_visualizations.py should be visuals.py

As captioned, will create a PR to fix it. Thanks.

finding_donors features scaling

I think there is an error in the "Normalizing Numerical Features" section of the finding_donors notebook. There the MinMaxScaler is applied to data but I think it should be better applied to features_raw, otherwise the log-transform made before would be useless.

scaler = MinMaxScaler()  
numerical = ['age', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
features_raw[numerical] = scaler.fit_transform(data[numerical])

A Small Typo

Under the "Implement a Q-Learning Driving Agent", the word iterative is misspelled as interative.

...based on the reward received and the interative update rule implemented.

Load Titanic_Survival_Exploration.ipynb failed

When I open this file by command "jupyter notebook Titanic_Survival_Exploration.ipynb", the web page alerts that:

Error loading notebook
Unreadable Notebook: /Users/Documents/WORK/project0_titanic/Titanic_Survival_Exploration.ipynb NotJSONError('Notebook does not appear to be JSON: u'\n\n\n\n\n<html lang="e...',)

Could you please help me fix this problem?
Thanks!

Typo in Q2 of Creating Customer Segments

In Question 2, the second "is" should be removed...

Is this feature is necessary for identifying customers' spending habits?

[Bug] Oncoming traffic not perceived when turning left

As one can see in the screenshot above, a negative reward is awarded in a situation where the agent wants to turn left on a green light with oncoming traffic. It is as if the reward system doesn't know about the oncoming traffic.

In a situation like this (the agent intends to turn left, the light is green, and oncoming traffic is going forward or right), the action of the agent should be None, but the reward system wants us to turn left.

Image classification project environment instructions

Everywhere in the nanodegree materials (including the Deep Learning course) it's stated that the Python version being used is 2.7; however, the image classification project apparently relies on 3.x. There are no instructions about setting up the environment for it either. I've managed to do this on my own by trial and error, but this really should be specified somewhere in the course materials so that students don't waste time trying to set it up with Python 2.

UI suggestion: allow for comments on specific videos

When going through lessons, I notice that although the lesson as a whole is great, some videos could use improvement. It would be great to be able to add quick notes on specific videos, rather than adding a comment for the entire lesson.

finding_donors has duplicated formulas

See image:
