
project-walkthroughs's Introduction

Overview

This repository contains files, notebooks, and data used for live project walkthroughs on Dataquest. You can watch the project walkthroughs on YouTube.

These walkthroughs help you build complete end-to-end projects that can go into your portfolio.

Prerequisites

To complete these projects, you'll need to have a good understanding of:

  • Python syntax, including functions, if statements, and data structures
  • Data cleaning
  • Pandas syntax
  • Using Jupyter Notebook
  • The basics of machine learning

Please make sure you've completed these Dataquest courses (or know the material) before trying these projects:

project-walkthroughs's People

Contributors

vikparuchuri

project-walkthroughs's Issues

Urgent Help needed

In line 44, predictions = backtest(kse100, model, new_predictors) gives me an error.

At the line model.fit(train[predictors], train["Target"]) I get this error:

ValueError: Input X contains NaN.
SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

Then I tried the following, but it did not resolve the error:

# Create our imputer to replace missing values with the mean, e.g.
imp = SimpleImputer(missing_values=0, strategy='mean')
imp = imp.fit(train)

# Impute our data, then train
X_train_imp = imp.transform(train)

model.fit(X_train_imp[predictors], X_train_imp["Target"])

Please share the solution
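A minimal sketch of two ways around the NaN error, assuming the NaNs come from rolling or shifted predictor columns (which stay empty until enough earlier rows exist). The `train` frame and its column names below are hypothetical toy data. The imputer attempt above has three problems: `missing_values=0` tells SimpleImputer to treat zeros, not NaN, as missing, so it likely rejects the NaNs itself (which would explain why the error now names SimpleImputer); fitting on all of `train` also imputes `Target`; and `transform` returns a NumPy array, so `X_train_imp[predictors]` cannot select columns by name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for `train`: the rolling predictors
# are NaN for the first row because there is no earlier history yet.
train = pd.DataFrame({
    "Close_Ratio_2": [np.nan, 1.01, 0.99, 1.02, 0.98, 1.00],
    "Trend_2": [np.nan, 1.0, 0.0, 1.0, 1.0, 0.0],
    "Target": [1, 0, 1, 1, 0, 1],
})
predictors = ["Close_Ratio_2", "Trend_2"]

# Option 1: drop the rows whose predictor columns contain NaN, then fit.
clean = train.dropna(subset=predictors)
model = RandomForestClassifier(n_estimators=10, random_state=1)
model.fit(clean[predictors], clean["Target"])

# Option 2: impute NaN (not 0) with the column mean. The imputer only sees
# the predictor columns, so "Target" is never treated as a feature.
pipe = make_pipeline(
    SimpleImputer(missing_values=np.nan, strategy="mean"),
    RandomForestClassifier(n_estimators=10, random_state=1),
)
pipe.fit(train[predictors], train["Target"])
preds = pipe.predict(train[predictors])
```

Dropping rows is usually the simpler choice for a backtest, since imputed early rows carry no real history; the pipeline keeps every row if that matters more.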

Name Error

Hi,

I've followed the same code but I keep getting this error.

NameError Traceback (most recent call last)
Input In [1], in <cell line: 4>()
1 FRAME_RATE = 16000
2 CHANNELS = 1
----> 4 model = Model(model_name="vosk-model-en-us-0.22")
6 rec = KaldiRecognizer(model, FRAME_RATE)
7 rec.SetWords(True)

NameError: name 'Model' is not defined

I've tried defining the file path but the error is the same.

NameError Traceback (most recent call last)
Input In [7], in <cell line: 4>()
1 FRAME_RATE = 16000
2 CHANNELS = 1
----> 4 model = Model(r"C:/Users/mahmoudatsanni-oba/cache.vosk/vosk-model-small-en-us-0.15")
6 rec = KaldiRecognizer(model, FRAME_RATE)
7 rec.SetWords(True)

NameError: name 'Model' is not defined
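A NameError on `Model` means the name was never defined in the running kernel, which usually means `from vosk import Model, KaldiRecognizer` was not executed (for example, the import cell was skipped after a kernel restart) or vosk is not installed in that environment. Changing the model path cannot fix it. A minimal sketch, assuming the vosk Python package; `vosk_available` and `load_recognizer` are hypothetical helper names:

```python
import importlib.util

FRAME_RATE = 16000
CHANNELS = 1


def vosk_available():
    """Return True if the vosk package can be imported in this environment."""
    return importlib.util.find_spec("vosk") is not None


def load_recognizer(model_name="vosk-model-en-us-0.22", frame_rate=FRAME_RATE):
    # Model and KaldiRecognizer are defined by the vosk package; they only
    # exist in the notebook after this import has actually run.
    from vosk import KaldiRecognizer, Model

    model = Model(model_name=model_name)  # may download the model on first use
    rec = KaldiRecognizer(model, frame_rate)
    rec.SetWords(True)
    return rec
```

If `vosk_available()` returns False, install vosk with the same Python interpreter the notebook kernel uses.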

rolling_averages

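def rolling_averages(group, cols, new_cols):
    # Order each team's matches by date before computing rolling stats
    group = group.sort_values("date")
    # Average of the previous 3 matches (closed='left' excludes the current one)
    rolling_stats = group[cols].rolling(3, closed='left').mean()
    group[new_cols] = rolling_stats
    # Drop matches that do not have 3 earlier matches yet
    group = group.dropna(subset=new_cols)
    return group

This is not working with my version: the rolling_stats values come out all wrong.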
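One version-independent way to compute the same "previous three matches" average is to shift by one row and then take an ordinary 3-row rolling mean. Older pandas versions (before 1.2) only accepted `closed` for time-based windows, so results can differ between installs. The rolling window also works by row position, so the frame must be sorted by date within each team. A minimal sketch with hypothetical toy data (`matches`, `gf`):

```python
import pandas as pd


def rolling_averages(group, cols, new_cols):
    # Sort chronologically so "previous" means earlier dates.
    group = group.sort_values("date")
    # shift(1) drops the current match from each window, then a 3-row window
    # averages the three matches before it, like rolling(3, closed="left").
    rolling_stats = group[cols].shift(1).rolling(3).mean()
    group[new_cols] = rolling_stats
    # The first three matches have no full 3-match history, so drop them.
    group = group.dropna(subset=new_cols)
    return group


# Hypothetical matches for one team, deliberately out of date order.
matches = pd.DataFrame({
    "team": ["A"] * 5,
    "date": pd.to_datetime(
        ["2022-01-05", "2022-01-01", "2022-01-03", "2022-01-02", "2022-01-04"]
    ),
    "gf": [5, 1, 3, 2, 4],
})
result = rolling_averages(matches, ["gf"], ["gf_rolling"])
print(result["gf_rolling"].tolist())  # [2.0, 3.0]
```

Sorted by date the goals are 1, 2, 3, 4, 5, so the fourth match gets the mean of 1, 2, 3 and the fifth the mean of 2, 3, 4.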

IndexError: list index out of range

This issue is related to football_matches, specifically the scraping part.

data = requests.get(standings_url)
soup = BeautifulSoup(data.text)
--> standings_table = soup.select('table.stats_table')[0]

IndexError: list index out of range

In the for loop that scrapes multiple years, I got an error saying I had to install html5lib, and once I installed it I could no longer scrape anything. I started getting this error, and now it happens not just in the loop but also earlier in the notebook, where the same code runs on its own.
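`soup.select(...)[0]` raises IndexError whenever the selector matches nothing, so the real question is what HTML came back. Two things worth checking: when no parser is named, BeautifulSoup picks one based on what is installed (lxml, then html5lib, then html.parser), so installing html5lib can change how the page is parsed; and if the site is rate-limiting you (for example an HTTP 429 status), the response is an error page with no stats table at all. A minimal sketch, assuming the `table.stats_table` selector from the code above; `find_stats_table` is a hypothetical helper:

```python
from bs4 import BeautifulSoup


def find_stats_table(html):
    """Return the first <table class="stats_table">, or raise a readable error."""
    # Name the parser explicitly so installing html5lib or lxml
    # does not silently change how the page is parsed.
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.select("table.stats_table")
    if not tables:
        # An empty list is what makes `[0]` raise IndexError. Likely causes:
        # an error or rate-limit page instead of the stats page, or a changed layout.
        raise ValueError("no table.stats_table found; check the response status and HTML")
    return tables[0]
```

With requests, printing `data.status_code` before parsing shows whether the page loaded, and adding a `time.sleep(...)` between requests in the multi-year loop reduces the chance of being rate-limited.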

error

Hello, I need help. When I run the code it works, but as soon as I search for anything I get an OperationalError:
sqlite3.OperationalError: no such column: link
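"no such column: link" means the query refers to a `link` column that the table in the database file does not have. One common cause is a database file created by an earlier version of the code: `CREATE TABLE IF NOT EXISTS` leaves an existing table untouched, so a newer schema is never applied. A minimal sketch for inspecting and repairing the schema, using a hypothetical table named `results`:

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of `table` (PRAGMA table_info: index 1 is the name)."""
    # The table name is interpolated directly, so only pass trusted names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
# Hypothetical stale table from an earlier run, created without `link`.
conn.execute("CREATE TABLE results (query TEXT, rank INTEGER, title TEXT)")

# Add the missing column in place (alternatively, delete the old database
# file and let the app recreate the table with the current schema).
if "link" not in table_columns(conn, "results"):
    conn.execute("ALTER TABLE results ADD COLUMN link TEXT")

rows = conn.execute("SELECT link FROM results").fetchall()
```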

Predict Football Matches

Hello guys, can anyone explain how I can write the code to predict future football games?
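For a match that has not been played yet, only features known before kickoff can be used. With rolling-average predictors, that means averaging the team's most recent completed matches and passing that single row to a model trained on historical matches. A minimal sketch with hypothetical data for one team (`gf`, `ga`, `result` columns and a win/not-win target are assumptions, not the walkthrough's exact setup):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical completed matches for one team.
history = pd.DataFrame({
    "date": pd.date_range("2022-08-06", periods=6, freq="7D"),
    "gf": [2, 0, 3, 1, 2, 4],
    "ga": [1, 1, 0, 1, 2, 0],
    "result": ["W", "L", "W", "D", "D", "W"],
}).sort_values("date")
cols = ["gf", "ga"]

# Training rows: each match described by the average of the 3 matches before it.
rolling = history[cols].shift(1).rolling(3).mean().add_suffix("_rolling")
train = history.join(rolling).dropna(subset=rolling.columns).copy()
train["target"] = (train["result"] == "W").astype(int)
features = list(rolling.columns)

model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(train[features], train["target"])

# Upcoming match: its features are the averages of the 3 most recent
# completed matches, which are already known before kickoff.
upcoming = history[cols].tail(3).mean().add_suffix("_rolling").to_frame().T
prediction = model.predict(upcoming[features])  # 1 = predicted win
```

A real model needs far more matches than this toy example, and opponent features would be built the same way from the opponent's own recent matches.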

Obtaining datasets using gdown in colab gives access denied error

Marvelous tutorial dude. Little curiosity btw!

In your tutorial, you manually downloaded the datasets and placed them in your working directory. However, when I used this snippet from goodreads to download them with gdown in my Colab environment, I got the following error:

Access denied with the following error:


 	Too many users have viewed or downloaded this file recently. Please
	try accessing the file again later. If the file you are trying to
	access is particularly large or is shared with many people, it may
	take up to 24 hours to be able to view or download the file. If you
	still can't access a file after 24 hours, contact your domain
	administrator. 

You may still be able to access the file from the browser:

	 https://drive.google.com/uc?id=1zmylV7XW2dfQVCLeg1LbllfQtHD2KUon 

It might not be an issue related to this tutorial, obviously, but I wonder if you have a good suggestion or workaround for this issue!
I also opened this issue in their GitHub repo, but have not gotten any response yet.
Cheers,

IndexError: list index out of range

Hi,
Scraping is not working. I use Python 3.9 on Windows 11. Can you help?

IndexError Traceback (most recent call last)
Input In [19], in <cell line: 2>()
3 data = requests.get(standings_url)
4 soup = BeautifulSoup(data.text)
----> 5 standings_table = soup.select('table.stats_table')[0]
7 links = [l.get("href") for l in standings_table.find_all('a')]
8 links = [l for l in links if '/squads/' in l]

IndexError: list index out of range

Task exception error

When I run this code:

html = await get_html(url, '#content .filter')

I get this error:

Task exception was never retrieved
future: <Task finished name='Task-9' coro=<Connection.run() done, defined at C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_connection.py:240> exception=NotImplementedError()>
Traceback (most recent call last):
  File "C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_connection.py", line 247, in run
    await self._transport.connect()
  File "C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
    raise exc
  File "C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
  File "C:\Users\J\anaconda3\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "C:\Users\J\anaconda3\lib\asyncio\base_events.py", line 1676, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "C:\Users\J\anaconda3\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
    raise NotImplementedError

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16536\3703175856.py in <module>
----> 1 html= await get_html(url,'#content.filter')

~\AppData\Local\Temp\ipykernel_16536\3669287910.py in get_html(url, selector, sleep, retries)
      4         time.sleep(sleep * i)
      5         try:
----> 6             async with async_playwright() as p:
      7                 browser = await p.firefox.launch()
      8                 page = await browser.new_page()

~\anaconda3\lib\site-packages\playwright\async_api\_context_manager.py in __aenter__(self)
     44         if not playwright_future.done():
     45             playwright_future.cancel()
---> 46         playwright = AsyncPlaywright(next(iter(done)).result())
     47         playwright.stop = self.__aexit__  # type: ignore
     48         return playwright

~\anaconda3\lib\site-packages\playwright\_impl\_connection.py in run(self)
    245             self.playwright_future.set_result(await self._root_object.initialize())
    246 
--> 247         await self._transport.connect()
    248         self._init_task = self._loop.create_task(init())
    249         await self._transport.run()

~\anaconda3\lib\site-packages\playwright\_impl\_transport.py in connect(self)
    130         except Exception as exc:
    131             self.on_error_future.set_exception(exc)
--> 132             raise exc
    133 
    134         self._output = self._proc.stdin

~\anaconda3\lib\site-packages\playwright\_impl\_transport.py in connect(self)
    118                 env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")
    119 
--> 120             self._proc = await asyncio.create_subprocess_exec(
    121                 str(self._driver_executable),
    122                 "run-driver",

~\anaconda3\lib\asyncio\subprocess.py in create_subprocess_exec(program, stdin, stdout, stderr, loop, limit, *args, **kwds)
    234     protocol_factory = lambda: SubprocessStreamProtocol(limit=limit,
    235                                                         loop=loop)
--> 236     transport, protocol = await loop.subprocess_exec(
    237         protocol_factory,
    238         program, *args,

~\anaconda3\lib\asyncio\base_events.py in subprocess_exec(self, protocol_factory, program, stdin, stdout, stderr, universal_newlines, shell, bufsize, encoding, errors, text, *args, **kwargs)
   1674             debug_log = f'execute program {program!r}'
   1675             self._log_subprocess(debug_log, stdin, stdout, stderr)
-> 1676         transport = await self._make_subprocess_transport(
   1677             protocol, popen_args, False, stdin, stdout, stderr,
   1678             bufsize, **kwargs)

~\anaconda3\lib\asyncio\base_events.py in _make_subprocess_transport(self, protocol, args, shell, stdin, stdout, stderr, bufsize, extra, **kwargs)
    496                                          extra=None, **kwargs):
    497         """Create subprocess transport."""
--> 498         raise NotImplementedError
    499 
    500     def _write_to_self(self):

NotImplementedError: 

How can I fix this?
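The traceback ends in `_make_subprocess_transport` raising NotImplementedError: Playwright launches its driver as a subprocess, and the event loop running the code cannot create subprocesses. On Windows, asyncio's selector event loop does not support subprocesses, and Jupyter on Windows typically runs on that loop. A commonly used workaround is to run the scraping code as a plain Python script, where `asyncio.run` uses the Proactor event loop (the Windows default since Python 3.8). A minimal sketch of such a script; the body of `get_html` is an assumed reconstruction from the traceback, not the walkthrough's exact code:

```python
import asyncio


async def get_html(url, selector, sleep=5, retries=3):
    # Imported here so this file can be loaded even where playwright is absent.
    from playwright.async_api import TimeoutError as PlaywrightTimeout
    from playwright.async_api import async_playwright

    html = None
    for i in range(1, retries + 1):
        # Wait a little longer before each attempt (non-blocking sleep).
        await asyncio.sleep(sleep * i)
        try:
            async with async_playwright() as p:
                browser = await p.firefox.launch()
                page = await browser.new_page()
                await page.goto(url)
                html = await page.inner_html(selector)
                await browser.close()
        except PlaywrightTimeout:
            print(f"Timeout error on {url}")
            continue
        else:
            break
    return html


# Saved as e.g. scrape.py and run with `python scrape.py`:
# html = asyncio.run(get_html(url, "#content .filter"))
```

The scraped HTML can then be written to disk from the script and loaded into the notebook for parsing.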

Possible to do with PyCharm instead of Jupyter?

Hi, I'm currently working on a sports project and came across your YouTube video. I wanted to ask whether it is possible to use PyCharm instead of Jupyter Notebook, because I need to import my data into a database so that I can visualize it. Thanks in advance :)
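Notebook cells are ordinary Python, so the same code can run as a `.py` script in PyCharm (notebook-only features such as top-level `await` or displaying a DataFrame by leaving it on the last line will not work there). pandas can write a DataFrame straight into a SQLite database with `to_sql`. A minimal sketch with a hypothetical `matches` DataFrame:

```python
import sqlite3

import pandas as pd

# Hypothetical data standing in for the scraped matches DataFrame.
matches = pd.DataFrame({
    "team": ["Arsenal", "Chelsea"],
    "gf": [2, 1],
    "ga": [0, 1],
})

# ":memory:" keeps the example self-contained; use a file path such as
# "matches.db" to keep the database on disk for a visualization tool.
conn = sqlite3.connect(":memory:")
matches.to_sql("matches", conn, if_exists="replace", index=False)

# Read it back to confirm the round trip.
stored = pd.read_sql("SELECT * FROM matches", conn)
conn.close()
```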

Accuracy

Can you suggest a way to measure accuracy in addition to the precision score?
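scikit-learn provides `accuracy_score` alongside `precision_score`, and both take the true labels and the predictions. Accuracy counts every correct prediction, while precision only looks at the rows predicted as 1. A small sketch with hypothetical labels:

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical true labels and predictions (1 = price goes up, 0 = down).
y_true = [1, 0, 1, 1]
preds = [1, 0, 0, 1]

# Accuracy: share of all predictions that are correct (3 of 4).
acc = accuracy_score(y_true, preds)
# Precision: share of predicted 1s that really are 1 (2 of 2).
prec = precision_score(y_true, preds)
print(acc, prec)  # 0.75 1.0
```

In the walkthrough's code the same call would take `test["Target"]` and `preds`.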

The model.predict() function parameter is malformed.

market_prediction.ipynb

Code In[42]


from sklearn.metrics import precision_score
preds = model.predict(test[predictors])
preds = pd.Series(preds, index=test.index)
precision_score(test["Target"], preds)

Error info


/home/tony/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
  warnings.warn(
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [90], line 2
      1 from sklearn.metrics import precision_score
----> 2 preds = model.predict(test[predictors])
      3 preds = pd.Series(preds, index=test.index)
      4 precision_score(test["Target"], preds)

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:832, in ForestClassifier.predict(self, X)
    811 def predict(self, X):
    812     """
    813     Predict class for X.
    814 
   (...)
    830         The predicted classes.
    831     """
--> 832     proba = self.predict_proba(X)
    834     if self.n_outputs_ == 1:
    835         return self.classes_.take(np.argmax(proba, axis=1), axis=0)

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:874, in ForestClassifier.predict_proba(self, X)
    872 check_is_fitted(self)
    873 # Check data
--> 874 X = self._validate_X_predict(X)
    876 # Assign chunk of trees to jobs
    877 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:605, in BaseForest._validate_X_predict(self, X)
    602 """
    603 Validate X whenever one tries to predict, apply, predict_proba."""
    604 check_is_fitted(self)
--> 605 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    606 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
    607     raise ValueError("No support for np.int64 index based sparse matrices")

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/base.py:577, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    575     raise ValueError("Validation should be done on X, y or both.")
    576 elif not no_val_X and no_val_y:
--> 577     X = check_array(X, input_name="X", **check_params)
    578     out = X
    579 elif no_val_X and not no_val_y:

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/utils/validation.py:879, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    877     # If input is 1D raise error
    878     if array.ndim == 1:
--> 879         raise ValueError(
    880             "Expected 2D array, got 1D array instead:\narray={}.\n"
    881             "Reshape your data either using array.reshape(-1, 1) if "
    882             "your data has a single feature or array.reshape(1, -1) "
    883             "if it contains a single sample.".format(array)
    884         )
    886 if dtype_numeric and array.dtype.kind in "USV":
    887     raise ValueError(
    888         "dtype='numeric' is not compatible with arrays of bytes/strings."
    889         "Convert your data to numeric values explicitly instead."
    890     )

ValueError: Expected 2D array, got 1D array instead:
array=[3.82533e+03 4.04695e+09 3.78100e+03 3.82982e+03 3.75210e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

My system environment


~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"


conda 22.9.0
ipython                   8.6.0 
ipython_genutils          0.2.0
numpy                     1.23.4
numpy-base                1.23.4
pandas                    1.5.1
python                    3.9.15
python-dateutil           2.8.2
python-fastjsonschema     2.16.2
scikit-learn              1.1.3
yfinance                  0.1.87

Can you give me some tips? Thank you.
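The rejected array has exactly five values (Close, Volume, Open, High, Low), which suggests `test` is a single row rather than a table: indexing with `iloc[-100]` returns one row as a 1D Series, while slicing with `iloc[-100:]` returns the last 100 rows as a 2D DataFrame. The "X does not have valid feature names" warning fits the same explanation. A minimal sketch with hypothetical random data using the same column names:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

# Hypothetical price data with the walkthrough's column names.
rng = np.random.default_rng(0)
predictors = ["Close", "Volume", "Open", "High", "Low"]
data = pd.DataFrame(rng.random((300, 5)), columns=predictors)
data["Target"] = rng.integers(0, 2, 300)

train = data.iloc[:-100]
# A slice keeps a 2D DataFrame; data.iloc[-100] would be one row (1D Series).
test = data.iloc[-100:]

model = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
model.fit(train[predictors], train["Target"])
preds = pd.Series(model.predict(test[predictors]), index=test.index)
score = precision_score(test["Target"], preds)
```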
