dataquestio / project-walkthroughs

Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio .

Jupyter Notebook 99.90% Python 0.10% Dockerfile 0.01% Shell 0.01%

data-science machine-learning pandas python

project-walkthroughs's Introduction

Overview

This repository contains files, notebooks, and data used for live project walkthroughs on Dataquest. You can watch the project walkthroughs on YouTube.

These walkthroughs help you build complete end-to-end projects that can go into your portfolio.

Prerequisites

To complete these projects, you'll need to have a good understanding of:

Python syntax, including functions, if statements, and data structures
Data cleaning
Pandas syntax
Using Jupyter Notebook
The basics of machine learning

Please make sure you've completed these Dataquest courses (or know the material) before trying these projects:


project-walkthroughs's Issues

Urgent Help needed

Line 44:
predictions = backtest(kse100, model, new_predictors)
gives me an error.

Getting error ValueError: Input X contains NaN. SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

At line model.fit(train[predictors], train["Target"])
ValueError: Input X contains NaN.
SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

I then tried the following, but it did not resolve the error:

# Create our imputer to replace missing values with the mean, e.g.
imp = SimpleImputer(missing_values=0, strategy='mean')
imp = imp.fit(train)

# Impute our data, then train
X_train_imp = imp.transform(train)

model.fit(X_train_imp[predictors], X_train_imp["Target"])

Please share the solution
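Two details in the attempt above may explain why the error persists: missing_values=0 tells SimpleImputer to treat zeros, not NaN, as the missing marker, and transform() returns a NumPy array, which cannot be indexed with column names such as "Target". The following is a minimal sketch of two possible ways around this, not the walkthrough's own code. The toy DataFrame and the predictor names Close_Ratio_2 and Trend_2 are hypothetical stand-ins for the real train set and predictors list.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the real training set.
train = pd.DataFrame({
    "Close_Ratio_2": [1.01, np.nan, 0.99, 1.02, np.nan, 1.00],
    "Trend_2": [1.0, 2.0, np.nan, 1.0, 0.0, 2.0],
    "Target": [1, 0, 1, 1, 0, 0],
})
predictors = ["Close_Ratio_2", "Trend_2"]

# Option 1: drop every row that has a NaN in any predictor column.
train_clean = train.dropna(subset=predictors)

# Option 2: impute NaN with the column mean. Only the predictor columns are
# imputed, and the result is wrapped back into a DataFrame so that column
# names and the index survive.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_train_imp = pd.DataFrame(
    imp.fit_transform(train[predictors]),
    columns=predictors,
    index=train.index,
)

model = RandomForestClassifier(n_estimators=10, random_state=1)
model.fit(X_train_imp, train["Target"])
```

Whichever option is used, the same treatment has to be applied to the test rows before calling predict, otherwise the NaN error is likely to reappear there.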

dag is not visible in the web UI

Name Error

Hi,

I've followed the same code but I keep getting this error.

NameError Traceback (most recent call last)
Input In [1], in <cell line: 4>()
1 FRAME_RATE = 16000
2 CHANNELS = 1
----> 4 model = Model(model_name="vosk-model-en-us-0.22")
6 rec = KaldiRecognizer(model, FRAME_RATE)
7 rec.SetWords(True)

NameError: name 'Model' is not defined

I've tried defining the file path but the error is the same.

NameError Traceback (most recent call last)
Input In [7], in <cell line: 4>()
1 FRAME_RATE = 16000
2 CHANNELS = 1
----> 4 model = Model(r"C:/Users/mahmoudatsanni-oba/cache.vosk/vosk-model-small-en-us-0.15")
6 rec = KaldiRecognizer(model, FRAME_RATE)
7 rec.SetWords(True)

NameError: name 'Model' is not defined
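A NameError on Model means the name was never imported in the running kernel; it is raised before the model name or file path is even looked at, which is why changing the path made no difference. The first traceback shows Input In [1], so no import cell had run in that session. A minimal sketch, assuming the vosk package is installed (pip install vosk); the helper name build_recognizer is hypothetical and only keeps the import and the model construction together:

```python
FRAME_RATE = 16000
CHANNELS = 1


def build_recognizer(model_name="vosk-model-en-us-0.22", frame_rate=FRAME_RATE):
    """Create a Vosk recognizer with word-level output enabled."""
    # Importing here means a missing package surfaces as an ImportError
    # instead of a NameError further down.
    from vosk import Model, KaldiRecognizer

    model = Model(model_name=model_name)
    rec = KaldiRecognizer(model, frame_rate)
    rec.SetWords(True)
    return rec
```

In a notebook, the equivalent is to run a cell containing from vosk import Model, KaldiRecognizer before the cell that creates the model, and to re-run it after every kernel restart.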

rolling_averages

def rolling_averages(group, cols, new_cols):
    group = group.sort_values("date")
    rolling_stats = group[cols].rolling(3, closed='left').mean()
    group[new_cols] = rolling_stats
    group = group.dropna(subset=new_cols)
    return group

is not working in my version. The rolling_stats values all come out wrong.

IndexError: list index out of range

This issue is related to football_matches, specifically the scraping part.

data = requests.get(standings_url)
soup = BeautifulSoup(data.text)
--> standings_table = soup.select('table.stats_table')[0]

IndexError: list index out of range

In the for loop that scrapes multiple years, I got an error saying I had to install html5lib, and once I did, I could no longer scrape anything. I started getting this error, and now it happens not just in the loop but also earlier in the notebook, where the same code runs on its own.
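The IndexError means soup.select('table.stats_table') returned an empty list, so the HTML that came back did not contain the expected table. One possible cause, which is an assumption here and not something the issue confirms, is that the site answered with an error or rate-limit page instead of the standings. The sketch below checks for the table before indexing and names the parser explicitly so that the result does not depend on which parsers happen to be installed. The helper names first_stats_table and squad_links are hypothetical.

```python
from bs4 import BeautifulSoup


def first_stats_table(html):
    """Return the first table.stats_table element, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.select("table.stats_table")
    return tables[0] if tables else None


def squad_links(html):
    """Collect the /squads/ links from the standings table."""
    table = first_stats_table(html)
    if table is None:
        # Fail with a message that points at the response, not at the index.
        raise ValueError(
            "No table.stats_table found; check the response status code and body."
        )
    links = [a.get("href") for a in table.find_all("a")]
    return [link for link in links if link and "/squads/" in link]
```

With the notebook's own request, this would be called as squad_links(data.text) after data = requests.get(standings_url). Printing data.status_code first shows whether the server returned the page at all; anything other than 200 suggests the problem is on the request side, not in the parsing.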

error

Hello, I need help. The code runs, but as soon as I search for anything I get an OperationalError:
sqlite3.OperationalError: no such column: link
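"no such column" means the table in the database file does not have a column with that name. One possible explanation, offered as an assumption, is that the file was created by an earlier run with a different schema, because CREATE TABLE IF NOT EXISTS does not alter a table that already exists. A small sketch for inspecting and patching the schema with the standard library follows; the table name results and its other columns are hypothetical.

```python
import sqlite3


def column_names(con, table):
    """Return the column names SQLite currently has for `table`."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in con.execute(f"PRAGMA table_info({table})")]


def ensure_column(con, table, column, col_type="TEXT"):
    """Add `column` to `table` if the existing schema lacks it."""
    # Identifiers are interpolated directly, so only pass trusted names.
    if column not in column_names(con, table):
        con.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
        con.commit()


# Demo on an in-memory database with a hypothetical, outdated schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, query TEXT, title TEXT)")
before = column_names(con, "results")
ensure_column(con, "results", "link")
after = column_names(con, "results")
```

If the stored data is disposable, deleting the database file and letting the application recreate it is a simpler alternative.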

Rolling average...closed='left' code gives "closed only implemented for datetimelike and offset based windows"

In the football matches prediction code, I am getting a "closed only implemented for datetimelike and offset based windows" error when I run the rolling average function. Does anyone have an idea why the closed parameter would cause this? I don't fully understand the error it's giving.
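The message indicates that the installed pandas version accepts closed only for time-based windows, not for a fixed window such as 3. Newer pandas releases accept closed on fixed windows, so upgrading is one option. A version-independent sketch is shown below: it drops closed='left' and shifts the rolling mean down by one row, which gives each match the average of the three matches before it. The toy matches frame and the gf column are hypothetical.

```python
import pandas as pd


def rolling_averages(group, cols, new_cols):
    group = group.sort_values("date")
    # rolling(3).mean() includes the current row; shift(1) moves every result
    # down one row, so each match only sees the three matches before it.
    rolling_stats = group[cols].rolling(3).mean().shift(1)
    group[new_cols] = rolling_stats
    group = group.dropna(subset=new_cols)
    return group


# Hypothetical toy data: five matches with goals for (gf) of 1 to 5.
matches = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-01", "2022-01-08", "2022-01-15", "2022-01-22", "2022-01-29"]
    ),
    "gf": [1.0, 2.0, 3.0, 4.0, 5.0],
})
result = rolling_averages(matches, ["gf"], ["gf_rolling"])
```

The first three matches have no complete history and are dropped; the fourth gets the mean of 1, 2 and 3, and the fifth gets the mean of 2, 3 and 4.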

Predict Football Matches

Hello guys, can anyone explain how I can write the code to predict future football games?

Obtaining datasets using gdown in colab gives access denied error

Marvelous tutorial dude. Little curiosity btw!

In your tutorial, you manually downloaded the datasets and placed them in your working directory. Using this snippet from goodreads, I tried to download them with gdown in my Colab environment instead, and I get the following error:

Access denied with the following error:


 	Too many users have viewed or downloaded this file recently. Please
	try accessing the file again later. If the file you are trying to
	access is particularly large or is shared with many people, it may
	take up to 24 hours to be able to view or download the file. If you
	still can't access a file after 24 hours, contact your domain
	administrator. 

You may still be able to access the file from the browser:

	 https://drive.google.com/uc?id=1zmylV7XW2dfQVCLeg1LbllfQtHD2KUon

It might not be an issue related to this tutorial, obviously, but I wonder if you might have a good suggestion or workaround for this issue!
I also opened this issue in their GitHub repo, but I have not received a response yet!
Cheers,

Projects

IndexError: list index out of range

Hi,
Scraping is not working. I use Python 3.9 on Windows 11. Can you help?

IndexError Traceback (most recent call last)
Input In [19], in <cell line: 2>()
3 data = requests.get(standings_url)
4 soup = BeautifulSoup(data.text)
----> 5 standings_table = soup.select('table.stats_table')[0]
7 links = [l.get("href") for l in standings_table.find_all('a')]
8 links = [l for l in links if '/squads/' in l]

IndexError: list index out of range

Task exception error

When I run this code:

html = await get_html(url, '#content .filter')

I get this error:

Task exception was never retrieved
future: <Task finished name='Task-9' coro=<Connection.run() done, defined at C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_connection.py:240> exception=NotImplementedError()>
Traceback (most recent call last):
  File "C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_connection.py", line 247, in run
    await self._transport.connect()
  File "C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
    raise exc
  File "C:\Users\J\anaconda3\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
  File "C:\Users\J\anaconda3\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "C:\Users\J\anaconda3\lib\asyncio\base_events.py", line 1676, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "C:\Users\J\anaconda3\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
    raise NotImplementedError

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16536\3703175856.py in <module>
----> 1 html= await get_html(url,'#content.filter')

~\AppData\Local\Temp\ipykernel_16536\3669287910.py in get_html(url, selector, sleep, retries)
      4         time.sleep(sleep * i)
      5         try:
----> 6             async with async_playwright() as p:
      7                 browser = await p.firefox.launch()
      8                 page = await browser.new_page()

~\anaconda3\lib\site-packages\playwright\async_api\_context_manager.py in __aenter__(self)
     44         if not playwright_future.done():
     45             playwright_future.cancel()
---> 46         playwright = AsyncPlaywright(next(iter(done)).result())
     47         playwright.stop = self.__aexit__  # type: ignore
     48         return playwright

~\anaconda3\lib\site-packages\playwright\_impl\_connection.py in run(self)
    245             self.playwright_future.set_result(await self._root_object.initialize())
    246 
--> 247         await self._transport.connect()
    248         self._init_task = self._loop.create_task(init())
    249         await self._transport.run()

~\anaconda3\lib\site-packages\playwright\_impl\_transport.py in connect(self)
    130         except Exception as exc:
    131             self.on_error_future.set_exception(exc)
--> 132             raise exc
    133 
    134         self._output = self._proc.stdin

~\anaconda3\lib\site-packages\playwright\_impl\_transport.py in connect(self)
    118                 env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")
    119 
--> 120             self._proc = await asyncio.create_subprocess_exec(
    121                 str(self._driver_executable),
    122                 "run-driver",

~\anaconda3\lib\asyncio\subprocess.py in create_subprocess_exec(program, stdin, stdout, stderr, loop, limit, *args, **kwds)
    234     protocol_factory = lambda: SubprocessStreamProtocol(limit=limit,
    235                                                         loop=loop)
--> 236     transport, protocol = await loop.subprocess_exec(
    237         protocol_factory,
    238         program, *args,

~\anaconda3\lib\asyncio\base_events.py in subprocess_exec(self, protocol_factory, program, stdin, stdout, stderr, universal_newlines, shell, bufsize, encoding, errors, text, *args, **kwargs)
   1674             debug_log = f'execute program {program!r}'
   1675             self._log_subprocess(debug_log, stdin, stdout, stderr)
-> 1676         transport = await self._make_subprocess_transport(
   1677             protocol, popen_args, False, stdin, stdout, stderr,
   1678             bufsize, **kwargs)

~\anaconda3\lib\asyncio\base_events.py in _make_subprocess_transport(self, protocol, args, shell, stdin, stdout, stderr, bufsize, extra, **kwargs)
    496                                          extra=None, **kwargs):
    497         """Create subprocess transport."""
--> 498         raise NotImplementedError
    499 
    500     def _write_to_self(self):

NotImplementedError:

How can I fix this?
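The last frame of the traceback is asyncio's base event loop refusing to create a subprocess, which Playwright needs to start its driver. On Windows, only the proactor event loop supports subprocesses, and a Jupyter kernel on Windows commonly runs on a selector loop, so that is a likely cause here (an assumption, since the issue does not show the loop type). One possible workaround is to run the coroutine in a separate thread that owns its own loop. The helper name run_in_new_loop is hypothetical.

```python
import asyncio
import sys
import threading


def run_in_new_loop(coro_factory):
    """Run coro_factory() to completion on a fresh event loop in a worker thread."""
    outcome = {}

    def worker():
        # On Windows, pick the proactor loop explicitly: it supports subprocesses.
        if sys.platform == "win32":
            loop = asyncio.ProactorEventLoop()
        else:
            loop = asyncio.new_event_loop()
        try:
            outcome["value"] = loop.run_until_complete(coro_factory())
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc
        finally:
            loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

In the notebook, the failing line would become html = run_in_new_loop(lambda: get_html(url, '#content .filter')), without await. Running the scraper as a plain .py script with asyncio.run is another option that avoids the notebook's event loop entirely.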

Possible to do with PyCharm instead of Jupyter?

Hi, I'm currently working on a sports project and came across your YouTube video. Is it possible to use PyCharm instead of Jupyter Notebook? I need to import my data into a database so that I can visualize it. Thanks in advance :)

Accuracy

Can you give a method to get the accuracy in addition to the precision score?
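scikit-learn provides accuracy_score in the same module as precision_score, and it takes the same two arguments. A minimal sketch with hypothetical labels; in the notebooks the arguments would be the true target column and the predictions Series:

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical true labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# Accuracy: share of all predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)

# Precision: share of predicted 1s that are truly 1.
precision = precision_score(y_true, y_pred)
```

Here four of the six predictions are correct, and two of the three predicted positives are true positives, so the two metrics answer different questions and can diverge on imbalanced data.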

Error on setting monthly_avg & day_of_year_avg

Hello,

I am getting the below error while running the script:

Any help is greatly appreciated!!

Unable to see page numbers in results page

How can I add page numbers to the results page that take you to the next 10 results?
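One common approach is to pass a page number in the request, slice the result list accordingly, and give the template enough information to render previous and next links. The sketch below is framework-independent and is not taken from the project's code; the function name paginate and the returned keys are hypothetical.

```python
import math


def paginate(results, page, per_page=10):
    """Return one page of results plus the data needed to render page links."""
    total_pages = max(1, math.ceil(len(results) / per_page))
    # Clamp out-of-range page numbers to the nearest valid page.
    page = min(max(1, page), total_pages)
    start = (page - 1) * per_page
    return {
        "items": results[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "has_prev": page > 1,
        "has_next": page < total_pages,
    }
```

For example, with 25 results, page 2 holds results 11 to 20 and page 3 holds the last five. A web view would read the page number from the query string, call paginate, and render links to page - 1 and page + 1 when has_prev or has_next is true.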

The model.predict() function parameter is malformed.

market_prediction.ipynb

Code In[42]

from sklearn.metrics import precision_score
preds = model.predict(test[predictors])
preds = pd.Series(preds, index=test.index)
precision_score(test["Target"], preds)

Error info

/home/tony/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
  warnings.warn(
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [90], line 2
      1 from sklearn.metrics import precision_score
----> 2 preds = model.predict(test[predictors])
      3 preds = pd.Series(preds, index=test.index)
      4 precision_score(test["Target"], preds)

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:832, in ForestClassifier.predict(self, X)
    811 def predict(self, X):
    812     """
    813     Predict class for X.
    814 
   (...)
    830         The predicted classes.
    831     """
--> 832     proba = self.predict_proba(X)
    834     if self.n_outputs_ == 1:
    835         return self.classes_.take(np.argmax(proba, axis=1), axis=0)

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:874, in ForestClassifier.predict_proba(self, X)
    872 check_is_fitted(self)
    873 # Check data
--> 874 X = self._validate_X_predict(X)
    876 # Assign chunk of trees to jobs
    877 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:605, in BaseForest._validate_X_predict(self, X)
    602 """
    603 Validate X whenever one tries to predict, apply, predict_proba."""
    604 check_is_fitted(self)
--> 605 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    606 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
    607     raise ValueError("No support for np.int64 index based sparse matrices")

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/base.py:577, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    575     raise ValueError("Validation should be done on X, y or both.")
    576 elif not no_val_X and no_val_y:
--> 577     X = check_array(X, input_name="X", **check_params)
    578     out = X
    579 elif no_val_X and not no_val_y:

File ~/anaconda3/envs/prophet/lib/python3.9/site-packages/sklearn/utils/validation.py:879, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    877     # If input is 1D raise error
    878     if array.ndim == 1:
--> 879         raise ValueError(
    880             "Expected 2D array, got 1D array instead:\narray={}.\n"
    881             "Reshape your data either using array.reshape(-1, 1) if "
    882             "your data has a single feature or array.reshape(1, -1) "
    883             "if it contains a single sample.".format(array)
    884         )
    886 if dtype_numeric and array.dtype.kind in "USV":
    887     raise ValueError(
    888         "dtype='numeric' is not compatible with arrays of bytes/strings."
    889         "Convert your data to numeric values explicitly instead."
    890     )

ValueError: Expected 2D array, got 1D array instead:
array=[3.82533e+03 4.04695e+09 3.78100e+03 3.82982e+03 3.75210e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

My system environment

~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"


conda 22.9.0
ipython                   8.6.0 
ipython_genutils          0.2.0
numpy                     1.23.4
numpy-base                1.23.4
pandas                    1.5.1
python                    3.9.15
python-dateutil           2.8.2
python-fastjsonschema     2.16.2
scikit-learn              1.1.3
yfinance                  0.1.87

Can you give me some tips? Thank you.
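The array in the error message has five values, which would match one row of five predictor columns. That suggests test is a single row (a pandas Series) instead of a DataFrame, for example because it was created with iloc[-100] instead of iloc[-100:]. This is an assumption, since the cell that defines test is not shown. The sketch below illustrates the difference on hypothetical price data; the predictor names are also assumed.

```python
import pandas as pd

# Hypothetical price data with five assumed predictor columns.
prices = pd.DataFrame({
    "Close": [3825.33, 3781.00, 3829.82],
    "Volume": [4.04e9, 3.90e9, 4.10e9],
    "Open": [3781.00, 3800.00, 3810.00],
    "High": [3829.82, 3830.00, 3840.00],
    "Low": [3752.10, 3760.00, 3790.00],
})
predictors = ["Close", "Volume", "Open", "High", "Low"]

# An integer position returns one row as a Series: 1D, rejected by predict().
single_row = prices.iloc[-1]
one_dim = single_row[predictors]

# A slice returns a DataFrame even for one row: 2D, accepted by predict().
row_slice = prices.iloc[-1:]
two_dim = row_slice[predictors]
```

Checking type(test) and test.shape just before the predict call shows which of the two cases applies.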
