
mastering-big-data-analytics-with-pyspark's People

Contributors

dannymeijer, dependabot[bot], packt-itservice, packtutkarshr, tenythomas01

mastering-big-data-analytics-with-pyspark's Issues

No module named 'docker'

Hello,
When I try to run python run_me.py, I get the following error:
ModuleNotFoundError: No module named 'docker'

Can you please help?
Thank you
Riccardo
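
This error usually means the Docker SDK for Python is not installed in the interpreter that runs the script. The SDK is published on PyPI as docker, so running pip install docker in the active environment is the usual remedy. As a minimal sketch (the helper name missing_modules is hypothetical and not part of run_me.py), the check below reports which modules cannot be imported before the script is started:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "docker" is the import name taken from the error message above.
    absent = missing_modules(["docker"])
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are importable.")
```

If docker shows up as missing even after installing it, the package most likely went into a different environment than the one python resolves to.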

Missing ipynb-File in Section 8

Hi

I have downloaded the current version of the GitHub repo and noticed that in Section 8, in the twitter_app folder, the file twitter_app.ipynb is missing.

I am working with your online course at Packt Publishing.

Regards,

Vinz Frauchiger

Invalid syntax at line 135 in run_me.py

Mastering-Big-Data-Analytics-with-PySpark. on master [!] via 🐍 v2.7.16 took 1m39s
❯ python run_me.py
  File "run_me.py", line 135
    **{
     ^

Environment

Mastering-Big-Data-Analytics-with-PySpark. on master [!] via 🐍 v2.7.16
❯ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.2
BuildVersion:	19C57
Mastering-Big-Data-Analytics-with-PySpark. on master [!] via 🐍 v2.7.16
❯ python --version
Python 2.7.16
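
The traceback above comes from Python 2.7.16. A SyntaxError pointing at **{ is consistent with dictionary-unpacking syntax that older interpreters reject (unpacking inside dict literals, for example, arrived with Python 3.5), so the script presumably needs Python 3. A minimal sketch of an early guard, assuming a minimum of 3.5 (the repository's real minimum is not stated in the issue, and is_supported is a hypothetical helper):

```python
import sys

MIN_VERSION = (3, 5)  # assumption: chosen for illustration only


def is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True when the interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not is_supported():
        sys.exit("Python %d.%d or newer is required; found %s"
                 % (MIN_VERSION[0], MIN_VERSION[1], sys.version.split()[0]))
    print("Python version OK")
```

Because a SyntaxError is raised when the file is compiled, a guard like this only helps if it lives in a separate launcher file that itself parses under Python 2.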

Notebook for 4.5 is Missing

Hi, I'm looking for the notebook with ML code for section 4-5 that you reference in the video of your Packt training series. It's not in your GitHub repo. Is it available somewhere online?

Trace back when running run_me.py

I am running an Ubuntu 20.04 LTS host on VMware. I checked the README and the requirements, and all the modules have been installed. The Python version (from conda 4.9.2) is 3.8.5.

When I try to execute the run_me.py script, I get the following errors:

(base) phil@ubuntu:~/Project/Mastering-Big-Data-Analytics-with-PySpark-master (3)/Mastering-Big-Data-Analytics-with-PySpark-master$ python run_me.py
INFO CourseHandler Welcome to 'Mastering Big Data Analytics with Pyspark' by Danny Meijer [email protected]
INFO CourseHandler Course Version: 20200420
INFO CourseHandler Course Name: Mastering Big Data Analytics with Pyspark
INFO CourseHandler Container Name: mastering-pyspark-ml
INFO CourseHandler Downloading the data
INFO CourseHandler Processing Sentiment140TrainingData
INFO CourseHandler Directory "sentiment-140-training-data" already exists
INFO CourseHandler Skipping "trainingandtestdata.zip"
INFO CourseHandler Processing MovieLens-Small
INFO CourseHandler Directory "ml-latest-small" already exists
INFO CourseHandler Skipping "ml-latest-small.zip"
INFO CourseHandler Processing MovieLens
INFO CourseHandler Directory "ml-latest" already exists
INFO CourseHandler Skipping "ml-latest.zip"
INFO CourseHandler Connecting to Docker API
Traceback (most recent call last):
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/transport/unixconn.py", line 30, in connect
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/util/retry.py", line 531, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/home/phil/Downloads/yes/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/transport/unixconn.py", line 30, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
return self.version(api_version=False)["ApiVersion"]
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/api/daemon.py", line 181, in version
return self._result(self._get(url), json=True)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/utils/decorators.py", line 46, in inner
return f(self, *args, **kwargs)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/api/client.py", line 237, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run_me.py", line 324, in <module>
course = Course()
File "run_me.py", line 114, in __init__
self._client()
File "run_me.py", line 152, in _client
self.client = docker.from_env()
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/client.py", line 96, in from_env
return cls(
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/client.py", line 45, in __init__
self.api = APIClient(*args, **kwargs)
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/api/client.py", line 197, in __init__
self._version = self._retrieve_server_version()
File "/home/phil/Downloads/yes/lib/python3.8/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
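
The final DockerException wraps a FileNotFoundError raised in unixconn.py while connecting to a Unix socket. That typically means the Docker daemon is not running (or Docker is not installed), rather than a bug in the course script. A minimal sketch of a pre-flight check, assuming the daemon listens on the common default path /var/run/docker.sock (a different location may be configured, for example through DOCKER_HOST; the helper name is hypothetical):

```python
import os
import stat

DEFAULT_SOCKET = "/var/run/docker.sock"  # assumption: the usual default path


def docker_socket_present(path=DEFAULT_SOCKET):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Covers a missing file as well as permission problems.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    if docker_socket_present():
        print("Docker socket found at", DEFAULT_SOCKET)
    else:
        print("No Docker socket at", DEFAULT_SOCKET,
              "- is the Docker daemon running?")
```

A present socket does not prove that the daemon accepts connections, but a missing one explains the "No such file or directory" seen above.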

Building docker image via run_me.py fails

INFO CourseHandler Welcome to 'Mastering Big Data Analytics with Pyspark' by Danny Meijer [email protected]
INFO CourseHandler Course Version: 20200420
INFO CourseHandler Course Name: Mastering Big Data Analytics with Pyspark
INFO CourseHandler Container Name: mastering-pyspark-ml
INFO CourseHandler Downloading the data
INFO CourseHandler Processing Sentiment140TrainingData
INFO CourseHandler Directory "sentiment-140-training-data" already exists
INFO CourseHandler Skipping "trainingandtestdata.zip"
INFO CourseHandler Processing MovieLens-Small
INFO CourseHandler Directory "ml-latest-small" already exists
INFO CourseHandler Skipping "ml-latest-small.zip"
INFO CourseHandler Processing MovieLens
INFO CourseHandler Directory "ml-latest" already exists
INFO CourseHandler Skipping "ml-latest.zip"
INFO CourseHandler Connecting to Docker API
INFO CourseHandler Checking if Docker image is already set-up
WARNING CourseHandler Docker image has not been built yet
INFO CourseHandler Building Docker image
INFO docker.api.build Initiating (this might take a few moments)
Traceback (most recent call last):
File "run_me.py", line 325, in <module>
course = Course()
File "run_me.py", line 117, in __init__
self._image()
File "run_me.py", line 256, in _image
self.build_image()
File "run_me.py", line 194, in build_image
rm=True,
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\site-packages\docker\api\build.py", line 261, in build
self._set_auth_headers(headers)
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\site-packages\docker\api\build.py", line 308, in _set_auth_headers
auth_data = self.auth_configs.get_all_credentials()
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\site-packages\docker\auth.py", line 302, in get_all_credentials
for k in store.list().keys():
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\site-packages\docker\credentials\store.py", line 72, in list
return json.loads(data.decode('utf-8'))
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Hansrajjadhav\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
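
The traceback ends in docker\credentials\store.py, where the output of a credential helper is passed to json.loads and turns out not to be JSON. Which helper is used is normally controlled by the credsStore entry of the Docker client configuration file. As a hedged sketch for inspecting that setting (the default location ~/.docker/config.json is an assumption, as is the helper name configured_creds_store):

```python
import json
import os


def configured_creds_store(config_path=None):
    """Return the "credsStore" value from a Docker client config, or None."""
    if config_path is None:
        # Assumption: the default per-user location of the client config.
        config_path = os.path.join(os.path.expanduser("~"), ".docker", "config.json")
    try:
        with open(config_path, encoding="utf-8") as handle:
            config = json.load(handle)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config: nothing to report.
        return None
    if not isinstance(config, dict):
        return None
    return config.get("credsStore")


if __name__ == "__main__":
    print("credsStore:", configured_creds_store())
```

If a helper is named there but the matching docker-credential-<name> executable is missing or prints something other than JSON, the decoding step can fail in the way shown above.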

Running container inside a vm

If you are running the container inside a VM, you should change line 290 from
port_map = {"{}/tcp".format(p): ("127.0.0.1", p) for p in self.ports}
to
port_map = {"{}/tcp".format(p): ("0.0.0.0", p) for p in self.ports}
Otherwise you can't connect to the container from outside your VM.
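
The change above replaces the loopback address with 0.0.0.0, so the published ports are bound on all interfaces of the VM instead of only on localhost. A small sketch of the same comprehension with the bind address as a parameter (build_port_map and the port numbers are hypothetical; only the comprehension itself comes from the issue):

```python
def build_port_map(ports, host_ip="127.0.0.1"):
    """Map "<port>/tcp" to a (host_ip, port) tuple for each port."""
    return {"{}/tcp".format(p): (host_ip, p) for p in ports}


if __name__ == "__main__":
    example_ports = [8888, 4040]  # illustrative values, not taken from run_me.py
    print(build_port_map(example_ports))             # bound to loopback only
    print(build_port_map(example_ports, "0.0.0.0"))  # bound on all interfaces
```

Binding to 0.0.0.0 also exposes the ports to every network the VM is attached to, so it is best limited to trusted networks.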

'NoneType' object has no attribute 'id'

The run_me.py script seems to run fine and almost finishes before I get this error.

INFO docker.api.build 58.14% | 26 Packages, 7 Downloading, 19 Extracted
INFO docker.api.build 94.66% | 26 Packages, 7 Downloading, 19 Extracted
INFO docker.api.build 94.66% | 26 Packages, 7 Downloading, 19 Extracted
INFO docker.api.build Image was built successfully
Traceback (most recent call last):
File "run_me.py", line 324, in <module>
course = Course()
File "run_me.py", line 115, in __init__
self._image()
File "run_me.py", line 255, in _image
self.build_image()
File "run_me.py", line 244, in build_image
logger.info("Image ID: %s", self.image.id)
AttributeError: 'NoneType' object has no attribute 'id'
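
The AttributeError shows that self.image was still None when build_image tried to log its ID, even though the build reported success. How the script looks the image up is not shown in the issue, so the following is only a sketch of a defensive guard (require_image is a hypothetical helper) that turns the failure into a clearer message:

```python
from types import SimpleNamespace


def require_image(image, name):
    """Return `image` unchanged, or raise a descriptive error if it is None."""
    if image is None:
        raise RuntimeError(
            "Image %r was reported as built but could not be found" % name
        )
    return image


if __name__ == "__main__":
    # SimpleNamespace stands in for a real image object with an `id` attribute.
    fake = SimpleNamespace(id="sha256:0000")
    print("Image ID:", require_image(fake, "example-image").id)
```

A guard of this kind does not fix the underlying lookup, but it reports which image was expected instead of failing on the attribute access.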

run_me.py: FileNotFoundError: [Errno 2] No such file or directory

Hi, I was trying to run the python run_me.py command as suggested in the README, but ended up with long, hard-to-understand error messages. I use pipenv and created a Python 3.6.5 environment for this course.

Please help me decipher the following issues:

(ozkans) ozkans@OZKANs-MacBook-Pro Mastering-Big-Data-Analytics-with-PySpark % python --version
Python 3.6.5
(ozkans) ozkans@OZKANs-MacBook-Pro Mastering-Big-Data-Analytics-with-PySpark % python run_me.py
INFO CourseHandler Welcome to 'Mastering Big Data Analytics with Pyspark' by Danny Meijer [email protected]
INFO CourseHandler Course Version: 20200420
INFO CourseHandler Course Name: Mastering Big Data Analytics with Pyspark
INFO CourseHandler Container Name: mastering-pyspark-ml
INFO CourseHandler Downloading the data
INFO CourseHandler Processing Sentiment140TrainingData
INFO CourseHandler Directory "sentiment-140-training-data" already exists
INFO CourseHandler Skipping "trainingandtestdata.zip"
INFO CourseHandler Processing MovieLens-Small
INFO CourseHandler Directory "ml-latest-small" already exists
INFO CourseHandler Skipping "ml-latest-small.zip"
INFO CourseHandler Processing MovieLens
INFO CourseHandler Directory "ml-latest" already exists
INFO CourseHandler Skipping "ml-latest.zip"
INFO CourseHandler Connecting to Docker API
Traceback (most recent call last):
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/util/retry.py", line 410, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/api/client.py", line 205, in _retrieve_server_version
return self.version(api_version=False)["ApiVersion"]
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/api/daemon.py", line 181, in version
return self._result(self._get(url), json=True)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/utils/decorators.py", line 46, in inner
return f(self, *args, **kwargs)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/api/client.py", line 228, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/requests/sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run_me.py", line 329, in <module>
course = Course()
File "run_me.py", line 113, in __init__
self._client()
File "run_me.py", line 153, in _client
self.client = docker.from_env()
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/client.py", line 85, in from_env
timeout=timeout, version=version, **kwargs_from_env(**kwargs)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/client.py", line 40, in __init__
self.api = APIClient(*args, **kwargs)
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/api/client.py", line 188, in __init__
self._version = self._retrieve_server_version()
File "/Users/ozkans/.local/share/virtualenvs/ozkans-QsBVdfJX/lib/python3.6/site-packages/docker/api/client.py", line 213, in _retrieve_server_version
'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
