spark-py-notebooks's People

Contributors

bitdeli-chef, chusopr, gitter-badger, jadianes, ypriverol

spark-py-notebooks's Issues

urllib module in nb1-rdd-creation

For Python 3.x users, the urllib module has been split into several modules, so I think
from urllib.request import urlretrieve would make more sense.
Please update the notebook if you think it is needed.
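
One way to keep the notebook working on both interpreters is a guarded import. This is only a minimal sketch: the helper name download_if_missing and its arguments are hypothetical and are not taken from the notebook.

```python
import os

try:
    # Python 3: urlretrieve lives in the urllib.request submodule.
    from urllib.request import urlretrieve
except ImportError:
    # Python 2: urlretrieve is a top-level function of urllib.
    from urllib import urlretrieve


def download_if_missing(url, path):
    """Hypothetical helper: fetch url into path unless path already exists."""
    if not os.path.exists(path):
        urlretrieve(url, path)
    return path
```

Skipping the download when the path already exists means that re-running the cell does not fetch the data set again.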

[bug] About nb10-sql-dataframes.ipynb (DF.map→RDD.map)

@jadianes
Hello, I'm Hiroyuki.
Nice tutorial, thank you!

Cell In[7] contains:

tcp_interactions_out = tcp_interactions.map(lambda p: "Duration: {}, Dest. bytes: {}".format(p.duration, p.dst_bytes))
for ti_out in tcp_interactions_out.collect():
  print ti_out

However, map is only available on RDDs,
so I think we need to convert tcp_interactions (a DataFrame) to an RDD first.

Here is a sample fix:

tcp_interactions_out = tcp_interactions.rdd.map(lambda p: "Duration: {}, Dest. bytes: {}".format(p.duration, p.dst_bytes))
for ti_out in tcp_interactions_out.collect():
  print ti_out

What do you think?

If there is a mistake in my code or in my wording, I apologize (my written English is not very good).
Please forgive me if I have caused any offense.
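
The proposed change can be sketched so that the formatting logic is checkable without a Spark session. The names format_interaction, interactions_as_text, and FakeRow are hypothetical; only the .rdd.map(...) call and the duration and dst_bytes fields come from the issue above.

```python
from collections import namedtuple


def format_interaction(row):
    """Format one row; works for any object with duration and dst_bytes attributes."""
    return "Duration: {}, Dest. bytes: {}".format(row.duration, row.dst_bytes)


def interactions_as_text(df):
    """Go through the DataFrame's underlying RDD before calling map."""
    return df.rdd.map(format_interaction).collect()


# Stand-in for a Spark Row, so the formatter can be tried without Spark.
FakeRow = namedtuple("FakeRow", ["duration", "dst_bytes"])
sample = format_interaction(FakeRow(duration=5, dst_bytes=1024))
print(sample)
```

With a running session, interactions_as_text(tcp_interactions) would presumably return the same list of strings that the corrected cell prints.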

Apparent Memory Issues

juyptererror.txt
commandprompt.txt
commandprompterror.txt

Hi, I am a student learning how to use PySpark and Jupyter to build classification models for large data. I installed PySpark v2.2.1 and Jupyter following the tutorial by Michael Galarnyk on Medium. The installation seemed to go fine and I was able to run your first notebook. However, in the second notebook, nb2-rdd-basics, I had problems with the "collect" code:

from time import time
t0 = time()
head_rows = csv_data.take(100000)
tt = time() - t0
print "Parse completed in {} seconds".format(round(tt,3))

Thinking it was a memory issue, I then launched Jupyter with the command:

pyspark --master local[4] --driver-memory 32g --executor-memory 32g

I have attached the Jupyter error and the command prompt output from before and after the error.
Please help: how do I increase the memory available to the kernel?
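
One thing that may be worth trying from inside a notebook is setting the PYSPARK_SUBMIT_ARGS environment variable before the first SparkContext is created. The sketch below is an assumption-laden example, not a confirmed fix for the attached error: the helper build_submit_args is hypothetical and the 4g value is an arbitrary placeholder. In local mode the executors run inside the driver JVM, so the driver memory setting is the one that should matter.

```python
import os


def build_submit_args(master="local[4]", driver_memory="4g"):
    """Build the option string that PySpark reads from PYSPARK_SUBMIT_ARGS."""
    return "--master {} --driver-memory {} pyspark-shell".format(master, driver_memory)


# Set this before the first SparkContext is created: the JVM heap size
# is fixed when the JVM starts and cannot be raised afterwards.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

A driver memory value larger than the machine's physical RAM is unlikely to help, so the placeholder should be adjusted to the hardware at hand.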

spark context

I had an issue with the following command line:
$ MASTER="spark://127.0.0.1:7077" SPARK_EXECUTOR_MEMORY="1G" IPYTHON_OPTS="notebook --pylab inline" /home/philippe/Downloads/spark-master/bin/pyspark

The error was: Connection refused: /127.0.0.1:7077

It was resolved with:
$ MASTER=local[4] SPARK_EXECUTOR_MEMORY="1G" IPYTHON_OPTS="notebook --pylab inline" /home/philippe/Downloads/spark-master/bin/pyspark
Maybe you could mention this in the README.

Otherwise, these are great notebooks and a great help. Thank you!
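
The "Connection refused" error suggests that nothing was listening on port 7077, that is, no standalone master was running, whereas local[4] runs Spark inside the current process. A minimal sketch of picking the master URL accordingly is shown here; choose_master is a hypothetical helper and is not part of Spark.

```python
import socket


def choose_master(host="127.0.0.1", port=7077, fallback="local[4]", timeout=1.0):
    """Return a spark:// URL if something accepts connections on host:port,
    otherwise return the local-mode fallback."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return fallback
    conn.close()
    return "spark://{}:{}".format(host, port)


print(choose_master())
```

The returned string could then be used as the value of the MASTER variable in the command lines above.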

Logistic Regression with LBFGS in Spark 1.6 and 2.1

@jadianes Nice tutorial on Logistic Regression, thank you.
I ran the tutorial on Spark 1.6.2 and 2.1.0. Both ran fine, and I could reproduce your results exactly in 1.6.2, but I would like to offer the following observation about 2.1.0. In 2.1.0 the process takes about 3 times longer to run and produces a different answer from the one produced by 1.6.2. I thought this was strange and found, in the list of Spark tasks, that 2.1.0 was calling a non-LBFGS algorithm. I raised this in a JIRA question (https://issues.apache.org/jira/browse/SPARK-16768). Even though you can import the LBFGS version into pyspark, call help on it, and run it, I don't think it is actually an LBFGS version.
http://spark.apache.org/docs/latest/mllib-optimization.html has some other information on LBFGS in Spark.
Later, when 2.1.0 becomes the standard, your readers may find that they don't get your accuracy results. Or maybe I just missed something. Can anyone confirm my observations?

Website isn't working

Thanks for the tutorials!
The website's domain has probably expired, and the .github.io link redirects to that domain too.

Possible solutions:

  1. Renew the domain subscription
  2. Cancel the alias or record that's causing the GitHub page to go to the custom domain

license?

What is the license for this repo? Apache 2.0 would be nice :)
