Comments (6)

msftristew commented on May 23, 2024

I am currently investigating solutions to this issue. Here is the problem we're facing:

  1. If we want to enable pretty-printing, the magics necessarily need to have a side effect. So Livy would give us a dataframe, and the spark magic would return None. Returning None silences the kernel so nothing else would be printed to the screen.
  2. We can't return the result of the magic because if we did, then that would frequently cause results for an invocation to be printed twice: once (pretty-printed) by the magics, and once (improperly formatted) by the kernel.
  3. However, this prevents users from being able to do something like x = %sql SELECT * FROM..., or from doing %sql SELECT * FROM ... and using the Out[] array to get the dataframe from an invocation. This limits the usefulness of the magics.
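The trade-off in points 1 and 2 can be reproduced without IPython at all. The following is only a sketch under stated assumptions: Python's default display hook (`sys.__displayhook__`) stands in for the kernel echoing a cell's result, a list of tuples stands in for the dataframe, and all function names are hypothetical rather than sparkmagic code.

```python
import contextlib
import io
import sys


def pretty_print(rows):
    """Hypothetical stand-in for the magic's pretty-printer: one row per line."""
    for row in rows:
        print(" | ".join(str(value) for value in row))


def magic_returning_none(rows):
    """Pretty-prints as a side effect and returns None (point 1)."""
    pretty_print(rows)
    return None


def magic_returning_result(rows):
    """Pretty-prints and also returns the data (point 2)."""
    pretty_print(rows)
    return rows


def run_cell(magic, rows):
    """Run a magic, then echo its result the way an interactive shell would."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        result = magic(rows)
        # The default display hook prints repr(result) unless result is None.
        sys.__displayhook__(result)
    return captured.getvalue()


rows = [(1, "a"), (2, "b")]
print(run_cell(magic_returning_none, rows))    # the table appears once
print(run_cell(magic_returning_result, rows))  # the table, then the raw repr
```

In the second call the data shows up twice, once formatted and once as a raw repr, which is the double output described in point 2.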

If we all agree this functionality is necessary, then we have a few options.

  1. Give different commands different names. So we would have one command for running sql queries and pretty-printing the results, and one other command for running sql queries and getting the results of the query as a dataframe. Alejandro and I think this is a pretty messy solution.
  2. Add an option to silence and return the output of the magic. So for example one could write %sql SELECT * FROM ... -r, which would have no output and simply return the dataframe. What's unfortunate here is that you then can't pretty-print the results and also save the results as a dataframe, since silencing and returning the output are captured in the same option.
  3. Similar to option 2, add two options: -r, which returns the dataframe instead of None, and -q, which silences the pretty-printer. This has all the power that one would need from the system, but now capturing the results as a dataframe without also pretty-printing them (which might be a common pattern for some users) becomes pretty verbose: %sql SELECT * FROM ... -r -q.
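To make option 3 concrete, here is a minimal sketch of how the two flags could be handled, taking the definitions of -r and -q above at face value. The names `parse_sql_magic`, `sql_magic`, and the `execute` callback are hypothetical; this is not how sparkmagic parses its arguments, and a real magic would more likely use a proper argument parser.

```python
def parse_sql_magic(line):
    """Strip trailing -r / -q flags from a magic line.

    Returns (query, return_result, quiet). Only flags at the end of the
    line are recognised, so the SQL text itself is left untouched.
    """
    query = line.strip()
    flags = set()
    while True:
        head, _, tail = query.rpartition(" ")
        if tail in ("-r", "-q") and head:
            flags.add(tail)
            query = head.rstrip()
        else:
            break
    return query, "-r" in flags, "-q" in flags


def sql_magic(line, execute, pretty_print=print):
    """Hypothetical magic body: run the query, then honour the flags.

    `execute` is an assumed callback that runs the query and returns
    the resulting dataframe (or any stand-in object).
    """
    query, return_result, quiet = parse_sql_magic(line)
    frame = execute(query)
    if not quiet:
        pretty_print(frame)  # side effect only
    return frame if return_result else None
```

With this sketch, no flags means print and return None, -r alone means print and return the dataframe, and -r -q means return the dataframe silently.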

What do you think? Do we want the more verbose but more powerful options for the magics?

@aggFTW @ellisonbg

from sparkmagic.

ellisonbg commented on May 23, 2024

There is an aspect missing here. The sparkmagic itself should not do any
pretty printing at all. It should simply return data frames. The pretty
printing can then be done in two ways:

  1. Passing those data frames to Python functions which do the pretty
    printing.
  2. Using the IPython display system to register those functions as the
    default display representations for data frames.

The advantage of 1) is that you can offer multiple rich representations of
data frames for users and let them pick which is used. The downside of 1)
is that it isn't automatic.

The advantage of 2) is that it is automatic and will make all data frames
appear nice. The downside is that, because it is automatic, users get only
that one representation by default.

My advice is this:

  • Only return data frames.
  • Build a set of functions which take a data frame and do one particular
    rich representation (table or vis).
  • Then write functions which optionally enable one of those display
    functions as the automatic display function for data frames.

It is really important to keep the scope of the sparkmagic narrow - it
shouldn't know anything about visualization, tables or other pretty
printing.
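A rough sketch of that split, under stated assumptions: `frame_to_html_table` and `enable_auto_table_display` are hypothetical names, and the renderer only assumes the object has `.columns` and `.itertuples(index=False)`, as a pandas DataFrame does. The registration step uses IPython's `for_type` method on the `text/html` formatter and is imported lazily so the renderer can be used without IPython.

```python
from html import escape


def frame_to_html_table(frame):
    """One particular rich representation: a plain HTML table.

    Works with any object exposing .columns and .itertuples(index=False),
    such as a pandas DataFrame.
    """
    header = "".join(f"<th>{escape(str(col))}</th>" for col in frame.columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in frame.itertuples(index=False)
    )
    return f"<table><tr>{header}</tr>{body}</table>"


def enable_auto_table_display(frame_type):
    """Opt in: make frame_to_html_table the HTML display for frame_type.

    Returns False when not running inside an IPython session.
    """
    from IPython import get_ipython  # lazy import; only needed here

    shell = get_ipython()
    if shell is None:
        return False
    html_formatter = shell.display_formatter.formatters["text/html"]
    html_formatter.for_type(frame_type, frame_to_html_table)
    return True


# In a notebook one might call, for example:
#   import pandas
#   enable_auto_table_display(pandas.DataFrame)
```

The magic itself stays out of this entirely: it returns the data frame, and whichever display function the user has enabled decides how it looks.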

Does this make sense?

(In reply to msftristew's comment of Tue, Dec 8, 2015 at 4:44 PM.)

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub

aggFTW commented on May 23, 2024

It does, but we are also talking about stack traces, errors, and general Spark results that come back as text.

Given what's been said, it's probably best to return None for %%spark code execution and print the result, and to return a pandas dataframe with %%spark -c sql.

I'm a little bit worried about the consistency of the behavior, but that's what we are really doing: code is executed and its result is kept in the remote Spark context, while dataframes are retrieved back to the local context. I think it makes sense to return None in the first case and a pandas df in the second.


msftristew commented on May 23, 2024

That's a tough one.

As Alejandro indicates, everything is running inside a magic for the purposes of these wrapper kernels, and they can return "strings" (either to stdout or stderr) in addition to dataframes. We could probably let the magics all return a value and get the results we want by leveraging IPython's display functionality, assuming it's extensible enough.


aggFTW commented on May 23, 2024

Tackled by #68. @ellisonbg could you take a look?

Thanks!


ellisonbg commented on May 23, 2024

Hi, things are a bit chaotic here as we try to release 4.1 (and Christmas
too). Why don't you go ahead and keep pushing forward? Maybe we can do
another round of video code review the week after Christmas?

(In reply to Alejandro Guerrero Gonzalez's comment of Wed, Dec 16, 2015 at 6:00 PM.)

