Right now, magics return the result from Livy without being aware of whether the strin

Return well-formatted string/error from Livy to user, about jupyter-incubator/sparkmagic

Comments (6)

msftristew commented on May 23, 2024

I am currently investigating solutions to this issue. Here is the problem we're facing:

1. If we want to enable pretty-printing, the magics necessarily need to have a side effect. So Livy would give us a dataframe, and the spark magic would return None. Returning None silences the kernel so nothing else would be printed to the screen.
2. We can't return the result of the magic because if we did, then that would frequently cause results for an invocation to be printed twice: once (pretty-printed) by the magics, and once (improperly formatted) by the kernel.
3. However, this prevents users from being able to do something like x = %sql SELECT * FROM..., or from doing %sql SELECT * FROM ... and using the Out[] array to get the dataframe from an invocation. This limits the usefulness of the magics.

If we all agree this functionality is necessary, then we have a few options.

1. Give different commands different names. So we would have one command for running sql queries and pretty-printing the results, and one other command for running sql queries and getting the results of the query as a dataframe. Alejandro and I think this is a pretty messy solution.
2. Add an option to silence and return the output of the magic. So for example one could write %sql SELECT * FROM ... -r, which would have no output and simply return the dataframe. What's unfortunate here is that you then can't pretty-print the results and also save the results as a dataframe, since silencing and returning the output are captured in the same option.
3. Similarly to option 2, add two options: -r, which returns the dataframe instead of None, and -q, which silences the pretty-printer. This has all the power that one would need from the system, but now pretty-printing the output while also capturing the results as a dataframe (which might be a common pattern for some users) becomes pretty verbose: %sql SELECT * FROM ... -r -q.
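
For illustration only, here is a rough sketch of how the two flags from option 3 might behave. It is not sparkmagic's actual code: `parse_sql_magic`, `run_sql_magic`, and the `execute` and `show` callables are hypothetical names, and the sketch assumes the flags appear at the end of the line, as in the examples above.

```python
def parse_sql_magic(line):
    """Split a `%sql` line into (query, return_df, quiet).

    Assumption: -r and -q are only recognised as trailing tokens.
    """
    tokens = line.split()
    flags = set()
    while tokens and tokens[-1] in ("-r", "-q"):
        flags.add(tokens.pop())
    return " ".join(tokens), "-r" in flags, "-q" in flags


def run_sql_magic(line, execute, show=print):
    """Run the query; print unless -q was given, return the result only with -r."""
    query, return_df, quiet = parse_sql_magic(line)
    result = execute(query)      # hypothetical callable that talks to Livy
    if not quiet:
        show(result)             # side effect: the pretty-printer
    return result if return_df else None
```

With this shape, a plain invocation prints and returns None, `-r` alone prints and returns the result, and `-r -q` returns the result without printing.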

What do you think? Do we want the more verbose but more powerful options for the magics?

@aggFTW @ellisonbg

from sparkmagic.

ellisonbg commented on May 23, 2024

There is an aspect missing here. The sparkmagic itself should not do any
pretty printing at all. It should simply return data frames. The pretty
printing can then be done in two ways:

1. Passing those data frames to Python functions which do the pretty
   printing.
2. Using the IPython display system to register those functions as the
   default display representations for data frames.

The advantage of 1) is that you can offer multiple rich representations of
data frames for users and let them pick which is used. The downside of 1)
is that it isn't automatic.

The advantage of 2) is that it is automatic and will make all data frames
appear nice. The downside is that it is automatic so users can only get
that representation automatically.

My advice is this:

1. Only return data frames.
2. Build a set of functions which take a data frame and do one particular
   rich representation (table or vis).
3. Then write functions which optionally enable one of those display
   functions as the automatic display function for data frames.

It is really important to keep the scope of the sparkmagic narrow - it
shouldn't know anything about visualization, tables or other pretty
printing.
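
As a rough illustration of the three steps above, the sketch below keeps the data, the representation functions, and the opt-in default display separate. Everything in it is a stand-in: `Frame` plays the role of a data frame, and `enable_default_display` / `display` are hypothetical helpers that only mimic the idea. In a real notebook the equivalent hook would be the IPython display formatter's `for_type` method.

```python
class Frame:
    """Stand-in for a data frame: column names plus rows."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]


# Step 2: each function produces one particular representation.
def as_text_table(frame):
    header = " | ".join(frame.columns)
    lines = [header, "-" * len(header)]
    lines += [" | ".join(str(v) for v in row) for row in frame.rows]
    return "\n".join(lines)


def as_summary(frame):
    return f"{len(frame.rows)} rows x {len(frame.columns)} columns"


# Step 3: optionally make one of them the automatic representation.
_default_display = {}


def enable_default_display(cls, func):
    _default_display[cls] = func


def display(obj):
    """Use the registered representation for this type, else fall back to repr."""
    return _default_display.get(type(obj), repr)(obj)
```

A user could call `as_summary(frame)` explicitly at any time, or call `enable_default_display(Frame, as_text_table)` once so that every frame is rendered as a table automatically. The code that produces the frame never needs to know which one is in effect.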

Does this make sense?

In reply to msftristew's comment of Tue, Dec 8, 2015 at 4:44 PM (quoted in full above).

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub

aggFTW commented on May 23, 2024

It does, but we are also talking about stack traces, errors, general Spark results that come back as text.

Given what's been said, it's probably best to return None for %%spark code execution and print the result, and to return a pandas dataframe with %%spark -c sql.

I'm a little bit worried about the consistency of the behavior, but that's what we are really doing: code execution and its result are kept in the remote Spark context, while dataframes are retrieved back to the local context. I think it makes sense to return None in the first case and a pandas df in the second.
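
A minimal sketch of that split, under stated assumptions: `spark_magic`, `run_code`, and `run_sql` are hypothetical names, and the only thing taken from the discussion is the `-c sql` switch and the None-versus-dataframe return behavior.

```python
def wants_sql(line):
    """True when the magic line contains the pair `-c sql`."""
    tokens = line.split()
    return any(a == "-c" and b == "sql" for a, b in zip(tokens, tokens[1:]))


def spark_magic(line, cell, run_code, run_sql, out=print):
    """Dispatch a %%spark cell.

    run_code and run_sql are hypothetical callables standing in for the
    Livy round trip; run_sql is assumed to hand back a dataframe-like object.
    """
    if wants_sql(line):
        return run_sql(cell)   # dataframe comes back to the local context
    out(run_code(cell))        # text result is printed, state stays remote
    return None
```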

msftristew commented on May 23, 2024

That's a tough one.

As Alejandro indicates, everything is running inside a magic for the purposes of these wrapper kernels, and they can return "strings" (either to stdout or stderr) in addition to dataframes. We could probably let the magics all return a value and get the results we want by leveraging IPython's display functionality, assuming it's extensible enough.
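
To make the stdout/stderr distinction concrete, here is a small sketch of routing a text result to one stream or the other. `emit_text_result` and its `is_error` argument are hypothetical; how an error is actually detected in the Livy response is not covered here.

```python
import sys


def emit_text_result(text, is_error, stdout=None, stderr=None):
    """Write a text result to stdout, or to stderr when it is an error.

    The stream arguments default to the real sys streams and exist so the
    behavior can be exercised without touching the process output.
    """
    if is_error:
        stream = sys.stderr if stderr is None else stderr
    else:
        stream = sys.stdout if stdout is None else stdout
    # Normalise to exactly one trailing newline.
    stream.write(text.rstrip("\n") + "\n")
```

Dataframe results would bypass this function entirely and be returned as values, leaving their display to whatever representation is registered for them.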

aggFTW commented on May 23, 2024

Tackled by #68. @ellisonbg could you take a look?

Thanks!

ellisonbg commented on May 23, 2024

Hi, things are a bit chaotic here as we try to release 4.1 (and Christmas
too). Why don't you go ahead and keep pushing forward. Maybe we can do
another round of video code review the week after Christmas?

In reply to Alejandro Guerrero Gonzalez's comment of Wed, Dec 16, 2015 at 6:00 PM (above).
