Code Monkey home page Code Monkey logo

ploomber / jupysql Goto Github PK

View Code? Open in Web Editor NEW

This project forked from catherinedevlin/ipython-sql

666.0 666.0 73.0 13.11 MB

Better SQL in Jupyter. 📊

Home Page: https://jupysql.ploomber.io

License: Apache License 2.0

Python 98.11% Shell 0.05% CSS 0.05% JavaScript 1.75% Dockerfile 0.03%
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql

jupysql's Introduction

CI Linux CI macOS Documentation Status PyPI Conda (channel only) Conda Coverage Twitter Downloads

Tip

Deploy AI apps for free on Ploomber Cloud!

Join our community | Newsletter | Contact us | Docs | Blog | Website | YouTube

Ploomber is the fastest way to build data pipelines ⚡️. Use your favorite editor (Jupyter, VSCode, PyCharm) to develop interactively and deploy ☁️ without code changes (Kubernetes, Airflow, AWS Batch, and SLURM). Do you have legacy notebooks? Refactor them into modular pipelines with a single command.

Installation

Compatible with Python 3.7 and higher.

Install with pip:

pip install ploomber

Or with conda:

conda install ploomber -c conda-forge

Getting started

Try the tutorial:

Community

Main Features

⚡️ Get started quickly

A simple YAML API to get started quickly, a powerful Python API for total flexibility.

get-started.mp4

⏱ Shorter development cycles

Automatically cache your pipeline’s previous results and only re-compute tasks that have changed since your last execution.

shorter-cycles.mp4

☁️ Deploy anywhere

Run as a shell script in a single machine or distributively in Kubernetes, Airflow, AWS Batch, or SLURM.

deploy.mp4

📙 Automated migration from legacy notebooks

Bring your old monolithic notebooks, and we’ll automatically convert them into maintainable, modular pipelines.

refactor.mp4

I want to migrate my notebook.

Show me a demo.

Resources

About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

Click here to know how you can contribute to Ploomber.

jupysql's People

Contributors

albrechtje avatar amjith avatar anirudhviyer avatar anupam-tiwari avatar bbeat2782 avatar bryannho avatar catherinedevlin avatar chuwpeng123 avatar daveokpare avatar edublancas avatar gtduncan avatar idomic avatar jmoore1127 avatar jorisroovers avatar jstoebel avatar lucaszw avatar mehtamohit013 avatar meihkv avatar neelasha23 avatar nooodle-soup avatar olethanh avatar palashio avatar rodolfoferro avatar tegveerg avatar tl3119 avatar tonykploomber avatar wsshawn avatar xiaochuanyu avatar yafim avatar yafimvo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

jupysql's Issues

Obfuscate connection string

This issue discuss printing the connection string as a cell output. We should add some flag to allow the user to avoid this since it's not too secure. In a similar manner to this, we need to document this flag.

fixing CI on windows

When adding windows to the CI, it fails. We have to fix the tests, often it's just incompatibilities in the tests, but there might be cases where there's an actual incompatibility.

idea: open SQL files as notebooks

Similar to what Jupytext does, except for SQL.

Users can edit SQL files as notebooks, and the underlying SQL file is something that a database can run (i.e., only one query is active, and the rest are commented out). Integration with Ploomber. You can run these SQL files on Ploomber and select if you want to run them as notebooks or SQL. If the former, you get the output notebooks with all the charts.

Builds in readthedocs fail since updates in plot example

I've modified the example plot.md to consider a larger database, as detailed in #8.

For some reason, builds in readthedocs are failing: https://readthedocs.org/projects/jupysql/builds/

I created a new branch with diferent table sizes to verify if it was the problem and also tried with a bigger timeout from jupyterbook config. In all cases builds failed, so we need to verify why this is happening and build the new docs with the updated examples.

Suppress message "Returning data to local variable"

Example:

In:

%%sql result <<
select col_name
from table_name

Out:

Returning data to local variable result
I don't want this message to clutter my output for the document I am producing. It would be useful to have a setting to suppress this message.

Current workaround for this issue (needs to be applied for each cell):

%%capture
%%sql result <<
select col_name
from table_name

experiment: showing our community page on errors

We want to incentivize our users to talk to us when they encounter problems by showing them our link to Slack in the exception message.

In ploomber-core 0.1, I added some custom exceptions, so we do that. Essentially, we want to replace some errors with their customized counterparts:

ValueError -> PloomberValueError
TypeError -> PloomberTypeError

see documentation here

validating input parameter when initializing connection

When starting a new session, users can connect to a db like this:

%sql connection_string

However, the connection_string is not validated, and if I pass an invalid object:

%sql "not a connection string"
%sql 100

The following message shows:

Environment variable $DATABASE_URL not set, and no connect string given.
Connection info needed in SQLAlchemy format, example:
               postgresql://username:password@hostname/dbname
               or an existing connection: dict_keys([])

This is confusing as it doesn't tell me my input is wrong. It says "no connection string given", but that's incorrect.

syntax highlighting

This isn't supported in JupyterLab, and the required feature is scheduled for the 4.0 release, but there isn't an expected release date yet; see jupyterlab/jupyterlab#3869

based on this, it seems that another way to get syntax highlighting with jupyterlab-lsp, I tested it and it works:

image

Looks like at this point, jupyter-lsp is the way to go, so we should add instructions to the docs

writing large SQL queries

The best way to write large SQL queries is via CTEs:

WITH something AS (
  SELECT * FROM some_table
  WHERE column = 1
)
SELECT *
FROM something
WHERE another_column = 2

However, when writing CTEs, we lose the ability to run the sub-expressions interactively.

We could add something like this:

%%sql --name something
SELECT * FROM some_table
WHERE column = 1

Then, to re-use the previous query:

%%sql
SELECT *
-- refer to the previous query
FROM ':something'
WHERE another_column = 2

And have the extension automatically expand the expression above to:

WITH something AS (
  SELECT * FROM some_table
  WHERE column = 1
)
SELECT *
FROM something
WHERE another_column = 2

Before submitting to the database.

better connection string parsing

we documented a workaround for connecting to SQLite databases that contain spaces (#35), however, this hasn't been fixed. so I'm opening this issue. This is related to the parsing logic (see #33 (comment)).

adding new magics for plotting

JupySQL has a feature to plot histograms and boxplots for large-scale datasets. It works by computing the summary statistics and then using those statistics to create the plot (as opposed to bringing all the data and computing locally, which can blow up memory usage).

However, only a Python API is exposed, and it'd be better to have it as magic. For example:

Plot histogram of x:

%%sql-histogram
SELECT x
FROM numbers

Plot boxplot of y:

%%sql-boxplot
SELECT y
FROM numbers

There are a few things to take into account:

  • the query might select more than one variable. In such case, we should throw an error (or ignore the others?)
  • the data type might be wrong, we should run the query anyway, so we don't have to perform validations, but if an error occurs, we should ensure the error message is clear
  • We should keep the Python API since I gave a talk at PyData 2022, so it's likely that some people are already using it
  • I'm unsure if it's best to create a new magic or re-use the existing one. It feels like the %%sql magic already has too many parameters

Check rate of `--no-execute` usage on `--save`

We're using the --save, and --no-execute functions. This tells JupySQL to store the query, but skips execution.
Ideally save would only save the query given users will want to chain multiple saved queries, they'll always use the --no-execute flag. We want to track it and see if we need to reverse the functionality, meaning --save will only save, and to execute we need to add the --execute flag

buggy behavior in `%config SqlMagic.style`

I was testing the %config SqlMagic.style and realized it doesn't work well:

image

sometimes the text representation is empty, and changing the style does not affect it.

sample code:

%sql sqlite://
%%sql
CREATE TABLE languages (name, rating, change);
INSERT INTO languages VALUES ('Python', 14.44, 2.48);
INSERT INTO languages VALUES ('C', 13.13, 1.50);
INSERT INTO languages VALUES ('Java', 11.59, 0.40);
INSERT INTO languages VALUES ('C++', 10.00, 1.98);
%config SqlMagic.style = "PLAIN_COLUMNS"
%config SqlMagic.style = "RANDOM"
%config SqlMagic.style = "DEFAULT"
result = %sql SELECT * FROM languages LIMIT 2
print(result.pretty.get_string())

UDF support

Currently the package doesn't support it.
We should add a feature to accept it and execute it via the engine.

proposal: `%sqlcmd profile`

When working with a new dataset, practitioners need to explore and summarize it quickly: column values, types, distributions, etc. We could create a %sqlcmd profile magic that produces an HTML table/report of a table. Similar to pandas-profiling, except this would run the analysis on the SQL engine, making it more scalable.

Examples:

display an embedded summary:

%sqlcmd profile --table my_table

store a report:

%sqlcmd profile --table my_table --output report.html

proposal: `%sqlcmd test`

Often, practitioners need to evaluate the quality of their data (e.g., no NULLs, values within range, etc.). We could create a new magic for rapid testing:

%sqlcmd test --table data --column numbers --within 0,100 --type float
%sqlcmd test --table data --column names --no-nulls
%sqlcmd test --table data --column birth_date --strictly-greater-than 0

add algolia for searching

Like we did with Ploomber, we should add Algolia to JupySQL docs. Ploomber's doc is built on top of sphinx, while JupySQL's docs is on top of Jupyter Book (another layer of abstraction on top of sphinx), so there might be some exploration needed. We must document the process so we can apply it easily to the rest of our projects.

formatting SQL to improve readability

we could have a CLI to format SQL in notebooks:

jupysql format some-notebook.ipynb

we already have sqlparse as dependency so we can use that for formatting

compatibility with SQLAlchemy 2

Connect throws

MetaData.init() got an unexpected keyword argument 'bind'

For now specifying SQLAlchemy version better, but should actually fix

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.