tim-patterson / jsonsql
A simple cli to run sql against json
Home Page: https://github.com/tim-patterson/jsonsql
License: MIT License
The reasons behind this are to enable:
At the same time this opportunity should be used to remove the __all__ thing and replace it with a true * expansion.
This will allow for better integration testing of native images.
The CSV parser pulls in quite a few deps and doesn't seem to be that configurable around what to consider as null, escaping of separators, etc.
The * could just be treated the same as __all__ currently is, with support to expand it in the UI or in the file sinks etc.
Edge cases
Select f.foo from (
select * from ...
) f
Any fields like foo that we can't find during semantic validation we attempt to pull out of *:
Select foo from (
select `*`["foo"] as foo from ...
)
In fact we could even push the * all the way down from the top, i.e. if it's not in the top level select we'd end up writing it out.
The only trouble here is if we had something like
Select foo from (
select * from ...
) a
join (
select * from ...
) b on ...
We could generate a coalesce, but maybe it's more correct to just throw a semantic error.
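A rough sketch of that resolution rule, in Python for brevity rather than the project's own code. The `resolve_field` function, the `SemanticError` class and the shape of `sources` (alias mapped to explicit columns plus a has-star flag) are all made up for illustration. It takes the "throw a semantic error" option for the join case instead of generating a coalesce.

```python
class SemanticError(Exception):
    """Raised when a field reference cannot be resolved unambiguously."""


def resolve_field(name, sources):
    """Resolve a field name against the subqueries in scope.

    sources maps a table alias to (explicit_columns, has_star).
    Returns (alias, expression) for the source that supplies the field.
    """
    # Prefer sources that project the field explicitly.
    explicit = [alias for alias, (cols, _) in sources.items() if name in cols]
    if len(explicit) == 1:
        return explicit[0], name
    if len(explicit) > 1:
        raise SemanticError(f"field {name!r} is ambiguous between {explicit}")

    # Otherwise fall back to pulling the field out of a source's `*`.
    starred = [alias for alias, (_, has_star) in sources.items() if has_star]
    if len(starred) == 1:
        return starred[0], f'`*`["{name}"]'
    if not starred:
        raise SemanticError(f"unknown field {name!r}")
    # Two or more `*` sources (the join case): refuse rather than coalesce.
    raise SemanticError(f"field {name!r} could come from any of {starred}")
```

With a single starred subquery `f`, `resolve_field("foo", {"f": (set(), True)})` gives back the `*` lookup expression. With two starred subqueries `a` and `b` it raises instead.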
Currently gather branches are calculated at query compile time, with a whole bunch of weird hacks to create a bunch of children with different file sources etc.
Now with the data() method these can be done at runtime.
And it might be better to override the paths via some context object that we pass into the data() method.
i.e. SELECT 1+2, to tidy up code and prevent the need for tableAlias: String? in GroupByOperator, ProjectOperator and TableScanOperator.
When exploring s3 datasets it might be useful to implement an s3 caching strategy.
Add Csv table option.
We should support spitting out the logical operator tree as SQL after any optimisations and query rewrites (select distinct, from ( select * ...) for debugging purposes.
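As a minimal sketch of what that could look like, here is a tiny operator tree rendered back to SQL text, in Python rather than the project's own code. The `TableScan`, `Filter` and `Project` classes and the `to_sql` function are hypothetical stand-ins, not the real operator classes. Every operator is wrapped as a subquery, which is verbose but keeps the output unambiguous for debugging.

```python
from dataclasses import dataclass


@dataclass
class TableScan:
    table_type: str  # e.g. "json" or "csv"
    path: str


@dataclass
class Filter:
    predicate: str  # already-rendered predicate text
    child: object


@dataclass
class Project:
    columns: list  # already-rendered select expressions
    child: object
    distinct: bool = False


def to_sql(op):
    """Render a (hypothetical) logical operator tree as SQL text."""
    if isinstance(op, TableScan):
        return f"select * from {op.table_type} '{op.path}'"
    if isinstance(op, Filter):
        return f"select * from ({to_sql(op.child)}) where {op.predicate}"
    if isinstance(op, Project):
        keyword = "select distinct" if op.distinct else "select"
        return f"{keyword} {', '.join(op.columns)} from ({to_sql(op.child)})"
    raise TypeError(f"unsupported operator: {type(op).__name__}")
```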
Currently comparison operators don't correctly handle nulls, i.e. null = null currently evaluates to true; under SQL rules it should evaluate to null.
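A small sketch of the intended behaviour, in Python with None standing in for SQL null. The function names are illustrative only and don't come from the codebase.

```python
def sql_equals(left, right):
    """Equality under SQL three-valued logic; None stands in for null."""
    if left is None or right is None:
        # Any comparison involving null is unknown, i.e. null.
        return None
    return left == right


def passes_where(predicate_result):
    """A where clause keeps a row only when the predicate is true, not null."""
    return predicate_result is True
```

So `sql_equals(None, None)` is None rather than True, and a row whose predicate comes out null is filtered out just like one whose predicate is false.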
Kafka was disabled due to graal issues.
It seems someone's already written the substitutions: https://github.com/micronaut-projects/micronaut-kafka/blob/master/kafka/src/main/java/io/micronaut/configuration/kafka/graal/KafkaSubstitutions.java
If we add a table type of dir
alongside csv and json we could do stuff like:
select filename, size from dir '/some/directory' order by size desc limit 10
or even
-- top directories ordered by volume of json files
select parent_dir, sum(size) as total_size
from dir '/some/directory'
where extension = 'json'
group by parent_dir
order by total_size desc limit 10
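A rough sketch of the rows such a dir table could produce, in Python using only the standard library. The column names are the ones used in the queries above. The `dir_rows` function itself is hypothetical, and whether filename should be the full path or just the base name is an open assumption (this uses the full path).

```python
import os


def dir_rows(root):
    """Yield one row (as a dict) per file found under root, recursively."""
    for parent, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(parent, name)
            yield {
                "filename": path,
                "parent_dir": parent,
                # Extension without the leading dot, "" when there is none.
                "extension": os.path.splitext(name)[1].lstrip("."),
                "size": os.path.getsize(path),
            }
```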
The reasoning for this is to allow stuff like
INSERT INTO csv 'create_table.sql'
DESCRIBE json 's3://...'
It could also eventually be used for stuff like
SELECT function_name, description FROM (
SHOW functions
) WHERE function_name like '%str%'
When using dot notation to access nested fields, i.e. field.subfield, the expression just gets the default alias of _col1 etc.
It should be easy enough to capture this info when building the AST to at least have a bit more of a sane default.
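One possible default, sketched in Python: use the last segment of a plain dotted path as the alias and fall back to the positional name for anything else. The `default_alias` function is made up for illustration and works on the expression text, whereas the real thing would look at the AST node. The numbering of the `_col` fallback is also an assumption.

```python
def default_alias(expression, position):
    """Pick a default column alias for a select expression.

    expression is the expression text, e.g. "field.subfield".
    position is the column index used for the positional fallback name.
    """
    parts = expression.split(".")
    # A plain dotted identifier path gets its last segment as the alias.
    if all(part.isidentifier() for part in parts):
        return parts[-1]
    # Anything more complex keeps the positional default.
    return f"_col{position}"
```

Under this rule `field.subfield` would come out as subfield, while something like `1+2` would still get the positional name.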
Add lateral view outer option
Might be needed where field names clash with keywords, or contain spaces, .'s etc.