demmings / gssql Goto Github PK

View Code? Open in Web Editor NEW

20.0 20.0 2.0 6.18 MB

Use real SQL SELECT syntax to filter your Google Sheets data using a simple custom function.

Home Page: https://demmings.github.io/

License: GNU General Public License v3.0

JavaScript 100.00%

apps-script gas-library google google-apps-script join query select spreadsheet sql

gssql's Introduction

Hi My name is Chris Demmings

Sheets Custom Function Developer

Retired from programming server side applications in the school notifications industry. (your kid was absent (or will be absent) messages via phone, email, text...)

🌍 I'm based in Greater Toronto Area (GTA).
🖥️ See my portfolio at My Home Page
🚀 I'm currently working on gsSQL
🧠 I'm learning Google Sheets Javascript

Skills

Socials

Badges

My GitHub Stats

Top Repositories

gssql's People

Contributors

Stargazers

Watchers

Forkers

gssql's Issues

Use column LETTER as field name.

gsSQL currently requires unique column titles for the input data range.
Currently, the column titles are used as the field name.
The suggested enhancement would be to use the column letter as field name when column titles are not available.
The column letter will use relative position in the data range, so the first column in the range will be column 'A'.
This is different than the current Sheets implementation where the physical column name is always used.
The problem with the Google approach is that when columns are inserted/remove prior to the range, the QUERY fails since the columns are different.

? Is there a good way to know that no column titles exist and therefore expect letters for columns, OR should we just analyze the SELECT statement and figure out if only letters are used (only necessary if no table definition).

Add a new table definition parameter to indicate if column titles are available.

Nested SELECT FROM (subquery) Fails

This nested SELECT FROM is failing to be parse properly.

        select invoice from (select invoice from (select * from booksales where customer_id = 'C1') as c1sales) as mysales

Join on multiple conditions fails.

JOIN table on one condition works, but two or more conditions fails.
Gets error message: Error: Invalid JOIN TABLE ON

select * from booksales join bookreturns on booksales.quantity = bookreturns.quantity and booksales.price = bookreturns.price

Date comparison not valid when NON DATE characters are in column.

=gsSQL("select * from booksales where Date > '5/3/2022' ", "booksales", BookSales!A1:H)

is including records with TEXT CHARACTERS in the Date column. My comparisons to mySQL don't have this issue since I only have DATES in that SQL table. However, sheets can have anything - so I would probably think those records should be skipped.

Note This specific example may not be solvable.
When a comparison is to be made and either side of the logical comparison is an instance of Date, both sides are converted to a JS date. So the problem here is that if a column data in 'Date' is character data and '5/3/22' is character data - a regular ASCII comparison would be made. If however, '5/3/2022' references a CELL (bind variables) - it would be taken to be a date since Sheets would automatically converted it into a JS date

e.g. cell C1 was a date.

=gsSQL("select * from booksales where Date > ?1 ", "booksales", BookSales!A1:H, true, C1)

NOT IN BUG

NOT IN BUG PLEASE HELP FIX IT

library id not working

1ZfedAgGG2K5kPLC2NPfe0Kb1xAg-0gvmliR3V8pRNk6DZMTUQyCbMW1W

not working

WHERE Correlated Sub Query used in comparison FAILS.

When the correlated sub-query is used in WHERE and the results are compared (rather than using EXISTS or NOT EXISTS), it fails.
An example SQL that fails:

   SELECT *
  FROM booksales as b1 
  WHERE price = (select max(booksales.price) from booksales where booksales.customer_id = b1.customer_id)

Select FROM sub-query that uses LIKE may fail.

For example:

=gsSQL("Select * from (select * from books where title like 'Your%') as mybooks")

The PARSER is converting the sub-query of 'Your%' to 'YOUR%' and is failing to match any records.
The conversion of the select into the Abstract Syntax Tree is not parsing properly.
If the data actually contained YOUR, it would match those records - but that would just be plain luck.

Large table tests.

No tests have been conducted where two (or more) large tables are joined.
It would be interesting to see if any optimizations would improve any JOIN table - and at what point would it be necessary.
How big can the tables be before the system is not useable.
In my current use, my largest table is around 5K records - but it only joins to tiny tables - so this is indicitive of nothing.
Optimizing the code will add complexity and may not yield significant improvements for typical non-commerical use - so this worth investigating if we find that we hit a performance wall at a specific number of records.

Select from SUB-QUERY and no table alias.

When doing a sub-query for FROM, you must specify a table alias for the derived table.
For this example, the derived table name is myTable

Select * from (select * from books where title like 'Your%') as myTable

If you forget to include the derived table name, the returned error is not helpful:

TypeError: Cannot read properties of undefined (reading 'toUpperCase') (line 1709)

README cleanup.

Cleaner description of gsSQL purpose.
List of all supported SELECT statement capabilities.
Better syntax description.
More real world examples with screen capture output.
Wiki needs documentation for every supported command and option.

Sonarcloud code smell.

An outstanding code smell relating to function complexity still needs improvement.

https://sonarcloud.io/project/issues?resolved=false&types=CODE_SMELL&id=demmings_gsSQL&open=AYMZr7v0NhuguHqJu8bE

Ultimate goal for SimpleParser.js is to reorganize related functions into classes, rather than global functions.
The primary function of this file is sql2ast() - which I have started to simplify, still needs work.

Select from SUBQUERY that contains JOIN fails.

ERROR When the subquery contains a JOIN table clause that is used to create a derived table used in FROM clause.

select table3.invoice, table3.name from (select * from (select invoice, table1.QTY as quantity, customers.name from (select invoice, quantit
y as QTY, customer_id from booksales where quantity <= 10) as table1 join customers on table1.customer_id = customers.id where customer_id = 'C1')
as table2) as table3

Error message.

TypeError: Cannot read properties of undefined (reading 'tableInfo')

Data with single quotes AND calculated fields.

If any SELECTED fields contained data with a single quote and a calculated field was used, a string error was reported.
The field with the single quote does not need to be anything in the calculated field - it still fails.
The issue is related to how calculated fields are resolved. Javascript code is created for each calculated field, and all field data is assigned to variables and the appropriate function is created. The Function is evaluated. Anything that is a STRING is surrounded by single quotes - which fails when a single quote is embedded in the data.

Correlated Subquery not supported.

select id, title, (select count(*) from booksales where books.id = booksales.book_id) from books
Where the main or outer query relies on information obtained from the inner query.
The value of books.id from outer query needs to be available to inner query select count(*) from booksales where books.id = booksales.book_id)
This version does not support this feature, so I'm calling it an enhancement.

Where LIKE is not filtering properly.

The wildcard lookup is including rows that match in the middle of the string, but it should only find the match at the start of the string. For example:

Using data from our test sheet, the mysql SQL returns the following (which is used as gospel for accuracy)

mysql> select id, title, author_id from books where title like 'Your%';
+------+-----------------+-----------+
| id   | title           | author_id |
+------+-----------------+-----------+
| 2    | Your Trip       | 15        |
| 6    | Your Happy Life | 15        |
+------+-----------------+-----------+
2 rows in set (0.02 sec)

but

=gsSQL("select id, title, author_id from books where title like 'Your%'")

id | title | author_id
-- | -- | --
2 | Your Trip | 15
4 | Dream Your Life | 11
6 | Your Happy Life | 15

It matched Dream Your Life, and it should not have.

Where NOT LIKE fails when field is NULL.

The NOT LIKE should evaluate to FALSE when the field is NULL. It evaluates to TRUE.

WHERE customers.email NOT LIKE '%gmail.com'

If email is NULL, it should evaluate to FALSE - it however evaluates to TRUE.

select ORDER BY expression

ORDER BY only supports sorting by a column name.
The following example works:

select * from booksales order by customer_id asc

Evaluated sort expressions throw an error.
The following example fails on the ORDER BY to sort:

select * from booksales order by customer_id asc, convert(day(date), char) + convert(year(date), char) desc

Tighten UP Caching.

The long term cache is stored in the SCRIPT properties (> 21600 seconds). The problem however is that it has a very limited size and no automatic cache expiry (the script itself handles this).
The script needs to check the size of the data before putting it into long term cache, and it would be easiest to just reduce the cache timeout to < 21600 seconds in that scenario.
The tables cached in long term need to be very small, so we should just adjust automatically when a big request is made.
When there is an error updating the cache (primarily too big), the script just throws an error. This should be handled more seemlessly.
When the script gets a request to cache a large amount of data, it breaks it up into checks and then re-assembles later when requested. The size of the chunks is just a rough estimate of the number of records that would fit into a chunk sucessfully. This needs to be cleaned up. We know from Google what the limits are, so the script should store more or fewer records by checking the JSON block before storing - and then reduce if too much.

LEFT/RIGHT Join using condition to select left/right table.

In a left or right join, the condition itself is used to determine which table is left or right.
It should be taking the FROM table as LEFT and JOIN table as right.
For example, this is not working correctly. It assumes AUTHORS is the left table, but it should be BOOKS.

select * from books left join authors on authors.id = books.author_id

Aggregate Calculated field within SELECT fields with GROUP BY failing.

gsSQL() is failing in the case where there is a calculation in the select field list involving aggregate functions. For example:

select author_id, count(translators.id), min(editor_id), (min(books.editor_id)-count(translators.id)) as test from books left join translators on books.translator_id = translators.id group by  author_id

The (min(books.editor_id)-count(translators.id)) is not able to be processed and we receive an error.

Error: Invalid select field: (min(books.editor_id)-count(translators.id)) (line 2835).

Error when CONCAT_WS() receives MONTH() as param

I attempted to execute the following statement:
SELECT CONCAT_WS('/', DAY(MYTABLE.MYDATE), MONTH(MYTABLE.MYDATE), YEAR(MYTABLE.MYDATE)) AS 'MY_DATE_STRING' FROM MYTABLE
For instance, for the date 23/09/2023, I obtained the following result: 23/81/2023. This occurs because the month() function returns new Date(${parms[0]}).getMonth() + 1

It can be fixed as follows:

month(parms){
    this.referencedTableColumns.push(parms[0]);
    return `(new Date(${parms[0]}).getMonth() + 1)`;
}

Complex JOIN on Conditions not working.

Simple equality join ON conditions (including the use of AND and OR) works, however anything beyond that does not work.
So this will work:

select * from booksales join bookreturns on booksales.quantity = bookreturns.quantity and booksales.date = bookreturns.date or booksales.book_id = bookreturns.book_id

BUT this will not work...

select * from booksales inner join bookreturns on booksales.book_id = bookreturns.book_id and booksales.quantity < 10

Sql Set (union, union all,...) in subquery fails.

The gsSQL returns an error for the following statement which should work:

select id,  first_name, last_name, count(*)
from (select id, first_name, last_name from editors union all 
           select id, first_name, last_name from authors union all 
          select id, first_name, last_name from translators) as test 
group by id, first_name, last_name

It appears on first inspection that the parsing to AST (abstract syntax tree) is not outputing expected format.

max(date_column) fails.

MIN and MAX aggregate functions when used on DATE info in a column returns 0.
For example:

select email, max(order_date) as maxDate from billing group by email

Nested SELECT statements 2 Layers Deep Fails.

=gsSQL("select * from books where id in (select book_id from booksales where price > (select avg(price) from booksales))")
Internally, the SimpleParser.js which converts to AST (abstract syntax tree) fails to convert the innermost SELECT.

select fields from (subquery) fails.

For example:

    select 
        score.customer_id, score.wins, score.loss, (score.wins / (score.wins+score.loss)) as rate 
    from 
        (select 
             customer_id, sum(case when quantity < 100 then 1 else 0 end) as wins, sum(case when quantity >= 100 then 1 else 0 end) as loss 
         from booksales 
         group by customer_id) as score

Currently after FROM, it expects a table. The derived table score in this example should be acceptable, but in fact fails.

WHERE EXISTS not supported

The test for any records in a sub-query is not yet supported. For example:

   select * from customers where exists (select * from booksales where booksales.customer_id = customers.id)

select count(distinct column1), count(distinct column2) from table

select count(distinct column) from table - Not supported.
If it was supported, having two count distinct would require a major rework of how it would be accomplished.
Supporting just one count distinct column would be supportable with current framework.
Select with group by not correct: select count(customer_id) from booksales group by customer_id - FAILS
select customer_id, count(customer_id) from booksales group by customer_id having count(customer_id) > 1 - SUCCESS
select customer_id, count(*) from booksales group by customer_id having count(*) > 1 - FAILS
Select count requires a field name, and count(*) does not work.

IN GENERAL ==> count() needs some work

charIndex Function fails when data starts with digits.

If you have data in order_id column like:

123456-1

and your charindex is like:

charindex('-', order_id)

It fails.
I suspect my other functions that try to parse data that is expected to be text that may have been read as float will fail also.

SQL function parameter with quoted comma.

The following SELECT where the function contains a string and that string has a comma, is generating an error:

Select concat(invoice, ', ', price) from  booksales

generates error:

Error: Invalid select field: concat(invoice, ', ', price) (line 2663).

but the output should be like:

mysql> Select concat(invoice, ', ', price) from  booksales;
+------------------------------+
| concat(invoice, ', ', price) |
+------------------------------+
| I7200, 34.95                 |
| I7201, 29.95                 |
| I7201, 18.99                 |
| I7202, 59.99                 |
| I7203, 90                    |
| I7204, 65.49                 |
| I7204, 24.95                 |
| I7204, 19.99                 |
| I7205, 33.97                 |
| I7206, 17.99                 |
+------------------------------+
10 rows in set (0.00 sec)

Mixed Case Keywords Fails

A select like select * from table is fine, SELECT * from table is fine, but Select * from table does not work.
The SimpleParser.js is converting the keywords to search for either all UPPER case or all lower case and then examines the select statement.
Forcing the SELECT to either upper or lower case and then searching - failed - so more invegtiation is needed.

Paralell Table Load

When several tables are referenced in a SELECT, there is a noticeable loading delay.
Each table is loaded sequentially using the SpreadsheetApp.getActiveSpreadsheet().getRangeByName() command, which can be quite slow depending on the table.
There is a potential opportunity for improvement by making the requests in separate threads and then continuing once all are loaded.
It is possible that the Google API will either throttle the script OR it orders the requests sequentially - so there might not be any loading improvements.
Its worth a shot though.

Group By and Order By Fail if column not in selected columns.

select price, quantity from booksales group by invoice FAILS
select price, quantity from booksales order by invoice FAILS

Refresh result when data changes.

The gsSQL() select is re-run WHEN
- the sheet is loaded
- Data in one of the parameters to the function changes.
The problem is when the referenced "TABLE" data changes and the results of the SELECT would also change - it is not immediately updated.
One KLUDGE is to add one final parameter AFTER the bind variables that links to a checkbox. Then when you click it, the SELECT is refreshed.
- One issue however is the CACHING. If the checkbox is clicked back to back and the default caching time of 60 seconds has not elapsed - the refresh works on data loaded within the last 60 seconds.
- The default cache of 60 could be changed, or even set in the table definition parameter - but it would be nice to have ALL cache data removed in the case where the check mark is clicked.
- This may involve running a script from the checkbox to do this.....we need to look into this.

Misleading error message for invalid ORDER BY

When an invalid keyword is used to sort by ascending or descending - we get a strange error onscreen.
For example using the following on our test sheet:

select * from bookSales order by DATE DSC, customer_id asc

Gives the following convoluted error:

"Calculated Field Error: Unexpected identifier.  let INVOICE = 'I7200';let BOOKSALES = {};BOOKSALES.INVOICE = 'I7200';let BOOK_ID = '9';BOOKSALES.BOOK_ID = '9';let CUSTOMER_ID = 'C1';BOOKSALES.CUSTOMER_ID = 'C1';let QUANTITY = 10;BOOKSALES.QUANTITY = 10;let PRICE = 34.95;BOOKSALES.PRICE = 34.95;let DATE = '05/01/2022';BOOKSALES.DATE = '05/01/2022'; return DATE DSC"

Calculated field error in select statement

hi,

first of all, thank you very much for the amazing library. it saved my life. I cannot find a word to express my feelings.

I have a query as below that works fine in gsSQL

"Select d.account_name, d.account_id,count(d.device_id) as devicetotal,a.account_name,a.account_id,min(a.number_of_active_devices) as accounttotal
from d full join a on d.account_id=a.account_id group by d.account_name, d.account_id,a.account_name,a.account_id"

when I added a calculated field to the statement as follows I got an error :

"
Select d.account_name, d.account_id,count(d.device_id) as devicetotal,a.account_name,a.account_id,min(a.number_of_active_devices) as accounttotal, (devicetotal-accounttotal) as sss
from d full join a on d.account_id=a.account_id group by d.account_name, d.account_id,a.account_name,a.account_id
"

the error I got is: Invalid select field: (devicetotal-accounttotal) (line 2835)

I checked your examples and I see no difference. What is the problem? I Appreciate if you help me in this manner.

Thank you again,

Tolga

dateDiff() works in morning, not in afternoon.

In order for my tests to always work, I added this as a test for dateDiff() so that it would always return '7'.
There is something wrong with the ROUNDING. I call FLOOR, but it appears to be rounding up. Anyway, not sure at this point.

The only weird thing is that the tests always seems to work when run in Google Sheets. It just appears to only fail in NODE.JS. Anyway, it causes my tests to sometimes fail.

datediff(adddate(curDate(),7), now())

Bind Variables in Correlated Sub-Query not working.

If a bind variable ('?') is inserted into the correlated sub-query, it will fail. For example:

select * from booksales as b1 where price in (select max(price) from booksales where b1.customer_id = customer_id and book_id <> ?)

The way that correlated sub-queries were solved was by inserting a bind variable for every outer table field reference in the inner query. This needs to be resolved.

where NOT LIKE excluding some it shouldn't.

A previous where LIKE issue, this was missed for the NOT LIKE.
The book title with 'Your' in the middle should not be excluded, but it is.
ex: Dream Your Life
The current example should only exclude those records STARTING with 'Your'.

=gsSQL("select * from books where title not like 'Your%'")

Field ALIAS used in Calculations.

Currently a field alias is ONLY used as a column title output. For example Concatenated is only available for column title output.

`select concat_ws('-', *) as Concatenated from booksales `

It would be more useful if the field alias were also available for comparisons. This is especially true if the field itself is a calculation and it is quite long, such as Sales in the following example:

`select booksales.invoice as 'Invoice', booksales.quantity as 'Quantity', booksales.price as 'Price', booksales.quantity * booksales.price as 'Sales' from booksales` ORDER BY Sales

If we need to make a selection of records based on sales, we can't just use something like where Sales > 500, we would have to do where booksales.quantity * booksales.price > 500
NOTE - after trying with mySQL, the alias for a calculation cannot be used in WHERE condition, but it can be used in ORDER BY.
ORDER BY alias DOES NOT WORK in gsSQL

Wiki Document

The wiki needs work.

detailed documentation about all use cases.
more real world test cases and output. It is easier to see it used, rather than describing how to use it.

Sonarcloud and REGEX use.

Getting a warning about a regex used to extract ORDER BY
Not currently causing any problems, other than a warning.

Make sure the regex used here, which is vulnerable to super-linear runtime due to backtracking, cannot lead to denial of service.

Line of code that is at issue

const order_by = /^(.+?)(\s+ASC|DESC)?$/gi;

Not entirely sure how it can be replaced.
Maybe this? (have to check in code to see if it resolves)

^(.*?)(\s+ASC|DESC)?$

Multiple sets (UNION, UNION ALL, ...) are not resolved in correct order.

The following select statment

   select * from authors UNION ALL select * from editors UNION select * from translators

is being resolved from right to left. The UNION at the right should remove all duplicates, but because EDITORS and TRANSLATORS table is resolved first and then UNION ALL with authors is done next, it leaves duplicates in the result.

GROUP BY with Calculated Field Fails.

Grouping by a calculated field does not work.

select month(date) from booksales group by month(date)

Returns error:

TypeError: Cannot read properties of undefined (reading 'selectColumn') (line 3938).

gsSqlTest Sheet. Compare actual sheet results to expected results.

I have already added the custom menu option to convert all of my tests in the test script, to actual gsSQL() functions on the sheet.
Now the expected results ALSO needs to be included in the generation of the sheet. Probably just to the right of the custom function results.
An automatic comparison of ACTUAL to EXPECTED should be performed.
A summary of results should be displayed in a table for quick analysis to ensure all test SELECT statements work as expected.

COUNT(field) when NULL is incorrect.

When you LEFT JOIN tables and NULLs are returned for missing data, the COUNT() function is incrementing the counter for a NULL in that field (it should not). In my data, the books tables has empty translator_id fields for some books - yet they are counted.

For example:

select author_id, count(translators.id) from books left join translators on books.translator_id = translators.id group by  author_id", "translators

Select * (wildcard) fields incorrect on JOIN.

The select all fields * wildcard is not returning all the data from the JOIN.
Only the fields from the primary (first) table are returned in that case.
There is no problem to return data from the joined table as long as it is referenced by the field name in the select.
For example:

SELECT * FROM books INNER JOIN authors ON books.author_id = authors.id ORDER BY books.id

This SELECT only returns the data from BOOKS table.

JOIN ON is case sensitive.

gsSQL is case sensitive when doing join comparison. The following statement works in mySQL, but it returns nothing in gsSQL()

select customers.id, min(bookreturns.price) from bookreturns join customers on bookreturns.customer_id = customers.id group by customers.id

The fix (for now) is to force to a specific case like:

select customers.id, min(bookreturns.price) from bookreturns join customers on upper(bookreturns.customer_id) = customers.id group by customers.id