Comments (18)
This is great! It should be fairly straightforward to get in, so I'll look at it right after I'm done with benchmarks ;)
from postgres.
First, really nice interfaces on the library all around... I came to mostly say the same regarding async streams... If you do open an object stream interface, that would allow for await of [1] syntax with current node releases, which I find very intuitive to use. Maybe an asReadStream() method?
I also feel awaiting any promise returned from the per-row function would be very useful, as mentioned in the issue. It's imho the only real shortcoming in this library, as it's something I've come across syncing data to DynamoDB and similar (grouping records), though slightly more complex in terms of sizing.
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of
Thanks a lot @tracker1. Did you check out the branch with the first stab at an implementation of cursors? https://github.com/porsager/postgres/tree/cursor#cursor-sql-cursorrows--1-fn---promise
I suppose sql` `.cursor() without a callback function could return an async iterable. So usage would look like this:

```js
const cursor = sql`select * from generate_series(1, 1000)`.cursor()

for await (const row of cursor) {
  // handle each row
}
```
Is it something like that you imagined?
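For reference, a minimal sketch of how such an async iterable could behave, with a mock batch fetcher standing in for the real protocol round-trips (all names here are illustrative, not the library's API):

```js
// Sketch only: a cursor() result exposing Symbol.asyncIterator, so the
// proposed `for await` shape works. fetchRows stands in for the real
// wire protocol and is an assumption.
function cursor(fetchRows) {
  return {
    async *[Symbol.asyncIterator]() {
      let batch;
      while ((batch = await fetchRows()).length > 0)
        yield* batch;
    }
  };
}

// Mock row source emitting 1..max in batches of `size`.
function mockSource(max, size) {
  let next = 1;
  return async () => {
    const rows = [];
    while (rows.length < size && next <= max)
      rows.push(next++);
    return rows;
  };
}

async function demo() {
  const seen = [];
  for await (const row of cursor(mockSource(5, 2)))
    seen.push(row);
  return seen;
}
```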
Thanks a lot for the kind words, and for the help thinking this through. It has been a really nice project to do so far.
I'm leaning towards adding support for async iterables too now, but I realize it isn't identical to the current implementation in that the query can't run immediately; it has to wait until iteration starts. That's currently implicit with the callback implementation. A hack would be to run the first query and keep the rows in memory until iteration starts, but I don't want to do that since it's basically wrong :P
It also has to return an end method on the iterator to support ending early, but that should be fine.
I think, since adding async iterator support later won't be a breaking change, I might just split this up into two releases.
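For what it's worth, breaking out of a `for await` loop already invokes the iterator's return(), so early termination can hook into a finally block; a sketch with illustrative names (the real cleanup would close the cursor and release the connection):

```js
// Sketch: `break` inside `for await` calls the iterator's return(),
// which resumes the generator at its finally block. That's a natural
// place for cursor cleanup. All names here are illustrative.
function cursor(rows) {
  let closed = false;
  const iterable = {
    async *[Symbol.asyncIterator]() {
      try {
        for (const row of rows)
          yield row;
      } finally {
        closed = true; // real impl: close the portal, release connection
      }
    }
  };
  return { iterable, wasClosed: () => closed };
}

async function demo() {
  const { iterable, wasClosed } = cursor([1, 2, 3, 4, 5]);
  for await (const row of iterable)
    if (row === 2) break; // triggers return() -> finally
  return wasClosed();
}
```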
My biggest complaints about some packages relate to excessive unnecessary and bloated dependencies consuming memory and depleting performance as well as convoluted interfaces that needlessly require a lot more code to accomplish an otherwise simple task. I see none of that here.
Yeah, that was also one of the reasons I got started with this in the first place, so it's really nice to hear you see it that way! I'm almost done with the benchmarks too, so it's gonna be interesting to see everything side by side. Currently I've got pg, pg-native, pg-promise, pg-promise-native and slonik. Do you know of others that would be interesting to compare?
I know pg-promise and slonik are using pg under the hood currently, but it was interesting to find that pg-promise is faster than raw pg and slonik is like 50x - 100x slower than mine.
You could also look at using streams in object mode, which afford some cool tricks like piping through a stream transform, and I believe Node's object streams naturally allow for "for await" loops. A package for mssql can do this, however it doesn't respect high water marks set by users, so even if you pause the incoming stream the stream buffer will just fill up. I've had it consume gigabytes of RAM during some testing with hundreds of thousands of rows when I specified a maximum of 200.
However, if this looks like a fast/easy fix then I'd deem it a much higher priority.
I started my reply, and realized we're off topic, so I thought it was better in its own issue which could double as improvement for the documentation. Let's continue here: #4
I've added cursor support with the callback style in 4156c3f.
I'll work on adding for await of support later. The reason I'm not adding it now is that it requires some changes to the inner workings, because it's lazy by default, and I'd like to do it properly, so some more thinking around it is needed.
I know how an object stream would work, and there's currently no upside to using that over simply implementing and returning { [Symbol.asyncIterator]: ... }.
The reason I'm not doing it right away is that the query couldn't be sent immediately; it would have to wait for the first call to next. I don't want to hack it in now because that would mean fetching the first chunk unnecessarily before next is called, and worse, it would reserve a connection to the db unnecessarily. Making it work requires a change to the current query flow which I'd like to do properly :)
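The lazy flow described here can be sketched with a stand-in for the send step, to show that nothing happens until the first next() call (names are illustrative, not the library's internals):

```js
// Sketch of laziness: the async generator body, where sendQuery would
// reserve a connection and write the query, only runs on the first
// next() call, not when the cursor object is created.
function lazyCursor(sendQuery) {
  let started = false;
  return {
    async *[Symbol.asyncIterator]() {
      started = true;      // first next() reached: now send the query
      yield* await sendQuery();
    },
    get started() { return started; }
  };
}

async function demo() {
  const cursor = lazyCursor(async () => [1, 2, 3]);
  const before = cursor.started; // false: nothing sent yet
  const out = [];
  for await (const row of cursor)
    out.push(row);
  return { before, after: cursor.started, out };
}
```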
Hi @Blitzsturm.
I'm not sure I see the benefit of using streams in object mode and for await, but that might come down to a taste/coding preference thing.
I've added support for this by adding a new .cursor() method. I think that is more in line with postgres terminology too, and it won't change / complicate the .stream() surface api.
Would you like to check it out and let me know what you think? You can do npm install porsager/postgres#cursor.
Docs are here: https://github.com/porsager/postgres/tree/cursor#cursor-sql-cursorrows--1-fn---promise
Brilliant, it works as advertised. Though I do have one last suggestion. The test below demonstrates that the cursor flow does throttle the flow and prevents asynchronous row handling from stepping on each other; however, there is a pretty significant performance gap between the two. This would make sense if it's reading one row at a time. To counter this you may want to look at buffering some rows; perhaps the initial request retrieves 1000 rows, then on each await, if that buffer is below 900, get 100 more until the query is out of rows, then let the async handler drain them until the buffer is empty, then resolve the overall query-cursor promise. That would likely give it a significant performance jump.
Ideally the semantics of how postgres handles the connection and cursor mechanics should be invisible to the end user, so doing something like this would make it look and feel the same but much faster; and while it does indeed work for its intended purpose, poor streaming performance could dissuade power users. pg-cursor allows more control over cursor behavior but can be somewhat cumbersome in code. That said, I could easily write a wrapper for the "for await" behavior if I really want it, or maybe add it in later if there is demand.
As previously mentioned, implementing a well tuned object stream would manage this buffering process for you and allow users to "for await" loop to asynchronously iterate directly from it, even allowing power users to adjust buffering behavior by manually adjusting watermarks.
Overall, it's looking great. Some fine tuning for performance and I'd call it ready for production use.
Edit: also, the documentation is missing the "async": ".cursor((row, cancel)" should be ".cursor(async (row, cancel)".
```js
"use strict";
const postgres = require("postgres");

Main().catch(console.error);

async function Main()
{
    var streamResults = await StreamTest();
    console.log("waiting 10 seconds between connections in tests...");
    await new Promise(r => setTimeout(r, 10000));
    var cursorResults = await CursorTest();
    console.log(`Stream Order (${streamResults.order.length}): ${streamResults.order.join(",")}`);
    console.log(`Cursor Order (${cursorResults.order.length}): ${cursorResults.order.join(",")}`);
    console.log(`Stream Speed: ${streamResults.speed} ms`);
    console.log(`Cursor Speed: ${cursorResults.speed} ms`);
}

async function StreamTest()
{
    const sql = postgres(process.env.DATABASE_URL);
    console.log("================================\r\nStream Ordering");
    var streamOrder = [];
    await sql`SELECT * FROM generate_series(1,20)`.stream(async (row) =>
    {
        await new Promise(r => setTimeout(r, Math.floor(Math.random() * 1000)));
        streamOrder.push(row.generate_series);
        console.log(`Stream Row: ${row.generate_series}`);
    });
    await new Promise(r => setTimeout(r, 1000)); // (Ensure all have completed)
    console.log("================================\r\nStream Speed");
    var streamStart = new Date().getTime();
    await sql`SELECT * FROM generate_series(1,1000)`.stream(async (row) =>
    {
        console.log(`Stream Row: ${row.generate_series}`);
    });
    var streamEnd = new Date().getTime();
    await sql.end();
    return {order: streamOrder, speed: streamEnd - streamStart};
}

async function CursorTest()
{
    const sql = postgres(process.env.DATABASE_URL);
    console.log("================================\r\nCursor Ordering");
    var cursorOrder = [];
    await sql`SELECT * FROM generate_series(1,20)`.cursor(async (row, cancel) =>
    {
        await new Promise(r => setTimeout(r, Math.floor(Math.random() * 1000)));
        cursorOrder.push(row.generate_series);
        console.log(`Cursor Row: ${row.generate_series}`);
    });
    await new Promise(r => setTimeout(r, 1000)); // (Ensure all have completed)
    console.log("================================\r\nCursor Speed");
    var cursorStart = new Date().getTime();
    await sql`SELECT * FROM generate_series(1,1000)`.cursor(async (row, cancel) =>
    {
        console.log(`Cursor Row: ${row.generate_series}`);
    });
    var cursorEnd = new Date().getTime();
    await sql.end();
    return {order: cursorOrder, speed: cursorEnd - cursorStart};
}
```
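The refill buffer suggested above (fill to 1000, top up by 100 whenever it dips below 900, then drain) could be sketched like this, with fetchRows standing in for a real FETCH round-trip (the function and its signature are assumptions for illustration):

```js
// Sketch of the suggested refill buffer: fill to `high` up front, top
// up by `chunk` whenever the buffer dips below `low`, then drain.
async function* bufferedCursor(fetchRows, { high = 1000, low = 900, chunk = 100 } = {}) {
  let buffer = await fetchRows(high);
  let done = buffer.length < high;    // short read: query exhausted
  while (buffer.length) {
    if (!done && buffer.length < low) {
      const more = await fetchRows(chunk);
      done = more.length < chunk;
      buffer = buffer.concat(more);
    }
    yield buffer.shift();
  }
}

async function demo() {
  // Mock source yielding 1..2500 in whatever chunk size is requested.
  let next = 1;
  const max = 2500;
  const fetchRows = async n => {
    const rows = [];
    while (rows.length < n && next <= max)
      rows.push(next++);
    return rows;
  };
  let count = 0, last = 0;
  for await (const row of bufferedCursor(fetchRows)) {
    count++;
    last = row;
  }
  return { count, last };
}
```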
@Blitzsturm sorry, afk until later, but did you see the option of returning more than one row per iteration?
What if there was also an option to control the next amount of rows by returning a special token with a rows value from the callback? E.g.

```js
sql` `.cursor(() => sql.ROWS(75))
```

Also see my iterable suggestion above.
@porsager I apologize, I somehow completely missed that. Using that makes it MUCH faster, though slightly less clean in use. Ideally I'd like it if it would still iterate only one row at a time but pull blocks of a specified amount, something like the following under the hood. However, its current state is still cleaner than pg-cursor. So, well done.
```js
await sql`SELECT * FROM generate_series(1,1000)`.cursor(100, async (rows, cancel) =>
{
    for (let row of rows)
    {
        console.log(`Cursor Row: ${row.generate_series}`);
    }
});
```
Alternatively, I suppose if you need to send blocks of x rows to an external service this is indeed more ideal. My original use case would be better off, as it wouldn't have to group rows itself.
```js
await sql`SELECT * FROM generate_series(1,1000)`.cursor(100, async (rows, cancel) =>
{
    await transformAndSend(rows);
});
```
Perhaps this could be handled as mentioned to @tracker1; I can't imagine a cleaner, faster syntax for it:
```js
for await (let row of sql`SELECT * FROM generate_series(1,1000)`.cursor(100))
{
    console.log(`Cursor Row: ${row.generate_series}`);
}
```
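Pulling blocks behind the scenes while handing back single rows is a small generator on top of a batched fetch; a sketch with a mock fetchBatch (an assumed stand-in, not the library's API):

```js
// Sketch: fetch in blocks of `size` on the wire, but yield one row at
// a time so `for await (const row of ...)` reads singly.
async function* rowsFromBatches(fetchBatch, size) {
  let batch;
  while ((batch = await fetchBatch(size)).length) {
    yield* batch;
    if (batch.length < size) break; // short batch: query exhausted
  }
}

async function demo() {
  // Mock source: rows 1..10, served in batches of up to `n`.
  let next = 1;
  const fetchBatch = async n =>
    Array.from({ length: Math.min(n, 11 - next) }, () => next++);
  const seen = [];
  for await (const row of rowsFromBatches(fetchBatch, 4))
    seen.push(row);
  return seen;
}
```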
Ah, good to hear :)
I'm not sure the expanded api surface of opening up for singular iteration while fetching large blocks is worth it compared to simply handling it in userland. It's not that big an addition for the user, and it's clearer to see what's actually going on. There are also concerns like cancellation that might be non-obvious (e.g. if you cancel halfway through a group of rows that is iterated behind the scenes per row, it's not obvious that the last half was still fetched from the db).
I'm still thinking the async iterable could be ok, but I don't see what the benefit is over the callback:
```js
await sql`
  SELECT * FROM generate_series(1,1000)
`.cursor(async row => {
  // await do something with row
})
```

vs

```js
for await (let row of sql`
  SELECT * FROM generate_series(1,1000)
`.cursor()) {
  // await do something with row
}
```
From a use perspective, not a ton of difference, other than it's "the new hotness" in JavaScript and considered a standard means to asynchronously iterate. Worst case scenario, 5 years from now someone looks at it and says "why'd they do it that way? I mean it works, but it's non-standard", in the same way many legacy libraries don't make use of async/await.
It's really a matter of individual preference; but if it's easy to build in, I don't see a reason not to sprinkle in some syntax sugar. If it was somehow monstrously difficult, I think it can be skipped.
Ultimately, in its current state with cursor support, it's an outstanding package that I prefer over pg+pg-cursor. It should satisfy any major need I can foresee, and I can't complain about anything outside of minor nitpicks. But if you find the time to throw in an extra feature, I'd be slightly happier.
Take that for what it's worth; but overall I think you've done a great job. I'll likely be using it and will let you know if I encounter any bugs or performance issues in the future.
My biggest complaints about some packages relate to excessive unnecessary and bloated dependencies consuming memory and depleting performance as well as convoluted interfaces that needlessly require a lot more code to accomplish an otherwise simple task. I see none of that here.
The only other popular database interface I can think of is knex, which just sits on top of pg, but it does have some nice quality of life features. On that topic, I'd love to see the inclusion of "insert" and "upsert" methods that handle common CRUD operations with ease.
I'd imagine something like this:
```js
var rows =
[
    {name: "John Doe", email: "[email protected]", age: 46},
    {name: "Jane Doe", email: "[email protected]", age: 44},
    {name: "Herp Derp", email: "[email protected]", age: 23}
];

var res = await sql.insert("users", ["name", "email", "age"], rows);
/*
INSERT INTO users
    (name, email, age)
VALUES
    ('John Doe', '[email protected]', 46),
    ('Jane Doe', '[email protected]', 44),
    ('Herp Derp', '[email protected]', 23)
ON CONFLICT DO NOTHING;
*/

var res = await sql.upsert("users", ["name", "email", "age"], ["email"], rows);
/*
INSERT INTO users
    (name, email, age)
VALUES
    ('John Doe', '[email protected]', 46),
    ('Jane Doe', '[email protected]', 44),
    ('Herp Derp', '[email protected]', 23)
ON CONFLICT (email) DO UPDATE SET
    name = EXCLUDED.name,
    email = EXCLUDED.email,
    age = EXCLUDED.age
*/
```
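Roughly, the statement text such an upsert helper would have to assemble can be sketched as plain string building (illustration only; a real implementation must escape identifiers and send values as bound parameters, and sql.upsert itself is a proposal, not an existing method):

```js
// Sketch of the ON CONFLICT ... DO UPDATE text a hypothetical
// sql.upsert would generate. No escaping or parameter binding here.
function upsertSql(table, columns, conflictKeys) {
  const updates = columns.map(c => `${c} = EXCLUDED.${c}`).join(', ');
  return `INSERT INTO ${table} (${columns.join(', ')}) VALUES ... ` +
         `ON CONFLICT (${conflictKeys.join(', ')}) DO UPDATE SET ${updates}`;
}
```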
In addition, it would be nice to make column names optional, so that when omitted they are retrieved from the data provided. Either by just checking the first row or something more akin to:
```js
function UniqueKeys(rows)
{
    var K = {};
    rows.forEach(r => Object.keys(r).forEach(c => K[c] = null));
    return Object.keys(K);
    // this could probably be done better with a set or indexOf()
}
```
(Where missing keys in a row are treated as null.)
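The Set variant hinted at in the comment could be as short as:

```js
// One-pass version using a Set, preserving first-seen column order.
function uniqueKeys(rows) {
  return [...new Set(rows.flatMap(Object.keys))];
}
```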
This would allow for extremely rapid/easy data updates. Imagine a use case where you have a table of user addresses and you want to update them to store latitude and longitude using the Google Maps geocoding API. The code to do so would look something like this:
```js
await sql`
    SELECT
        id,
        address,
        lat,
        lng
    FROM
        users
    WHERE
        address IS NOT NULL
        AND lat IS NULL
        AND lng IS NULL
`.cursor(100, async (rows, cancel) =>
{
    for (let row of rows)
    {
        var geo = await googleMapsGeoCode(row.address);
        row.lat = geo.lat;
        row.lng = geo.lng;
    }
    var upsRes = await sql.upsert("users", ["id"], rows);
    if (!upsRes.success) throw new Error(upsRes.errorMessage);
    // do thrown exceptions automatically call cancel?
});
```
Inserts/updates are especially common yet cumbersome, so something like this baked in would be super nice. I was planning on doing something similar in my own code; it makes sense to be able to put data back in the same way you get it out.
Did you check out the helpers section of the docs? https://github.com/porsager/postgres
Your insert example would look like this ;)

```js
await sql`insert into users ${ sql(rows, 'name', 'email', 'age') }`
```

It's a bit harder for the update case. I wouldn't go all the way and make it as short as yours, as I feel that is going too far. Currently I don't feel I want to include something that "builds" sql with special keywords the user doesn't know about. (E.g. in this case I think it's important users understand the implications of on conflict.)
Maybe there would be a way to do something like this (note: doesn't work currently):

```js
await sql`
  insert into users ${ sql(rows, 'name', 'email', 'age') }
  on conflict(email) do update set
  ${ sql({ excluded: ['name', 'email', 'age'] }) }
`
```
The helpers go a long way but still require writing out some extra code manually to get what you want. What I was building was something that works a bit like knex but allows for some more niche behavior. Virtually every database flavor has a different way to "upsert", which I often have need of. So I was working on a means to make it as easy as providing json rows to a handler.
I suppose if I want to be clever about it I could just make a function that effectively does this:

```js
await sql`
  insert into users ${sql.apply(sql, [rows].concat(Object.keys(rows[0])))}
  on conflict(email) do update set
  ${Object.keys(rows[0]).map(injectionProtection).map(o => `${o} = EXCLUDED.${o}`).join(', ')}
`;
// Not so sure this last bit would work...
```
Maybe have some form of "on conflict" helper based on field names?

```
query text ${sql.conflictIgnore}
query text ${sql.conflictUpdate}
```

Is there a way to quickly insert programmatically generated components, such as dynamic field name selections, into a query, or would that just be using "unsafe"?
Thanks! This change makes reading (very) large data sets much more scalable. An object stream would auto-handle buffering data to optimize speed while still working with "for await" syntax. Also, I can provide an example of how to implement it in something of a wrapper if that would help.
For now I'll see if I can use it in some personal/work projects and raise a ticket if anything goes wrong.