joist-orm: a TypeScript ORM for Postgres
Home Page: https://joist-orm.io/
In the docs for `EntityManager#findOrCreate`, the documented parameters are not in the same order as the method parameters: `upsert` and `ifNew` are reversed.
As seen in the docs:
```ts
class Author extends AuthorCodegen {
  constructor(em: EntityManager, opts: AuthorOpts) {
    super(em, opts);
  }
}

authorConfig.beforeFlush(async () => ...);
authorConfig.afterCommit(async () => ...);
```
At first glance, this feels confusing. As a developer, I'd assume the hooks should be placed on the model/entity. Something like this could make more sense:
```ts
class Author extends AuthorCodegen {
  static beforeFlush = [
    async () => ...,
    async () => ...,
  ];
}
```
Secondly, coming from Rails, it's easy to end up with hooks scattered across different parts of the app ("hook hell"), losing track of where changes are being made and which hooks fire.
Why not just use the de facto standard connection URI string, e.g. `DATABASE_URI`? Stuffing JSON into an environment variable is difficult in deployments that get the connection information from the environment.
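As a sketch of why the URI form is convenient: a standard connection string splits into discrete options with nothing but the built-in `URL` parser, no JSON-in-env required (the example URI and option names below are illustrative assumptions, not Joist config):

```typescript
// Parse a de-facto-standard Postgres connection URI into discrete options.
// In a real app this string would come from e.g. process.env.DATABASE_URI.
const uri = "postgres://joist:local@db:5432/joist";
const url = new URL(uri);
const connection = {
  host: url.hostname,
  port: Number(url.port || 5432),
  user: decodeURIComponent(url.username),
  password: decodeURIComponent(url.password),
  database: url.pathname.replace(/^\//, ""),
};
console.log(connection);
```

Note that `pg`'s `Client`/`Pool` also accept a `connectionString` option directly, so Joist could likely pass the URI straight through.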
Support things like:
```ts
// Currently what we support
await expect(author).toMatchEntity({
  firstName: "a1",
  books: [{ title: "b1" }],
});

// Currently our version only supports `author: { id: ... }`; we need to
// know "do you expect to drill any further into this entity?"
await expect(book).toMatchEntity({
  author: author,
});

// Add support for lists of entities
await expect(author.books.get).toMatchEntities([
  { title: "b1" },
  { title: "b1" },
]);

// Add support for unloaded collections?
await expect(author.books).toMatchEntities([
  { title: "b1" },
  { title: "b1" },
]);
```
When an m2o changes, it'd be nice to be able to call `entity.changes.originalParent` and have it handle "is this a new/loaded entity?" and "is this just an id?". It would probably need to be async.
In tests, it would be a large DX win to be able to traverse the object graph with all relations already loaded. I.e. we could have factories return `DeepNew<...>` versions of entities that just assume all relations are always loaded.

This would require either:

- the entire DB already being loaded into the `EntityManager` (in tests, the data set is usually small enough for this to be doable), or
- the `DeepNew<...>` type being just a heuristic that is "usually true", but might still fail at runtime if the test / code-under-test did not actually trigger the relation to be loaded. For tests that use a single `EntityManager`, this heuristic would likely always hold. For tests that use multiple `EntityManager`s, i.e. resolver tests, it would probably be "usually true", but not 100%.
Just thinking out loud, we could envision an `EntityManager.loadEntireDatabase` that just pulled the entire db into memory. A naive implementation, that just did one `SELECT` per table plus one `SELECT` per o2m/m2m/etc. relation, seems like it'd be incredibly inefficient and not doable.

A fancier implementation idea would be to leverage the `flush_database` trick and use a `select_all_tables` stored procedure that returned a `SELECT *` across all tables as a `UNION`. This would give us a single wire call + result set with "all the things". It would be a huge mess, but we could have a dedicated/internal method that knew how to decipher the mega-result set into a) individual entities and b) force/preloaded o2m/m2m/o2o collections, appropriately.
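Purely as a sketch (the table names here are assumptions), one way the mega-result set could be shaped is to tag each branch with its source table and serialize rows to json, so heterogeneous row types can be `UNION`ed:

```sql
-- Each branch tags its rows with the source table and serializes them to
-- json, so one result set can carry every table's rows.
SELECT 'authors' AS _table, to_json(a) AS _row FROM authors a
UNION ALL
SELECT 'books', to_json(b) FROM books b
UNION ALL
SELECT 'book_reviews', to_json(br) FROM book_reviews br;
```

The dedicated/internal method would then bucket rows by `_table` and hydrate entities per bucket.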
As a full knex replacement:
https://www.jakso.me/blog/kysely-a-type-safe-sql-query-builder-for-typescript

When auto-batching ~100+ `em.find` calls, the performance gets terrible. Is this inherent, or due to a bad implementation detail?
Currently a `first_name` column that is `not null` in the db is represented as `Author.firstName: string`, i.e. non-optional / non-null / non-undefined in the domain model.

This is the most convenient (and accurate) representation for already-persisted / already-validated entities, as it simplifies business logic that assumes the field is required, as well as mapping to APIs, i.e. GraphQL APIs with `firstName: String!`, where `firstName` is exposed to the outside world as "a required string".

(If `Author.firstName` was modeled as `string | undefined`, then the API layer exposing `firstName` would have to constantly do an `if (!firstName) fail()` check, even though in the common/read case it will always be set.)

Joist generally enforces field required-ness on creation of the entity, i.e. `em.create(Author, { firstName: ... })` requires a `string` (at least at the type system level).

However, while creating/updating the domain model through methods like `Author.setPartial` or `em.createOrUpdatePartial`, the `firstName` is allowed to temporarily become `undefined`, to trigger validation rules about the invalidity of the missing `firstName`.

This means that the types can transiently be incorrect, i.e. `Author.firstName` might return `undefined`.
A few options to avoid this:

Prevent `Author.firstName` from ever becoming `undefined`, by rejecting the `setPartial` or `createOrUpdatePartial` with a validation error (i.e. the "parse, don't validate" approach).

This is probably the ideal, but has some wrinkles:

a) What about bulk updates, i.e. what is the API for bulk updating 20 `Author`s? (Seems solvable.)

b) Would validation rules become split between "pre-`undefined` / pre-mutation checks" (indicated via a `ValidationError` before entities are even mutated with the proposed new values) and "post-`undefined` / post-mutation checks" (i.e. a similar `ValidationError`, but after entities have been mutated (with null-safe values) and then go on to fail other "more than just required" validation rules)?

I.e. more complex validation rules are cross-entity and cross-field, and need a way to access both the incoming/new values as well as the existing/current values (unless those are changing as well).

These sorts of rules seem best expressed in the "not parsing" approach, i.e. by just assuming the entities exist as they are / have been mutated, and then reaching into them to validate the current state.

And if we conclude that "post-mutation" validation rules are generally necessary/good, then it seems simplest to have all validation rules work that way, instead of creating a pre-mutation/post-mutation split.
A more tactical approach is to just return empty states from getters if/while the entity happens to be invalid, i.e.:

```ts
const a = em.create(Author, { firstName: "bob" });
a.setPartial({ firstName: null });
console.log(a.firstName); // prints empty string
```

Should `book.author.get` (which is required) also return a default value while it's transiently unset?

Or: do nothing, and keep the "most of the time"-ness aspect of `Author.firstName`.
To a certain extent, in information management systems (i.e. not missile guidance systems), a tongue-in-cheek observation is: is any field ever really required? I.e. while going through data entry/validation/etc., it is common/normal for a field to a) not be set yet, and b) still be in an invalid state, i.e. don't allow it to persist, but it temporarily exists. Maybe that's fine.

This could also be a pragmatism of JavaScript/TypeScript, in that yes, "it's bad" that `Author.firstName` is `undefined` when the type system thinks it should be `string`; but nothing is going to segfault. At best, a validation error will happen and the transaction will be rolled back; at worst, an NPE will happen.

Such that `Author.firstName: string` becomes not a guarantee, but an assertion of intent (except for persisted data, where it remains an invariant; this issue is solely about transient mutation of domain objects).
Pass hooks a partialized version of entities.

Most of the consumers of "wait, `Author.firstName` is undefined" would likely be internal validation rules and hooks that are observing mid-mutation entities. Given that we pass hooks their entity, we could use a mapped type to partialize all of the fields, so that in `config.beforeFlush(author => ...)` the `author.firstName` would actually be typed as possibly `undefined`.
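A minimal sketch of that mapped type (the `MidFlush` name and `AuthorFields` shape are my own illustrations, not Joist API):

```typescript
// Make every field possibly-undefined, modeling an entity mid-mutation.
type MidFlush<T> = { [K in keyof T]: T[K] | undefined };

interface AuthorFields {
  firstName: string;
  age: number;
}

// A hook written against MidFlush<AuthorFields> is forced to handle the
// transiently-missing case instead of assuming `firstName` is set.
function beforeFlush(author: MidFlush<AuthorFields>): string {
  return author.firstName ?? "(unset)";
}

console.log(beforeFlush({ firstName: undefined, age: 40 })); // "(unset)"
```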
(Also `book.author.get`; we've had actual bugs from this.)

With a constraint that looks like this:

```sql
alter table projects
  add constraint project_has_client_as_collaborator
  FOREIGN KEY (id, client_id)
  REFERENCES project_collaborators (project_id, homeowner_id)
  DEFERRABLE INITIALLY DEFERRED;
```
Results in the generation of incorrect interface/class fields:

```ts
export interface ProjectOpts {
  ...
  id: ProjectCollaborator;
}

export interface ProjectFilter {
  id?: ValueFilter<ProjectId, never>;
  ...
  id?: EntityFilter<ProjectCollaborator, ProjectCollaboratorId, FilterOf<ProjectCollaborator>, never>;
}

export interface ProjectOrder {
  id?: OrderBy;
  ...
  id?: ProjectCollaboratorOrder;
}

export class ProjectCodegen {
  ...
  readonly id: Reference<Project, ProjectCollaborator, never> = new ManyToOneReference<
    Project,
    ProjectCollaborator,
    never
  >(this as any, ProjectCollaborator, "id", "projects", true);
  ...
}
```
The suggested solution is to filter out foreign keys that span multiple columns.
Automatically add `deleted_at: null` to any `em.find`s that don't otherwise include it.

Instead of:

```ts
await Promise.all(
  (await ptm.load((ptm) => ptm.project.stages.projectItems))
    .filter((pi) => pi.isSelection)
    .map((pi) => pi.collaboration.load()),
);
```
Be able to express this as:

```ts
await ptm.load((ptm) => ptm.project.stages.projectItems(pi => pi.isSelection).collaboration);
```

Or maybe even `find`-style clauses that could push down into a `WHERE` clause:

```ts
await ptm.load((ptm) => ptm.project.stages.projectItems({ homeownerSelection: null }).collaboration);
```
Unique constraint violations are often a result of user input, and thus require user-friendly errors on how to resolve them. A nice quality-of-life improvement would be for Joist to catch `unique_violation` Postgres errors and re-raise them as `ValidationError`s (or equivalent) with a generated user-friendly message. This would save the caller from duplicating logic to either pre-query for a value matching the constraint or attempt to parse the error string.

A message using the column name(s) within the constraint (pulled at codegen time), such as `{columnName} has already been taken`, could be generated by default, with an optional config override:
```ts
config.uniqueConstraint("activities_unique_unit_id_sub_milestone_id", {
  message: "This Submilestone has already been added to the parent Milestone.",
});
```
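A hedged sketch of the mechanics: Postgres reports unique violations with SQLSTATE `23505`, and the `pg` driver surfaces the violated `constraint` name on the error, which could key into codegen'd or configured messages. The `ValidationError` class and message table below are illustrative assumptions, not Joist API:

```typescript
// Map constraint names (known at codegen time) to friendly messages.
class ValidationError extends Error {}

const constraintMessages: Record<string, string> = {
  // default generated from the constraint's column names:
  authors_unique_email: "email has already been taken",
  // user-provided override:
  activities_unique_unit_id_sub_milestone_id:
    "This Submilestone has already been added to the parent Milestone.",
};

interface PgErrorFields {
  code?: string;
  constraint?: string;
}

function translateUniqueViolation(err: Error & PgErrorFields): Error {
  // 23505 is Postgres's unique_violation SQLSTATE.
  if (err.code === "23505" && err.constraint && constraintMessages[err.constraint]) {
    return new ValidationError(constraintMessages[err.constraint]);
  }
  // Anything else passes through untranslated.
  return err;
}
```

Joist's flush path could call something like this in its catch block and rethrow the result.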
Follow up to #258.

This is kind of an implementation detail, but currently our `Loaded` type looks like:

```ts
export type Loaded<T extends Entity, H extends LoadHint<T>> = T &
  {
    [K in keyof T]: H extends NestedLoadHint<T>
      ? LoadedIfInNestedHint<T, K, H>
      : H extends ReadonlyArray<infer U>
      ? LoadedIfInKeyHint<T, K, U>
      : LoadedIfInKeyHint<T, K, H>;
  };
```

Note the `T &` at the beginning. This is to work around the fact that, for classes with private fields, a mapped type can never be treated as equivalent to the class.
A potential workaround for this is a `Public<T>` type:

```ts
class FooImpl { ... }
export type Foo = Public<FooImpl>;
```

Such that everyone using `Foo` thinks they're using the class, but they're really using a mapped type which has stripped out the private fields, making it so that we can now have mapped types of `Foo` that can substitute for / polymorphically satisfy `Foo` itself.
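A plausible definition of `Public<T>` (a sketch of the idea, not necessarily what Joist would ship): because `keyof` only sees public members, an identity mapped type already drops private fields.

```typescript
// keyof T only includes public keys, so this mapped type strips privates.
type Public<T> = { [K in keyof T]: T[K] };

class FooImpl {
  private secret = 1;
  name = "foo";
  greet(): string {
    return `hi ${this.name}`;
  }
}

type Foo = Public<FooImpl>;

// A plain object can now satisfy Foo, which it could never do for FooImpl:
// the private `secret` field would block structural assignability.
const f: Foo = { name: "bar", greet: () => "hi bar" };
console.log(f.greet());
```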
Granted, the `FooImpl` is kind of ugly, but given that Joist already controls imports/exports in a very deliberate way (to avoid circular reference issues), we could potentially do something like:

```ts
// Author.ts, use the normal Author name
export class Author extends AuthorCodegen { ... }

// entities.ts
import { Author as AuthorImpl } from "./Author";
export type Author = Public<AuthorImpl>;
```

And because we already have the restriction that all imports of `Author` must go through `import { Author } from "./src/entities"`, the rest of the codebase would get the mapped type version for free.
Note that, for our current `Loaded` type, this indirection/de-`class`-ification is not necessary, but I'm thinking ahead to `ReactiveHint`s, where I think we want to be able to actively remove attributes from the type, and so we would not be able to use the `T &` approach.

I think where this really becomes a problem is code like:

```ts
author.books.add(b);
```

Which is currently typed as `add(Book)`. But if you've got a mapped / loaded / reactive version of the `Book`, that's not the same type. Maybe the relations should be typed with `add(BookLike)`, which, besides being a mapped type, would be so minimal that it is "just the book entity", i.e. the core `id: BookId`.
I.e.:

```ts
const author = await em.load(Author, "a:1", "...8:00am...");

// Maybe too cute
const author = await em.load(Author, "a:1@...8:00am...");
```
I.e. being able to do:

```ts
const count = await task.children.count();
```

Across N tasks in parallel and have it still be 1 SQL query.

Bonus points if this handles WIP changes, i.e. the SQL query says "20 children", but we have local changes that will add 2 new children and remove 1 existing child. Doing this would require not a `COUNT(*)`, but bringing back a list of per-entity children ids, so that we could apply the adds/removes to the list of ids that are currently in the database.
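A sketch of the batching shape (the query runner is stubbed out; none of these names are Joist API): collect every parent id requested in a tick, run one `GROUP BY` query, and fan the results back out, defaulting missing parents to zero:

```typescript
type CountRow = { parentId: number; count: number };

// One SQL round trip for N parents: conceptually
//   SELECT parent_id, COUNT(*) FROM tasks WHERE parent_id IN (...) GROUP BY parent_id
async function batchCounts(
  parentIds: number[],
  runQuery: (ids: number[]) => Promise<CountRow[]>,
): Promise<Map<number, number>> {
  const rows = await runQuery(parentIds);
  const counts = new Map<number, number>(
    rows.map((r) => [r.parentId, r.count] as [number, number]),
  );
  // GROUP BY omits parents with zero children; fill those in explicitly.
  for (const id of parentIds) {
    if (!counts.has(id)) counts.set(id, 0);
  }
  return counts;
}
```

Handling WIP changes would then be a local adjustment per parent: add pending child-adds and subtract pending removes before returning (which, as noted above, is only exact if ids rather than bare counts come back).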
I know I am picking nits here. But I am still wondering: what is the underlying technical limitation for that?

✅ works: `project: { id: args.input.projectId }`. Results in a proper join clause and `where project.id IN (?)`; hits this case.

❌ does not work: `project: { id: { in: args.input.projectId } }`. Results in an incorrect where clause, `where project.id = ?`; hits this case.

The second case should be updated to hit the same code path as the working example.
Take this example:

```ts
const em = newEntityManager();
const company = await em.create(Company, { name: "company 1" });
const person = await em.create(Person, {
  firstName: "asdsd",
  email: "[email protected]",
  company: company, // <---- what does this do?
});
```
I noticed that we cannot set `companyId` on create of the `Product` entity. Is that intended?
- `setupTests`? In `setupTestEnv.js`, set `process.env.STAGE=local` and require `dotenv-stages`.
- `run.sh`? Set `STAGE=local` and pass `-r dotenv-stages`.
- `docker-compose` needs to load the env, done in `docker-compose.yml`? Set `STAGE=local+docker` and `NODE_OPTIONS=-r dotenv-stages`.
- `STAGE=local+docker,test` vs. `NODE_ENV=test`? This is an old RDS-ism that we don't need anymore. We could probably use something from `node-postgres` to do the `DB_...` handling for us.
Currently Joist supports m2m between entities, but not m2m between an entity and an enum.

Note that we do support enum array columns, which can often represent the same model w/o the join table, but it would be nice to support both for good measure. Enum array columns look like:

```ts
b.addColumn("development_commitments", { cost_types: enumArrayColumn("cost_type") });
```
It would be great to add to the docs which queries are supported. Examples:
I.e. support for NoSQL and support for esoteric/legacy schemas.
Currently a populate hint like:

```ts
const a1 = await em.find(Author, { id: "a:1" }, { books: "bookReviews" });
```

Issues three SQL queries:

```sql
SELECT * FROM author WHERE id = 'a:1';
SELECT * FROM books WHERE author_id IN (1);
SELECT * FROM book_reviews WHERE book_id IN (2, 3, 4, 5);
```

Most ORMs that implement preloading will issue a single SQL call:

```sql
SELECT a.*, b.*, br.*
FROM author a
JOIN books b ON b.author_id = a.id
JOIN book_reviews br ON br.book_id = b.id
WHERE a.id = 'a:1';
```
Joist does the 1st for three reasons:

1. Simplicity of implementation. The current 1-query-per-level queries leverage the same auto-batching infra that we get from using dataloader for "avoiding N+1s in a `for` loop", so we essentially got hint-based preloading for free.

2. 1-per-level queries integrate nicely with custom references/collections/async properties. Currently preloading is very generic and just calls `await ...load()` on whatever relation field is being preloaded. This makes it very easy for our other not-strictly-db-relations modeling to integrate with the prepopulate model in a very natural way. If we were to implement `JOIN`-based prepopulation, we'd have to separate the preload hint into `JOIN`-able hints vs. `load`-able hints, and if a preload hint has both, do an initial `JOIN` for the `JOIN`-able data, and then a 2nd pass to call `.load()` on the non-`JOIN`-able properties.

3. Similar to point 1, with the disclaimer that our current app is low-volume, the "1 query per level" approach has been a sufficient solution to the N+1 problem so far.
Currently `joist-graphql`'s integration, i.e. `entityResolver`, does not bother to do GraphQL look-aheads. E.g. when resolving `query { author { books { bookReviews } } }`, the code that loads the `author` or the `books` does not try to figure out that `bookReviews` is also going to be loaded.

This is for three reasons:

1. GraphQL look-aheads are just generally complicated to implement, and we haven't needed them yet.

2. Due to not using `JOIN`s for preloading (see #238), it doesn't actually matter that `entityResolver` doesn't do look-aheads, b/c they'd result in the same non-`JOIN` `.load()` calls that we do today anyway.

3. Due to Joist's / `entityResolver`'s purposeful "non 1-to-1 mapping between DB schema and GQL API", any sub-query that has filters is not something that we can automatically turn into SQL. E.g. for a query like `query { author { books(filter: { title: "a" }) { bookReviews } } }`, we purposefully let the `books.filter` be pretty free-form (to match the exact needs of the UI that uses it), and so cannot auto-`WHERE`-ize it.

Assuming #238 is implemented, the 2nd rationale would change, and it'd be worth investigating pulling in GraphQL look-aheads. Granted, the 3rd rationale is unlikely to change, so look-aheads would probably only work for non-argument / non-filtered relations.
When an entity is first created, include a `config.placeholder()` line like:

```ts
import { someEntityConfig as config } from "./entities";

export class SomeEntity { ... }

config.placeholder();
```

Where `placeholder` is a noop, but it keeps the `config` import in place, so that when the developer is ready to add custom rules/hooks, they don't have to type out / know about the `someEntityConfig as config` convention.
If the DB doesn't start up right away, you'll get a failure:

```
Creating integration-tests_db_1 ... done
docker-compose exec db ./reset.sh
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
make: *** [db] Error 2
```

Using `jwilder/dockerize` would allow us to wait until the DB is up and ready.
A large collection is an m2m or o2m that is too large to actually load as entities, so if `publisher.authors` were a `LargeCollection`, it would not have a `.load()` method, and could not be used in a validation populate hint or a reactive hint.
In the past I've generally preferred class-per-table subclasses, but it might be neat to support single-table subclasses with a scheme like: designate a `type` column that determines the subclass in `joist-codegen.json`:

```ts
UserEvent: {
  fields: {
    type: {
      subclasses: {
        COMMENTED: { name: "CommentUserEvent", fields: ["comment"] },
        LIKED: { name: "LikeUserEvent", fields: ["post", "author"] },
      },
    },
  },
}
```
Or maybe re-arranged slightly:

```ts
UserEvent: {
  subclassDiscriminator: "type",
  subclasses: {
    COMMENTED: { name: "CommentUserEvent", fields: ["comment"] },
    LIKED: { name: "LikeUserEvent", fields: ["post", "author"] },
  },
}
```
And then codegen would output:

- `UserEventCodegen.ts` as-is
- `UserEvent.ts` as-is
- `CommentUserEventCodegen.ts`, new, extends `UserEvent`, marks `comment` as required
- `CommentUserEvent.ts`, new, touched once, for subclass-specific business logic.

Questions:

- Can we have `m2o`s to "only a subclass", i.e. only a `CommentUserEvent`? In CPTI (class per table), the `m2o` FK just points to the subclass column.

Add methods like:
```ts
const hasFollower: Promise<boolean> = collaboration.entityFollowers.includes(user);
```

That would do a `SELECT * FROM entity_followers WHERE collaboration_id = 1 AND user_id = 2`, so that we could detect m2m / o2m presence without loading the entire collection.

Also create the `SELECT` via a dataloader, so that multiple field resolvers that call `.includes(...)` will be batched into a single SQL call.
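A sketch of the dataloader-style batching this implies, written without the library (all names here are assumptions, not Joist API): presence checks requested in the same tick are collected and resolved with one batch call, i.e. one `SELECT ... WHERE (collaboration_id, user_id) IN (...)`:

```typescript
type Key = { collaborationId: number; userId: number };

// Returns a per-key loader; all keys requested before the next tick are
// resolved with a single batchFn invocation.
function makeIncludesLoader(
  batchFn: (keys: Key[]) => Promise<boolean[]>,
): (key: Key) => Promise<boolean> {
  let pending: { key: Key; resolve: (v: boolean) => void }[] = [];
  return (key: Key) =>
    new Promise<boolean>((resolve) => {
      pending.push({ key, resolve });
      if (pending.length === 1) {
        // First key of this tick: schedule one flush for the whole batch.
        setTimeout(async () => {
          const batch = pending;
          pending = [];
          const results = await batchFn(batch.map((p) => p.key));
          batch.forEach((p, i) => p.resolve(results[i]));
        }, 0);
      }
    });
}
```

Two field resolvers calling `includes` in the same tick would then produce a single `batchFn` (i.e. single SQL) call.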
Basic idea:

- In `joist-codegen.json`, add a `lazy: true` flag to a given column (primitive only?)
- The field becomes a `someBigColumn: LazyColumn`
- It is loaded via `await author.someBigColumn.load()` or `await em.populate(author, [..., "someBigColumn"])`, and then accessed synchronously via `author.someBigColumn.get`
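A minimal sketch of the wrapper this implies, modeled on Joist's load-then-`.get` convention (the `LazyColumn` shape is an assumption, not existing API):

```typescript
// A lazily-loaded column: async `.load()` once, then synchronous `.get`.
class LazyColumn<T> {
  private loaded = false;
  private value: T | undefined;

  constructor(private loader: () => Promise<T>) {}

  async load(): Promise<T> {
    if (!this.loaded) {
      this.value = await this.loader();
      this.loaded = true;
    }
    return this.value as T;
  }

  get get(): T {
    if (!this.loaded) {
      throw new Error("column is not loaded; call .load() or populate it first");
    }
    return this.value as T;
  }
}
```

In codegen, the `loader` would be a batched `SELECT some_big_column FROM ... WHERE id IN (...)` keyed by entity id.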
Currently, if there are multiple reverse rules, they are treated as separate entries, e.g.:

```ts
// BookReview.ts
bookReviewConfig.addRule({ book: "author" }, () => ...);

// Book.ts
bookConfig.addRule("author", () => ...);
```

Both of these end up as reverse hints in the `Author` config, e.g.:

```ts
["books", "reviews"]
["books"]
```

But then in `followReverseHint` we invoke `books.load()` multiple times. Really, instead of multiple separate reverse hint arrays, we need a single tree of hints, i.e.:

* books
  * reviews
  * ...other collections...

And use immutable.js or immer.js for easy deep modifications.
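A sketch of that merge with plain objects (immutable.js/immer would just make the building nicer; names here are illustrative):

```typescript
// Merge multiple reverse-hint paths into one tree, so each relation in a
// shared prefix (e.g. "books") is only loaded once.
type HintTree = { [relation: string]: HintTree };

function mergeHints(paths: string[][]): HintTree {
  const root: HintTree = {};
  for (const path of paths) {
    let node = root;
    for (const relation of path) {
      node = node[relation] ??= {};
    }
  }
  return root;
}
```

`mergeHints([["books", "reviews"], ["books"]])` yields `{ books: { reviews: {} } }`, so `followReverseHint` could call `books.load()` once and then recurse into `reviews`.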
Running into this:

```
TypeError: typeKinds[kind] is not a constructor
    at /Users/mo/Desktop/code/playing-around/orm-comparison/node_modules/pg-structure/src/main.ts:171:12
    at Array.forEach (<anonymous>)
    at addTypes (/Users/mo/Desktop/code/playing-around/orm-comparison/node_modules/pg-structure/src/main.ts:166:8)
    at addObjects (/Users/mo/Desktop/code/playing-around/orm-comparison/node_modules/pg-structure/src/main.ts:368:3)
    at pgStructure (/Users/mo/Desktop/code/playing-around/orm-comparison/node_modules/pg-structure/src/main.ts:465:3)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async /Users/mo/Desktop/code/playing-around/orm-comparison/node_modules/joist-codegen/src/index.ts:190:16
```

When console logging, I see that `datemultirange` is not supported by pg-structure:
```
{
  oid: 4535,
  arrayOid: 6155,
  schemaOid: 11,
  classOid: 0,
  kind: 'm',
  category: 'R',
  notNull: false,
  default: null,
  sqlType: null,
  arrayDimension: 0,
  name: 'datemultirange',
  values: '{}',
  comment: 'multirange of dates',
  relationKind: null
} m
```
Currently we have the following hooks:

With a few thoughts:

- `afterValidation`
- `afterCommit`s that push data into external systems
- `beforeFlush` running before `beforeCreate`, b/c we want to apply defaults in `beforeCreate` that are seen by `beforeFlush`

Rough proposal: remove the `before`/`after` naming convention, and fix the ordering:

- `onCreate` (fired only on create, and before `onSave`)
- `onUpdate` (fired only on update, and before `onSave`)
- `onSave` (fired on either create or update, and after `onCreate` / `onUpdate`)
- `onDelete` (fired only on delete)
- `onCommit` (the new `afterCommit`)
AFAIU, Postgres itself (the server) has great support for one connection making multiple simultaneous in-flight queries, but the client driver, libpq, only recently gained this functionality (todo add link). And so I'm not sure if the `pg` package we use is leveraging this capability, or if it's artificially serializing queries.

I.e. for Joist, we could send `UPDATE books ...`, `INSERT authors ...`, `UPDATE book_reviews ...`, basically all of our mutations, essentially all simultaneously, but I believe/assume the driver currently serializes them.
If a field is not required, and not specified in the opts, then `newTestInstance` should not associate it with an existing entity.

If I have:

```ts
Author {
  book?: Book
}
```

Then call:

```ts
const a1 = newTestInstance(em, Author, { name: "A1", book: { title: "I'm a book by A1" } });
const a2 = newTestInstance(em, Author, { name: "A2" });

// This will fail
expect(a1.book).not.toEqual(a2.book);
```
I.e. a `beforeCreate` hook is passed an `Author` with all of the required fields typed as-is; but b/c the hook sees the entity before the validation rules have verified the types are correct, something like `author.name.startsWith` could actually fail.

Seems like we should pass the hooks a "more correct" version of the entity, with the required fields turned optional.

Note that we've solved this for validation rules, b/c the "required fields" rules run before other rules. The wrinkle is that hooks run even before the required-field rules, b/c they can set defaults for fields.