chipsenkbeil / entity-rs
A simplistic framework based on TAO, Facebook's distributed database for social graph
Home Page: https://entity.rs
Create an mdbook in doc/ that will publish to the GitHub Pages for this repo.
Follow https://github.com/marketplace/actions/mdbook-action for the actions to use.
Check out https://stackoverflow.com/a/63461325 for a trigger that can happen when a specific path changes. Would want to update our CI to exclude our doc path and then add a new workflow for our book.
For the book's contents, model it after the books for async_graphql, serde, and async-std.
Some options such as no_builder are currently declared as type bool, which means no_builder = false is technically valid. Mechanically, that means "yes, I do want a builder," but it's not clear that a double-negative is required in this case.
darling::util::Flag provides a bool-like interface without allowing boolean literals in the attribute; instead only the word form is allowed. Does it make sense to switch over and tighten the API definition?
Rather than requiring the implementor of a database to manage internal state behind an Arc, provide a Clone implementation, and so on, it would be better for ents to hold a weak reference. This way, if a database is destroyed, all active ents lose their reference to it as well.
Since a weak reference can be created via Weak::new() without providing an initial value (upgrade will return None), we can get rid of the Option.
This will change the ent's connect(...) signature and the builder signature.
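A minimal sketch of the weak-reference idea, using only the standard library. InmemoryDatabase, PageEnt, and database_name are hypothetical stand-ins for illustration and are not the crate's actual types:

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-in for a concrete database implementation.
pub struct InmemoryDatabase {
    pub name: String,
}

// Hypothetical ent that keeps a weak reference instead of an Option<Arc<...>>.
pub struct PageEnt {
    pub id: usize,
    database: Weak<InmemoryDatabase>,
}

impl PageEnt {
    /// Creates a disconnected ent; a Weak made by Weak::new() never upgrades.
    pub fn new(id: usize) -> Self {
        Self { id, database: Weak::new() }
    }

    /// Connects the ent without keeping the database alive.
    pub fn connect(&mut self, database: &Arc<InmemoryDatabase>) {
        self.database = Arc::downgrade(database);
    }

    /// Returns the database name only while the database still exists.
    pub fn database_name(&self) -> Option<String> {
        self.database.upgrade().map(|db| db.name.clone())
    }
}
```

Once the last Arc to the database is dropped, upgrade() returns None and the ent behaves as if it had never been connected.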
A refactoring that would also impact the macro. Rather than relying on From and TryFrom being implemented in different directions, have a singular trait that provides methods to convert in both directions. That trait can then have a blanket impl for the From and TryFrom impls. This makes the derive macro easier to understand and better highlights the actual mapping.
Additionally, we should be able to convert a From<...> into the trait, defaulting the other method to always failing.
trait ValueLike: Sized {
    // Returning Result<Self, _> requires Self: Sized, hence the supertrait
    fn parse(value: Value) -> Result<Self, ValueLikeErr>;
    fn to_value(&self) -> Value;
}
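The blanket impl described above could look roughly like the following sketch. The reduced Value enum is a hypothetical stand-in, and the method names into_value and try_from_value are an assumption borrowed from the enum example later in this listing:

```rust
use std::convert::TryFrom;

// Hypothetical, reduced stand-in for entity's Value type.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Text(String),
    Int(i64),
}

pub trait ValueLike: Sized {
    fn into_value(self) -> Value;
    fn try_from_value(value: Value) -> Result<Self, Value>;
}

// Blanket impl: any type already convertible in both directions is ValueLike.
impl<T> ValueLike for T
where
    T: Into<Value> + TryFrom<Value, Error = Value>,
{
    fn into_value(self) -> Value {
        self.into()
    }

    fn try_from_value(value: Value) -> Result<Self, Value> {
        <T as TryFrom<Value>>::try_from(value)
    }
}

impl From<String> for Value {
    fn from(x: String) -> Self {
        Value::Text(x)
    }
}

impl TryFrom<Value> for String {
    type Error = Value;

    // Hands the original value back on a type mismatch.
    fn try_from(value: Value) -> Result<Self, Self::Error> {
        match value {
            Value::Text(x) => Ok(x),
            other => Err(other),
        }
    }
}
```

With this in place a derive macro only needs to target the single trait, while existing From/TryFrom pairs keep working unchanged.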
Because we will predominantly be managing a database using Box<dyn Database>, we need a trait method to enable loading and saving the database.
An alternative that, on reflection, might be better is to have Database require AsAny, similar to Ent, and provide a blanket implementation to convert to a specific database. For users of a database, my assumption is that only one kind will be used at a time, so explicitly downcasting should be enough to use database-specific options. Would need to validate that this is usable when Arc<Box<dyn Database>> is used.
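A sketch of the AsAny approach using std::any::Any. The trait shapes and the helper names as_inmemory and shared_as_inmemory are assumptions for illustration, not the crate's API. It also shows one pitfall with Arc<Box<dyn Database>>: the blanket impl also covers the Arc itself, so the wrappers need to be dereferenced before downcasting:

```rust
use std::any::Any;
use std::sync::Arc;

pub trait AsAny: Any {
    fn as_any(&self) -> &dyn Any;
}

// Blanket impl for every sized 'static type (which includes Arc and Box).
impl<T: Any> AsAny for T {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

pub trait Database: AsAny {
    fn name(&self) -> &str;
}

// Hypothetical concrete database with a database-specific option.
pub struct InmemoryDatabase {
    pub capacity: usize,
}

impl Database for InmemoryDatabase {
    fn name(&self) -> &str {
        "inmemory"
    }
}

/// Downcasts a trait object to the concrete in-memory database, if it is one.
pub fn as_inmemory(db: &dyn Database) -> Option<&InmemoryDatabase> {
    db.as_any().downcast_ref::<InmemoryDatabase>()
}

/// Same, but for the shared form; derefs through Arc and Box first.
pub fn shared_as_inmemory(db: &Arc<Box<dyn Database>>) -> Option<&InmemoryDatabase> {
    // Calling db.as_any() directly would pick the blanket impl for the Arc
    // itself, and the downcast to InmemoryDatabase would never succeed.
    as_inmemory(&***db)
}
```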
Model after https://entgo.io/docs/migrate/ or other ORMs that are following Facebook's TAO model.
My first thought is to encourage code versioning of a schema by renaming an ent, creating a new version, and having some macro that generates a conversion between the two ents. This would create a From<Old> for New and some mutation to apply to a database.
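As a rough illustration of versioned ents, the sketch below writes by hand what such a macro might generate. PageEntV1, PageEntV2, the tags field, and migrate are all hypothetical names:

```rust
// Hypothetical old version of an ent, kept around after renaming.
#[derive(Clone, Debug, PartialEq)]
pub struct PageEntV1 {
    pub id: usize,
    pub title: String,
}

// Hypothetical new version that adds a field.
#[derive(Clone, Debug, PartialEq)]
pub struct PageEntV2 {
    pub id: usize,
    pub title: String,
    pub tags: Vec<String>,
}

impl From<PageEntV1> for PageEntV2 {
    fn from(old: PageEntV1) -> Self {
        // New fields receive a default during conversion.
        Self { id: old.id, title: old.title, tags: Vec::new() }
    }
}

/// Applies the conversion to every stored ent, standing in for a database mutation.
pub fn migrate(ents: Vec<PageEntV1>) -> Vec<PageEntV2> {
    ents.into_iter().map(PageEntV2::from).collect()
}
```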
After refactoring the derive macros, I think it makes sense to throw out the old tests and make new ones as that would be faster to cover each individual derive macro and the specific error messages we want to verify, rather than repurposing the existing ones.
A common practice in libs is to provide a top level prelude module reminiscent of the stdlib that is used to bring the minimum traits that need to be imported to use trait methods.
Some examples come to mind:
Add derive(EntBuilder) and derive(EntQuery). I think typed methods should stay regardless.
While it's nice to have some included, I'm now a fan of the "batteries not included" approach. It also means we can remove the need to test for no default features if we make it such that there are no features by default.
I must have broken something as UI tests are supposed to be run on nightly. Before 0.1 is released, I need to make sure these are back to being run again.
Two options:
I think sled will be nice to have eventually and being a key-value store it should work well like with hashmaps (or b+ trees). The 1.0.0 version is supposedly stabilizing late January 2021, which would avoid any awkward migration efforts.
For the time being, I guess custom in-memory that uses one or more hashmaps would be the way to go. Those also serialize well whereas sled would require some intermediate serialize/deserialize as described in spacejam/sled#1065.
For either database, the main challenge is supporting custom indexing. If we had to search through all ents with each query, it would be too expensive. Some possible shortcuts to take:
Similar to derived EntBuilder instances, but derived EntQuery (and untyped query?) will need to contain a boxed trait object reference instead.
darling only supports DeriveInput at the top level, which I'd like to use to remove as much of the boilerplate and cruft from the macro code as possible.
I wasn't sure if it was as simple as using parse_macro_input like with derive macros, but apparently it is, as seen in this example from syn:
#[proc_macro_attribute]
pub fn simple_ent(args: TokenStream, input: TokenStream) -> TokenStream {
    let args = /* ... */;
    let input = parse_macro_input!(input as DeriveInput);
    // ... build and return the expanded TokenStream
}
This means that I can have a singular darling definition for structs across both my attribute and derive macros. Same for enums as well.
https://github.com/bkchr/proc-macro-crate provides means to get root crate name and can use that to get root of that crate.
Darling provides a cleaner way to parse the metadata for attributes, so I should switch to using that via https://github.com/TedDriggs/darling
Currently, we have attributes like the following:
#[ent(no_builder, no_query, no_typed_methods)]
These seem almost pointless (not quite). It may be okay to keep them, but the main ones we want to support are the following:
#[ent(builder = "name", query = "name")]
Additionally, for enums, we need a way to specify the list of types and that the enum wraps them once #15 is complete. This would be at the struct field level:
#[ent(edge(enum = "MyEnum", types = ["T1", "T2", "T3"]))]
The above would change how the load_ method works as it would load the untyped version and attempt to cast to each of the given types, wrapping the type in the enum using a call to Into for T1, T2, or T3. None would be returned when another, unexpected type is found.
This still depends on MyEnum implementing some query, which may not be desirable. To that end, there are also two options for the queries: #[ent(edge(no_query))] to remove the query methods on the edge and #[ent(edge(untyped_query))] to change the query method to be a generic entity::Query instead of a typed version.
Need to devise some sort of custom indexer interface that can be used to support faster lookup of ent ids.
Use case in mind is for vimwiki-server where each ent has a region associated with it to denote where it is in a file. Before swapping to entity-rs, a special structure was maintained to provide fast lookup given an offset into the appropriate location (by using a tree).
The challenge is having a generic-enough indexer and how to pass it around as Rust isn't dynamic. We'd need to be able to attach an indexer type to the field attribute and then the database would know to create an indexer of that type for the given field.
Maybe an enum of types would be best such as
pub enum Indexer {
    Hash, // Will create a hashmap of a field's value using `Value` as key and `Vec<Id>` as value
    Geo, // Will create a kd-tree of a field's value -- to support Region from vimwiki, we'd need a 3D search grid... how can we make this generic enough to enable someone to specify?
}
Maybe the best way is to have a trait that is an indexer? It would take a Value as an argument and provide a Vec<Id> as the output?
trait Indexer {
    fn insert(&mut self, value: Value, id: Id);
    fn get(&self, value: Value) -> Vec<Id>;
    fn remove(&mut self, value: Value, id: Id) -> Option<Id>;
}
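To make the trait idea concrete, here is a sketch of a hash-based implementation of the Indexer trait above. The reduced Value enum, the Id alias, and HashIndexer are assumptions for illustration; the real Value type may not be hashable as-is:

```rust
use std::collections::HashMap;

pub type Id = usize;

// Hypothetical, reduced Value that can serve as a hash key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Value {
    Text(String),
    Int(i64),
}

pub trait Indexer {
    fn insert(&mut self, value: Value, id: Id);
    fn get(&self, value: Value) -> Vec<Id>;
    fn remove(&mut self, value: Value, id: Id) -> Option<Id>;
}

/// Maps each field value to the ids of the ents holding that value.
#[derive(Default)]
pub struct HashIndexer {
    ids: HashMap<Value, Vec<Id>>,
}

impl Indexer for HashIndexer {
    fn insert(&mut self, value: Value, id: Id) {
        let ids = self.ids.entry(value).or_default();
        // Avoid recording the same id twice for one value.
        if !ids.contains(&id) {
            ids.push(id);
        }
    }

    fn get(&self, value: Value) -> Vec<Id> {
        self.ids.get(&value).cloned().unwrap_or_default()
    }

    fn remove(&mut self, value: Value, id: Id) -> Option<Id> {
        let ids = self.ids.get_mut(&value)?;
        let position = ids.iter().position(|x| *x == id)?;
        let removed = ids.remove(position);
        let now_empty = ids.is_empty();
        // Drop the bucket entirely once no ids remain for this value.
        if now_empty {
            self.ids.remove(&value);
        }
        Some(removed)
    }
}
```

A tree-based indexer for regions could implement the same trait, which is the main argument for a trait over a closed enum of indexer kinds.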
I'm not yet sure if this is possible given that async_graphql does not expose traits or anything else to implement. Instead, it uses its own attribute proc macro to generate code.
This would be fine if I knew that I could attach it to run after my own derive macro by adding an impl block that gets transformed. I'll have to test to see.
#[derive(Clone, Ent)]
#[ent(async_graphql)]
pub struct MyEnt {
// ...
}
Depends on #42 being complete as we need to be able to convert to/from a value. Otherwise, iterating over each type should be the same as a vec.
In the same way that EntBuilder adds a ::build() method to the associated struct, EntQuery should add a ::query() method so that importing BlahEntQuery is no longer required.
Currently, the builder, query, and ent are all under a single derive.
Rather than having three (or four if you count the typed methods) impls bundled under one derive, these should be composable with #[simple_ent] bringing them all together.
derive(Ent) does just the Ent trait as is standard
derive(EntType) does the EntType trait
derive(EntBuilder) does the custom builder struct and impl
derive(EntQuery) does the custom query struct and impl
derive(EntTypedFields) does the field getters and setters
derive(EntTypedEdges) does the edge id getters/setters and the edge loading, which is dependent on implementing ent
derive(EntDebug) creates a new Debug impl that excludes the database from the ent to enable debug printing since the database is not required to implement Debug
derive(EntWrapper) does the EntWrapper trait for an enum
Making it this way both enables users to more easily make their own implementations while mixing/matching as well as help me make the code a little cleaner with separated responsibilities. E.g. typetag is only relevant for the ent derive.
This also encourages new integrations such as #23:
derive(AsyncGraphqlEnt) would produce a new impl using the #[async_graphql::Object] that exposes the fields of the ent, the ids of the edges, and the edges themselves via the load methods
derive(AsyncGraphqlEntFilter) would produce a new struct using the #[derive(async_graphql::InputObject)] that has option fields for every ent field and edge where edge option type is dependent on a similar impl for the edge
Using our entity logic with the inmemory database, we could make some simple data structures in the future:
Something along the lines of
#[derive(Ent)]
#[ent(from_unchecked)]
pub struct PageEnt {
    #[ent(field, index)]
    #[ent(field_name = "...")] // Optional attribute to change the method name from title(...)
    #[ent(mut)] // Optional attribute to include a mutable method to edit the field
    #[ent(into)] // Optional attribute to include a method to consume ent and produce the specific field
    title: String,
    #[ent(edge)]
    #[ent(edge_new = "path::to::method")] // Optional attribute to change the method used, that assumes (database, ent)
    #[ent(edge_id_name = "...")] // Optional attribute to change the name of the id method from element_ids
    #[ent(edge_name = "...")] // Optional attribute to change the name of the method from to_...
    elements: Vec<BlockElementEnt>,
}
results in
pub struct PageEnt(Ent);
impl IEnt for PageEnt { ... }
impl PageEnt {
    pub fn title(&self) -> &str {
        match self.0.field_value("title").expect("Corrupted PageEnt missing field title") {
            Value::Text(x) => x,
            x => panic!("Corrupted PageEnt title wrong type: {:?}", x),
        }
    }
    /// Only exists because of attribute
    pub fn update_title(&mut self, new_title: String) {
        self.0.update_field("title", new_title);
    }
    /// Only exists because of attribute
    pub fn into_title(self) -> String {
        // TODO: Need to add an into_field_value method to Ent that consumes and produces single value
        match self.0.into_field_value("title").expect("Corrupted PageEnt missing field title") {
            Value::Text(x) => x,
            x => panic!("Corrupted PageEnt title wrong type: {:?}", x),
        }
    }
    /// Edges get a unique method to return the ids of the edge
    pub fn element_ids(&self) -> Vec<usize> {
        self.0.edge("elements").expect("Corrupted PageEnt missing edge elements").to_ids()
    }
}
impl<D: Database> Connected<D, PageEnt> {
    /// Traverses edge, gets ents, and converts to appropriate type
    pub fn to_elements(&self) -> DatabaseResult<Vec<BlockElementEnt>> {
        self.load_edge("elements").map(|ents| {
            ents.into_iter()
                .map(|ent| BlockElementEnt::new(self.database(), ent))
                .collect()
        })
    }
    /// Special method available to reload data within ent based on current database standing
    /// CAN THIS BE ON GENERAL IMPL?
    pub fn refresh(self) -> DatabaseResult<Self> {
        // TODO: Provide helper method on database that will return an ent and return a databaseError if none found
        Self::new(self.database, self.database.get_required(self.ent.id()))
    }
    /// Special method available to push all ent changes back to the database
    /// CAN THIS BE ON GENERAL IMPL?
    pub fn flush(&self) {
        // CAN WE DO THIS WITHOUT CLONING?
        self.database().insert(self.ent().id(), self.ent().clone());
    }
}
impl<D: Database> From<Connected<D, PageEnt>> for Ent {
    fn from(x: Connected<D, PageEnt>) -> Self {
        x.ent.into()
    }
}
impl From<PageEnt> for Ent {
    fn from(x: PageEnt) -> Self {
        x.0
    }
}
/// TODO: Will need custom type for loading an ent and failing
impl TryFrom<Ent> for PageEnt {
    // TryFrom names its associated type Error, not Err
    type Error = LoadEntErr;
    fn try_from(x: Ent) -> Result<Self, Self::Error> {
        // TODO: Would need to check all of the fields and edges of the ent to validate
        // proper types (for fields) and existence (for edges since we won't traverse)
        todo!()
    }
}
/// Only exists because of from_unchecked attribute
impl From<Ent> for PageEnt {
    fn from(x: Ent) -> Self {
        Self(x)
    }
}
Not sure if I want to do this yet. Couple of details I'd need to work out:
Defining an ent! macro that can generate all of the structs, impls, and other Rust code we need. Wasn't sure if modifying a struct's fields was possible with the derive macro approach, so this seems like the next best thing and is simpler/less code than the proc macro of #3.
1. Must always start with @name <NAME>; to specify the name of the ent used for the struct, impls, and type name
2. Attributes come next and can have zero or more in the form of @attr <NAME> [...] where the macro matches on the first two and then has zero or more arguments to apply. This gets folded into different impls.
   a. global_database lets us add a new impl block for a specific database type and expression that yields the database, which would be some global that can be accessed. Helps to reduce boilerplate where we don't need to pass the database around everywhere if it's going to persist the lifetime of the program.
   b. only_database would indicate that we don't need generics and instead only a specific database. An alternative might just be to add a type alias of PageEnt and have the actual struct use a different name. Not sure if I want this in a macro as it might be too difficult to support the former case and the latter case can be done externally.
3. Fields and edges are interchangeable in that fields could come before or after edges, but they are always grouped together
   a. fields(...) contains all of the fields in the form of [@index]? <NAME> <TYPE>; and uses the information to do a couple of things:
      i. Include name and type in the parameters for constructing a new ent
      ii. Include name and type in the fields of a struct that is the builder e.g. PageEntBuilder
      iii. Include names as argument values in the body of the constructor method of the ent
      iv. If index flag given, will mark the field with attribute index
   b. edges(...) contains all of the edges in the form of @maybe|@one|@many [@shallow_delete|@deep_delete]? <NAME> <TYPE>
      i. Include name and type (as Option, Id, or Vec) in the parameters for constructing a new ent
      ii. Include name and type (as Option, Id, or Vec) in the fields of a struct that is the builder e.g. PageEntBuilder
      iii. Include names as argument values in the body of the constructor method of the ent
      iv. If shallow/deep delete attribute given, will mark the edge with the appropriate attribute
ent! {
    @name PageEnt;
    @attr global_database InmemoryDatabase GLOBAL_INMEMORY_DATABASE.lock().unwrap();
    @attr only_database InmemoryDatabase;
    @fields(
        @index title String;
    );
    @edges(
        @deep_delete @one header ContentEnt;
        @maybe subheader ContentEnt;
        @many paragraphs ContentEnt;
    );
}
ent! {
    @name ContentEnt;
    @attr global_database InmemoryDatabase GLOBAL_INMEMORY_DATABASE.lock().unwrap();
    @attr only_database InmemoryDatabase;
    @fields(
        text String;
    );
    @edges(
        @one @shallow_delete page PageEnt;
    );
}
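A heavily reduced macro_rules! sketch of the syntax above, handling only @name and un-indexed @fields. Attributes, @index, edges, and the builder are left out, and the generated id field and type_name function are assumptions made for illustration:

```rust
macro_rules! ent {
    (
        @name $name:ident;
        @fields( $( $field:ident $ty:ty; )* );
    ) => {
        #[derive(Clone, Debug, PartialEq)]
        pub struct $name {
            pub id: usize,
            $( pub $field: $ty, )*
        }

        impl $name {
            /// Constructor takes every declared field; the id starts as ephemeral 0.
            pub fn new($( $field: $ty ),*) -> Self {
                Self { id: 0, $( $field, )* }
            }

            /// Name of the ent as written after @name.
            pub fn type_name() -> &'static str {
                stringify!($name)
            }
        }
    };
}

ent! {
    @name ContentEnt;
    @fields(
        text String;
    );
}
```

Supporting the optional flags such as @index or @deep_delete would likely need additional internal rules, which is where a declarative macro starts to get hard to maintain.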
Similar to a typed arena, we need something that provides unique ids that persist in a database. In vimwiki-rs, I wrote an id allocator, which I can port over to here.
Given an ephemeral id of 0 assigned to an ent, the database should replace that id with a unique id when being inserted. This enables us to create new ent instances without the user having to think of a unique id.
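A small sketch of the ephemeral-id replacement using a counter inside a hypothetical in-memory database. The Ent struct, EPHEMERAL_ID constant, and method names are illustrative assumptions, not the allocator from vimwiki-rs:

```rust
use std::collections::HashMap;

/// Id that signals "assign me a real id on insert".
pub const EPHEMERAL_ID: usize = 0;

// Hypothetical, reduced ent.
#[derive(Clone, Debug, PartialEq)]
pub struct Ent {
    pub id: usize,
    pub title: String,
}

pub struct InmemoryDatabase {
    next_id: usize,
    ents: HashMap<usize, Ent>,
}

impl InmemoryDatabase {
    pub fn new() -> Self {
        Self { next_id: EPHEMERAL_ID + 1, ents: HashMap::new() }
    }

    /// Inserts the ent, replacing an ephemeral id with a fresh unique id.
    pub fn insert(&mut self, mut ent: Ent) -> usize {
        if ent.id == EPHEMERAL_ID {
            ent.id = self.next_id;
        }
        // Keep the counter ahead of every id seen so far, including explicit ones.
        if ent.id >= self.next_id {
            self.next_id = ent.id + 1;
        }
        let id = ent.id;
        self.ents.insert(id, ent);
        id
    }

    pub fn get(&self, id: usize) -> Option<&Ent> {
        self.ents.get(&id)
    }
}
```

This counter never reuses ids after removal; an allocator that recycles freed ids would need extra bookkeeping.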
For enums like
pub enum MyValueEnum {
    One,
    Two,
    Three,
}
we should be able to derive a ValueLike impl of
impl ValueLike for MyValueEnum {
    fn into_value(self) -> Value {
        match self {
            Self::One => Value::Text(String::from("one")),
            Self::Two => Value::Text(String::from("two")),
            Self::Three => Value::Text(String::from("three")),
        }
    }
    fn try_from_value(value: Value) -> Result<Self, Value> {
        match value {
            Value::Text(x) => match x.as_str() {
                "one" => Ok(Self::One),
                "two" => Ok(Self::Two),
                "three" => Ok(Self::Three),
                _ => Err(Value::Text(x)),
            },
            x => Err(x),
        }
    }
}
Currently not supported because it's complex to edit the macro code that generates predicates for standard types to support types that have generics.
All methods that aren't constructors or implementations for the Ent interface should be removed.
Because the attribute macro can transform everything, we should be able to support transforming:
#[simple_ent]
struct MyEnt {
    #[ent(edge)]
    other: MyEnt
}
into
#[derive(Ent, ...)]
struct MyEnt {
    #[ent(edge(type = "MyEnt"))]
    other: Id,
}
This would be done ONLY when the type is not specified in the edge attribute during the attribute macro expansion.
I don't think this is supposed to be valid:
#[derive(Debug, Ent)]
struct What {
    #[ent(id, database, created, last_updated)]
    how: String,
}
Experience-wise, the most consistent and precise behavior feels like it would be, "if a field declares multiple ent-field words, they all get errors." That avoids needing to worry about order of appearance in the derive input and avoids leaking to the user the order in which the checks take place in the proc-macro.
This shouldn't otherwise block parsing or validation, so the shape of internal::EntField wouldn't change, but it's probably useful to have it expose a method for checking this condition to avoid many-way if-statements spreading far throughout the codebase.
One open question: Should a field that claims to be both id and database count in the "__ already declared elsewhere" check? If so, then this new validation goes inside the per-field loop before the big if/else tree. If not, then maybe the best approach is to add fn known_field(&self) -> Option<darling::Result<KnownField>> to internal::EntField and then branch off that in the per-field loop?
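A sketch of what a known_field helper could look like. To stay self-contained it uses Result<KnownField, String> in place of darling::Result, and the EntField layout with plain bool flags is an assumption, not the crate's internal struct:

```rust
#[derive(Debug, PartialEq)]
pub enum KnownField {
    Id,
    Database,
    Created,
    LastUpdated,
}

// Hypothetical, simplified version of internal::EntField.
#[derive(Default)]
pub struct EntField {
    pub name: String,
    pub is_id: bool,
    pub is_database: bool,
    pub is_created: bool,
    pub is_last_updated: bool,
}

impl EntField {
    /// None for ordinary fields, Ok for exactly one ent-field word,
    /// and Err when the field declares more than one.
    pub fn known_field(&self) -> Option<Result<KnownField, String>> {
        let mut declared = Vec::new();
        if self.is_id {
            declared.push(("id", KnownField::Id));
        }
        if self.is_database {
            declared.push(("database", KnownField::Database));
        }
        if self.is_created {
            declared.push(("created", KnownField::Created));
        }
        if self.is_last_updated {
            declared.push(("last_updated", KnownField::LastUpdated));
        }

        if declared.len() > 1 {
            // Report every word at once so the outcome does not depend on check order.
            let words: Vec<&str> = declared.iter().map(|(word, _)| *word).collect();
            return Some(Err(format!(
                "field `{}` declares multiple ent-field words: {}",
                self.name,
                words.join(", ")
            )));
        }
        declared.pop().map(|(_, field)| Ok(field))
    }
}
```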
Not all collection types can be returned as &[Id]. For example, BinaryHeap<Id> cannot be treated as a slice to my knowledge. We either need to distinguish the types to support both &[Id] and &BinaryHeap<Id> or we just have the ids be returned as &<TYPE> for all situations.
Use case is for vimwiki where I have BlockElement, InlineElement, and other enums that can be one of several types.
Need to be able to do
#[simple_ent]
pub struct Page {
    #[ent(edge = "BlockElement")]
    elements: Vec<Id>,
}
#[simple_ent]
pub enum BlockElement {
    Paragraph(Paragraph),
    // ...
}
This would generate an implementation for the enum that has no builder, maybe a typed query but without anything useful (just id, type, etc) since the page would generate it, and routes all IEnt functions to the underlying variant. Because you can use the field and edge untyped functions, that would be good enough here, especially since you can extract the more specific type from the enum.
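The routing described above might expand to something like this hand-written sketch. The reduced IEnt trait with only id and type_name, the Header variant, and as_paragraph are assumptions for illustration:

```rust
// Hypothetical, reduced version of the IEnt interface.
pub trait IEnt {
    fn id(&self) -> usize;
    fn type_name(&self) -> &'static str;
}

pub struct Paragraph {
    pub id: usize,
}

pub struct Header {
    pub id: usize,
}

impl IEnt for Paragraph {
    fn id(&self) -> usize {
        self.id
    }
    fn type_name(&self) -> &'static str {
        "Paragraph"
    }
}

impl IEnt for Header {
    fn id(&self) -> usize {
        self.id
    }
    fn type_name(&self) -> &'static str {
        "Header"
    }
}

pub enum BlockElement {
    Paragraph(Paragraph),
    Header(Header),
}

// Every IEnt function is routed to whichever variant is held.
impl IEnt for BlockElement {
    fn id(&self) -> usize {
        match self {
            Self::Paragraph(x) => x.id(),
            Self::Header(x) => x.id(),
        }
    }
    fn type_name(&self) -> &'static str {
        match self {
            Self::Paragraph(x) => x.type_name(),
            Self::Header(x) => x.type_name(),
        }
    }
}

impl BlockElement {
    /// Extracts the more specific type when the variant matches.
    pub fn as_paragraph(&self) -> Option<&Paragraph> {
        match self {
            Self::Paragraph(x) => Some(x),
            _ => None,
        }
    }
}
```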
Suppose we have two ents: EntOne and EntTwo. If they each have an edge to the other and are deriving EntQuery, there is a method produced for the query that uses the other ent's query type.
struct EntOneQuery { ... }
impl EntOneQuery {
    // WILL FAIL IF RETURN TYPE IS NOT IN SCOPE!
    pub fn query_two(...) -> EntTwoQuery { ... }
}
Dependent on async-graphql/async-graphql#392.
minimal-version:
  name: minimal-version
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - uses: actions-rs/toolchain@v1
      with:
        toolchain: nightly-2020-09-21
        override: true
    - name: Install cargo-hack
      run: cargo install cargo-hack
    - name: "check --all-features -Z minimal-versions"
      run: |
        # Remove dev-dependencies from Cargo.toml to prevent the next `cargo update`
        # from determining minimal versions based on dev-dependencies.
        cargo hack --remove-dev-deps --workspace
        # Update Cargo.lock to minimal version dependencies.
        cargo update -Z minimal-versions
        cargo hack check --all-features --workspace --no-dev-deps
Hitting some issues where transitive dependencies are too old.
[build-dependencies]
# Using -Z minimal-versions fails with the async-graphql dependency as it
# has the following dep chain of
#
# multer 1.2.2 -> mime 0.3.0 -> unicase 2.0.0 -> rustc_version 0.1.0 -> semver 0.1.0
#
# The semver version fails to compile and is ancient, so we are providing a
# requirement on a later version of the build dependency to enable using
# the minimal version check
rustc_version = "0.1.7"
semver = "0.1.20"
Per discussion in async-graphql/async-graphql#373, GraphQL doesn't support unions in input types today. This means that the typed predicates that are unions can't be used directly. Instead, I'll need to be able to generate a unique input type to handle all conditions for a given ent.
enum Predicate {
    Equals(...),
    GreaterThan(...),
}
// becomes
#[derive(async_graphql::InputObject)]
struct Predicate {
    equals: Option<...>,
    greater_than: Option<...>,
}
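A sketch of converting such a flattened input struct back into the enum form, without the async_graphql derive so it stays self-contained. PredicateInput, into_predicates, and the use of i64 for the elided payload type are assumptions for illustration:

```rust
#[derive(Debug, PartialEq)]
pub enum Predicate {
    Equals(i64),
    GreaterThan(i64),
}

// Flattened form: one optional field per enum variant.
#[derive(Default)]
pub struct PredicateInput {
    pub equals: Option<i64>,
    pub greater_than: Option<i64>,
}

impl PredicateInput {
    /// Collects every populated option into its matching enum variant.
    pub fn into_predicates(self) -> Vec<Predicate> {
        let mut predicates = Vec::new();
        if let Some(x) = self.equals {
            predicates.push(Predicate::Equals(x));
        }
        if let Some(x) = self.greater_than {
            predicates.push(Predicate::GreaterThan(x));
        }
        predicates
    }
}
```

Unlike the enum, the struct form lets a caller set several conditions at once, so the consumer has to decide how to combine them (for example, treating them as a conjunction).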
The challenge is that we'd need to make a unique struct per type: String, Int, Float, Boolean, and ID. This isn't too demanding and could potentially be converted into a Custom Scalar as it supports maps (BTrees of names and values), lists (vec of values), and other types as defined here: https://github.com/async-graphql/async-graphql/blob/master/value/src/lib.rs#L123
If we can get away with making value be a scalar, then we could make custom predicate & filter input types that flatten the enums into structs of options. We'd want to give them names like GraphQLPredicate and GraphQLFilter (and GraphQLQuery).
In addition, we could still generate unique typed predicates for each of the five standard primitives of GraphQL.
Finally, we could add a derive of EntAsyncGraphQL and EntAsyncGraphQLFilter that would provide an object impl using the async_graphql macro and an input type filter that is a more specific MyEntGraphQLFilter where the edge and field are replaced with the graphql equivalents.
Following https://gist.github.com/Koxiaet/8c05ebd4e0e9347eb05f265dfb7252e1
u64 on created/last_updated to ::std::primitive::u64
Span::call_site() is used for replacement into Span::mixed_site()
root that is hard-coded to ::entity)
value.clone() and others to ::std::clone::Clone::clone(&value)
value.to_string() and others to ::std::string::ToString::to_string(&value)
self::TestEnt
// tests/hygiene.rs
#![no_implicit_prelude]
use entity::Ent;
#[derive(Clone, Ent)]
pub struct TestEnt {
...
}
Already have one for vimwiki-server, so can copy that one, but there are a couple of additions I would like to make:
cargo update -Z minimal-versions as described in rust-lang/cargo#4100
Ideally, we can have references to the database encoded in the ents themselves, but it might be nice to be able to configure a global database using a mixture of static, once_cell, and an Option<Box<dyn Database>> that can be initialized via some global functions like entity::configure(database).
Then, with generated functions, we could have something like PageEnt::load(id) that uses PageEnt::with(db).load(id) or some other builder configuration, or a direct method like PageEnt::load_from(db, id).
For builder configuration, we create some struct like PageEntBuilder.
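A sketch of the global database idea. It swaps in std::sync::OnceLock from the standard library in place of the once_cell crate mentioned above so that it stays dependency-free; the Database trait shape, global_database, and the bool return of configure are assumptions:

```rust
use std::sync::OnceLock;

// Send + Sync is required so the trait object can live in a static.
pub trait Database: Send + Sync {
    fn name(&self) -> &str;
}

// Hypothetical concrete database.
pub struct InmemoryDatabase;

impl Database for InmemoryDatabase {
    fn name(&self) -> &str {
        "inmemory"
    }
}

// Empty until configure() is called, which replaces the Option wrapper.
static GLOBAL_DATABASE: OnceLock<Box<dyn Database>> = OnceLock::new();

/// Sets the global database; returns false if one was already configured.
pub fn configure(database: Box<dyn Database>) -> bool {
    GLOBAL_DATABASE.set(database).is_ok()
}

/// Returns the global database, if one has been configured.
pub fn global_database() -> Option<&'static dyn Database> {
    GLOBAL_DATABASE.get().map(|db| &**db)
}
```

A generated PageEnt::load(id) could then call global_database() internally, while PageEnt::load_from(db, id) would bypass the global entirely.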
For now, the schema has extra baggage as it gets encoded with every ent, which has duplicate field and edge names. To avoid maintaining a schema for ents, might be a better idea to have it live directly on the ents themselves and switch back to having an explicit string type assigned to each ent.
Supporting all other std library collections through macros includes support to detect VecDeque as a Value::List and BTreeMap as a Value::Map. Need to add these as tests to validate the field types are defined correctly. Same goes for PathBuf and OsString being treated as Value::Text.
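The mappings above can be sketched with plain From impls. The reduced Value enum here is a hypothetical stand-in, and the lossy UTF-8 conversion for PathBuf is an assumption about how non-UTF-8 paths would be handled:

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};
use std::path::PathBuf;

// Hypothetical, reduced stand-in for entity's Value type.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Text(String),
    Int(i64),
    List(Vec<Value>),
    Map(HashMap<String, Value>),
}

impl From<i64> for Value {
    fn from(x: i64) -> Self {
        Value::Int(x)
    }
}

impl From<PathBuf> for Value {
    // Assumption: paths that are not valid UTF-8 are converted lossily.
    fn from(path: PathBuf) -> Self {
        Value::Text(path.to_string_lossy().into_owned())
    }
}

impl<T: Into<Value>> From<VecDeque<T>> for Value {
    fn from(items: VecDeque<T>) -> Self {
        Value::List(items.into_iter().map(Into::into).collect())
    }
}

impl<T: Into<Value>> From<BTreeMap<String, T>> for Value {
    fn from(map: BTreeMap<String, T>) -> Self {
        Value::Map(map.into_iter().map(|(k, v)| (k, v.into())).collect())
    }
}
```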
Tried before, but I may not have done it correctly.
The idea is that we want to remove the need to manually include typetag, serde, and (eventually) async_graphql alongside entity when using entity_macros.
e.g.
#[::entity::vendor::typetag]
impl ::entity::Ent for ... {
...
}
Additionally, if we can figure out a way to export a no-op attribute macro, we could avoid the need to check for typetag inclusion at all and just always include the vendor version.
entity_macros that we will re-export from entity if we have the macros feature. This would mean that including entity_macros directly would no longer be an option.
There may also be other types that are being used in vimwiki-server, so need to give it a proper runthrough before closing this out. For OsString, PathBuf, and Path we can use the Value::Text for conversions.
#[simple_ent]
pub struct Wiki {
    index: usize,
    name: Option<String>,
    path: PathBuf,
    files: HashSet<PathBuf>,
}
One challenge when generating types via the derive macro was that the import needed to be done explicitly when used in another module.
use crate::{Region, GqlPredicate_Region};
#[simple_ent]
#[derive(AsyncGraphqlEnt, AsyncGraphqlEntFilter)]
pub struct MyEnt {
    region: Region,
}
Because modules are in a separate namespace than structs/enums/etc, we should be able to generate a module that contains a common type like Region::GqlPredicate so we can do a single import:
use crate::Region;
#[simple_ent]
#[derive(AsyncGraphqlEnt, AsyncGraphqlEntFilter)]
pub struct MyEnt {
    region: Region,
}