Comments (8)
I can help building this feature, if you want.
from elasticsearch-net.
Hi @erik-kallen,
I have to think about this a little bit more. Currently I only see 1 main problem which would have to be solved in the code-generator:
- The search APIs should have a generic argument
TDocument
for the query builder part, etc. and a different generic argumentTPartialDocument
for just the deserialization.
If that would be the case, you could easily deserialize into a partial type that only has the requested properties marked with required
.
This does not solve the problem for meta-fields like the script_fields
in your example, but these are only available after deserialization of the complete response (including all documents) which means we definitely have to materialize the request and perform a transformation afterwards.
For that transformation part, I'm not sure if we really need a helper, but I'm not completely against adding one.
Regarding type safety:
- The
SourceIncludes
property is already of typeFields
which allows to define fields using lambda expression trees - The
ScriptFields
are currently defined asIDictionary<string, Elastic.Clients.Elasticsearch.ScriptField>
which does not allow usage of lambda expressions. A hand-crafted classScriptFields
similar toFields
could be created to improve usability and type-safety
What do you think?
from elasticsearch-net.
The problem with the TPartialDocument
approach is that I need to make sure that properties are named the same in both types, which is something I very much want to avoid. Another problem that would be good to have solved is that whenever a property would be added to this partial document, the SourceIncludes list needs to be manually maintained.
Given that we already have the typesafe ability to generate a Field from an Expression, I don't think it would be a super hard problem to:
- Use the ExpressionVisitor to traverse an
Expression<Func<TDocument, TTarget>>
to find all access chains to properties of the source and collect them to a Fields object, and to collect all ScriptFields. - Use the ExpressionVisitor to transform the
Expression<Func<TDocument, TTarget>>
to an `Expression<Func<Hit, Hit>> - Compile the expression.
- Deserialize the document into a JsonObject instead of TDocument (this might be hard, but it would be required also for the TPartialDocument solution).
- Use the compiled expression to transform the documents.
The only thing I'm a little worried about is that this means that there is a real risk of the user defining the lambda inline, which would lead to a lot of compilations. To solve this, we could implement expression comparisons and reuse the compiled expressions if the expression trees are identical. Or we could have a limit of how many expressions are allowed to be compiled (which will alert the user to the problem). Or we could not compile the expressions but instead evaluate them dynamically.
Actually, I realize that my idea to have the script inline probably won't play well with this cache. Those values probably need to be referenced by some syntax like FieldValue<string>("theFieldName")
instead.
from elasticsearch-net.
The problem with the
TPartialDocument
approach is that I need to make sure that properties are named the same in both types
This sounds like a pretty unique problem to be honest. I fully understand the argument about the required
properties, but renaming properties during retrieval is probably not a very common thing to do.
Another problem that would be good to have solved is that whenever a property would be added to this partial document, the SourceIncludes list needs to be manually maintained.
The Fields
type already provides an implicit conversion operator from PropertyInfo[]
which means you could initialize it with a list of properties for the TPartialDocument
that is previously obtained by reflection.
Given that we already have the typesafe ability to generate a Field from an Expression, I don't think it would be a super hard problem to:
From the technical side, this is definitely doable and the approach you describe sounds reasonable.
However, in the context of this library I'm concerned about using compiled expressions. Besides the actual compilation overhead that you already mentioned, they are interpreted instead of compiled on several platforms like Mono, Xamarin, etc. Even when using AOT compilation in regular .NET framework, expression trees can no longer be compiled. This introduces a significant performance impact on these code pathes.
To solve this, we could implement expression comparisons and reuse the compiled expressions if the expression trees are identical
I've played with expression tree comparison in the past to be able to cache generic EF query results and from my experience, the comparison takes a significant amount of time itself (my code can probably be optimized on a few places) so that it was not worth using for fast queries at all.
I can definitely see the value in such helper function and don't want to demotivate you, but the problem you want to solve is really specific in my opinion. I would rather keep the library more generic and don't increase the scope too much. As I'm the only person working on this project, I always have to keep an eye on maintainability as well. This as well was one major reason for switching from hand-crafted code to the code generation approach. Besides that, there are the technical difficulties with the expression trees.
If you want to work on this anyways, I would love to see the results, but I don't expect it to be merged upstream at the moment.
What I'll definitely add to my list, is to come up with a way to deserialize search query results into a TPartialDocument
. While this does not solve your specific issue, this still is something that I can see to have value for a lot of other users 🙂
from elasticsearch-net.
The problem with the TPartialDocument approach is that I need to make sure that properties are named the same in both types
This sounds like a pretty unique problem to be honest. I fully understand the argument about the required properties, but renaming properties during retrieval is probably not a very common thing to do.
I don't understand what you mean. The problem I am referring to is that if I have a class IndexedEntity
with all properties required
, and another class PartialIndexedEntity
with all properties optional, I need to make sure that the properties have the same name in both classes, or things will break silently.
The Fields type already provides an implicit conversion operator from PropertyInfo[] which means you could initialize it with a list of properties for the TPartialDocument that is previously obtained by reflection.
Yes, but as mentioned above, I don't think the TPartialData approach is better than what I currently have.
However, in the context of this library I'm concerned about using compiled expressions
This is a valid concern. The overload would have to be annotated with RequiresDynamicCode. Or perhaps do whatever EF does (which I don't know the details of, but the problem is very similar).
I can definitely see the value in such helper function and don't want to demotivate you, but the problem you want to solve is really specific in my opinion.
I don't think it's a very specific problem. Of course, I don't know how other people use the library, but I imagine it not being uncommon for people to do client.Search<TDocument>(...).Documents.Select(d => new SomeOtherEntity(...))
. Anyone who does that could potentially benefit from this solution.
As I'm the only person working on this project, I always have to keep an eye on maintainability as well.
I understand that, thank you for your work!
If you want to work on this anyways, I would love to see the results, but I don't expect it to be merged upstream at the moment.
I wouldn't mind working on it, but I won't do it if it won't be merged. Or, if it becomes possible to deserialize to a different type than the query builder type, I might implement it as part of my own project (and I can share the code if I manage to get it working nicely).
What I'll definitely add to my list, is to come up with a way to deserialize search query results into a TPartialDocument. While this does not solve your specific issue, this still is something that I can see to have value for a lot of other users 🙂
While I don't see myself using the TPartialData approach, i think I can probably implement what I want externally if TPartialData is allowed to be JsonElement (or JsonNode).
from elasticsearch-net.
The problem I am referring to is that if I have a class IndexedEntity with all properties required, and another class PartialIndexedEntity with all properties optional, I need to make sure that the properties have the same name in both classes, or things will break silently.
I understand that concern 🙂 Just the way you want to solve this, is very specific (in my opinon and without knowing all of your requirements).
The way I personally would solve this (again, given that we would have a version of the API that allows to specify TDocument
and TPartialDocument
), would just look like this:
public record Partial
{
public required int A { get; init; }
public required int B { get; init; }
}
public record Full : Partial
{
public required int C { get; init; }
}
// In the next step, get `Property[]` of `Partial` by reflection and use it to initialize
// the `Fields` instance of the `SourceInclude` property in the `SearchRequest`
If this is a suitable class design, of course depends on your needs.
And keep in mind, that even with your approach, you would still be forced to update the property names on your already indexed documents, if you rename something.
A solution to this would be to use explicit JsonName
attributes on the properties.
I don't know how other people use the library, but I imagine it not being uncommon for people to do
client.Search<TDocument>(...).Documents.Select(d => new SomeOtherEntity(...))
.
That definitely could be the case. It's just that from my experience:
- people either use the inheritance based approach
- people just quickly map the properties by hand if the amount of properties is small
- large projects are using defacto-standard libraries to do the mapping for them (e.g. AutoMapper)
While I don't see myself using the TPartialData approach, i think I can probably implement what I want externally if TPartialData is allowed to be JsonElement (or JsonNode).
Yes, this should be possible in this case. Deserialization to JsonObject
works fine in theory. For you it was crashing because of accessing the indexer (o => o["prop1]
) which is probably not supported at the moment by the field name inference expression visitor.
from elasticsearch-net.
Inheritance won't work for me because different queries return different subsets.
I don't know how other people use the library, but I imagine it not being uncommon for people to do client.Search(...).Documents.Select(d => new SomeOtherEntity(...)).
That definitely could be the case. It's just that from my experience:
- people either use the inheritance based approach
- people just quickly map the properties by hand if the amount of properties is small
- large projects are using defacto-standard libraries to do the mapping for them (e.g. AutoMapper)
Yes, but the main thing I want to do doesn't really have anything to do with the mapping part. It's about reducing the amount of data fetched from elastic, and AutoMapper won't help with that.
from elasticsearch-net.
Closing this for now as not planned. #7792 / #8186 is still on the roadmap.
from elasticsearch-net.
Related Issues (20)
- 8.15.6 - General API Feedback while upgrading from NEST. HOT 2
- SearchResponse.Documents throws null reference exception on successful call
- Fluent API Documentation missing
- Feature request to add back || and &&. Or to allow for dynamic building and joining of queries HOT 3
- Feature request to publish JsonNetSerializer HOT 1
- 8.15.6 - Term/Terms Filtering feedback HOT 1
- 8.15.6 - TermQuery / TermsQuery inconsistencies and feedback HOT 6
- 8.15.6: TypeMappingDescriptor Dynamic Feedback HOT 3
- Fix implicit conversion for `Union`
- 8.15.6 - QueryDescriptor Operand/Operator (!, ||, &&) overloads HOT 1
- Consolidate usage of `FluentDescriptorDictionary` HOT 3
- 8.15.6 - Sorting Feedback
- Add support for Cluster update settings API HOT 2
- GetSettings doesn't publicly expose the result HOT 3
- SearchResponse and ScrollResponse do not have a common base class HOT 2
- illegal_argument_exception when calling the Field Capabilities API HOT 1
- Make it possible to do a MultiSearch with different types HOT 2
- Usability improvements (meta issue)
- `PropertyNamingPolicy` from `JsonSerializerOptions` of the `DefaultSourceSerializer` is not applied for mappings HOT 1
- Fluent configuration of `Remove` processor on `PutPipelineRequestDescriptor` doesn't support field descriptor HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from elasticsearch-net.