
entityframework.utilities's Introduction

This is a fork that adds various features and bug fixes to the original EFUtilities by Mikael Eliasson. Below is a modified version of the original README.md.

The goal

Performance! EF is quite fast in many cases nowadays, but doing CUD over many entities is slooooow. This is a solution for that.

EntityFramework.Utilities provides some batch operations for EF that the EF team hasn't yet added for us. Suggestions are welcome! Pull requests are even more welcome :)

Right now it's mostly targeted at EF on SQL Server, but adding providers should be simple.

Example

Here is a small extract from the performance section later in the document.

Batch iteration with 25000 entities
Insert entities: 880ms
Update all entities with a: 189ms
Bulk update all with a random read: 153ms
Delete all entities with a: 17ms
Delete all entities: 282ms

Standard iteration with 25000 entities
Insert entities: 8599ms
Update all entities with a: 360ms
Update all with a random read: 5779ms
Delete all entities with a: 254ms
Delete all entities: 5634ms

Installing

Right now this only works for DbContext. If anyone wants to make a failing test or provide a sample project for any of the other variants, it will probably be easy to fix.

EF 6.1.3+

Any package from 1.0.3 and up should work.

NuGet package: https://www.nuget.org/packages/RudeySH.EFUtilities/

Utility methods

These methods are small helpers that make certain things easier. Most of them work against the context, so they should be provider independent. It will be stated when this is not the case.

Update single values on an entity

REQUIRES: using EntityFramework.Utilities;

db.AttachAndModify(item).Set(x => x.Property, "NewValue")

A simpler API for working with disconnected entities and updating only single values. This is useful if you want to update a value on an entity without a round trip to the database first. A typical use case could be updating the number of reads of a blog post. With this API it would look like this:

using (var db = new YourDbContext())
{
    db.AttachAndModify(new BlogPost { ID = postId }).Set(x => x.Reads, 10);
    db.SaveChanges();
}

The Set method is chainable, so you can fluently add more properties to update; a sketch follows the SQL below.

This would send this single command to the database:

exec sp_executesql N'UPDATE [dbo].[BlogPosts]
SET [Reads] = @0
WHERE ([ID] = @1)
',N'@0 int,@1 int',@0=10,@1=1
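
Since Set is chainable, several properties can be updated in one statement. A minimal sketch, assuming the BlogPost entity also has a Title property (an assumption, not part of the example above):

using (var db = new YourDbContext())
{
    db.AttachAndModify(new BlogPost { ID = postId })
        .Set(x => x.Reads, 10)
        .Set(x => x.Title, "New title"); // Title is an assumed property, shown only to illustrate chaining
    db.SaveChanges();
}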

IncludeEFU (A significantly faster include)

REQUIRES: using EntityFramework.Utilities;

The standard EF Include is really, really slow to use. The reason is that it cross joins the child records against the parent, which means you load a significant amount of duplicate data. That means more data to transfer, more data to parse, more memory, and so on.

IncludeEFU, on the other hand, runs two parallel queries and stitches the data together in memory.

A very basic query:

var result = db.Contacts
.IncludeEFU(db, c => c.PhoneNumbers)
.ToList();

It's also possible to sort and filter the child collections:

var result = db.Contacts
.IncludeEFU(db, x => x.PhoneNumbers
    .Where(n => n.Number == "10134")
    .OrderBy(p => p.ContactId)
    .ThenByDescending(p => p.Number))
.ToList();

VERY IMPORTANT: The plan was to add support for nested collections and projections, but it looked like the next version of EF would fix this problem in the core, so that functionality was dropped (it was hard to get right). What works right now is that you can include one or more child collections, but only on the first level.

Also, it's important to know that the queries are run with AsNoTracking(). If you use this method you are after read performance, so you shouldn't need the tracking. If you need to update some of the entities, I suggest you just attach them back to the context.
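
A minimal sketch of attaching an entity loaded through IncludeEFU back to the context so it can be updated. This is plain EF6 API (requires using System.Data.Entity and System.Linq); the FirstName property is an assumption for illustration:

using (var db = new YourDbContext())
{
    var contact = db.Contacts.IncludeEFU(db, c => c.PhoneNumbers).First();

    // The entity came back untracked (AsNoTracking), so attach it and mark it as modified.
    contact.FirstName = "Updated name"; // FirstName is an assumed property
    db.Contacts.Attach(contact);
    db.Entry(contact).State = EntityState.Modified;
    db.SaveChanges();
}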

db.Database.ForceDelete()

REQUIRES: using EntityFramework.Utilities;

PROVIDER DEPENDENT: This method uses raw SQL to drop connections, so it only works against SQL Server.

Drops the SQL Server database even if it has open connections. Solves the problem where you are recreating the database in your tests and Management Studio has an open connection that would otherwise prevent dropping the database. Super useful for test scenarios where you recreate the DB for each test but still need to debug with Management Studio.
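
A minimal usage sketch, for example in test setup before recreating the database:

using (var db = new YourDbContext())
{
    db.Database.ForceDelete(); // drops the database even if other connections are still open
}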

Batch operations

These methods all work outside the normal EF pipeline and are located on the EFBatchOperation class. The design decision behind this choice is to make it clear you are NOT working against the context when using these methods. That means change tracking, second-level cache, and validation will NOT be run. This is about pure performance and nothing else. These methods are also highly provider dependent. Right now the only existing provider is for SQL Server, but it should be easy to add others.

Configuration

EFUtilities supports some simple global settings. You can enable logging, control whether default fallbacks should be used, and add new providers.

See https://github.com/RudeySH/EntityFramework.Utilities/blob/master/EntityFramework.Utilities/EntityFramework.Utilities/Configuration.cs for the options
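
A minimal sketch of setting the global options. The property names below (Log, DisableDefaultFallback) are assumptions based on the Configuration class linked above, so verify them against that file:

// Property names assumed; see Configuration.cs linked above.
EntityFramework.Utilities.Configuration.Log = Console.WriteLine;       // enable logging of what EFUtilities does
EntityFramework.Utilities.Configuration.DisableDefaultFallback = true; // fail instead of silently falling back to normal EF behavior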

Delete by query

This lets you delete all entities matching the predicate. Instead of the normal EF approach (load them into memory, then delete them one by one), this method creates a SQL query that deletes all matching rows in one single call to the database. Here is how a call looks:

var count = EFBatchOperation.For(db, db.BlogPosts).Where(b => b.Created < upper && b.Created > lower && b.Title == "T2.0").Delete();

Limitations: This method works by parsing the SQL generated when the predicate is used in a where clause. Aliases are removed when creating the delete clause, so joins/subqueries are NOT supported/tested. Feel free to test it, and if you have any idea of how to make it work I'm interested in supporting it, provided it doesn't add too much complexity. No constraints are checked by EF (though SQL constraints are).

Warning: Because you are removing items directly from the database, the context might still think they exist. If you have made changes to a tracked entity that is then deleted by the query, you will run into issues when you call SaveChanges on the context.
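
A minimal sketch of one way around that, using plain EF6 change-tracker APIs: detach any tracked entities matching the same predicate before calling SaveChanges (upper, lower, and BlogPost are from the example above).

// Detach tracked BlogPosts that the batch delete above just removed from the database.
var stale = db.ChangeTracker.Entries<BlogPost>()
    .Where(e => e.Entity.Created < upper && e.Entity.Created > lower && e.Entity.Title == "T2.0")
    .ToList();
foreach (var entry in stale)
{
    entry.State = EntityState.Detached;
}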

Batch insert entities

Allows you to insert many entities in a very performant way instead of adding them one by one as you normally would with EF. The benefit is superior performance; the disadvantages are that EF will no longer validate any constraints for you, you will not get the ids back if they are store generated, and you cannot insert relationships this way either.

using (var db = new YourDbContext())
{
    EFBatchOperation.For(db, db.BlogPosts).InsertAll(list);
}

SqlBulkCopy is used under the covers if you are running against SQL Server. If you are not, it will fall back to doing normal EF inserts.

Inheritance and Bulk insert

Bulk insert should support TPH inheritance. The other inheritance models will most likely not work.

Transactions

If you need transactions, your best choice is using TransactionScope. See the example here: MikaelEliasson#26
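
A minimal sketch of wrapping a batch insert in an ambient transaction (this assumes a reference to System.Transactions; whether the bulk operations actually enlist depends on how the connection is opened, so check the linked issue):

using (var scope = new TransactionScope())
using (var db = new YourDbContext())
{
    EFBatchOperation.For(db, db.BlogPosts).InsertAll(list);

    // Any regular EF work done here participates in the same ambient transaction.
    db.SaveChanges();

    scope.Complete(); // nothing is committed unless Complete() is called
}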

Making it work with profilers

Profilers like MiniProfiler wrap the connection, but EFUtilities needs a "pure" connection. The batch methods take an optional connection argument that you can supply.
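
A minimal sketch of supplying an unwrapped SqlConnection when a profiler has wrapped the context's connection. The optional connection parameter is an assumption based on the InsertAll/UpdateAll overloads, so check the exact signature in your version:

// Requires using System.Data.SqlClient;
using (var db = new YourDbContext())
using (var sqlConnection = new SqlConnection(db.Database.Connection.ConnectionString))
{
    // Pass the "pure" connection so EFUtilities never sees the profiler's wrapper.
    EFBatchOperation.For(db, db.BlogPosts).InsertAll(list, sqlConnection);
}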

Batch update entities

Works just like InsertAll but for updates instead. You can also choose exactly which columns to update.

An example where I load all items from the database and update them with a random number of reads and writes:

var commentsFromDb = db.Comments.AsNoTracking().ToList();
var rand = new Random();
foreach (var item in commentsFromDb)
{
    item.Reads = rand.Next(0, 9999999);
    item.Writes = rand.Next(0, 9999999);
}
EFBatchOperation.For(db, db.Comments).UpdateAll(commentsFromDb, x => x.ColumnsToUpdate(c => c.Reads, c => c.Writes));

SqlBulkCopy is used under the covers if you are running against SQL Server. If you are not, it will fall back to doing normal EF updates.

Partial updates / Not loading the data from DB first

Because you specify which columns to update, you can do simple partial updates. In the example above I could have generated the commentsFromDb list from an import file, for example. All I need to populate are the primary key and the columns I want to update.

Example:

// csvPath points at the import file; each line is "Id;Reads"
var lines = File.ReadAllLines(csvPath).Select(l => l.Split(';'));
var comments = lines.Select(line => new Comment { Id = int.Parse(line[0]), Reads = int.Parse(line[1]) });
EFBatchOperation.For(db, db.Comments).UpdateAll(comments, x => x.ColumnsToUpdate(c => c.Reads));

Inheritance and Bulk update

Not tested, but most likely TPH will work, as the code is very similar to InsertAll.

Transactions

If you need transactions, your best choice is using TransactionScope. See the example here: MikaelEliasson#26 (and the sketch under Batch insert entities above).

Making it work with profilers

Profilers like MiniProfiler wrap the connection, but EFUtilities needs a "pure" connection. The batch methods take an optional connection argument that you can supply (see the sketch under Batch insert entities above).

Permissions

The SQL Server provider creates a temporary table. The login you are using must have permissions to create and drop a table.

Update by query

Lets you update many entities in one SQL query instead of loading them into memory, modifying them, and saving them back to the database.

using (var db = new YourDbContext())
{
    EFBatchOperation.For(db, db.Comments).Where(x => x.Text == "a").Update(x => x.Reads, x => x.Reads + 1);
}

The modifications you can do should be whatever EF can support in its queries. For example, it's possible to do:

count = EFBatchOperation.For(db, db.BlogPosts).Where(b => b.Title == "T2").Update(b => b.Created, b => DbFunctions.AddDays(b.Created, 1));

to increment the date by one day. This method should be able to handle any renamed columns, but the pitfall is that it works internally by running the modifier through a where clause to get the SQL, and then transforming that where clause into a set clause. The rules for set and where are different, so this might not always be valid. This is the most fragile of the methods, but you can always test it, and if it doesn't work, open an issue on GitHub and it might get fixed.

Caveats and overall design decisions

There are some special things to keep in mind when using EFUtilities. Here is a list.

  • The bulk insert should be stable, but remember that if you use database-assigned ids it will NOT return these like normal EF inserts do.
  • Update and Delete by query are quite "hacky". They work by pretending the predicate is a regular where clause, taking the generated SQL (without hitting the database), and then altering that SQL into an update or delete statement. If you use joins, that might not work. It will work for simple things, but if you are doing complex stuff it might not be powerful enough.
  • All three methods work in a way that doesn't really align with the DbContext: things are saved before SaveChanges is called, validation is not done, new ids are not returned, and changes aren't synced to entities loaded into the context. This is the reason the methods are placed on EFBatchOperation, to make it clear that this works outside the normal conventions.
  • Because Update/Delete in particular are implemented using hacks that depend on the SQL generated by EF, I would encourage you to add integration tests whenever you use these methods. Actually, I would encourage you to always do that, but that is another story. With integration tests you will be warned if an EF update breaks EFUtilities, and you avoid unpleasant surprises in production. A minimal sketch of such a test follows this list.
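
A minimal integration-test sketch (xUnit is assumed here; YourDbContext and BlogPost are the example types used earlier). The point is simply to run the real batch operation against a real database so a breaking EF update is caught by the test suite:

[Fact]
public void DeleteByQuery_RemovesMatchingRows()
{
    using (var db = new YourDbContext())
    {
        db.BlogPosts.Add(new BlogPost { Title = "ToDelete", Created = DateTime.UtcNow });
        db.SaveChanges();

        // Run the real batch delete against the database and check the reported row count.
        var deleted = EFBatchOperation.For(db, db.BlogPosts)
            .Where(b => b.Title == "ToDelete")
            .Delete();

        Assert.True(deleted >= 1);
    }
}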

Performance

These methods are all about performance. Measuring performance should always be done in your own context, but some simple numbers might give you a hint.

The standard iteration is optimized in the sense that AutoDetectChangesEnabled = false; it would not be reasonable to delete/insert 25000 entities otherwise.

Here is a test run with EntityFramework.Utilities on a laptop, doing operations on a really simple object: Comment (Text: string, Date: DateTime, Id: int, Reads: int).

Batch iteration with 25 entities
Insert entities: 23ms
Update all entities with a: 4ms
Delete all entities with a: 2ms
Delete all entities: 1ms

Standard iteration with 25 entities
Insert entities: 12ms
Update all entities with a: 6ms
Delete all entities with a: 3ms
Delete all entities: 7ms

Batch iteration with 2500 entities
Insert entities: 47ms
Update all entities with a: 22ms
Delete all entities with a: 5ms
Delete all entities: 11ms

Standard iteration with 2500 entities
Insert entities: 905ms
Update all entities with a: 46ms
Delete all entities with a: 22ms
Delete all entities: 552ms

Batch iteration with 25000 entities
Insert entities: 281ms
Update all entities with a: 163ms
Delete all entities with a: 18ms
Delete all entities: 107ms

Standard iteration with 25000 entities
Insert entities: 9601ms
Update all entities with a: 457ms
Delete all entities with a: 250ms
Delete all entities: 5895ms

Batch iteration with 100000 entities
Insert entities: 1048ms
Update all entities with a: 442ms
Delete all entities with a: 60ms
Delete all entities: 292ms

This was run on an ultrabook. I don't compare against anything here, so it's just to give you an overview of what to expect. Note that in batch mode around 100k entities/sec are inserted once the datasets get larger.

entityframework.utilities's People

Contributors

adzhiljano, andreasbergqvist, bartdebever, blastdan, dahrnsbrak, gitter-badger, jlikens, lukemeads, mikaeleliasson, nethergranite, otf, reddeathgithub, rubberchickenparadise, rudeysh, schweinsbauch, smithkl42


entityframework.utilities's Issues

Method not found error when dropping this fork in place of the original repo

System.MissingMethodException: Method not found: 'EntityFramework.Utilities.IEFBatchOperationBase`1<!!1> EntityFramework.Utilities.EFBatchOperation.For(!!0, System.Data.Entity.IDbSet`1<!!1>)'.

I receive this error when I built this fork and dropped it in place of the original branch.

I'm using it like this.

this.dbInsert = Task.Run(() =>
{
    EFBatchOperation.For(this.db, this.db.Changes).InsertAll(this.changes.GetConsumingEnumerable());
});

Am I missing something in the build process? I just built the repo and added EntityFramework.Utilities as a reference in my project.

Problems with UpdateAll for decimals

Hello, I'm reading in records that contain decimals (which were originally inserted using InsertAll), modifying some of the properties, and using UpdateAll to save the records. The decimals on these records are being truncated after the hundredths place. This is happening to tens of thousands of records that I'm trying to update.

For example, say I have a record with the columns "Price" and "UpdateDate". The Price for this record is 9.9949 and the UpdateDate is 10/1. InsertAll correctly inserts both Price and UpdateDate. However, when I read in this record, perform some logic that only modifies the UpdateDate, and use UpdateAll (telling UpdateAll to update both Price and UpdateDate, since the logic could have modified either column), the Price gets saved as 9.9900. The same thing happens if the logic updates the Price to 10.8944: using UpdateAll to update the record causes the Price to be saved as 10.8900.
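
One thing worth checking here is the declared precision/scale of the decimal column in the EF model: EF code-first defaults decimals to DECIMAL(18, 2), which would round 9.9949 to 9.99 if the temporary table used by UpdateAll is created from that metadata. A sketch of widening the mapping, assuming code-first configuration and a placeholder Record entity:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // "Record" and "Price" are placeholders for the reporter's actual entity/column.
    modelBuilder.Entity<Record>()
        .Property(r => r.Price)
        .HasPrecision(19, 4); // keep four decimal places instead of EF's default (18, 2)
}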

Performance tests have low code quality and are unfair

The performance tests within this repository are pretty unfair because JIT optimizations have not been taken into account and the timing of the actions differs per implementation.

I will be working within a fork to fix these issues. I just wanted to open this issue to shed light on the problem and ask whether this project is still under active development.

The dll in the nuget package does not have a strong name

Yeah... what the title says. The DLL in the NuGet package does not have a strong name, and that causes a compiler warning when using it from an assembly that has a strong name.
You also get a runtime error when trying to use it.
Exception: System.IO.FileLoadException: Could not load file or assembly 'RudeySH.EFUtilities, Version=1.0.3.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. A strongly-named assembly is required. (Exception from HRESULT: 0x80131044)

Encountered Violation of PRIMARY KEY constraint when calling UpdateAll

I load data from the DB into an oldCamps list, then do some modifications, and finally save it back to the DB using UpdateAll like below (campRepo.BatchUpdate is just a wrapper which calls UpdateAll under the hood):

campRepo.BatchUpdate(oldCamps, e => e.ColumnsToUpdate(
    c => c.Name,
    c => c.Description,
    c => c.DeletedDate,
    c => c.LastCheck,
    c => c.Modified));

public void BatchUpdate(IEnumerable<T> entities, Action<UpdateSpecification<T>> columnsToUpdate)
{
    if (entities != null && entities.Count() > 0)
        EFBatchOperation.For(_unitOfWork.Context, _unitOfWork.Context.Set<T>()).UpdateAll(entities, columnsToUpdate);
}

Sometimes I get the following error:

Violation of PRIMARY KEY constraint 'PK__temp_Cam__3214EC075C12819D'. Cannot insert duplicate key in object 'dbo.temp_Campaigns_636958659319135460'. The duplicate key value is (84896).
The statement has been terminated.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlBulkCopy.RunParser(BulkCopySimpleResultSet bulkCopyHandler)
   at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsyncContinuedOnSuccess(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source)
   at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsyncContinued(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source)
   at System.Data.SqlClient.SqlBulkCopy.CopyBatchesAsync(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source)
   at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestContinuedAsync(BulkCopySimpleResultSet internalResults, CancellationToken cts, TaskCompletionSource`1 source)
   at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestAsync(CancellationToken cts, TaskCompletionSource`1 source)
   at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternalAsync(CancellationToken ctoken)
   at System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServerAsync(Int32 columnCount, CancellationToken ctoken)
   at System.Data.SqlClient.SqlBulkCopy.WriteToServer(IDataReader reader)
   at EntityFramework.Utilities.SqlQueryProvider.InsertItems[T](IEnumerable`1 items, String schema, String tableName, IList`1 properties, DbConnection storeConnection, Nullable`1 batchSize)
   at EntityFramework.Utilities.SqlQueryProvider.UpdateItems[T](IEnumerable`1 items, String schema, String tableName, IList`1 properties, DbConnection storeConnection, Nullable`1 batchSize, UpdateSpecification`1 updateSpecification)
   at EntityFramework.Utilities.EFBatchOperation`2.UpdateAll[TEntity](IEnumerable`1 items, Action`1 updateSpecification, DbConnection connection, Nullable`1 batchSize)
   at Campaigns.Repository.Repository`1.BatchUpdate(IEnumerable`1 entities, Action`1 columnsToUpdate)

I don't know exactly what the UpdateAll method does, nor what the PRIMARY KEY of the table dbo.temp_Campaigns_636958659319135460 is, so I don't know where to start debugging.
Could you help me, please?

Update By Query once more

Hi, I have a similar problem to the one reported in a previous issue. Here is an example (like your example EFBatchOperation.For(db, db.Comments).Where(x => x.Text == "a").Update(x => x.Reads, x => x.Reads + 1);):

var value1 = 1;
var value2 = 2;
EFBatchOperation.For(db, db.Comments)
    .Where(x => x.Reads == value1)
    .Update(x => x.Reads, x => value2);
The error "The variable name '@p__linq__0' has already been declared..." is thrown.
Note: If I use constants instead of variables, all is OK.
But I would like to execute parameterized SQL: "update Comments set Reads = @2 where Reads = @1"
Thanks
Dusan

Batch delete operations throw on queryables EF has optimized down to dumb no-op queries

In the following scenario:

int[] ids = Array.Empty<int>();

EFBatchOperation.For(libraryContext, libraryContext.Books).Where(book => ids.Contains(book.Id)).Delete();

An exception will be thrown because EF has optimized libraryContext.Books.Where(book => ids.Contains(book.Id)) such that the SQL it produces looks something like this:

SELECT Id, Title
FROM VALUES (0, NULL)
WHERE 1 = 0

Which naturally has no schema or table to extract, which breaks IQueryProvider.GetDeleteQuery and IQueryProvider.GetUpdateQuery.

A solution would likely be to abort these methods if the query contains something suspicious like WHERE 1 = 0.
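
A hypothetical sketch of that guard (the method and property names are placeholders, not the library's actual internals):

// Bail out of GetDeleteQuery/GetUpdateQuery-style parsing when EF has already reduced
// the query to a constant-false no-op, instead of failing to find a schema and table.
private static bool IsNoOpQuery(string generatedSql)
{
    return generatedSql != null
        && generatedSql.Replace(" ", "").IndexOf("WHERE1=0", StringComparison.OrdinalIgnoreCase) >= 0;
}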

Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS" in the equal to operation.

When my server collation is set to a different collation than the database's, I get this error when using .UpdateAll(...). When creating the temp table, I think the table is using the server default rather than the database default. I am not a SQL expert, but I will look into it.

My workaround is just changing the server collation to match my database (this is a development server with a db I restored from another SQL server).

UpdateAll deadlocks

Hi,

sometimes there is a problem when using the UpdateAll method from multiple separate applications/threads (updating different records). Because the standard behavior is to update rows using PAGELOCK, deadlocks may occur.

Maybe it's worth adding an option to specify the lock type for the query (ROWLOCK/TABLOCK/PAGLOCK)?

Only Inserts Surface Entities, Nested Data is not Inserted?

Hello,
For my needs, I want to do batch operations on nested entities like this:

EFBatchOperation.For(db, db.ResortInfoes).InsertAll(resortsToInsert);

where the resorts entity has nested images, a timetable, and a bunch of other entities, but the insert operation only pushed the surface data that is in ResortInfoes and ignored all ICollection-related tables.

Is this the correct behavior for this library?

UpdateAll => UpdateSpecification does not support the nested properties update or complex properties update

Hey,

Thank you for your fork implementation of this library.

I have noticed that when you provide expressions to UpdateSpecification like:

x => x.Property.NestedProperty

it results in the name "NestedProperty" being returned by the "GetPropertyName" extension, which then fails to match the EF property mapping, since there it is stored as "Property.NestedProperty".

As a side note, there is no way to update a whole ComplexProperty either, because of this issue.

Could you please update the "GetPropertyName" extension so that it works properly with nested properties?

UpdateAll multiple colums

Hello.
I'm trying to update multiple columns with your library, but how do I do that?

I've tried: new { c.Col1, c.Col2 }, but it doesn't seem to work... Any help?
Thanks.

How to update multiple columns?

Hi. I want to update multiple columns, but the examples only show how to update 1 column. How do I update multiple?
Thanks.
