ENtity framework 6 code first: what is the best implementation for a baseobject with 10 childobjects

c# entity-framework entity-framework-6

Question

First, we have EF6 code and a baseobject with 10 childobjects.

Five of those ten child objects have just a few (additional) characteristics, while five have many properties (5 to 20). Since we did this, we have one table for the base and one for each child (total 10).

However, this generates GIANT select queries withselect case and unions all over the place, which also requires the EF to create in 6 seconds (the first time).

According to what I've read, the same problem still exists when a table is placed on a slab of concrete.

The only option left is table-per-hierachy, which isn't ideal either because it results in a table with a lot of characteristics.

Exists a different remedy for this?

I considered creating a union view instead of maybe skipping inheritance in order to get all the things from all the child records or objects.

Any further ideas?

I appreciate it.

1
10
6/27/2014 10:07:24 AM

Accepted Answer

Implementing a type of CQRS format where you have distinct databases for writing (command) and reading might be another option (query). The data in the read database may even be de-normalized to make it run more quickly.

I believe your choice actually comes down to Table per Hierarchy and Table per Type if you assume you need at least one normalized model with referential integrity. To perform better, TPH is reported by EF team member Alex James and more recently on the Data Development site for Microsoft.

TPT benefits and why performance is more crucial than benefits:

A higher degree of flexibility allows for the addition of kinds without changing any already-existing tables. Since EF migrations make it simple to construct the necessary SQL to upgrade existing databases without changing data, it is not a major worry.

database validation since there are fewer fields that can be null. Because EF validates data in accordance with the application model, this is not a major concern. Running a background script to validate data is not too difficult if data is being added through other methods. Due to the possibility of two sub-class tables having the same primary key, TPT and TPC are actually worse for primary key validation. The issue of validation by alternative methods remains.

As a result of not needing to retain every null field, storage space is decreased. This is only a very minor issue, particularly if the DBMS has an effective method for managing "sparse" columns.

Design and intuition. It does feel strange to have one really huge table, but this is mainly because most database designers have spent a lot of time standardizing data and creating ERDs. It would appear that the fundamentals of database design are violated by having one huge table. The largest obstacle to TPH is most likely this. For a very passionate dispute, see this article..

The main criticism of TPH is encapsulated in that article as follows:

It's not normalized even in a trivial sense, it makes it impossible to enforce integrity on the data, and what's most "awesome:" it is virtually guaranteed to perform badly at a large scale for any non-trivial set of data.

Most of these are incorrect. The terms integrity and performance were already mentioned, and TPH does not necessarily mean denormalized. There are simply lots of self-referential (nullable) foreign key columns. So, just as we would with a TPH, we can continue developing and standardizing the data. I have many associations between sub-types in a current database and have made an ERD that looks like a TPT inheritance structure. This truly depicts how Entity Framework was implemented using code first. Here is my example.Expenditure class that derives fromRelationship which descended fromContent :

public class Expenditure : Relationship
{
    /// <summary>
    /// Inherits from Content: Id, Handle, Description, Parent (is context of expenditure and usually 
    /// a Project)
    /// Inherits from Relationship: Source (the Principal), SourceId, Target (the Supplier), TargetId, 
    /// 
    /// </summary>
    [Required, InverseProperty("Expenditures"), ForeignKey("ProductId")]
    public Product Product { get; set; }
    public Guid ProductId { get; set; }

    public string Unit { get; set; }
    public double Qty { get; set; }
    public string Currency { get; set; }
    public double TotalCost { get; set; }        

}

The InversePropertyAttribute both theForeignKeyAttribute Give EF the data necessary to do the necessary self joins on the single database.

Additionally, the Product type maps to the same table (also inheriting from Content). The table contains a row for each Product, and rows containing Expenditures will include information in theProductId column, which is empty for rows with any other form of data. Therefore, the data is standardized was simply stored in one table.

The benefit of utilizing EF code first is that, whether using TPH or TPT, the database is designed and implemented (nearly) exactly the same manner. Simply add an annotation to each sub-class mapping them to new tables to convert the implementation from TPH to TPT. It therefore doesn't really matter which one you select, which is fantastic news for you. Simply construct it, produce a stack of test data, test it, alter your approach, and test it once more. TPH should come out on top, in my opinion.

6
7/3/2014 6:27:42 AM

Popular Answer

I have a few recommendations after having dealt with issues comparable to yours. Due to the complexity of the subject and the fact that I don't have all the answers, I'm also open to improvements on these proposals.

When handling non-trivial queries on complicated entities, i.e. those with numerous tiers of child collections, entity framework can be exceedingly slow. It does take an abnormally long time to compile the query in some of the performance tests I've attempted. Theoretically, EF 5 and later should automatically cache constructed queries (even if the context is disposed and re-instantiated), but I'm not sure if this is always the case.

For a complex database, I've read recommendations to make numerous DataContexts using just smaller chunks of your database entities. If you think this will work for you, try it out! However, I suppose this strategy would have maintenance problems.

1) Although it should go without saying, ensure sure your database has the appropriate foreign keys set up for related entities. If you do, entity framework will keep track of these relationships and generate queries requiring foreign key joins much more quickly.

2) Only take what you require. One-size-fits-all approaches to obtaining a complicated thing are rarely the best ones. Let's say you simply need to display the name and ID of these objects in the list of the base object when you receive a list of base objects (to put in a list). Any navigation attributes that are not particularly required should not be fetched; only retrieve the base object.

3) I would totally construct a View in the database and query that instead if the child objects are not collections or if they are collections but you only need 1 item (or an aggregate value like the count) from them. It is SO MUCH faster. EF doesn't need to do any work because the database, which is better suited for this kind of operation, handles everything.

4) Exercise caution. Include(), which connects to point #2 above, is used here. It is preferable to avoid using if you are getting a single object plus a child collection property. Include() so that a separate query will be run to fetch the child collection. (Therefore, not every row in the child collection receives all the base object columns)

EDIT

Here are some additional ideas in response to the comments.

It makes logical sense to maintain separate tables for the extra attributes of the inheriting classes as well as a table for the base class as we are dealing with an inheritance hierarchy. However, there is still disagreement over how to optimize Entity Framework's performance.

Although I previously utilized Entity Framework (EF) for a scenario comparable to this (although with fewer children), I didn't use the Entity Framework generated classes in this instance as the business objects. Direct connections between the EF objects and the DB tables.

To convert to them, I made a set of Mappers and separate business classes for the basic and inherited classes. A query would appear something like this:

public static List<BaseClass> GetAllItems()
{
  using (var db = new MyDbEntities())
  {
    var q1 = db.InheritedClass1.Include("BaseClass").ToList()
       .ConvertAll(x => (BaseClass)InheritedClass1Mapper.MapFromContext(x));
    var q2 = db.InheritedClass2.Include("BaseClass").ToList()
       .ConvertAll(x => (BaseClass)InheritedClass2Mapper.MapFromContext(x));

    return q1.Union(q2).ToList();  
  }
}

Although I'm not saying this is the ideal course of action, it might be a place to start. In this instance, the queries compile quickly.

Welcome to comment!



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow