Tuesday, 2 August 2016

Learning ASP.Net MVC - Part 4 - Entity Framework Fundamentals

Learning ASP.Net MVC series:
  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals
  5. Upgrading project to .NET Core 1.1
  6. Dependency Injection and Services


EF logo The previous version of Entity Framework was 6 and the current one is Entity Framework Core 1.0, although for a few years they have been going with Entity Framework 7. It might seem that they changed the naming to be consistent with .NET Core, but according to them they did it to avoid confusion. The new version sprouted from the idea of "EF everywhere", just like .Net Core is ".Net everywhere", and is a rewrite - a port, as they chose to call it, with some extra features but also lacking some of the functionality EF6 had - or better to say has, since they continue to support it for .NET proper. In this post I will examine the history and some of the basic concepts related to working with Entity Framework as opposed to a more direct approach (like opening System.Data.SqlConnection and issuing SqlCommands).

Entity Framework history


Entity Framework started as an ORM, a class of software that abstracts database access. The term itself is either a bit obsolete, with the advent of databases that call themselves non relational, or rebelliously exact, recognizing that anything that can be called a database needs to also encode relationships between data. But that's another topic altogether. When Entity Framework was designed it was all about abstracting SQL into an object oriented framework. How would that work? You would define entities, objects that inherited from a EntityBase class, and decorate their properties with attributes defining some restrictions that databases have, but objects don't, like the size of a field. You also had some default methods that could be overridden in order to control very specific custom requirements. In the background, objects would be mapped to tables, their simple properties to columns and their more complex properties to other tables that had a foreign key relationship with the owner object mapped table.

There were some issues with this system that quickly became apparent. With the data layer separation idea going strong, it was really cumbersome and ugly to move around objects that inherited from an entire hierarchy of Entity Framework classes and held state in ways that were almost opaque to the user. Users demanded the use of POCOs, a way to separate the functionality of EF from the data objects that were used through all the tiers of the application. At the time the solution was mostly to use simple objects within your application and then translate them to data access objects which were entities.

Microsoft also recognized this and in further iterations of EF, they went full POCO. But this enabled them to also move from one way of thinking to another. At the beginning the focus was on the database. You had your database structure and your data access layer and you wanted to add EF to your project, meaning you needed to map existing tables to C# objects. But now, you could go the other way around. You started with an application using plain objects and then just slapped EF on and asked it to create and maintain the database. The first way of thinking was coined "database first" and the other "code first".

In seven iterations of the framework, things have been changed and updated quite a lot. You can imagine that successfully adapting to legacy database structures while seamlessly abstracting changes to that structure and completely managing the mapping of objects to database was no easy. There were ups and downs, but Microsoft stuck with their guns and now they are making the strong argument that all your data manipulation should be done via EF. That's bold and it would be really stupid if Entity Framework weren't a good product they have full confidence in. Which they do. They moved from a framework that was embedded in .NET, to one that was partially embedded and then some extra code was separate and then, with EF6, they went full open source. EF Core is also open source and .NET Core is free of EF specific classes.

Also, EF Core is more friendly towards non relational databases, so you either consider ORM an all encompassing term or EF is no longer just an ORM :)

In order to end this chapter, we also need to discuss alternatives.

Ironically, both the ancestor and the main competitor for Entity Framework was LINQ over SQL. If you don't know what LINQ is, you should take the time to look it up, since it has been an integral part of .NET since version 3.5. in Linq2Sql you would manually map objects to tables, then use the mapping in your code. The management of the database and of the mapping was all you. When EF came along, it was like an improvement over this idea, with the major advantage (or flaw, depending on your political stance) that it handled schema mapping and management for you, as much as possible.

Another system that was and is very used was separating data access based on intent, not on structure. Basically, if you had the need to add/get the names of people from your People table, you would have another project that had some object hierarchy that in the end had methods for AddPeople and GetPeople. You didn't need to delete or update people, you didn't have the API for it. Since the intent was clear, so was the structure and the access to the database, all encapsulated - manually - into this data access layer project. If you wanted to get people by name, for example, you had to add that functionality and code all the intermediary access. This had the advantage (or flaw) that you had someone who was good with databases (and a bit with code) handling the maintenance of the data access layer, basically a database admin with some code writing permissions. Some people love the control over the entire process, while others hate that they need to understand the underlying database in order to access data.

From my perspective, it seems as there is an argument between people who want more control over what is going on and people who want more ease of development. The rest is more an architectural discussion which is irrelevant as EF is concerned. However, it seems to me that the Entity Framework team has worked hard to please both sides of that argument, going for simplicity, but allowing very fine control down the line. It also means that this blog post cannot possibly cover everything about Entity Framework.

Getting started with Entity Framework


So, how do things look in EF Core 1.0? Things are still split down the middle in "code first" and "database first", but code first is the recommended way for starting new projects. Database first is something that must be supported in perpetuity just in case you want to migrate to EF from an existing database.

Database first


Imagine you have tables in an SQL server database. You want to switch to EF so you need to somehow map the existing data to entities. There is a tutorial for that: ASP.NET Core Application to Existing Database (Database First), so I will just quickly go over the essentials.

First thing is to use NuGet to install EF in your project:
Install-Package Microsoft.EntityFrameworkCore.SqlServer
and then add
"Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
to the project.json tools section. For the Database First approach we also need other stuff like:
Install-Package Microsoft.EntityFrameworkCore.Tools –Pre
Install-Package Microsoft.EntityFrameworkCore.SqlServer.Design
Final touch, running
Scaffold-DbContext "<Sql connection string>" Microsoft.EntityFrameworkCore.SqlServer -OutputDir Models

At this time alarm bells are sounding already. Wait! I only gave it my database connection string, how can it automagically turn this into C# code and work?

If we look at the code to create the sample database in the tutorial above, there are two tables: Blog and Post and they are related via primary key/foreign key as is recommended to create an SQL database. Columns are clearly defined as NULL or NOT NULL and the size of text fields is conveniently Max.



The process created some interesting classes. Besides the properties that map to fields, the Blog class has a property of type ICollection<Post> which is instantiated with a HashSet<Post>. The real fun is the BloggingContext class, which inherits from DbContext and in the override for ModelCreating configures the relationships in the database.
  • Enforcing the required status of the blog Url:
    modelBuilder.Entity<Blog>(entity =>
    {
    entity.Property(e => e.Url).IsRequired();
    });
  • Defining the one-to-many relationship between Blog and Post:
    modelBuilder.Entity<Post>(entity =>
    {
    entity.HasOne(d => d.Blog)
    .WithMany(p => p.Post)
    .HasForeignKey(d => d.BlogId);
    });
  • Having the root sets used to access entities:
    public virtual DbSet<Blog> Blog { get; set; }
    public virtual DbSet<Post> Post { get; set; }

First thing to surprise me, honestly, is that the data model classes are as bare as possible. I would have expected some attributes on the properties defining their state as required, for example. EF Core allows to not pollute the classes with data annotations, as well as an annotation based system. The collections are interfaces and they are only instantiated with a concrete implementation in the constructor. An interesting choice for the collection type is HashSet. As opposed to a List it does not allow access via indexers, only enumerators. It is designed to optimize search: basically finding an item in the hashset does not depend on the size of the collection. Set operations like union and intersects can be used efficiently with Hashset, as well.

Hashset also does not allow duplicates and that may cause some sort of confusion. How does one define a duplicate? It uses IEqualityComparer. However, a HashSet can be instantiated with a custom IEqualityComparer implementation. Alternately, the Equals and GetHashCode methods can be overridden in the entities themselves. People are divided over whether one should use such mechanisms to optimize Entity Framework functionality, but keep in mind that normally EF would only keep in memory stuff that it immediately needs. Such optimizations are more likely to cause maintainability problems than save processing time.

Database first seems to me just a way to work with Entity Framework after using a migration tool. It sounds great, but there are probably a lot of small issues that one has to gain experience with when dealing with real life databases. I will blog about it if I get to doing something like this.

Code first


The code first tutorial goes the other direction, obviously, but has some interesting differences that tell me that a better model of migrating even existing databases is to start code first, then find a way to migrate the data from the existing database to the new one. This has the advantage that it allows for refactoring the database as well as provide some sort of verification mechanism when comparing the old with the new structure.

The setup is similar: use NuGet to install EF in your project:
Install-Package Microsoft.EntityFrameworkCore.SqlServer
then add
"Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
to the project.json tools section.

Then we create the models: a simple DbContext inheritance, containing DbSets of Blog and Post, and the data models themselves: Blog and Post. Here is the code:
public class BloggingContext : DbContext
{
public BloggingContext(DbContextOptions<BloggingContext> options)
: base(options)
{ }

public DbSet<Blog> Blogs { get; set; }
public DbSet<Post> Posts { get; set; }
}

public class Blog
{
public int BlogId { get; set; }
public string Url { get; set; }

public List<Post> Posts { get; set; }
}

public class Post
{
public int PostId { get; set; }
public string Title { get; set; }
public string Content { get; set; }

public int BlogId { get; set; }
public Blog Blog { get; set; }
}

Surprisingly, the tutorial doesn't go into any other changes to this code. There are no HashSets, there are no restrictions over what is required or not and how the classes are related to each other. A video demo of this also shows the created database and it contains primary keys. A blog has a primary key on BlogId, for example. To me that suggests that convention over configuration is also used in the background. The SomethingId property of a class named Something will automatically be considered the primary key (also simply Id). Also, if you look in the code that EF is executing when creating the database (these are called migrations and are pretty cool, I'll discuss them later in the post) Blogs are connected to Posts via foreign keys, so this thing works wonders if you name your entities right. I also created a small console application to test this and it worked as advertised.

Obviously this will not work with every scenario and there will be attributes attached to models and novel ways of configuring mapping, but so far it seems pretty straightforward. If you want to go into the more detailed aspects of controlling your data model, try reading the documentation provided by Microsoft so far.

Entity Framework concepts


We could go right into the code fray, but I choose to first write some more boring conceptual stuff first. Working with Entity Framework involves understanding concepts like persistence, caching, migrations, change saving and the underlying mechanisms that turn code into SQL, of the Unit of Work and Repository patterns, etc. I'll try to be brief.

Context


As you have seen earlier, classes inheriting from DbContext are the root of all database access. I say classes, because more of them can be used. If you want to copy from one database to another you will need to contexts. The context defines a data model, differentiated from a database schema by being a purely programmatic concept. DbContext implements IDisposable so for nuclear operations it can be used just as one uses an open SQL connection. In fact, if you are tempted to reuse the same context remember that its memory use increases with the quantity of data it accesses. It is recommended for performance reasons to immediately dispose a context when finishing operations. Also, a DbContext class is not thread safe. It stands to reason to use context for as short a period as possible inside single threaded operations.

DbContext provides two hooks called OnConfiguring and OnModelCreating that users can override to configure the context and the model, respectively. Careful, though, one can configure the context to use a specific implementation of IModel as model, in which case OnModelCreating will not be called. The other most important functionality of DbContext is SaveChanges, which we will discuss later. Worth mentioning are Database and Model, properties that can be used to access database and model information and metadata. The rest are Add, Update, Remove, Attach, Find, etc. plus their async and range versions allowing for the first time - EF6 did not - to dynamically send an object to a function like Add for example and let EF determine where to add it. It's nothing that sounds very safe to use, but probably there were some scenarios where it was necessary.

DbSet


For each entity type to be accessed, the context should have DbSet<TEntity> properties for that type. DbSet allows for the manipulation of entities via methods like Add, Update, Remove, Find, Attach and is an IEnumerable and IQueriable of TEntity. However, in order to persist any change, SaveChanges needs to be called on the context class.

SaveChanges


The SaveChanges method is the most important functionality of the context class, which otherwise caches the accessed objects and their state waiting either for this method to be called or for the context to be disposed. Important improvements on the EF Core code now allows to send these changes to the database using batches of commands. Before, in EF6 and previously, each change was sent separately so, for example, adding two entities to a set and saving changes would do two database trips. From EF Core onward, that would only take one trip unless specifically configured with MaxBatchSize(number). Revert to the EF6 behavior using MaxBatchSize(1). This applies to SqlServer only so far.

This behavior is the reason why contexts need to be released as soon as their work is done. If you query all the items with a name starting with 'A', all of these items will be loaded in the context memory. If you then need to get the ones starting with 'B', the performance and memory will be affected if using the same context. It might be helpful, though, if then you need to query both items starting with 'A' and the ones starting with 'B'. It's your choice.

One particularity of working with Entity Framework is that in order to update or delete records, you first need to query them. Something like
context.Posts.RemoveRange(context.Posts.Where(p => p.Title.StartsWith("x")));
There is no .RemoveRange(predicate) because it would be impossible to resolve a query afterwards. Well, not impossible, only it would have to somehow remember the predicate, alter subsequent selects to somehow gather all information required and apply deletion on the client side. Too complicated. There is a way to access the database by writing SQL directly and again EF Core has some improvements for this, but raw SQL changes are opaque to an already existing context.

Unit of Work and Repository patterns


The Repository pattern is an example of what I was calling before an alternative to Entity Framework: a separation of data access from business logic that improves testability and keeps distinct responsibilities apart. That doesn't mean you can't do it with EF, but sometimes it feels pretty shallow and developers may be tempted to skip this extra encapsulation.

A typical example is getting a list of items with a filter, like blog posts starting with something. So you create a repository class to take over from the Posts DbSet and create a method like GetPostsStartingWith. A naive implementation returns a List of items, but this actually hinders EF in what it tries to do. Let's assume your business logic requires you to return the first ten posts starting with 'A'. The initial code would look like this:
var posts=context.Posts.Where(p=>p.Title.StartsWith("A")).Take(10).ToList();
In this case the SQL code sent to the database is like SELECT TOP 10 * FROM Posts WHERE Title LIKE 'A%'. However, in a code looking like this:
var repo=new PostsRepository();
var posts=repo.GetPostsStartingWith("A").Take(10).ToList();
will first pull all posts starting with "A" then retrieve the first 10. Ouch! The solution is to return IQueryable instead of IEnumerable or a List, but then things start to feel fishy. Aren't you just shallow proxying the DbSet?

Unit of Work is some sort of encapsulation of similar activities using the same data, something akin to a transaction. Let's assume that we store the number of posts in the Counts table. So if we want to add a post we need to do the adding, then change the value of the count. The code might look like this:
var counts=new CountsRepository();
var blogs=new BlogRepository();
var blog=blogs.Where(b.Name=="Siderite's Blog").First();
blog.Posts.Add(post);
counts.IncrementPostCount(blog);
blog.Save();
counts.Save();
Now, since this selects a blog and changes posts then updates the counts, there is no reason to use different contexts for the operation. So one could create a Unit of Work class that would look a bit like a common repository for blogs and counts. Let's ignore the silly example as well as the fact that we are doing posts operations using the BlogRepository, which is something that we are kind of forced to do in this situation unless we start to deconstruct EF operations and recreate them in our code. There is a bigger elephant in the room: there already exists a class that encapsulates access to database, caches the items retrieved and creates one atomic operation for both changes. It's the context itself! If we instantiate the repositories with a constructor that accepts a context, then all one has to do to atomize the operations is to put the code inside a using block.

There are also controversies related to the use of these two patterns with EF. Rob Conery has a nice blog post suggesting Command/Query objects instead. His rationale is that if you have to pass a context object, as above, there is no much decoupling involved.

I lean towards the idea that you need a Data Access Layer encapsulation no matter what. I would put the using block in a method in a class rather than pass the context or not use a repository. Also, since we saw that entity type is not a good separation of "repositories" - I feel that I should name them differently in this situation - and the intent of the methods is already declared in their name (like GetPosts...) then these encapsulation classes should be separated by some other criteria, like ContentRepository and ForumRepository, for example.

Migrations


Migrations are cool! The idea is that when making changes to structure of the database one can extract those changes in a .cs file that can be added to the project and to source control. This is one of the clear advantages of using Entity Framework.

First of all, there are a zillion tutorials on how to enable migrations, most of them wrong. Let's list the possible ways you could go wrong:
  • Enable-Migrations is obsolete - older tutorials recommended to use the Package Manager Console command Enable-Migrations. This is now obsolete and you should use Add-Migration <Name>
  • Trying to install EntityFramework.Commands - due to namespace changes, the correct namespace would be Microsoft.EntityFrameworkCore.Commands anyway, which doesn't exist. EntityFramework.Commands is version 7, so it shouldn't be used in .NET Core. However, at one point or another, this worked if you added some imports and changed stuff around. I tried all that only to understand the sad truth: you should not install it at all!
  • Having a DbContext inheriting class that doesn't have a default constructor or is not configured for dependency injection - the migration tool looks for such classes then creates instances of them. Unless it knows how to create these instances, the Add-Migration will fail.

The correct way to enable migrations is... to install the packages from the Database First section! Yes, that is right, if you want migrations you need to install
Install-Package Microsoft.EntityFrameworkCore.Tools –Pre
Install-Package Microsoft.EntityFrameworkCore.SqlServer.Design
Only then you may open the Package Manage Console and run
Add-Migration FirstMigration
Note that I am discussing an SQL Server example. It is possible you will need other packages if using a different type of database.

The result is a folder called Migrations in which you will find two files: a snapshot and the migration itself. Here is an example of the snapshot:
[DbContext(typeof(BloggingContext))]
partial class BloggingContextModelSnapshot : ModelSnapshot
{
protected override void BuildModel(ModelBuilder modelBuilder)
{
modelBuilder
.HasAnnotation("ProductVersion", "1.0.0-rtm-21431")
.HasAnnotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn);

modelBuilder.Entity("EFCodeFirst.Blog", b =>
{
b.Property<int>("BlogId")
.ValueGeneratedOnAdd();

b.Property<string>("Url");

b.HasKey("BlogId");

b.ToTable("Blogs");
});

modelBuilder.Entity("EFCodeFirst.Post", b =>
{
b.Property<int>("PostId")
.ValueGeneratedOnAdd();

b.Property<int>("BlogId");

b.Property<string>("Content");

b.Property<string>("Title");

b.HasKey("PostId");

b.HasIndex("BlogId");

b.ToTable("Posts");
});

modelBuilder.Entity("EFCodeFirst.Post", b =>
{
b.HasOne("EFCodeFirst.Blog", "Blog")
.WithMany("Posts")
.HasForeignKey("BlogId")
.OnDelete(DeleteBehavior.Cascade);
});
}
}

And here is one of the migration:
public partial class First : Migration
{
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.CreateTable(
name: "Blogs",
columns: table => new
{
BlogId = table.Column<int>(nullable: false)
.Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
Url = table.Column<string>(nullable: true)
},
constraints: table =>
{
table.PrimaryKey("PK_Blogs", x => x.BlogId);
});

migrationBuilder.CreateTable(
name: "Posts",
columns: table => new
{
PostId = table.Column<int>(nullable: false)
.Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
BlogId = table.Column<int>(nullable: false),
Content = table.Column<string>(nullable: true),
Title = table.Column<string>(nullable: true)
},
constraints: table =>
{
table.PrimaryKey("PK_Posts", x => x.PostId);
table.ForeignKey(
name: "FK_Posts_Blogs_BlogId",
column: x => x.BlogId,
principalTable: "Blogs",
principalColumn: "BlogId",
onDelete: ReferentialAction.Cascade);
});

migrationBuilder.CreateIndex(
name: "IX_Posts_BlogId",
table: "Posts",
column: "BlogId");
}

protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DropTable(
name: "Posts");

migrationBuilder.DropTable(
name: "Blogs");
}
}

Note that this is not something that copies the changes in data, only the ones in the database schema.

Conclusions


Yes, no code in this post. I wanted to explore Entity Framework in my project, but if I would have continued it like that the post would have become too long. As you have seen, there are advantages and disadvantages in using Entity Framework, but at this point I find it more valuable to use it and meet any problems I find face on. Besides, the specifications of my project don't call for complex database operations so the data access mechanism is quite irrelevant.

Stay tuned for the next post in which we actually use EF in ContentAggregator!

0 comments:

Post a Comment