The Entity-Relationship Data Model, also called ER, is one of the various data models you can use to reason about your data.
In particular, it’s a conceptual data model, as it’s not linked to any particular implementation. That’s a task left to the logic data model.
The ER Data Model is so general, so high level, that it can be implemented by a variety of completely different kinds of databases.
It is great because you don’t think about the implementation details, but instead you only think about your data and how it’s organized. And while doing so, you are forced to analyze the problem in ways that you didn’t think about before.
I find ER diagrams great at helping you analyze a scenario where data is involved.
The ER Model gives you the tools to represent, using a graphical notation, all the data you need to model, the relations between the different kind of data, and the information associated with it.
There are 2 items that compose an ER Model:
- the entities
- the relations
Entities are types of data, like items or people, that have common properties.
Relations are the relations between entities.
Let me give you an example, let’s talk about books and their authors. We have 2 entities:
A particular book is an instance of the book entity.
Since we have 2 entities, we have 2 relations between them. One is the relation between a single book, and the authors entity. One is the relation between a single author, and the books entity. If we think about it, wre have:
- a book has an author
- an author can write many different books
The visual notation for entities
Given this simple example, we can start introducing the visual notation that will help us create the ER Data Model of our scenario.
Note: There are many different ways to draw ER diagrams. I will use the one that, in my opinion, is more visual and makes more sense.
Entities are represented as rectangles, with some text in them to identify them:
The visual notation for relations
The relation between entities is represented, in its most basic form, using a line that connects two relations, and a diamond with some text in it to identify the type of relation:
Note that I did not create 2 relations “book has author” and “author wrote book”. I made a single relation “authored” between a Book and an Author.
Once we have a relation, we must now indicate the numbers involved. At the moment, we have many questions:
- How many authors does can a book have?
- Can an author write multiple books?
- Does an author need to write at least one book to be called author?
- Can a book be written by multiple authors?
- Can a book exist without at least an author?
All those are good questions to ask, and in this case I think the answers are pretty obvious. And when the answer is not obvious, you can think about the problem and add your own constrains.
There are various ways to visually indicate cardinalities on a diagram. Some prefer changing the shape of the line when it links to an entity.
I do prefer numbers, which make things clearer:
The numbers above mean this: a book can be authored by 1 or more authors.
n means any number of elements.
And an author can have authored from 0 books (maybe it’s writing one now) to an infinite number of books.
The first is called a zero-to-many relationship. The second is a one-to-many relationship.
We can also have:
- one-to-one relationships
- many-to-many relationships
- zero-to-one relationships
Each entity can have one or more attributes.
Let’s say we’ll use the above relationship in a bookstore. Each author has a name, a bio, a website URL.
Each book has a title, a publisher, a year of publication, an ISBN. The publisher could be an entity as well, if we want. But we can also define it as an attribute of a book.
This is how we can represent the above information:
Entities must be identified by a unique key. The book entity can be uniquely identified by the ISBN attribute. Every book has a single ISBN (considering that we don’t represent a single copy of a book, but a book “title”).
We identify the primary key attributes by underlying them:
Author, on the other hand, has no unique identifier at the moment. Two authors can have the same name.
We must therefore add a unique key attribute. For example an
Identifier attributes can span over multiple attributes.
For example, the passport ID and the author country uniquely identify the person, and could replace the
id attribute we added:
Which one to choose? It’s a matter of what makes more sense in your application. If we are modeling a bookstore, we can’t expect to have the country and passport id of all book authors. We’ll therefore use a random
id we’ll choose and associate with each author.
Attributes are not unique to entities. Relations can have attributes, as well.
Consider we need to model a library. In addition to the book and author entities, we now introduce the reader entity, a person that borrows the book to read it. We take its name and id card number when they borrow it:
But we still miss information. We need to know when the person borrowed the book, and the date they return it, so we can store the information about all the history of a particular book in our library. This information does not belong to either the book or reader entities; it belongs to the relation:
We talked about primary keys above, and how the help uniquely identify an entity.
Some entities depend on other entities for their existence, and are called weak entities.
Suppose we need to model orders for an online shop.
For each order, we’ll store the order id, which starts from 1 and increases over time, the date and time it was placed, and the information about the customer, so we know who to bill and where to ship it.
Then we also need to know what they ordered. We create a weak entity “ordered item”, which represents each item in the cart at the time of checkout.
This entity will store the item price at the moment of checkout (so when we change the price of the products on sale, it will not affect the orders already placed), the quantity of items that were ordered, and the options chosen. Say we sell t-shirts, therefore we need to store the color and size of the t-shirt ordered.
It is a weak entity because an ordered item entity cannot exist without an order entity.
An entity can have a recursive relation with itself. Suppose we have a person entity. We can model the parent-child relationship in this way:
A person can have from 0 to n children, a children has 2 parents (considering the simplest scenario).
ISA stands for IS-A, and it’s a way to model generalizations in the ER model.
We use it to group similar entities under a common umbrella. For example an author and a reader, in the case of the books and library example, can be generalized using a person entity.
They both have a name, so we an extract the name up to the person entity, and just manage the peculiarities of being an author or reader in the corresponding entity:
Not every relationship is strictly binary. Let’s take a lesson scenario.
A lesson takes place in a room of the school today at 10:00, with a teacher, talking to a class about physics.
So, a lesson is given at a particular time of a day, it involves a subject, a teacher, a class, and a room.
We can model it in this way: