I am very new to data modeling, and according to Microsoft's Entity Framework, tables without primary keys are not allowed and apparently a bad idea. I am trying to figure out why this is a bad idea, and how to fix my model so that I do not have this hole.
I have 4 tables in my current model: User, City, HelloCity, and RateCity. It is modeled as shown in the picture. The idea is that many users can visit many cities, and a user can only rate a city once, but they can greet a city many times. For this reason, I did not have a PK in HelloCity table.
Any insight as to how I can change this to comply with best practices, and why this is against best practices to begin with?
This response is mainly opinion/experience-based, so I'll list a few reasons that come to mind. Note that this is not exhaustive.
Here're some reasons why you should use primary keys (PKs):
Referential integrity: if a foreign key column holds TableB.table_a_id = 5, then you're guaranteed to have a row with id = 5 in TableA. Data integrity and consistency are maintained, and that is good.
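To make that guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module (my choice for illustration, not something from your setup). TableA, TableB and table_a_id follow the names above; the rest of the schema is assumed. Note that SQLite only enforces foreign keys once the pragma is switched on; other engines have their own defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE TableB ("
    " id INTEGER PRIMARY KEY,"
    " table_a_id INTEGER NOT NULL REFERENCES TableA(id))"
)

conn.execute("INSERT INTO TableA (id) VALUES (5)")
# Accepted: a row with id = 5 exists in TableA.
conn.execute("INSERT INTO TableB (table_a_id) VALUES (5)")

try:
    # Rejected: there is no row with id = 6 in TableA.
    conn.execute("INSERT INTO TableB (table_a_id) VALUES (6)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

The second insert fails because the PK on TableA gives the foreign key something unambiguous to point at.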
In my opinion, not having a PK might be legal (i.e. the RDBMS will let you), but it's not moral (i.e. you shouldn't do it). I think you'd need to have extraordinarily good/powerful reasons to argue for not using a PK in your DB tables (and I'd still find them debatable), but based on your current level of experience (i.e. you say you're "new to data modeling"), I'd say it's not yet enough to attempt justifying a lack of PKs.
There're more reasons, but I hope this gives you enough to work through it.
As far as your M:M relations go, you need to create a new table, called an associative table, with a composite PK in it, that PK being a combination of the two PKs of the other two tables.
In other words, if there's an M:M relation between tables A and B, then we create a table C that has a 1:M relation with both tables A and B. "Graphically", it'd look similar to:
    +---+ 1  M +---+ M  1 +---+
    | A |------| C |------| B |
    +---+      +---+      +---+
The PK of table C would look somewhat like this:
    +-----+
    |  C  |
    +-----+
    | id  |  <-- C.id = A.id + B.id (i.e. combined/concatenated, not addition!)
    +-----+
or like this:
    +-------+
    |   C   |
    +-------+
    | a_id  | <--|
    +-------+    +-- composite PK instead
    | b_id  | <--|   of concatenation (recommended)
    +-------+
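The composite-PK variant can be sketched like this, again with Python's sqlite3 purely as an illustration. The tables A, B and C and the columns a_id and b_id come from the diagrams; the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY);
CREATE TABLE B (id INTEGER PRIMARY KEY);
-- Associative table: its PK is the pair (a_id, b_id).
CREATE TABLE C (
    a_id INTEGER NOT NULL REFERENCES A(id),
    b_id INTEGER NOT NULL REFERENCES B(id),
    PRIMARY KEY (a_id, b_id)
);
INSERT INTO A (id) VALUES (1), (2);
INSERT INTO B (id) VALUES (10), (20);
""")

# One A row can pair with many B rows and vice versa.
conn.executemany(
    "INSERT INTO C (a_id, b_id) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10)],
)

try:
    # The same pair a second time violates the composite PK.
    conn.execute("INSERT INTO C (a_id, b_id) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pairs = conn.execute("SELECT a_id, b_id FROM C ORDER BY a_id, b_id").fetchall()
print(duplicate_rejected, pairs)
```

Each side can still appear in many rows of C; only a repeated pair is refused.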
A primary key essentially tags a row with a unique identifier. This can be composed of one or more columns in a row but most commonly just uses one. Part of what makes this useful is when you have additional tables (such as the one in your scenario) you can refer to this value in other tables. Since it's unique, I can look at a column with that unique ID in another table (say HelloCity) and instantly know where to look in the User table to get more information about the person that column refers to.
HelloCity only stores the IDs for the User and City. Why? Because it'd be silly to re-record ALL the data about the City and ALL the data about the User in another table when you already have it stored elsewhere. The beauty of it is, say the user needs to update their DisplayName for some reason. To do so, you just need to change it in User. Now, any row that refers to the user instantly returns the new DisplayName; otherwise you would have to find every record using the old DisplayName and update it accordingly, which in larger databases could take a considerable amount of time.
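Here is a small sketch of that DisplayName scenario using Python's sqlite3. The table names, UserId and DisplayName come from the discussion; HelloCityId, CityId, Name, Greeting and the sample values are assumptions of mine, since I can't see your exact columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (UserId INTEGER PRIMARY KEY, DisplayName TEXT NOT NULL);
CREATE TABLE City (CityId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- HelloCity stores only the IDs, never a copy of the user or city data.
CREATE TABLE HelloCity (
    HelloCityId INTEGER PRIMARY KEY,
    UserId INTEGER NOT NULL REFERENCES "User"(UserId),
    CityId INTEGER NOT NULL REFERENCES City(CityId),
    Greeting TEXT NOT NULL
);
INSERT INTO "User" (UserId, DisplayName) VALUES (1, 'old_name');
INSERT INTO City (CityId, Name) VALUES (7, 'Paris');
INSERT INTO HelloCity (UserId, CityId, Greeting)
VALUES (1, 7, 'Bonjour'), (1, 7, 'Salut');
""")


def greeting_names(connection):
    """Return the display name attached to every greeting via a JOIN."""
    rows = connection.execute(
        'SELECT u.DisplayName FROM HelloCity AS h '
        'JOIN "User" AS u ON u.UserId = h.UserId'
    ).fetchall()
    return [name for (name,) in rows]


names_before = greeting_names(conn)
# A single UPDATE in User is enough; HelloCity is not touched.
conn.execute('UPDATE "User" SET DisplayName = ? WHERE UserId = ?', ("new_name", 1))
names_after = greeting_names(conn)
print(names_before, names_after)
```

Both greetings pick up the new name even though no HelloCity row was modified.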
Note that the primary key is only unique within its own table, though. You could theoretically see the same primary key value in both your City and User tables (this is especially common if you're using simple integers as IDs), but your database will know the difference based on the relationships you build between tables as well as the JOIN statements in your queries.
Another way primary keys help is they automatically have an index generated on their column(s). This increases performance in queries where your WHERE clause searches on the primary key column value. And, since you'll likely be referring to that primary key in other tables, it makes that lookup faster as well.
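If you want to see the effect for yourself, one option is to ask the database for its query plan. The sketch below uses SQLite through Python's sqlite3; the City table with CityId and Name is assumed for illustration, and the exact wording of the plan text differs between SQLite versions, so I only look for keywords in it. Other engines expose this through their own EXPLAIN variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (CityId INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO City (CityId, Name) VALUES (?, ?)",
    [(i, f"city-{i}") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filtering on the PK lets SQLite jump straight to the row.
by_key = plan("SELECT Name FROM City WHERE CityId = 42")
# Filtering on an unindexed column forces a scan of the whole table.
by_name = plan("SELECT Name FROM City WHERE Name = 'city-42'")

print(by_key)
print(by_name)
```

The first plan should mention a search using the primary key, while the second should report a scan.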
In your data model I see some columns that already have 'Id' in them. Without knowing your dataset I would hope those already have all-unique values so it should be fine to place a PK on those. If you get errors doing that there are likely duplicates.
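Before adding the PK you can check for duplicates with a GROUP BY ... HAVING query. This is a minimal sketch in Python's sqlite3 with an assumed City table and made-up rows; swap in your own table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PK yet, so nothing stops the same CityId from appearing twice.
conn.execute("CREATE TABLE City (CityId INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO City (CityId, Name) VALUES (?, ?)",
    [(1, "Oslo"), (2, "Lima"), (2, "Lima (copy)"), (3, "Cairo")],
)

# Every value returned here would block a PK on CityId.
duplicates = conn.execute(
    "SELECT CityId, COUNT(*) FROM City "
    "GROUP BY CityId HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```

An empty result means the column is safe to promote to a PK; anything else needs cleaning up first.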
Back to your question about HelloCity: Entity Framework is a little finicky when it comes to keys. If you really want to play it safe you can auto-generate a unique ID for every entry and call it good. This makes sense because it's a many-to-many relationship, meaning that any combination can appear any number of times, so in theory there's no reliable way to tell otherwise identical entries apart. In the event you want to drop a single entry in the future, how would you know which row to refer to? You could make the argument that you search on all the fields and the greeting may be different, but if there are multiple visits to a city with the same greeting, you may accidentally drop all of those records instead of just one.
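A sketch of that auto-generated ID, once more with Python's sqlite3. The column names HelloCityId, CityId and Greeting are my assumptions, and AUTOINCREMENT is SQLite's spelling; SQL Server and other engines have their own identity mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HelloCity ("
    " HelloCityId INTEGER PRIMARY KEY AUTOINCREMENT,"
    " UserId INTEGER NOT NULL,"
    " CityId INTEGER NOT NULL,"
    " Greeting TEXT NOT NULL)"
)

# The same user greets the same city three times with the same text.
conn.executemany(
    "INSERT INTO HelloCity (UserId, CityId, Greeting) VALUES (?, ?, ?)",
    [(1, 7, "Hello!")] * 3,
)

ids = [row[0] for row in conn.execute(
    "SELECT HelloCityId FROM HelloCity ORDER BY HelloCityId"
)]

# The generated ID lets us remove exactly one of the identical greetings.
deleted = conn.execute(
    "DELETE FROM HelloCity WHERE HelloCityId = ?", (ids[1],)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM HelloCity").fetchone()[0]
print(ids, deleted, remaining)
```

Deleting on UserId, CityId and Greeting instead would have removed all three rows at once.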
However, if it was a one-to-one relationship you could get away with making the combination of the city's ID column and UserId the primary key, since that combination should always be unique (because you should never see multiple rows with that same combination).