I am trying to write code to transfer data from one application to a new version of the application.
Both the old and new applications use SQL Server. The old database is poorly designed in 2 main areas: It does not make proper use of Foreign Keys, and it has additional tables which are unnecessary (for example, but not only this, lookup tables are used where an enum should have been used instead).
I believe that the â€œcorrectâ€ way to do this would be using SSIS, however I have no experience with this and I have struggled to get to grips with how exactly to set it up, so I am thinking about a code-based solution (run once console application).
I plan to achieve this by using Entity Framework since I am comfortable using EF, and it is easy to set up.
The crux of the problem is maintaining relationships, and I am looking for guidance on the best way to maintain related data. So for example Customer has many Addresses, Address has Customer Id.
If it were that simple I could simply query Customers, .Include(x => x.Address) and use this to add the Customers to the new database (creating new Ids automatically in the target database along the way). However, since there are quite a few tables (~30) all linked in some way or other, each with many rows of data, I feel that this would be messy and quite intensive since the data would likely need to be loaded into memory in one hit (since I would have to transform the data as well).
An alternative would be to SET IDENTITY INSERT and copy the tables one by one, maintaining the old Ids in the new tables, but this feels wrong.
Other than sending myself on an SSIS course, are there any other alternatives?
SSIS is more suited to transforming data from different sources/destinations. If both databases use SQL Server then I would look at migrating the table structure solely via SQL queries. EF is possible, but will be slower and more resource intensive, especially if not run on the database server itself. (data over the wire and back)
Say for example I had a table that related to an Order table through an "OrderNumber" I want to migrate to use OrderId with a FK. I structure my new table to include an OrderId as Nullable, and also include the OrderNumber, typically with a trailing underscore as a reminder that the column will be dropped. I import the data with the order numbers directed to the temporary column, then perform a second pass to update the OrderId. After a sanity check that the OrderIds are all accounted for (no remaining null values) then the OrderId column can be made non-nullable, establish the FKs & constraints, and drop the extra OrderNumber_ column. When performing these changes it's generally better to set up the tables with no constraints/indexes, then create them after the data is ported. (Faster, but you want to be reasonably sure your data is coming across correctly and completely) Break up the proposed data structure and transfer/re-associate it a table at a time before finalizing the new FKs. Temp tables can be utilized as well for tricky scenarios. All of this can be performed on a restored backup as a proof of concept, scripted, and set up to run on a backed up production server once you're happy it's complete and tested.
Having lookup tables combined with a supporting enumerations would be better than dropping the lookup table. Values with FKs to lookup tables help enforce referential integrity at the DB level and help provide meaning to arbitrary values when queried. I typically suffix lookup tables associated with enums with "Enum" as an indicator that these tables are governed by code and should be read-only. You can also set up Insert/Update/Delete triggers to prevent changes to these tables. (which update scripts can disable and re-establish when you need to expand the enum)