SQL Server Batch Inserts of Parent/Child Data with iBATIS

The common pattern for persistence of our domain classes to the Database is through the use of an ORM layer with support for your standard CRUD operations. Many ORM's have mechanisms for optimizing Selects, N + 1 Selects, lazy loading etc., but often don't have mechanisms for batching insertions. Very often when optimizing database calls and eliminating unnecessary round-trips to the Database servers, I come across the scenario where I need to insert many Parent objects and their Child objects in a single batch. The Parent objects primary key is the foreign key for the Child objects so the Parent objects need to be persisted first, and you need to get the Identity of each Parent before inserting the Client. In SQL Server 2008 they introduced the MERGE statement which can be used for this purpose, however I needed this to work in SQL 2005 as well as 2008. Luckily, SQL Server 2005 introduced the OUTPUT clause, which, when combined with a second surrogate key, can be used to batch insert all the Parent Objects at once, OUTPUT all the Identity's of the Parent objects and then batch insert all the Child Objects.

In my scenario I am using iBATIS, which is a lightweight and flexible ORM that provides some easy mechanisms for handling these edge cases. If you were using standard ADO.NET you could also leverage the new Table-Valued Parameters to pass in the sets of Parent and Child data.

For the example I will use two tables, Parent and Child:

  1.  
  2. CREATE TABLE Parent(
  3. Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
  4. Name VARCHAR(255) NOT NULL,
  5. SequenceNumber INT NOT NULL)
  6.  
  7. GO
  8.  
  9. CREATE TABLE Child(
  10. Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
  11. ParentId INT NOT NULL REFERENCES Parent(ParentID),
  12. Name VARCHAR255) NOT NULL)
  13. GO
  14.  

I have corresponding Parent and Child classes:

  1.  
  2. public class Parent
  3. {
  4. public int Id { get; set;}
  5. public string Name { get; set; }
  6. public IList<Child> Children { get; set; }
  7. }
  8.  
  9. public class Child
  10. {
  11. public int Id { get; set; }
  12. public string Name { get; set; }
  13. }
  14.  

The SequenceNumber in the Parent table is just a surrogate key used to match the Parents in the batch back to the Children in that same batch, but I don't want it to pollute my Domain classes, so I will merely create a sequence number right before I send it to the iBATIS map using linq and anonymous types:

  1.  
  2. public static int InsertBatch(IList<Parent> parentList)
  3. {
  4. var sequencedParents = parentList.Select(
  5. (parent, index) =>
  6. {
  7. SequenceNumber = index,
  8. Parent = parent,
  9. Children = parent.Children.Select(
  10. child =>
  11. new
  12. {
  13. SequenceNumber = index,
  14. Child = child
  15. }).ToList()
  16. });
  17.  
  18. var sequencedChildren = sequencedParents.Select(p => p.Children);
  19.  
  20. var args = new Hashtable
  21. {
  22. {"Parents", sequencedParents},
  23. {"Children", sequencedChildren}
  24. };
  25.  
  26. return Mapper.Instance().Update("InsertParentBatch", args);
  27. }
  28.  

The InsertBatch method is sending two separate lists to the iBATIS Mapped Statement, the list of Parents and the list of Children, which are associated by the sequence number we assigned which is unique for each Parent in this batch. We will use this to join them when we build the SQL in the following iBATIS map:

  1.  
  2. <statement id="InsertParentBatch" parameterClass="map">
  3. DECLARE @Inserted TABLE (
  4. ID int,
  5. SequenceNumber int
  6. )
  7.  
  8. <isNotEmpty property="Parents">
  9. INSERT INTO Parent
  10. (
  11. Name,
  12. SequenceNumber
  13. )
  14. OUTPUT INSERTED.Id, INSERTED.SequenceNumber INTO @Inserted
  15. SELECT Name, SequenceNumber FROM (
  16. <iterate property="Parents" conjunction=" UNION ALL">
  17. SELECT
  18. #Parents[].Parent.Name# Name,
  19. #Parents[].SequenceNumber# SequenceNumber
  20. </iterate>
  21. ) P
  22. </isNotEmpty>
  23. <isNotEmpty property="Children">
  24. INSERT INTO Children (ParentId, Name)
  25. SELECT
  26. INS.Id,
  27. C.Name
  28. FROM
  29. (
  30. <iterate property="Children" conjunction=" UNION ">
  31. SELECT
  32. #Children[].Child.Name# Name,
  33. #Children[].SequenceNumber# SequenceNumber
  34. </iterate>
  35. ) C
  36. INNER JOIN @Inserted INS ON
  37. INS.SequenceNumber = C.SequenceNumber
  38. </isNotEmpty>
  39. </statement>
  40.  

The above map generates a single SQL Server statement that declares a SQL table variable to store the inserted Id's and the corresponding Sequence Numbers that the OUTPUT clause is returning back from our insert Parents statement. The iBATIS iterate property is simply iterating the list of Parents we sent to the map and doing an inline UNION ALL so we can insert them in one step. The insert into Children statement joins back to the @Inserted table variable on the Sequence number to get the Id for it's Parent.

Although the solution requires the addition of the SequenceNumber column which is only used for this purpose and un-related to our data model, it can be worth the performance increase if you are facing a situation where you have high volumes of data being submitted to your application in batches. By using the anonymous types in the persistence layer and keeping the SequenceNumber out of our domain model we can limit it to an edge case performance optimization that should be well encapsulated.

Tags: , , , ,


You can follow any responses to this entry through the RSS 2.0 feed. You can skip to the end and leave a response. Pinging is currently not allowed.