Replicating the source system with Biml

Today we’ll take a look into how we can replicate the source system schema with Biml and SQL Server DDL statements.
No, this isn’t about replicating data the way you would with SQL Server replication. This is about mirroring the source system schema for scenarios where you’d like a staging area that is a 100% copy of the source system.

There are several ways to do this, and I’ll cover three of them.

This example assumes you already have collections defined inside RootNode.Tables.
If not, you can use this SQL Server script to create a table containing most of the data types available in SQL Server.
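As a sketch of what that starting point could look like, here is a minimal hand-written Tables collection; the connection string, database, schema, table and columns are all placeholders:

    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
        <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Server=.;Initial Catalog=SourceDb;Integrated Security=SSPI;" />
        </Connections>
        <Databases>
            <Database Name="SourceDb" ConnectionName="Source" />
        </Databases>
        <Schemas>
            <Schema Name="dbo" DatabaseName="SourceDb" />
        </Schemas>
        <Tables>
            <Table Name="Customer" SchemaName="SourceDb.dbo">
                <Columns>
                    <Column Name="CustomerID" DataType="Int32" />
                    <Column Name="CustomerName" DataType="String" Length="100" />
                    <Column Name="CreatedDate" DataType="DateTime" />
                </Columns>
            </Table>
        </Tables>
    </Biml>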

 

1. Biml extension method: GetDropAndCreateDdl

This method outputs a predefined DDL statement for a table as a string.

Pros: The DDL template is predefined and you simply call "tableNode.GetDropAndCreateDdl()" to retrieve it.
Cons: The template is not easily manipulated.

Code nugget:
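A minimal sketch of how the call might be used, here wrapped in ExecuteSQL tasks so the generated DDL can be run against a staging connection; the package name and the connection name "Staging" are assumptions:

    <#@ import namespace="Varigence.Biml.Extensions" #>
    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
        <Packages>
            <Package Name="CreateStagingTables" ConstraintMode="Linear">
                <Tasks>
                    <# foreach (var table in RootNode.Tables) { #>
                    <ExecuteSQL Name="SQL DropCreate <#=table.Name#>" ConnectionName="Staging">
                        <!-- GetDropAndCreateDdl returns the predefined DROP/CREATE statement as a string -->
                        <DirectInput><#=table.GetDropAndCreateDdl()#></DirectInput>
                    </ExecuteSQL>
                    <# } #>
                </Tasks>
            </Package>
        </Packages>
    </Biml>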

Preview window inside BimlStudio:

[Image: getdropcreateddl]

 

2. User-defined BimlScript:

This method lets you build and control the DDL statement as a string yourself; alter it as needed.

Pros: The DDL template is user-defined, and it’s up to you how you’d like to output it.
Cons: The code nugget can get out of hand if you need to handle lots of arguments; it should be used in a CallBimlScript file.

Code nugget:
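A rough sketch of what a user-defined version could look like as a CallBimlScript file that receives a table node; the file layout is an assumption, and the raw Biml data type name is emitted as-is, so a real implementation would map it to a proper T-SQL type:

    <#@ property name="table" type="Varigence.Languages.Biml.Table.AstTableNode" #>
    <#@ import namespace="System.Linq" #>
    -- NOTE: the column types below are the raw Biml data type names;
    -- map them to proper T-SQL types in a real implementation.
    IF OBJECT_ID('<#=table.SchemaQualifiedName#>', 'U') IS NOT NULL
        DROP TABLE <#=table.SchemaQualifiedName#>;
    CREATE TABLE <#=table.SchemaQualifiedName#> (
    <#=string.Join(",\n", table.Columns.Select(c => "    [" + c.Name + "] " + c.DataType))#>
    );

From a Biml file it could then be called per table with something like <#=CallBimlScript("UserDefinedDdl.biml", table)#>; the file name here is just an example.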

Preview window inside BimlStudio:

[Image: userdefinedgetdropcreateddl]

 

3. User-defined BimlScript using GetDatabaseSchema annotations:

This method makes use of the annotations that are added to your Tables collection when you import a schema with "GetDatabaseSchema".
The annotations can be used the same way as in the previous sample; simply swap the table properties for the appropriate Annotations tags.

Pros: You work with the data types exactly as they were captured when the schema was imported with the "GetDatabaseSchema" method. The other methods shown may apply data type conversions that aren’t what you want.
Cons: The code nugget can get out of hand if you need to handle lots of arguments; it should be used in a CallBimlScript file.

Code nugget:
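A rough sketch of the idea, assuming a connection named "Source"; the exact annotation tags the importer writes depend on your Biml version, so this simply lists whatever was captured per column so you can pick the right tag:

    <#@ import namespace="Varigence.Biml.Extensions" #>
    <#
    // Assumes a connection named "Source" exists in RootNode.DbConnections
    var sourceConnection = RootNode.DbConnections["Source"];

    // Overloads that take schema/table filters and ImportOptions also exist
    var importResult = sourceConnection.GetDatabaseSchema();

    foreach (var table in importResult.TableNodes)
    {
        foreach (var column in table.Columns)
        {
            // List whatever annotations the importer attached to each column,
            // so you can choose the right tag to use instead of the column properties
            foreach (var annotation in column.Annotations)
            {
                WriteLine("-- " + table.Name + "." + column.Name + ": " + annotation.Tag + " = " + annotation.Text);
            }
        }
    }
    #>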

Preview window inside BimlStudio:

[Image: rawgetdatabaseschema]

SSIS parallel processing with LINQ in Biml

First of all, thanks to Ben Weissman and Scott Currie for feedback on this post.

Why should you start using LINQ in your BimlScript? Because LINQ is simply a way of expressing queries inside C#.
Added in C# 3.0, LINQ is a general-purpose query language for many kinds of data sources, including plain object collections, XML documents, and databases.
Let’s see how LINQ enables us to create a BimlScript template pattern for parallel execution inside a dataflow.

This example assumes you already have collections defined inside RootNode.Tables.

If we were to use the following code inside a Biml file, we would generate one package for each group of tables.
Each package would contain one dataflow with five OleDbSources.

Revision: 2016-06-22
Current code example provided by Scott Currie.
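A minimal sketch of the pattern, assuming a connection named "Source" exists and RootNode.Tables is populated; package names and the source query are placeholders:

    <#@ import namespace="System.Linq" #>
    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
        <Packages>
            <#
            int groupSize = 5;    // number of OleDbSources per dataflow
            // Number of groups, rounded up so a final partial group still gets a package
            int groupCount = (int)Math.Ceiling((double)RootNode.Tables.Count / groupSize);
            for (int i = 0; i < groupCount; i++) { #>
            <Package Name="Load_Group_<#=i * groupSize#>_<#=(i + 1) * groupSize#>" ConstraintMode="Linear">
                <Tasks>
                    <Dataflow Name="DFT Group_<#=i * groupSize#>_<#=(i + 1) * groupSize#>">
                        <Transformations>
                            <# foreach (var table in RootNode.Tables.OrderBy(t => t.Name).Skip(i * groupSize).Take(groupSize)) { #>
                            <OleDbSource Name="OLE_SRC <#=table.Name#>" ConnectionName="Source">
                                <DirectInput>SELECT * FROM <#=table.SchemaQualifiedName#></DirectInput>
                            </OleDbSource>
                            <# } #>
                        </Transformations>
                    </Dataflow>
                </Tasks>
            </Package>
            <# } #>
        </Packages>
    </Biml>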

 

Now let’s start by breaking down the code.

The first part consists of defining our boundaries.

“groupSize” defines how many items we want in each group.
Change this value if you’d like more or fewer sources inside your dataflow.
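In the sketch above this is a single line:

    <#
    int groupSize = 5;    // how many OleDbSources end up in each dataflow
    #>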

 

We create a for loop to iterate over the items inside RootNode.Tables in groups (see the snippet after this list). The loop has three parts.

  1. First we declare the initial value of our counter variable “i”. This happens only once, when the loop starts.
  2. The loop condition: keep going while “i” is less than the number of groups, which is the number of items in RootNode.Tables divided by the number of sources we want per dataflow (rounded up, so a final partial group still gets a package). The loop stops when this condition becomes false.
  3. Increment “i” by 1 at the end of each iteration.
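In the sketch this corresponds to:

    <#
    // One group per package; Ceiling keeps a final, smaller group from being dropped
    int groupCount = (int)Math.Ceiling((double)RootNode.Tables.Count / groupSize);
    for (int i = 0; i < groupCount; i++)
    {
        // one package with one dataflow is emitted here for group "i"
    }
    #>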

 

There are two steps to actually achieving our parallel sources inside the dataflow.

The first part is our package and dataflow element. Remember, we are now inside the for loop defined above, and each iteration represents one group of tables.
We’d like to name the package and dataflow after the current group.
This is done with the calculations i * groupSize and (i + 1) * groupSize; roughly translated, they give us the start and end of the current group’s range.
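In the sketch the naming looks like this (still inside the for loop):

    <Package Name="Load_Group_<#=i * groupSize#>_<#=(i + 1) * groupSize#>" ConstraintMode="Linear">
        <Tasks>
            <Dataflow Name="DFT Group_<#=i * groupSize#>_<#=(i + 1) * groupSize#>">
                <!-- the OleDbSources for the current group go here -->
            </Dataflow>
        </Tasks>
    </Package>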

 

The second part is where the magic happens: by using LINQ over our RootNode.Tables collection and calling .OrderBy, .Skip and .Take, we get exactly the right OleDbSources inside each dataflow (the combined call is shown after this list).

  1. OrderBy(o => o.Name)
    This takes a lambda expression and sorts the items in ascending order by the “Name” property. The “Name” property isn’t the only way to decide which tables end up in which group: you could put annotations on your table nodes saying which group each belongs to, or query the source system for row counts and sort by those.
  2. Skip(i * groupSize)
    Here we skip the items that belong to earlier groups: the number of items in “groupSize” multiplied by the current iteration.
  3. Take(groupSize)
    Here we take the next items for the current group: the number of items defined by “groupSize”.
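Put together inside the Transformations element, the three calls look like this in the sketch:

    <# foreach (var table in RootNode.Tables.OrderBy(t => t.Name)
                                            .Skip(i * groupSize)
                                            .Take(groupSize)) { #>
    <OleDbSource Name="OLE_SRC <#=table.Name#>" ConnectionName="Source">
        <DirectInput>SELECT * FROM <#=table.SchemaQualifiedName#></DirectInput>
    </OleDbSource>
    <# } #>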

 

And with this simple code example we’ve created a template pattern for parallel execution in your SSIS packages.

Check out the following links for further information on the LINQ methods used.

http://www.dotnetperls.com/orderby
http://www.dotnetperls.com/skip
http://www.dotnetperls.com/take

Revision: 2016-06-21
Old code example:

Use Biml and output .SQL files

If you are developing Biml in either BimlExpress or BimlStudio, you have the capability to output SSIS packages from a defined template. The templates for SSIS can be created manually with Biml or by using C# to extend Biml and script out your templates.

Examples often demonstrate how to automatically generate SSIS packages.
But what about SQL Server DDL definitions?

Let’s dig into this example code, where I show how easy it is to output SQL Server DDL definitions with Biml.
This example assumes you already have collections defined inside RootNode.Tables.
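Here is a minimal sketch of the whole pattern, matching the breakdown below; the output folder is a placeholder, and GetDropAndCreateDdl is used to produce the DDL string (any method that returns DDL as a string would do):

    <#@ import namespace="System.IO" #>
    <#@ import namespace="System.Text" #>
    <#@ import namespace="Varigence.Biml.Extensions" #>
    <#
    // Placeholder output folder - change it to wherever the .sql files should land
    string outputRootPath = @"C:\Temp\DDL";
    Directory.CreateDirectory(outputRootPath);    // make sure the folder exists

    foreach (var table in RootNode.Tables)
    {
        File.WriteAllText(
            Path.Combine(outputRootPath, table.Name + ".sql"),   // 1. the file name
            table.GetDropAndCreateDdl(),                         // 2. the file content
            new UTF8Encoding(false));                            // 3. the encoding (optional)
    }
    #>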

Now let’s start by breaking down the code.

The first part consists of importing the required namespaces for this operation.

We create a string containing the root path where we’d like to output the result.

On to the cool bits. Here we loop over all items inside RootNode.Tables and use a C# code nugget to call File.WriteAllText, passing in three arguments (there’s a note on the third one after the list below).

  1. The filename we’d like our output to have.
  2. The content we’d like our output to have.
  3. The encoding we’d like our output to have.
    This is optional, since the method’s default encoding is UTF-8 without a byte-order mark (BOM).
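To illustrate the optional third argument, here is a comparison using the names from the sketch above; both calls end up writing UTF-8 without a BOM:

    <#
    // Two-argument overload: defaults to UTF-8 without a byte-order mark
    File.WriteAllText(Path.Combine(outputRootPath, table.Name + ".sql"), table.GetDropAndCreateDdl());

    // Explicit encoding: new UTF8Encoding(false) matches that default; Encoding.UTF8 would add a BOM
    File.WriteAllText(Path.Combine(outputRootPath, table.Name + ".sql"), table.GetDropAndCreateDdl(), new UTF8Encoding(false));
    #>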

And with this simple code example we’ve created one .sql file containing a DDL statement for each table defined in RootNode.Tables.