
I have to upload data into SQL Server from .dbf files through SSIS. My output columns are fixed, but the input columns are not, because the files come from clients and each client may have arranged the data in their own style. There may be some unused columns, or an input column name may differ from the output column name.

One idea I had was to map each file's input columns to the output columns in a SQL Server table, and then use only the columns present in the mapping rows for that file's ID.

But I am not sure how to do that. Can you suggest a way to achieve this, or do you have any other idea?

Table Example.

+--------+---------------+--------------+--------+
| FileID | InputColumn   | OutputColumn | Active |
+--------+---------------+--------------+--------+
| 1      | CustCd        | CustCode     | 1      |
| 1      | CName         | CustName     | 1      |
| 1      | Address       | CustAdd      | 1      |
| 2      | Cust_Code     | CustCode     | 1      |
| 2      | Customer Name | CustName     | 1      |
| 2      | Location      | CustAdd      | 1      |
+--------+---------------+--------------+--------+

  • What output do you want from that table? It seems that your requirement needs manual labor, or maybe the use of AI... – Pham X. Bach Feb 20 '19 at 07:47
  • SSIS works on static metadata; if your source (or even your destination) is dynamic, then you are mostly screwed. You can use a custom Script Component (in C# or Visual Basic) to correctly parse your dynamic input and generate a standard, well-known output that matches your destination. https://docs.microsoft.com/en-us/sql/integration-services/extending-packages-scripting-data-flow-script-component-types/creating-a-source-with-the-script-component?view=sql-server-2017 – EzLo Feb 20 '19 at 12:09
  • Check my answer update (section (4)); you may find it interesting – Hadi Feb 27 '19 at 23:47
  • @shahab I concur with EzLo. What you are asking is in the realm of artificial intelligence: you need to update all data points to account for new metadata, including columns and transformations. How would the package know what to do with new data columns? Sure, Hadi's solution will get the data in, but what will you do with it if it needs to be calculated? BE EXTREMELY CAREFUL WITH THIS solution. You should work closely with your data vendor on this requirement. Anyone who can just add or remove columns randomly may not fully understand what they are doing. – J Weezy Mar 05 '19 at 16:35
  • @JWeezy I agree with that. I have faced a lot of similar cases; sometimes you don't have any choice other than handling this type of data. I started a class library project to guarantee a good level of dynamicity, since it is hard to achieve in SSIS. Check section (4) in my answer. I am open to any suggestions for improving it. – Hadi Mar 05 '19 at 18:00
  • @Hadi There is nothing you can do - that is the point. It would require an intelligent system to analyze and determine what changes need to be made throughout not just the data model but the system's logic architecture, including SSIS calculations, stored procedures, reports, views, etc. I submit that changes to the data model should necessarily be a difficult and trying endeavor, so that changes are fully analyzed and their impact is understood entirely. Anything less is malpractice, IMHO. I am not aware of any assembly that can accomplish this level of work. – J Weezy Mar 05 '19 at 18:19
  • @JWeezy Sometimes it is only about different column names; I don't think it always needs an intelligent system. Otherwise I agree with you. – Hadi Mar 05 '19 at 18:25
  • @JWeezy Most of the time the schema mapping is defined by domain experts who analyzed the source files' schemas. – Hadi Mar 05 '19 at 18:45
  • @Hadi If the OP is saying that they only want a way to dynamically map incoming columns into the existing data model, whereby new columns are ignored and missing columns are set to NULL, then I am OK with that. Though I don't think that should be allowed - just my 2 cents. However, how would we map columns where the name has changed? – J Weezy Mar 05 '19 at 19:41
  • @JWeezy You would add the new column name and the desired output to the mapping table/class, based, of course, on the suggestions of analysts. – Hadi Mar 05 '19 at 19:56
  • @JWeezy Check the link I mentioned at the end of my answer; it contains an example of that. – Hadi Mar 05 '19 at 19:57
  • @Hadi I think we are talking past each other here. If I read you correctly, you solve the column name change problem by providing a schema mapper. My point is that someone has to provide this mapping, which just adds another layer of abstraction. Without the schema mapping, how would a computer be able to figure it out? This still requires human intervention and thus is not easily automated. – J Weezy Mar 05 '19 at 20:46
  • @JWeezy I totally agree with that. – Hadi Mar 05 '19 at 20:47
  • For dynamic mapping you can use BIML, which can create SSIS packages on the fly. – Srikar mogaliraju Feb 20 '19 at 11:33

1 Answer


If you create a similar table, you can use it in two approaches to map columns dynamically inside an SSIS package; otherwise you must build the whole package programmatically. In this answer I will try to give you some insights on how to do that.

(1) Building Source SQL command with aliases

Note: This approach will only work if all .dbf files have the same column count but different column names.

In this approach you will generate the SQL command that will be used as the source, based on the FileID and the mapping table you created. This assumes that the FileID and the .dbf file path are stored inside variables. As an example:

Assuming that the Table name is inputoutputMapping

Add an Execute SQL Task with the following command:

DECLARE @strQuery as VARCHAR(4000)

SET @strQuery = 'SELECT '

-- Concatenate one "[InputColumn] as [OutputColumn]," fragment per mapping row
SELECT @strQuery = @strQuery + '[' + InputColumn + '] as [' + OutputColumn + '],'
FROM inputoutputMapping
WHERE FileID = ?

-- Trim the trailing comma and append the FROM clause (the .dbf file name)
SET @strQuery = SUBSTRING(@strQuery,1,LEN(@strQuery) - 1) + ' FROM ' + CAST(? as Varchar(500))

-- Return the generated command as a single-row result set
SELECT @strQuery

In the Parameter Mapping tab, map the variable that contains the FileID to parameter 0, and the variable that contains the .dbf file name (used here in place of a table name) to parameter 1.

Set the ResultSet type to Single row and store Result Set 0 inside a variable of type String, for example @[User::SourceQuery].

The resulting value will look like the following:

SELECT [CustCd] as [CustCode],[CName] as [CustName],[Address] as [CustAdd] FROM database1

In the OLE DB Source, set the Data Access Mode to SQL Command from Variable and use the @[User::SourceQuery] variable as the source.


(2) Using a Script Component as Source

In this approach you have to use a Script Component as Source inside the Data Flow Task:

First of all, you need to pass the .dbf file path and the SQL Server connection string to the Script Component via variables if you don't want to hard-code them.
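For example, a minimal sketch assuming two hypothetical variables, User::DbfFilePath and User::SqlConnectionString, added to the component's ReadOnlyVariables list:

// Hypothetical variable names; add them to the Script Component's
// ReadOnlyVariables list so they are exposed as typed properties.
string dbfFilePath = Variables.DbfFilePath;                 // e.g. C:\data\database1.dbf
string sqlConnectionString = Variables.SqlConnectionString;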

Inside the script editor, you must add an output column for each column found in the destination table and map them to the destination.

Inside the script, you must read the .dbf file into a DataTable.

After loading the data into a DataTable, fill another DataTable with the rows found in the MappingTable you created in SQL Server.
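A minimal sketch of these two steps, assuming the Visual FoxPro OLE DB provider (VFPOLEDB) is installed and the hypothetical variable names from the snippet above; adapt the provider and the mapping query to your environment:

// Needs: using System.Data; using System.Data.OleDb; using System.Data.SqlClient;

// The VFP OLE DB provider treats the folder as the database
// and the file name (without extension) as the table.
string dbfFolder = System.IO.Path.GetDirectoryName(dbfFilePath);
string dbfTable = System.IO.Path.GetFileNameWithoutExtension(dbfFilePath);

DataTable myTable = new DataTable();
using (var dbfConn = new OleDbConnection("Provider=VFPOLEDB.1;Data Source=" + dbfFolder + ";"))
using (var dbfCmd = new OleDbCommand("SELECT * FROM [" + dbfTable + "]", dbfConn))
{
    dbfConn.Open();
    myTable.Load(dbfCmd.ExecuteReader());
}

// Load the active mapping rows from the table created in SQL Server
DataTable MappingTable = new DataTable();
using (var adapter = new SqlDataAdapter(
    "SELECT FileID, InputColumn, OutputColumn FROM inputoutputMapping WHERE Active = 1",
    sqlConnectionString))
{
    adapter.Fill(MappingTable);
}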

After that, loop over the DataTable columns and change each .ColumnName to the relevant output column, for example:

foreach (DataColumn col in myTable.Columns)
{
    // Requires using System.Linq and a reference to System.Data.DataSetExtensions
    string mapped = MappingTable.AsEnumerable()
        .Where(x => x.Field<int>("FileID") == 1 &&
                    x.Field<string>("InputColumn") == col.ColumnName)
        .Select(y => y.Field<string>("OutputColumn"))
        .FirstOrDefault();

    if (mapped != null) col.ColumnName = mapped; // unmapped (unused) columns keep their name
}

Then loop over each row in the DataTable and create a script output row.

In addition, note that while assigning the output rows you must check whether each column exists. You can first add all column names to a list of strings and then use that list for the check, for example:

var columnNames = myTable.Columns.Cast<DataColumn>()
                             .Select(x => x.ColumnName)
                             .ToList();  


foreach (DataRow row in myTable.Rows)
{
    OutputBuffer0.AddRow(); // add a new output row before assigning its columns

    if (columnNames.Contains("CustCode"))
    {
        OutputBuffer0.CustCode = row["CustCode"].ToString();
    }
    else
    {
        OutputBuffer0.CustCode_IsNull = true;
    }

    // continue checking all other columns
}
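Note that, in a source Script Component, all of this logic belongs inside the CreateNewOutputRows() method, which SSIS calls to let the component emit its rows.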

If you need more details about using a Script Component as a source, then check one of the following links:


(3) Building the package dynamically

I don't think there are other methods you can use to achieve this goal, except building the whole package dynamically (programmatically); in that case you should go with:
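As a rough sketch of the idea (not from the original answer; it assumes a reference to the Microsoft.SqlServer.ManagedDTS assembly, and all names and paths below are illustrative), a package can be created and saved from C# with the SSIS runtime API; the data flow source, mapping, and destination would still have to be wired up per FileID from the mapping table:

// Rough sketch, assuming a reference to Microsoft.SqlServer.ManagedDTS
using Microsoft.SqlServer.Dts.Runtime;

class DynamicPackageBuilder
{
    static void Main()
    {
        Application app = new Application();
        Package pkg = new Package { Name = "LoadDbfFile" };

        // Add an empty Data Flow Task; its components would then be
        // configured per FileID using the mapping table
        Executable exec = pkg.Executables.Add("STOCK:PipelineTask");
        ((TaskHost)exec).Name = "DFT Load dbf";

        // Save the generated package to disk (path is illustrative)
        app.SaveToXml(@"C:\packages\LoadDbfFile.dtsx", pkg, null);
    }
}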


(4) SchemaMapper: C# schema mapping class library

Recently I started a new project on GitHub: a class library developed in C#. You can use it to import tabular data from Excel, Word, PowerPoint, text, CSV, HTML, JSON and XML into a SQL Server table with a different schema definition, using a schema mapping approach. Check it out at:

You can follow this Wiki page for a step-by-step guide:
