0

I have to validate all Data rows within the Table as soon as new row is inserted.

DETAILS: Each and every row inserted into the table should pass through set of validation rules, and these rules may vary based on data within the row. When written in c#, it looks like small application. So rather than writing it as a separate application, I am planning it to write as a SQLCLR, so that it will be easy to invoke these SQLCLR written validation rules.

These is the first time I am using/exploring SQLCLR, so i am worried if performance goes down becoz of this. I am doing something like converting c# application to SQLCLR application.

Please suggest me about performance of sqlClr with the above scenario.

Thanks in Advance

sandeep
  • 2,704
  • 9
  • 40
  • 52
  • If performance is a major concern, you'd better off writting the validation code in T-SQL Stored Procedures and avoiding CLR, or, even better, implement the validation declarativelly, using DDL. – Gerardo Lima Apr 19 '12 at 17:33
  • Thanks for reply. In my scenario I am planning to create a separate c# class library and include these DLL into SQL and using the functions written within these DLL to validate rows. Am I correct? Am I violating standard design rules by adding dll in sql?. – sandeep Apr 20 '12 at 06:36
  • 1
    Validation with CLR can be done, technically speaking, but T-SQL is faster for SQL Server data access. If performance is a **major concern** and you control all your validation logic (don't depend on other assemblies), probably there's no reason not to use T-SQL. Even better for performance, consider validating data using DDL (unique keys, for example) and not re-scanning all rows in a table for each new row (this can lead to big scallability issues). – Gerardo Lima Apr 20 '12 at 09:11

1 Answers1

2

Without any specific examples of what validation(s) need to be done on the data, this is a very generalized question which makes it difficult to give anything more than a generalized recommendation. To that point, here is some general information regarding SQLCLR (or the CLR capabilities within SQL Server) vs T-SQL:

  • It is absolutely not true that straight T-SQL is always faster than a CLR routine to perform the same operation. T-SQL is a data language and is not necessarily best suited to do string manipulation or other things that it has functions for. I published a fairly detailed analysis on simple-talk regarding this very issue: http://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/

  • Data can only be retrieved or manipulated via SQL so there is no real benefit of merely wrapping SELECT or DML statements in a .Net language if you are not going to also do a computation that is not as fast or even possible in straight T-SQL. A SQLCLR routine still needs to open a database connection in order to interact with database objects (tables, views, etc.) and even if you are using the context_connection, if the code does nothing more than straight T-SQL in the SqlCommand, then this is a bad design / use of SQLCLR.

  • SQLCLR should not be seen as a replacement for T-SQL. It needs to be approached as a tool that can aid you in certain places where either the operations would be less efficient in T-SQL or not even possible. It is sometimes possible to extend functionality by using sp_OACreate, etc but that has its own issues. I have found it best to have a middle-ground where certain algorithms which are not easy or possible or efficient to do in T-SQL are exposed from .Net to T-SQL via User-Defined Functions and Stored Procedures that you call from your T-SQL code. I created a library of functions and procedures to do just this which is called SQL# (SQLsharp) that is mostly free and can be found at: http://www.SQLsharp.com/


With all of this in mind, your situation of:

I have to validate all Data rows within the Table as soon as new row is inserted.
...
Each and every row inserted into the table should pass through set of validation rules, and these rules may vary based on data within the row.

sounds like it is best suited to having a regular T-SQL INSERT, UPDATE trigger on the table that uses a combination of straight T-SQL and maybe some SQLCLR routines (depending on the specific validation being done) against the "inserted" table. And again, depending on the specific algorithms you are using for validation, for the ones that are better done in CLR you might gain some efficiency by combining some of those algorithms into a single .Net routine to reduce the start-up cost of calling the external routine.

Of course, whether or not a trigger is best depends on the frequency of INSERTs and UPDATEs and how many validations are being done, and how many rows will be in the table within a few years. If the DML frequency is very high and/or there are a lot of rows, then maybe a disconnected approach is better, but that wouldn't change the recommendation on how to implement the validation algorithms. If the validation is going to take too long (you don't want to hold up the DML operation too long and cause blocking), then you can use a trigger to store the key value(s) in a separate queue table that a SQL job can process every few minutes and decide whether or not to keep.

As I said at the very beginning, examples of the validations would allow for better recommendations on where it is best to handle each algorithm, but hopefully this information gives you what you need to make a better decision.

And remember, only by testing will you know which approach is faster for any particular algorithm!

Solomon Rutzky
  • 41,664
  • 6
  • 112
  • 149