Thursday, August 7, 2014

SQL Server Case Sensitivity and SSIS and SSAS

Currently SQL Server default is normally 'SQL_Latin1_General_CP1_CI_AS', the _CI means "case insensitive" - for case-sensitive we would see _CS in its place,
If we were to change the collation, that may break the ETL if you have any, as you may currently not use case sensitive searches for MERGES, it would take a re-write, re-test ect to make it work, 

Build DIM Tables: 

Collation can be set on Server, DB or by Query:
I would recommend using the "Per Query" collation, we can simply apply a "hint" to our query, and then the query would do a case sensitive search instead of the default, case insensitive, See here for detail:

An example for OLS:
-- case Insensitive DEFAULT
SELECT d.*, b.*, a.*, c.*
 ....
  where value = 'pass'
  
-- case Sensitive DEFAULT
SELECT d.*, b.*, a.*, c.*
  FROM..
Where
Value LIKE '%PASS%' Collate SQL_Latin1_General_CP1_CS_AS


SSIS LookUps
Options include:

1. Another option is to keep your Select DISTINCT, and then in your SSIS lookup, tell SSIS to "do the lookup in the SQL Server", I believe you can do this as follows:
"Set the CacheType property of the lookup transformation to Partial or None, the lookup comparisons will now be done by SQL Server and not by the SSIS lookup component. "


2. Also, in Dimenion processing later on, SSAS takes its collation from the DB default: "SSAS dimension will inherits its collation from underlying database and if you don't change the setting it might use case sensitive collation"

3. Use LOWER() or UPPER() functions in query and pipeline.
 
To me this is an SSIS issue, I would err on the side of simplicity and maintainable because in the future someone would need to maintain the solution and the simpler the solution the quicker someone can fix it.  In SSIS Ive always seen either lookups with Upper()/Lower(), by setting the cache type or pulling the lookup out of SSIS and use T-SQL. 

No comments:

Post a Comment