Wednesday, September 3, 2014

SQL Functions and Plan Caching

I feel that I have become typecast as a Business Intelligence specialist.  However, I often employ my SQL skills and I still enjoy the challenge of solving pure SQL performance issues.  Here's one that presented itself as a table-valued function in an OLTP environment: it was costing 6.7 seconds of compilation on every execution, while the execution itself took only 0.15 seconds.  It was executing about once per second at peak.

Now, I am a great fan of functions, but I know that they are not compiled like stored procedures, so I am wary of using them for complex SQL queries.  This query was very complex.  I've got the function's code below, but it is based on a much more complex view, which joins 5 tables.  There is a picture of the function execution above.  See how the execution time is only 125ms, while the parse and compile time is 6638ms!  The obvious solution is to convert this function to a stored procedure, which will enable SQL Server to store the plan.
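One way to reproduce this kind of measurement yourself is SET STATISTICS TIME, which breaks out parse/compile time and execution time per statement (dbo.MyFunction and the parameter value 1 are just the example from this post):

```sql
SET STATISTICS TIME ON;
SELECT * FROM dbo.MyFunction(1);
-- The Messages tab then shows something like:
--   SQL Server parse and compile time: ... elapsed time = 6638 ms.
--   SQL Server Execution Times: ... elapsed time = 125 ms.
SET STATISTICS TIME OFF;
```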

However, this would involve changes to .NET code, which would mean a lot of testing and change control.  I found another solution, which enabled us to fix the function without any change to .NET code, permitting a quick fix to production.  The fix was to alter the function so that it has multiple statements!  This way SQL Server will treat the 'multistatement table-valued function' as a stored procedure and enable its plan to be cached.  You can see from the execution results below that there is no parse and compile time, and the execution time is the same 125ms.
How does it do this?  It appears that SQL Server will treat the multistatement table-valued function as a stored procedure.  We can see this from the plan cache query below.

Note, SQL Server is able to cache ad hoc queries, so an execution of a function with a particular parameter can be cached.  Don't let this cloud your analysis, unless your OLTP environment is going to execute the function with exactly the same ad hoc text.  You will see these cached plans above with objtype='Adhoc'.

Note, there is documentation that explains this behaviour, although you might have missed it.  Look for the text near 'multistatement table-valued functions'.

What does this tell us?  Firstly, it reminds us that compile time is not always trivial.  Secondly, it reminds us that we should treat functions as different from procedures, and only use functions where they are appropriate.  I find them most useful where they are doing something like string manipulation and optimisation is not a concern.
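To make the distinction concrete, here is a minimal sketch of the two kinds of function (the names dbo.InlineDemo and dbo.MultiDemo and their trivial bodies are invented for illustration).  The inline form's query is expanded into the caller and recompiled with it; the multistatement form gets its own cached plan like a stored procedure:

```sql
-- Inline TVF: a single SELECT, no BEGIN/END; the body is inlined into the calling query
CREATE FUNCTION dbo.InlineDemo (@p int)
RETURNS TABLE
AS RETURN (SELECT BranchNo, TranCode FROM dbo.vBranchCount WHERE StockCountPlaceId = @p);
GO

-- Multistatement TVF: declared return table and BEGIN/END; planned and cached like a procedure
CREATE FUNCTION dbo.MultiDemo (@p int)
RETURNS @Result TABLE (BranchNo bigint, TranCode bigint)
AS
BEGIN
    INSERT INTO @Result
    SELECT BranchNo, TranCode FROM dbo.vBranchCount WHERE StockCountPlaceId = @p;
    RETURN;
END
```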

Old function (with 6.8 second compile time every execution)
create function MyFunction 
(@Parameter1 int) 
returns table
as return
(SELECT t2.*, t1.MyDate
from (select t3.BranchNo, t3.TranCode, t3.Cash1, t3.Cash2, t3.Cash3, t3.Cash4,
             max(t4.TheDate) MyDate
      from vBranchCount as t3 
      left outer join (SELECT *
                       FROM vBranchCount
                       WHERE vBranchCount.Status = 'Applied') AS t4 
        ON t3.TranCode = t4.TranCode 
       AND t3.BranchNo = t4.BranchNo 
       AND t3.Cash1 = t4.Cash1 
       AND t3.Cash2 = t4.Cash2 
       AND t3.Cash3 = t4.Cash3 
       AND t3.Cash4 = t4.Cash4 
       AND t4.TheDate < t3.TheDate
      where t3.StockCountPlaceId = @Parameter1
      group by t3.BranchNo, t3.TranCode, t3.Cash1, t3.Cash2, t3.Cash3, t3.Cash4) as t1
inner join vBranchCount as t2 
   on t1.BranchNo = t2.BranchNo
  and t1.TranCode = t2.TranCode
  and t1.MyDate = t2.CashDate
  and t1.Cash1 = t2.Cash1
  and t1.Cash2 = t2.Cash2
  and t1.Cash3 = t2.Cash3
  and t1.Cash4 = t2.Cash4)

New Function with 6.8 second compile time only on first execution.
create function MyNewFunction 
(@Parameter1 int) 
returns @Result table
(  TeamID bigint,
           Rain bit,
           WitchHat varchar(50),
           JoyStormID bigint,
           Happyness int,
           TeamName varchar(250),
           CashIn money,
           CashOut float,
           TheDate datetime,
           JoyClerkID bigint,
           JoyClerkName varchar(150),
           TranCode bigint,
           Bills float,
           CashDate datetime,
           BranchNo bigint,
           Temperature varchar(50),
           StockCountPlaceId bigint,
           Applicability int,
           Readyness int,
           SiteID bigint,
           BillsToChange int,
           Cash1 varchar(50),
           Cash2 varchar(50),
           Cash3 varchar(50),
           Cash4 varchar(50),
           JoyUnit1 int,
           JoyUnit2 int,
           JoyUnit3 int,
           JoyUnit4 int,
           TranSystemId bigint,
           Handle1 varchar(35),
           Handle2 varchar(35),
           Consumer varchar(50),
           JoyConsumerId bigint,
           Rainbow varchar(50),
           StormCode varchar(50),
           ChangeDate datetime)
as
begin
insert into @Result
SELECT t2.*, t1.MyDate
from (select t3.BranchNo, t3.TranCode, t3.Cash1, t3.Cash2, t3.Cash3, t3.Cash4,
             max(t4.TheDate) MyDate
      from vBranchCount as t3 
      left outer join (SELECT *
                       FROM vBranchCount
                       WHERE vBranchCount.Status = 'Applied') AS t4 
        ON t3.TranCode = t4.TranCode 
       AND t3.BranchNo = t4.BranchNo 
       AND t3.Cash1 = t4.Cash1 
       AND t3.Cash2 = t4.Cash2 
       AND t3.Cash3 = t4.Cash3 
       AND t3.Cash4 = t4.Cash4 
       AND t4.TheDate < t3.TheDate
      where t3.StockCountPlaceId = @Parameter1
      group by t3.BranchNo, t3.TranCode, t3.Cash1, t3.Cash2, t3.Cash3, t3.Cash4) as t1
inner join vBranchCount as t2 
 on t1.BranchNo = t2.BranchNo
 and t1.TranCode = t2.TranCode
 and t1.MyDate = t2.CashDate
 and t1.Cash1 = t2.Cash1
 and t1.Cash2 = t2.Cash2
 and t1.Cash3 = t2.Cash3
 and t1.Cash4 = t2.Cash4
return
end


Query plan cache

SELECT usecounts, cacheobjtype, objtype, text 
FROM sys.dm_exec_cached_plans 
CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
WHERE (text like '%MyFunction%' or text like '%MyNewFunction%')
and objtype <> 'Adhoc'
and cacheobjtype='Compiled Plan'
ORDER BY usecounts DESC;

Wednesday, August 27, 2014

Database IO Throughput

I've written about this before, but I feel I need to say it again in a different way, as so many people fail to grasp the significance of IO throughput capacity.  If you see messages in the SQL log along the lines of the following, then you have a very, very serious IO throughput issue.  I didn't make these logs up.  They are real logs!
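The original log excerpt hasn't survived here, but the message class in question is SQL Server's documented long-I/O warning, which looks roughly like this (the occurrence count, file path, database name, handle and offset are placeholders):

```
SQL Server has encountered N occurrence(s) of I/O requests taking longer than
15 seconds to complete on file [E:\Data\MyDatabase.mdf] in database [MyDatabase].
The OS file handle is 0x... The offset of the latest long I/O is: 0x...
```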
The best analogy I can draw is if you can imagine your company has 1000 staff and you have a building of 20 floors with 4 lifts that can adequately house all staff.  Your 1000 staff tend to drift in between 9 and 10AM and the 4 lifts can move the 1000 staff to their desired floors in the hour.  Infrequently, staff will queue for a lift as the first lift will be full, but it's rarely longer than one lift ride for the wait.  During the day, the lift demands are lower and there are no queues.

Now, over time, your company grows to 2500 staff, and you've added another 30 floors to your building and the lifts are extended to cater for all 50 floors.  The CEO is relieved that the building capacity has been increased to house the new staff and that the lifts work across all 50 floors.  However, no one has thought about the increased throughput required of the lifts.  Between 9 and 10AM 2500 staff now drift into the office and want a lift to their floors; however, the 4 lifts are only capable of moving about 1200 staff/hour.  The queues for the lifts build up and are not fully drained until after 11AM.  The company is now in trouble with disgruntled staff and finds it needs to ask staff to come in earlier or later.  Also, there is the hidden issue of the fire exit stairways also having throughput limits.  With 1000 staff, the building could be evacuated in 15 minutes; now with 2500 staff (and the same stairways) it will take over 40 minutes.

This is analogous to a SQL Server database growing over time, and the SAN (playing the role of the lifts) is extended to hold more terabytes, but its throughput capacity (measured in megabytes/second of reading and writing) is not increased.  The IO queues start to build up and IO times degrade.  The SQL log picture above is an example where hundreds of IOs every minute are taking longer than 15 seconds to complete!  This is an unbelievably dire situation that can cause SQL Server to shut down.

I am not advising that you monitor your SQL Server log for long-running IOs, although if you see them, you should act immediately.  I advise you to monitor IO throughput, average latency, and queue length continually, using Perfmon or a related tool.  You can detect a throughput capacity ceiling when MB/second tends to flatline while IO latency and queue lengths are waxing and waning.  Unfortunately, Perfmon is not a complete tool, but it does have all the base information, and there are many third-party tools; or you can do what I do and create your own by having Perfmon send records to a common SQL Server database and cube.  You only need a few counters, recorded every 60 seconds or so.  If the Perfmon database is on another server, there will be very little load on your production servers, essentially no IO.  Your Perfmon database could hold performance data for all your production servers, so that you have all the performance information you need in one place.  Also, when you have a performance issue you have historical information on the issue and trends earlier in time.  Companies that don't continually record these metrics are flying somewhat blind, or, at least, flying with blinkers on.  Here's an earlier blog on creating a Perfmon cube, although, unfortunately, the online demonstration is no longer available.
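If you'd rather sample the same throughput and latency numbers from inside SQL Server rather than from Perfmon, the sys.dm_io_virtual_file_stats DMV holds cumulative per-file counters.  A sketch (snapshot this periodically and difference consecutive snapshots to get MB/second and average ms/IO over each interval):

```sql
SELECT DB_NAME(vfs.database_id)                                   AS database_name,
       mf.physical_name,
       vfs.num_of_bytes_read    / 1048576.0                       AS mb_read_total,
       vfs.num_of_bytes_written / 1048576.0                       AS mb_written_total,
       vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id;
```

The counters are cumulative since instance startup, which is why two snapshots are needed to see the current rate rather than the lifetime average.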

This posting is not intended to provide answers to the IO throughput issue.  The solution can be many and varied.  The solution will probably involve speeding up the storage system, but it might be to make the database more IO efficient.  The most important point I can make is that you should be monitoring and be able to recognise this issue.  Also, don't let a storage person tell you that you have hit the maximum possible throughput.  There are always ways to increase the throughput performance of your storage, as there are always ways to improve your database performance.

Monday, June 30, 2014

Signs your Data Warehouse is Heading for the Boneyard

I have come across too many data warehouse projects that have obvious signs of failing with project sponsors in denial.  Here are a few signs that you might look out for and take seriously.

  1. You've spent one million dollars and haven't got one million dollars of value.  Data warehouse (DW) projects are never really finished, as there are continual changes in business practices, business software, competition, business strategy, regulation, etc.  DWs need to be flexible so that they can continually be updated/enhanced to adapt to these changes in the business.  If you've spent one million dollars and not got value from that, you are probably developing something that is inflexible.  Typically, I would encourage even very large companies not to spend that much money without showing benefit.  The vision for a DW might be very large, but start with the trickiest, largest, and/or most crucial data.  The DW should manage that data and provide value to the business before more work is commissioned.
  2. Your project is 3 months overdue and the development manager is telling you that more resources are required.  There are probably fundamental design issues that will continue to cause issues.  Such a long development cycle will probably lead to an inflexible DW.
  3. Your user community do not trust the output.  The DW is producing reports, dashboards, alerts, etc., but the user community do not trust the data and aren't using it to run the business.
  4. The same question can be asked of different parts of the warehouse with different answers.  One transaction detail should, ideally, be stored in only one place in the DW.  If it is stored in multiple places, the design is probably not optimal, and if they are providing different answers, there is definitely a problem.
  5. It's almost impossible to reconcile the data warehouse with your OLTP systems.  The ETL (extract, transform and load) processes are so complex and involve so much custom code that no one can reconcile DW facts with OLTP systems. The ETL processes are probably too complex.
  6. Your developers are telling you that they are spending a significant amount of time developing functionality that will make the data warehouse more flexible.  This is all too common in my view.  Developers like to develop, and they can create large generic applications to schedule jobs, "clean data", manage file transfers etc.  Most generic DW software is already available in Microsoft's SQL Server and related technologies.  The role of the developers is to use this software to create a bespoke application that is as simple as possible.  Try and stick to existing vendor software.  They will maintain it and provide future releases. You will always have the option to move platforms in the future having spent much less money.
  7. The data warehouse requires a full-time person to manage the jobs and exceptions.  DW schedules should be reliable enough to manage themselves, simply providing alerts for late data etc.  Intervention should only be required where there is a defect in the DW and a diagnosis and change is required.  Even when there is a catastrophic failure, the DW should start up and simply continue from where it was.
  8. Some of the development team have left before value is delivered.  DW projects should be delivered in small releases that provide value.  If the release cycle is so long that significant team members are leaving, there is probably an underlying design issue.
  9. There are performance issues that are so severe that it is looking difficult to process one day's data in a day.  This is a bright red flag.  A DW will only want to grow, with more feed systems and more data.  Such a large performance issue should be seen as a huge question mark over the design.
  10. The development team are telling you that they need to spend a lot of time up front to develop software that's going to make the DW more flexible!  It's almost a certainty that their software will make the DW more inflexible.  Think of it this way, if it takes 50 dev days to create a DW, it will take a proportion of that 50 dev days to make a change.  If the first deliverable takes 1000 dev days, enhancements will tend to be a proportion of 1000 days.

Monday, January 13, 2014

Determining a wayward query plan while it's executing

Occasionally, I get a production performance issue with a long-running SQL query that is consuming lots of system resources and running for much longer than usual.

You can see the long running query simply by using sp_who2, or some variant, that displays active SQL queries.  So you now have the SPID of the running query that is taking a long time.  I would typically then want to get the SQL command.

You can easily get the SQL for this query by using dbcc inputbuffer(SPID).  I might then use the sql query to get a show plan to see how it should execute.  You can quite easily do this by cutting and pasting the SQL into a new window and requesting a show plan.  However, sometimes the query plan that is running is not the same as the query plan that SQL Server tells you it would use now.  This can be for a variety of reasons including new statistics, parameter sniffing and different connection properties.

So what you really want is the plan that is currently executing.  You can obtain this from the sys.dm_exec_query_plan dynamic management function.  First of all you will need the plan handle, which you can get from sys.dm_exec_requests, using the SPID as the session_id.

Use the plan_handle to get the plan from sys.dm_exec_query_plan.  The query plan will return as an XML document.  Simply click on the XML in the query results window and SSMS will open the query plan in a new window.
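Both steps can be rolled into a single query that fetches the in-flight plan for a given SPID (123 below is a placeholder session_id):

```sql
SELECT r.session_id, r.status, r.start_time, qp.query_plan
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE r.session_id = 123;  -- the SPID from sp_who2
```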

Now you have the query plan that is actually executing and you can compare that to the query plan you expected SQL to use.  From here you can tune the query to use a more appropriate query plan.

Note, I would encourage you to use all other options before using index or join hints to get the query plan you desire.  This is because once you override the optimiser with your index or join hint, it will always use that hint even when your table structures or data changes so much that it is obviously silly.  Optimiser hints such as FASTFIRSTROW, redundant predicates etc are OK with me, as they don't override the optimiser, they just tell the optimiser that you want a plan that will get the first row fast.

Saturday, August 10, 2013

Sargable Predicates are usually best practice

Every SQL query writer should know that sargable predicates are generally much faster than non-sargable predicates.  Essentially a sargable predicate can be used by the relational engine to navigate an index.  For example, the predicate Where IpCountryID=@i 

is sargable as the relational engine can use the variable @i to navigate an index on IpCountryID.  Similarly
Where IpCountryID=@i-1

is also sargable as the relational engine can evaluate @i-1 and use that to navigate an index on IpCountryID.  On the other hand
Where IpCountryID+1=@i
is not sargable as the relational engine cannot navigate an index while there is an arithmetic expression on IpCountryID.  These equivalent sargable and non-sargable predicates and their plans can be seen above.  The sargable predicate query uses an index seek to get the particular record, while the non-sargable query uses the same index but scans the entire index checking whether IpCountryID+1=@i.  One place where the sargability of predicates is important is where you are joining a table to itself with an arithmetic expression on the key; for example, getting the previous record as in the following example
SELECT t1.*,t2.value
FROM MyTable t1
LEFT JOIN MyTable t2
  ON t2.ID = t1.ID - 1
(The join key is assumed to be an integer column, here called ID.)  Notice how the predicate is sargable on the left side but not the right.  This means that the relational engine can navigate to t2 using an index.  If the predicate was the logically equivalent t2.ID + 1 = t1.ID, the relational engine could not navigate to t2 using an index.

I found an interesting situation while tuning a query on a couple of nested views.  It's not easy to reproduce, and to reproduce it I need to include other joins and logic, which will make it hard for you to comprehend.  The essence of the first view was to restrict the records to the last rolling 53 weeks.  It had the following predicate
WHERE TransactionDate >= DATEADD(WW,-53,GETDATE())
This predicate is sargable, which should be a good thing.  However, there are several views on top of this base view that are asking for month date ranges. So they have predicates with hard coded date ranges (on top of the first view) ie
WHERE TransactionDate between '20160101' and '20160131'
This predicate is also sargable, which should also be a good thing.  However, the issue I uncovered was that the optimiser now had a choice of TransactionDate>=DATEADD(WW,-53,GETDATE()) and TransactionDate>='20160101' as the lower bound to the index seek.  Sometimes it was choosing TransactionDate>=DATEADD(WW,-53,GETDATE()) as the lower bound, which is much slower than TransactionDate>='20160101', since 20160101 is a higher date (and there are no transactions for 2016 yet).  The solution to these poor performing queries was to change the first view to make the sargable predicate in the base query non-sargable.  It became 
WHERE DATEADD(WW,+53,TransactionDate) >= GETDATE()
Now the optimizer has no choice, it will use '20160101' and '20160131' as seek predicates on the index and the query runs orders of magnitude faster.

Please don't take away the feeling that non-sargable predicates are good.  The takeaway is that it is worthwhile looking into the plans of long-running queries, understanding why the optimizer chose that path, and helping it choose a faster path, if one exists.

Friday, August 9, 2013


I might begin by apologizing for such a long gap since my last blog.  I've had family issues that have taken a lot of pain, time and energy.   This blog is a reminder to everyone that as important as your career and interesting as the technology is, you need to spend more time with the people you love and doing the things you enjoy. For much of the last 20 years I haven't always done this and I've had a couple of big reminders recently.  One of the things I enjoy is unicycling, and last month I had the opportunity of competing in the North American Unicycle Championships in Pennsylvania.  So much fun, and so many interesting people.  That's me, the big guy on the right in white, early on in the marathon race, where I won my age group.

The local newspaper did a little article on me when I returned to Sydney.

So remember to stop work everyday and do something you enjoy and spend more time with people you love.  The technical challenges will still be there tomorrow.

Next blog, I promise, will be a technical one.  It will be about tuning a query by making a predicate non-sargable!  Yes, I hear you, that's against all the best practices, but there is a good reason for it in this particular case.

Thursday, April 4, 2013

Avoiding a Common Deadlock

A common cause of deadlocks is concurrent identical update/delete queries.  For example, a query such as
DELETE FROM MyTable WHERE MyCode=@code
When two sessions start executing this query with different codes, they have a good chance of deadlocking.  This can be via lock escalation: query A requests an escalation to a table lock, which waits on query B.  As soon as query B requests a lock escalation, or any lock that query A holds, there is a deadlock.  You can also have a deadlock without table escalation, with the two concurrent queries requesting page locks.  Eventually query A might ask for a page that query B holds while query B is waiting on query A.
A very simple solution to this kind of deadlock is to take a Table lock at the beginning of the query.  That way whoever gets in first will finish before the other query starts, removing the possibility of a deadlock on that one table.  This can be as simple as
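a single hint on the statement (a sketch; the original snippet hasn't survived here, and MyTable/MyCode are from the example above):

```sql
DELETE FROM MyTable WITH (TABLOCKX) WHERE MyCode = @code;
```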
Taking a table lock will reduce parallelism.  No other queries (except NOLOCK) will be able to read the table while the TABLOCKX is held.  However, the reduction in parallelism is generally less painful than experiencing deadlocks.  Also, queries that DELETE/UPDATE many rows in a single statement will necessarily lock large parts of the table (often the whole table if lock escalation is triggered), so taking an explicit table lock for the query is generally not too onerous.
Note, another trick that I sometimes use in stored procedures that are looping through many records with possible updates is to take an explicit lock on the table at the beginning of a transaction.  For example, the following code will take an explicit table lock on MyTable, which it holds until the transaction is committed.
BEGIN TRANSACTION
SELECT TOP (0) * FROM MyTable WITH (TABLOCKX)
-- put your code in here to perform updates/inserts/deletes on the table(s)
COMMIT TRANSACTION
A couple of notes for deadlocks and locking
  • The amount of locking and deadlocking increases steeply as transaction time increases.  So, anything you can do to reduce transaction time will reduce contention.
  • It is common practice for middle tier applications (eg .NET) to start transactions and call the database server multiple times within the transaction.  This is OK, but it is even safer (and reduces transaction time) if you encapsulate the logic in a stored procedure, and let the stored procedure begin/commit the transaction.
  • SELECT queries can be called without locks (NOLOCK).  DELETE/UPDATE/INSERT queries cannot be called without locking.
  • Read queries (DELETE/UPDATE/INSERT queries are also read queries for this point) generally only lock the current record/page under the default READ COMMITTED (cursor stability) isolation.  It is possible for them to use more onerous strategies, such as REPEATABLE READ, in which case they do not release read locks until the end of the transaction.  This can lead to contention and deadlocks with concurrent update queries.  Sometimes these REPEATABLE READ queries will benefit from an explicit table lock.
  • Lock escalation is generally your enemy in avoiding deadlocks.  If you see that a query is regularly escalating to a table lock, it is probably better for you to explicitly take a table lock at the beginning of the transaction/query as I outlined above.
  • Transactions can be started and committed while a stored procedure holds position on a cursor.  This way, you can commit and release locks every n records while processing rows in a cursor.
  • Use TRY CATCH in your procedures to handle exceptions such as deadlocks.
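On that last point, a sketch of a deadlock-aware retry wrapper (the table, column and retry count are illustrative; 1205 is the deadlock-victim error number, and THROW requires SQL Server 2012 or later):

```sql
DECLARE @retries int = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        DELETE FROM MyTable WITH (TABLOCKX) WHERE MyCode = @code;
        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries = @retries - 1;  -- chosen as deadlock victim: try again
        ELSE
            THROW;  -- any other error, or out of retries: re-raise
    END CATCH
END
```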

Wednesday, March 20, 2013

Pitfalls of Big Data

I have just assisted in another Big Data project review.  There appear to be a set of common pitfalls that new Big Data developers fall into.   In this blog I hope to shed light on a few pitfalls that are quite easy to avoid, but difficult to remedy. 

  • Insisting on flat file input sources, and then simply loading these into flat database tables.  I understand that some source systems are remote and that flat files are the appropriate vehicle to transport data to your warehouse.  However, if the source system is a relational system within your operations centre, then it will be much easier and more reliable to simply read the data straight from the source system.  It is more reliable, as databases will ensure data typing and referential integrity.  It will be faster as it avoids all the IO involved with writing the flat file and reading it again.  There is also the maintenance issue of adding/removing columns from flat files.  Typically both consumers and providers need to be synchronised for such a change, whereas a database can accommodate this change without the need for consumers and providers synchronising their change.
  • Measuring your Big Data project by the number of tera-, peta- or exa- bytes. This is an almost irrelevant figure.  Just like the amount of money companies spend on R and D won't tell you which one will come up with the next big idea. Big Data projects should be measured by the beneficial value of the information they provide.  When specifying storage requirements, amount of storage always comes after performance specifications (MB/second and ms/IO, for example).
  • Engaging the services company with the best Big Data sales person. Consultancy and software firms have discovered that Big Data projects can be very large and profitable.  Consequently, their best sales persons are employed to win, and they aim for the biggest projects. The big project is not always in the interest of the client, and the best salespersons are not always followed by the best developers. 
  • Technology is not your most important decision. The technology you choose is important, but it's not the overriding factor.  Also, as a corollary, only choose the technology, as you need it.  For example, if you are planning on taking 12 months to load the data, leave selection of the visualisation technology until after you have started loading.  This will enable you to make a better choice on the software (you will have some good sample data) and the software market will be a few months more mature.  
  • Purchasing production hardware before you need it.  If your project isn't going live for 12 months, there is no need to purchase production software until close to that time.  At that time you will have a better idea of the specifications required, and the hardware will be a little more advanced and/or cheaper.
  • Taking Kimball or Inmon literally and making all dimension attributes Type II.   Some naive Big Data developers worship Kimball and measure their technical prowess by how many Type II dimensions and attributes they have.  So much so that not only do they make your DW performance suffer, they remove natural keys, making querying on current values impossible for end users.  It's OK to have Type II dimensions, but only where they benefit the business users.  Almost always, when you have Type II dimensions, you should also keep your business (natural) keys in the fact tables.  For example, a transaction record might have a surrogate key for Customer, which points to the customer record as it was at that time.  If the Customer attributes change, later transactions will point to a different Customer record with a higher surrogate key.  Now, when the business selects transactions for customers that belong to "Harry", they won't see transactions for customers that belonged to Harry unless Harry was the customer owner at the time of the transaction.  This can be very confusing and annoying to business users.
  • Reloading entire tables that only ever receive inserts from source systems.  This is quite an obvious pitfall even to naive developers and consequently an uncommon one.
  • Rejecting data from the business because it doesn't conform to a Big Data rule.  For example, rejecting transactions because they have an invalid ItemId column.  This is a kind of "holier than thou" issue.  The Big Data custodians want to keep their warehouse clean so they reject bad data.  The problem is that this record was for a sale; it has an amount and is part of the gross sales.  By rejecting it, the warehouse has an invalid view of the business.  A valid view would be to include the transaction, albeit with an "unknown" sales item.  It might be that at a later time the sales item table comes through with that missing ItemId, which the warehouse should accommodate without rejection or prejudice.
  • Forcing your business users to use one querying/visualisation technology.  The Big Data project will have selected specific software for collection, loading, storing, cubing, data mining etc.  However, the business users should be free to select the technology of their preference for querying and visualising this information.  Nowadays, structured data is well suited to OLAP cubes, and most warehouses exploit OLAP cubes.   There are hundreds of software applications that can consume cubes, and the business users should be free to use the ones they prefer.  It doesn't matter to the warehouse how many different cube browser technologies there are.  It might matter how many concurrent queries there are, but that's a different matter.

Thursday, January 24, 2013

MDX Running totals on a nested ordered set

I like MDX puzzles, and I just got a good one today.  The requirement was to have running totals over an ordered set, but the set was only ordered at the lowest level, and the running totals were just within the parent member.  That is, a list of countries and cities, where cities are ordered descending by [Net Sales] within country, and [Profit] has a running total for the cities within that country.

As I have said before, anything is possible with MDX, and here is the solution for ordering cities within countries and having running totals of profit of cities within countries.  I haven't ordered the countries here, but they could be independently ordered.

Essentially, what the query does is order the cities (along with city.All).  For the running total it takes the members from the City.All member to the current member and aggregates them.  Simple really.

with
set OrderedCities as
   Generate( [Geography].[Country].[Country].Members,
      Order( { ([Geography].[Country].CurrentMember, [Geography].[City].[All]) }
             + ( {[Geography].[Country].CurrentMember} * [Geography].[City].[City].Members )
             ,[Measures].[Net Sales], DESC) )

member [Measures].[CityRank] as rank(([Geography].[Country].CurrentMember,[Geography].[City].currentmember),OrderedCities)
member [Measures].[CountryRank] as rank(([Geography].[Country].CurrentMember,[Geography].[City].[All]),OrderedCities)

member [Measures].[ProfitRunningTotal] as
   Aggregate( TopCount(OrderedCities, [Measures].[CityRank])
              - TopCount(OrderedCities, [Measures].[CountryRank])
              ,[Measures].[Profit] )

select
{[Measures].[Net Sales],
 [Measures].[ProfitRunningTotal]
} on columns,

OrderedCities  on rows

from [MyCube]

Friday, December 14, 2012

What's happened to

For the last 11 years I have operated public business intelligence demonstrations on
  •  and 
These sites are no longer available, I am afraid.   Last week I had a hardware failure and I don't have a budget for new hardware.  Hopefully, everyone is now clear how powerful and scalable Microsoft Analysis Services is and everyone is aware of the features in SQL Server data mining, Excel Services, PerformancePoint, close to real-time cubes with 1,000's of inserts/second etc.

There is a chance that someone might donate some hardware and I might put the demonstrations on again, but there is no plan at the moment.

Please don't hesitate to contact me should you have any questions regarding these demonstrations.

Sunday, December 9, 2012

Our SQL Server Administrator has left with the SA password. What can we do?

You have inherited a SQL Server 2012 system for which you don't know the sa password (or it doesn't exist) and you don't have a Windows account with sysadmin privileges.  Or perhaps you have forgotten the sa password.  What on earth can you do?  All you have is dbo (database owner) privileges.

You could try
  • Restoring the master database from backup.  However, you won't succeed without the sa password.
  • Rebuilding the master database.  However, you will lose all server-level configuration, including logins, permissions, linked servers, replication etc.
  • Re-installing SQL Server from scratch.  However, you will also lose all the server-level configuration.
Here is a "backdoor" to SQL Server 2012, which can help you gain sysadmin access to SQL Server, so long as you are a member of the local Windows administrators group on the SQL Server machine.

SQL Server 2012 allows members of the local Administrators group to connect to SQL Server with sysadmin privileges when the instance is started in single-user mode.

To create a new SYSADMIN account
  1. Start a command prompt using a Windows administrator account.  Navigate to the SQL Server 2012 binn folder (or have this folder in your environment path).  It is commonly found in c:\Program files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn
  2. From the command prompt type "SQLServr.exe -m" (single-user mode) or "SQLServr.exe -f" (minimal configuration), which will start SQL Server.  Do not close the command prompt.
  3. Open up another command prompt and use the SQLCMD command to connect to a server and instance. 
  4. SQLCMD -S EasternMining\SQL2012
  5. You are now logged on to SQL Server from the command prompt.  You can now create a new login and grant it any permissions.  The following command (followed by GO to execute it) will create the login RichardAdmin
  6. CREATE LOGIN RichardAdmin WITH PASSWORD='AV3ry53curep@ssw0rd'

  7. Now add the login to the sysadmin server role (again followed by GO)
  8. ALTER SERVER ROLE sysadmin ADD MEMBER RichardAdmin

  9. You have now added the new login and granted it SYSADMIN privileges.  Now, all you need to do is stop SQL Server (Ctrl+C in the first command prompt) and start it in normal mode.
  10. From SQL Server Management Studio you will be able to log in using the new account and grant SYSADMIN privileges to other accounts.
  11. Don't forget the sa password this time, or ensure there is a Windows account that has SYSADMIN privileges.
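Put together, the T-SQL half of the session is just two statements (the login name and password here are the examples from the steps above; choose your own):

```sql
-- Run from SQLCMD while the instance is started in single-user mode (-m).
CREATE LOGIN RichardAdmin WITH PASSWORD = 'AV3ry53curep@ssw0rd';
GO
ALTER SERVER ROLE sysadmin ADD MEMBER RichardAdmin;
GO
```

ALTER SERVER ROLE is the SQL Server 2012 syntax; on older versions sp_addsrvrolemember does the same job.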

What's the future of SQL Server Analysis Services

There are now two main technologies to choose from when creating cubes in Microsoft's SQL Server 2012, the traditional multidimensional and the new tabular technology.  You really have to choose between these technologies as there is no easy way to switch between them and each one has its own server process.

I have written previously on how to choose between these (see ).  However, it is becoming more apparent to me that Tabular is strategic and multidimensional is becoming legacy.  That is not to say that Microsoft has whispered a single squeak about not supporting multidimensional in the future, but there are a few clues that Microsoft is not putting effort into this technology.  Multidimensional cubes have not changed much since SQL Server 2005, through three SQL Server releases (2008, R2 and 2012).  Another big clue that has come to my attention is the survey in which the SSAS team is asking us to rate enhancements for the next version of SQL Server.  See   This survey only mentions Tabular by name; there are no questions that pertain to multidimensional.  My conclusion from this survey (and other hints) is that Microsoft is not planning on enhancing multidimensional.

What does this mean?  Does it mean that new enterprise developments should be in Tabular?  I feel that we should definitely consider Tabular for all new developments.  If it supports the business requirements as well as multidimensional does, then use it.  Multidimensional has a wider feature set, so it will not be uncommon to decide to stay with multidimensional.  Two considerations come to the forefront

  1. Tabular only works if there is sufficient memory to hold the entire model; otherwise you won't be able to open the database.
  2. Several multidimensional features are not supported in Tabular, for example many-to-many dimensions.
It will be very interesting to see if the missing features in Tabular are worked on in the next version of SQL Server.
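If you want to see where an Analysis Services instance (either model type) is holding memory, the DISCOVER_OBJECT_MEMORY_USAGE schema rowset can be queried from an MDX query window in Management Studio.  A minimal sketch (note that SSAS DMVs accept WHERE but not ORDER BY):

```sql
-- Memory held per SSAS object, split into shrinkable and non-shrinkable
SELECT *
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE
WHERE OBJECT_MEMORY_NONSHRINKABLE > 0
```

Summing the two memory columns gives a rough view of how much of your model is actually resident.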

By the way, I would love to be wrong in this assessment, and will be happy to blog a retraction when proved wrong.  There is a massive base of multidimensional cubes supporting enterprise applications today.

Wednesday, August 29, 2012

Vote for cubes in memory

As I have blogged before, large multidimensional cubes are not retained in memory even though there is plenty of RAM and msmdsrv is using less than LowMemoryLimit.  Msmdsrv tends to hold about 2GB of aggregation cache; this isn't a hard limit, and it may be a result of the cache cleaner being too aggressive.  Whatever the reason, the problem is that msmdsrv won't hold data in cache even though there is plenty of RAM.  I have raised this as an issue with Microsoft, but I need your support to get this changed in the product.  Please go to  and vote for this enhancement.

This enhancement will mean that if you have a 20GB cube on a dedicated 100GB server, msmdsrv will retain the entire cube in memory and no further IO will be required.  As the product is now, with a high query load, the IO throughput can be measured in terabytes during a busy day.

By the way, in SQL Server 2012 we have Tabular models, which will store the entire cube in memory, but these models don't have the functionality that multidimensional cubes have, and they only work if the entire cube can fit in memory.

Thank you for helping make SQL Server a better product.

Saturday, June 9, 2012

SQL Server 2012 Fast Table Loading

SQL Server 2012 Integration Services is similar to previous versions in that it supports extremely fast (optionally minimally logged) table loading.  Like previous versions, fast loading is assisted by having large commit sizes.  That is, many thousands of rows are inserted in the same transaction.  This can be a challenge when there are occasional records which error on load, such as a duplicate primary key or a data type error.  This post is about achieving very fast data loads while handling infrequent errors.  Essentially the pitfall to avoid is that when one record has an error, all records in that transaction are sent to the rejects, when you only want to send the error record(s).  Note, this isn't the only way, or necessarily the recommended solution, but it illustrates the issue well, and will help you determine the best architecture to implement at your site.
Imagine the scenario where we have 2,000,000 rows to insert into a 50 billion row table.  We want to do it fast, but there is a possibility that some of the rows will already have been inserted (so we want to ignore them) and other rows might have data out of range and are truly rejects.  If we just implement a Data Flow with rejects going to an error table, good records will end up in the rejects, since all records in the "error" batch go to the error destination.

So what we can do is take the error output and perform a lookup.  If the PK exists already we can ignore the record.  If the PK doesn't exist, then we try to insert it to the destination table for a second attempt.  This time, we set the commit size to 1, so that any records that error will not take good records with them.  The error output from this second attempt are truly bad records, since they don't exist and they have caused an error themselves.  During most executions there won't be duplicate records or errors so extremely fast loading will be achieved.  When there are errors, these will be handled efficiently.

Here is an example Data Flow from SQL Server 2012.  The input data has 2,000,001 records, one of which exists already in the Data Warehouse and can be ignored.  Notice how there are 2,000 rows rejected from the Data Warehouse; this is the batch size.  There is one duplicate record, and the other 1,999 rows in the transaction are valid.  The Lookup task finds the match and sends that row to a row count (for logging purposes), while the remaining 1,999 rows can be loaded into the Data Warehouse successfully.  The second attempt to insert is done with a batch size of 1.  If there were any errors in this second attempt they would be redirected to the rejects table.  Generally the rejects table has much looser data constraints, so these rows don't error.

Here are the connection properties on the Data Warehouse table (first attempt).  Notice the Maximum insert commit size is 2000.

Here is the lookup component to see if the PK exists already.  This lookup is without a cache, as the destination table contains billions of rows and the package will generally be looking up few, if any, rows.  A full cache would be extremely expensive on such a large destination table; don't do that.  A partial cache is probably of no use either, since the lookups will tend to be for all-different PKs.

Here are the connection properties on the Data Warehouse table (second attempt).  Notice how the maximum commit size is 1, so any errors will only affect one record.
As I mentioned earlier, there are many ways to perform fast load.  The main takeaway from this post is that an error record will spoil the whole batch, so you have to deal with that.  The mechanism above essentially retries loading records from the spoilt batches with a commit size of 1, so the residual errors are truly errors.  I have seen developers do something similar by performing checks before the load.  You should determine what is appropriate for your application, keeping in mind the behaviour of error batches.

Wednesday, May 23, 2012

SQL Server 2012 Tabular versus Multidimensional

Microsoft SQL Server 2012 has released with a new Analysis Services technology, Tabular.   Many organisations will now be choosing between Tabular or Multidimensional services.  This blog is aimed at helping decide between Tabular and Multidimensional, and implications of using them in tandem.
Tabular is simpler than Multidimensional; it does not support features such as write-back, complex calculations, named sets, many-to-many relationships and very large cubes.  So why would you use Tabular?  There might be several answers, but the one I am most interested in is that Tabular models are held entirely in memory, providing exciting performance potential.

To test the memory aspects of Tabular versus Multidimensional, I created a large-ish cube in both.  Since Tabular is entirely in-memory, I had to experiment with a cube that didn't require more memory than I had available in my x64 server.  I had 8GB which translated to a fact table of about 450 Million rows from my WebLogs database.  I created equivalent Tabular and Multidimensional cubes with the same fact measures and 11 dimensions.

During the design process, the Tabular model imports and massages the data.  I find this a bit annoying, but it might be good feedback to a less experienced designer.  In processing, both models completed in less than 60 minutes, with the Tabular model about 20% faster.

Now, when I ran queries on the two models, Tabular was able to answer all queries with negligible IO.  The Multidimensional service was performing IO, and its memory usage was fluctuating with the queries and managing its cache.  Both models were able to satisfy my random queries quickly.  However, it would be fair to say that the Tabular model returned random results faster, which was largely due to it having all data in memory.  It was also noticeably faster at distinct count (calculated) queries.  The Multidimensional model tended to be just as fast when the submitted query required similar data to a recently executed query (answer in cache).  It was a little slower for queries on data that hadn't been queried recently (answer not in cache), but it still returned results in less than 2 seconds for almost all random queries.  Note, my server did not have a solid state drive, which would have assisted the Multidimensional model.  Both models had only one partition, and I was interested to see that some Tabular queries were able to consume all 4 CPUs concurrently.

The resulting databases, on disk, were not too dissimilar in size: Tabular=7.5GB, Multidimensional=5.4GB.  You can see from the Task Manager, above, that at idle, the Tabular service was holding most of the 8GB of memory (peaking at 10.7GB during processing), leaving very little for any other service (SQL Server was squashed down to 350MB).  The Multidimensional service was only holding 280MB and had a peak working set of 4.9GB (peak during processing).  This is a bugbear of mine: the Multidimensional service doesn't hold onto more memory when it is available.  Something else you should be aware of: the Tabular service will hold onto memory even though there are no queries active and memory is short, which could degrade other services on the same server.  I had to shut down Tabular when testing Multidimensional, so that it had a fair go at the RAM.

For all the effort Microsoft has put into creating the Tabular model, I am left questioning why they didn't enhance Multidimensional's memory management to use memory when available.  The Multidimensional service will not hold a large cube in memory even though there is plenty of RAM to do so.  The caching in the Multidimensional service has several components, but the aggregation cache doesn't appear to exceed 2GB.  It also relies on the Windows file cache, which doesn't appear to offer much more caching ability.  This is particularly noticeable with cubes larger than 4GB and smaller than the available memory.  See my blog Why SSAS Doesn't Cache The Entire Cube?  This is the most significant performance limitation I have come across with large Multidimensional cubes and the reason I strongly recommend solid state drives for very large cubes.  It is also why adding RAM to a server stops being advantageous after a point.  Enhancing Multidimensional to use more RAM, when available, would have diminished the need for a second OLAP technology.
In summary, I would like to make the reader aware that when employing Tabular models, the model must stay within the memory constraints of the server.  So, if your server has 100GB of RAM and is dedicated to your Tabular model, your Tabular model cannot be larger than 100GB.  If you try to process larger models, they will fail with a memory error.  In a practical sense, more RAM is required to accommodate updating in parallel with querying the processed version.  Also be aware that Tabular models will consume memory, which could negatively impact other services.  On the other hand, if you can ensure your Tabular models are within RAM limitations, query performance will be reliably fast and completely independent of IO stress.  For my enterprise customers, since Multidimensional has no absolute size limit and is much more functional, it is likely to be their cube of preference for a while longer, with Tabular an option for specialty needs.

Saturday, April 21, 2012

The Perils of Using Asterisks in SQL Views

Asterisks (*) are extremely useful and widely used in writing SQL.  However, in a SQL Server view, the asterisk can cause some unusual errors and/or misleading data.
The peril of asterisks comes when one of the underlying tables is altered to add a new column: the view does something quite unexpected.  The view will still return the same number (and names) of columns, but the new column's data will be returned in the column to its immediate right, and all other columns to the right will display data from the column to their left.  The rightmost column will have the same name, but its data will not be returned anywhere.  Because columns are returning data for other columns, the data types are no longer correct.  An INT can display a CHAR() etc.

Here is an example of a view where the first 2 columns are coming from t1 (using *) and CustomerBalance, CustomerDOB and CustomerAddress are coming from t2.

Here is the same view after adding a VARCHAR column to t1.  Notice how CustomerBalance now has character data, CustomerDOB has the balance, CustomerAddress has the DOB, and the customer's address appears nowhere!  The character data in CustomerBalance is from the new column.

Here is the same view after executing sp_refreshview.  Notice that the new column is now appearing and all columns contain the correct data.

My recommendation is to avoid using asterisks in views to return all columns from a table; instead, specify the columns required.  This is mildly annoying, but much less annoying than getting address data in a balance column.
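Using the demo tables from the reproduction script below, the explicit version of the view would look like this (a sketch listing the same columns the view is meant to expose):

```sql
-- Naming every column means a new column on t1 cannot shift data sideways
CREATE VIEW v1 AS
 SELECT t1.t1PK, t1.CustomerName,
        t2.CustomerBalance, t2.CustomerDOB, t2.CustomerAddress
 FROM t1
 INNER JOIN t2 ON t1.t1PK = t2.t2PK;
```

If a new column is added to t1, this view simply keeps returning the five columns it names, each with the correct data.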

Another possible mitigation is to execute sp_refreshview after altering an underlying table, which will update the metadata for the view (and include the extra columns in the view).

Note, I have actually used 'SELECT * FROM v1' in the demonstration below, but the same results would be returned even if I had used 'SELECT t1PK, CustomerName, CustomerBalance, CustomerDOB, CustomerAddress FROM v1'

Also note, sp_recompile does not fix the issue, nor does a server reboot.
To reproduce the issue execute the following script

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[v1]') AND type in (N'V'))
DROP VIEW [dbo].[v1];
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t1]') AND type in (N'U'))
DROP TABLE [dbo].[t1];
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t2]') AND type in (N'U'))
DROP TABLE [dbo].[t2];
GO

create table t1 (t1PK int, CustomerName VARCHAR(50));
create table t2 (t2PK int, CustomerBalance DECIMAL(13,2), CustomerDOB DATE, CustomerAddress VARCHAR(100));
insert into t1 select 1, 'Richard Lees';
insert into t1 select 2, 'Richie McCaw';
insert into t2 select 1, -500, '19580221', 'Sydney, Australia';
insert into t2 select 2, +1000, '19801231', 'Christchurch, New Zealand';
GO

create view v1 as
 select t1.*,t2.CustomerBalance,CustomerDOB,CustomerAddress from t1
 inner join t2 on t1PK=t2PK;
GO
SELECT 'View is OK at this stage' AS Note;
select * from v1;
alter table t1 add t1_comments varchar(20) not null default 'Da di da';
SELECT 'View is NOK.  CustomerBalance now contains char data' AS Note;
select * from v1;
EXEC sp_refreshview 'v1';
SELECT 'View is OK again' AS Note;
select * from v1;

Thursday, April 19, 2012

SSAS Freezing. Needs restarting or server reboot.

Have you had the situation where SSAS becomes unresponsive, consuming too much memory and the only way out is to restart SSAS or to reboot the server?

With the default settings of SSAS this can happen with heavy MDX queries.  When I say heavy queries, I really mean queries that require large amounts of memory.  It is helpful to think of SSAS satisfying queries from within memory, which isn't exactly true, but it helps understanding.  This is very different to SQL Server, where a resultset does not need to be instantiated in memory, so that a 5 TB table can be returned from one query with a modest memory overhead.  With SSAS, the query cellset is generally constructed in memory.  So if your query asks for 100 columns and 500,000 rows, then think of 50 Million cells instantiated in SSAS memory. 

The memory required for a single query is not restricted by Memory\TotalMemoryLimit and can continue to consume memory until either the query completes or the system grinds to a halt with Windows page thrashing.

One might argue that you should write MDX queries that do not consume so much memory, but I believe that queries should not be capable of crippling the system.  Unfortunately there is no way to cap the memory consumption of individual queries.

The easy and recommended (by me) solution to this issue is to set the Memory\HardMemoryLimit (new in SQL Server 2008) to a value between Memory\TotalMemoryLimit and 100.  TotalMemoryLimit is 80 by default (that's 80%), so you might set HardMemoryLimit to 90, which will cause SSAS to begin cancelling queries once memory exceeds 90%.  The queries that get cancelled would have had little chance of succeeding.  They receive the following message.
Server: The operation has been cancelled due to memory pressure.
My recommendation is to set the HardMemoryLimit even before you suffer from these heavy queries. It is better to have queries terminated than SSAS request excessive memory and cause Windows to thrash. If you want to err on the side of not cancelling queries, set the HardMemoryLimit closer to 100. If you want to minimise the possibility of having Windows thrash, set the HardMemoryLimit closer to TotalMemoryLimit. Never set the HardMemoryLimit equal to or less than the TotalMemoryLimit.
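These limits live in msmdsrv.ini (or under server properties in Management Studio, with advanced properties shown).  A fragment with the values discussed above might look like this; the numbers are illustrative, not a one-size-fits-all recommendation:

```xml
<ConfigurationSettings>
  <Memory>
    <TotalMemoryLimit>80</TotalMemoryLimit>
    <HardMemoryLimit>90</HardMemoryLimit>
  </Memory>
</ConfigurationSettings>
```

Values of 100 or less are interpreted as a percentage of physical memory; values above 100 are interpreted as bytes.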

If you find that queries are being cancelled that you would like to have complete, then I would suggest you either tune the queries to use less memory or add more RAM to your x64 system.

Wednesday, February 29, 2012

Getting Previous Day's Value Efficiently

I have this saying: it is always possible to make something go faster.  Whether it is a database load, cube process, SQL query, MDX query or running a marathon, if you put enough effort into it, you can almost always find a way of going faster (a corollary of the rule is a diminishing return effect).  I came across an interesting example of this today when a cube process was scanning a daily stocktake table, and it needed to also return the previous day's value.  That is, the same product, but for the previous day.  The natural way to do this would be to join on the same table using the DATEADD() function with a DD of -1.  However, since this is a function, it obscures from the optimiser the sequence of the values returned, and the optimiser thinks it needs to use a sort or hash table to cater for out-of-sequence rows.  This makes the query much slower.
Here is the query and its query plan
select st.ProductId, st.StockTakeDate, st.StockOnHand
 ,st.StockOnHand-stPrev.StockOnHand StockChange
from tbStockTake st
inner join tbStockTake stPrev
on stPrev.ProductId=st.ProductId
and stPrev.StockTakeDate=DATEADD(dd,-1,st.StockTakeDate)

Notice how the join on the second tbStockTake table is sorted so it can be merged with the first tbStockTake table.  I am scanning 750 Million rows, so this sort is not trivial.  We know there is no need to sort the data, since the join will be in the same sequence as the first table.

Here is a very simple enhancement to the query
select st.ProductId, st.StockTakeDate, st.StockOnHand
 ,st.StockOnHand-stPrev.StockOnHand StockChange
from tbStockTake st
inner join tbStockTake stPrev
on stPrev.ProductId=st.ProductId
and stPrev.StockTakeDate=st.StockTakeDate-1

Notice how this new query plan is able to Merge Join the two tables without a sort.  This dramatically improves query performance and reduces tempdb usage.  The table is simply scanned with two cursors and merged.

The SQL Server optimiser is very clever in understanding that a clustered scan by ProductId, StockTakeDate is in the same sequence (so can be merged) as stPrev.ProductId=st.ProductId and stPrev.StockTakeDate=st.StockTakeDate-1.  It is not quite clever enough to determine this for stPrev.StockTakeDate=DATEADD(dd,-1,st.StockTakeDate), somewhat understandably since a function is involved.

Datetime in SQL Server (at least up until the current version 2008 R2) can have arithmetic applied to it.  1 is equivalent to 1 day, and fractions of 1 are equivalent to portions of the day.  For example, 0.25 is equivalent to 6 hours. 
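A minimal T-SQL illustration of this arithmetic:

```sql
DECLARE @d DATETIME = '20120301';
SELECT @d - 1    AS PreviousDay,    -- 2012-02-29 00:00:00.000 (2012 is a leap year)
       @d + 0.25 AS SixHoursLater;  -- 2012-03-01 06:00:00.000
```

Because the subtraction involves no function call, the optimiser can still reason about the sort order of the column.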

Interestingly (or should I say unfortunately), we cannot use the same trick where a DATE datatype is used instead of the DATETIME datatype.
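With the DATE type, the arithmetic form raises an operand type clash, so you are back to DATEADD (which, as shown above, is exactly what hides the sequence from the optimiser):

```sql
DECLARE @d DATE = '20120301';
-- SELECT @d - 1;                            -- fails: DATE does not support +/- arithmetic
SELECT DATEADD(dd, -1, @d) AS PreviousDay;   -- works for both DATE and DATETIME
```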

So, whenever you have something that you would like to go faster, simply apply some focus to it; the chances are there is a faster option.