Wednesday, August 27, 2014

Database IO Throughput

I've written on this before, but I feel I need to say it again in a different way, as so many people fail to grasp the significance of IO throughput capacity.  If you see messages in the SQL Server log along the lines of the following, then you have a very serious IO throughput issue. I didn't make these logs up.  They are real logs!
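
Paraphrasing from memory (this is SQL Server error 833; the exact wording varies by version, and the file and database names below are invented), the messages read something like:

SQL Server has encountered 1427 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\SQLData\MyBigDB.mdf] in database [MyBigDB].
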
The best analogy I can draw is if you can imagine your company has 1000 staff and you have a building of 20 floors with 4 lifts that can adequately house all staff.  Your 1000 staff tend to drift in between 9 and 10AM and the 4 lifts can move the 1000 staff to their desired floors in the hour.  Infrequently, staff will queue for a lift as the first lift will be full, but it's rarely longer than one lift ride for the wait.  During the day, the lift demands are lower and there are no queues.

Now, over time, your company grows to 2500 staff, and you've added another 30 floors to your building and the lifts are extended to cater for all 50 floors.  The CEO is relieved that the building capacity has been increased to house the new staff and that the lifts work across all 50 floors.  However, no one has thought about the increased throughput required of the lifts.  Between 9 and 10AM, 2500 staff now drift into the office and want a lift to their floors, but the 4 lifts are only capable of moving about 1200 staff/hour.  The queues for the lifts build up and are not fully drained until after 11AM.  The company is now in trouble with disgruntled staff and finds it needs to ask staff to come in earlier or later.  There is also the hidden issue of the fire exit stairways having throughput limits.  With 1000 staff, the building could be evacuated in 15 minutes; with 2500 staff (and the same stairways) it will take over 40 minutes.

This is analogous to a SQL Server database growing over time: the SAN (playing the role of the lifts) is extended to hold more terabytes, but its throughput capacity (measured in megabytes per second of reading and writing) is not increased.  The IO queues build up and IO times degrade.  The SQL log picture above is an example where hundreds of IOs every minute are taking longer than 15 seconds to complete!  This is an unbelievably dire situation that can cause SQL Server to shut down. 

I am not advising that you monitor your SQL Server log for long running IOs, although if you see them, you should act immediately.  I advise you to monitor IO throughput, average latency, and queue length continually, using Perfmon or a related tool.  You can detect a throughput capacity ceiling when MB/second tends to flatline while IO latency and queue lengths wax and wane.  Unfortunately, Perfmon is not a complete tool, but it does have all the base information, and there are many third party tools; or you can do what I do and create your own by having Perfmon send records to a common SQL Server database and cube.  You only need a few counters, recorded every 60 seconds or so.  If the Perfmon database is on another server, there will be very little load on your production servers, essentially no IO.  Your Perfmon database could hold performance data for all your production servers, so that you have all the performance information you need in one place.  Also, when a performance issue arises, you have historical information on it and on earlier trends.  Companies that don't continually record these metrics are flying somewhat blind, or, at least, flying with blinkers on.  Here's an earlier blog on creating a Perfmon cube, although, unfortunately, the online demonstration is no longer available.  
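
If you want a quick look at IO latency and throughput per database file from inside SQL Server, rather than Perfmon, the virtual file stats DMV is a reasonable starting point.  This is just a minimal sketch: the counters are cumulative since the last restart, so sample it twice and difference the numbers to get rates.

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_bytes_read,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0)   AS avg_read_latency_ms,
       vfs.num_of_writes,
       vfs.num_of_bytes_written,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;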

This posting is not intended to provide answers to the IO throughput issue.  The solutions can be many and varied.  The solution will probably involve speeding up the storage system, but it might be to make the database more IO efficient.  The most important point I can make is that you should be monitoring and be able to recognise this issue.  Also, don't let a storage person tell you that you have hit the maximum possible throughput.  There are always ways to increase the throughput performance of your storage, just as there are always ways to improve your database performance.



Monday, June 30, 2014

Signs your Data Warehouse is Heading for the Boneyard

I have come across too many data warehouse projects that have obvious signs of failing, with project sponsors in denial.  Here are a few signs that you might look out for and take seriously.

  1. You've spent one million dollars and haven't got one million dollars of value.  Data warehouse (DW) projects are never really finished, as there are continual changes in business practices, business software, competition, business strategy, regulation, etc.  DWs need to be flexible so that they can continually be updated and enhanced to adapt to these changes in the business.  If you've spent one million dollars and not got value from that, you are probably developing something that is inflexible.  Typically, I would encourage even very large companies not to spend that much money without showing benefit.  The vision for a DW might be very large, but start with the trickiest, largest, and/or most crucial data.  The DW should manage that data and provide value to the business before more work is commissioned.
  2. Your project is 3 months overdue and the development manager is telling you that more resources are required.  There are probably fundamental design issues that will continue to cause issues.  Such a long development cycle will probably lead to an inflexible DW.
  3. Your user community do not trust the output.  The DW is producing reports, dashboards, alerts, etc., but the user community do not trust the data and aren't using it to run the business.
  4. The same question can be asked of different parts of the warehouse and return different answers.  One transaction detail should, ideally, be stored in only one place in the DW.  If it is stored in multiple places, the design is probably not optimal, and if they are providing different answers, there is definitely a problem.
  5. It's almost impossible to reconcile the data warehouse with your OLTP systems.  The ETL (extract, transform and load) processes are so complex and involve so much custom code that no one can reconcile DW facts with OLTP systems. The ETL processes are probably too complex.
  6. Your developers are telling you that they are spending a significant amount of time developing functionality that will make the data warehouse more flexible.  This is all too common in my view.  Developers like to develop, and they can create large generic applications to schedule jobs, "clean data", manage file transfers etc.  Most generic DW software is already available in Microsoft's SQL Server and related technologies.  The role of the developers is to use this software to create a bespoke application that is as simple as possible.  Try and stick to existing vendor software.  They will maintain it and provide future releases. You will always have the option to move platforms in the future having spent much less money.
  7. The data warehouse requires a full-time person to manage the jobs and exceptions.  DW schedules should be reliable enough to manage themselves, simply providing alerts for late data etc.  Intervention should only be required where there is a defect in the DW and a diagnosis and change is required.  Even when there is a catastrophic failure, the DW should start up and simply continue from where it was.
  8. Some of the development team have left before value is delivered.  DW projects should be delivered in small releases that provide value.  If the release cycle is so long that significant team members are leaving, there is probably an underlying design issue.
  9. There are performance issues that are so severe that it is looking difficult to process one day's data in a day.  This is a bright red flag.  A DW will only grow, with more feed systems and more data.  Such a large performance issue should be seen as a huge question mark over the DW.
  10. The development team are telling you that they need to spend a lot of time up front to develop software that's going to make the DW more flexible!  It's almost a certainty that their software will make the DW more inflexible.  Think of it this way: if it takes 50 dev days to create a DW, a change will take some proportion of those 50 dev days.  If the first deliverable takes 1000 dev days, enhancements will tend to be a proportion of 1000 days.


Monday, January 13, 2014

Determining a wayward query plan while it's executing

Occasionally, I get production performance issues with a long running SQL query that is consuming lots of system resources and running for much longer than usual.

You can see the long running query simply by using sp_who2, or some variant that displays active SQL queries.  So you now have the SPID of the running query that is taking a long time.  I would typically then want to get the SQL command.

You can easily get the SQL for this query by using dbcc inputbuffer(SPID).  I might then use the SQL to get a showplan to see how it should execute.  You can quite easily do this by cutting and pasting the SQL into a new window and requesting a showplan.  However, sometimes the query plan that is running is not the same as the query plan that SQL Server tells you it would use now.  This can be for a variety of reasons, including new statistics, parameter sniffing and different connection properties.

So what you really want is the plan that is currently executing.  You can obtain this from the sys.dm_exec_query_plan dynamic management function.  First of all you will need the plan handle, which you can get from sys.dm_exec_requests using the SPID as the session_id.

Use the plan_handle to get the plan from sys.dm_exec_query_plan.  The query plan is returned as an XML document.  Simply click on the XML in the query results window for Management Studio to open the query plan in a new window.
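
Putting those steps together, here is a minimal sketch (the SPID of 73 is just an example value):

DBCC INPUTBUFFER(73);            -- the SQL text the client submitted

DECLARE @spid int = 73;          -- the session id you found with sp_who2
SELECT r.session_id,
       r.status,
       r.start_time,
       t.text        AS running_sql,
       p.query_plan  AS running_plan_xml   -- click this XML in the results grid
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS p
WHERE r.session_id = @spid;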

Now you have the query plan that is actually executing and you can compare that to the query plan you expected SQL to use.  From here you can tune the query to use a more appropriate query plan.

Note, I would encourage you to use all other options before using index or join hints to get the query plan you desire.  This is because once you override the optimiser with your index or join hint, it will always use that hint, even when your table structures or data change so much that it is obviously silly.  Optimiser hints such as FAST 1 (FASTFIRSTROW in older versions), redundant predicates etc. are OK with me, as they don't override the optimiser; they just give it extra information, such as that you want a plan that will return the first row fast.
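
To illustrate the distinction, a goal-style hint leaves the optimiser free to choose, while an index hint forces the access path (the table, column and index names here are hypothetical):

SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = @c
OPTION (FAST 1);        -- states a goal; the optimiser still chooses the plan

SELECT OrderId, OrderDate
FROM dbo.Orders WITH (INDEX(IX_Orders_CustomerId))
WHERE CustomerId = @c;  -- forces this index, even if data changes later make it a poor choice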

Saturday, August 10, 2013

Sargable Predicates are usually best practice

Every SQL query writer should know that sargable predicates are generally much faster than non-sargable predicates.  Essentially, a sargable predicate can be used by the relational engine to navigate an index.  For example, the predicate
Where IpCountryID=@i

is sargable as the relational engine can use the variable @i to navigate an index on IpCountryID.  Similarly
Where IpCountryID=@i-1

is also sargable, as the relational engine can compute @i-1 and use that value to navigate an index on IpCountryID.  On the other hand
Where IpCountryID+1=@i
is not sargable, as the relational engine cannot navigate an index while there is an arithmetic expression on IpCountryID.  These equivalent sargable and non-sargable predicates and their plans can be seen above.  The sargable predicate query uses an index seek to get the particular record, while the non-sargable predicate uses the same index but scans the entire index checking whether IpCountryID+1=@i.  An example of how the sargability of predicates is important is where you are joining a table to itself and the join key has an arithmetic expression, for example, getting the previous record as in the following query
SELECT t1.*,t2.value
FROM MyTable t1
LEFT JOIN MyTable t2
ON t2.id=t1.id-1
Notice how the predicate t2.id=t1.id-1 is sargable on the left side but not the right.  This means that the relational engine can navigate to t2 using an index.  If the predicate were t2.id+1=t1.id (logically equivalent), the relational engine could not navigate to t2 using an index.  

I found an interesting situation while tuning a query on a couple of nested views.  It's not easy to reproduce, and to reproduce it I would need to include other joins and logic, which would make it hard for you to comprehend.  The essence of the first view was to restrict the records to the last rolling 53 weeks.  It had the following predicate
WHERE TransactionDate >= DATEADD(WW,-53,GETDATE())
This predicate is sargable, which should be a good thing.  However, there are several views on top of this base view that ask for month date ranges, so they have predicates with hard coded date ranges (on top of the first view), i.e.
WHERE TransactionDate between '20160101' and '20160131'
This predicate is also sargable, which should also be a good thing.  However, the issue I uncovered was that the optimiser now had a choice of TransactionDate>=DATEADD(WW,-53,GETDATE()) or TransactionDate>='20160101' as the lower bound for the index seek.  Sometimes it was choosing TransactionDate>=DATEADD(WW,-53,GETDATE()) as the lower bound, which is much slower than TransactionDate>='20160101', since 20160101 is the later date (and there are no transactions for 2016 yet).  The solution to these poorly performing queries was to change the first view to make the sargable predicate in the base query non-sargable.  It became 
WHERE DATEADD(WW,+53,TransactionDate) >= GETDATE()
Now the optimiser has no choice: it will use '20160101' and '20160131' as seek predicates on the index, and the query runs orders of magnitude faster.
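
For anyone who wants to picture the shape of the views, here is a minimal sketch with hypothetical table, column and view names (the real views had more joins and logic):

create view dbo.vwRolling53Weeks as
select TransactionDate, TransactionAmount
from dbo.tbTransactions
-- deliberately non-sargable, so the optimiser cannot pick this as the index seek bound
where DATEADD(WW,+53,TransactionDate) >= GETDATE();
go
create view dbo.vwJanuarySales as
select TransactionDate, TransactionAmount
from dbo.vwRolling53Weeks
-- these hard coded bounds stay sargable and become the seek predicates on the index
where TransactionDate between '20160101' and '20160131';
go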

Please don't take away the feeling that non-sargable predicates are good.  The takeaway is that it is worthwhile looking into the plans of long running queries, understanding why the optimiser chose that path, and helping it choose a faster path, if one exists.

Friday, August 9, 2013

Balance

I might begin by apologising for such a long gap since my last blog.  I've had family issues that have taken a lot of pain, time and energy.  This blog is a reminder to everyone that, as important as your career is and as interesting as the technology is, you need to spend more time with the people you love and doing the things you enjoy. For much of the last 20 years I haven't always done this, and I've had a couple of big reminders recently.  One of the things I enjoy is unicycling, and last month I had the opportunity to compete in the North American Unicycle Championships in Pennsylvania.  So much fun, and so many interesting people.  That's me, the big guy on the right in white, early on in the marathon race, where I won my age group.

The local newspaper did a little article on me when I returned to Sydney.

So remember to stop work everyday and do something you enjoy and spend more time with people you love.  The technical challenges will still be there tomorrow.

Next blog, I promise, will be a technical one.  It will be about tuning a query by making a predicate non-sargable!  Yes, I hear you, that's against all the best practices, but there is a good reason for it in this particular case.


Thursday, April 4, 2013

Avoiding a Common Deadlock


A common cause of deadlocks is concurrent identical update/delete queries.  For example, a query such as
DELETE FROM MyTable WHERE MyCode=@code
When two sessions start executing this query with different codes, they have a good chance of deadlocking.  This can be via lock escalation, where query A requests an escalation to a table lock, which will wait on query B.  As soon as query B requests a lock escalation, or any lock that query A holds, there is a deadlock.  You can also have a deadlock without lock escalation, with the two concurrent queries requesting page locks.  Eventually query A might ask for a page that query B holds while query B is waiting on query A.
A very simple solution to this kind of deadlock is to take a Table lock at the beginning of the query.  That way whoever gets in first will finish before the other query starts, removing the possibility of a deadlock on that one table.  This can be as simple as
DELETE FROM MyTable WITH (TABLOCKX) WHERE MyCode=@code
Taking a table lock will reduce parallelism.  No other queries (except NOLOCK) will be able to read the table while the TABLOCKX is held.  However, the reduction in parallelism is generally less painful than experiencing deadlocks.  Also, queries that DELETE/UPDATE many rows in a single statement will necessarily lock large parts of the table (often the whole table if lock escalation is triggered), so taking an explicit table lock for the query is generally not too onerous.
Note, another trick that I sometimes use in stored procedures that are looping through many records with possible updates is to take an explicit lock on the table at the beginning of a transaction.  For example, the following code will take an explicit table lock on MyTable, which it holds until the transaction is committed.
BEGIN TRANSACTION
SELECT TOP 0 * FROM MyTable WITH (TABLOCKX)
-- put your code in here to perform updates/inserts/deletes on the table(s)
COMMIT TRANSACTION
A couple of notes on deadlocks and locking:
  • The amount of locking and deadlocking increases exponentially as transaction time increases.  So, anything you can do to reduce transaction time will reduce contention.
  • It is common practice for middle tier applications (eg .NET) to start transactions and call the database server multiple times within the transaction.  This is OK, but it is even safer (and reduces transaction time) if you encapsulate the logic in a stored procedure, and let the stored procedure begin/commit the transaction.
  • SELECT queries can be called without locks (NOLOCK).  DELETE/UPDATE/INSERT queries cannot be called without locking.
  • Read queries (and DELETE/UPDATE/INSERT queries also read, for the purpose of this point) generally only lock the current record/page, using read committed (cursor stability) isolation.  It is possible for them to use more onerous strategies, such as REPEATABLE READ, in which case they do not release read locks until the end of the transaction.  This can lead to contention and deadlocks with concurrent update queries. Sometimes these REPEATABLE READ queries will benefit from an explicit table lock.
  • Lock escalation is generally your enemy in avoiding deadlocks.  If you see that a query is regularly escalating to a table lock, it is probably better for you to explicitly take a table lock at the beginning of the transaction/query as I outlined above.
  • Transactions can be started and committed while a stored procedure holds position on a cursor.  This way, you can commit and release locks every n records while processing rows in a cursor.
  • Use TRY CATCH in your procedures to handle exceptions such as deadlocks; a minimal retry sketch follows below.
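
Here is a minimal sketch of a deadlock-retry pattern using TRY CATCH.  The table name and @code are as in the example above, error 1205 is the deadlock victim error, and THROW needs SQL Server 2012 or later.

DECLARE @retry int = 3;
WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        DELETE FROM MyTable WHERE MyCode = @code;
        COMMIT TRANSACTION;
        SET @retry = 0;                 -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retry > 1
            SET @retry = @retry - 1;    -- we were the deadlock victim, try again
        ELSE
        BEGIN
            SET @retry = 0;
            THROW;                      -- any other error (or out of retries), re-raise it
        END
    END CATCH
END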

Wednesday, March 20, 2013

Pitfalls of Big Data

I have just assisted in another Big Data project review.  There appears to be a set of common pitfalls that new Big Data developers fall into.  In this blog I hope to shed light on a few pitfalls that are quite easy to avoid, but difficult to remedy. 

  • Insisting on flat file input sources, then simply loading these into flat database tables.  I understand that some source systems are remote and that flat files are the appropriate vehicle to transport data to your warehouse.  However, if the source system is a relational system within your operations centre, then it will be much easier and more reliable to simply read the data straight from the source system.  It is more reliable, as databases will ensure data typing and referential integrity.  It will be faster, as it avoids all the IO involved with writing the flat file and reading it again.  There is also the maintenance issue of adding/removing columns from flat files.  Typically, both consumers and providers need to be synchronised for such a change, whereas a database can accommodate this change without the consumers and providers synchronising their changes.
  • Measuring your Big Data project by the number of tera-, peta- or exabytes.  This is an almost irrelevant figure, just as the amount of money companies spend on R&D won't tell you which one will come up with the next big idea.  Big Data projects should be measured by the beneficial value of the information they provide.  When specifying storage requirements, the amount of storage always comes after performance specifications (MB/second and ms/IO, for example).
  • Engaging the services company with the best Big Data salesperson.  Consultancy and software firms have discovered that Big Data projects can be very large and profitable.  Consequently, their best salespeople are employed to win them, and they aim for the biggest projects.  The big project is not always in the interest of the client, and the best salespeople are not always followed by the best developers. 
  • Treating technology as your most important decision.  The technology you choose is important, but it's not the overriding factor.  Also, as a corollary, only choose the technology as you need it.  For example, if you are planning on taking 12 months to load the data, leave selection of the visualisation technology until after you have started loading.  This will enable you to make a better choice of software (you will have some good sample data) and the software market will be a few months more mature.  
  • Purchasing production hardware before you need it.  If your project isn't going live for 12 months, there is no need to purchase production hardware until close to that time.  At that time you will have a better idea of the specifications required, and the hardware will be a little more advanced and/or cheaper.
  • Taking Kimball or Inmon literally and making all dimension attributes Type II.  Some naive Big Data developers worship Kimball and measure their technical prowess by how many Type II dimensions and attributes they have.  So much so that not only do they make your DW performance suffer, they remove natural keys, making querying on current values impossible for end users.  It's OK to have Type II dimensions, but only where they benefit the business users.  Almost always, when you have Type II dimensions, you should also keep your business (natural) keys in the fact tables.  For example, a transaction record might have a surrogate key for Customer, which points to the customer record as it was at that time.  If the Customer attributes change, later transactions will point to a different Customer record with a higher surrogate key.  Now, when the business selects transactions for customers that belong to "Harry", they won't see transactions for customers that belong to Harry unless Harry was the customer owner at the time of the transaction.  This can be very confusing and annoying to business users.
  • Reloading entire tables that only ever receive inserts from source systems.  This is quite obvious even to naive developers and consequently an uncommon pitfall.
  • Rejecting data from the business because it doesn't conform to a Big Data rule.  For example, rejecting transactions because they have an invalid ItemId column.  This is a kind of "holier than thou" issue.  The Big Data custodians want to keep their warehouse clean, so they reject bad data.  The problem is that this record was for a sale; it has an amount and is part of gross sales.  By rejecting it, the warehouse has an invalid view of the business.  A valid view would be to include the transaction, albeit with an "unknown" sales item.  It might be that at a later time the sales item table comes through with that missing ItemId, which the warehouse should accommodate without rejection or prejudice.
  • Forcing your business users to use one querying/visualisation technology.  The Big Data project will have selected specific software for collection, loading, storing, cubing, data mining etc.  However, the business users should be free to select the technology of their preference for querying and visualising this information.  Nowadays, structured data is well suited to OLAP cubes, and most warehouses exploit OLAP cubes.  There are hundreds of software applications that can consume cubes, and the business users should be free to use the ones they prefer.  It doesn't matter to the warehouse how many different cube browser technologies there are.  It might matter how many concurrent queries there are, but that's a different matter.

Thursday, January 24, 2013

MDX Running totals on a nested ordered set

I like MDX puzzles, and I just got a good one today.  The requirement was to have running totals over an ordered set, but the set was only ordered at the lowest level, and the running totals were just within the parent member.  That is, a list of countries and cities, where cities are ordered descending by [Net Sales] within country, and [Profit] has a running total for the cities within that country.

As I have said before, anything is possible with MDX, and here is the solution for ordering cities within countries and having running totals of profit of cities within countries.  I haven't ordered the countries here, but they could be independently ordered.

Essentially, what the query does is order the cities (along with city.All).  For the running total it takes the members from the City.All member to the current member and aggregates them.  Simple really.

with
set OrderedCities as
[Geography].[Country].[Country]
*Order(Nonempty([Geography].[City].Members,[Measures].[Profit])
,[Measures].[Net Sales], DESC)  
 
member [Measures].[CityRank] as rank(([Geography].[Country].CurrentMember,[Geography].[City].currentmember),OrderedCities)
member [Measures].[CountryRank] as rank(([Geography].[Country].CurrentMember,[Geography].[City].[All]),OrderedCities)

member [Measures].[ProfitRunningTotal] as
sum( 
Topcount(OrderedCities-TopCount(OrderedCities, ([Measures].[CountryRank]))
,[Measures].[CityRank]-[Measures].[CountryRank])
,[Measures].[Profit]) 

select 
{[Measures].[Net Sales]
,[Measures].[CityRank]
,[Measures].[CountryRank]
,[Measures].[Profit]
,[Measures].[ProfitRunningTotal]
} on columns,

OrderedCities  on rows

from [MyCube]

Friday, December 14, 2012

What's happened to http://RichardLees.com.au/sites/demonstrations?

For the last 11 years I have operated public business intelligence demonstrations on
  • http://RichardLees.com.au/sites/demonstrations  and 
  • http://EasternMining.com.au/sites/demonstrations
These sites are no longer available, I am afraid.  Last week I had a hardware failure and I don't have a budget for new hardware.  Hopefully, everyone is now clear on how powerful and scalable Microsoft Analysis Services is, and everyone is aware of the features in SQL Server data mining, Excel Services, PerformancePoint, close to real-time cubes with thousands of inserts/second, etc.

There is a chance that someone might donate some hardware and I might put the demonstrations on again, but there is no plan at the moment.

Please don't hesitate to contact me should you have any questions regarding these demonstrations.


Sunday, December 9, 2012

Our SQL Server Administrator has left with the SA password. What can we do?


You have inherited a SQL Server 2012 system for which you don't know the sa password (or it doesn't exist) and you don't have a Windows account with sysadmin privileges.  Or perhaps you have forgotten the sa password.  What on earth can you do?  All you have is dbo (database owner) privileges.

You could try
  • Restoring Master database from backup.  However you won't succeed without the sa password.
  • Rebuilding the Master database.  However, you will lose all server level configurations including logins, permissions, linked servers, replication, legacy settings etc.
  • Re-installing SQL Server from scratch.  However you will also lose all the server level configurations.
Here is a "backdoor" to SQL Server 2012, which can help you gain sysadmin access to SQL Server, so long as you are a member of the local Windows administrators group on the SQL Server machine.

SQL Server 2012 allows members of Local Administrators group to connect to SQL Server with SysAdmin privileges.

To create a new SYSADMIN account
  1. Start a command prompt using a Windows administrator account.  Navigate to the SQL Server 2012 binn folder (or have this folder in your environment path).  It is commonly found in c:\Program files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn
  2. From the command prompt type "SQLServr.exe -m" or "SQLServr.exe -f", which will start SQL Server in single-user mode (-m) or with a minimal configuration (-f).  Do not close the command prompt.
  3. Open up another command prompt and use the SQLCMD command to connect to a server and instance. 
  4. SQLCMD -S EasternMining\SQL2012
  5. You are now logged onto SQL Server from the command prompt.  You can now create a new account and grant it any permission.  The following command will create the login RichardAdmin
  6. CREATE LOGIN RichardAdmin WITH PASSWORD='AV3ry53curep@ssw0rd'
    GO

  7. Now add the user to SysAdmin
  8. SP_ADDSRVROLEMEMBER RichardAdmin, SYSADMIN
    GO

  9. You have now added the new login and granted it SYSADMIN privileges.  Now, all you need to do is stop SQL Server and start it in normal mode.  
  10. From SQL Server Management Studio you will be able to log in using the new account and grant SYSADMIN privileges to other accounts.
  11. Don't forget the sa password or ensure there is a Windows account that has SYSADMIN privileges.




What's the future of SQL Server Analysis Services

There are now two main technologies to choose from when creating cubes in Microsoft's SQL Server 2012: the traditional multidimensional technology and the new tabular technology.  You really have to choose between these technologies, as there is no easy way to switch between them and each one has its own server process.

I have written previously on how to choose between these.  However, it is becoming more apparent to me that Tabular is strategic and multidimensional is becoming legacy.  Not that Microsoft has whispered a single squeak about not supporting multidimensional in the future, but there are a few clues that Microsoft is not putting effort into this technology.  The multidimensional cubes have not changed much since SQL Server 2005.  That is three SQL Server releases (2008, R2 and 2012).  Another big clue that has come to my attention is the survey where the SSAS team are asking us to rate enhancements for the next version of SQL Server.  See http://www.instant.ly/s/Wqdj4mEAIAA.  This survey only mentions Tabular by name.  There are no questions that pertain to multidimensional.  My conclusion from this survey (and other hints) is that Microsoft is not planning on enhancing multidimensional.

What does this mean?  Does it mean that new enterprise developments should be in Tabular?  I feel that we should definitely consider Tabular for all new developments.  If it supports the business requirements as well as multidimensional does, then use it.  Multidimensional has a wider feature set, so it will not be uncommon to decide to stay with multidimensional.  Two features come to the forefront:

  1. Tabular only works if there is sufficient memory to hold the entire cube in memory; otherwise you won't be able to open the database.  
  2. Several features in multidimensional are not supported in Tabular, for example many-to-many dimensions.
It will be very interesting to see if the missing features in Tabular are worked on in the next version of SQL Server.

By the way, I would love to be wrong in this assessment, and will be happy to blog a retraction when proved wrong.  There is a massive base of multidimensional cubes supporting enterprise applications today.


Wednesday, August 29, 2012

Vote for cubes in memory

As I have blogged before (http://richardlees.blogspot.com.au/2011/12/why-doesnt-ssas-cache-entire-cube.html), large multidimensional cubes are not retained in memory even though there is plenty of RAM and msmdsrv is using less than LowMemoryLimit.  Msmdsrv tends to hold about 2GB of aggregation cache, although this isn't a hard limit and it may be a result of the cache cleaner being too aggressive.  Whatever the reason, the problem is that msmdsrv won't hold data in cache even though there is plenty of RAM.  I have raised this as an issue with Microsoft, but I need your support to get this changed in the product.  Please go to https://connect.microsoft.com/SQLServer/feedback/details/760107/multidimensional-cube-is-not-retained-in-memory-even-though-there-is-plenty-of-ram and vote for this enhancement.

This enhancement will mean that if you have a 20GB cube on a dedicated 100GB server, msmdsrv will retain the entire cube in memory and no further IO will be required.  As the product is now, with a high query load, the IO throughput can be measured in terabytes during a busy day.

By the way, in SQL Server 2012 we have Tabular models, which will store the entire cube in memory, but these models don't have the functionality that multidimensional cubes have, and they only work if the entire cube can fit in memory.

Thank you for helping make SQL Server a better product.

Saturday, June 9, 2012

SQL Server 2012 Fast Table Loading

SQL Server 2012 Integration Services is similar to previous versions in that it supports extremely fast (optionally minimally logged) table loading.  Like previous versions, fast loading is assisted by having large commit sizes; that is, many thousands of rows are inserted in the same transaction.  This can be a challenge when there are occasional records that error on load, such as a duplicate primary key or a data type error.  This post is about achieving very fast data loads while handling infrequent errors.  Essentially, the pitfall to avoid is that when one record has an error, all records in that transaction are sent to the rejects, when you only want to send the error record(s) to the rejects.  Note, this isn't the only way, or necessarily the recommended solution, but it does illustrate the issue well, and will help you determine the best architecture to implement at your site.
Imagine the scenario where we have 2,000,000 rows to insert into a 50 billion row table.  We want to do it fast, but there is a possibility that some of the rows will already have been inserted (so we want to ignore them) and other rows might have data out of range and be true rejects.  If we just implement a Data Flow with rejects going to an error table, we will have good records going to the rejects, as all records in the "error" batch go to the error destination.

So what we can do is take the error output and perform a lookup.  If the PK exists already, we can ignore the record.  If the PK doesn't exist, then we try to insert it into the destination table for a second attempt.  This time, we set the commit size to 1, so that any record that errors will not take good records with it.  The error output from this second attempt contains truly bad records, since they don't already exist and they have caused an error themselves.  During most executions there won't be duplicate records or errors, so extremely fast loading will be achieved.  When there are errors, they will be handled efficiently.

Here is an example Data Flow from SQL Server 2012.  The input data has 2,000,001 records, one of which already exists in the Data Warehouse and can be ignored.  Notice how 2,000 rows are rejected from the Data Warehouse; this is the batch size.  There is one duplicate record, and the other 1,999 rows in the transaction are valid.  The Lookup task finds the match and sends that row to a row count (for logging purposes), while the remaining 1,999 rows can be loaded into the Data Warehouse successfully.  The second attempt to insert is done with a batch size of 1.  If there were any errors in this second attempt, they would be redirected to the rejects table.  Generally, the rejects table has much looser data constraints, so the rows don't error again.

This is the connection properties on the Data Warehouse table (first attempt).  Notice the Maximum insert commit size is 2000.


This is the lookup component used to see if the PK exists already.  This lookup is without a cache, as the destination table contains billions of rows and the package will generally be looking up few, if any, rows.  A full cache would be extremely expensive on such a large destination table.  Don't do that.  A partial cache is probably of no use either, since the lookups will tend to be for different PKs.
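
For what it's worth, in no-cache mode the Lookup transformation effectively runs a parameterised query like the following once per incoming row (the table and column names here are hypothetical):

SELECT dw.TransactionId
FROM dbo.FactTransactions AS dw
WHERE dw.TransactionId = ?;   -- the parameter is mapped to the incoming row's primary key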

This is the connection properties on the Data Warehouse table (second attempt).  Notice how the maximum insert commit size is 1, so any errors will only affect one record.
As I mentioned earlier, there are many ways to perform a fast load.  The main takeaway from this post is that an error record will spoil the whole batch, so you have to deal with that.  The mechanism above essentially retries records from the spoilt batches with a commit size of 1, so the residual errors are truly errors.  I have seen developers do something similar by performing checks before the load.  You should determine what is appropriate for your application, keeping in mind the behaviour of error batches.





Wednesday, May 23, 2012

SQL Server 2012 Tabular versus Multidimensional

Microsoft SQL Server 2012 has been released with a new Analysis Services technology, Tabular.  Many organisations will now be choosing between Tabular and Multidimensional services.  This blog is aimed at helping you decide between Tabular and Multidimensional, and at the implications of using them in tandem.
Tabular is simpler than Multidimensional; it does not support features such as write-back, complex calculations, named sets, many-to-many relationships and very large cubes.  So why would you use Tabular?  There might be several answers, but the one I am most interested in is that Tabular models are held entirely in memory, providing exciting performance potential.

To test the memory aspects of Tabular versus Multidimensional, I created a large-ish cube in both.  Since Tabular is entirely in-memory, I had to experiment with a cube that didn't require more memory than I had available in my x64 server.  I had 8GB, which translated to a fact table of about 450 million rows from my WebLogs database.  I created equivalent Tabular and Multidimensional cubes with the same fact measures and 11 dimensions.

During the design process, the Tabular model imports and massages the data.  I find this a bit annoying, but it might be good feedback to a less experienced designer.  In processing, both models completed in less than 60 minutes, with the Tabular model about 20% faster.

Now, when I ran queries on the two models, Tabular was able to answer all queries with negligible IO.  The Multidimensional service was performing IO, and its memory usage was fluctuating with the queries as it managed its cache.  Both models were able to satisfy my random queries quickly.  However, it would be fair to say that the Tabular model returned random results faster, which was largely due to it having all data in memory.  It was also noticeably faster at distinct count (calculated) queries.  The Multidimensional model tended to be just as fast when the submitted query required similar data to a recently executed query (answer in cache).  It was a little slower for queries on data that hadn't been queried recently (answer not in cache), but it still returned results in less than 2 seconds for almost all random queries.  Note, my server did not have a solid state drive, which would have assisted the Multidimensional model.  Both models had only one partition, and I was interested to see that some Tabular queries were able to consume all 4 CPUs concurrently.


The resulting databases, on disk, were not too dissimilar in size: Tabular=7.5GB, Multidimensional=5.4GB.  You can see from Task Manager, above, that at idle the Tabular service was holding most of the 8GB of memory (peaking at 10.7GB during processing), leaving very little for any other service (SQL Server was squashed down to 350MB).  The Multidimensional service was only holding 280MB and had a peak working set of 4.9GB (during processing).  This is a bugbear of mine: the Multidimensional service doesn't hold onto more memory when it is available.  Something else you should be aware of: the Tabular service will hold onto memory even when there are no queries active and memory is short, which could degrade other services on the same server.  I had to shut down Tabular when testing Multidimensional, so that it had a fair go at the RAM. 

For all the effort Microsoft has put into creating the Tabular model, I am left questioning why they didn't enhance Multidimensional's memory management to use memory when it is available.  The Multidimensional service will not hold a large cube in memory even though there is plenty of RAM to do so.  The caching in the Multidimensional service has several components, but the aggregation cache doesn't appear to exceed 2GB.  It also relies on the Windows file cache, which doesn't appear to offer much more caching ability.  This is particularly noticeable with cubes larger than 4GB and smaller than the available memory.  See my blog Why Doesn't SSAS Cache The Entire Cube?  This is the most significant performance limitation I have come across with large Multidimensional cubes, and the reason I strongly recommend solid state drives for very large cubes.  It is also why adding RAM to a server stops being advantageous after a point.  Enhancing Multidimensional to use more RAM, when available, would have diminished the need for a second OLAP technology.
In summary, I would like to make the reader aware that when employing Tabular models, the model must stay within the memory constraints of the server.  So, if your server has 100GB of RAM and is dedicated to your Tabular model, your Tabular model cannot be larger than 100GB.  If you try to process larger models, they will fail with a memory error.  In a practical sense, more RAM is required to accommodate updating in parallel with querying the processed version.  Also be aware that Tabular models will consume memory, which could negatively impact other services.  On the other hand, if you can ensure your Tabular models are within RAM limitations, query performance will be reliably fast and completely independent of IO stress.  For my enterprise customers, since Multidimensional has no absolute size limit and is much more functional, it is likely to be their cube of preference for a while longer, with Tabular an option for specialty needs.

Saturday, April 21, 2012

The Perils of Using Asterisks in SQL Views

Asterisks (*) are extremely useful and widely used when writing SQL.  However, in a SQL Server view, the asterisk can cause some unusual errors and/or misleading data.
The peril of asterisks comes when one of the underlying tables is altered to add a new column: the view does something quite unexpected.  The view will still return the same number (and names) of columns, but the new column's data will be returned in the column to its immediate right, and all other columns to the right will display data from the column to their left.  The right-most column will keep its name, but the data that belongs to it is not returned anywhere.  Because columns are returning data for other columns, the data types are no longer correct: an INT can display a CHAR(), etc.

Here is an example of a view where the first 2 columns are coming from t1 (using *) and CustomerBalance, CustomerDOB and CustomerAddress are coming from t2.

Here is the same view after adding a VARCHAR column to t1.  Notice how CustomerBalance now has character data, CustomerDOB has the balance, CustomerAddress has the DOB, and the customer's address appears nowhere!  The character data in CustomerBalance is from the new column.

Here is the same view after executing sp_refreshview.  Notice that the new column is now appearing and all columns contain the correct data.

My recommendation is to avoid using asterisks in views to return all columns from a table, instead specify the columns required.  This is mildly annoying, but much less annoying than getting address data in a balance column.

However, another possible solution is to use sp_refreshview, which will update the metadata for the view (and include the extra columns in the view). 

Note, I have actually used 'SELECT * FROM v1' in the demonstration below, but the same results would be returned even if I had used 'SELECT t1PK, CustomerName, CustomerBalance, CustomerDOB, CustomerAddress FROM v1'

Also note, sp_recompile does not fix the issue, nor does a server reboot.
To reproduce the issue execute the following script

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t1]') AND type in (N'U'))
DROP TABLE [dbo].[t1];
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t2]') AND type in (N'U'))
DROP TABLE [dbo].[t2];
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[v1]') AND type in (N'V'))
DROP VIEW [dbo].v1;
GO

create table t1 (t1PK int, CustomerName VARCHAR(50));
create table t2 (t2PK int, CustomerBalance DECIMAL(13,2), CustomerDOB DATE, CustomerAddress VARCHAR(100));
go
insert into t1 select 1, 'Richard Lees';
insert into t1 select 2, 'Richie McCaw';
insert into t2 select 1, -500, '20580221', 'Sydney, Australia';
insert into t2 select 2, +1000, '19801231', 'Christchurch, New Zealand';
go

create view v1 as
 select t1.*,t2.CustomerBalance,CustomerDOB,CustomerAddress from t1
 inner join t2 on t1PK=t2PK;
go
SELECT 'View is OK at this stage' _;
select * from v1;
go
alter table t1 add t1_comments varchar(20) not null default 'Da di da';
go
SELECT 'View is NOK.  The column CustomerBalance now contains char data' _;
select * from v1;
go
sp_refreshview v1;
go
SELECT 'View is OK again' _;
select * from v1;


Thursday, April 19, 2012

SSAS Freezing. Needs restarting or server reboot.

Have you had the situation where SSAS becomes unresponsive, consuming too much memory and the only way out is to restart SSAS or to reboot the server?

With the default settings of SSAS, this can happen with heavy MDX queries.  When I say heavy queries, I really mean queries that require large amounts of memory.  It is helpful to think of SSAS satisfying queries from within memory, which isn't exactly true, but it helps understanding.  This is very different from SQL Server, where a resultset does not need to be instantiated in memory, so a 5 TB table can be returned from one query with a modest memory overhead.  With SSAS, the query cellset is generally constructed in memory.  So if your query asks for 100 columns and 500,000 rows, think of 50 million cells instantiated in SSAS memory. 

The memory required for a single query is not restricted by Memory\TotalMemoryLimit and can continue to consume memory until either the query completes or the system grinds to a halt with Windows page thrashing.

One might argue that you should write MDX queries that do not consume so much memory, but I believe that queries should not be capable of crippling the system.  Unfortunately there is no way to cap the memory consumption of individual queries.

The easy, and recommended (by me), solution to this issue is to set Memory\HardMemoryLimit (new in SQL 2008) to a value between Memory\TotalMemoryLimit and 100.  TotalMemoryLimit is 80 by default (that's 80%), so you might set HardMemoryLimit to 90, which will cause SSAS to begin cancelling queries once memory use exceeds 90%.  The queries that get cancelled would have had little chance of succeeding.  They receive the following message.
Server: The operation has been cancelled due to memory pressure.
My recommendation is to set the HardMemoryLimit even before you suffer from these heavy queries. It is better to have queries terminated than SSAS request excessive memory and cause Windows to thrash. If you want to err on the side of not cancelling queries, set the HardMemoryLimit closer to 100. If you want to minimise the possibility of having Windows thrash, set the HardMemoryLimit closer to TotalMemoryLimit. Never set the HardMemoryLimit equal to or less than the TotalMemoryLimit.




If you find that queries are being cancelled that you would like to have complete, then I would suggest you either tune the queries to use less memory or add more RAM to your x64 system.

Wednesday, February 29, 2012

Getting Previous Day's Value Efficiently

I have this saying: it is always possible to make something go faster, whether it is a database load, cube process, SQL query, MDX query or running a marathon.  If you put enough effort into it, you can almost always find a way of going faster (a corollary of the rule is a diminishing return effect).  I came across an interesting example of this today when a cube process was scanning a daily stocktake table, and it needed to also return the previous day's value, that is, the same product but for the previous day.  The natural way to do this would be to join onto the same table using the DATEADD() function with a DD of -1.  However, since this is a function, it obscures from the optimiser the sequence of the values returned, and the optimiser thinks it needs to use a sort or hash table to cater for out of sequence rows.  This makes the query much slower.
Here is the query and its query plan
 select
  st.StockTakeDate
 ,st.StockOnHand
 ,st.StockOnHand-stPrev.StockOnHand StockChange
from tbStockTake st
inner join tbStockTake stPrev
on stPrev.ProductId=st.ProductId
and stPrev.StockTakeDate=DATEADD(dd,-1,st.StockTakeDate)

Notice how the join on the second tbStockTake table is sorted so it can be merged with the first tbStockTake table.  I am scanning 750 million rows, so this sort is not trivial.  We know there is no need to sort the data, since the join will be in the same sequence as the first table.

Here is a very simple enhancement to the query
select
  st.StockTakeDate
 ,st.StockOnHand
 ,st.StockOnHand-stPrev.StockOnHand StockChange
from tbStockTake st
inner join tbStockTake stPrev
on stPrev.ProductId=st.ProductId
and stPrev.StockTakeDate=st.StockTakeDate-1


Notice how this new query plan is able to Merge Join the two tables without a sort.  This dramatically improves the query performance and reduces the tempdb resources required.  The table is simply scanned with two cursors and merged.

The SQL Server optimiser is very clever in understanding that a clustered scan by ProductId, StocktakeDate is in the same sequence (so can be merged) with st.ProductId=stPrev.ProductId and st.StocktakeDate=stPrev.StockTakeDate-1.  It is not quite clever enough to determine this for st.StocktakeDate=DATEADD(dd,-1,stPrev.StocktakeDate), somewhat understandably since a function is involved.

Datetime in SQL Server (at least up until the current version, 2008 R2) can have arithmetic applied to it: 1 is equivalent to 1 day, and fractions of 1 are equivalent to portions of the day.  For example, 0.25 is equivalent to 6 hours. 

Interestingly (or should I say unfortunately) we cannot use the same trick where a Date datatype is used instead of the Datetime datatype.
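
A quick illustration of that limitation, as a sketch to try in a test query window:

DECLARE @dt datetime = '20120215', @d date = '20120215';
SELECT @dt - 1;              -- fine: datetime supports integer arithmetic, returns 2012-02-14
-- SELECT @d - 1;            -- fails: arithmetic operators are not supported on the date type
SELECT DATEADD(dd, -1, @d);  -- the DATEADD alternative works for both data types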

So, whenever you have something that you would like to go faster, simply apply some focus on it, since the chances are, there is a faster option.

Wednesday, February 15, 2012

Cube processing goes slow when there are dimension key errors

I try to keep my cubes as clean as possible, so that dimension key errors are minimised. However, as a safeguard I tend to enable UnknownMembers. This way the totals are correct even if the customer, item, cashier etc. isn't found in the dimension. 
Unfortunately, SSAS (2008 R2) still consumes quite a lot of CPU dealing with these UnknownMembers. This is fine if you only have a few million facts to process, but if you have billions (or even just hundreds of millions), processing noticeably slows down. This manifests as a single CPU busy in msmdsrv. The difference in cube processing time can be a factor of 20 or so. My rule of thumb is that a partition should be able to process tens of millions of facts per minute. However, if there are key errors, this might drop to less than a million per minute.

If you have configured the ErrorConfiguration to "IgnoreError", you won't even see any error messages when you process in the foreground, but the CPU overhead is still there. The same goes for KeyErrorLimit: CPU will continue to be consumed even after it has stopped logging.

My suggestion is to try, as much as possible, to avoid dimension key errors. A simple COALESCE(DimKey,-1) in the fact source query, with an "unknown" key value (-1) as the second parameter, combined with an "unknown" (-1) member in the dimension, will do the job, as in the sketch below. Of course, you don't have to remove every dimension key error; just ensure that there aren't millions (or billions) of them.
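
Here is a minimal sketch of that idea, with hypothetical table and column names.  The -1 "unknown" row must also exist in the customer dimension table.

create view dbo.vwFactSales as
select f.SaleDate,
       f.SaleAmount,
       coalesce(c.CustomerKey, -1) as CustomerKey   -- unmatched facts map to the unknown member
from dbo.tbSales f
left join dbo.tbCustomer c
  on c.CustomerId = f.CustomerId;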

Look after yourself

I primarily write about business intelligence topics.  However, I believe there is much more to life than BI, and I encourage everyone working in BI to ensure they maintain a balanced life, whatever that is. 

One strategy I use is to get to work using some fun, self propelled exercise. In my case this is either paddling a kayak across the harbour or cycling around the harbour. Here is a video of my unicycle ride into work last week.
 
This ensures that I get to work in good humour and helps me keep fit and healthy.

Sunday, January 29, 2012

Analysis Services Configuration Options

To see the SSAS configuration options (properties), right click on the server from SQL Server Management Studio and select Properties.  You will only see the basic options by default.  To see all options, check "Show Advanced (All) Properties".

Most of these options you really shouldn't alter, but there are a few that you should know about. 
Here are a few SSAS properties that I think you should be aware of, and might want to change.

DataDir
This is where SSAS will, by default, put the SSAS database files.  By default, this will be a sub directory of the "program files", which is not an ideal location.  The SSAS databases should be on high performance storage.  IO is very often a critical factor in large SSAS cubes.

ExternalCommandTimeout
As I understand it, this property is the number of seconds that SSAS will wait for the first row to return from an SQL query during processing.  The default is 3600 seconds (1 hour) which is insufficient for many environments.   I often change this to 36000 (10 hours) in larger environments, or environments with Oracle as an rdbms (just kidding).

ForceCommitTimeout
This is the amount of time (milliseconds) that SSAS will wait on queries while performing process updates. A process update cannot commit while a query is active. The default is 30 seconds, which I think is quite reasonable, but you might have a need to change this value.

QueryLogSampling
The default value is 10, so that 1 in 10 queries is recorded.  In practice, I either need all queries or none.  So what I do is change this value to 0 to minimise overhead.  When I am running a performance tuning exercise, I will temporarily change this to 1.

LowMemoryLimit
If this number is between 1 and 100, it is the percentage of memory that SSAS can use, without concern.  By default it is 65, which tends to be an appropriate value.  You might wonder why your 100GB server never sees 65GB memory for SSAS, even though your cubes are much larger and the IO demands are huge.  This is due to SSAS caching logic.  Just because you have the memory and the cube is large doesn't mean that SSAS will hold the cube in memory.  see my earlier blog http://richardlees.blogspot.com/2011/12/why-doesnt-ssas-cache-entire-cube.html

TotalMemoryLimit
You might see the LowMemoryLimit as a very soft limit, which can be exceeded when processing demands would benefit.  The TotalMemoryLimit is a harder limit, but it can also be exceeded when necessary.   One way that I look at these two options is the LowMemoryLimit is the memory limit during non processing times, and the TotalMemoryLimit is the limit when SSAS processing jobs are active.  However, as I mentioned above, the more common concern is that SSAS isn't using the available memory, rather than it is using too much.

DrillThroughMaxRows
I haven't ever changed this property.  Since 2005, SSAS hasn't offered a real drill-through; this is really a drill down option.  If you want to support drill through, see my blog http://richardlees.blogspot.com/2009/01/using-drill-through-in-analysis.html

Port
If you want to change the port that SSAS listens on (or you are running multiple instances) this is where you specify the port number.  By default this value is 0, which denotes 2383.

ServerTimeout
This is the limit for SSAS queries, in seconds.  After this time (default 3600, or 1 hour) MDX queries will time out.  You may have reason to increase or decrease this value.

Wednesday, December 28, 2011

Remember What's Important

Happy New Year to everybody.

Perhaps, like me, you enjoy your work in the Microsoft software space.  That's great, you are very lucky if you enjoy your work.  But remember what's really important in this life.  Your family and friends.

So, don't work too hard.  Eat well and keep exercising.  It helps if you find a sport that you really enjoy.  At the moment, that is unicycling for me.

See you all in the New Year.

Sunday, December 11, 2011

My SSAS databases are corrupted

Did you happen to have an unplanned shutdown?  This is quite likely to be the cause of the failure.  This is unlike SQL Server, which has a transaction log and will always recover databases on restart.
As I have mentioned earlier, SQL Server Analysis Services exploits the Windows File Cache to help minimise IO, which might lead you to suspect the Windows cache.  However, SSAS uses the FlushFileBuffers API to minimise the chance of dirty pages sitting in the Windows File Cache, so the cache is unlikely to be the culprit.  Even so, this does not protect SSAS from corruption on an unplanned shutdown.
So, if you really want to avoid corrupted SSAS databases on power failure, you should ensure your SSAS servers are supported by a UPS.
Most production SSAS servers have a UPS.  But if you don't, as on my demonstration site (http://RichardLees.com.au/Sites/Demonstrations) where the cubes are continually updating, the SSAS databases have a good chance of being corrupted when the power fails.  The only solution I know of is to restore the affected (typically all) SSAS databases.  I find the quickest way to do this is to stop SSAS, empty the SSAS data directory, start SSAS and restore all the databases.  You will also need to add back any userids that had SSAS administrator privileges, as they were held in a file in the SSAS data directory.

Tuesday, December 6, 2011

Windows Server Performance Tuning

I just came across this very good whitepaper on Windows 2008 Server performance tuning.  Much of it you will already know, but you will probably learn some new things from this document.  It covers storage, networking, processor utilisation and virtualisation; the virtualisation section was the most informative for me.
The document is in Word format http://msdn.microsoft.com/en-us/windows/hardware/gg463394

Monday, December 5, 2011

Why doesn't SSAS cache the entire cube?

Would you expect a 36GB server dedicated to SSAS to eventually cache an entire 11GB cube in memory?  If so, you would be wrong.  SSAS does not necessarily keep aggregations in memory, even though memory used is less than Memory\LowMemoryLimit.

I have seen this effect on many production servers but never quite believed it.  There is much more RAM than cube size, yet SSAS is continually losing data from cache and asking for physical IO.  Actually, SSAS relies on the Windows File Cache to reduce much of its IO.  However, the Windows File Cache does not cache the entire cube either.

To demonstrate this effect, I set up a 36GB server dedicated to SSAS, with an 11GB MOLAP cube.  Then, over a period of 19 hours, I had client tasks querying the cube in a semi-random fashion.  The MOLAP cube was not updated over the period.  If SSAS or the Windows File Cache were caching effectively, there wouldn't be much more than 11GB of read IO on the SSAS data drive over the 19 hours.  However, the system reached an equilibrium where it was requesting over 20MB of read IO/second.  If you add this up, the system read about 1.3 terabytes of data from the 11GB cube (20MB * 60 * 60 * 19).  Obviously, the system was not effectively caching the cube.  This is very different from SQL Server, which would quite happily keep an 11GB database in memory, effectively eliminating further read IO.

My server was running Windows 2008 and SQL Server 2008 R2 with default configuration settings.  The SSAS database was located on the d: drive.  Nothing else was located on the d: drive.
Here is a Perfmon chart with some of the key counters during my test.
SSAS (msmdsrv.exe) did not use more than 12GB of memory during the test, which is well below the LowMemoryLimit 65% (which is about 23GB).

During the 19 hours, SSAS requested, on average, 102MB/second of read IO.  The Windows File Cache was able to satisfy about 80% of these read requests, so the physical disk bytes/second was only 20MB/second. 

The Perfmon chart above shows some of the key counters.  Notice
  • The Process (msmdsrv) Bytes/second y axis scale is in MB, often going off scale at 100MB/sec. These are not necessarily physical IO, as the Windows cache will satisfy many.
  • The PhysicalDisk Bytes/second is averaging 20.1MB/second for the 19 hours
  • The Windows Cache Bytes is sitting between 0.6GB and 1.1GB.
  • The red line (Cache Copy Reads/sec) is the IO requested of the cache.  These are IO that the Windows cache has managed to satisfy by finding the page in its cache.
  • See how the Cache Copy Reads/sec is satisfying IO requested by the msmdsrv process, reducing the PhysicalDisk Bytes/sec.  It is not so easy to see, since the cache counter is in reads/sec while the PhysicalDisk and Process counters are in bytes/sec, but the relationship is there.
Now that I understand this effect, I am much more appreciative of the Windows Cache.  This is very different from a SQL Server system, where the Windows Cache has little to do as SQL disables the Windows File Cache for its database files.

However, you can clearly see that SSAS (or the system) is performing much more IO than would be necessary if the entire 11GB cube was cached in memory.
There are a few takeaways from this exercise:
  1. Be thankful that we have a Windows Cache, and consider configuring it for SSAS.
  2. Try to put SSAS databases on very fast storage devices (ideally solid state disks).  If you can't avoid the IO, at least make it fast.
  3. Partition large cubes.  This helps reduce logical IO requests, and to the extent that it groups the popular data together, it will probably help the Windows Cache hold the hot data more effectively.
  4. Hopefully, in a future version, either SSAS or the Windows File Cache will cache more of the cube.
By the way, this effect is not always noticeable.  If your cube is less than 2-3GB, then it is likely to be well cached between SSAS and the Windows File Cache.  Also, if your cube is much larger than the memory available to SSAS, then you would expect to see continual IO, and it is likely to be quite well optimised.  However, when you have a 64 bit server with a cube that is larger than 3GB but comfortably smaller than the server memory, you might be surprised to see the volume of continual IO.

Please help get this fixed by voting on https://connect.microsoft.com/SQLServer/feedback/details/760107/multidimensional-cube-is-not-retained-in-memory-even-though-there-is-plenty-of-ram

Sunday, November 27, 2011

Causing a Deadlock in SQL Server

SQL Server has a background process that is continually looking for deadlock chains.  If it finds a deadlock chain, it will roll back the transaction that has done the least amount of work, which should be the fastest to roll back.  That's unless one of the transactions has volunteered as the deadlock victim by setting its DEADLOCK_PRIORITY to low.
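If you want a particular session to volunteer as the victim, it is a one line setting.  A minimal sketch in standard T-SQL, run in the session you are happy to see rolled back:

SET DEADLOCK_PRIORITY LOW
begin tran
-- ...the rest of the transaction as normal.  If this session ends up in a deadlock
-- chain, it will be chosen as the victim ahead of sessions at NORMAL or HIGH priority.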
If you want to cause a deadlock, it is quite easy.  Most DBAs will be able to do it, and it can be useful in determining what Profiler and logging information is available in the event of an unplanned deadlock.

Here are some instructions to cause a deadlock between 3 transactions.  Most deadlocks are between 2 transactions, but a deadlock chain can have any number of transactions.  By the way, there are many ways to set up a deadlock; the only common characteristic is that there is a chain of processes waiting on each other for locks.  So in a 2 process chain, Process A is waiting on a resource held by Process B and Process B is waiting on a resource held by Process A.  As you can imagine, the only solution is for one of the transactions to be rolled back.  They can't both go forward.

In my experience, a common class of deadlocks is the one caused by update (update, insert or delete) transactions that update so many rows that the lock must be escalated.  For example, escalating many row or page locks to a table lock.  When two transactions like this start at about the same time, they both start updating and taking locks, until one of them tries to escalate to a table lock.  At that moment the escalating transaction waits on the other.  But if the other is also destined to escalate to a table lock on the same table, there will be a deadlock.

Now, to cause a deadlock between 3 tasks, follow these instructions.
You might like to start SQL Profiler and enable a trace for Lock:Deadlock graph, Lock:Deadlock Chain and Lock:Deadlock, which will give you considerable information on the deadlock chain participants and resource locks.
Create 3 tables
    • create table t1 (c1 int, c2 varchar(50))
    • create table t2 (c1 int, c2 varchar(50))
    • create table t3 (c1 int, c2 varchar(50))
Create 3 query windows in SQL Server Management Studio.
In the first query window execute
begin tran
insert into t1 select 1, 'abc'

In the second query window execute
begin tran
insert into t2 select 2, 'xyz'
Select * from t1


The second query window will be waiting on the first query.
In the third query window execute
begin tran
insert into t3 select 3, 'mno'
Select * from t2


The third query window will be waiting on the second query.  At this point in time there is no deadlock.  We just have a locking chain, with query 1 at the head.  You can see the locking chain if you execute:
sp_who2
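If you prefer the dynamic management views to sp_who2, the following sketch gives much the same picture (it assumes SQL Server 2005 or later):

select session_id, blocking_session_id, wait_type, wait_time, wait_resource
from sys.dm_exec_requests
where blocking_session_id <> 0   -- only sessions that are currently blocked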

In the first query window execute
--begin tran
--insert into t1 select 1, 'abc'
Select * from t3

Now, query 1 will be waiting on query 3, which is waiting on query 2, which is waiting on query 1.  We have a deadlock.  You should notice that within a couple of seconds, one of the queries is cancelled and the transaction rolled back.

Msg 1205, Level 13, State 45, Line 4
Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.


Now, have a look at the Profiler output, and you should see your deadlock chart with the three processes and the locked resources.  The transaction that was selected as the victim, and rolled back, has a cross through it.  If you hover over the transactions, you will see the last batch to be submitted.  Note, the last batch is not necessarily the entire transaction.  There may have been an explicit BEGIN TRAN and many statements within the transaction that aren't included in the chart.  If you manage to capture a blocking transaction in action, you can see the locks it has (and the ones it's waiting on) with the sp_lock command.
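On SQL Server 2005 and later you can get the same information from the sys.dm_tran_locks DMV instead of sp_lock.  A minimal sketch, run from a spare query window while the blocking is in progress:

select request_session_id, resource_type, resource_associated_entity_id, request_mode, request_status
from sys.dm_tran_locks
order by request_session_id   -- GRANTED rows are locks held, WAIT rows are locks being waited on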

Remember to COMMIT or ROLLBACK the two queries that weren't rolled back.  There is nothing worse than a query holding locks while the user has gone off to do something else.