Back when SQL Server 2012 was still in beta, I blogged about the new FORMAT()
function: SQL Server v.Next (Denali) : CTP3 T-SQL Enhancements : FORMAT().
At that time, I was so excited about the new functionality, that I didn't even think to do any performance testing. I did address this in a more recent blog post, but solely in the context of stripping time from a datetime: Trimming time from datetime – a follow-up.
Last week, my good friend Jason Horner (blog | @jasonhorner) trolled me with these tweets:
Hey @AaronBertrand found this today: SELECT DISTINCT CAST(CAST(GETDATE() AS DATE) AS VARCHAR) AS EmailDate FROM dbo.ReallybigTable
— Jason Horner (@jasonhorner) June 25, 2015
The refactoring for all you home gamers: SELECT format(GETDATE(),'yyyy-MM-dd')
— Jason Horner (@jasonhorner) June 25, 2015
My issue with this is just that FORMAT()
looks convenient, but it is extremely inefficient compared to other approaches (oh and that AS VARCHAR
thing is bad too). If you're doing this onesy-twosy and for small resultsets, I wouldn't worry too much about it; but at scale, it can get pretty expensive. Let me illustrate with an example. First, let's create a small table with 1000 pseudo-random dates:
SELECT TOP (1000) d = DATEADD(DAY, CHECKSUM(NEWID())%1000, o.create_date) INTO dbo.dtTest FROM sys.all_objects AS o ORDER BY NEWID(); GO CREATE CLUSTERED INDEX d ON dbo.dtTest(d);
Now, let's prime the cache with the data from this table, and illustrate three of the common ways people tend to present just the time:
SELECT d, CONVERT(DATE, d), CONVERT(CHAR(10), d, 120), FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest;
Now, let's perform individual queries that use these different techniques. We'll run them each 5 times and we'll run the following variations:
- Selecting all 1,000 rows
- Selecting TOP (1) ordered by the clustered index key
- Assigning to a variable (which forces a full scan, but prevents SSMS rendering from interfering with performance)
Here is the script:
-- select all 1,000 rows GO SELECT d FROM dbo.dtTest; GO 5 SELECT d = CONVERT(DATE, d) FROM dbo.dtTest; GO 5 SELECT d = CONVERT(CHAR(10), d, 120) FROM dbo.dtTest; GO 5 SELECT d = FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest; GO 5 -- select top 1 GO SELECT TOP (1) d FROM dbo.dtTest ORDER BY d; GO 5 SELECT TOP (1) CONVERT(DATE, d) FROM dbo.dtTest ORDER BY d; GO 5 SELECT TOP (1) CONVERT(CHAR(10), d, 120) FROM dbo.dtTest ORDER BY d; GO 5 SELECT TOP (1) FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest ORDER BY d; GO 5 -- force scan but leave SSMS mostly out of it GO DECLARE @d DATE; SELECT @d = d FROM dbo.dtTest; GO 5 DECLARE @d DATE; SELECT @d = CONVERT(DATE, d) FROM dbo.dtTest; GO 5 DECLARE @d CHAR(10); SELECT @d = CONVERT(CHAR(10), d, 120) FROM dbo.dtTest; GO 5 DECLARE @d CHAR(10); SELECT @d = FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest; GO 5
Now, we can measure the performance with the following query (my system is pretty quiet; on yours, you may need to perform more advanced filtering than just execution_count
):
SELECT [t] = CONVERT(CHAR(255), t.[text]), s.total_elapsed_time, avg_elapsed_time = CONVERT(DECIMAL(12,2),s.total_elapsed_time / 5.0), s.total_worker_time, avg_worker_time = CONVERT(DECIMAL(12,2),s.total_worker_time / 5.0), s.total_clr_time FROM sys.dm_exec_query_stats AS s CROSS APPLY sys.dm_exec_sql_text(s.[sql_handle]) AS t WHERE s.execution_count = 5 AND t.[text] LIKE N'%dbo.dtTest%' ORDER BY s.last_execution_time;
Results in my case were fairly consistent:
Query (truncated) | Duration (microseconds) | |||
---|---|---|---|---|
total_elapsed | avg_elapsed | total_clr | ||
SELECT 1,000 rows | SELECT d FROM dbo.dtTest ORDER BY d; |
1,170 |
234.00 |
0 |
SELECT d = CONVERT(DATE, d) FROM dbo.dtTest ORDER BY d; |
2,437 |
487.40 |
0 |
|
SELECT d = CONVERT(CHAR(10), d, 120) FROM dbo.dtTest ORD ... |
151,521 |
30,304.20 |
0 |
|
SELECT d = FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest ORDER ... |
240,152 |
48,030.40 |
107,258 |
|
SELECT TOP (1) | SELECT TOP (1) d FROM dbo.dtTest ORDER BY d; |
251 |
50.20 |
0 |
SELECT TOP (1) CONVERT(DATE, d) FROM dbo.dtTest ORDER BY ... |
440 |
88.00 |
0 |
|
SELECT TOP (1) CONVERT(CHAR(10), d, 120) FROM dbo.dtTest ... |
301 |
60.20 |
0 |
|
SELECT TOP (1) FORMAT(d, 'yyyy-MM-dd') FROM dbo.dtTest O ... |
1,094 |
218.80 |
589 |
|
Assign variable | DECLARE @d DATE; SELECT @d = d FROM dbo.dtTest; |
639 |
127.80 |
0 |
DECLARE @d DATE; SELECT @d = CONVERT(DATE, d) FROM dbo.d ... |
644 |
128.80 |
0 |
|
DECLARE @d CHAR(10); SELECT @d = CONVERT(CHAR(10), d, 12 ... |
1,972 |
394.40 |
0 |
|
DECLARE @d CHAR(10); SELECT @d = FORMAT(d, 'yyyy-MM-dd') ... |
118,062 |
23,612.40 |
98,556 |
And to visualize the avg_elapsed_time
output (click to enlarge):
FORMAT() is clearly the loser : avg_elapsed_time results (microseconds)
What we can learn from these results (again):
- First and foremost,
FORMAT()
is expensive. FORMAT()
can, admittedly, provide more flexibility and give more intuitive methods that are consistent with those in other languages like C#. However, in addition to its overhead, and whileCONVERT()
style numbers are cryptic and less exhaustive, you may have to use the older approach anyway, sinceFORMAT()
is only valid in SQL Server 2012 and newer.- Even the standby
CONVERT()
method can be drastically expensive (though only severely so in the case where SSMS had to render the results - it clearly handles strings differently than date values). - Just pulling the datetime value directly out of the database was always most efficient. You should profile what additional time it takes for your application to format the date as desired at the presentation tier - it's highly likely that you're not going to want SQL Server to get involved with prettying format at all (and in fact many would argue that this is where that logic always belongs).
We're only talking microseconds here, but we're also only talking 1,000 rows. Scale that out to your actual table sizes, and the impact of choosing the wrong formatting approach could be devastating.
If you want to try out this experiment on your own machine, I've uploaded a sample script: FormatIsNiceAndAllBut.sql_.zip
The post FORMAT() is nice and all, but… appeared first on SQLPerformance.com.