How to use a recurring Integration Endpoint for importing data
You are here
Does Specifying Columns in Select Statements Matter?
Lets find out if specifying columns in our SELECT statements has any performance impact.
The Setup
In order to test if specifying column names in a select matter, we'll have to select several records from the same table where we have not selected that specific record before our test. We want to measure true performance including SQL overhead plus binding to a common buffer. If caching is present, we'll not be capturing the "true" performance. Caching may help overall performance but we can specifically rely on it to say our code is "fine" from a performance perspective as some data may get cached and other data may not. Caching is a variable we want to exclude as much as possible. In the code below, I have 6 basic tests selecting data from table CustTable. First, we're selecting all columns, next just 1 column, next 10 columns, after that we're selecting all columns again with a firstOnly keyword, then 1 column with a firstOnly keyword, then finally 10 columns with a firstOnly keyword. For each test, we're capturing the time it takes to fetch and bind the data to a common buffer. If fetching more columns takes more time, this should expose those. All times below are in milliseconds.
The Code
internal final class DatabaseColumnSelectTester
{System.Diagnostics.Stopwatch stopWatch;
public static void main(Args _args)
{
CustTable CustTable;System.Diagnostics.Stopwatch stopWatch;
//WarmUp
select firstonly1 * from CustTable;//RecId chosen from USMF contoso data to get different records so caching isn't specifically observed
//IIS and SQL restarted as well right before running this test.
stopWatch = new System.Diagnostics.Stopwatch();stopWatch.Start();
select * from CustTable
where CustTable.RecId == 68719490889;stopWatch.Stop();
Info(strfmt("select * from CustTable in %1", stopWatch.ElapsedMilliseconds));
stopWatch = new System.Diagnostics.Stopwatch();
stopWatch.Start();select AccountNum from CustTable
where CustTable.RecId == 68719490890;stopWatch.Stop();
Info(strfmt("select 1 from CustTable in %1", stopWatch.ElapsedMilliseconds));
stopWatch = new System.Diagnostics.Stopwatch();
stopWatch.Start();select PaymTermId, LineDisc, PartyCountry, AccountNum, AccountStatement, BankAccount, Blocked, CashDisc, CashDisc, ClearingPeriod from CustTable
where CustTable.RecId == 68719490891;stopWatch.Stop();
Info(strfmt("select 10 from CustTable in %1", stopWatch.ElapsedMilliseconds));
stopWatch = new System.Diagnostics.Stopwatch();
stopWatch.Start();
select firstonly * from CustTable
where CustTable.RecId == 68719490892;stopWatch.Stop();
Info(strfmt("select firstonly * from CustTable in %1", stopWatch.ElapsedMilliseconds));
stopWatch = new System.Diagnostics.Stopwatch();
stopWatch.Start();select firstonly AccountNum from CustTable
where CustTable.RecId == 68719490893;stopWatch.Stop();
Info(strfmt("select firstonly 1 from CustTable in %1", stopWatch.ElapsedMilliseconds));
stopWatch = new System.Diagnostics.Stopwatch();
stopWatch.Start();select firstonly PaymTermId, LineDisc, PartyCountry, AccountNum, AccountStatement, BankAccount, Blocked, CashDisc, CashDisc, ClearingPeriod from CustTable
where CustTable.RecId == 68719490894;stopWatch.Stop();
Info(strfmt("select firstonly 10 from CustTable in %1", stopWatch.ElapsedMilliseconds));
}
}
Test Results
select firstonly 10 from CustTable in 2
select firstonly 1 from CustTable in 2
select firstonly * from CustTable in 9
select 10 from CustTable in 2
select 1 from CustTable in 3
select * from CustTable in 15
The results are clear that in our simple example selecting all columns from a table is slower than selecting just 1 ( or 10 columns ). The use of a firstOnly does help with performance also. Selecting all columns without a firstOnly keyword takes approximately 500% more time than selecting just 1 column. Selecting all columns with a firstOnly keyword takes approximately 450% more time than select just 1 column. We're working with a limited data set with sample data but we can see clear indications that whenever practical we should specify the columns we wish to select from a table and also use a firstOnly whenever we are selecting only 1 record. When creating a find() on table, we should also be using firstOnly. Keeping our usage of columns highly specific will help ensure all cusotmizations are performing at their highest and no new performance issues are introduced due to extensions. That is not to say we can specifically improve the base product - we can only make it not worse.
Caching
I thought it'd be interesting to see the results of running the same test twice without ensuring that caching wasn't altering our test results. Those results are below. You can see once caching comes into play, using a time measurement of milliseconds doesn't give us enough precision to measure the differences between performance for each test.
select firstonly 10 from CustTable in 0
select firstonly 1 from CustTable in 0
select firstonly * from CustTable in 0
select 10 from CustTable in 0
select 1 from CustTable in 0
select * from CustTable in 0