How to use a recurring Integration Endpoint for importing data
Database Inserts - Code Data Entity Comparison
Reading this article is a good place to start as we continue testing, but with data entities for the same table. We review performance from a Microsoft-hosted sandbox environment as well as a (new) PPAC-hosted Unified Development Environment, or UDE, which used to be known as a UNO environment (Unified Operations). In the charts below, the "DEV" label is a standard dev VM deployed in Azure with the defaults taken from LCS (most importantly, just 3 disks). The "TST" label is a standard Microsoft-hosted sandbox, the UAT environment available as part of the standard license offering. Lastly, the "UNO" label is the new PPAC-based development environment. This time, however, I got some very unexpected results.
Single Insert
The explanation for this test can be found here. Similar to this post on the table behind this entity, here we have the same sets of workloads on 3 different deployment topologies. Each topology was given the same workload using the code pattern linked earlier, with workload sizes of 1, 10, 250, and 1000. The UNO group, now called a Unified Development Environment, performs the worst across all test sizes, the tier 2 sandbox sits in the middle, and the best-performing topology is an Azure VM / CHE environment deployed from LCS. Across all 3 topologies, though, the single insert pattern gets its best throughput at an insert size of around 250 records. In general, however, this pattern performs quite poorly, as we'll see later on.
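For reference, the single-insert pattern in X++ looks roughly like the sketch below. This is a hedged sketch, not the actual test code: the field names on SalesOrderPoolEntity and the literal workload size are illustrative assumptions.

```xpp
// Single-insert pattern: one transaction per record.
// SalesPoolId / Name field names are illustrative assumptions.
SalesOrderPoolEntity poolEntity;
int i;

for (i = 1; i <= 250; i++)
{
    ttsbegin;
    poolEntity.clear();
    poolEntity.SalesPoolId = strFmt('POOL%1', i);
    poolEntity.Name        = strFmt('Test pool %1', i);
    poolEntity.insert();
    ttscommit;
}
```

Each iteration pays the full cost of opening and committing a transaction, which is why this pattern scales so poorly.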
But Is A Data Entity Faster Than The Table It Represents?
No. In the graph below we have the same set of tests as above, but on the left-hand side are 3 graphs for the data entity over a code table, and on the right the same tests run against the table directly rather than the data entity. With the exception of one data point, across all topologies, workloads, and workload sizes, tables perform better than data entities. This makes sense when you consider that a data entity is just another layer of abstraction, and therefore work, on top of the table.
Multi Insert
The explanation for this test can be found here. We have the same workloads and topologies, but this time we open a single transaction, insert all records, then commit the transaction. Larger numbers are better. This results in much better performance. Again the Azure VM / CHE performs best, with the tier 2 sandbox a distant second and the UNO/UDE environment in third place across all workload sizes. Again, the "sweet spot" for throughput is a workload size of around 250. I assume this is because workload sizes under 1k don't have to wait for SQL to log a set of changes to disk (or something similar) the way workloads at or above 1k records do. I think the takeaway here is that if you find the UNO/UDE result acceptable, similar results in Production should also be acceptable.
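The multi-insert variant moves the transaction outside the loop, so the whole batch commits once. Again a hedged sketch with assumed field names, not the actual test code:

```xpp
// Multi-insert pattern: one transaction wrapping all inserts.
// Field names are illustrative assumptions.
SalesOrderPoolEntity poolEntity;
int i;

ttsbegin;
for (i = 1; i <= 250; i++)
{
    poolEntity.clear();
    poolEntity.SalesPoolId = strFmt('POOL%1', i);
    poolEntity.Name        = strFmt('Test pool %1', i);
    poolEntity.insert();
}
ttscommit;
```

The per-record round trips remain, but the per-record transaction overhead is gone, which accounts for most of the improvement.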
But Is A Data Entity Faster Than The Table It Represents?
Again, nope. Same graph presentation as above. Tables are faster than data entities for this workload, and the Azure VM outperforms all others for both the table and the data entity insert workloads.
Insert Record Set
The explanation for this test can be found here. The result of this test was very difficult to explain. When using a small number of inserts per transaction (small being fewer than 10 or so), performance across all topologies is rather unimpressive. However, at a workload size of 250 and above, we see a massive improvement in performance that I can't specifically explain. The data entity, SalesOrderPoolEntity, isn't specifically configured for set-based processing, so I'm not sure what explains the fairly impressive performance numbers we're seeing here.
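The insert_recordset pattern issues a single set-based SQL statement that copies rows from a source into the target. A hedged sketch of the shape of such a statement; the staging table and field names here are hypothetical placeholders, not the test's actual source:

```xpp
// Set-based insert: one SQL statement copies all matching rows from
// a source table into the entity. MyPoolStaging and the field names
// are hypothetical placeholders.
SalesOrderPoolEntity poolEntity;
MyPoolStaging        staging;

ttsbegin;
insert_recordset poolEntity (SalesPoolId, Name)
    select SalesPoolId, Name from staging;
ttscommit;
```

Because the rows never travel to the AOS tier and back, throughput is largely independent of the per-record costs that dominate the earlier patterns.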
But Is A Data Entity Faster Than The Table It Represents?
Yes. I'm not sure how exactly, but yes. Same presentation as above with the left half being for the data entity and the right half for the table.
Record Insert List
The explanation for this test can be found here. However, I couldn't get this test to run at all: I got a runtime error stating that data cannot be inserted into a read-only view. I accepted this and moved on. It's possible the RecordInsertList construct is aware of how an entity differs from a table and throws this error to prevent problems, potentially in a dual-write scenario.
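For context, the pattern that failed looks roughly like this (a sketch with assumed field names; exactly where the error surfaces is my inference, since the buffered records are only flushed to SQL at the end):

```xpp
// RecordInsertList pattern: records are buffered in memory and
// flushed in bulk with insertDatabase(). Field names are assumptions.
SalesOrderPoolEntity poolEntity;
RecordInsertList insertList = new RecordInsertList(tableNum(SalesOrderPoolEntity));
int i;

for (i = 1; i <= 250; i++)
{
    poolEntity.clear();
    poolEntity.SalesPoolId = strFmt('POOL%1', i);
    insertList.add(poolEntity);
}
insertList.insertDatabase(); // presumably where the read-only view error is thrown
```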
Record Sorted Insert List
The explanation for this test can be found here. Again, I got the same result as with the Record Insert List test: a runtime error stating data cannot be inserted into a read-only view.
SysDA Insert Object
The explanation for this test can be found here. Same methodology as all the other tests, but again we see some interesting numbers showing that SysDaInsertObject performs even better than an Insert_RecordSet. The SysDa API classes generate SQL much "closer" to the table than the other options we have, so maybe we're getting some kind of set-based boost that the other insert options don't. It's difficult to say specifically. However, the available throughput makes SysDa look like a no-brainer for workloads that demand high performance and don't need to account for outside consumers. I suspect it doesn't fully support scenarios where dual-write is enabled.
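A SysDa-based insert pairs an insert object over the target with a query object over a source. The sketch below is from memory of the SysDa classes, so treat the method signatures as approximate and verify them against the SysDa API documentation; the staging table and field names are hypothetical:

```xpp
// SysDa set-based insert: select from a source and insert into the
// entity in one generated SQL statement. MyPoolStaging and all field
// names are hypothetical; method signatures should be verified
// against the SysDa API documentation.
SalesOrderPoolEntity poolEntity;
MyPoolStaging        staging;

SysDaQueryObject sourceQuery = new SysDaQueryObject(staging);
sourceQuery.projection().add(fieldStr(MyPoolStaging, SalesPoolId));
sourceQuery.projection().add(fieldStr(MyPoolStaging, Name));

SysDaInsertObject insertObject = new SysDaInsertObject(poolEntity);
insertObject.fields().add(fieldStr(SalesOrderPoolEntity, SalesPoolId));
insertObject.fields().add(fieldStr(SalesOrderPoolEntity, Name));

new SysDaInsertStatement().insert(insertObject, sourceQuery);
```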
But Is A Data Entity Faster Than The Table It Represents?
Yes. Again, I'm not sure how exactly, but yes. Same presentation as above, with the left half for the data entity and the right half for the table.
Query Insert_Recordset
The explanation for this test can be found here. Same set of tests and topologies as before, but again another set of potentially anomalous numbers. This performs better than the Insert_RecordSet test but not as well as the SysDaInsertObject test.
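If I've identified the test correctly, this is the static Query::insert_recordset API, which pairs a target buffer with a Query over the source and a map from target fields to source fields. A hedged sketch from memory; the exact shape of the field map, and the staging table and field names, are assumptions to verify against the documentation:

```xpp
// Query::insert_recordset: set-based insert driven by a Query object.
// MyPoolStaging, the field names, and the exact map shape are
// assumptions; check the Query::insert_recordset documentation.
SalesOrderPoolEntity poolEntity;
Query                query = new Query();
QueryBuildDataSource qbds  = query.addDataSource(tableNum(MyPoolStaging));

Map targetToSourceMap = new Map(Types::Integer, Types::Container);
targetToSourceMap.insert(
    fieldNum(SalesOrderPoolEntity, SalesPoolId),
    [qbds.uniqueId(), fieldNum(MyPoolStaging, SalesPoolId)]);

Query::insert_recordset(poolEntity, targetToSourceMap, query);
```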
But Is A Data Entity Faster Than The Table It Represents?
Yes. How? Not sure, but yes. Same presentation as above with the left half being for the data entity and the right half for the table.
General Recommendations
I think the general recommendations are clear. If you're using data entities in X++ and your workload is small, use the multi-insert pattern (open transaction, loop inserting records, close transaction). If you're not sure of the number of records you'll be inserting, use whichever of the set-based operators above works best for you; more than likely, SysDa will work well for what you're trying to do. Below is a large chart with all of the workloads and topologies, and you can see which workload type is the clear winner in each case most of the time. As no surprise, the single insert pattern (begin loop, open transaction, insert record, close transaction, repeat until the loop end condition) always performs terribly. I have noticed that quite a few places where a Query object was used historically have been replaced with a SysDa-based alternative (a projection), so the performance improvements around SysDa are being used by Microsoft as well. All source related to how this data was generated can be found on GitHub.