Boosting Customer Hosted Environment (CHE) Performance
Let's examine how to get the most out of Azure spend for Tier 1 machines.
A while ago Denis Trunin wrote an article on Visual Studio performance for F&O. I thought it was a well-written and well-researched article. However, I wanted to see what the limit was on performance per unit of money spent. I also wanted to see what the limits were in general. If twice as much is twice as good, then 4 times as much must mean 4 times better, right? We'll be using the DOOM Super Shotgun methodology: 2 is better than 1, and 4 is better than 2. I'll also be likening each part of the article to a map from classic DOOM (1993). For those of you DOOM fans out there, this should be fun. For the rest of you, let's take a journey. There is also a summary at the end, if you'd just like to skip to that. As a callout, I want to state that I only focused on development experiences using specific metrics related to development. This may not translate to any other experience inside a VM, and certainly not to any Microsoft Hosted Environment (MHE), as those are created against a service fabric, not inside of a virtual machine.
The Setup (Hanger)
We'll be testing one metric that is very easy to measure in a systematic way over HTTP: the amount of time it takes for the VM to process the first request for the home dashboard page of F&O in an environment with demo data. The VM size, disk configuration, and disk type will be changing, so we'll do our best to tease out just the storage component's performance and keep all other items, such as RAM and CPU, as close to a DS13 v2 as possible. There will be some variance based on size, but we should be able to collect enough data to have a clear winner. For this series of tests, we'll ignore cost just to get raw performance numbers. Later in the article we'll add the cost factor back in with a series of recommendations.
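As a rough illustration, the measurement itself can be as simple as timing one HTTP GET end to end. This is a minimal Python sketch of the idea, not the exact harness used for the tests; the URL is a placeholder for your own environment's dashboard, and a real F&O environment would additionally require authentication:

```python
import time
import urllib.request

def time_first_request(url: str, timeout: float = 600.0) -> float:
    """Time a single request end to end. Run against a freshly booted VM,
    this captures the 'cold' page load, including all server-side warm-up."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the full response is counted
    return time.perf_counter() - start

# Hypothetical usage -- substitute your environment's dashboard URL:
# seconds = time_first_request("https://myenv.cloudax.dynamics.com/?mi=DefaultDashboard")
```

Running the same call a second time against the same VM gives the "warm" number discussed later in the article.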
The Tests (Phobos Lab)
VMs Deployed With Defaults (Computer Station)
When reviewing the performance of a VM deployed with the default number of disks from LCS (3), you can see that raising that value to any higher number will improve overall cold page load times. On the left is the default number of disks, 3. In the middle we have 16 and on the right we have 30. As a callout, when creating a VM in LCS, the value entered for number of disks will have 2 additional disks added for temp storage and the OS.
However, the same isn't exactly true for a cold database sync. With more disks, the operation becomes slightly slower. The chart below has a range of approximately 2 seconds, so while we have a clear winner, it is a winner by 1 second out of 359 seconds. I'd call this test inconclusive given the duration of the test. On the left is the default value of 3 disks. In the middle is 16 and on the right is 30 disks.
Next, if we add more disks, we can see a minor performance degradation when doing a cold compile of the Application Suite module. While 3 disks is the clear winner, given the duration of the test I'd still count this test as inconclusive. However, as we'll see later, performance improvements and degradations are situational.
Testing Various VM Types from Azure (Deimos Lab)
Next, what does the VM series do for our tests? Below is a chart of 5 different VM sizes, all deployed with defaults from LCS - only the VM size (and series) changed. With all things held equal except size, you can see we have 2 standouts in the pack when looking at cold page load times. From left to right: the B16ms takes 4th place, the D8as v4 takes third, the D8ds v5 takes first, the DS13 v2 takes last place, and the L8as v3 takes a very close second place, performing 0.26 seconds worse than the first-place D8ds v5.
But what about a cold database sync? The B series VM takes last place, followed by the DS13 v2, then the D8as v4, with the D8ds v5 and L8as v3 taking the top spots by a large margin - the D8ds v5 in first.
Next, let's review a cold compile (smaller is better). This is where things got interesting. In last place was the D8as v4 by a significant amount. In fourth place was the D8ds v5, followed by the L8as v3 in third, the DS13 v2 in second and, much to my surprise, the B16ms in first place. The B series is a little different from the other VM sizes, so we'll discuss later why this outcome could have been possible.
We've established a fairly comprehensive baseline using 3 disks of the "HDD" type. So let's see what kind of performance improvement we get from upgrading from "slow HDDs" to faster SSDs. We'll be comparing against Premium SSDs from Azure so we can see whether SSDs make a big impact for different workloads. From left to right in the chart below, we have pairs of results for the cold page load time test, with HDDs on the left side of each pair and SSDs on the right. For the left two entries, we can see that upgrading to Premium SSDs actually made performance worse. Again, this is the B series showing that it's different from the rest of our VM sizes. The next two show that for a D8as v4 VM, SSDs give a minor improvement. The next two entries show that for a D8ds v5, there is no real difference between an HDD and an SSD. For the next two entries, we can see that Premium SSDs improve the test results for the DS13 v2, but even with premium hardware it still performs worse than all other VM sizes. The last two entries show that for the L8as v3 VM size, SSDs degrade performance ever so slightly.
What happens when we add more disks? Plainly, more is better, right? In the chart below, we ran the same test but with 16 disks rather than 3. As a callout, some VM sizes didn't support having 16 disks, so those series were removed from the test: the D8as v4 and D8ds v5. Just like the last chart, we have results in pairs comparing HDDs to SSDs for the same VM size. The left two entries show that for a B16ms with 16 drives, SSD versus HDD makes nearly no difference. Next, for the DS13 v2, we get similar results, with SSDs being worse by a tiny margin. The last two results show that with 16 disks, the L16as v3 outperforms the other two VM sizes with both drive types, and its HDDs outperform its SSDs.
Next, we have the same test but with 30 disks this time rather than 16. We see nearly identical outcomes here.
Finally, what performs the best by VM size and disk count? The winner is... really, the winners are: in first place, the D8ds v5 with 3 disks and either HDDs or SSDs; in a very close second, the L series VMs in any combination of disk count and disk type. While the L series didn't take the top spot here, we're starting to see a trend of this series being a consistent, very-close-to-first contender.
Why are VMs slow when I first use them? (Spawning Vats)
Throughout this article, I've been referring to tests with the qualifier "cold". This means the test was run against a VM that hadn't run any processes, preloaded any assets, or processed any requests, and batch was disabled. We are measuring the true amount of time for a process to occur under its longest possible runtime scenario. F&O / IIS plus SQL Server, and even batch processing, can help cache some data and resources on the VM's temporary storage, so when the OS asks for something, it's cached on the VM rather than in storage that is "far away". At a very high level, VM infrastructure is spread out in the cloud inside a data center. Unlike your laptop or desktop, where everything is very near everything else, in the cloud that is not the case. Electrons travel very fast, but because of the physical requirements around how data centers are built, there is a very tiny amount of lag time. This lag comes from my VM running on one compute cluster and, when it needs data, having to reach out to a storage cluster that is 400 feet away. That 400 feet maybe accounts for 1/1,000,000,000 of a second, but that is per storage request, so you can start to see that lag with workloads that have lots of small files and lots of tiny reads. F&O falls into this type of workload because we have thousands of relatively tiny XML files that have to be loaded. Another great way to test performance is when the VM is "warm". A warm VM has processed a request once and has already "paid" the spin-up costs related to processing it. Once warm, most everything IIS / F&O plus SQL needs is already cached, so it can just pull it from a cache next to the operating system, not from a storage cluster some distance away.
While cold performance can show us the raw power involved in gathering everything required to process a request, warm performance shows us how rapidly the request can be processed when fetching resources from storage isn't a major factor in the equation.
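The cold-versus-warm effect is easy to demonstrate in miniature. This toy Python sketch (my own analogy, not F&O's actual caching code) simulates a "far away" storage fetch with a small artificial delay and caches the result, mirroring how IIS / F&O and SQL Server end up serving metadata from a local cache after the first request:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def load_metadata(element: str) -> str:
    """Simulate pulling one of F&O's many small XML files from remote
    storage; the sleep stands in for storage-cluster round-trip latency."""
    time.sleep(0.05)
    return f'<AxTable Name="{element}" />'

start = time.perf_counter()
load_metadata("CustTable")        # cold: pays the full fetch cost
cold = time.perf_counter() - start

start = time.perf_counter()
load_metadata("CustTable")        # warm: served from the local cache
warm = time.perf_counter() - start

print(f"cold: {cold:.3f}s, warm: {warm:.3f}s")
```

Multiply that per-fetch difference by thousands of tiny XML files and you get the gap between the cold and warm numbers in the charts that follow.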
Warm vs Cold Performance (Tower of Babel)
While cold performance is a good performance metric, we are interested in warm performance, as that is likely what we'll be experiencing most of the time. In the chart below, we are again looking at page load times, but comparing "warm" to "cold". As a general rule, if cold performance numbers are high, so are warm performance numbers, relatively speaking. Below, we can see that the 3 best performers are the L series, the B series, and the D8 v4 series. Lower is better for all charts in this section, and the warm metrics are orange.
Next, we're going to compare cold vs warm for a database sync. As we saw earlier, the D8 v5 and L series VMs are the top performers. The B series VM is our worst performer, but that could be due to the unique way in which B series VMs work.
Next, we're looking at compiling Application Suite cold vs warm. In this test, we get some results that don't align with the others. The fastest VM is the B series, with the DS13 v2 in second and our historical top performers taking the last 3 spots.
Next, we're looking at warm vs cold while also comparing HDD to SSD. No real surprises here: our SSDs perform better, albeit by a tiny bit, than HDDs for warm workloads most of the time.
Next, we're looking at cold vs warm page load times, but seeing how having 16 disks affects overall performance. As a general rule, SSDs perform a tiny bit better than HDDs for a warm page load when the VM has 16 disks.
Same test as the last graph, but we're looking at the differences with 30 disks, comparing warm HDD to warm SSD. In this configuration, there is nearly no difference between the two for this test.
Next, this is the graph with all results in one view so we can compare and contrast. The key takeaways from this graph are that the L series and D8 v5 series VMs perform well regardless of the disk configuration, with the B series getting an honorable mention.
Further Testing of VS With Different VM Types (Slough of Despair)
After all of that testing, I wanted to attempt to draw out some meaningful performance numbers for Visual Studio. While a database sync or processing a page request is a good metric to show performance, those numbers mostly don't matter for the developers out there. They want Visual Studio to be responsive and not freeze for 107 seconds because the F&O tools are busy doing something. For this series of tests, I installed dotTrace so I could collect some telemetry on VS itself.
To collect the telemetry for this test, I started dotTrace, began a trace then performed the following steps:
- Open designer view of first table
- View code of first table
Looking at the graph, we have somewhat mixed results. The top warm performer is a 3-drive, HDD-based L8s v3. From there, we have a somewhat mixed bag of results, as the performance improvement percentage for some configurations is much larger than for others. Also, some of the telemetry seems to contradict my personal experience. The DS13 v2 taking second place in the warm test results with 3 SSDs doesn't align with my experience; DS13 v2 VMs seem slow to me.
Optimizing For The Wrong Thing (House of Pain)
One thing that can be lost along the way is what kind of performance we want. Each VM series in Azure is going to have different strengths and weaknesses for what we want the machine to do. Unlike the VHD experience, we're going to have to pick which types of experiences are the most important for a given cloud-based VM. Let's look at the chart below. These are all "cold" tests for VMs with 3 disks, with all axes in seconds. The overall best performer is the grouping that uses the least total number of seconds - that's more than likely the B series entries. This again could be down to how the B series is unique compared to the other VM series, or it could just be better for cold workloads. In these tests, Premium SSD vs HDD doesn't appear to offer much difference.
However, if we look at the same set of tests against a "warm" VM, we get to see some differences. First, most everything scored lower (fewer seconds), which is expected. However, some areas run much faster than others. So this is where we start to get to a decision point. Are low page load times what I'm most interested in, or the lowest compile time? Maybe the lowest database sync time? If we're using a VM as a Golden Configuration (or "Gold") environment, page load times are probably the primary metric we care about. If we're using the VM as a development machine, compile and DB sync times become important. Based on what you want the best experience for, pick a VM and config that matches that outcome. You can see that different configurations can result in much better DB sync performance compared to Application Suite compile time.
Next, let's review the same tests for cold workloads but with 16 disks. We see similar results, but some of the much larger values related to compiling are now much lower - approximately 120 seconds. However, values in other areas are a little bit higher. This shows that for cold workloads, having more disks improves compiling but slows down other operations.
Next, we'll look at the "warm" test results when using 16 disks. Similar results as the last chart, with some of the extreme values being leveled out but some tests getting a little slower. However, we do have one standout: the L series VM shows improvements in all areas as well as being the best performer of the group.
Again, a similar set of cold tests, but with VMs that have 30 disks. Depending on the workload, we see improvements in some tests and less overall performance in others as compared to 16 disks. This supports the phrase that "more isn't better", unfortunately.
Similar to the last round of testing, but with warm VMs that have 30 disks: we see mostly minor improvements across the board, except for the DS13 v2 with SSDs.
Does Host Caching Matter? (Sever The Wicked)
Results are mixed. Depending on the workload, you may see some very modest improvements or some not-so-modest degradations. In general, I wouldn't enable it per disk. We can squeeze some extra performance out of our disks with this option, but it is not an area of major improvement. In the chart, tests run with no disk caching are displayed on the left half, while the same set of tests with disk caching enabled is displayed on the right. The performance improvement depends on the workload, but overall I would say that host write caching is a net negative, albeit a small one.
General Recommendations (Against Thee Wickedly)
So what are the major takeaways from this? I have a few that fall into different categories.
Do These Always
You should schedule your VM availability if it is not required to be online 24 hours a day. There are several ways to do this, but it can help with overall service health and also ensure that a VM is operational when you expect it to be. Because VMs use IIS Express, it's possible for IIS Express to crash and not re-spawn the process, leaving the machine to appear offline. A daily reboot schedule will ensure that IIS Express has had a chance to start at the beginning of the day. Additionally, as a general rule, don't take the defaults in LCS and/or Azure. Abel Wang was right: don't accept the defaults. Once a VM has been provisioned, change its size to the latest generation of hardware if that size isn't available in LCS. For instance, the L series and B series aren't available options in LCS (at the time of writing), but newer-generation E series and D series are. If the size you are selecting in LCS doesn't end with a "v4" or "v5", chances are high you'll need to change the size in the Azure portal after the VM has been provisioned. Always get the latest generation for VMs unless you are on a very strict budget. Lastly, based on all the charts, diagrams, and various other visualizations, I would say that premium disks such as SSDs aren't specifically required unless you have very specific needs, such as a high-volume data migration VM. Perhaps we can revisit data migration performance some day. This is called foreshadowing.
When Creating a Developer Machine
In LCS, when you are asked how many disks you would like the VM to have, answer 14. LCS will create a VM with 14 + 2 disks. When asked for disk size, enter a value of 128. This will create 16 disks at 128 GB (the OS disk varies a small amount). After the VM has been created, in the Azure portal, change the VM size to any of the available options in the B series if you are on a budget. If you are not on a budget, use anything in the L series. If the L series is too expensive, use any series you like ending in v4 or v5.
When Creating a Demo, Data Migration, or any other type
In LCS, when you are asked how many disks you would like the VM to have, answer 30 and give each disk a size of 64. This will create 32 disks at 64 GB (the OS disk varies a small amount). Next, after the VM is created, change its size in the Azure portal to the lowest-cost L series VM available. This will give you the most performance for a warm environment. Also, the L series is somewhat expensive, so be sure to schedule the availability of this machine.
But what about the B Series?
The Azure B series VMs are unique in that they are burstable in a way that the other VM series are not. I won't opine here about the B series other than to say that for a pure development workload - extended sessions of typing punctuated by periodic spikes in CPU and disk activity from compiles and database syncs - the B series could be a great option if you're OK with not getting the best performance *all* the time. The other VMs we looked at in our tests were standard: you pay for X, you get 100% of X. The B series doesn't follow that model, so you could have "banked performance tokens" that you can cash in should you need them. To truly test B series VMs against all the others, I would have to control test execution against the B series, as well as the measured time between tests, in order to get an accurate set of readings. I didn't do that, and I think that accounts for the B series results being sort of all over the place across the battery of tests. It may have used some "performance tokens" or it may not have had any to spend; I can't say. However, the B series is around 150 USD cheaper than a DS13 v2 with similar specs. The B series is also a great option if you want to leave your VMs on, unattended. This isn't recommended, but not all projects have a need for VM automation.
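The "banked performance tokens" idea can be sketched in a few lines. This is a deliberately simplified model of burstable-credit banking, not Azure's exact accounting (the real baseline, accrual rate, and cap vary by B series size): credits accrue while usage sits below the baseline and are spent when the VM bursts above it.

```python
def simulate_credit_bank(baseline: float, usage: list,
                         bank: float = 0.0, cap: float = 600.0) -> float:
    """Simplified sketch of burstable CPU credit banking. 'usage' is CPU
    utilization (0.0-1.0) per interval; the returned value is the credit
    balance left after the workload runs."""
    for used in usage:
        bank += baseline - used           # accrue (positive) or spend (negative)
        bank = max(0.0, min(bank, cap))   # balance can't go negative or exceed cap
    return bank

# Long stretches of light typing bank credits...
bank = simulate_credit_bank(baseline=0.4, usage=[0.1] * 60)
# ...which a compile or database sync then spends by bursting to 100% CPU:
bank = simulate_credit_bank(baseline=0.4, usage=[1.0] * 20, bank=bank)
```

This is also why my results could be inconsistent: two identical test runs can start with very different credit balances, and a run that drains the bank mid-test falls back to baseline performance.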
Failed Ideas To Improve Performance (Unto the Cruel)
Every VM has a warm-up script that is executed on boot. You can find it in the scheduled tasks on the VM. The script itself is WarmUpDiskCache.ps1, located in E:\AppRing3\10.0.886.48\retail\Services\DevToolsService\Scripts\ (or a similar folder path). You can modify this script to include any number of tasks that would help warm your VM. The things I tried are:
- Run Full Compile
- Run Database Sync
- Copy all files from K:\AosService\PackagesLocalDirectory to D:\Temp
- Use PowerShell to connect to IIS Express to process the first page load
Results were mixed. This assumes that the VM is scheduled to boot at a given time and that the schedule builds in time to process the warm-up requests. I was able to reduce some cold process times, but I didn't feel the work was worth it. Any changes to the WarmUpDiskCache script will be removed by any install of a Microsoft software deployable package, such as a quality update and/or a new release. However, Andre Arnaud de Calavon has some ideas that could assist for a demo.
Next Steps (Hell On Earth)
When looking at costs, you can see below a standard chart of costs for the VMs and specs reviewed in this article.
You can see the B series is the lowest cost, regardless of specifications, when run in US East 2 for 730 hours. Using the costs above, we can start to see some very clear winners and losers in terms of performance per dollar spent. Creating an easy-to-understand graph with all the data collected plus costing was difficult, so after a few attempts, I gave up. What the cost chart shows is that the number of disks only marginally increases costs in most circumstances. What all the other charts show is that, in general, more disks are better than faster disks, newer hardware is faster than older hardware, and, should cost not be a major factor, use the L series VMs. Lastly, you can mix and match different settings to get different effects, so feel free to try different things and find settings that work for your workload and your budget.
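If you want to do the performance-per-dollar math yourself, a tiny Python sketch is enough. The figures below are placeholders, NOT the article's measured values - substitute your own readings from the charts and the Azure pricing calculator for your region:

```python
# Hypothetical monthly costs (USD, 730 hours) and warm page-load times (s).
# Placeholder numbers for illustration only -- plug in your own measurements.
vms = {
    "B16ms":   {"cost": 600.0, "warm_load": 1.9},
    "D8ds v5": {"cost": 500.0, "warm_load": 1.5},
    "DS13 v2": {"cost": 750.0, "warm_load": 2.4},
    "L8as v3": {"cost": 700.0, "warm_load": 1.5},
}

def cost_time_product(name: str) -> float:
    """Lower is better: dollars multiplied by seconds, so a VM is penalized
    for being either expensive or slow."""
    vm = vms[name]
    return vm["cost"] * vm["warm_load"]

ranked = sorted(vms, key=cost_time_product)
print(ranked)  # best performance-per-dollar first
```

Swapping `warm_load` for compile or DB sync times lets you rank the same VMs against whichever workload you decided matters most in the "Optimizing For The Wrong Thing" section.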