The results you get out of any performance prediction exercise are bound to be wrong. The goal is to make them the least wrong possible. Rob Bogue will help you understand how to avoid getting them too wrong.
One of my least favorite discussions in development is the discussion about performance. It's one of my least favorite because it requires a ton of knowledge about how systems work, and either a ton of guesswork or some very detailed work with load testing. I generally say that the results you get out of any performance prediction exercise are bound to be wrong. The goal is to make them the least wrong possible.
I'm going to try to lay out some general guidelines for performance improvement through improving understanding about what performance is, how to measure it, and finally solutions to common problems. This article will cover the core understanding of the performance conversation. The second and third articles will cover session management and caching because they have such a great impact on performance — and on what solutions you can use to improve performance. The final article in the series will focus specifically on ways to improve performance.
No series of articles (or book, for that matter) could cover every possible situation that you can get into with computer system performance. My background includes nearly 20 years of work with systems from a VAX running VMS to more current projects based on Microsoft products such as .NET, Microsoft SQL Server, and Microsoft SharePoint. The concepts in this article are applicable to any complex system; however, my examples use the Microsoft platform, including Windows, .NET, and SQL Server.
Performance and scalability are often discussed together because they're linked. As you add more users (scalability) you reduce performance. The conversation here is titled performance because the focus is on maintaining the performance of individual requests; scalability techniques are often discussed as ways to reduce the load on a server (the number of users) to a point where performance is acceptable.
Of course, you can make decisions that will be good for performance but potentially bad for scalability—particularly around caching—however, for the most part when you improve performance you improve scalability and vice versa.
The reason for this approach is that performance is measurable. Scalability isn't measurable—or at least isn't easy to measure. As I'll discuss in the load testing section, without historic data, scalability assessments turn into educated guesses.
Background
Before the specifics of performance can be talked about, a few of the infrastructure things that can impact performance (and reliability) have to be discussed. As I get into talking about session state management I'll have to talk about how reliability factors into some trade-offs between performance and resiliency. The first stop on the journey to understand how infrastructure impacts these discussions is network load balancing.
Network Load Balancing
Any solution of significant size will include a network load balancer. Whether that's the free Network Load Balancing (NLB) feature that is part of Windows or dedicated hardware such as F5's BIG-IP, multiple servers mean balancing the load between them. It should be said that clustering, which is a fault-tolerance strategy for ensuring availability, is often confused with load balancing. As a general statement, you load balance your application layers, including web servers and any set of back-end application servers, and you cluster the databases. Load balancing improves reliability and scalability; clustering, in the Microsoft definition, improves only reliability.
In general, network load balancers have two basic modes. The first mode is session independent. In other words, a decision is made at the time of a request based solely on the perceived load to the servers. The server with the least perceived load gets the request.
I say perceived because the load balancer doesn't really know which server is the least busy; it can only estimate from signals such as historic CPU time and responsiveness to requests. Most of the time the guess is probably right, but sometimes it's wrong. On the whole it works out, and generally in this mode the requests are serviced with a relatively even response time.
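The session-independent mode described above can be sketched in a few lines. This is only an illustration of the idea, not how NLB or any particular hardware balancer works: the `ServerStats` fields, the 50/50 weighting, and the 500 ms response budget are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Hypothetical snapshot of what a balancer might know about a server."""
    name: str
    cpu_fraction: float     # recent CPU utilization, 0.0 to 1.0
    avg_response_ms: float  # recent average response time


def perceived_load(stats: ServerStats, response_budget_ms: float = 500.0) -> float:
    """Blend CPU and responsiveness into one score; lower means less busy."""
    latency_fraction = min(stats.avg_response_ms / response_budget_ms, 1.0)
    # Equal weighting is an arbitrary choice for this sketch.
    return 0.5 * stats.cpu_fraction + 0.5 * latency_fraction


def pick_server(servers: list[ServerStats]) -> ServerStats:
    """Send the next request to the server with the lowest perceived load."""
    return min(servers, key=perceived_load)
```

Because the score is built from historic samples, it is a guess about the present, which is exactly why the text calls the load "perceived".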
However, the goal of performance planning isn't generally to service all requests equally. That's like saying we want to serve everyone equally badly. In other words, if in order to serve people equally you have to serve them all poorly, then perhaps serving some folks better than others is acceptable. (You may remember “Harrison Bergeron”, Kurt Vonnegut’s story about a world where everyone was equal.) Generally speaking, the goal of any performance exercise is to service all requests with the best overall performance. This presumes that there’s not an overriding factor, like transaction resiliency, that takes precedence. So in performance tuning, the total seconds to respond should be lower, even if occasionally a request or two takes a bit longer. That's where the idea of sticky sessions (or targeted sessions) fits in.
In sticky sessions mode, also called pinning or affinity, the load balancer ensures that a user who starts on one web server stays on that web server for their entire browsing experience, except when that web server fails. This is done in a variety of ways, from the SSL session key (which is best) to something as simple as the incoming IP address. So why would anyone want to do this? On the surface, it would seem that distributing to the least busy server is the best answer.
The reason this isn't always the best answer is that the server that last serviced a user can cache a lot of information about that user and the content they're interested in. If Joe is looking at Product 123, then both Joe's profile and the product information for Product 123 are likely going to be cached at some level on the server that Joe last used. This means that the next request can be serviced best (as in with fewer resources) by the last server Joe was on, not by a different server in the farm. This reduces the workload for servicing Joe's request and, in aggregate across thousands of requests, reduces the overall load on the servers and the supporting infrastructure.
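A minimal sketch of affinity, assuming the balancer hashes some client key (an incoming IP address here, though an SSL session identifier would work the same way) to choose a home server. The function name, the server names, and the hashing scheme are all hypothetical; real load balancers implement affinity in their own ways.

```python
import hashlib


def sticky_server(client_key: str, servers: list[str], healthy: set[str]) -> str:
    """Map a client key to a fixed home server, moving only if it has failed."""
    candidates = sorted(servers)  # stable order so the mapping is repeatable
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(candidates)
    # Walk forward from the home slot until a healthy server is found.
    for offset in range(len(candidates)):
        server = candidates[(start + offset) % len(candidates)]
        if server in healthy:
            return server
    raise RuntimeError("no healthy servers available")
```

Every request carrying the same key lands on the same server while that server is healthy, so whatever it has cached for that user keeps paying off. When the home server fails the user moves to another one, and returns once it is healthy again.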
If you can accept that you want sticky sessions, despite the natural inclination that you want to spread requests equally between a set of servers, then you have a foundation to be able to make some intelligent choices about what has to be persisted about a session and where it needs to be persisted to be protected. You have the ability to decide whether to persist session data in memory on the server a user is on, store the session information in a database, or transmit it back and forth as a part of the page. Deciding that you want sticky sessions can also impact your caching strategies because you may be less concerned with managing distributed cache issues if users will be on the same server for all of their requests.
Bottlenecks
In computer systems we really deal with four primary bottlenecks. They are: CPU, Memory, Disk, and Network (or Communications). Most performance and scalability challenges break down into one of these four areas. When you're building a system you should consider the impact on each of these resources and, ideally, test your system while monitoring them.
CPU issues are perhaps the easiest issues to spot these days. In Windows, you can fire up Task Manager and see the CPU utilization. The key thing to watch for when measuring CPU today is single threading: a single-threaded workload can max out one CPU while the overall average still looks low. Any time you max out a single CPU in the system, you've got a problem.
The only other concern with looking at CPU time is to determine the overall utilization over a reasonably long period of time. 100% utilization for one second isn't a problem; however, for fifteen minutes it's definitely an issue.
To get statistics on CPU utilization, use Performance Monitor and, in the Processor object, include the % Processor Time counter for each CPU instance.
Memory issues are hard to find because there aren't good indicators for memory. The best answer is to look at the Memory object's Pages/sec counter. This counts how often requests had to be satisfied from disk rather than from physical memory. Opinions vary about what this value should be. Generally, I don't get too concerned with activity below 100 pages per second, although ideally it should be zero or near zero.
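Both CPU checks (a single maxed-out processor, and high utilization that lasts) can be expressed as a small analysis over logged samples. This sketch assumes you have already exported per-CPU % Processor Time readings into lists, one reading per sampling interval. The function names, the 95% threshold, and the run length are illustrative assumptions, not recommended values.

```python
def sustained_high(samples: list[float], threshold: float, min_run: int) -> bool:
    """True if at least min_run consecutive samples are at or above threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_run:
            return True
    return False


def saturated_cpus(per_cpu: dict[str, list[float]],
                   threshold: float = 95.0,
                   min_run: int = 3) -> list[str]:
    """Names of CPU instances that stayed pegged for min_run samples in a row."""
    return sorted(
        name for name, samples in per_cpu.items()
        if sustained_high(samples, threshold, min_run)
    )
```

A one-sample spike to 100% does not trip the check, but a single CPU pinned for several intervals does, even if the other processors are idle and the machine-wide average looks healthy.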
One thing that you can do from an infrastructure perspective, with nothing more than a configuration change, is minimize the paging file on the server. Paging files are really a holdover from when memory was expensive and it was occasionally necessary to swap out parts of a program to disk. In today's world memory isn't that expensive, so you can generally buy all of the memory that you need. The problem with a large paging file is that some applications ask for the available memory to decide how much to cache and can try to over-cache when the virtual memory settings are high. One notable exception to this is SQL Server, which is exceptionally good at managing memory. It will make its allocations based only on physical memory and not on virtual memory.
Figuring out how much of your disk is in use isn't difficult; however, it can be tedious because ultimately it's necessary to measure the performance of each disk (or at least each disk array). One of the most common challenges with disks is that most folks look almost exclusively at capacity when planning a system. From a performance perspective the concern is how many IO operations you can get from the drive. This number is affected by factors like the interface of the drive (SAS is faster than SATA), the rotational speed of the drive (15K is faster than 10K, which is faster than 7.2K), the track seek time, the number of partitions, the partition alignment (see Jimmy May's information on partition alignment), and which RAID level is in use (RAID 10 is better, from a performance perspective, than RAID 5).
The actual metrics for disk use are in the PhysicalDisk object. The first counter that is interesting is Avg. Disk Queue Length. This tells you how busy the drive is. The other counters that you want to watch are Avg. Disk sec/Read and Avg. Disk sec/Write. These tell you how long it takes to read information from and write information to the drives. Ideally each would be less than 20 ms. Each instance should be monitored separately, since it can quite easily be that you're focusing disk activity onto one disk or disk array. It should be said that the counters in Windows report on each of the physical drives presented by the storage controllers. Most frequently these counters are per disk array and aren't actually the individual disks.
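To make the per-instance point concrete, here is a small sketch that checks each disk instance separately against the 20 ms guideline. It assumes you have collected Avg. Disk sec/Read and Avg. Disk sec/Write (which are reported in seconds) into a dictionary keyed by instance name. The function name and the sample instance names are hypothetical.

```python
def slow_instances(samples: dict[str, tuple[float, float]],
                   limit_ms: float = 20.0) -> dict[str, list[str]]:
    """Flag instances whose average read or write time exceeds limit_ms.

    samples maps an instance name to (avg sec/read, avg sec/write).
    """
    flagged: dict[str, list[str]] = {}
    for instance, (read_s, write_s) in samples.items():
        problems = []
        if read_s * 1000.0 > limit_ms:   # counters are in seconds
            problems.append("read")
        if write_s * 1000.0 > limit_ms:
            problems.append("write")
        if problems:
            flagged[instance] = problems
    return flagged
```

For example, `slow_instances({"0 C:": (0.004, 0.006), "1 D:": (0.035, 0.012)})` flags only the reads on `"1 D:"`, which a total across all disks would have hidden.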
Finally, there's little point in evaluating the disk performance numbers until you've resolved any memory issues because in low memory situations the disks are used as virtual memory. This isn't a desirable or normal situation so the results you see will be skewed when compared to normal operation.
The final area that can be a problem is the network. Network could mean either the connectivity to the clients of the application or the connectivity between the servers in the solution. The good news here is that there are simple counters that you can look at. The Network Interface object includes counters for Bytes Received/sec and Bytes Sent/sec. Since most connections are full duplex these days, each number can be as high as the speed of the link. A one gigabit connection can send one gigabit per second and receive one gigabit per second at the same time, at least in theory.
The challenge with network interfaces is that the network card in the server may or may not be able to send and receive data at this rate. If there's a problem with the network interface card you'll likely see it in the Output Queue Length counter. This counter shouldn't be more than a few (less than 10). It can get higher than that if the network card isn't capable of transmitting at the rate that the applications on the server want to send.
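One trap when reading these counters is units: the counters are in bytes per second, while link speeds are quoted in bits per second. A short sketch of the conversion, with hypothetical function names:

```python
def link_utilization(bytes_per_sec: float, link_bits_per_sec: float) -> float:
    """Fraction of one direction of a full-duplex link that is in use."""
    return (bytes_per_sec * 8) / link_bits_per_sec


def max_bytes_per_sec(link_bits_per_sec: float) -> float:
    """Theoretical ceiling for Bytes Sent/sec or Bytes Received/sec."""
    return link_bits_per_sec / 8


GIGABIT = 1_000_000_000  # a 1 gigabit per second link

# 62,500,000 bytes/sec sent on a gigabit link is half of the send capacity.
half_used = link_utilization(62_500_000, GIGABIT)  # 0.5
ceiling = max_bytes_per_sec(GIGABIT)               # 125,000,000 bytes/sec
```

Because the link is full duplex, sent and received traffic are each compared against the full link speed rather than being added together. Protocol overhead means real throughput will sit below the theoretical ceiling.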
Making it more difficult is that the statistics from network interface cards are notoriously bad so you may not be able to trust the numbers that you're getting back from this counter. You'll want to cross check these numbers with the numbers from the switch the server is connected to.
The solution to network bottlenecks is to use a faster interface or to aggregate multiple network interfaces into one logical network interface. Most servers are shipping with two network adapters. Through configuration on the server and on the switch [look for Link Aggregation Control Protocol (LACP)] you can create a link aggregation group that can leverage two (or more) network interfaces as if they were one. This can help to address network connectivity issues.
Load Testing and Stress Testing
Watching performance numbers when a system is idle is like watching a pot waiting for it to boil—when you've not turned on the stove. Performance numbers are important things to watch but only when the system is active. If the system isn't active in some way there's nothing to watch. That being said, what do you do when you don't have real users on the system—or the next revision of the system—yet? The answer is a category of tests called load tests. In this set of tests you create artificial load on the system to measure the performance, the scalability, or determine the most likely points where the system will break.
Load testing, a term often used to describe the whole category of tests, is specifically the generation of user-simulating load on the system. The fact that load tests are supposed to simulate the user is the largest part of the problem with the load testing concept. It presumes that you can break user behaviors into repeatable patterns that can be run over and over with some randomization of the data being used, and that you can estimate what percentage of the users will follow each of the paths that you choose. The problem is that if you get the workloads (behavior patterns) wrong, you end up with a test that isn't valid. If you get the balance between workloads wrong, you also end up with a test that isn't valid. To get a set of numbers you can reasonably rely upon, you'll need to run different workload mixes against the platform and compare the numbers to see what might happen if your guesses are wrong.
True load testing, often called scale testing, is very difficult to get right. Not only is it hard to estimate user behavior, particularly if you have no baseline data to work from, but the implementation in the tooling also isn't the greatest. Creating a load test generally involves recording a set of interactions and then going back and adding the data sources behind it. In other words, you record your clicks as you navigate to each page in a site and then go back later and manually add data sources to control those clicks so they don't hit the exact same pages over and over again. This is important because caching will have an unusually large positive influence if you keep hitting the same few pages. As a result, the work to generate the load tests is pretty tedious and extremely fragile. If you change the user interface even slightly, your load test scripts may need to be regenerated from scratch.
A variant of the load test is a performance test. In this case you're not concerned with the scalability of the application; you're concerned exclusively with the performance (or responsiveness) of the application for the user. Here you can simply record a script and review the responsiveness of the web site to the requests. This can be done with and without various artificial loads, and it will provide some level of understanding about how the system should perform. It should be noted that loads often interact with one another, so you should consider testing mixed loads.
The final type of load testing is called stress testing. In this kind of testing the objective is to break the system. Stress tests are designed to make the system work as hard as possible to see how it breaks. Sometimes web sites break completely when heavy loads are applied. Other systems simply slow down until their responsiveness isn't something the users would tolerate. The goal is to identify and resolve any parts of the application that cause the site to literally break. This is particularly true for any parts of the application that would require the administrator to reset things to get them operational again.
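The idea of a workload mix, and of trying more than one mix in case your estimate is off, can be sketched as a weighted random plan. The workload names and percentages below are invented for illustration, and a real load testing tool would drive actual requests rather than just count them.

```python
import random
from collections import Counter


def plan_requests(mix: dict[str, float], total: int, seed: int = 0) -> Counter:
    """Pick `total` simulated user sessions according to the weights in `mix`."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    names = list(mix)
    weights = [mix[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=total))


# Two guesses at how users behave; neither is known to be right.
browse_heavy = {"browse": 0.80, "search": 0.15, "checkout": 0.05}
search_heavy = {"browse": 0.50, "search": 0.45, "checkout": 0.05}

plans = {
    "browse_heavy": plan_requests(browse_heavy, total=1000),
    "search_heavy": plan_requests(search_heavy, total=1000),
}
```

Running the same test under each plan and comparing the results gives you a feel for how sensitive your numbers are to the mix you guessed.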
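The "record, then add data sources" step can be pictured as filling placeholders in a recorded script from rows of data. This is a simplified sketch: the URL templates, the `product_id` field, and the function name are hypothetical, and real tools store recordings in their own formats.

```python
def expand_script(recorded_steps: list[str],
                  rows: list[dict[str, str]]) -> list[list[str]]:
    """Produce one concrete request sequence per data row."""
    return [[step.format(**row) for step in recorded_steps] for row in rows]


# The recording, with the product that was clicked replaced by a placeholder.
steps = ["/products/{product_id}", "/cart/add?id={product_id}"]
# The data source that varies which product each simulated user visits.
data = [{"product_id": "123"}, {"product_id": "456"}]

sessions = expand_script(steps, data)
# sessions[0] == ["/products/123", "/cart/add?id=123"]
```

Varying the data this way keeps the test from hitting the same few cached pages. It also shows why the scripts are fragile: if the site's URLs or form fields change, the recorded templates no longer match.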
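A stress test is essentially a ramp: keep adding load until the system either fails outright or becomes intolerably slow, and record which of the two happened. In this sketch `measure` is a hypothetical stand-in for whatever your tooling does to apply a given number of simulated users and report an average response time, returning `None` if the site stopped responding.

```python
from typing import Callable, Optional


def find_breaking_point(measure: Callable[[int], Optional[float]],
                        start_users: int,
                        step: int,
                        max_users: int,
                        tolerable_ms: float) -> Optional[tuple[int, str]]:
    """Ramp the load and report where, and how, the system first breaks."""
    users = start_users
    while users <= max_users:
        response_ms = measure(users)
        if response_ms is None:
            return users, "failed"    # the site broke completely
        if response_ms > tolerable_ms:
            return users, "too slow"  # still up, but users would not tolerate it
        users += step
    return None  # no break found within the tested range
```

The two outcomes deserve different attention. A "too slow" result is a capacity question, while a "failed" result, especially one that needs an administrator to reset something, points at a part of the application to fix.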
Performance isn't the deep, dark, scary secret that some folks make it out to be; however, it's not an exact science either. With some understanding of how systems work and how to measure the impact of software on systems, you've got a foundation for deciding how to test performance and how to improve it. Our next stop on our quest for performance improvement is session state, since it can have a huge impact on overall performance.
About the Author
Robert Bogue, MS MVP Microsoft Office SharePoint Server, MCSE, MCSA:Security, etc., has contributed to more than 100 book projects and numerous other publishing projects. Robert’s latest book is The SharePoint Shepherd’s Guide for End Users. You can find out more about the book at http://www.SharePointShepherd.com. Robert blogs at http://www.thorprojects.com/blog. You can reach Robert at Rob.Bogue@thorprojects.com.