Introduction

For some scenarios, MAP requires the collection of performance data. Use the Performance Metrics Wizard to gather information about the CPU, memory, disk, and network utilization of computers for a duration you specify. The minimum time required is 30 minutes of successful collection. If MAP fails to collect at least 30 minutes of performance data, you will not be able to run any of the wizards that require performance data. Be aware that time spent performing inventory data collection of machines may delay the start of performance data collection; it is therefore recommended that you perform an inventory of the target machines prior to running performance collection.

It is recommended that you conduct an initial 30-minute test run to ensure you are collecting data from the target machines. Once you are satisfied that MAP is able to collect data from the target machines, you can conduct a full run. When deciding how long to run the performance data collection, we recommend you consider the following:

  • A minimum of 2 days’ worth of data is needed for the statistical calculations to start lining up with real-world performance.
  • A run of 7 days or more is recommended in order to capture any variation in the utilization profiles of the targeted machines over the work week plus the weekend.
  • If there is reason to believe that the utilization profiles may vary for some other reason, such as end-of-month activities, then the performance collection should be run during that period of time as well.
  • Be aware that gaps of time between performance collection runs can affect the calculations used for aggregating performance data. It is therefore recommended that you not have long gaps of time, and that you collect from the same set of machines each time.

Performance counters are collected from each computer in 5-minute intervals. The number of computers from which the MAP Toolkit can collect performance counter data successfully depends upon factors such as network latency and the responsiveness of servers.

Note   If you have previously gathered performance data, you will be prompted on subsequent performance counter gathering runs to either delete existing data or to append the newly gathered data to what was collected previously. If you split up your target computers to improve performance, select No in the Performance Data Exists dialog box.

How It Works: Performance Data Normalization and the Time Series Placement Algorithm

Starting in MAP 6.0, two significant changes were made to how the performance data is used and aggregated in the Server Consolidation scenario. These same comments apply to the Microsoft Private Cloud Fast Track scenario that was also introduced in MAP 6.0. The changes are:

  • Time is taken into account when a machine is considered for placement on a host Hyper-V server.
  • A 95th percentile aggregate is used to aggregate the performance metrics sampled over time rather than a max or average aggregate.

Another significant change in MAP 6.0 is the introduction of the notion of an “infrastructure” in order to support the Microsoft Private Cloud Fast Track scenario. This same idea of an infrastructure was also added as an option to the existing Server Consolidation scenario. The following sections give more details on how these changes work.

95th Percentile of a Performance Metric

When you collect performance data with MAP, a variety of performance metrics are sampled every 5 minutes for the included machines. Consider the metric %CPU utilization for a hypothetical machine Guest1. The sequence of %CPU utilization samples taken from Guest1 over time might look like the following where each pair is the elapsed time expressed as Hours:Minutes:Seconds since data collection began followed by the %CPU utilization:

(00:00:00, 25.5),  (00:05:00, 36.2),  (00:10:00, 24.4),  (00:15:00, 41.33),  (00:20:00, 57.41),  ...,  (47:55:00, 29.6),  (48:00:00, 33.7)

When you have a sequence of %CPU utilization samples over time like this, one natural question to ask is “What was the %CPU utilization of Guest1 over the entire time span?” This raises the question of how to aggregate this sequence of numbers into a single number representing the %CPU utilization for the entire time span; this is where aggregates like average, max, and 95th percentile come into the picture.

Prior to MAP 6.0, the average or max aggregates were used when reporting performance metrics or when using performance metrics to place guests in the Server Consolidation scenario. However, for capacity planning exercises like the Server Consolidation scenario, a better aggregation method is to use a percentile aggregation with the 95th percentile being the typical choice. The 95th percentile of a sequence of %CPU utilization samples like the above is defined as the minimum sample S for which 95% of the samples in the sequence are less than or equal to S. Typically this will mean that 5% of the samples are greater than S. Why is this a good aggregation choice for capacity planning? If you plan enough capacity for the 95th percentile of a sequence of %CPU utilization samples over time, then this means 95% of the time you will have enough CPU capacity to service the observed load.

Correspondingly, 5% of the time your systems may be over utilized, but this is a reasonable tradeoff between hardware costs and the fraction of time when responsiveness is degraded. A similar observation can be made for other resources such as disk IO, memory, and network, whose utilization varies over time.
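To make the definition concrete, here is a minimal Python sketch of the 95th percentile aggregate exactly as defined above: the minimum sample S for which at least 95% of the samples are less than or equal to S. The sample values are illustrative, not drawn from a real collection, and this is not MAP's internal implementation.

```python
def percentile_95(samples):
    """Smallest sample S such that at least 95% of samples are <= S."""
    ordered = sorted(samples)
    n = len(ordered)
    for s in ordered:
        # Count how many samples are at or below this candidate value.
        if sum(1 for x in samples if x <= s) / n >= 0.95:
            return s
    return ordered[-1]

# With many samples, the aggregate sits just below the extremes:
print(percentile_95(list(range(1, 101))))   # 95

# With only a handful of samples it collapses to (or near) the max,
# which is why short collection runs give less meaningful values:
print(percentile_95([25.5, 36.2, 24.4, 41.33, 57.41, 90.0]))   # 90.0
```

Note how the small-sample case degenerates toward the max; this is one reason at least 2 days of data are recommended before trusting the aggregate.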

Performance Data Normalization

For reasons that will become clear in the following section on the Time Series Placement Algorithm, we have to normalize the performance data collected from different machines so that sequences of performance metrics collected from different machines can be added together. Suppose we have two machines Guest1 and Guest2 with the following sequences of network utilization values in Mbps (megabits/sec):

Guest1: (00:00:00, 3.5), (00:05:00, 11.1),  (00:10:00, 5.4),  (00:15:00, ?.??),  (00:20:00, 3.71),  ...,  (47:55:00, 1.19),  (48:00:00, 15.0)

Guest2: (00:00:15, ?.?),  (00:05:35, ??.?),   (00:10:47, 2.7),  (00:16:03, 7.12),  (00:21:13, 1.04),  ...,  (47:58:22, 1.19),  (48:03:35, ??.?)

The two sequences are lined up such that the samples that are closest in time from Guest1 and Guest2 are stacked one right on top of the other. Notice, however, that the samples are not taken at exactly the same times. Due to all sorts of variables in the environment and the resource limits of the machine running MAP, performance metrics cannot be sampled from all machines at exactly the same moments.

Moreover, notice that some of the samples are marked with question marks like “?.??” to indicate that these samples were unavailable because the target machine was offline, or MAP was not collecting performance data for that machine at the time. Clearly, the raw performance data has some rough edges. So what if we want to add these two sequences of network utilization metrics together to get the combined utilization of Guest1 and Guest2? This is where performance data normalization comes in.

Without going into exhaustive detail on how the normalization works, here is the basic idea:

  1. The total time span over which normalized data exists runs from the earliest time at which raw performance data exists for any machine to the latest time at which raw performance data exists for any machine (call this time span Tmin to Tmax).
  2. Tmin to Tmax is chopped up into 10-minute intervals starting at Tmin to give the sequence of times Tmin, Tmin+10, Tmin+20, ..., Tmin+NNN, where NNN is the minimum multiple of 10 such that Tmin+NNN is greater than or equal to Tmax.
  3. For each of these 10-minute intervals, if one or more samples of a performance metric exist for a machine within the bounds of that interval, those samples are aggregated into a single normalized sample for the interval. If no samples exist for a machine in the interval, an aggregate of samples nearby in time is used to generate a normalized sample to fill in the hole in the data. After normalization, the sequences of network utilization data for Guest1 and Guest2 might look like this:

Guest1: (Tmin, 9.05),  (Tmin+10, 5.4),  (Tmin+20, 4.1),  (Tmin+30, 13.3),  (Tmin+40, 7.50),  ...,  (Tmin+NNN-10, 11.7),  (Tmin+NNN, 8.69)

Guest2: (Tmin, 2.70),  (Tmin+10, 6.3),  (Tmin+20, 1.8),  (Tmin+30, 11.2),  (Tmin+40, 10.1),  ...,  (Tmin+NNN-10, 2.31),  (Tmin+NNN, 1.19)

Since Guest1 and Guest2 now have normalized samples for the same set of times and there is a sample for each time, it becomes obvious how to add the two sequences of resource utilization values together; namely, you just add together each pair of numbers at the same normalized times.
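The three normalization steps above can be sketched roughly as follows. This is only an illustration of the idea, assuming average as the per-interval aggregate and nearest-interval fill for holes; MAP's actual aggregation and hole-filling rules are not specified here.

```python
import math

BUCKET = 600  # 10-minute intervals, in seconds

def normalize(samples, t_min, t_max):
    """samples: list of (seconds, value) pairs. Returns one value per
    10-minute interval covering t_min..t_max, averaging the samples that
    fall in each interval and filling empty intervals from the nearest
    interval that has data (an illustrative choice)."""
    n = math.ceil((t_max - t_min) / BUCKET)
    buckets = [[] for _ in range(n)]
    for t, v in samples:
        buckets[min(int((t - t_min) // BUCKET), n - 1)].append(v)
    raw = [sum(b) / len(b) if b else None for b in buckets]
    # Fill holes from the nearest interval that had real samples.
    return [v if v is not None else
            raw[min((j for j in range(n) if raw[j] is not None),
                    key=lambda j: abs(j - i))]
            for i, v in enumerate(raw)]

# Once normalized, two machines' sequences line up and can be summed:
g1 = normalize([(0, 3.5), (300, 11.1), (600, 5.4), (1200, 3.71)], 0, 1500)
g2 = normalize([(15, 2.0), (647, 2.7), (963, 7.12), (1273, 1.04)], 0, 1500)
combined = [a + b for a, b in zip(g1, g2)]
```

Once every machine has a value at the same set of normalized times, the combined utilization is just the element-wise sum, exactly as described above.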

Data Quality Considerations

Given the explanations in the previous two sections on how the 95th percentile aggregate and performance data normalization works, a natural question to ask is how you know that the 95th percentile aggregates of the normalized performance data accurately represent the real world behavior of your systems; this is where data quality considerations come into play.

If you think about the definition of the 95th percentile aggregate (the minimum sample S for which 95% of the samples in the sequence are less than or equal to S), it becomes clear that this aggregate is not very useful for small data sets. For example, what does it mean for 95% of 8 samples to be less than or equal to one of those 8 samples? MAP can still compute the 95th percentile aggregate in this case because it uses a deterministic algorithm to compute the value (the computed value will be, or will be close to, the max value), but the statistically interesting properties of the 95th percentile only show up for much larger data sets. This means you should plan to collect performance data for at least 2 days before you can expect to get good values from the 95th percentile aggregate. That said, you should not hesitate to collect performance data for shorter periods of time when doing test runs or familiarizing yourself with the MAP tool.

Another data quality issue to consider is what happens when MAP normalizes the performance data and fills in values for times at which machines are missing values by using aggregates of other samples nearby in time. Filling in these “holes” is necessary so that we can add the sequences of performance metrics from different machines together as described in the previous section, but if there is a large percentage of missing values (say more than 5%), then this may significantly distort the statistical properties of the normalized performance data compared to the real world behavior. What does this mean in terms of how you use MAP? Here are some rules of thumb:

  • When you collect performance data in MAP, you want to collect performance data for the same period of time for all machines.
  • After collecting the performance data, generate the Performance Metrics report and look at the CollectionStatistics worksheet. For good results, you should only use machines in the consolidation scenarios (Server Consolidation and Microsoft Private Cloud Fast Track) that have a high success percentage when collecting performance data.

One consequence of the first rule of thumb is that the results will not be as accurate if you use the functionality in the Performance Metrics wizard that allows you to append the collection of new performance data to existing data. The primary purpose behind this feature is to let you continue collecting performance data from a set of machines if it was interrupted unexpectedly for some reason (for example, the MAP machine collecting the performance data rebooted after applying an update). If, for example, you use this feature to collect performance data from one group of machines on Monday and Tuesday and then another group of machines on Thursday and Friday, then the normalized time period over which data was collected is Monday through Friday. This means that 3 of the 5 days will be missing data for all of the machines and the “holes” in the data that the normalization process fills in will be at least 60% of the normalized data. This obviously is not a desirable state of affairs, so using the append functionality of the Performance Metrics wizard in this fashion is not a recommended practice.

The Time Series Placement Algorithm

In versions of MAP prior to 6.0, a single aggregate (max or average, depending on the resource) was computed for each resource type (CPU, memory, etc.) over the entire time span during which performance data was collected for a machine. These numbers were used to determine whether there was room for the machine on a Hyper-V host in the Server Consolidation scenario. This approach is not ideal because it does not deal well with machines whose resource utilization is uneven over time.

For example, if the machine Guest1 is used by people in North America and Guest2 is used by people in Asia, then these machines are likely to have inverted resource usage profiles over time and would fit well together on the same Hyper-V host. Analogously, if two machines served the same geographic region then their resource usage profiles over time would likely be similar and they may not be a good fit together on the same Hyper-V host. However, using an average metric over the entire time span for which performance data was collected misses these subtleties. In addition to serving different geographic regions, there are myriad other reasons that machines could have usage profiles over time that fit well together or not. 

These observations led to the method introduced in MAP 6.0 to determine if there is room for a machine on a Hyper-V host while taking time into account.

The previous section described how MAP normalizes the raw performance data, and how this enables adding together the sequences of normalized resource utilization metrics for two or more machines over time. This ability to add sequences of normalized resource utilization metrics is at the heart of the Time Series Placement Algorithm introduced in MAP 6.0. 

While running the algorithm, suppose that MAP has determined that machines Guest1, Guest2, ..., Guest4 will fit on Hyper-V host Host1. How do we know if Guest5 will fit on Host1 as well? What the algorithm does is add together the sequences of normalized resource utilization metrics for Guest1, ..., Guest5 for each resource we care about (CPU, memory, etc.) and determines if Host1 has enough capacity in each resource dimension. What is enough capacity? Host1 has enough capacity if the 95th percentile aggregate of the sum of the normalized sequences for Guest1, ..., Guest5 is less than or equal to the total capacity of Host1 for that resource dimension.
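The fit test described in this paragraph can be sketched as follows. The guest sequences, the host capacity figure, and the index-based percentile computation are illustrative assumptions, not MAP's actual code.

```python
import math

def p95(seq):
    """Smallest sample with at least 95% of the samples at or below it."""
    ordered = sorted(seq)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

def fits_on_host(placed_guests, candidate, capacity):
    """placed_guests: normalized utilization sequences of guests already
    on the host; candidate: the sequence of the guest being considered.
    The host has room if the 95th percentile of the summed sequence
    stays within the host's capacity for this resource dimension."""
    summed = [sum(vals) for vals in zip(*placed_guests, candidate)]
    return p95(summed) <= capacity

# Hypothetical %CPU sequences over four normalized intervals:
placed = [[20.0, 35.0, 10.0, 25.0],    # Guest1
          [30.0, 15.0, 40.0, 20.0]]    # Guest2
candidate = [25.0, 30.0, 20.0, 35.0]   # Guest3
print(fits_on_host(placed, candidate, capacity=90.0))   # True
```

In the full algorithm this check is repeated for every resource dimension (CPU, memory, and so on); the candidate fits only if every dimension has room.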

There are numerous other subtleties to the Time Series Placement Algorithm, not the least of which is determining which potential host Host1, ..., HostN provides the “best” fit for the next candidate machine GuestM. That said, the ability to sum up sequences of normalized resource utilization metrics and take the 95th percentile of the summed sequence is the fundamental insight behind how MAP takes time into account when making consolidation suggestions in the Server Consolidation and Microsoft Private Cloud Fast Track scenarios.

Infrastructures

MAP 6.0 introduces the notion of an “infrastructure” used in the Microsoft Private Cloud Fast Track scenario, and optionally in the Server Consolidation scenario. Basically an infrastructure is the resource enclosure in which a group of Hyper-V hosts is provisioned; i.e., a server rack with associated disk (SAN) and network resources along with the Hyper-V hosts. This allows you to run consolidation scenarios that are more reflective of how an organization may be buying server resources; namely, in units of pre-provisioned server racks with everything necessary to run a “private cloud” rather than buying the individual components themselves with further assembly required.

So how is the infrastructure level taken into account during the Time Series Placement Algorithm when determining if a candidate machine will fit on a Hyper-V host residing in a particular infrastructure? Basically MAP just does the same thing at the infrastructure level that it does at the host level: it sums up the normalized resource utilization sequences for all the guests in the infrastructure across all hosts and determines if the 95th percentile aggregate of this generated sequence exceeds the SAN or network capacity of the infrastructure.
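The same check, lifted to the infrastructure level, might look like this sketch: sum across every guest on every host in the infrastructure, then compare the 95th percentile of the summed sequence against the infrastructure's capacity for the resource (for example, the SAN or network ceiling). All names and figures here are hypothetical.

```python
import math

def fits_in_infrastructure(hosts, candidate, capacity):
    """hosts: one list of normalized guest sequences per Hyper-V host in
    the infrastructure; candidate: the sequence of the guest being
    considered. Returns True if the 95th percentile of the summed
    sequence stays within the infrastructure-level capacity."""
    all_guests = [g for host in hosts for g in host]
    summed = [sum(vals) for vals in zip(*all_guests, candidate)]
    ordered = sorted(summed)
    return ordered[math.ceil(0.95 * len(ordered)) - 1] <= capacity

# Two hosts: the first with two placed guests, the second with one.
hosts = [[[10.0, 20.0], [5.0, 15.0]],
         [[8.0, 12.0]]]
print(fits_in_infrastructure(hosts, [4.0, 6.0], capacity=60.0))   # True
```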

What Do the Numbers in the Reports Represent?

The previous sections provide the basic information concerning how the raw performance data is normalized and aggregated and how this data is used in the placement algorithm underlying the Server Consolidation and Microsoft Private Cloud Fast Track scenarios. This, however, does not explain which numbers show up where in the various reports; this section provides these remaining details.

PerfMetricsResults-<date>.xlsx

  • PlacementMetricsSummary – The numbers in this worksheet are derived from the normalized performance data.

ServerVirtRecommendation-<date>.xlsx

  • HostMachineDetails – This worksheet provides the configuration details of the host that was used for the placement recommendations.
  • UtilizationSetting – This worksheet provides the maximum utilization of the infrastructure/host. An additional infrastructure/host is required if any resource reaches its ceiling with more virtual machines needing to be placed.
  • ConsolidationRecommendations – The numbers in this worksheet are derived from the normalized performance data with the details for placed guests, hosts and infrastructures being as follows:
    • Placed Guests – All metrics are the 95th percentile of the normalized performance data except for Disk Space Utilization which uses the maximum value of the normalized performance data. Finally, the CPU utilization values for the placed guests have also been rescaled to the CPU configuration of the host machine.
    • Hosts – All metrics are the 95th percentile of the summed sequences of the normalized performance data of the guests placed on that host except for Disk Space Utilization which is just a sum of the maximum values for the guests. The utilization values for memory and CPU for the hosts also include reserves of 1 GB and 5% CPU for the host itself.
    • Infrastructures – All metrics are the 95th percentile of the summed sequences of the normalized performance data of the guests placed on all the hosts in the infrastructure except for Disk Space Utilization which is just a sum of the maximum values for the guests.

Important   Because the 95th percentile of the sum of sequences of performance metrics from the placed guests is not the same as the sum of the 95th percentile of each of those sequences, adding up the guest utilization values on this worksheet will not give you the value of the host utilization except in the case of disk space usage which does not use the 95th percentile aggregate. A similar observation can be made about the utilization values for the infrastructures.

  • UtilizationBeforeVirtualization – All metrics are the 95th percentile of the normalized performance data except for Disk Space Utilization which uses the maximum value of the normalized performance data.
  • UnplacedMachinesReport – This worksheet provides the host names of the machines that could not be placed, along with the reason each could not be placed.

Microsoft Private Cloud Fast Track Consolidation Report-<date>.xlsx

  • UtilizationSetting – This is the maximum utilization of the Microsoft Private Cloud Fast Track configuration as defined from the wizard. Additional Microsoft Private Cloud Fast Track Infrastructures are required if any resource reaches its ceiling with more virtual machines needing to be placed.
  • ConsolidationOnBaseConfig – Analogous to the ConsolidationRecommendations worksheet in the ServerVirtRecommendation-<date>.xlsx workbook described above where the host configuration is the base configuration of the Microsoft Private Cloud Fast Track infrastructure selected in the Microsoft Private Cloud Fast Track Consolidation Wizard. 

Note   There are no memory and disk space overhead values specified for the placed guests in this wizard.

  • ConsolidationOnMaximumConfig – Just like ConsolidationOnBaseConfig except that the infrastructure configuration is the maximum configuration of the infrastructure selected in the Microsoft Private Cloud Fast Track Consolidation Wizard.
  • UnplacedMachinesReport – Details of the machines that could not be placed into the default or maximum infrastructure setting. It also lists the reasons for failure.
  • UtilizationBeforeConsolidation – All metrics are the 95th percentile of the normalized performance data except for Disk Space Utilization which uses the maximum value of the normalized performance data.

  • InfrastructureProfile – Provides an overview of the Microsoft Private Cloud Fast Track Infrastructure hardware profile.