Managing virtualisation performance
Thursday, 21 May, 2009
As virtualisation is deployed in production environments, experienced users are demanding production-quality performance management and optimisation technologies to maintain service levels. The Taneja Group has identified a new category of emerging technologies -- which we have labeled virtual infrastructure optimisation, or VIO -- to address these demands.
Our discussion of VIO draws on Taneja Group research into how the dynamic nature of virtualisation necessitates advanced performance management technologies in production environments, especially when virtual storage is involved. It also considers the essential features of VIO tools and emerging VIO technologies.
Traditional server consolidation planning
In the past, server consolidation for development and testing generally improved performance because larger, more powerful physical servers were used, shared storage was implemented and test and development virtual machines (VMs) were often idle. Sharing created better service because shared resources were often overprovisioned. Server consolidation ratios (enabled by more efficient use of available CPU cycles) were adequate and saved enough money to justify excess storage and memory capacity.
As a result, server virtualisation planning to date has been mainly capacity driven. Performance gains experienced by users are often coincidental rather than engineered. When contention issues arise, they're usually resolved manually by moving a VM or resizing it. Planning solely in terms of capacity -- coupled with reactive troubleshooting -- in a virtualized production environment can mask serious underlying architectural problems and become operationally overwhelming. Enterprises that have deployed mission-critical applications on virtualized infrastructures have exposed some hidden, complex contention problems. As virtualisation continues to evolve, it demands a new class of performance planning and optimisation tools.
Enterprise virtualisation demands advanced management technologies
In IT operations, it's well known that you can't manage what you can't measure. Traditional IT operations have focused on element management tasks such as measuring the behavior of servers, switches, storage arrays and network devices. Virtualisation changes the game drastically because these elements are now mobile, dynamically reconfigured and connected to one another on the fly. The runtime interaction of elements has become as important as their individual operating profiles. As a result, a VIO approach must cross domains and focus on the performance of the entire virtual infrastructure.
VIO technologies enhance element management and server capacity planning. As discussed, the mobility of virtual machines and the nature of their connections to other physical and virtual resources in an infrastructure create capacity problems. Existing technologies -- often designed for static environments -- haven't adequately addressed these capacity issues.
Taneja Group research validates this premise. The research reveals that a lack of virtual infrastructure visibility leads more than 85% of enterprise customers to rely on initial conditions testing and internal best practices to estimate and construct proper configurations. Moreover, lack of runtime visibility leads as many as three-quarters of VMware customers to limit or disable live migration (or VMware VMotion) in production because they have no way of determining its performance impact. These restrictions diminish the benefits of virtualisation, reduce consolidation rates and increase the overall cost of virtualized applications.
Contention issues with virtualized storage
One area where virtualisation compounds these management and monitoring issues is at the virtual storage level. When server virtualisation and storage virtualisation are combined, new contention issues arise that expose a lack of end-to-end visibility across the server and storage domains. More than 70% of VMware deployments use storage virtualisation -- most often with a Fibre Channel storage area network (SAN) -- making performance at the storage level integral to overall application performance. Storage I/O latencies are 10 to 100 times greater than server CPU and memory latencies, and are more likely to impact overall infrastructure performance.
Also, in many enterprises, storage is managed by a specific operations team with unique skills and special tools. As a result, there may be adequate element visibility in both the server and storage domains, but if the data isn't integrated there isn't a clear picture of who "owns" a performance issue. Add to this a complex, multivendor storage infrastructure and VM architectures that virtualize the I/O path, and it's clear that storage optimisation is essential.
In many production-level virtualisation deployments, a lack of visibility and instrumentation leads to overprovisioning and to a reduction in performance and service levels. This lack of visibility across the virtual server and storage layers -- from "server to spindle" -- is more than theoretical. According to research from the Taneja Group, many large enterprises using production-level virtualisation report such issues, which can undercut the cost savings of server consolidation.
Without real-time instrumentation, IT operations can become reactive, labor-intensive and sloppy. In addition, if there is no clear picture of the root causes of performance problems, administrators are less likely to make adjustments to the infrastructure or automate management operations.
Here, we explore how to address this lack of management visibility -- including the must-have capabilities of a virtualized management technology -- and some common stumbling blocks in achieving an optimized infrastructure.
Building a cross-domain performance monitoring system
Virtual infrastructure optimisation, or VIO, is an emerging category of virtualisation management technologies designed to build a correlated performance profile of a virtual infrastructure to optimize its performance. VIO technologies go beyond capacity planning, which relies on point-in-time snapshots and rule-of-thumb estimates to size specific tiers of a virtual infrastructure. While capacity-planning tools have rapidly emerged for virtual servers, none of these tools cross domains or incorporate runtime metrics.
Virtual infrastructure management technologies validate and continually verify capacity estimates by collecting, correlating and documenting runtime performance data from every virtualized tier in the application stack. A comprehensive VIO solution must include the following capabilities:
- Independence. You should choose a VIO tool that is independent of vendor bias in every virtualized application tier (server, storage, desktop, etc.) and remains a "neutral party" with respect to problem diagnosis.
- Depth. You should employ a technology that provides detailed metrics and offers multiple levels of depth to serve a wide range of needs. Also, it should provide real-time data management, metadata optimisation and historical accuracy.
- Breadth. Your VIO technology should span multiple virtualisation domains and provide composite data that clearly identifies performance dependencies. It should also integrate with existing systems monitoring and management tools, and offer a broad range of industry-standard communications interfaces.
- Impact. The technology you choose should be out-of-band, passive and as nondisruptive as possible. It should also offer deployment options that allow customers to add data collection modules incrementally as needed.
- Usability. The VIO technology should present actionable information in the form of a customizable dashboard run from a unified, configurable data store. In addition, it should support decision-making processes from teams whose functions span an entire virtual environment.
- Scalability. The technology should be scalable to support the largest enterprises, which may have tens of thousands of servers and tens of petabytes of storage.
Virtual infrastructure optimisation includes capabilities from related disciplines (existing and emerging). There's an element of capacity planning in VIO technologies because these technologies are ideal for developing a performance baseline prior to deployment. But while most capacity planning tools address a single virtualized tier, VIO solutions are cross-domain. VIO also encompasses application service management and performance management capabilities, which focus on optimisation from the application, or end-user, perspective.
Common problems in optimizing virtual infrastructures
The Taneja Group has interviewed several virtualisation administrators who have deployed VIO solutions to resolve complex server and storage area network (SAN) contention issues. Every interviewee experienced SAN response times that exceeded both the original design goals and acceptable service levels as more and more production applications were deployed in virtual servers.
In most cases, storage and server administration teams were unable to agree on the root cause of the decline in performance, so emergency measures were implemented. Steps such as adding storage ports and taking nonproduction servers offline at peak load times failed to solve the problem. This left interviewees wrestling with the following set of questions:
- What is our optimal virtual host server-to-storage array ratio?
- How can we determine optimal storage traffic balance and overhead?
- What is the impact of storage on a new virtual server?
- How should virtualisation impact our I/O path planning?
- How can we pinpoint vendor subsystem configuration issues faster?
After grappling with these questions, each interviewee deployed one of the leading VIO technologies. Initially, baseline data was collected and then augmented with ongoing runtime data collected during periods of high and low demand. By combining baseline and runtime data, customers could quickly validate root causes and discover additional configuration and architectural issues such as the following:
- Overloaded storage processors. Incorrect load balancing across storage ports.
- Unnecessary traffic. Noncritical file system management processes generated as much as 20% to 30% of traffic during peak loads.
- Storage port configuration issues. Queue-depth settings were suboptimal.
- Firmware mismatches. Storage firmware versions were incorrect for use with VMware and created incompatibilities between edge and core switch firmware versions.
- Host bus adapter (HBA) issues. Round-robin host bus adapter configurations caused abnormally high read latencies.
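The baseline-plus-runtime comparison described above can be illustrated with a minimal sketch. It assumes latency samples in milliseconds and flags runtime readings that sit well above the baseline; the function name, the sample values and the three-standard-deviation threshold are illustrative assumptions, not the method of any particular VIO product.

```python
from statistics import mean, stdev

def flag_deviations(baseline, runtime, threshold=3.0):
    """Return (index, value) pairs for runtime samples that exceed the
    baseline mean by more than `threshold` baseline standard deviations.
    The threshold is an assumed, tunable value."""
    limit = mean(baseline) + threshold * stdev(baseline)
    return [(i, v) for i, v in enumerate(runtime) if v > limit]

# Hypothetical storage latency samples in milliseconds.
baseline_ms = [5.0, 6.0, 5.5, 6.5, 5.0, 6.0]   # collected before peak load
runtime_ms = [5.8, 6.1, 14.2, 6.0, 19.5]       # collected during peak load

flagged = flag_deviations(baseline_ms, runtime_ms)
for index, value in flagged:
    print(f"sample {index}: {value} ms is well above baseline")
```

A real deployment would compare many metrics per tier (port utilization, queue depth, read and write latency), but the principle is the same: runtime data only becomes actionable once there is a baseline to measure it against.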
None of these issues were easily detected without a cross-domain VIO technology in place. These user testimonials validate the capabilities of available optimisation technologies. In the third and final tip in this series, we'll examine some of the leading virtualisation performance management tool providers and technologies in this emerging category of products.
Before we delve into specific VIO technologies, let's review the benefits of VIO. Armed with detailed runtime metrics, administration teams were able to agree on performance root causes, identify and validate complex contention issues, and productively interact with storage and virtualisation vendors to fix configuration problems. These administration teams reported that VIO technologies allowed them to consistently achieve several goals:
- Eliminating overprovisioning. Determining the limiting factors in the infrastructure, identifying unused switch capacity, increasing port utilization and avoiding expensive new purchases.
- Improving productivity. Enabling cross-domain teams to resolve problems at the "speed of virtualisation" while reducing the number of trouble tickets and lowering operating costs.
- Increasing consolidation ratios. Providing accurate metrics, determining how I/O impacts application performance on virtual machines (VMs) and reducing the overall cost per virtualized application.
- Improving service levels. Improving the response times of administrators by providing automated alerts about potential performance bottlenecks.
Available storage performance optimisation tools
While the virtual infrastructure management market is evolving, several providers offer products that promise significant gains for cross-domain operations teams. Virtual Instruments' VirtualWisdom and Akorri's BalancePoint focus on the interaction between the server and storage tiers, take an operations view of performance and stress the storage I/O path. In the opinion of the Taneja Group, though, VirtualWisdom provides deeper insight at the Fibre Channel storage area network (SAN) level, making it an appropriate solution for midsized and large enterprises with significant investments in Fibre Channel SANs.
In fact, one customer using VirtualWisdom reported the following benefits:
- immediate detection of VMware configuration problems and the root causes of storage I/O traffic delays.
- data showing the correlation of virtual server events (such as VMware VMotion activity) with storage performance degradation.
- rapid diagnosis of the causes of host bus adapter (HBA) latency issues and identification of extraneous management traffic sources.
- prevention of storage port overloads and overprovisioning costs.
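To make the idea of correlating virtual server events with storage performance concrete, here is a rough sketch. It assumes a list of (timestamp, latency) samples and a list of migration event timestamps, all in seconds, and compares average latency shortly after an event with average latency at other times. The function name, the data and the 60-second window are hypothetical and do not reflect how VirtualWisdom or any other product works internally.

```python
def average(values):
    """Mean of a list, or None when the list is empty."""
    return sum(values) / len(values) if values else None

def latency_near_events(samples, event_times, window=60):
    """Split latency samples into those taken within `window` seconds
    after any event and all others, then return both averages."""
    near, far = [], []
    for timestamp, latency in samples:
        if any(0 <= timestamp - event <= window for event in event_times):
            near.append(latency)
        else:
            far.append(latency)
    return average(near), average(far)

# Hypothetical data: (seconds since start, storage latency in ms).
samples = [(0, 5.0), (30, 6.0), (100, 12.0), (130, 14.0), (200, 5.0), (260, 6.0)]
migration_events = [90]  # one live migration at t = 90 s (assumed)

near_avg, far_avg = latency_near_events(samples, migration_events)
print(f"latency after migration: {near_avg} ms, otherwise: {far_avg} ms")
```

A large gap between the two averages suggests, but does not prove, that migrations are contributing to storage degradation. It is the kind of evidence that helps server and storage teams agree on where to look next.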
Application performance management technologies
In addition to technology providers such as Akorri and Virtual Instruments, some vendors take an end-user -- or application -- view of performance, a discipline referred to as application performance management. The Taneja Group includes these vendors and their technologies in the VIO category because they have made solid progress toward mapping, monitoring and analyzing multi-tiered virtualized applications.
Application performance management technologies -- such as BlueStripe Software's FactFinder and VMware's soon-to-be-released AppSpeed -- use automated discovery to address the problem of cataloging interactions between multiple application components. This task is especially difficult when the components are mobile and connect to one another dynamically, as is the case with virtual resources. After completing their automated discovery operations, these technologies identify bottlenecks and latencies at the application and end-user level and suggest remedies.
Virtual infrastructure management vendor comparisons
The Taneja Group believes that storage performance optimisation is directly related to the improvement in user response-time provided by the application performance management technologies mentioned above. Taken together, these two types of virtual infrastructure management tools span a wide range of storage and network transaction metrics. Table 1 provides a high-level overview of how these selected vendors in the VIO space compare with one another.
Capability | Virtual Instruments VirtualWisdom | Akorri BalancePoint | VMware AppSpeed | BlueStripe FactFinder |
Independence | Yes | Yes | VMware-owned | Yes |
Depth | SNMP, FC TAP, vCenter. Measures I/O data continuously in real time. | SNMP, vCenter. Gathers I/O data by polling storage devices on a schedule. | Out-of-band IP traffic analyzer. | TCP connection analyzer in operating system agent. |
Breadth | Server and storage. Integrates with all leading management frameworks | Server and storage. Integrates with all leading management frameworks | Applications and server. Integrations unknown at this time. | Applications and server. Integrates with all leading management frameworks. |
Impact | Storage: Out-of-band; Server: Out-of-band | Storage: Host-based; Server: Host-based | Server: Host-based; Network: Out-of-band | Server: VM-based; Network: VM-based |
Usability | Provides cross-domain actionable information | Provides cross-domain actionable information | Provides application-level discovery and dependency mapping | Provides application-level discovery and dependency mapping |
Scalability | Mostly deployed in large enterprise environments | Mostly deployed in smaller-scale SMB environments | Not applicable | Mostly deployed in smaller-scale SMB environments |
Table 1
Can you benefit from virtual infrastructure management technologies?
Whether a customer should explore one or more of the technologies that have been discussed in this series depends on individual challenges. Are your virtual servers wreaking havoc in your SAN? Are your server consolidation ratios too low? Do you spend too much on additional SAN capacity without confidence that it can solve your problems? Have you taken advantage of the advanced availability features of your virtual infrastructure? Or have you lost track of where your VMs run and can't respond quickly to user response-time complaints? Whatever your particular virtualisation pain points, the comprehensive diagnostic solutions available today go beyond monitoring and enable new levels of infrastructure optimisation.