Dude, where’s my app?

By Rafi Katanasho*
Friday, 04 November, 2011


Server virtualisation is growing ever more popular thanks to its potential to drive down costs and increase flexibility. But pre-virtualisation server performance metrics no longer apply, making application performance problems hard to diagnose and leaving users wondering where their app went while IT has no answer. Rafi Katanasho, Compuware, explains what you can do about it.

Under constant pressure to do more with less, IT departments are rapidly moving to virtualisation. In 2009 the respected industry analyst group, Gartner, reported that 16% of all x86 architecture server workloads had been virtualised. By June 2011 this figure had grown to at least 40% (see Gartner’s ‘Magic Quadrant for x86 Server Virtualization Infrastructure’). It’s a trend that, right now, shows no signs of slowing.

The driving forces behind virtualisation include the desire to achieve cost savings, particularly in the areas of hardware, power and cooling, and the need for enterprises to become more flexible in the way they respond to fluctuating computing demands.

It sounds great, but it’s also important to realise that virtualisation adds another layer to an already complex IT infrastructure. All the gains of virtualisation can be quickly lost if applications slow down and perform poorly for end users, or if applications become harder for IT to troubleshoot and fix. After all, website visitors and application users rarely give a toss about the technology behind the screen. All they want is to be able to get on with the task at hand.

The difficulty is that performance issues rarely come to light until unhappy users report them, and by then the damage has most likely been done. The ramifications range from reduced employee productivity to abandoned shopping carts and lost revenue, slower customer service and a drop in customer satisfaction and, worse still, negative publicity. From an IT perspective, performance problems are also likely to undermine IT’s credibility and slow down approvals for future projects.

The challenges of a virtual environment

Almost any application is a candidate for moving to a virtual environment. To date, IT organisations have imposed some constraints on Tier 1 applications that have large CPU and I/O requirements, but continuous improvements in server hardware and virtualisation software mean that even these types of applications can be successfully virtualised.

Data centres have historically employed a ‘one service, one server’ strategy (especially on x86-based servers), where physical servers are dedicated to running a single pre-defined service or a group of highly connected services. Many of the application monitoring approaches that have been in place for the last 10 years leverage physical server parameters (CPU and memory utilisation, disk and network card ‘health checks’ and so on) as an indicator of application performance.

For example, if users start complaining about long log-in times, a look at CPU utilisation on each of the servers involved (the database server and the LDAP server) immediately shows that the LDAP server has unusually high CPU utilisation and warrants further investigation, whereas high CPU activity on the database server is quite normal.
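To make that per-server approach concrete, here is a minimal sketch of the kind of role-aware CPU check it relies on. The hostnames, thresholds and the fetch_cpu_percent() helper are hypothetical stand-ins for whatever monitoring agent or poller actually supplies the readings.

# A minimal sketch of the traditional 'one service, one server' health check.
# Hostnames, thresholds and fetch_cpu_percent() are hypothetical examples only.

SERVERS = {
    "db01": {"role": "database", "cpu_alert_threshold": 95},   # high CPU is normal here
    "ldap01": {"role": "ldap", "cpu_alert_threshold": 60},     # high CPU is suspicious here
}

def fetch_cpu_percent(host):
    """Placeholder for whatever agent or poller supplies per-host CPU utilisation."""
    readings = {"db01": 88.0, "ldap01": 91.0}   # example values only
    return readings[host]

def check_servers():
    """Flag any server whose CPU utilisation exceeds its role-specific threshold."""
    suspects = []
    for host, meta in SERVERS.items():
        cpu = fetch_cpu_percent(host)
        if cpu > meta["cpu_alert_threshold"]:
            suspects.append(f"{host} ({meta['role']}): {cpu:.0f}% CPU - investigate")
    return suspects

if __name__ == "__main__":
    for line in check_servers():
        print(line)

Run against the example readings above, only the LDAP host is flagged, which is exactly the reasoning the physical-server approach depends on.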

When applications are virtualised and collapsed inside a single piece of hardware, the ‘one-to-one’ relationship between applications and hardware becomes a ‘many-to-one’ relationship and legacy monitoring solutions lose their analytical capabilities. The key issues are:

  • Limited visibility into transactions, especially between VMs on the same ESX host.
  • Limited visibility into the physical-to-virtual relationship between hardware and applications.
  • Difficulty understanding the performance impact of Virtual Machine Managers (VMMs).

Each of these has a serious impact on any application performance management solution that isn’t configured to operate in a virtual environment.

It’s tough to troubleshoot

The problem is that if an issue such as an intermittent slowdown does arise, it can be very hard to pinpoint the cause. Which of the myriad vendors and service providers involved in the site should be responsible? Where do you look and what exactly do you look for?

Many slowdowns are not resolved until they escalate to a high level, at which time the typical response is to set up a war room to look for clues. A war room brings together representatives from all the different information silos within the enterprise and is likely to involve anywhere between six and 15 people for days on end. It’s an unwieldy and frequently unsuccessful approach.

Past approaches

Historically, war rooms have relied on component-level monitoring tools, software instrumentation, log files and tools from virtualisation vendors to track down and isolate application performance issues. Unfortunately, none of these is completely effective in a virtualised environment because each can only ever show part of the picture.

Component-based monitoring tools, for example, take a bottom-up approach, starting with the low-level infrastructure. They may give some visibility into the performance of the virtual container, but none into the applications running in that container. Many of the metrics collected by these lower-level tools, such as CPU usage, network load and disk I/O, are meaningless in a virtualised environment, and they simply don’t see the end-to-end application flow when virtual servers are being dynamically provisioned and de-provisioned.

Software instrumentation, the insertion of lines of code into an application to generate timing data, may help to reveal additional clues about the problem, but the parameters returned are unlikely to pinpoint a root cause, since they are not linked to the virtualisation layer. In the real world, it is simply too labour-intensive and costly to be feasible.
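For illustration, the sketch below shows the kind of hand-rolled timing instrumentation the article is describing: a decorator that logs how long each instrumented call takes. The function names and workload are hypothetical, and note that nothing in the output says which virtual host the code ran on, which is precisely the limitation discussed above.

# A minimal sketch of hand-rolled timing instrumentation.
# Function names and the simulated workload are illustrative only.

import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("instrumentation")

def timed(func):
    """Wrap a function and log its wall-clock duration in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper

@timed
def authenticate_user(username):
    # Hypothetical stand-in for the real log-in path (LDAP bind, DB lookup and so on)
    time.sleep(0.05)
    return True

if __name__ == "__main__":
    authenticate_user("jsmith")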

Examining log files, the first line of defence for many IT teams, may help you to feel as though you are doing something, but it is yet another time-consuming process that rarely reveals any meaningful clues.

Virtualisation vendor tools are great for monitoring the virtual environment, but they can’t relate it to the physical environment. This means that metrics such as response time or transaction time, collected by a guest operating system, will not be accurate, as they do not provide an end-to-end view of the application and its transactions.

The holistic need

Rather than turning to piecemeal measures to solve application performance issues, enterprises running virtualised environments must take a more holistic approach. The key is, first, to measure application performance from the end user’s viewpoint rather than from an internal IT perspective and, second, to integrate this analysis with a top-down view of the business impact of any issues. This ensures that IT resources are directed towards the most important issues first.

At the same time, IT priorities should never be considered carved in stone. IT needs a way to dynamically prioritise issues based on location, time of day, day of the week, number of slow transactions, number of users affected and many other business rules.
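As a simple illustration of that kind of rule-based prioritisation, the sketch below scores issues against a few business rules (location, time of day, slow-transaction count). The rules, weights and field names are hypothetical, not a description of any particular product.

# A minimal sketch of rule-based issue prioritisation.
# Rules, weights and field names are hypothetical.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Issue:
    location: str
    slow_transactions: int
    users_affected: int
    detected_at: datetime

def priority_score(issue):
    """Combine business rules into a single score; higher means fix first."""
    score = issue.slow_transactions * 1.0 + issue.users_affected * 2.0
    if issue.location == "checkout":          # revenue-critical location
        score *= 3.0
    if 9 <= issue.detected_at.hour < 17:      # business-hours weighting
        score *= 1.5
    return score

issues = [
    Issue("checkout", 40, 120, datetime(2011, 11, 4, 10, 30)),
    Issue("reporting", 200, 5, datetime(2011, 11, 4, 2, 15)),
]

for issue in sorted(issues, key=priority_score, reverse=True):
    print(f"{issue.location}: score {priority_score(issue):.0f}")

With these example weights, the checkout slowdown during business hours outranks the overnight reporting issue even though the latter has more slow transactions, which is the point of letting business rules drive the ordering.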

What’s clear is that in a virtualised environment that sits on top of an infrastructure made up of many different pieces, from many different vendors, traditional systems management no longer applies. Instead, we now need to analyse and predict application performance at the transaction level across all production environments, and to probe deeply into the various technology domains: Java and .NET applications; network, server and mainframe levels; and transactions between virtual machines.

To meet this challenge, a new class of application performance management (APM) solutions has emerged. Designed for the virtualised environment, these solutions quickly pinpoint the root cause of any issue and determine its true business impact. Their role begins prior to virtualisation with the capture of baseline application performance metrics, an essential task for the future validation of success.

They monitor performance from the end user’s perspective and can provide end-to-end analysis of any offending transaction. Critically, issues can be traced and diagnosed from a specific user through all the network and server infrastructure, enabling isolation of the root cause of problems, no matter how intermittent or elusive they may be.
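One common way to achieve that kind of end-to-end view, sketched below purely for illustration (this is not Compuware’s implementation), is to tag each user transaction with a correlation ID, have every tier record its own timing against that ID and stitch the records together afterwards. All tiers, names and timings here are hypothetical.

# A minimal sketch of correlation-ID transaction tracing.
# Tiers, names and simulated workloads are hypothetical.

import time
import uuid
from collections import defaultdict

trace_store = defaultdict(list)   # correlation_id -> list of (tier, duration_ms)

def record(correlation_id, tier, duration_ms):
    trace_store[correlation_id].append((tier, duration_ms))

def handle_login(username):
    """Simulated transaction touching three tiers, all tagged with one ID."""
    cid = str(uuid.uuid4())
    for tier, work_seconds in (("web", 0.01), ("ldap", 0.08), ("database", 0.02)):
        start = time.perf_counter()
        time.sleep(work_seconds)                     # stand-in for real work
        record(cid, tier, (time.perf_counter() - start) * 1000)
    return cid

if __name__ == "__main__":
    cid = handle_login("jsmith")
    slowest = max(trace_store[cid], key=lambda item: item[1])
    print(f"Transaction {cid}: slowest tier is {slowest[0]} at {slowest[1]:.0f} ms")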

Compared to the war room of old, application performance management systems significantly cut the time and resources required to fix problems. They process application metrics through business rules to generate a real-world set of priorities for IT and they deliver objective facts that eliminate finger pointing and improve collaboration between teams.

*Rafi Katanasho is Application Performance Management Director for Compuware Australia and New Zealand. He is responsible for deploying IT best-practice solutions for Compuware’s Gomez platform, a solution for optimising the performance, availability and quality of web, non-web, mobile, streaming and cloud applications. Katanasho has more than 17 years’ experience in the IT industry, in senior sales management, consulting and support roles.
