To gather insights on the current and future state of performance testing and tuning, we talked to 14 executives who work in the field. We asked them, "What are some real-world problems being solved by performance testing and tuning?" Here's what they told us:
Performance
- "Any flakiness in mobile apps due to poor bandwidth, poor device. We’ve helped a professional social media platform with latent, non-responsive apps; a major airline with their mobile and web applications; rich media companies with streaming difficulties; and banks trying to facilitate the transactions of their customers."
- "We’ve helped some of the world’s largest companies improve the performance of their infrastructure while understanding what they’re trying to accomplish with their apps. This includes companies such as: Cisco, Go Daddy, GE, T-Mobile, MetLife, AT&T, Boeing, Dell EMC, Hewlett Packard, LinkedIn, New York Life, Oracle, and Unilever."
- "Real-world problems being solved include loss of productivity and efficiency due to IT latencies. So many organizations rely on their employees’ ability to rapidly and reliably access email, applications, web, and other business-critical IT components. Any problems in these areas are a huge blow to productivity, especially if you don’t have an intelligent performance monitoring solution in place. Without proactive monitoring, you spend a lot more time troubleshooting and trying to pinpoint the problem than actually fixing it."
- "I'm unable to get into the specifics as there are confidentiality concerns. However, a recent example was a scenario where it was desirable to run a particular workload virtualized that had always been run on bare metal before. The application in question was complex with multiple tiers and often ran in a distributed fashion. The intent was to allow it to be hosted for external entities as well as the usual ability to scale up or down as demand required. Unfortunately, the application in question also had a 50-80% slowdown versus bare metal when run virtualized. There were multiple root causes for the slowdown affecting many aspects of the system. Through analysis, each was identified in turn and mitigated or eliminated where possible. In some cases, it was possible to tune for it but in others it was necessary to modify the virtualization tooling, the machine emulator and the kernel's support for virtual machines. Ultimately, it was possible to get all the test cases running within 5% of bare metal. Another example involved a deployment of an application that had extreme low-latency requirements and had a SLA for maximum latency. The challenge was to tune the system for both low latency and to reduce outside interference to acceptable levels. That had to be done without using as real-time kernels prioritize deterministic performance which is not necessarily the same as high performance."
Scalability
- "We had a social media client that wanted to launch in a week, had a big budget for advertising, and anticipated having 500,000 users in two months. However, they had only tested their application with 10 users. We created a test database to see how the application would function with 500,000 users and it took the home page three hours to load. The client delayed the launch and we re-architected the features and queries to accelerate response. Benchmarking and capacity planning is very important."
- "Regardless of the application, the biggest concern with any development is, 'How well does it scale?' and this is exactly what we tackle during our performance testing."
- "We helped a group that rescued wild animal send notifications to a much broader network of volunteers. We use the same technology for healthcare companies to reach nurses in emergency situations. Cold storage temperature monitoring to avoid waste if the temperature gets too high. We strive to provide faster resolution of incidents. We deal with a lot of burst traffic. Provide faster MTTR. "
- "[We worked with] Citibank’s digital channel to increase engagement with mobile and web to increase the number of transactions, reduce time to market, good quality, good performance, and improve the UX. Lockheed Martin Orion space program with a digital cockpit. Get the experience right — functional and timely in a stressful environment."
- "The most obvious problems are related to general site availability and reliability under load. Many of our customers start with us in response to an event that has already happened to them in production. Application servers crashing under load for example. The more proactive customers are testing in advance of an event, such as a sale or high-volume period, for example. Quite often the real-world problems are the same — the difference is when you find out about it, in production or in test."
Security
- "DNS testing can reveal attacks or incidents with DNS resolution. We recently helped a company identify a DNS attack on their authoritative DNS. The connection errors reported by Catchpoint’s DNS tests were researched and the attack was uncovered. Being able to identify attacks earlier reduces the impact they have on end users and the organization as a whole. The faster attacks can be identified, the quicker mitigation strategies can be put in place. Today, applications use a variety of third-party elements and tags. Visitors to a site don’t care whether a slow down or failure is due to a third party or content controlled by the site owner. Synthetic testing can help organizations identify poor performing tags and quickly resolve performance issues. On Thanksgiving Day some retailers experienced performance slowdowns due to poor performance of a third-party tag. This tag slowed down performance of the site overall. With monitoring, once a poor performing tag is identified organizations can take steps to remove the tag to improve site performance."
Other
"1) Retail has more hybrid apps while they still host their own ERP and POS systems in internal data centers. Everything else (CDNs, DDOS) need tuning and optimization. 2) Online businesses and educational video delivery on-demand application experience. 3) Modern video delivery with movies streamed to movie theaters, must understand all component communications."
- "We provide full-stack monitoring across the entire development and production process. [With] auto-log monitoring, we are able to see full specification data and are able to detect more problems related to connections and architecture. We have automated instrumentation of all processes, containers, and networks. We use AI to identify problems and root causes. Performance engineering is able to leverage this data. We can see more problems in the cloud and in microservices and we’re able to integrate with legacy systems."
- "We’re a DevOps shop so we’re making changes every two weeks. Since we cannot foresee the impact of the changes we are making we need to check production, understand if the environment has changed, analyze quickly, and restore to normal or stop production. A lot of trial and error."
Here’s who we spoke to:
- Dawn Parzych, Director of Product and Solution Marketing, Catchpoint Systems Inc.
- Andreas Grabner, DevOps Activist, Dynatrace
- Amol Dalvi, Senior Director of Product, Nerdio
- Peter Zaitsev, CEO, Percona
- Amir Rosenberg, Director of Product Management, Perfecto
- Edan Evantal, VP, Engineering, Quali
- Mel Forman, Performance Team Lead, SUSE
- Sarah Lahav, CEO, SysAid
- Antony Edwards, CTO and Gareth Smith, V.P. Products and Solutions, TestPlant
- Alex Henthorn-Iwane, V.P. Product Marketing, ThousandEyes
- Tim Koopmans, Flood IO Co-founder & Flood Product Owner, Tricentis
- Tim Van Ash, S.V.P. Products, Virtual Instruments
- Deepa Guna, Senior QA Architect, xMatters