Thanks to our increasing dependence on technology and the growth in Big Data and the Internet of Things, the reliability of hardware and software is more crucial than ever. Together, they form the foundation for an organization’s Line of Business (LOB) applications and provide the critical stability needed to support ‘always on’ physical, cloud and Edge environments. So, what do the current risks look like, and how can we learn to mitigate them?
Hardware is getting more reliable
As reliance increases, server hardware and operating systems are getting more robust, as a recent survey by consultants ITIC testifies. They found that market-leading IBM z Systems Enterprise servers had just eight seconds of ‘blink and you’ll miss it’ downtime a month on average.
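To put eight seconds a month in context, it helps to convert it into the availability percentage that uptime targets are usually quoted in. The snippet below is a rough sketch of that arithmetic, not part of the ITIC survey: the function names are our own, and it assumes a 30-day month.

```python
# Assumption: a 30-day month, so 2,592,000 seconds in total.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def availability_percent(downtime_seconds_per_month: float) -> float:
    """Convert average monthly downtime into an availability percentage."""
    return 100.0 * (1 - downtime_seconds_per_month / SECONDS_PER_MONTH)


def annual_downtime_seconds(downtime_seconds_per_month: float) -> float:
    """Scale average monthly downtime up to a full year."""
    return downtime_seconds_per_month * 12


if __name__ == "__main__":
    monthly_downtime = 8  # seconds per month, the figure quoted above
    print(f"Availability: {availability_percent(monthly_downtime):.5f}%")
    # prints "Availability: 99.99969%"
    print(f"Downtime per year: {annual_downtime_seconds(monthly_downtime):.0f} seconds")
    # prints "Downtime per year: 96 seconds"
```

On those assumptions, eight seconds a month works out at roughly 99.9997% availability, comfortably inside the ‘five nines’ (99.999%) target, which allows around 26 seconds of downtime in a 30-day month.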
Hardware may be getting more resilient, but that doesn’t mean that datacenter outages are falling – and, with our ever-increasing reliance on connectivity, their effects are getting more catastrophic. Denial of Service (DoS) attacks, for instance, accounted for just 2% of unplanned downtime in 2010, rising to 22% in 2016.
So what’s the real risk?
Human error and security are now the biggest causes of downtime, and so the biggest factors affecting server and operating system reliability. Of those surveyed by ITIC, 80% said that human error (such as misconfiguration or underestimating server workloads when right-sizing) was the biggest risk to performance. That’s backed by various other studies, including the Ponemon Institute’s 2016 report on the cost of downtime, which attributed 70% of datacenter outages to active (deliberate) or inactive (non-deliberate) actions by staff.
Could simulation be the answer?
