Specifically,
there are three key pieces of industry guidance that go some way to
assisting the understanding of resilience: Cobit 5, ITIL v3 and the
US FFIEC IT Examination Handbook.
Cobit 5
Cobit 5,
as part of managing critical IT assets (Cobit 5 - BAI09.02) and
maintaining a continuity strategy (Cobit 5 - DSS04.02), states [i]:
- Maintain the resilience of critical assets by applying regular preventive maintenance, monitoring performance, and, if required, providing alternative and/or additional assets to minimise the likelihood of failure; and
- Assess the likelihood of threats that could cause loss of business continuity and identify measures that will reduce the likelihood and impact through improved prevention and increased resilience.
ITIL v3
The IT
Infrastructure Library v3 (ITIL v3) defines resilience as “the
ability of a Configuration Item or IT Service to resist Failure or to
Recover quickly following a Failure. For example an armoured cable
will resist failure when put under stress.”ii
ITIL
provides further guidance in Service Operation, highlighting that
“resilience is designed and built into the system, for example
multiple redundant disks or multiple processors. This protects the
system against hardware failure since it is able to continue
operating using the duplicated hardware component.”iii
ITIL v3
also provides guidance on software resilience,
recommending that “software, data and operating system resilience is
also designed into the system, for example mirrored databases (where
a database is duplicated on a backup device) and disk-striping
technology (where individual bits of data are distributed across a
disk array – so that a disk failure results in the loss of only a
part of data, which can be easily recovered using algorithms)…
setting up and using virtualization systems to allow movement of
processing around the infrastructure to give better
performance/resilience in a dynamic fashion.”iv
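To make the "recovered using algorithms" idea concrete, here is a minimal sketch of parity-based recovery. It assumes an XOR parity scheme (as used in RAID 5 style arrays), which the ITIL text does not name; the block contents and variable names are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks striped across three disks, plus one parity block on a fourth.
# (Hypothetical data; a real array works on much larger blocks.)
data_blocks = [b"RESI", b"LIEN", b"CE!!"]
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'LIEN'
```

XOR-ing the surviving data blocks with the parity block cancels out everything except the missing block, which is why a single disk failure costs no data in this kind of layout.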
ITIL v3
defines fault tolerance as “the ability of an IT service or other
configuration item to continue to operate correctly after failure of
a component part.”v
ITIL v3
defines a countermeasure as referring to “any type of control. The
term is most often used when referring to measures that increase
resilience, fault tolerance or reliability of an IT service.”vi
ITIL v3
defines redundancy as “the use of one or more additional
configuration items to provide fault tolerance. The term also has a
generic meaning of obsolescence, or no longer needed.”vii
ITIL v3
defines high availability as “an approach or design that minimizes
or hides the effects of configuration item failure from the users of
an IT service. High availability solutions are designed to achieve an
agreed level of availability and make use of techniques such as fault
tolerance, resilience and fast recovery to reduce the number and
impact of incidents.”viii
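A rough way to see how redundancy helps reach an "agreed level of availability" is the standard series/parallel availability arithmetic. The sketch below assumes component failures are independent and uses made-up availability figures of 99% per component; neither assumption comes from ITIL.

```python
def series_availability(availabilities):
    """All components must be up, so multiply the individual availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel_availability(availability, copies):
    """The tier is up if at least one of `copies` independent copies is up."""
    return 1.0 - (1.0 - availability) ** copies


HOURS_PER_YEAR = 24 * 365

# One application server in front of one database (illustrative figures).
single = series_availability([0.99, 0.99])

# The same two tiers, each duplicated.
redundant = series_availability([
    parallel_availability(0.99, 2),
    parallel_availability(0.99, 2),
])

for label, a in [("single", single), ("redundant", redundant)]:
    downtime = (1.0 - a) * HOURS_PER_YEAR
    print(f"{label}: availability={a:.4%}, expected downtime={downtime:.1f} h/year")
```

Real components rarely fail independently (shared power, shared software bugs), so treat figures like these as an upper bound rather than a forecast.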
FFIEC
The
FFIEC IT Examination Handbook defines resiliency as “the ability of
an organization to recover from a significant disruption and resume
critical operations” and resiliency testing as “testing of an
institution’s business continuity and disaster recovery resumption
plans.”ix
So what is IT Resilience?
From the
preceding literature review of industry guidance, resilience
comprises the following:
- Failure risk assessment and preventative countermeasures
- Rapid incident detection and response
- Recovery and countermeasure improvement
In practice,
this means performing IT failure risk assessments at an end-to-end
service, application and infrastructure level (i.e. a business service
is delivered through applications hosted on infrastructure). These risk
assessments are then used to design and implement preventative
countermeasures.
The countermeasures
you’d expect to see are redundancy, clustering, load
balancing, fault tolerance or automatic failover switching features
built into the architecture, leaving no single points of failure.
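As a minimal sketch of how "no single points of failure" could be checked, the example below models a service as a set of tiers, each needing at least one working component. The tier names, component names and the model itself are hypothetical simplifications, not something taken from the guidance above.

```python
def service_is_up(tiers, failed):
    """A service is up when every tier still has at least one working component."""
    return all(
        any(component not in failed for component in components)
        for components in tiers.values()
    )


def single_points_of_failure(tiers):
    """Return the components whose individual failure takes the service down."""
    all_components = {c for components in tiers.values() for c in components}
    return sorted(c for c in all_components if not service_is_up(tiers, {c}))


# Hypothetical business service: applications hosted on infrastructure.
payments_service = {
    "load_balancer": ["lb-1", "lb-2"],
    "application": ["app-1", "app-2", "app-3"],
    "database": ["db-primary"],  # no replica, so this is a single point of failure
    "network_link": ["wan-a", "wan-b"],
}

print(single_points_of_failure(payments_service))  # ['db-primary']
```

Adding a second database instance to the "database" tier makes the function return an empty list, which is the outcome a failure risk assessment would be aiming for.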
When an
incident occurs that affects either the assessed risks or the actual
resilience features in the architecture, you’d expect it to be
detected early, and you’d expect a well-rehearsed, tested and informed
incident management process to respond to the incident and ensure
recovery of resilience features.
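Early detection and automatic failover can be sketched as a monitor that counts consecutive failed health checks. This is an illustrative toy, assuming a threshold of three missed checks and simulated check results instead of real probes; the class and node names are hypothetical.

```python
class FailoverMonitor:
    """Tracks health-check results and switches to a standby after repeated failures."""

    def __init__(self, primary, standby, threshold=3):
        self.active = primary
        self.standby = standby
        self.threshold = threshold
        self.consecutive_failures = 0
        self.events = []

    def record(self, healthy):
        """Feed in one health-check result for the currently active node."""
        if healthy:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and self.standby is not None:
            self.events.append(f"failover: {self.active} -> {self.standby}")
            # After failover there is no standby left until someone restores one.
            self.active, self.standby = self.standby, None
            self.consecutive_failures = 0


monitor = FailoverMonitor("db-primary", "db-standby", threshold=3)
for result in [True, False, True, False, False, False, True]:
    monitor.record(result)

print(monitor.active)  # db-standby
print(monitor.events)  # ['failover: db-primary -> db-standby']
```

Note that after the switch the service is running without a standby. Restoring that spare capacity is the "recovery of resilience features" that the incident management process needs to follow through on.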
Finally,
you’d expect appropriate recovery options to be available to
support rapid recovery, such as up-to-date backups, fully
tested disaster recovery sites and well-tested IT business continuity
plans.
Endnotes
i. ISACA, Cobit 5 - Enabling Processes, United States, 2012. Available at: http://www.isaca.org/COBIT/Pages/COBIT-5-Enabling-Processes-product-page.aspx (Accessed 6 March 2014).
ii. AXELOS Limited, ITIL glossary and abbreviations, United Kingdom, 2011. Available at: http://www.itil-officialsite.com/InternationalActivities/ITILGlossaries_2.aspx (Accessed 6 March 2014).
iii. Ibid.
iv. Ibid.
v. Ibid.
vi. Ibid.
vii. Ibid.
viii. Ibid.
ix. Federal Financial Institutions Examination Council, The FFIEC IT Examination Handbook - Glossary, United States of America, 2006. Available at: http://ithandbook.ffiec.gov/glossary.aspx (Accessed 6 March 2014).
PS: This is also published in the IT Risk Practitioner