Tuesday, 11 March 2014

Financial services systems change failures and how to control them

The financial services industry has seen a number of notable failures caused by systems change:

January 2009: It was reported that IT systems engineer Rajendrasinh B. Makwana almost brought down 4,000 critical servers at Fannie Mae with a logic bomb embedded in scripts. The logic bomb, which could have cost Fannie Mae “many millions of dollars”, was only discovered by chance by another engineer[i].

January 2010: It was reported that an HSBC mainframe upgrade, carried out as part of the move to the One HSBC platform, shut down cash machines and online banking for HSBC customers[ii]. This followed a similar outage in June 2009 and a telephone banking outage in February 2008 caused by “coding” changes[iii].

September 2010: It was reported that J.P. Morgan’s online banking service was offline for 3 days, affecting 16 million customers, due to third-party database software “corrupting the login process”[iv]. J.P. Morgan reportedly appeared to have no roll-back plan that would have let it recover while continuing business as normal[v].

June 2012: It was reported that the Royal Bank of Scotland would pay £125 million in costs related to a glitch in its CA7 batch process scheduler during systems maintenance, which left 12 million customer accounts frozen for almost a week[vi].

August 2012: It was reported that Knight Capital Group lost $440 million in 30 minutes, and saw 62% wiped off its share price, after a glitch in its trading software algorithm generated erratic trades that bought high and sold low in nearly 150 stocks[vii]. The glitch resulted in 4 million trades in 550 million shares that would not otherwise have occurred[viii].

August 2013: It was reported that Goldman Sachs lost $100 million after an automated trading system glitch caused a number of incorrect options trades, disrupting trading on US exchanges in securities with ticker symbols beginning with the letters H through L[ix]. The glitch caused the automated trading systems to mistakenly send indications of interest to US exchanges as real orders to be filled. The cause was reported to be inadequate software testing[x].

September 2013: It was reported that Clydesdale Bank was fined £8.9 million by the Financial Conduct Authority for failing to inform customers of their rights after a software glitch caused repayments on over 42,500 mortgages to be miscalculated[xi].

Risk and associated controls

A good, actionable risk statement that captures these events is:

“Customer data leakage, corruption or system unavailability caused by defective or malicious system changes, resulting in financial losses of £100 million, customer churn of 6.4 percent[xii] and regulatory sanction by the Financial Conduct Authority and Information Commissioner’s Office.”

This is a lower-level risk that contributes to an organisational-level risk such as:

“Loss of market share caused by eroded customer confidence in the organisation’s information security, resulting in a net revenue reduction of the order of hundreds of millions and a fall in the bank’s share value from loss of market confidence in its operational management.”

From the lower-level risk statement we can then identify the risk causes that need to be controlled: in this case, defective or malicious system changes that might result in customer data leakage, corruption or system unavailability.
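One way to keep risk statements actionable is to record them as separate event, cause and consequence elements, so the causes that need controls can be read straight off the register entry. The sketch below is illustrative only: the `RiskStatement` class, its field names and its validation rule are hypothetical, not part of any particular risk management standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskStatement:
    """Hypothetical event-cause-consequence record for a risk register entry."""

    events: tuple[str, ...]
    causes: tuple[str, ...]
    consequences: tuple[str, ...]

    def __post_init__(self) -> None:
        # An actionable statement needs at least one of each element.
        for name in ("events", "causes", "consequences"):
            if not getattr(self, name):
                raise ValueError(f"risk statement needs at least one {name[:-1]}")

    @staticmethod
    def _join(items: tuple[str, ...], conjunction: str) -> str:
        # Produces "a", "a or b", "a, b or c" (no serial comma).
        if len(items) == 1:
            return items[0]
        return ", ".join(items[:-1]) + f" {conjunction} " + items[-1]

    def render(self) -> str:
        # Sentence form: "<events> caused by <causes>, resulting in <consequences>."
        return (
            f"{self._join(self.events, 'or')} caused by "
            f"{self._join(self.causes, 'or')}, resulting in "
            f"{self._join(self.consequences, 'and')}."
        )


change_risk = RiskStatement(
    events=("Customer data leakage", "corruption", "system unavailability"),
    causes=("defective system changes", "malicious system changes"),
    consequences=(
        "financial losses of £100 million",
        "customer churn of 6.4 percent",
        "regulatory sanction by the FCA and ICO",
    ),
)

# Each cause becomes a candidate for one or more controls.
for cause in change_risk.causes:
    print("Needs a control:", cause)
print(change_risk.render())
```

Keeping the causes as separate items, rather than burying them in free text, makes it easier to check later that every cause has at least one control mapped to it.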

To take these in turn, we first need a change quality testing process to ensure that system changes are adequately tested. This may include code quality reviews and unit, functional, system, integration and regression testing. For systems that support business processes, an additional step is user acceptance testing by the business, including tests of boundary conditions and of invalid inputs at the system’s data input interfaces.
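To illustrate what tests for boundary conditions and invalid inputs might look like, the sketch below uses a hypothetical mortgage repayment function. It is a nod to the Clydesdale case, not a reconstruction of that bank’s system: the function name `monthly_repayment` and its validation rules are assumptions for the example, while the calculation itself is the standard annuity repayment formula.

```python
import math
import unittest


def monthly_repayment(principal: float, annual_rate: float, term_months: int) -> float:
    """Hypothetical monthly repayment using the standard annuity formula.

    Invalid inputs raise ValueError rather than silently producing a wrong figure.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(term_months, bool) or not isinstance(term_months, int) or term_months <= 0:
        raise ValueError("term_months must be a positive whole number of months")
    if not math.isfinite(principal) or principal <= 0:
        raise ValueError("principal must be a positive, finite amount")
    if not math.isfinite(annual_rate) or annual_rate < 0:
        raise ValueError("annual_rate must be a non-negative, finite rate")
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        # Boundary condition: with zero interest the principal is simply spread evenly.
        return principal / term_months
    return principal * r / (1 - (1 + r) ** -term_months)


class RepaymentBoundaryTests(unittest.TestCase):
    def test_zero_rate_boundary(self):
        self.assertAlmostEqual(monthly_repayment(120_000, 0.0, 240), 500.0)

    def test_single_month_boundary(self):
        # One-month term at 12% a year: principal plus one month's 1% interest.
        self.assertAlmostEqual(monthly_repayment(1_000, 0.12, 1), 1_010.0)

    def test_interest_increases_repayment(self):
        self.assertGreater(monthly_repayment(120_000, 0.05, 240), 500.0)

    def test_invalid_inputs_rejected(self):
        invalid = [
            (0, 0.05, 240),                # zero principal
            (-1_000, 0.05, 240),           # negative principal
            (100_000, -0.01, 240),         # negative rate
            (100_000, float("nan"), 240),  # unparseable rate that slipped through as NaN
            (100_000, 0.05, 0),            # zero term
            (100_000, 0.05, 12.5),         # fractional term
        ]
        for args in invalid:
            with self.subTest(args=args), self.assertRaises(ValueError):
                monthly_repayment(*args)
```

The tests can be run with `python -m unittest` against the module that contains them. The point is less the arithmetic than the habit: every input interface gets tests at its edges (zero, one, the smallest and largest valid values) and tests that confirm bad data is rejected rather than processed.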

Next, we need a change control strategy that uses technical and administrative controls to prevent changes to production or critical systems unless those changes have been approved. Approval should not be a simple tick in a box: it should require sign-off by appropriately senior stakeholders, with high-risk changes signed off at senior executive level within both the IT and business areas. As part of this sign-off, approvers should confirm that they have assured themselves that the change has been adequately tested and is fit for purpose.
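A technically enforced approval gate can encode the rule that higher-risk changes need more senior sign-off, plus an explicit confirmation of testing, before they can proceed. The sketch below is hypothetical throughout: the `RiskTier` and `Seniority` levels, the mapping between them, the assumption that every change needs both an IT and a business approver, and the `ChangeRequest` fields are all illustrative, not a description of any particular change management product.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Seniority(IntEnum):
    # Hypothetical approver levels, ordered from least to most senior.
    TEAM_LEAD = 1
    HEAD_OF_SERVICE = 2
    SENIOR_EXECUTIVE = 3


class RiskTier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed policy: minimum approver seniority required in each area, per risk tier.
REQUIRED_SENIORITY = {
    RiskTier.LOW: Seniority.TEAM_LEAD,
    RiskTier.MEDIUM: Seniority.HEAD_OF_SERVICE,
    RiskTier.HIGH: Seniority.SENIOR_EXECUTIVE,
}


@dataclass(frozen=True)
class Approval:
    approver: str
    area: str               # "IT" or "business"
    seniority: Seniority
    confirms_tested: bool   # approver is assured the change is tested and fit for purpose


@dataclass
class ChangeRequest:
    change_id: str
    risk: RiskTier
    approvals: list[Approval] = field(default_factory=list)


def approval_problems(change: ChangeRequest) -> list[str]:
    """Return the reasons a change may not proceed; an empty list means approved."""
    needed = REQUIRED_SENIORITY[change.risk]
    problems = []
    for area in ("IT", "business"):
        valid = [
            a for a in change.approvals
            if a.area == area and a.seniority >= needed and a.confirms_tested
        ]
        if not valid:
            problems.append(f"no {area} approval at {needed.name} level confirming testing")
    return problems


# Usage: a high-risk change with only a head-of-service business approver is held back.
change = ChangeRequest("CHG-001", RiskTier.HIGH, [
    Approval("alice", "IT", Seniority.SENIOR_EXECUTIVE, True),
    Approval("bob", "business", Seniority.HEAD_OF_SERVICE, True),
])
print(approval_problems(change))
# → ['no business approval at SENIOR_EXECUTIVE level confirming testing']
```

Returning a list of reasons, rather than a bare yes or no, gives the change board an audit trail of why a change was held.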

A further control is needed to make these two controls work: technically enforced separation of duties, so that those who make changes cannot implement them in the target environment.
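In a deployment pipeline, separation of duties is commonly enforced by refusing any release to a protected environment where the person triggering it also contributed to the change. The sketch below assumes a hypothetical `Deployment` record, a hypothetical set of protected environments and a hypothetical list of release staff; in practice the identities would come from the version control and change management systems, and the check would have to run somewhere the change authors cannot modify it.

```python
from dataclasses import dataclass


class SeparationOfDutiesError(Exception):
    """Raised when a release would breach separation of duties."""


# Hypothetical configuration: environments where the rule applies and the
# release staff (outside the development team) allowed to deploy to them.
PROTECTED_ENVIRONMENTS = frozenset({"production"})
RELEASE_STAFF = frozenset({"ops.dana", "ops.eli"})


@dataclass(frozen=True)
class Deployment:
    change_id: str
    target_env: str
    authors: frozenset[str]  # everyone who wrote code or scripts in the change
    deployer: str            # the person triggering the release


def authorise_deployment(d: Deployment) -> None:
    """Raise SeparationOfDutiesError unless the deployment respects separation of duties."""
    if d.target_env not in PROTECTED_ENVIRONMENTS:
        return
    if d.deployer in d.authors:
        raise SeparationOfDutiesError(
            f"{d.deployer} contributed to {d.change_id} and cannot deploy it to {d.target_env}"
        )
    if d.deployer not in RELEASE_STAFF:
        raise SeparationOfDutiesError(
            f"{d.deployer} is not authorised to deploy to {d.target_env}"
        )


# Usage: a release engineer deploying someone else's change is allowed...
authorise_deployment(Deployment("CHG-101", "production", frozenset({"dev.sam"}), "ops.dana"))

# ...but a release engineer who also wrote part of the change is blocked.
try:
    authorise_deployment(
        Deployment("CHG-102", "production", frozenset({"dev.sam", "ops.dana"}), "ops.dana")
    )
except SeparationOfDutiesError as err:
    print("Blocked:", err)
```

Checking authorship as well as role matters because operations staff also write scripts: a release engineer who embeds their own script in a change should not be the one to release it.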

To ensure that these controls are adequately and effectively implemented, there need to be clearly articulated and enforceable policies, standards, procedures and guidelines in place. Policies and standards need to be clear and unambiguous, have an owner and describe the enforcement actions that will be taken if they are not complied with. These enforcement actions must then be applied in every case of non-compliance. Where non-compliance is anticipated, it needs to be pre-approved by the policy owner, clearly highlighted to the system’s senior stakeholders and approved at the appropriate senior executive level within the technology and business areas involved in the change.
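The rule that every non-compliance either has a valid pre-approval or triggers the policy’s stated enforcement action lends itself to an automated check against an exceptions register. The sketch below is hypothetical: the `Policy` and `Waiver` records, the expiry date on waivers, the escalation to the policy owner and the example policy are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    policy_id: str
    owner: str
    enforcement_action: str  # what happens on any unapproved non-compliance


@dataclass(frozen=True)
class Waiver:
    """A pre-approved, time-limited exception to a policy (hypothetical record)."""

    policy_id: str
    system: str
    approved_by_owner: bool
    approved_by_senior_exec: bool
    stakeholders_notified: bool
    expires: date


def assess_non_compliance(policy: Policy, system: str, waivers: list[Waiver], today: date) -> str:
    for w in waivers:
        if (
            w.policy_id == policy.policy_id
            and w.system == system
            and w.approved_by_owner
            and w.approved_by_senior_exec
            and w.stakeholders_notified
            and today <= w.expires
        ):
            return f"Permitted under waiver until {w.expires.isoformat()}"
    # No valid pre-approval: the stated enforcement action applies in every case.
    return f"Non-compliant: {policy.enforcement_action} (escalate to {policy.owner})"


change_policy = Policy("CHG-POL-01", "Head of IT Risk", "change rolled back and incident raised")
waivers = [Waiver("CHG-POL-01", "payments", True, True, True, date(2014, 6, 30))]

print(assess_non_compliance(change_policy, "payments", waivers, date(2014, 3, 11)))
# → Permitted under waiver until 2014-06-30
print(assess_non_compliance(change_policy, "mortgages", waivers, date(2014, 3, 11)))
# → Non-compliant: change rolled back and incident raised (escalate to Head of IT Risk)
```

Requiring all three approvals on the waiver, and letting it lapse on a fixed date, stops a one-off exception from quietly becoming the norm.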


[i] Keizer, G., Ex-Fannie Mae engineer pleads innocent to server bomb charge, United States, January 2009. Available at: (Accessed 6 March 2014).
[ii] HSBC mainframe outage causes major HSBC network crash, United States, January 2010. Available at: (Accessed 11 March 2014).
[iii] Ibid.
[v] Ibid.
[vi] Flinders, K., RBS computer problem costs £125m, United States, August 2012. Available at:
[vii] Philips, M., Knight Shows How to Lose $440 Million in 30 Minutes, United States, August 2012. Available at:
[viii] Ibid.
[ix] Holley, E., Goldman Sachs trading error is “a warning to all”, United States, August 2013. Available at:
[x] Ibid.
[xi] Nguyen, A., Clydesdale Bank fined £8.9m over mortgage system problem, United Kingdom, September 2013. Available at: (Accessed 11 March 2014).
[xii] The 6.4% customer churn figure is from: Ponemon Institute, 2011 Cost of Data Breach Study: United Kingdom, United Kingdom, March 2012.