Our Ref.: B1/15C
B9/29/2C
7 January 2008
The Chief Executive
All Authorized Institutions
Dear Sir/Madam,
Examinations on Controls over Information
Technology (IT) Problem and System Change Management
The Hong Kong Monetary Authority (HKMA) has recently completed a round of on-site examinations of selected authorized institutions (AIs) on their controls over IT problem and system change management. The examination results indicate that there is room for improvement in certain aspects of the examined AIs' controls over IT incident management and the implementation of emergency system changes.
Specifically, our examinations found inadequacies in the processes for reporting, risk assessment, escalation and rectification of IT problems. For instance, some AIs underestimated the risk implications, or incorrectly categorised the severity, of certain reported IT incidents. As a consequence, lower priority and insufficient resources were assigned to identifying and rectifying the root causes of these incidents, a step that is intended to prevent them from recurring or evolving into a major system disruption. In addition, control weaknesses were identified in the implementation of emergency changes to production systems and IT infrastructure. Some emergency system changes were not supported by formal documentation, and access to and use of high-privilege user and system IDs were not adequately controlled. These control weaknesses increase the risk of unauthorised access to the production system environment and thus the likelihood of system disruptions.
To help the banking sector improve its controls over IT problem and system change management, I set out for your reference a list of the major common issues identified in Annex 1, and some good practices adopted by the examined AIs in Annex 2. I would also like to take this opportunity to remind your institution of the need to regularly assess the adequacy of the IT problem and system change controls within your operating environment.
Should you have any questions
about the content of this circular, please contact Mr. Shu-pui Li at
2878-1826 or Mr. Nelson Chow at 2878-1470.
Yours faithfully,
Arthur Yuen
Executive Director
(Banking Supervision)
Annex 1 – Common Control Issues Identified
Reporting, risk assessment, escalation and management of IT problems
Delays in IT problem resolution and recurrence of incidents
- Delays in replacing a faulty system component eventually caused a major system disruption incident.
- Although some reported IT incidents were found to have affected customer services, such as outages of ATM and phone banking services, these incidents were assigned only a low severity level. This resulted in delays in problem resolution.
- A number of recurring IT incidents are believed to have been caused by insufficient testing before system implementation.
Misleading problem trend analysis reports
- Misclassification of the severity of IT problems (e.g. a lower severity level assigned to severe incidents) and misidentification of their root causes may result in misleading problem trend analysis.
- Some AIs do not use automated tools for IT problem reporting and management. In some cases, problem records are maintained manually, which has resulted in incomplete records.
System change requests and implementation
Inappropriate timing for scheduled changes to critical systems
Handling of high-privilege IDs for change implementation
- A few AIs do not monitor and review the usage of high-privilege user and/or system IDs, particularly after change implementation. In addition, access attempts to firewalls made by IT support staff are not reviewed. As a result, unauthorised changes to critical network infrastructure, and/or errors made during routine maintenance of it, might not be detected promptly.
Inadequate emergency change request process
Annex 2 – Good practices adopted by certain AIs
Senior management oversight
- Several AIs produce regular, good-quality problem statistics and trend analysis reports (including categorisation and detailed root cause analysis of the incidents) for review by senior management.
- A number of AIs have established a dedicated change management committee or function to review and prioritise system and infrastructure change requests. Such dedicated functions help ensure that scheduled changes are properly managed, prioritised and approved, and that sufficient resources are allocated to the change requests.
Reporting, escalation and management of IT problems
- All AIs examined have established a designated incident response team and structure to oversee the problem management process.
- A few AIs have developed comprehensive procedures for problem reporting and escalation, including a mechanism for assessing the need to report incidents to the relevant authorities.
Scheduled system change requests and implementation
- A few AIs have established "Change Windows" (i.e. the periods of time during which changes to production systems are allowed) for individual systems, mutually agreed between the IT department and business users. The "Change Windows" for high-risk changes are required to be scheduled on non-business days (e.g. weekends or long weekends) and well before business hours commence, to allow sufficient time for fallback if required.
- Some AIs have implemented an effective network monitoring and management system that records and monitors user activities and system changes on network equipment, to ensure timely detection of unauthorised changes.
- A few AIs have implemented a remote console system that gives IT support staff direct logical access to production systems and infrastructure to facilitate problem troubleshooting and the implementation of emergency changes. The remote console system removes the need for IT support staff to enter the data centre physically. The system is installed in a secured room, access to it is restricted to authorized staff only, and audit trails of access to the production systems and infrastructure are maintained.
Emergency change requests and implementation
Others
- A number of AIs have implemented automated systems for problem and change management (e.g. problem assignment, approval process and status monitoring).
- A few AIs have implemented automated systems for managing the assignment and reset of passwords for high-privilege user and system IDs, to facilitate monitoring of their usage, in particular for problem troubleshooting or change implementation.