No matter how well you plan and organise your teams, you are bound to encounter unexpected disruptions or challenges that affect team productivity and overall project progress. Incident management is a valuable IT service management process that helps identify, analyse, manage and resolve incidents to keep your projects moving.
What is an Incident?
Before jumping into incident management, it is important to lay down foundational definitions. An incident is any kind of event or occurrence that can disrupt or reduce the quality of a product and/or service. Because the term is most often used in the IT space, incidents are usually discussed in terms of IT or service-related disruptions such as a business application going offline or a web server crashing. An incident’s impact can range from affecting a single user to an entire organisation.
What is Incident Management?
Incident management involves the governing practices that manage the restorative responses and actions to any interruption in service due to issues such as outages or performance limitations. It is a critical aspect of IT Service Management (ITSM) and is most often used in tandem with release management.
One of the most visible incident management tools is the incident tracking system, which enables customers to flag service incidents and IT teams to actively track, address and communicate actions to the necessary stakeholders.
Benefits of Incident Management
Our world is increasingly dependent on IT solutions, tools and services to run smoothly. From banking to working to communication and so much more, there is nearly no aspect of our lives that is not impacted by or dependent on the consistent functioning of IT services. As a result, service incidents can have a significant impact.
In fact, research from Gartner shows that the average cost of IT downtime is $5,600 per minute. Their survey found that 33% of enterprises could lose between $1 million and $5 million for every hour of IT service downtime. However, proper incident management does so much more than improve cost savings and reduce downtime. Other valuable benefits of incident management include:
- Enhanced customer and employee experience through consistent quality service
- Faster future incident resolution through continuous learning and improvement
- Greater overall efficiency and productivity within teams
- Improved visibility, transparency and accountability with both teams and customers
- Further insight into service quality and areas that require improvement
Incident Management Process
Successful incident management flows from having a robust incident management framework or process. While implementation may vary slightly depending on the type of organisation, project or service in use, most incident management processes follow these five stages.
Stage One: Incident Identification
The first step in the incident management process is to identify the incident. An incident can emerge from any part of a project, and noticing and logging it properly allows teams to address it promptly.
When identifying and logging an incident, some critical information includes the following:
- Incident name or identification number
- Incident description
- Incident date and time
- Name of the person who reported the incident
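The fields above can be sketched as a small record type. This is only an illustrative sketch: the `IncidentRecord` class, the `log_incident` helper and the `INC-0001` style of identifier are assumptions made for this example, not part of any particular tracking tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

# Hypothetical ID generator: a real tracking system assigns its own identifiers.
_next_id = count(1)


@dataclass
class IncidentRecord:
    """Minimum information captured when an incident is logged."""
    description: str
    reported_by: str
    name: str = ""
    # Date and time of the incident, stored in UTC.
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    incident_id: str = field(default_factory=lambda: f"INC-{next(_next_id):04d}")


def log_incident(description: str, reported_by: str, name: str = "") -> IncidentRecord:
    """Reject incomplete reports so every logged incident can be followed up."""
    if not description.strip():
        raise ValueError("An incident needs a description")
    if not reported_by.strip():
        raise ValueError("An incident needs a reporter")
    return IncidentRecord(
        description=description.strip(),
        reported_by=reported_by.strip(),
        name=name.strip(),
    )


record = log_incident("Payroll application is offline", "A. Reporter", name="Payroll outage")
```

Validating at the point of logging keeps later stages from having to chase missing details.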
Stage Two: Incident Categorisation
Once the incident has been identified, it needs to be categorised to make sure it is addressed by the right people. An incident category is a high-level description of the type of incident, expressed as a relevant keyword. It should be a logical and intuitive fit for the incident to avoid any confusion.
This is particularly useful for an ITSM service desk as it allows incident ‘tickets’ to be effectively sorted and allocated to the appropriate teams, and makes high-priority incidents easy to highlight. For example, an important category to include is “network”, with a sub-category called “network outage”. For a service-dependent organisation, a network outage can be classified as a high-priority issue that requires an immediate incident response.
Having clear and defined incident categorisation is also useful in providing accurate incident tracking data as it allows teams to easily identify patterns within select categories. This gives teams the opportunity to spot areas in their incident management process, or teams, that may be lacking and require improvement.
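As a rough sketch of how categories and sub-categories can drive both ticket allocation and pattern spotting, consider the example below. The routing table, team names and sample tickets are all hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical routing table: (category, sub-category) -> team that handles it.
ROUTING = {
    ("network", "network outage"): "network-operations",
    ("network", "slow connection"): "network-operations",
    ("application", "login failure"): "application-support",
}
DEFAULT_TEAM = "service-desk"  # assumed fallback for uncategorised tickets


def route(category: str, sub_category: str) -> str:
    """Return the team for a ticket, falling back to the service desk."""
    return ROUTING.get((category.lower(), sub_category.lower()), DEFAULT_TEAM)


def category_counts(tickets):
    """Count tickets per category to show where incidents cluster."""
    return Counter(category.lower() for category, _ in tickets)


# Made-up sample tickets as (category, sub-category) pairs.
tickets = [
    ("network", "network outage"),
    ("application", "login failure"),
    ("network", "slow connection"),
    ("hardware", "broken laptop"),
]
teams = [route(category, sub) for category, sub in tickets]
counts = category_counts(tickets)
```

Here `counts.most_common()` would list the categories with the most tickets first, which is one simple way to see where incidents concentrate.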
Stage Three: Incident Prioritisation
While incidents all require some form of response, some require a more immediate response than others. That is why incidents need to be properly prioritised according to their urgency and impact. The urgency reflects how quickly a response is needed and impact measures the potential damage the incident can inflict upon the project.
Incidents are typically prioritised through a three-tiered priority level indicator:
- Low-priority incidents: do not require immediate action as they do not disrupt users or the business. These incidents can typically be resolved in due course and can be worked around in the meantime.
- Medium-priority incidents: can impact some business and customer operations, leaving both inconvenienced.
- High-priority incidents: can impact a large number of customers, disrupt the business and impact service delivery. These incidents typically have a substantial financial impact on the business.
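One common way to combine urgency and impact into a single priority is a simple matrix. The sketch below assumes each is rated low, medium or high and uses an additive scoring rule; the thresholds are an illustrative assumption, not a prescribed standard, and should be tuned to your own organisation.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def prioritise(urgency: Level, impact: Level) -> Level:
    """Map urgency and impact to one of three priority levels.

    Assumed rule: add the two ratings, then bucket the total.
    """
    score = urgency + impact  # ranges from 2 to 6
    if score >= 5:
        return Level.HIGH
    if score == 4:
        return Level.MEDIUM
    return Level.LOW


# A network outage that needs a fast response and affects many customers.
outage_priority = prioritise(urgency=Level.HIGH, impact=Level.HIGH)
# A cosmetic glitch that a single user can work around.
glitch_priority = prioritise(urgency=Level.LOW, impact=Level.LOW)
```

Under this rule, an incident that is high on one dimension and low on the other lands in the medium tier.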
Stage Four: Incident Response
Once the incident has been identified, categorised and prioritised, it can then be assigned to the right person, also known as the incident owner, who typically responds to the issue through the following five steps.
1. Initial diagnosis
This is typically where the team performs a general investigation into the described incident, sometimes asking the customer or team member about the incident so they can troubleshoot the issue by following the appropriate procedures. It is helpful for teams to have access to a diagnostic manual or knowledge base during this stage. Depending on the type of incident, it may be resolved based on the initial diagnosis or it may need to be escalated.
2. Incident escalation
If front-line teams are not able to address the incident, it may require more advanced support from a higher-level support team. Though the majority of incidents can be addressed at the first level, some will need to be escalated. Having a clear escalation process and structure is important to make sure these incidents are addressed promptly.
3. Incident investigation and diagnosis
In this stage, the higher-level support team member undertakes a deeper analysis to test and probe the hypothesis made during the initial diagnosis stage. With this further diagnosis, teams will be able to recommend and apply the appropriate solution.
4. Incident resolution and recovery
Ideally, the incident will progress to this stage and the solution will be implemented through clearly defined steps. In the situation that this does not occur, the incident will regress into the investigation and diagnosis stage to test another hypothesis and provide a new solution.
Not all solutions will work instantaneously. That is why the recovery stage, basically the time it takes for operations to be fully restored, must be actively considered. The recovery stage may require some additional testing before the proper resolution can be made.
5. Incident closure
Once the incident has been resolved, it is passed back to the IT service desk to be properly closed. Service desk employees are responsible for making sure that the incident owner has communicated the resolution directly to the person who reported the incident, and that the resolution itself is satisfactory. Only then can the incident be fully closed and the incident response ended.
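The five response steps can be read as a small state machine, including the regression from resolution back to investigation when a fix does not work. The sketch below is one possible interpretation: the `Step` names and the allowed transitions are assumptions drawn from the descriptions above, not a formal specification.

```python
from enum import Enum


class Step(Enum):
    INITIAL_DIAGNOSIS = "initial diagnosis"
    ESCALATION = "escalation"
    INVESTIGATION = "investigation and diagnosis"
    RESOLUTION = "resolution and recovery"
    CLOSURE = "closure"


# Assumed transitions: an incident is either resolved at the first level or
# escalated, and a failed fix sends it back to investigation.
ALLOWED = {
    Step.INITIAL_DIAGNOSIS: {Step.ESCALATION, Step.RESOLUTION},
    Step.ESCALATION: {Step.INVESTIGATION},
    Step.INVESTIGATION: {Step.RESOLUTION},
    Step.RESOLUTION: {Step.INVESTIGATION, Step.CLOSURE},
    Step.CLOSURE: set(),
}


def advance(current: Step, target: Step) -> Step:
    """Move to the target step, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


# An escalated incident whose first fix fails and is investigated again.
path = [
    Step.ESCALATION,
    Step.INVESTIGATION,
    Step.RESOLUTION,
    Step.INVESTIGATION,
    Step.RESOLUTION,
    Step.CLOSURE,
]
state = Step.INITIAL_DIAGNOSIS
for target in path:
    state = advance(state, target)
```

Making the permitted moves explicit means an incident cannot be closed before it has passed through resolution and recovery.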
Stage Five: Incident Process Review
While the incident response process may have come to an end, teams should take time to review their incident management processes. This can be conducted periodically, be it monthly, quarterly or yearly, but it is an essential step that allows teams to reconvene, analyse their current practices and identify new opportunities for improvement.
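As one example of what a periodic review might look at, the sketch below computes the average time to resolve incidents per category. The metric choice, the function name and the sample data are all illustrative assumptions; the timestamps are made up and do not come from any real system.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_hours_to_resolve(incidents):
    """Average resolution time in hours, grouped by category.

    Each incident is a (category, opened_at, closed_at) tuple.
    """
    durations = defaultdict(list)
    for category, opened_at, closed_at in incidents:
        hours = (closed_at - opened_at).total_seconds() / 3600
        durations[category].append(hours)
    return {category: mean(hours) for category, hours in durations.items()}


# Made-up closed incidents for a single review period.
closed_incidents = [
    ("network", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 0)),
    ("network", datetime(2024, 1, 15, 14, 0), datetime(2024, 1, 15, 18, 0)),
    ("application", datetime(2024, 1, 20, 10, 0), datetime(2024, 1, 20, 10, 30)),
]
review = mean_hours_to_resolve(closed_incidents)
```

Comparing these averages from one review period to the next gives teams a concrete starting point for discussing where the process could improve.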
As mentioned earlier, having a clear incident categorisation structure is a valuable aid to future incident management analysis. However, the most valuable tool to have is a robust incident management solution that can store, manage and visualise all your incident management activities for later reflection. Make sure your next project portfolio management (PPM) solution has a proper incident management tool that can help you make the most of your incident management data and activities.
Take your incident management and ITSM activities to the next level
Incident management is a critical part of effective IT service management. If your organisation is looking to take your ITSM activities to the next level, look no further than pmo365.
pmo365 is a cloud-based all-in-one project portfolio management software that adapts to your organisation’s unique needs by creating bespoke solutions that are fit for purpose, particularly for IT projects. With the power of Microsoft’s Power Platform and a highly qualified team of developers, we not only apply ITSM in our own teams but are also experts at creating ITSM solutions that have the features, flexibility and adaptability you need in an effective ITSM solution.