No matter how well you plan and organise your teams, you’re likely to encounter a challenge that will impact your project. Managing incidents in I.T. is a process which helps identify, analyse, manage, and resolve incidents to keep your projects moving. By effectively managing any incidents which pop up, you’ll maintain the health of your I.T. systems.
Before learning about managing incidents in I.T., it is important to identify some foundational definitions. An incident is any event that can disrupt a product and/or service. When used in the I.T. space, we often discuss incidents in terms of I.T. or service-related disruptions. This can include a business application going offline or a web-server crashing. An incident’s impact can range from affecting a single user to an entire organisation.
Incident management involves the practices that manage the restorative responses and actions to any interruption in service. These interruptions could be because of outages, performance limitations, or unexpected results when transitioning between systems. Incident management is a critical aspect of I.T. Service Management (ITSM), and it is often used in tandem with release management.
One of the visible forms of incident management practices is an incident tracking system. This system should enable customers to flag service incidents. The same system should allow I.T. teams to track, address, and communicate actions to the necessary stakeholders.
Our world is increasingly dependent on I.T. solutions, tools, and services to run smoothly. As a result, service incidents can have significant impacts.
In fact, research from Gartner shows that the average cost of I.T. downtime is $5,600 per minute. Their survey found that 33% of enterprises could lose upwards of $1-5 million for every hour of I.T. service downtime. However, proper incident management does so much more than improve cost savings and reduce downtimes. Other valuable benefits of managing incidents in I.T. include:
Successful incident management flows from a robust incident management framework or process. Most incident management implementations vary depending on the organisation, project or service in use. However, there are five main steps which you should follow when implementing an incident management process.
The first step in the incident management process is to identify the incident. An incident can emerge from any part of a project. Staying alert to potential incidents and logging each one as it appears therefore allows teams to address it promptly.
When identifying and logging an incident, some critical information includes the following:
Once we have identified the incident, we need to categorise it. This categorisation ensures the incident is addressed by the right people. An incident category is a high-level description of the type of incident, expressed with a relevant keyword. It should be logical and intuitive, to avoid any confusion.
Categorisation of incidents is particularly useful to an ITSM service desk. This is because categorisation ensures incident ‘tickets’ are effectively allocated to the appropriate teams. It also makes high-priority incidents easy to highlight. For example, an important category to include is “network”, with a sub-category called “network outage”. For a service-dependent organisation, a network outage can be classified as a high-priority issue and would require an immediate incident response.
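To make the allocation idea concrete, here is a minimal Python sketch of routing a ticket by category and sub-category. The team names, the routing table and the `route_ticket` helper are hypothetical and exist only for illustration. A real service desk tool would hold this mapping in its own configuration.

```python
# Hypothetical routing table: (category, sub-category) -> owning team.
# A sub-category of None is the fallback for the whole category.
ROUTING = {
    ("network", "network outage"): "network-operations",
    ("network", None): "network-support",
    ("application", None): "application-support",
}

# Hypothetical set of categories that are always flagged as high priority.
HIGH_PRIORITY = {("network", "network outage")}

# Team that receives anything the table does not recognise (assumption).
DEFAULT_TEAM = "service-desk"


def route_ticket(category, sub_category=None):
    """Return (team, is_high_priority) for a categorised incident ticket."""
    key = (category.lower(), sub_category.lower() if sub_category else None)
    # Try the exact sub-category first, then the category alone, then the default.
    team = ROUTING.get(key) or ROUTING.get((key[0], None), DEFAULT_TEAM)
    return team, key in HIGH_PRIORITY


print(route_ticket("Network", "Network Outage"))
print(route_ticket("network", "slow connection"))
```

Keeping the mapping in one table means a new category only needs a new entry, and an unrecognised ticket still lands with someone instead of being dropped.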
Having clearly defined incident categorisation is also useful to provide accurate incident tracking data. This is because it allows teams to easily identify patterns within select categories. As a result, managers have the opportunity to spot areas in their incident management process or teams that may require improvement.
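As a rough illustration of spotting patterns within categories, the sketch below counts logged incidents per category. The ticket list is made-up sample data, and `incidents_per_category` is a hypothetical helper, not part of any particular tracking tool.

```python
from collections import Counter

# Made-up sample data: (category, sub-category) of previously logged incidents.
tickets = [
    ("network", "network outage"),
    ("application", "login failure"),
    ("network", "network outage"),
    ("network", "slow connection"),
]


def incidents_per_category(tickets):
    """Count how many incidents were logged under each top-level category."""
    return Counter(category for category, _ in tickets)


counts = incidents_per_category(tickets)
# The most frequent category is a candidate area for process improvement.
print(counts.most_common(1))  # [('network', 3)]
```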
While incidents all require some form of response, some require a more immediate response than others. That is why we need to prioritise incidents according to their urgency and impact. We can determine the urgency by how quickly the incident requires a response. To measure the impact, check the potential damage the incident can inflict upon the project.
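One common way to combine the two factors is a priority matrix. The sketch below is only an assumption about how that could look: it rates urgency and impact on three hypothetical levels and uses illustrative thresholds. Your organisation would choose its own scale and cut-offs.

```python
# Hypothetical three-level scale used for both urgency and impact.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def prioritise(urgency, impact):
    """Combine urgency and impact into a single priority label.

    The score is the product of the two levels (1 to 9). The thresholds
    below are illustrative assumptions, not an industry standard.
    """
    score = LEVELS[urgency] * LEVELS[impact]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


print(prioritise("high", "medium"))  # high
print(prioritise("high", "low"))     # medium
print(prioritise("low", "medium"))   # low
```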
We can prioritise incidents through three-tiered priority indicators:
Once we have identified, categorised, and prioritised the incident, we can assign it to the right person. This person, also known as the incident owner, typically has to respond to the issue through the following five steps.
This is when the team performs a general investigation into the described incident. This sometimes requires asking the customer or team member about the incident so the team can troubleshoot the issue. It is helpful for teams to have access to a diagnostic manual or knowledge base during this stage. Depending on the type of incident, you may be able to resolve it at the initial diagnosis, or you may need to escalate it further.
If front-line teams are not able to address the incident, it may require more advanced support from a higher-level support team. Though you may be able to address the majority of incidents at the first level, it’s wise to escalate the rest appropriately. If your organisation has a clear escalation process, you’ll be better equipped to ensure the incident is addressed promptly.
In this stage, the high-level support member will undertake a deeper analysis to test and probe the initial hypothesis made during the initial diagnosis stage. With the further diagnosis, teams will be able to recommend and apply the appropriate solution.
Ideally, your team will progress the incident to this stage and implement the chosen solution. If the solution does not work, the incident returns to the investigation and diagnosis stage so the team can test another hypothesis and provide a new solution.
Not all solutions will work instantaneously. That is why it’s important for us to consider the recovery stage, or the time it takes for your team to fully restore operations. The recovery stage may require your team to conduct some additional testing before the final resolution of the incident.
Once your team has resolved the incident, the I.T. desk will formally close it. Service desk employees have the task of making sure that the incident owner has already directly communicated the resolution to the person who reported it. Once they have verified this, they should confirm that the resolution itself is satisfactory. Only then can you fully close the incident and end the incident response.
While the incident response process may have come to an end, teams should still review their incident management processes. Many teams choose to conduct this review periodically, as it allows them to reconvene, analyse their current practices, and identify new opportunities for improvement.
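The response stages described above can be read as a small state machine. The Python sketch below is one possible way to model it. The stage names are inferred from the descriptions in this article, and the `Incident` class and transition table are hypothetical, so treat it as an illustration, not a reference implementation.

```python
from enum import Enum


class Stage(Enum):
    # Stage names are an assumption based on the descriptions in the text.
    INITIAL_DIAGNOSIS = "initial diagnosis"
    ESCALATION = "escalation"
    INVESTIGATION = "investigation and diagnosis"
    RESOLUTION = "resolution and recovery"
    CLOSURE = "closure"


# Allowed moves between stages. A failed resolution goes back to investigation,
# and an incident solved at the first level skips escalation entirely.
ALLOWED = {
    Stage.INITIAL_DIAGNOSIS: {Stage.ESCALATION, Stage.RESOLUTION},
    Stage.ESCALATION: {Stage.INVESTIGATION},
    Stage.INVESTIGATION: {Stage.RESOLUTION},
    Stage.RESOLUTION: {Stage.INVESTIGATION, Stage.CLOSURE},
    Stage.CLOSURE: set(),
}


class Incident:
    """Tracks which response stage an incident is in (hypothetical model)."""

    def __init__(self):
        self.stage = Stage.INITIAL_DIAGNOSIS
        self.history = [self.stage]

    def advance(self, next_stage):
        """Move to next_stage, rejecting transitions the process does not allow."""
        if next_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {next_stage.value}"
            )
        self.stage = next_stage
        self.history.append(next_stage)


incident = Incident()
incident.advance(Stage.ESCALATION)
incident.advance(Stage.INVESTIGATION)
incident.advance(Stage.RESOLUTION)
incident.advance(Stage.INVESTIGATION)  # first fix failed, test a new hypothesis
incident.advance(Stage.RESOLUTION)
incident.advance(Stage.CLOSURE)
```

Recording the history alongside the current stage keeps a trail of regressions, which can feed the later review of the process.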
As mentioned earlier, a clear incident categorisation structure is a valuable aid to future incident management analysis. A robust incident management solution that can store and manage all your related activities is equally valuable. Ensure that your PPM solution has a proper incident management tool, so that you can make the most of your incident management data and activities.
Incident management is a critical part of effective I.T. service management. If your organisation is looking to take your ITSM activities to the next level, look no further than pmo365.
pmo365 is a cloud-based all-in-one project portfolio management software that adapts to your organisation’s unique needs. We create custom PPM solutions that are fit for purpose, and we can configure them to best support your I.T. projects. Our developers create ITSM solutions through Microsoft’s Power Platform that have the features, flexibility and adaptability you require.
If you want to find out more about how pmo365 can help elevate your ITSM activities, make sure to read more about our services here, or talk directly to our PPM experts!