Problem Management

Originally posted in Brazilian Portuguese by Emerson Dorow, who authorized me to publish this English version.

http://www.profissionaisti.com.br/2009/03/gestao-de-problemas/

In our last post on the ITIL processes, we talked about Incident Management, whose responsibility, in short, is to restore service as quickly as possible with minimal impact on the business. However, Incident Management does not take measures to prevent the failure from occurring again; that is the primary responsibility of Problem Management.

Problem Management also supports Incident Management when serious incidents occur. In practice, the same team may be responsible for both processes, because they complement each other. Even so, it is a good idea to designate specific people to handle either incidents or problems. Rotating team members between the two roles every few weeks or months may also help, so that nobody is always “doing the same thing.”

Problem Management is also responsible for recording all known errors and their workarounds. Below are some important definitions:

  • Problem: the unknown cause of one or more incidents;
  • Cause: an error in a Configuration Item;
  • Known Error: a problem whose cause has been diagnosed and for which a workaround exists;
  • Workaround: a temporary solution, not a definitive one;
  • Solution: the definitive (permanent) solution.
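The definitions above can be pictured as a simple record. The following Python sketch is a hypothetical model, not part of ITIL or of any particular service-desk tool: the Problem class, its field names and the find_workarounds helper are assumptions made for illustration. It shows how a problem counts as a known error once its cause is diagnosed and a workaround exists, and how Incident Management could look up the workarounds recorded for a configuration item.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Problem:
    """A hypothetical problem record; the field names are illustrative only."""
    problem_id: str
    ci: str                           # configuration item suspected of failing
    cause: Optional[str] = None       # the error in the CI, once diagnosed
    workaround: Optional[str] = None  # temporary fix, not a definitive solution
    solution: Optional[str] = None    # the final solution, once implemented

    def is_known_error(self) -> bool:
        # A known error is a problem whose cause is diagnosed and that has a workaround.
        return self.cause is not None and self.workaround is not None

def find_workarounds(records: List[Problem], ci: str) -> List[str]:
    """Return the workarounds of all known errors recorded for a given CI."""
    return [r.workaround for r in records if r.ci == ci and r.is_known_error()]

# Hypothetical records: one diagnosed known error and one problem still under analysis.
records = [
    Problem("PRB-1", ci="Software X", cause="faulty module", workaround="restart the service"),
    Problem("PRB-2", ci="Software X"),
]
print(find_workarounds(records, "Software X"))  # ['restart the service']
```

Only PRB-1 is returned, because PRB-2 has neither a diagnosed cause nor a workaround yet and is therefore still just a problem.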

Another important role of Problem Management is to review the incidents (calls) that have been opened and closed, to identify any asset (configuration item) that may be experiencing problems or could be improved.

A practical example: the Problem Management staff found that, in the last month, the network in department Y went down three times, and each time the Incident Management staff dealt with the incident by restarting switch XYZ. After analyzing the hardware and discovering the cause of the problem, the Problem Management staff requested that the equipment be replaced. It is extremely important that everything is recorded correctly. All the information generated by Problem Management can be used by Incident Management, for example to know that, for an incident in Software X, there is a workaround that can be applied to fix it temporarily.
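The analysis in this example, counting repeated incidents per configuration item, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: incidents are plain (configuration item, opening date) tuples, the 30-day window and the threshold of three incidents mirror the example above, and recurring_cis and the sample data are hypothetical, not taken from any real tool.

```python
from collections import Counter
from datetime import datetime, timedelta

def recurring_cis(incidents, now, window_days=30, threshold=3):
    """Return the CIs with at least `threshold` incidents in the last `window_days`.

    `incidents` is a list of (ci_name, opened_at) tuples; this shape is an
    assumption for illustration, not a real service-desk API.
    """
    cutoff = now - timedelta(days=window_days)
    # Count only the incidents opened inside the analysis window.
    counts = Counter(ci for ci, opened_at in incidents if opened_at >= cutoff)
    return sorted(ci for ci, n in counts.items() if n >= threshold)

# Hypothetical incident history.
now = datetime(2009, 3, 31)
incidents = [
    ("switch XYZ", datetime(2009, 3, 2)),
    ("switch XYZ", datetime(2009, 3, 15)),
    ("switch XYZ", datetime(2009, 3, 28)),
    ("printer 7", datetime(2009, 3, 20)),
    ("switch XYZ", datetime(2009, 1, 10)),  # outside the 30-day window
]
print(recurring_cis(incidents, now))  # ['switch XYZ']
```

A configuration item flagged this way is only a candidate problem; as in the switch XYZ case, the staff still have to analyze it to find the actual cause.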

When analyzing a problem (an incident with an unknown cause), it is essential to find the “root cause”, because it is what triggers all the subsequent incidents. Once the root cause is found, the problem becomes a “known error”. To resolve this known error, a solution or workaround must be adopted through Change Management, which evaluates the feasibility and business impact of the solution, because a change in hardware or software may affect other configuration items.
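The lifecycle described in this paragraph can be sketched as a small state machine. The State values and the advance function below are hypothetical names for illustration, under the simplifying assumption that a problem only becomes a known error once its root cause is found, and is only resolved once Change Management approves the change.

```python
from enum import Enum

class State(Enum):
    PROBLEM = "problem"          # cause still unknown
    KNOWN_ERROR = "known error"  # root cause diagnosed
    RESOLVED = "resolved"        # fix applied through an approved change

def advance(state: State, root_cause_found: bool = False,
            change_approved: bool = False) -> State:
    """Move a hypothetical problem record to its next state, if allowed."""
    if state is State.PROBLEM and root_cause_found:
        return State.KNOWN_ERROR
    if state is State.KNOWN_ERROR and change_approved:
        # Resolution is gated on Change Management's approval.
        return State.RESOLVED
    return state  # otherwise the record stays where it is

state = advance(State.PROBLEM, root_cause_found=True)
print(state.value)  # known error
state = advance(state, change_approved=True)
print(state.value)  # resolved
```

In this sketch a problem cannot jump straight to resolved; it has to pass through the known error state first, matching the sequence described above.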

Thanks!
