Incident management - severities and priorities

Jakub Roszkiewicz · May 2026 · 10 min read

Consider two incidents. A production line stops for two hours: this is a critical case with the shortest acceptable response time. One employee's mailbox does not work: this is a lower-priority case. In manufacturing, incident management is more than a help desk; it is a process with direct operational and financial impact, because every hour of line downtime is a real loss. In this article I break down severity vs priority, the classification matrix, SLAs for manufacturing, and how to set the target MTTR.

P1-P4: incident classification by impact and urgency
RCA: root cause analysis after every critical incident
SLA: measurable response and resolution times

Severity vs Priority - what is the difference

Severity - business impact. How bad is it? Can people work? Is the network down for 500 people or just for one?

Priority - urgency of the fix. How quickly does it have to be fixed?

Example: the CEO's email is not working. Severity is LOW (one person affected), but priority is CRITICAL (because it is the CEO). By contrast, when the network is down all Friday, severity is CRITICAL (100+ people affected) and priority is CRITICAL as well.

Severity x Priority matrix

Severity \ Priority	P1 (Immediate)	P2 (Urgent)	P3 (Standard)	P4 (Low)
Critical (entire production)	P1-CRIT (1h MTTR)	P1-URG (2h)	P2 (4h)	P3 (8h)
High (department/team)	P1-URG (2h)	P2 (4h)	P3 (8h)	P4 (24h)
Medium (1 user)	P2 (4h)	P3 (8h)	P4 (24h)	P4 (48h)
Low (1 person, no impact)	P3 (8h)	P4 (24h)	P4 (24h)	P4 (48h)
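The matrix above can be read as a simple lookup. Here is a minimal Python sketch, assuming the rows are keyed by severity and the columns by the names in parentheses (Immediate, Urgent, Standard, Low). The names `MATRIX`, `Target` and `classify` are illustrative and not part of any tool.

```python
from typing import NamedTuple


class Target(NamedTuple):
    label: str       # resulting class from the matrix cell
    mttr_hours: int  # target time to restore, in hours


# Column order of the matrix (the names in parentheses in its header row).
URGENCIES = ("immediate", "urgent", "standard", "low")

# Cells copied row by row from the matrix above.
MATRIX = {
    "critical": [("P1-CRIT", 1), ("P1-URG", 2), ("P2", 4), ("P3", 8)],
    "high":     [("P1-URG", 2), ("P2", 4), ("P3", 8), ("P4", 24)],
    "medium":   [("P2", 4), ("P3", 8), ("P4", 24), ("P4", 48)],
    "low":      [("P3", 8), ("P4", 24), ("P4", 24), ("P4", 48)],
}


def classify(severity: str, urgency: str) -> Target:
    """Look up the matrix cell for a severity row and an urgency column."""
    row = MATRIX[severity.lower()]           # KeyError on an unknown severity
    col = URGENCIES.index(urgency.lower())   # ValueError on an unknown urgency
    return Target(*row[col])


print(classify("High", "Urgent"))  # Target(label='P2', mttr_hours=4)
```

Keeping the matrix as data rather than nested conditions means the table in the process document and the table in the code can be compared cell by cell.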

MTTR benchmark - how much time do you have?

P1-Critical (production down): max 1 hour. In practice: IT on site within 15 minutes, diagnosis within 20, fix within 30. After resolution: RCA within 2 days.

P2-Urgent (department down): max 4 hours. IT responds within 30 minutes, then 30 minutes for diagnosis and 2 hours for the fix. RCA within 1 week.

P3-Standard (1 person cannot work): max 8 hours. The fix can be a temporary workaround (e.g. an application restart or a password reset) if the permanent fix will be ready the next day.

P4-Low (something works but slowly, not critical): max 48 hours. This can wait until the next maintenance window.
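As a rough illustration, the maximum times above can be turned into a resolution deadline and a breach check. This sketch assumes the limits run on wall-clock time (24/7) rather than business hours, which the text does not specify. `MAX_RESOLUTION`, `resolution_deadline` and `is_breached` are hypothetical names.

```python
from datetime import datetime, timedelta

# Maximum resolution times per priority, as listed above.
MAX_RESOLUTION = {
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=8),
    "P4": timedelta(hours=48),
}


def resolution_deadline(priority: str, opened_at: datetime) -> datetime:
    """Latest moment the incident may be resolved without breaching its limit."""
    return opened_at + MAX_RESOLUTION[priority]


def is_breached(priority: str, opened_at: datetime, now: datetime) -> bool:
    """True once the current time is past the resolution deadline."""
    return now > resolution_deadline(priority, opened_at)


opened = datetime(2026, 5, 4, 10, 30)  # arbitrary example timestamp
print(resolution_deadline("P1", opened))  # 2026-05-04 11:30:00
```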

RCA after a P1 incident - mandatory for manufacturing

After every P1, without exception, the team performs an RCA within 2 days. The RCA documents:

Timeline (at 10:30 the network goes down, at 10:35 IT is called, at 11:00 the router is restarted, at 11:15 the service is restored)
Root cause (a router firmware upgrade rolled out on Wednesday without testing introduced a bug)
Resolution (rollback to the previous version, router restart)
Long-term fix (e.g. testing every upgrade in a QA environment before production rollout)
Prevention (procedure: every upgrade must be tested, approval by the change board)

Without RCA it is easy to repeat the same mistake. A solid root cause analysis after a critical incident allows you to eliminate the source of the problem and avoid further costly outages.
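The five RCA sections listed above map naturally onto a small record type. The sketch below is one possible shape, not a prescribed template: `RcaRecord` and its methods are illustrative names, and the calendar date is invented because the example timeline gives only times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RcaRecord:
    timeline: list[tuple[datetime, str]]  # (timestamp, what happened)
    root_cause: str
    resolution: str
    long_term_fix: str
    prevention: str

    def time_to_restore(self) -> timedelta:
        """Elapsed time between the first and last timeline entries."""
        stamps = [stamp for stamp, _ in self.timeline]
        return max(stamps) - min(stamps)

    def is_complete(self) -> bool:
        """True when every section of the RCA has been filled in."""
        sections = (self.root_cause, self.resolution,
                    self.long_term_fix, self.prevention)
        return bool(self.timeline) and all(text.strip() for text in sections)


DAY = (2026, 5, 6)  # hypothetical date; the example above gives only times

rca = RcaRecord(
    timeline=[
        (datetime(*DAY, 10, 30), "network goes down"),
        (datetime(*DAY, 10, 35), "IT is called"),
        (datetime(*DAY, 11, 0), "router is restarted"),
        (datetime(*DAY, 11, 15), "service is restored"),
    ],
    root_cause="untested router firmware upgrade introduced a bug",
    resolution="rollback to the previous version, router restart",
    long_term_fix="test every upgrade in a QA environment first",
    prevention="upgrades require testing and change board approval",
)

print(rca.time_to_restore())  # 0:45:00
```

In this example the service is back 45 minutes after the outage starts, which is inside the 1-hour P1 limit.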

Incident management in ManageEngine ServiceDesk Plus (SDP)

Setup:

Admin -> Incident Management -> Priorities - define P1-P4 and SLAs
Admin -> Impact/Urgency - define the severity matrix (Critical/High/Medium/Low)
Configure escalation rules: P1 -> notify the IT manager + VP Operations if the incident is still open after 30 min
Configure notifications: P1 -> SMS + email + Slack alert to all technicians
Reports -> SLA compliance - track what % of P1s meet the 1h MTTR target
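To make the escalation and notification rules concrete, here is a tool-independent sketch of the logic in Python. It does not call any ManageEngine API. It assumes that "after 30 min" means the P1 is still open 30 minutes after it was raised, and the function name `notifications` is hypothetical.

```python
def notifications(priority: str, minutes_open: int) -> list[str]:
    """Return the notification actions due for an open incident."""
    actions: list[str] = []
    if priority == "P1":
        # Every P1 alerts all technicians on three channels at once.
        actions += [f"{channel}: all technicians"
                    for channel in ("SMS", "email", "Slack")]
        # Assumed rule: escalate if the P1 is still open after 30 minutes.
        if minutes_open >= 30:
            actions += ["escalate: IT manager", "escalate: VP Operations"]
    return actions


for action in notifications("P1", minutes_open=35):
    print(action)
```

Writing the rules down this way before configuring the tool makes it easier to review them with operations and to test that nothing fires for lower priorities.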

SLA compliance in manufacturing - what to track

KPI #1: % of P1 incidents meeting MTTR < 1h. Target: 90%+. Below 80%, the process is not working.
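A minimal sketch of how KPI #1 could be computed from exported resolution times. The sample figures are invented for illustration, and the label for the 80-90% band ("below target") is my assumption, since the text only names the 90% target and the 80% floor.

```python
def p1_compliance(resolution_minutes: list[float],
                  target_minutes: float = 60) -> float:
    """Percentage of P1 incidents resolved in under the target time."""
    if not resolution_minutes:
        raise ValueError("no P1 incidents in the period")
    met = sum(1 for minutes in resolution_minutes if minutes < target_minutes)
    return 100 * met / len(resolution_minutes)


def rate(compliance_pct: float) -> str:
    """Map a compliance percentage onto the thresholds named above."""
    if compliance_pct >= 90:
        return "on target"
    if compliance_pct >= 80:
        return "below target"  # assumed label for the 80-90% band
    return "process not working"


# Invented sample: ten P1 incidents, one of which took 75 minutes.
sample = [45, 30, 59, 75, 20, 50, 40, 55, 35, 25]
print(p1_compliance(sample), rate(p1_compliance(sample)))  # 90.0 on target
```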

KPI #2: MTTR trend per priority. Is MTTR rising or falling? A downward trend is good: the team is learning.
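One way to put a number on the trend is the least-squares slope of the average MTTR over equally spaced periods (months, for example). This is a sketch of that idea with invented data; the article itself does not prescribe a method, and `mttr_slope` is a hypothetical name.

```python
from statistics import mean


def mttr_slope(period_mttr: list[float]) -> float:
    """Least-squares slope of MTTR across equally spaced periods.

    The result is in "minutes per period": negative means MTTR is falling.
    """
    n = len(period_mttr)
    if n < 2:
        raise ValueError("need at least two periods to see a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(period_mttr)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, period_mttr))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Invented monthly average MTTR for P1 incidents, in minutes.
print(mttr_slope([70, 60, 50, 40]))  # -10.0
```

Run it once per priority, since a falling P3 trend can hide a rising P1 trend when all incidents are averaged together.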

KPI #3: Repeat incident rate (% of recurrences). After a P1 there should be an RCA and a fix. If the same incident comes back, the RCA did not work.
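The repeat rate can be sketched as follows, assuming each incident carries a root-cause label taken from its RCA. That labeling scheme and the sample labels are hypothetical. An incident counts as a repeat when its root cause has already appeared earlier in the list.

```python
def repeat_rate(root_causes: list[str]) -> float:
    """Percentage of incidents whose root cause was already seen earlier."""
    if not root_causes:
        raise ValueError("no incidents in the period")
    seen: set[str] = set()
    repeats = 0
    for cause in root_causes:  # incidents in chronological order
        if cause in seen:
            repeats += 1
        else:
            seen.add(cause)
    return 100 * repeats / len(root_causes)


# Invented labels: the router firmware problem comes back once.
causes = ["router-firmware", "ups-failure", "router-firmware", "switch-loop"]
print(repeat_rate(causes))  # 25.0
```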

KPI #4: Time to detect incident. Ideally a P1 should be auto-detected by monitoring (network goes down, alert within 30 seconds). If a P1 is reported by an employee via email, monitoring is misconfigured.
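A short sketch of the detection check described above. It assumes each incident records when the fault started, when it was detected, and whether the source was monitoring or a person. The field values and the function name `detection_issue` are illustrative.

```python
from datetime import datetime
from typing import Optional


def detection_issue(started_at: datetime, detected_at: datetime,
                    source: str, max_seconds: int = 30) -> Optional[str]:
    """Return a description of the detection problem, or None if there is none."""
    if source != "monitoring":
        return "reported by a person: review monitoring configuration"
    delay = (detected_at - started_at).total_seconds()
    if delay > max_seconds:
        return f"alert took {delay:.0f}s (target {max_seconds}s)"
    return None


start = datetime(2026, 5, 6, 10, 30, 0)  # arbitrary example timestamps
print(detection_issue(start, datetime(2026, 5, 6, 10, 30, 20), "monitoring"))
print(detection_issue(start, datetime(2026, 5, 6, 10, 35, 0), "user"))
```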

Jakub Roszkiewicz

CTO · Rotech Group · manufacturing incident management expert

