ITSM

Incident management in manufacturing: severities, priorities and SLA

P1-P4 classification, Severity x Priority matrix, post-incident RCA, MTTR benchmarks for a manufacturing plant, SLA compliance.

Jakub Roszkiewicz · May 2026 · 10 min read

A production line stops for two hours: that is a critical incident with the shortest acceptable response time. One employee's mailbox stops working: that is a lower-priority incident. In manufacturing, incident management is more than a help desk. It is a process with direct operational and financial impact, because every hour of line downtime is a real loss. In this article I break down severity vs priority, the classification matrix, SLAs for manufacturing, and how to set the target MTTR.

P1-P4: incident classification by impact and urgency
RCA: root cause analysis after every critical incident
SLA: measurable response and resolution times

Severity vs Priority - what is the difference

Severity - business impact. How bad is it? Can people work? Is the network down for 500 people or just for one?

Priority - urgency of the fix. How quickly does it have to be fixed?

Example: the CEO's email is not working (severity: LOW, because only 1 person is affected; priority: CRITICAL, because it is the CEO). The network is down all day on Friday (severity: CRITICAL, because 100+ people are affected; priority: CRITICAL).

Severity x Priority matrix

Severity \ Priority          | P1 (Immediate)    | P2 (Urgent) | P3 (Standard) | P4 (Low)
Critical (entire production) | P1-CRIT (1h MTTR) | P1-URG (2h) | P2 (4h)       | P3 (8h)
High (department/team)       | P1-URG (2h)       | P2 (4h)     | P3 (8h)       | P4 (24h)
Medium (1 user)              | P2 (4h)           | P3 (8h)     | P4 (24h)      | P4 (48h)
Low (1 person, no impact)    | P3 (8h)           | P4 (24h)    | P4 (24h)      | P4 (48h)
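The matrix can also be expressed as a small lookup table. The sketch below is only an illustration: the names `MATRIX`, `PRIORITIES` and `classify` are hypothetical, and the values are copied from the matrix above.

```python
from datetime import timedelta

# Rows are severities, columns are priorities P1..P4.
# Each cell is (resulting incident class, target MTTR in hours).
MATRIX = {
    "Critical": [("P1-CRIT", 1), ("P1-URG", 2), ("P2", 4), ("P3", 8)],
    "High":     [("P1-URG", 2), ("P2", 4), ("P3", 8), ("P4", 24)],
    "Medium":   [("P2", 4), ("P3", 8), ("P4", 24), ("P4", 48)],
    "Low":      [("P3", 8), ("P4", 24), ("P4", 24), ("P4", 48)],
}
PRIORITIES = ["P1", "P2", "P3", "P4"]


def classify(severity: str, priority: str) -> tuple[str, timedelta]:
    """Return (incident class, target MTTR) for a severity/priority pair."""
    label, hours = MATRIX[severity][PRIORITIES.index(priority)]
    return label, timedelta(hours=hours)
```

For example, `classify("Critical", "P1")` returns the class `"P1-CRIT"` with a one-hour target, while an unknown severity or priority raises an error instead of silently defaulting.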

MTTR benchmark - how much time do you have?

P1-Critical (prod down): max 1 hour. In practice: IT on site in 15 minutes, diagnosis in 20, fix in 30. After resolution: RCA within 2 days.

P2-Urgent (department down): max 4 hours. IT on site within 30 minutes, 30 minutes for diagnosis, 2 hours for the fix. RCA within 1 week.

P3-Standard (1 person cannot work): max 8 hours. The fix can be a temporary workaround (e.g. an application restart or a password reset), as long as the permanent fix will be ready the next day.

P4-Low (something works but slowly, not critical): max 48 hours. This can wait until the next maintenance window.
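These limits translate directly into deadlines on a ticket. The following is a minimal sketch, assuming the resolution clock starts when the incident is opened and the RCA clock starts at resolution. `TARGETS`, `resolve_by` and `rca_due` are hypothetical names, not part of any tool.

```python
from datetime import datetime, timedelta

# (maximum resolution time, RCA window after resolution) per class,
# taken from the benchmarks above. None: no RCA deadline is stated.
TARGETS = {
    "P1": (timedelta(hours=1), timedelta(days=2)),
    "P2": (timedelta(hours=4), timedelta(weeks=1)),
    "P3": (timedelta(hours=8), None),
    "P4": (timedelta(hours=48), None),
}


def resolve_by(incident_class: str, opened_at: datetime) -> datetime:
    """Latest acceptable resolution time for an incident."""
    return opened_at + TARGETS[incident_class][0]


def rca_due(incident_class: str, resolved_at: datetime):
    """RCA deadline counted from resolution, or None if none is required."""
    window = TARGETS[incident_class][1]
    return resolved_at + window if window is not None else None
```

A P1 opened at 08:00 must be resolved by 09:00. If it is resolved at 08:50, the RCA is due two days later at 08:50.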

RCA after a P1 incident - mandatory for manufacturing

Always. After every P1 the team performs an RCA within 2 days and documents the findings.

Without RCA it is easy to repeat the same mistake. A solid root cause analysis after a critical incident allows you to eliminate the source of the problem and avoid further costly outages.

Incident management in ManageEngine SDP

Setup:

  1. Admin -> Incident Management -> Priorities - define P1-P4 and SLAs
  2. Admin -> Impact/Urgency - define the severity matrix (Critical/High/Medium/Low)
  3. Configure escalation rules: P1 -> notify the IT manager + VP Operations, after 30 min
  4. Configure notifications: P1 -> SMS + email + Slack alert to all technicians
  5. Reports -> SLA compliance - track what % of P1s meet the 1h MTTR target
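The notification and escalation rules from steps 3 and 4 can be written down as plain logic before they are configured in the tool. This is a sketch only, not ServiceDesk Plus configuration or its API. It assumes the escalation fires when a P1 is still unresolved after 30 minutes, and the function name is hypothetical.

```python
from datetime import timedelta

# Assumption: escalate when a P1 is still open 30 minutes after it was raised.
ESCALATE_AFTER = timedelta(minutes=30)


def p1_notifications(elapsed: timedelta, resolved: bool) -> list[str]:
    """List who should have been notified about a P1 at this point in time."""
    # Step 4: technicians are alerted as soon as the P1 is raised.
    recipients = ["all technicians (SMS + email + Slack)"]
    # Step 3: management is added once the escalation threshold passes.
    if not resolved and elapsed >= ESCALATE_AFTER:
        recipients += ["IT manager", "VP Operations"]
    return recipients
```

Writing the rule out this way makes it easy to agree with operations on the threshold before anyone touches the admin screens.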

SLA compliance in manufacturing - what to track

KPI #1: % of P1 meeting MTTR < 1h - Target: 90%+. Below 80% = the process is not working.
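As a rough illustration, KPI #1 can be computed from a list of P1 resolution times. The function names are hypothetical, and the label for the band between 80% and 90% is an assumption, since the text only defines the two outer thresholds.

```python
from datetime import timedelta


def p1_compliance(resolution_times, target=timedelta(hours=1)):
    """Percentage of P1 incidents resolved in under `target` (None if no data)."""
    if not resolution_times:
        return None
    met = sum(1 for t in resolution_times if t < target)
    return 100.0 * met / len(resolution_times)


def verdict(pct: float) -> str:
    """Map a compliance percentage onto the thresholds from KPI #1."""
    if pct >= 90:
        return "on target"
    if pct < 80:
        return "process is not working"
    return "below target"  # assumed label for the 80-90% band
```

With ten P1 incidents of which nine were resolved in under an hour, `p1_compliance` returns 90.0 and `verdict` reports "on target".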

KPI #2: MTTR trend per priority - Is MTTR rising or falling? A downward trend = good, people are learning.
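One simple way to read the trend is to compare the mean resolution time per priority between two periods. The sketch below assumes incidents are available as `(priority, period, hours_to_resolve)` tuples. That record shape and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mttr_by_period(incidents):
    """Mean hours to resolve, keyed by (priority, period)."""
    buckets = defaultdict(list)
    for priority, period, hours in incidents:
        buckets[(priority, period)].append(hours)
    return {key: mean(values) for key, values in buckets.items()}


def trend(mttr, priority, earlier, later):
    """Compare two periods for one priority: 'falling', 'rising' or 'flat'."""
    delta = mttr[(priority, later)] - mttr[(priority, earlier)]
    if delta < 0:
        return "falling"
    if delta > 0:
        return "rising"
    return "flat"
```

Comparing only two periods is deliberately crude. With few P1s per month, a single long outage can flip the result, so it is worth looking at several periods before drawing conclusions.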

KPI #3: Repeat incident rate (% of recurrences) - After a P1 there should be an RCA and a fix. If the same incident comes back - the RCA did not work.
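The repeat rate depends on how a "recurrence" is defined. The sketch below assumes each incident carries a root-cause label taken from its RCA and counts an incident as a repeat when that label has already appeared earlier in the list. Both the definition and the function name are assumptions.

```python
def repeat_rate(root_causes):
    """Percentage of incidents whose root cause was already seen earlier."""
    if not root_causes:
        return 0.0
    seen = set()
    repeats = 0
    for cause in root_causes:  # incidents in chronological order
        if cause in seen:
            repeats += 1
        seen.add(cause)
    return 100.0 * repeats / len(root_causes)
```

This only works if RCA labels are assigned consistently. Two tickets describing the same fault in different words will not be counted as a recurrence.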

KPI #4: Time to detect incident - Ideally a P1 should be auto-detected by monitoring (network goes down = alert in 30 seconds). If a P1 is reported by an employee via email = monitoring is misconfigured.
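Detection time can be checked in the same way, provided each incident records when the fault started, when it was detected, and how it was reported. The field names `started`, `detected` and `source` below are hypothetical, and the 30-second target is taken from the example above.

```python
from datetime import datetime, timedelta


def detection_report(incidents, target=timedelta(seconds=30)):
    """Count late detections and incidents not raised by monitoring.

    Each incident is a dict with 'started' and 'detected' (datetime)
    and 'source' (e.g. 'monitoring' or 'email').
    """
    late = [i for i in incidents if i["detected"] - i["started"] > target]
    manual = [i for i in incidents if i["source"] != "monitoring"]
    return {"late": len(late), "manual": len(manual), "total": len(incidents)}
```

Any P1 that shows up in the `manual` count is a candidate for a new monitoring check.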

Jakub Roszkiewicz
CTO · Rotech Group · manufacturing incident management expert
