Maintenance & ITSM

Machine fault ticketing in a plant: how to stop firefighting

Excel, the phone, and a Post-it on the maintenance office door are not a system. They cause production losses that nobody measures. See how to deploy professional machine fault management in 4 weeks.

Updated: May 2026
Rotech Group, ITSM implementations team

This article is a practical guide for maintenance managers, IT departments, and operations directors in manufacturing companies. We cover CMMS, IT help desk, SLA configuration, MTTR/MTBF metrics, and a realistic implementation model based on ManageEngine ServiceDesk Plus.

1. Why machine fault tickets in Excel or on the phone are a disaster

A maintenance technician takes a call from an operator, writes something in Excel or on a whiteboard, fixes the machine, and moves on to the next call. Breakdown history? Does not exist. Downtime? Nobody measures it. MTBF for a specific machine? The answer: "we have a feel for it".

That is the state of affairs in most plants of up to 500 employees. And that is the source of quantifiable losses that can be eliminated at relatively low system cost.

Status quo: Excel, phone, sticky notes

  • X No repair history: each breakdown starts from scratch
  • X No prioritization: whoever shouts loudest gets served
  • X Uncountable downtime: nobody knows the cost of one hour of stoppage
  • X No MTBF/MTTR analysis: no data = no path to improvement
  • X Knowledge locked in one technician's head: single point of failure risk
  • X No compliance with ISO 55000 and asset management norms
  • X No way to plan preventive maintenance on data

Target: fault ticketing system

  • OK Full repair history per machine with dates and technicians
  • OK Automatic prioritization by SLA: the system decides what is urgent
  • OK Measurable response and repair time: MTTR to the minute
  • OK MTBF data that lets you plan PPM (Planned Preventive Maintenance)
  • OK Knowledge available to the whole maintenance team, not one person
  • OK Audit trail for certifications and machine insurance
  • OK Integration with spare parts inventory and schedules

The most common objection: "we do not have time to deploy a system because we have too many breakdowns". That is a thinking trap. A ticketing system surfaces recurring fault patterns and enables the shift from firefighting to prevention - and data-driven preventive maintenance reduces the number of unplanned stoppages. Without data there is no way to act preventively. You can only put out fires.

2. The difference between CMMS and IT help desk

Before choosing a tool, see how the two main system types differ. Many manufacturing companies try to use only one of them for the whole thing. The result is compromises that hurt both departments. More context in our article on ITSM for manufacturing: deploying a help desk in a factory.

CMMS: Computerized Maintenance Management System

CMMS is a tool designed for machines and physical infrastructure. Its center is the asset: a specific machine, production line, or piece of equipment. Typical CMMS features:

IT help desk (ITSM)

IT help desk, more broadly ITSM (IT Service Management), focuses on IT services and users. The center is the incident or service. Typical ITSM features:

Why manufacturers need both systems or an integration

In a modern manufacturing plant, the IT/OT (Operational Technology) line is blurry. A PLC controller fault is a technical incident (OT) but may require IT involvement (network, vendor remote support). A "the label printer on line 3 is not working" ticket can be both IT (driver) and maintenance (belt feed mechanics).

Two solutions: either one system with rich category configuration (hybrid), or two separate systems linked by integration. We discuss both models in detail in chapter 5.

3. Key system features for manufacturing

Not every help desk fits manufacturing. Before choosing software, check whether it has these specific features. Without them the system will not match shop floor reality:

Fault priorities tailored to manufacturing

Manufacturing prioritization has to reflect real impact on line continuity. A "low / medium / high" scale is not enough. You need at least four levels tied to specific machines and lines:

Priority | Definition | Response time | Repair time
P1 CRITICAL | Production line stopped, production not possible | 15 minutes | 2 hours
P2 URGENT | Line performance degraded by more than 30%, risk of stoppage | 1 hour | 4 hours
P3 NORMAL | Non-blocking defect, fix within the current shift | 4 hours | Next shift
P4 PLANNED | Preventive maintenance, inspections, calibrations | Next business day | Per plan
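
To make the priority table concrete, here is a minimal sketch of how the four levels could be turned into response and repair deadlines. The `FaultReport` class and its field names are hypothetical and exist only for this illustration; the thresholds come from the table, and the calendar-dependent targets ("next shift", "next business day", "per plan") are deliberately left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (response, repair) targets from the priority table. Targets that depend on
# the plant calendar (next shift, next business day, per plan) are None here.
SLA_TARGETS = {
    "P1": (timedelta(minutes=15), timedelta(hours=2)),
    "P2": (timedelta(hours=1), timedelta(hours=4)),
    "P3": (timedelta(hours=4), None),
    "P4": (None, None),
}


@dataclass
class FaultReport:
    """Hypothetical shape of a fault report, for illustration only."""
    line_stopped: bool
    performance_loss_pct: float
    planned: bool
    reported_at: datetime


def classify(report: FaultReport) -> str:
    """Apply the definitions from the priority table, most severe first."""
    if report.planned:
        return "P4"
    if report.line_stopped:
        return "P1"
    if report.performance_loss_pct > 30:
        return "P2"
    return "P3"


def deadlines(report: FaultReport) -> Tuple[str, Optional[datetime], Optional[datetime]]:
    """Return (priority, respond_by, repair_by); None means calendar-dependent."""
    priority = classify(report)
    response, repair = SLA_TARGETS[priority]
    respond_by = report.reported_at + response if response is not None else None
    repair_by = report.reported_at + repair if repair is not None else None
    return priority, respond_by, repair_by


# Example: a line stoppage reported at 08:00 must be picked up by 08:15
# and repaired by 10:00.
stoppage = FaultReport(True, 100.0, False, datetime(2026, 5, 4, 8, 0))
priority, respond_by, repair_by = deadlines(stoppage)
```

In a real deployment these rules would live in the ticketing system's SLA configuration rather than in custom code; the sketch only shows that the table is precise enough to be automated.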

Fault ticket flow: from operator to maintenance technician

1. Report: the operator reports the fault via portal, app, phone, or QR scan at the machine
2. Classification: the system auto-assigns priority based on machine and fault category
3. Dispatch: SMS/push notification to the on-call technician or maintenance manager
4. Execution: the technician confirms pickup, records progress, orders parts
5. Closure: description of work, photos, repair time (input for MTTR)
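
The five steps above form a simple linear lifecycle. As a rough sketch (not how any particular product models it internally), a ticket can be treated as a small state machine that only moves forward and yields a repair time once it is closed. The class and status names below are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Status(Enum):
    REPORTED = 1
    CLASSIFIED = 2
    DISPATCHED = 3
    IN_PROGRESS = 4
    CLOSED = 5


@dataclass
class FaultTicket:
    """Hypothetical ticket record that follows the five-step flow."""
    machine: str
    reported_at: datetime
    status: Status = Status.REPORTED
    history: list = field(default_factory=list)

    def advance(self, at: datetime) -> Status:
        """Move the ticket to the next step and timestamp the transition."""
        if self.status is Status.CLOSED:
            raise ValueError("ticket is already closed")
        self.status = Status(self.status.value + 1)
        self.history.append((self.status, at))
        return self.status

    def repair_time_hours(self) -> float:
        """Time from report to closure, the per-ticket input for MTTR."""
        if self.status is not Status.CLOSED:
            raise ValueError("repair time is only known after closure")
        closed_at = self.history[-1][1]
        return (closed_at - self.reported_at).total_seconds() / 3600
```

Because every transition is timestamped, response time (report to pickup) and repair time (report to closure) fall out of the history without anyone typing them in by hand.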

Other key features

4. How to configure SLAs for machine faults

SLAs on the shop floor are not a marketing statement. They are a binding internal commitment that the system enforces automatically. Proper SLA configuration requires answers to three questions: what we measure, which event starts the clock, and which event stops it.

Start event and stop event

Examples of real SLA configuration

Breakdown scenario | Priority | Response | Repair | Escalation to
CNC machining center stopped | P1 | 15 min | 2 h | Maintenance manager after 10 min
Welding robot fault | P1 | 15 min | 3 h | Production director after 30 min
Auxiliary compressor: pressure degradation | P2 | 1 h | 4 h | Maintenance manager after 2 h
Conveyor belt: excessive vibration | P2 | 1 h | Next shift | Maintenance manager after 3 h
Oil leak: spindle, not blocking production | P3 | 4 h | Next day | Maintenance manager after 1 day
Planned maintenance: filter replacement | P4 | Next business day | Per plan | No automatic escalation
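
The escalation column can be read as a rule of the form "if nobody has acknowledged the ticket after X, notify Y". The sketch below encodes the table that way. The interpretation that the timer runs from the report and stops on acknowledgement is an assumption of this example, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Escalation thresholds and targets copied from the scenario table.
# Planned maintenance has no automatic escalation, so it has no entry.
ESCALATION_RULES = {
    "CNC machining center stopped": (timedelta(minutes=10), "Maintenance manager"),
    "Welding robot fault": (timedelta(minutes=30), "Production director"),
    "Auxiliary compressor: pressure degradation": (timedelta(hours=2), "Maintenance manager"),
    "Conveyor belt: excessive vibration": (timedelta(hours=3), "Maintenance manager"),
    "Oil leak: spindle, not blocking production": (timedelta(days=1), "Maintenance manager"),
}


def escalation_target(
    scenario: str,
    reported_at: datetime,
    acknowledged_at: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Return who should be notified right now, or None if nobody."""
    rule = ESCALATION_RULES.get(scenario)
    if rule is None:
        return None  # no automatic escalation for this scenario
    if acknowledged_at is not None:
        return None  # assumption: an acknowledged ticket is not escalated
    threshold, target = rule
    return target if now - reported_at >= threshold else None
```

A scheduler would call a check like this periodically for every open ticket; in practice the ticketing system's own escalation rules play that role.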

Important rule: The P1 SLA must be expressed in minutes, not hours. If the production line generates 50,000 PLN of revenue per hour, every minute of stoppage is more than 830 PLN lost. A 2-hour repair target for P1 means a maximum acceptable loss of about 100,000 PLN from one incident. That should be a board decision, not a system default.


Downtime cost formula: calculate your cost before SLA configuration

Use the formula below to set the real cost of one hour of downtime for a specific line. This number should be the starting point for negotiating SLAs inside the organization.

Downtime cost = (Daily revenue / Production hours) x Margin_% + Fixed_cost_per_hour + Labor_cost_per_hour
Example: (80,000 PLN / 16h) x 0.35 + 1,200 PLN + 800 PLN = 3,750 PLN/h
Each hour of stoppage on this line costs the company about 3,750 PLN
P1 SLA = 15 min response + 2 h repair = 2.25 h x 3,750 PLN/h = max about 8,400 PLN per incident
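
If you want to run the same calculation for several lines, the formula translates directly into a few lines of code. This is only a sketch of the arithmetic above; the function and parameter names are made up for the example.

```python
def downtime_cost_per_hour(
    daily_revenue: float,
    production_hours: float,
    margin: float,
    fixed_cost_per_hour: float,
    labor_cost_per_hour: float,
) -> float:
    """Downtime cost = (daily revenue / production hours) x margin + fixed + labor."""
    return daily_revenue / production_hours * margin + fixed_cost_per_hour + labor_cost_per_hour


def max_incident_cost(cost_per_hour: float, response_minutes: float, repair_hours: float) -> float:
    """Worst-case cost of one incident that uses the full SLA window."""
    return cost_per_hour * (response_minutes / 60 + repair_hours)


# Figures from the example line: 80,000 PLN per day, 16 production hours,
# 35% margin, 1,200 PLN fixed and 800 PLN labor cost per hour.
hourly = downtime_cost_per_hour(80_000, 16, 0.35, 1_200, 800)
worst_case_p1 = max_incident_cost(hourly, response_minutes=15, repair_hours=2)
```

For the example line this gives 3,750 PLN per hour and roughly 8,400 PLN for a P1 incident that uses its full 15-minute response and 2-hour repair window.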

5. IT help desk + CMMS integration: two models

When a manufacturer has both IT and maintenance departments, the question of system architecture comes up. The choice depends on scale, organizational maturity, and budget. Here are the two main models we deploy in practice:

Model A: one system, IT and OT together

ManageEngine ServiceDesk Plus or a similar ITSM system handles both IT tickets and machine faults via rich category, group, and SLA configuration.

  • Category: Machines/Line 1/CNC/Spindle - automatically P1
  • Category: Machines/Infrastructure/Compressors - P2
  • Category: IT/Workstations - separate SLA for IT
  • Separate dispatcher groups: IT department and maintenance department
  • Shared report portal, different forms
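
The category rules in this model amount to a lookup on the category path: the most specific matching entry decides the priority, and the top-level branch decides which dispatcher group gets the ticket. A minimal sketch follows, using the category names from the list above; the default priority and the group names are assumptions for the example, and a real setup would configure this in the ITSM tool rather than in code.

```python
# Category-to-priority rules from the list above.
CATEGORY_PRIORITY = {
    "Machines/Line 1/CNC/Spindle": "P1",
    "Machines/Infrastructure/Compressors": "P2",
}
DEFAULT_PRIORITY = "P3"  # assumption: anything unmatched is a normal ticket

# Top-level category decides the dispatcher group (group names are illustrative).
DISPATCHER_GROUPS = {
    "Machines": "Maintenance department",
    "IT": "IT department",
}


def priority_for(category: str) -> str:
    """Longest-prefix match: try the full path, then each parent in turn."""
    parts = category.split("/")
    while parts:
        key = "/".join(parts)
        if key in CATEGORY_PRIORITY:
            return CATEGORY_PRIORITY[key]
        parts.pop()
    return DEFAULT_PRIORITY


def dispatcher_group_for(category: str) -> str:
    """Route by the first path segment; unknown branches need manual triage."""
    top_level = category.split("/", 1)[0]
    return DISPATCHER_GROUPS.get(top_level, "Manual triage")
```

Matching on the longest prefix means a new subcategory such as `Machines/Line 1/CNC/Spindle/Bearing` inherits P1 from its parent without an extra rule.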

Model B: two systems with an API integration

ManageEngine SDP for IT + a dedicated CMMS for production, linked by a two-way REST API integration.

  • SDP handles IT incidents, changes, and IT asset management
  • CMMS handles maintenance work orders, PPM, machine history, parts inventory
  • Integration: a machine fault with an IT component creates a ticket in both systems
  • Data flows into a shared BI/reporting dashboard
  • Option of separate certification tracks (ISO 20000 for IT, ISO 55000 for maintenance)

In both models you have to factor in OT networks. Operational Technology networks (PLC, SCADA, HMI) in plants are often isolated from IT networks for safety reasons. ManageEngine SDP runs on-premise without permanent internet access after license activation, which means it can be installed in a network segment reachable from both zones (IT and OT) through a properly configured firewall, with no need to push data outside the plant.

6. What a 4-week implementation looks like

Deploying a machine fault ticketing system does not need a multi-month project. With an organization that has clearly defined maintenance processes and resources ready to commit, 4 weeks is a realistic timeline to operational readiness on the first production line.

Week 1: Infrastructure install and directory integration

  • Install ManageEngine SDP on an on-premise server (Windows Server or VM)
  • Active Directory integration: automatic user import
  • Role configuration: Maintenance technician, Maintenance dispatcher, Operator (reporter), Maintenance manager
  • Outbound email and SMS notification configuration (ManageEngine supports SMTP + an SMS gateway)
  • Import the machine register (minimum: name, line, serial number, category)
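
Before importing the machine register, it is worth checking that every row carries the minimum fields listed above. The sketch below assumes the register is exported as a CSV file with the column names `name`, `line`, `serial_number`, and `category`; those column names and the function are assumptions for this example, not a format required by any tool.

```python
import csv
import io

REQUIRED_FIELDS = ("name", "line", "serial_number", "category")


def validate_register(csv_text: str):
    """Split register rows into importable rows and (line number, missing fields)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    valid, errors = [], []
    # The header is line 1 of the file, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            errors.append((row_number, missing))
        else:
            valid.append({f: row[f].strip() for f in REQUIRED_FIELDS})
    return valid, errors


# Illustrative data: the second machine has no production line assigned.
sample = (
    "name,line,serial_number,category\n"
    "CNC-01,Line 1,SN-1001,Machines/Line 1/CNC\n"
    "Compressor-A,,SN-2001,Machines/Infrastructure/Compressors\n"
)
valid_rows, row_errors = validate_register(sample)
```

Fixing the rejected rows in the source spreadsheet before the import is usually cheaper than cleaning up half-complete assets in the live system.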

Week 2: Category, SLA, and automation configuration

  • Build the fault category tree (Line - Machine - Subassembly - Fault type)
  • Configure 4 SLA levels with response and repair times
  • Automation rules: machine category - P1/P2/P3/P4 priority
  • Escalation rules: no acknowledgement after X minutes - notify the manager
  • Report forms for operators (simplified view, only necessary fields)
  • Configure QR codes at machines: link to the report form
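
A QR code at a machine is simply a link to the report form with the machine already filled in. The sketch below builds such links; the base URL and the query parameter names are placeholders invented for this example and have to be replaced with whatever your report form actually accepts. The resulting URL can then be passed to any QR code generator.

```python
from urllib.parse import urlencode

# Placeholder address of the operator report form, not a real endpoint.
BASE_URL = "https://servicedesk.example.local/report-fault"


def report_url(machine_id: str, line: str) -> str:
    """Build the link encoded in the QR sticker for one machine."""
    query = urlencode({"machine": machine_id, "line": line})
    return f"{BASE_URL}?{query}"


# One link per machine; urlencode takes care of spaces and special characters.
machines = [("CNC-01", "Line 1"), ("Robot-02", "Line 2")]
links = {machine_id: report_url(machine_id, line) for machine_id, line in machines}
```

Pre-filling the machine and line from the QR code is what keeps the operator form down to a fault type and a short description.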

Week 3: Training and pilot on one line

  • Maintenance technician training (2-4h): handling tickets, status, closure with description
  • Operator training (1h): how to report a fault via the portal or QR
  • Manager training (2h): dashboards, SLA reports, escalations
  • Pilot launch on one production line and feedback collection
  • Form and automation corrections based on the first tickets

Week 4: Go-live on all lines and stabilization

  • Roll out to all production lines
  • Shut down/freeze old methods (Excel, whiteboard): single source of truth
  • Launch weekly reports for maintenance management
  • Integration with the spare parts inventory (optional, if inventory is ready)
  • Set a monthly MTTR/MTBF review cadence with management

7. Metrics you should measure after 3 months

Three months after launch you have the first data that lets you genuinely assess maintenance and set goals for the next quarter. Here are four key metrics and how to read them:

Repair time: MTTR (Mean Time To Repair)

Average time from reporting a fault to bringing the machine back online. Measure separately per line and per priority.

MTTR = Sum_of_repair_times / Number_of_faults

Reliability: MTBF (Mean Time Between Failures)

Average machine operating time between consecutive failures. A rising MTBF means the machine is becoming more reliable or prevention is working.

MTBF = Operating_time / Failures_in_period

Workload: tickets per line

Identifies lines needing special attention or modernization investment. Lets you justify maintenance budgets with data, not intuition.

Trend: current_month / previous_month

Financial: downtime cost per incident

Product of MTTR and downtime hourly cost (see the formula in chapter 4). The board's metric, justifying maintenance investment.

Cost = MTTR_h x Downtime_hourly_cost
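
The four formulas are simple enough to compute from a ticket export. The sketch below is a plain translation of them into code; the function names are made up, and the sample figures are invented for illustration, apart from the 3,750 PLN hourly cost, which is the example from chapter 4.

```python
def mttr_hours(repair_times_h):
    """MTTR = sum of repair times / number of faults."""
    if not repair_times_h:
        raise ValueError("MTTR needs at least one closed fault ticket")
    return sum(repair_times_h) / len(repair_times_h)


def mtbf_hours(operating_time_h, failures_in_period):
    """MTBF = operating time / failures in the period."""
    if failures_in_period == 0:
        raise ValueError("MTBF is undefined for a period without failures")
    return operating_time_h / failures_in_period


def ticket_trend(current_month, previous_month):
    """Ratio of this month's ticket count to last month's (below 1 = improving)."""
    if previous_month == 0:
        raise ValueError("trend needs a non-zero previous month")
    return current_month / previous_month


def downtime_cost_per_incident(mttr_h, downtime_hourly_cost):
    """Cost = MTTR in hours x hourly downtime cost."""
    return mttr_h * downtime_hourly_cost


# Invented sample: four faults on one line during 320 operating hours.
repair_times = [1.5, 2.0, 0.5, 4.0]
mttr = mttr_hours(repair_times)                  # 2.0 hours
mtbf = mtbf_hours(320, len(repair_times))        # 80.0 hours
cost = downtime_cost_per_incident(mttr, 3_750)   # 7,500 PLN per incident
trend = ticket_trend(18, 24)                     # 0.75
```

Computing the metrics per line and per priority, as recommended above, is just a matter of filtering the ticket list before calling these functions.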

After 3 months, a properly working system should show: which machines generate 80% of P1 tickets (Pareto), the MTTR trend (which should fall as technicians learn to document resolutions in the knowledge base), and on which lines P1 SLAs are most often breached (pointing to staffing shortages or missing spare parts).
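
The Pareto check mentioned above can be done on a plain list of P1 tickets. As a sketch, assuming each P1 ticket is represented by the name of the machine it was raised for, the function below returns the smallest set of machines that accounts for a given share of tickets. The machine names and counts are invented for the example.

```python
from collections import Counter


def pareto_machines(p1_ticket_machines, share=0.8):
    """Return (machine, count) pairs, worst first, that reach the given share."""
    counts = Counter(p1_ticket_machines)
    total = sum(counts.values())
    selected, cumulative = [], 0
    for machine, count in counts.most_common():
        selected.append((machine, count))
        cumulative += count
        if cumulative / total >= share:
            break
    return selected


# Invented sample: ten P1 tickets spread over three machines.
tickets = ["CNC-01"] * 7 + ["Robot-02"] * 2 + ["Press-03"]
worst = pareto_machines(tickets)
```

In this sample two of the three machines account for nine of the ten P1 tickets, which is the kind of concentration that justifies a targeted overhaul or replacement.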

8. Case study: a manufacturing plant. From SMS tickets to an ITSM system

Case study: illustrative scenario

Metalex Sp. z o.o., a steel components manufacturer, 210 employees

Metalex is a mid-sized manufacturing plant with three machining and assembly lines, 18 maintenance technicians, and one department manager. Before the implementation: tickets taken by phone or SMS, written in Excel at the end of the day by one employee. No prioritization: "whoever shouts loudest gets served".

Problems identified before the implementation:

  • No way to report downtime to the machine insurer.
  • The customer's production contract required documented breakdowns. AS9100D does not define an exact retention period; the customer sets the requirement in the specification, typically 10-15 years for critical documentation.
  • Three experienced maintenance technicians were planning to leave, which meant a risk of knowledge loss without documentation.

  • -40%: MTTR reduction after 6 months
  • +28%: MTBF growth on line L1
  • 4 weeks: time to go-live on all lines

Solution: ManageEngine ServiceDesk Plus On-Premise in model A (one system for IT and maintenance). Installation on an existing Windows server, AD integration, 4 SLA levels for machines, QR codes at each machine linking to a report form on a tablet at the line. Maintenance technicians get push notifications on phones via the SDP mobile app.

Key result after 6 months: data analysis revealed that 68% of all P1 tickets concerned one machine: a 2011 machining center. Its MTBF was 11 days. Data from the system justified a new machine purchase as an ROI-positive investment: 12 months of downtime cost exceeded 60% of the new machining center's value.

9. FAQ: frequently asked questions

Do operators need to be computer-literate to report faults?

No. The operator's report form can be configured as a minimal view with 2-3 fields: line, fault type (drop-down), free-text description. Access via QR code at the machine leads directly to a simplified form on a tablet or phone. For P1 faults, a phone report to the dispatcher who enters data into the system is enough. That is still better than no record.

How long does it really take to deploy the system, with training and production go-live?

For a plant up to 300 employees with a clearly defined maintenance process: 4-6 weeks to operational go-live. A full implementation with ERP integration, parts inventory, and reporting automation is 8-12 weeks. The longest stage is usually agreeing the machine category scheme and SLAs - not technology, but organizational decisions.

Does the system work offline when the plant network is unavailable?

ManageEngine SDP On-Premise, after license activation, does not require internet and runs fully on the plant's local network. The SDP mobile app buffers some actions (status change, notes) and syncs them when the connection to the local server is restored. In OT zones isolated from the IT network, we recommend a dedicated terminal (tablet) on a separate VLAN with access to the SDP server.

How do you integrate the fault ticketing system with an ERP (SAP, IFS, Comarch)?

ManageEngine SDP exposes a REST API that supports two-way ERP integration. Most common scenarios: (1) spare parts order from an SDP work order - automatic PO in the ERP; (2) ticket closure in SDP - update of maintenance cost records in the ERP; (3) PPM schedule from the ERP - automatic SDP tickets. Integration requires development work (a few to a few dozen days, depending on ERP complexity).

What is MTBF and why is it more important than the raw number of failures?

MTBF (Mean Time Between Failures) is the average machine operating time between consecutive failures. The raw failure count is misleading: a machine that fails 10 times for 5 minutes each is less critical than one that fails less often but stays down for 2 hours each time. MTBF puts failure frequency in relation to operating time, and together with MTTR, which captures how long each stoppage lasts, it gives a real comparison. A rising MTBF signals that preventive maintenance is working or the machine has been overhauled. A falling MTBF signals aging and the need for intervention or investment.

Planning to deploy a fault ticketing system in your plant? See what a Rotech Group implementation looks like.

Ready to stop firefighting?

We will show you a working fault ticketing system configured for a manufacturing plant, with real SLAs, QR codes at machines, and an MTTR/MTBF dashboard.
