This article is a practical guide for maintenance managers, IT departments, and operations directors in manufacturing companies. We cover CMMS, IT help desk, SLA configuration, MTTR/MTBF metrics, and a realistic implementation model based on ManageEngine ServiceDesk Plus.
1. Why machine fault tickets in Excel or on the phone are a disaster
A maintenance technician takes a call from an operator, writes something in Excel or on a whiteboard, fixes the machine, and moves on to the next call. Breakdown history? Does not exist. Downtime? Nobody measures it. MTBF for a specific machine? The answer: "we have a feel for it".
That is the state of affairs in most plants with up to 500 employees. It is also a source of quantifiable losses that a relatively inexpensive system can eliminate.
Excel, phone, sticky notes
- No repair history: each breakdown starts from scratch
- No prioritization: whoever shouts loudest gets served
- Downtime is not measured: nobody knows the cost of one hour of stoppage
- No MTBF/MTTR analysis: no data means no path to improvement
- Knowledge locked in one technician's head: single point of failure risk
- No compliance with ISO 55000 and other asset management standards
- No way to plan preventive maintenance based on data
Fault ticketing system
- Full repair history per machine with dates and technicians
- Automatic prioritization by SLA: the system decides what is urgent
- Measurable response and repair time: MTTR to the minute
- MTBF data that lets you plan PPM (Planned Preventive Maintenance)
- Knowledge available to the whole maintenance team, not one person
- Audit trail for certifications and machine insurance
- Integration with spare parts inventory and schedules
The most common objection: "we do not have time to deploy a system because we have too many breakdowns". That is a thinking trap. A ticketing system surfaces recurring fault patterns and enables the shift from firefighting to prevention - and data-driven preventive maintenance reduces the number of unplanned stoppages. Without data there is no way to act preventively. You can only put out fires.
2. The difference between CMMS and IT help desk
Before choosing a tool, see how the two main system types differ. Many manufacturing companies try to use only one of them for the whole thing. The result is compromises that hurt both departments. More context in our article on ITSM for manufacturing: deploying a help desk in a factory.
CMMS: Computerized Maintenance Management System
CMMS is a tool designed for machines and physical infrastructure. Its center is the asset: a specific machine, production line, or piece of equipment. Typical CMMS features:
- Machine register with technical passports and repair history
- PPM (planned preventive maintenance) scheduling based on calendar time or operating-hour counters
- Spare parts and inventory management
- Work Orders assigned to maintenance technicians
- KPI analysis: MTBF, MTTR, OEE (Overall Equipment Effectiveness)
IT help desk (ITSM)
IT help desk, more broadly ITSM (IT Service Management), focuses on IT services and users. The center is the incident or service. Typical ITSM features:
- Incident, problem, and change management (ITIL)
- Service catalog with self-service portal access
- Knowledge base and resolutions for typical problems
- SLAs measured per service (server availability, application response time)
- Active Directory / LDAP and IT monitoring integration
Why manufacturers need both systems or an integration
In a modern manufacturing plant, the IT/OT (Operational Technology) line is blurry. A PLC controller fault is a technical incident (OT) but may require IT involvement (network, vendor remote support). A "the label printer on line 3 is not working" ticket can be both IT (driver) and maintenance (belt feed mechanics).
There are two options: either one system with rich category configuration (the hybrid model), or two separate systems linked by integration. We discuss both models in detail in chapter 5.
3. Key system features for manufacturing
Not every help desk fits manufacturing. Before choosing software, check whether it has these specific features. Without them the system will not match shop floor reality:
Fault priorities tailored to manufacturing
Manufacturing prioritization has to reflect real impact on line continuity. A "low / medium / high" scale is not enough. You need at least four levels tied to specific machines and lines:
| Priority | Definition | Response time | Repair time |
|---|---|---|---|
| P1 CRITICAL | Production line stopped, production not possible | 15 minutes | 2 hours |
| P2 URGENT | Line performance degraded by more than 30%, risk of stoppage | 1 hour | 4 hours |
| P3 NORMAL | Non-blocking defect, fix within the current shift | 4 hours | Next shift |
| P4 PLANNED | Preventive maintenance, inspections, calibrations | Next business day (NBD) | Per plan |
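The priority table above can be expressed as data plus a small classification rule. The following is a minimal sketch, not a ServiceDesk Plus configuration: the names `PriorityLevel`, `PRIORITIES`, and `classify` are hypothetical, and targets that are not fixed durations ("Next shift", "Per plan") are stored as `None` with a note.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class PriorityLevel:
    code: str
    label: str
    response: Optional[timedelta]  # None = not a fixed duration, see note
    repair: Optional[timedelta]    # None = not a fixed duration, see note
    note: str = ""


# Values copied from the priority table; structure and names are illustrative.
PRIORITIES = {
    "P1": PriorityLevel("P1", "CRITICAL", timedelta(minutes=15), timedelta(hours=2)),
    "P2": PriorityLevel("P2", "URGENT", timedelta(hours=1), timedelta(hours=4)),
    "P3": PriorityLevel("P3", "NORMAL", timedelta(hours=4), None, "repair by next shift"),
    "P4": PriorityLevel("P4", "PLANNED", None, None,
                        "response next business day, repair per plan"),
}


def classify(line_stopped: bool, performance_drop_pct: float, planned: bool = False) -> str:
    """Map the observed impact on the line to a priority code from the table."""
    if planned:
        return "P4"
    if line_stopped:
        return "P1"
    if performance_drop_pct > 30:  # "degraded by more than 30%"
        return "P2"
    return "P3"
```

In practice the decision would usually be driven by the machine category rather than by free input from the operator, so that the reporter does not have to judge urgency.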
Fault ticket flow: from operator to maintenance technician
[Figure: fault ticket flow diagram]
Other key features
- Production line view: real-time dashboard showing the state of each line and active tickets
- Machine history: full log of all breakdowns, repairs, inspections, and part changes for a specific asset
- Spare parts inventory integration: reserve and issue parts directly from the work order
- SMS and push notifications: critical P1 alerts must reach the technician even when on the shop floor without a computer
- Self-service portal for operators: a simple report form available from a tablet at the machine or via QR code
- Time-based escalation: automatic escalation (for example, to the maintenance manager) if a P1 ticket is not acknowledged within 10 minutes
4. How to configure SLAs for machine faults
SLAs on the shop floor are not a marketing statement. They are a binding internal commitment that the system enforces automatically. Proper SLA configuration requires answers to three questions: what we measure (response or repair), when the clock starts, and when it stops.
Start event and stop event
- Response SLA start: moment the ticket is registered in the system
- Response SLA stop: moment of the technician's first physical contact with the machine (confirmation in the system)
- Repair SLA start: moment the technician confirms ticket pickup
- Repair SLA stop: moment the ticket is closed with "machine operational and production resumed"
- SLA pause: waiting for a spare part from outside (must be documented in the system)
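The start, stop, and pause events listed above amount to a simple clock calculation. Here is a minimal sketch of it, assuming pauses are recorded as non-overlapping (start, end) pairs; the function name `elapsed_sla_time` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def elapsed_sla_time(start, stop, pauses=()):
    """Time counted against the SLA between start and stop, excluding pauses.

    pauses: iterable of (pause_start, pause_end) tuples, assumed non-overlapping.
    """
    total = stop - start
    for pause_start, pause_end in pauses:
        # Clip each pause to the measured window before subtracting it.
        clipped_start = max(pause_start, start)
        clipped_end = min(pause_end, stop)
        if clipped_end > clipped_start:
            total -= clipped_end - clipped_start
    return total


# Repair SLA: starts at ticket pickup, stops at closure, paused while waiting for a part.
pickup = datetime(2024, 3, 4, 8, 0)
closed = datetime(2024, 3, 4, 12, 30)
waiting_for_part = [(datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 11, 0))]

repair_time = elapsed_sla_time(pickup, closed, waiting_for_part)
breached = repair_time > timedelta(hours=2)  # 2 h repair target for P1
```

With these sample values the ticket was open for 4.5 hours, but only 2.5 hours count against the repair SLA, which still exceeds the 2-hour P1 target.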
Examples of real SLA configuration
| Breakdown scenario | Priority | Response | Repair | Escalation to |
|---|---|---|---|---|
| CNC machining center stopped | P1 | 15 min | 2 h | Maintenance manager after 10 min |
| Welding robot fault | P1 | 15 min | 3 h | Production director after 30 min |
| Auxiliary compressor: pressure degradation | P2 | 1 h | 4 h | Maintenance manager after 2 h |
| Conveyor belt: excessive vibration | P2 | 1 h | Next shift | Maintenance manager after 3 h |
| Oil leak: spindle, not blocking production | P3 | 4 h | Next day | Maintenance manager after 1 day |
| Planned maintenance: filter replacement | P4 | NBD | Per plan | No automatic escalation |
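The "Escalation to" column can be read as a set of time-based rules. The sketch below assumes that the threshold counts from ticket registration and that escalation applies only while the ticket is still open; the table itself does not state this. The scenario keys and the function name are hypothetical identifiers.

```python
from datetime import datetime, timedelta
from typing import Optional

# Recipients and thresholds copied from the table; keys are made-up identifiers.
ESCALATION_RULES = {
    "cnc_machining_center_stopped": ("Maintenance manager", timedelta(minutes=10)),
    "welding_robot_fault": ("Production director", timedelta(minutes=30)),
    "aux_compressor_pressure": ("Maintenance manager", timedelta(hours=2)),
    "conveyor_vibration": ("Maintenance manager", timedelta(hours=3)),
    "spindle_oil_leak": ("Maintenance manager", timedelta(days=1)),
    "planned_filter_replacement": None,  # no automatic escalation
}


def escalation_target(scenario: str, registered_at: datetime, now: datetime,
                      still_open: bool = True) -> Optional[str]:
    """Return who should be notified, or None if no escalation is due."""
    rule = ESCALATION_RULES.get(scenario)
    if rule is None or not still_open:
        return None
    recipient, threshold = rule
    return recipient if now - registered_at >= threshold else None
```

A scheduler would call such a check periodically for every open ticket and send the notification once per ticket.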
Important rule: The P1 SLA must be expressed in minutes, not hours. If the production line generates 50,000 PLN of revenue per hour, every minute of stoppage is more than 830 PLN lost. A 2-hour repair target for P1 means a maximum acceptable loss of about 100,000 PLN from one incident. That should be a board decision, not a system default.
Downtime cost formula: calculate your cost before SLA configuration
Use the formula below to set the real cost of one hour of downtime for a specific line. This number should be the starting point for negotiating SLAs inside the organization.
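As an illustration, here is one possible way to structure the calculation. The cost components (lost output, idle labor, other costs) are an assumption for this sketch, not a formula confirmed by the article; the 50,000 PLN per hour and the 2-hour P1 target come from the example above.

```python
def downtime_cost_per_hour(lost_output_value: float,
                           idle_labor_cost: float = 0.0,
                           other_costs: float = 0.0) -> float:
    """Hourly downtime cost as a sum of assumed components (all in PLN per hour)."""
    return lost_output_value + idle_labor_cost + other_costs


def max_loss_under_sla(hourly_cost: float, repair_target_hours: float) -> float:
    """Worst-case loss from one incident that is repaired exactly at the SLA limit."""
    return hourly_cost * repair_target_hours


# Reproduces the figures from the P1 example: 50,000 PLN/h and a 2-hour target.
hourly = downtime_cost_per_hour(50_000)
per_minute = hourly / 60
worst_case_p1 = max_loss_under_sla(hourly, 2)
```

Adding idle labor or contractual penalties to the hourly figure only increases the worst-case loss, which strengthens the case for treating the P1 target as a management decision.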
5. IT help desk + CMMS integration: two models
When a manufacturer has both IT and maintenance departments, the question of system architecture comes up. The choice depends on scale, organizational maturity, and budget. Here are the two main models we deploy in practice:
Model A: one system for IT and maintenance (hybrid). ManageEngine ServiceDesk Plus or a similar ITSM system handles both IT tickets and machine faults via rich category, group, and SLA configuration.
- Category: Machines/Line 1/CNC/Spindle - automatically P1
- Category: Machines/Infrastructure/Compressors - P2
- Category: IT/Workstations - separate SLA for IT
- Separate dispatcher groups: IT department and maintenance department
- Shared report portal, different forms
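The category rules above behave like a lookup on the category path. A minimal sketch follows, assuming the most specific matching path wins and that unmatched machine categories fall back to P3; both are assumptions of this example. It does not use any ServiceDesk Plus API, and the function names are hypothetical.

```python
# Category paths and priorities from the list above; the fallback is an assumption.
CATEGORY_PRIORITY = {
    "Machines/Line 1/CNC/Spindle": "P1",
    "Machines/Infrastructure/Compressors": "P2",
}
DEFAULT_PRIORITY = "P3"


def priority_for(category: str) -> str:
    """Return the priority of the most specific configured parent category."""
    parts = category.split("/")
    # Try the full path first, then walk up one level at a time.
    for depth in range(len(parts), 0, -1):
        prefix = "/".join(parts[:depth])
        if prefix in CATEGORY_PRIORITY:
            return CATEGORY_PRIORITY[prefix]
    return DEFAULT_PRIORITY


def group_for(category: str) -> str:
    """Route the ticket to the IT or the maintenance dispatcher group."""
    return "IT department" if category.split("/")[0] == "IT" else "Maintenance department"
```

For example, a ticket filed under `Machines/Line 1/CNC/Spindle/Bearing` inherits P1 from its parent category and goes to the maintenance group.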
Model B: two systems linked by integration. ManageEngine SDP for IT plus a dedicated CMMS for production, linked by a two-way REST API integration.
- SDP handles IT incidents, changes, and IT asset management
- CMMS handles maintenance work orders, PPM, machine history, parts inventory
- Integration: a machine fault with an IT component creates a ticket in both systems
- Data flows into a shared BI/reporting dashboard
- Option of separate certification tracks (ISO 20000 for IT, ISO 55000 for maintenance)
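The rule "a machine fault with an IT component creates a ticket in both systems" can be sketched with two in-memory stand-ins. Everything here is hypothetical: `TicketStore` only imitates a system that issues ticket IDs, and a real integration would call each product's own API instead.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TicketStore:
    """In-memory stand-in for one system (SDP or the CMMS); purely illustrative."""
    name: str
    tickets: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, summary, linked_to=None):
        ticket_id = f"{self.name}-{next(self._ids)}"
        self.tickets[ticket_id] = {"summary": summary, "linked_to": linked_to}
        return ticket_id


def report_fault(summary, has_it_component, cmms, sdp):
    """Create the work order in the CMMS and, if needed, a linked IT ticket."""
    cmms_id = cmms.create(summary)
    if not has_it_component:
        return [cmms_id]
    sdp_id = sdp.create(summary, linked_to=cmms_id)
    # Store the cross-reference on both sides so each team can find the other ticket.
    cmms.tickets[cmms_id]["linked_to"] = sdp_id
    return [cmms_id, sdp_id]
```

Keeping the cross-reference in both records is what makes the link two-way: closing one ticket can then trigger a status update on the other.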
In both models you have to factor in OT networks. Operational Technology networks (PLC, SCADA, HMI) in plants are often isolated from IT networks for safety reasons. ManageEngine SDP runs on-premise without permanent internet access after license activation, which means it can be installed in a network segment reachable from both zones (IT and OT) through a properly configured firewall, with no need to push data outside the plant.
6. What a 4-week implementation looks like
Deploying a machine fault ticketing system does not need a multi-month project. With an organization that has clearly defined maintenance processes and resources ready to commit, 4 weeks is a realistic timeline to operational readiness on the first production line.
Week 1: Infrastructure install and directory integration
- Install ManageEngine SDP on an on-premise server (Windows Server or VM)
- Active Directory integration: automatic user import
- Role configuration: Maintenance technician, Maintenance dispatcher, Operator (reporter), Maintenance manager
- Outbound email and SMS notification configuration (ManageEngine supports SMTP + an SMS gateway)
- Import the machine register (minimum: name, line, serial number, category)
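Before importing the machine register, it helps to check that every row carries the minimum fields. Below is a minimal sketch using only the standard library; the column names and the CSV layout are assumptions, and the actual import format depends on the tool.

```python
import csv
import io

# Minimum fields from the list above; the exact column names are assumed.
REQUIRED_FIELDS = ("name", "line", "serial_number", "category")


def validate_register(csv_text):
    """Return (valid_rows, errors) for a machine register given as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing_columns = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing_columns:
        return [], [f"missing columns: {', '.join(missing_columns)}"]
    valid, errors = [], []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        empty = [f for f in REQUIRED_FIELDS if not (row[f] or "").strip()]
        if empty:
            errors.append(f"row {row_number}: empty {', '.join(empty)}")
        else:
            valid.append({f: row[f].strip() for f in REQUIRED_FIELDS})
    return valid, errors


sample = """name,line,serial_number,category
CNC machining center,Line 1,SN-001,Machines/Line 1/CNC
Auxiliary compressor,,SN-002,Machines/Infrastructure/Compressors
"""
valid, errors = validate_register(sample)
```

In the sample, the second machine is rejected because its line is missing, which is the kind of gap that later breaks per-line reporting.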
Week 2: Category, SLA, and automation configuration
- Build the fault category tree (Line -> Machine -> Subassembly -> Fault type)
- Configure 4 SLA levels with response and repair times
- Automation rules: machine category -> P1/P2/P3/P4 priority
- Escalation rules: no acknowledgement after X minutes -> notify the manager
- Report forms for operators (simplified view, only necessary fields)
- Configure QR codes at machines: link to the report form
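A QR code at a machine is just an encoded link, so the links can be generated from the machine register. The sketch below builds such links; the portal address and the query parameter names are hypothetical, and whether the report form can prefill fields from the URL depends on the product and its configuration.

```python
from urllib.parse import urlencode

# Hypothetical address of the operator report form on the local network.
PORTAL_FORM_URL = "https://servicedesk.example.local/report"


def report_link(line: str, machine: str, serial_number: str) -> str:
    """Build the link to encode in the QR code for one machine."""
    query = urlencode({"line": line, "machine": machine, "sn": serial_number})
    return f"{PORTAL_FORM_URL}?{query}"


link = report_link("Line 1", "CNC machining center", "SN-001")
```

The resulting string can be passed to any QR code generator and printed on a label next to the machine.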
Week 3: Training and pilot on one line
- Maintenance technician training (2-4h): handling tickets, status, closure with description
- Operator training (1h): how to report a fault via the portal or QR
- Manager training (2h): dashboards, SLA reports, escalations
- Pilot launch on one production line and feedback collection
- Form and automation corrections based on the first tickets
Week 4: Go-live on all lines and stabilization
- Roll out to all production lines
- Retire or freeze the old methods (Excel, whiteboard) so the system becomes the single source of truth
- Launch weekly reports for maintenance management
- Integration with the spare parts inventory (optional, if inventory is ready)
- Set a monthly MTTR/MTBF review cadence with management
7. Metrics you should measure after 3 months
Three months after launch you have the first data that lets you genuinely assess maintenance and set goals for the next quarter. Here are four key metrics and how to read them:
MTTR: Mean Time To Repair
Average time from reporting a fault to bringing the machine back online. Measure separately per line and per priority.
MTTR = Sum_of_repair_times / Number_of_faults
MTBF: Mean Time Between Failures
Average machine operating time between consecutive failures. A rising MTBF means the machine is becoming more reliable or prevention is working.
MTBF = Operating_time / Failures_in_period
Tickets per line
Identifies lines needing special attention or modernization investment. Lets you justify maintenance budgets with data, not intuition.
Trend: current_month / previous_month
Downtime cost per incident
Product of MTTR and downtime hourly cost (see the formula in chapter 4). The board's metric, justifying maintenance investment.
Cost = MTTR_h x Downtime_hourly_cost
After 3 months, a properly working system should show: which machines generate 80% of P1 tickets (Pareto), the MTTR trend (which should fall as technicians learn to document resolutions in the knowledge base), and on which lines P1 SLAs are most often breached (pointing to staffing shortages or missing spare parts).
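The four metrics and the Pareto check can be computed from exported ticket data with a few lines of code. This is a minimal sketch on made-up sample data; the function names are illustrative, and the 50,000 PLN hourly cost is the example figure from chapter 4.

```python
from collections import Counter


def mttr_hours(repair_times_h):
    """MTTR = sum of repair times / number of faults."""
    return sum(repair_times_h) / len(repair_times_h)


def mtbf_hours(operating_time_h, failures):
    """MTBF = operating time / failures in the period."""
    return operating_time_h / failures


def ticket_trend(current_month, previous_month):
    """Ratio below 1.0 means fewer tickets than in the previous month."""
    return current_month / previous_month


def downtime_cost_per_incident(mttr_h, hourly_cost):
    """Cost = MTTR in hours x hourly downtime cost."""
    return mttr_h * hourly_cost


def pareto_share(machine_per_ticket, top_n=1):
    """Share of tickets caused by the top_n most frequently failing machines."""
    counts = Counter(machine_per_ticket)
    top = sum(n for _, n in counts.most_common(top_n))
    return top / len(machine_per_ticket)


# Made-up sample data for one line and one month.
repairs_h = [1.5, 2.0, 0.5, 4.0]
p1_machines = ["CNC-01", "CNC-01", "ROBOT-02", "CNC-01"]

mttr = mttr_hours(repairs_h)
mtbf = mtbf_hours(480, len(repairs_h))
trend = ticket_trend(12, 16)
cost = downtime_cost_per_incident(mttr, 50_000)
share = pareto_share(p1_machines)
```

Running the same functions per line and per priority gives the breakdown recommended above without changing the code.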
8. Case study: a manufacturing plant. From SMS tickets to an ITSM system
Metalex Sp. z o.o., a steel components manufacturer, 210 employees
Metalex is a mid-sized manufacturing plant with three machining and assembly lines, 18 maintenance technicians, and one department manager. Before the implementation: tickets taken by phone or SMS, written in Excel at the end of the day by one employee. No prioritization: "whoever shouts loudest gets served".
Problems identified before the implementation:
- No way to report downtime to the machine insurer.
- The customer's production contract required documented breakdowns. AS9100D does not define an exact retention period; the customer sets the requirements in the specification, typically 10-15 years for critical documentation.
- Three experienced maintenance technicians were planning to leave, which meant a risk of losing knowledge that had never been documented.
Solution: ManageEngine ServiceDesk Plus On-Premise in model A (one system for IT and maintenance). Installation on an existing Windows server, AD integration, 4 SLA levels for machines, QR codes at each machine linking to a report form on a tablet at the line. Maintenance technicians get push notifications on phones via the SDP mobile app.
Key result after 6 months: data analysis revealed that 68% of all P1 tickets concerned one machine: a 2011 machining center. Its MTBF was 11 days. Data from the system justified a new machine purchase as an ROI-positive investment: 12 months of downtime cost exceeded 60% of the new machining center's value.
9. FAQ: frequently asked questions
Will operators have to fill in a complicated form to report a fault?
No. The operator's report form can be configured as a minimal view with 2-3 fields: line, fault type (drop-down), free-text description. Access via QR code at the machine leads directly to a simplified form on a tablet or phone. For P1 faults, a phone call to the dispatcher, who enters the data into the system, is enough. That is still better than no record.
How long does the implementation take?
For a plant with up to 300 employees and a clearly defined maintenance process: 4-6 weeks to operational go-live. A full implementation with ERP integration, parts inventory, and reporting automation takes 8-12 weeks. The longest stage is usually agreeing on the machine category scheme and SLAs. The bottleneck is organizational decisions, not technology.
Does the system work without internet access and in isolated OT networks?
ManageEngine SDP On-Premise, after license activation, does not require internet and runs fully on the plant's local network. The SDP mobile app buffers some actions (status change, notes) and syncs them when the connection to the local server is restored. In OT zones isolated from the IT network, we recommend a dedicated terminal (tablet) on a separate VLAN with access to the SDP server.
Can the system be integrated with an ERP?
ManageEngine SDP exposes a REST API that supports two-way ERP integration. The most common scenarios: (1) spare parts order from an SDP work order -> automatic PO in the ERP; (2) ticket closure in SDP -> update of maintenance cost records in the ERP; (3) PPM schedule from the ERP -> automatic SDP tickets. Integration requires development work (a few to a few dozen days, depending on ERP complexity).
What is MTBF and why is it better than counting failures?
MTBF (Mean Time Between Failures) is the average machine operating time between consecutive failures. The raw failure count is misleading: a machine that fails 10 times for 5 minutes each is less critical than one that fails every 3 hours and is down for 2 hours each time. Because MTBF is calculated from actual operating time, it reflects both how often the machine fails and how long it stays down, which gives a fair comparison. A rising MTBF signals that preventive maintenance is working or the machine has been overhauled. A falling MTBF signals aging and the need for intervention or investment.
Planning to deploy a fault ticketing system in your plant? See what a Rotech Group implementation looks like.