Data-Driven Degradation Detection in Predictive Maintenance

Background
- Predictive maintenance and anomaly detection are at the forefront of AI applications in industry.
- Predictive maintenance is a domain where data is collected over time to monitor the state of an asset, with the goal of finding patterns that predict failures.
- Machine learning, and deep learning in particular, can estimate the Remaining Useful Life (RUL) of engine components.
- The purpose is to estimate how much time is left before the next machinery fault so that maintenance can be planned in advance.

Problem Statement
- On the factory floor, the company runs a fleet of identical machines in production around the clock; their service lives differ.
- Each engine operates normally at the start of each working cycle, with varying degrees of initial wear and manufacturing variation.
- Occasionally and unexpectedly, an engine develops a degradation at some point during the cycle, and the fault grows in magnitude until the engine fails.
- Such unforeseeable shutdowns cost time and money, while scheduled maintenance can only partly prevent unplanned standstills.

Analytical Goal
Predict, from the machinery log data, the number of operational cycles an engine will continue to run before failure.
1. While the original data records the log information by operational cycle number, the regression task is turned into a classification task with three levels of event priority.
2. Define thresholds at which the engine is considered to transition into another (more critical) level of operational stability.
3. Question to answer: given this history of engine operation and failure events, can we predict when an in-service engine will enter a critical operational phase?
4. Our solution applies deep learning (a recurrent neural network) to the task.
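
Since the logs are indexed by operational cycle number, a per-record RUL has to be derived before any thresholds can be applied. The following is a minimal sketch of one common way to do that, assuming each engine's log runs until failure so that its last recorded cycle marks the failure point. The function name `add_rul` and the `(engine_id, cycle)` record layout are hypothetical and are not taken from the original data.

```python
from collections import defaultdict


def add_rul(records):
    """Attach a remaining-useful-life value to each log record.

    records: list of (engine_id, cycle) tuples (hypothetical layout).
    Assumption: the highest cycle seen for an engine is its failure cycle.
    Returns a list of (engine_id, cycle, rul) tuples in the input order.
    """
    last_cycle = defaultdict(int)
    for engine_id, cycle in records:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    return [(e, c, last_cycle[e] - c) for e, c in records]


# Toy log: engine 1 fails after 3 cycles, engine 2 after 2 cycles.
log = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
with_rul = add_rul(log)
```

In this toy log the final record of each engine gets an RUL of 0, and earlier records count up from there.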

Data Description
Data Set: fleet of 100 engines of the same type
Descriptors:
- 3 operational settings
- 21 sensor measurements, recorded as multivariate time series over each engine's working life
Temporal dimension: RUL in cycles
Target: three-class classification
Classification:
- 2: predicted failure is more than 30 cycles away (low-priority event)
- 1: predicted failure is between 15 and 30 cycles away
- 0: predicted failure is fewer than 15 cycles away (high-priority event)
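
The class definitions above can be written as a small labeling function. This is a sketch only: the name `rul_to_class` and its parameters are hypothetical, and treating the boundary values 15 and 30 as part of class 1 is an assumption, since the definitions do not say which side the boundaries fall on.

```python
def rul_to_class(rul, near=15, far=30):
    """Map a remaining-useful-life value (in cycles) to a priority class.

    2: more than `far` cycles to failure (low priority)
    1: between `near` and `far` cycles, boundaries included (assumption)
    0: fewer than `near` cycles to failure (high priority)
    """
    if rul > far:
        return 2
    if rul < near:
        return 0
    return 1


labels = [rul_to_class(r) for r in (120, 31, 30, 15, 14, 0)]
print(labels)  # prints [2, 2, 1, 1, 0, 0]
```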

Solution
Data Preparation: cleaned and normalized sensor data, cut into sequences of 50 time steps
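
The preparation step can be sketched as follows. This is an illustration under stated assumptions rather than the original pipeline: min-max scaling is one possible normalization (the method actually used is not specified), the windows overlap with a stride of one cycle, and the helper names `min_max_scale` and `sliding_windows` are hypothetical. The toy series has 2 features; the real data would have 24 columns (3 settings plus 21 sensors).

```python
def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sliding_windows(rows, length=50):
    """Return every run of `length` consecutive rows, oldest window first.

    A series shorter than `length` yields no windows.
    """
    return [rows[i:i + length] for i in range(len(rows) - length + 1)]


# Toy series for one engine: 60 cycles, 2 features.
series = [[float(t), 100.0 - t] for t in range(60)]
windows = sliding_windows(min_max_scale(series), length=50)
print(len(windows), len(windows[0]), len(windows[0][0]))  # prints 11 50 2
```

In practice the scaling bounds would be computed on the training engines only and reused for unseen engines, and windows would be built per engine so that no sequence mixes data from two machines.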
Modeling: Long Short-Term Memory (LSTM) recurrent neural network (TensorFlow)

Performance
Accuracy: 94% overall; 98% for the high-priority category
F1 score: 0.87
Precision: 0.92
Recall: 0.83
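
The F1 score is the harmonic mean of precision and recall, so the three figures above can be checked against each other. The short sketch below does that, assuming all three values refer to the same class or the same averaging scheme, which is not stated here. The function name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported = f1_score(0.92, 0.83)
print(f"{reported:.2f}")  # prints 0.87
```

The result matches the reported F1 score, so the three numbers are mutually consistent.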
Deployment: containerized microservice (REST API)