Today I will start building a very simple prediction model. As promised, I will first present a model trained on data from the existing sensors of the measuring machines, without the vibration sensors described in my last blog post.
Gathering data for predictive maintenance with machine learning is a long and difficult process. Maybe that is one of the reasons why it is not yet very popular. But once the right data has been gathered, building and training the model is not very difficult. This guide explains very well which data is needed and how to train the model: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/cortana-analytics-playbook-predictive-maintenance
As Prof. Andrew Ng recommends, when you start building a machine learning model, do not try to make a very good model from the beginning. It is better to build a simple working model without spending much time on it, one day at most. Once the first simple model works, maybe not perfectly but it works, try to understand what is needed to make it better: more data, different data, a more complex model, longer training, or maybe an entirely different model.
First, let's have a look at the code.
In the preprocessing step, irrelevant columns are dropped and the remaining columns are renamed. training_data1, training_data2 and training_data3 are sensor data recorded while the machine runs in normal operation mode (labeled 0).
training_data4 and training_data5 are data recorded with changed parameters that influence the machine's behavior while it runs (labeled 1). This data simulates how the machine behaves when something is going wrong and it should be checked.
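A minimal pandas sketch of the preprocessing described above could look like the following. The raw and renamed column names are taken from the head() outputs in this post; the helper names (preprocess, build_dataset, fake_run) and the random stand-in data are hypothetical. The min-max scaling step is an assumption, inferred only from the rescaled values shown in X.head().

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Short names used after preprocessing (matching the X.head() output)
RENAME = {
    "lag X[mm]": "lag",
    "speed X[mm/s]": "speed",
    "accel X[mm/ss]": "accel",
    "DAC X[V]": "DAC",
}


def preprocess(frame: pd.DataFrame, label: int) -> pd.DataFrame:
    """Drop the time column, shorten the sensor column names and attach a label."""
    out = frame.drop(columns=["time [s]"]).rename(columns=RENAME)
    out["label"] = label
    return out


def build_dataset(normal_frames, changed_frames):
    """Label normal runs 0 and changed-parameter runs 1, concatenate, then scale."""
    labeled = [preprocess(f, 0) for f in normal_frames]
    labeled += [preprocess(f, 1) for f in changed_frames]
    data = pd.concat(labeled, ignore_index=True)
    y = data.pop("label")
    # Assumption: min-max scaling, inferred from the rescaled values in X.head()
    scaler = MinMaxScaler()
    X = pd.DataFrame(scaler.fit_transform(data), columns=data.columns)
    return X, y, scaler


def fake_run(rng, n=200, lag_shift=0.0):
    """Hypothetical stand-in for one recorded run, using the raw column names."""
    return pd.DataFrame({
        "lag X[mm]": rng.normal(0.0013 + lag_shift, 0.0002, n),
        "speed X[mm/s]": rng.normal(0.0, 0.005, n),
        "accel X[mm/ss]": rng.normal(0.0, 0.1, n),
        "DAC X[V]": rng.normal(0.0, 0.02, n),
        "time [s]": np.arange(n) * 0.005,
    })


rng = np.random.default_rng(0)
normal = [fake_run(rng) for _ in range(3)]                     # like training_data1..3
changed = [fake_run(rng, lag_shift=0.0005) for _ in range(2)]  # like training_data4..5
X, y, scaler = build_dataset(normal, changed)
print(X.head())
print(y.value_counts())
```

In this sketch the scaler is fit on all runs at once for simplicity; in a real pipeline it would normally be fit on the training split only, so that statistics from the test data do not leak into training.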
Here is what the data looks like before preprocessing:
training_data1.head()
   lag X[mm]  speed X[mm/s]  accel X[mm/ss]   DAC X[V]  time [s]
0   0.001429          0.000            0.00  -0.001526     0.000
1   0.001229          0.005            0.13  -0.003357     0.005
2   0.001229          0.005            0.25  -0.002747     0.010
3   0.001229          0.005            0.13  -0.004883     0.015
4   0.001429          0.000            0.00   0.044557     0.020
And after preprocessing:
X.head()
        lag     speed     accel       DAC
0  0.471573  0.499784  0.490041  0.499819
1  0.471165  0.499793  0.490115  0.499704
2  0.471165  0.499793  0.490183  0.499742
3  0.471165  0.499793  0.490115  0.499609
4  0.471573  0.499784  0.490041  0.502701
After preprocessing, four columns of data remain: lag distance, machine speed, machine acceleration and motor voltage. The time column has been dropped.
For training I used RandomForestClassifier from Scikit-Learn. The parameters n_estimators=100 and max_depth=10 gave the best results on this small dataset. Evaluating the model gave the following accuracy on the training and test sets:
print(random_forest.score(X_train, y_train))  # accuracy on the training set
print(random_forest.score(X_test, y_test))    # accuracy on the test set
0.9196666666666666
0.858
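The training and evaluation step can be sketched roughly as follows. Only the classifier and its two hyperparameters (n_estimators=100, max_depth=10) come from this post; the synthetic stand-in data, the 80/20 stratified split and the random_state values are assumptions, so the printed scores will not match the numbers above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: four features (lag, speed, accel, DAC) per sample,
# label 0 = normal operation, label 1 = changed parameters
rng = np.random.default_rng(42)
n = 1500
X_normal = rng.normal(0.0, 1.0, size=(n, 4))
X_changed = rng.normal(1.0, 1.0, size=(n, 4))
X = np.vstack([X_normal, X_changed])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Hold out 20% for testing; stratify keeps the 0/1 ratio equal in both splits (assumed settings)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hyperparameters reported in the post
random_forest = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
random_forest.fit(X_train, y_train)

# For classifiers, score() returns the mean accuracy on the given data
train_acc = random_forest.score(X_train, y_train)
test_acc = random_forest.score(X_test, y_test)
print(train_acc)
print(test_acc)
```

Because a random forest splits on feature thresholds, it is insensitive to monotonic rescaling such as min-max scaling; the scaling step mainly matters if the same features are later fed to a different kind of model.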
Not bad for such a simple model. The problem, however, is that the model was trained on data in which only 2 machine parameters were changed. The machine has hundreds of parameters that can influence its behavior, which is why this model is not the best way to do predictive maintenance.
One solution could be to collect more data representing the machine's different possible behaviors, but this is very time-consuming. Maybe I will come back to this in the future.
What I want to do next is to use a different approach. A couple of weeks ago I came across a very interesting paper on www.arxiv.org: Tie Luo and Sai G. Nagarajan, “Distributed Anomaly Detection using Autoencoder Neural Networks in WSN for IoT”. I like the idea presented there and will try to use it for my purposes. In the next blog post I will describe the paper and how its ideas could be used.