Analysis of Transformer Health Index Using Statistical and Machine Learning Techniques
DOI:
https://doi.org/10.24996/ijs.2026.67.1.42
Keywords:
Dissolved Gas Analysis, Exploratory Data Analysis, Support Vector Machine, Random Forest, XGBoost, k-Nearest Neighbours
Abstract
Data science and machine learning play a major role in assessing, predicting, and maintaining the health of power transformers through data analysis. This paper applies data science techniques to analyze and interpret Dissolved Gas Analysis (DGA) datasets from power transformers in order to predict the Health Index (HI). Exploratory Data Analysis (EDA), using the correlation matrix and heat maps, showed the correlations among all the features and indicated that the dataset is imbalanced; oversampling is therefore employed to balance the data. Principal Component Analysis (PCA) is used to estimate the principal components of the data, which helps in selecting the features most useful for prediction. Four classifiers, namely Support Vector Machine (SVM), Random Forest (RF), XGBoost, and k-Nearest Neighbours (KNN), are applied to both the balanced and the imbalanced data, and the results are compared. The RF classifier outperformed all the other classifiers, with an accuracy of 96.9%.
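The balancing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling variant the authors used, so the code below assumes plain random oversampling with replacement. The function name `random_oversample`, the two gas features, and the class labels are all hypothetical and chosen only for illustration; the values are toy data, not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many samples as the largest class.

    Assumption: simple random oversampling with replacement; the
    paper may have used a different oversampling method.
    """
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing samples from the class's own rows.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: (H2 ppm, CH4 ppm) per sample, made-up classes.
rows = [(12, 3), (15, 4), (11, 2), (14, 5), (90, 40), (300, 120)]
labels = ["good", "good", "good", "good", "fair", "poor"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In a classification workflow, oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported accuracy.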
License
Copyright (c) 2026 Iraqi Journal of Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



