Enhancing power equipment defect identification through multi-label classification methods | Scientific Reports

Oct 16, 2024

Scientific Reports volume 14, Article number: 21805 (2024)

Accurate identification and classification of equipment defects are essential for assessing the health of power equipment and making informed maintenance decisions. Traditional defect classification methods, which rely on subjective manual records and intricate defect descriptions, have proven to be inefficient. Existing approaches often evaluate equipment status solely based on defect grade. To enhance the precision of power equipment defect identification, we developed a multi-label classification dataset by compiling historical defect records. Furthermore, we assessed the performance of 11 established multi-label classification methods on this dataset, encompassing both traditional machine learning and deep learning methods. Experimental results reveal that methods considering label correlations exhibit significant performance advantages. By employing balanced loss functions, we effectively address the challenge of sample imbalance across various categories, thereby enhancing classification accuracy. Additionally, segmenting the power equipment defect classification task, which involves numerous labels, into label recall and ranking stages can substantially improve classification performance. The dataset we have created is available for further research by other scholars in the field.

With the widespread application of power equipment and the continuous development of the power grid, regular maintenance of equipment is crucial for enhancing overall system reliability. During this process, maintenance personnel collect a substantial amount of information related to power problems, including defect records and case reports. This unstructured data contains a wealth of information about power equipment defects, holding significant importance for defect identification and classification. Traditional defect classification methods often require manual intervention, relying heavily on the knowledge and experience of personnel, leading to a low utilization rate of defect records. Due to the complexity and variability of defects, the subjectivity of personnel can influence the outcome of defect handling, resulting in classification errors. Therefore, it is imperative to leverage advanced technologies such as natural language processing to extract and analyze information from historical defect records1. This contributes to promoting defect handling, improving processing efficiency, and enhancing classification accuracy.

According to the classification standards for primary equipment defects in power transmission and transformation, a power equipment defect type comprises the defective component, component type, location, defect description, defect grade, and classification criteria. Our literature review found no previous research that detects all of these dimensions simultaneously. Many studies on defect classification2,3,4 have focused solely on identifying defect grades to broadly assess equipment health, neglecting critical information such as defective components and defect categories. Identifying all six dimensions is crucial for prioritizing the immediate handling of defects. In this study, we aim to classify defect texts by using defect records as our dataset source and defining the task as a multi-label text classification problem. Each defect text may be associated with multiple labels that cover various aspects, such as components, locations, defect descriptions, defect grades, and classification criteria. By thoroughly analyzing these labels, we strive to improve the fine-grained classification of power equipment defects, thereby providing more accurate and efficient support for defect management.

Defect classification in power equipment differs from general tasks such as news classification and sentiment analysis5; it belongs to a specialized domain with data that is challenging to obtain. Firstly, the need for authentic power equipment defect data in defect classification tasks requires prolonged observation and collection. This data necessitates careful analysis and annotation to ensure quality and accuracy. Secondly, the multitude of defect types in power equipment involves various equipment types, components, locations, and defect descriptions. Constructing a comprehensive and accurate defect classification model requires a diverse dataset that covers a wide range of situations and scenarios.

We have developed a multi-label defect classification dataset through extensive observation and collection of defect records from electrical equipment, accurately reflecting the real-world conditions of power equipment defects. These defects are varied, encompassing different components, locations, and descriptions. The dataset for this multi-label classification task is comprehensive, covering a wide range of scenarios and conditions. This enables the trained classification model to perform finer classifications and better capture the multiple features or attributes of the samples. Each sample in the dataset is associated with multiple relevant labels, allowing the model to effectively capture label correlations and complex semantic relationships, thereby improving classification accuracy.

For multi-label defect type detection, we employed both traditional machine learning and deep learning methods. The problem transformation methods used include binary relevance6, label powerset7, and classifier chains8, all of which have shown strong performance in multi-label classification tasks.

The objective of this paper is to enhance researchers’ capability to analyze defect types within defect texts by offering a detailed dataset for classifying power equipment defects, along with corresponding classifiers. The key contributions of this study are as follows:

A dataset for defect type detection, which includes the identification of defective components, categories, and grades in defect records. This dataset is made publicly available for researchers who are interested in furthering this line of research.

An evaluation of the performance of traditional machine learning and deep learning methods in the multi-label classification of defect records, resulting in the identification of the optimal classification strategy for this task.

Zhou et al.2 chose transformer defect record data as the focus of their research. The dataset comprised defect descriptions and defect grades. Despite the authenticity of the data, there was a lack of a comprehensive re-labeling process to ensure dataset reliability. On the other hand, Feng et al.3 examined circuit breaker defect records as their research subject. They utilized defect record forms that encompassed defect grades, along with detailed descriptions of the defect’s specific position and cause. The authors constructed a defect grade classification model based on a bi-directional long short-term memory network (BiLSTM) and incorporated an attention mechanism. Compared to the convolutional neural network (CNN) used by Zhou et al.2, this method effectively extracts features and performs classification from long texts containing complex information, demonstrating superior classification performance. The model based on this dataset only focuses on the defect phenomenon and grade in the power equipment defect record and fails to fully utilize the information provided by the defect occurrence to identify specific defect types. Tian et al.9 selected defect phenomena and fault categories from defect records as the focus of their study. They categorized the fault categories based on defect phenomena and created a dataset consisting of six fault categories: blockage, failure, misalignment, leakage, invalid, and other. Using this dataset, they built a multi-channel pyramid CNN model that extracts text features from different positions using various windows, thus enhancing global features. However, the model failed to identify specific defect positions and types, which prevented maintenance personnel from taking accurate and prompt actions.

Liu et al.10 utilized defect records from electrical equipment to construct a defect knowledge graph and employed graph search techniques to retrieve similar historical defect records. However, if the defect text to be classified lacks crucial defect location information or has unclear defect descriptions, it may lead to the inability to accurately pinpoint the defect subject or result in contradictory paths leading to classification errors. After pruning, partitioning, and reconstructing the dependency syntax trees of both the defect texts for main transformers and the defect classification standard texts for state grid power transmission and transformation primary equipment, Shao et al.11 employed a dependency tree matching algorithm. This algorithm was used to identify the standard text that is semantically most similar to the actual defect text. However, if the features of defect texts are not sufficiently distinct or if there are significant differences among samples, the adaptability of the restructuring method may decrease, potentially leading to matching errors. The aforementioned methods rely on the completeness and clarity of defect texts, lacking the capability to handle vague or incomplete descriptions.

This study introduces a multi-label classification approach that not only evaluates the severity of transformer defect records but also detects defective components, component types, locations, defect descriptions, and classification criteria.

Multi-label text classification tasks present several challenges compared to single-label classification. As the number of categories increases, the number of label sets grows exponentially. In some cases, positive samples may be scarce for certain labels. To address this challenge, controlling the output space and leveraging the interrelationships or dependencies between labels becomes crucial. Imbalance can manifest within a class, where there is a significant disparity between the number of positive and negative samples. The label set itself can also be unbalanced, with certain label combinations having more samples than others. Moreover, the sample distribution across different classes can be unbalanced, with some classes having a significantly larger number of samples than others. This creates a label sparsity problem. Extreme multi-label classification refers to the task of assigning specific labels to text from an extensive set of labels. The label set can be extremely large, requiring substantial computational and storage resources to handle effectively.

Multi-label text classification has two main approaches: problem transformation and algorithm adaptation. Problem transformation involves converting the multi-label classification problem into other established scenarios. The basic binary relevance (BR) model6 treats each label independently and transforms the multi-label task into multiple binary classification sub-tasks. However, BR ignores label correlation. The classifier chains (CC) model8 includes previously predicted label outputs as features for subsequent classification, but the order of labels in the chain can impact performance, and using all previous outputs may introduce noise. The label powerset (LP) technique7 treats each label combination as a separate class, reducing the multi-label task to a single multi-class task. However, LP faces challenges due to the exponential increase in label combinations and class imbalance when instances are limited. Algorithm adaptation methods, on the other hand, extend single-label algorithms to handle multi-label data. For example, Zhang et al. proposed the ML-kNN algorithm12, which applies the k-nearest neighbors (kNN) algorithm to determine the labels of a sample based on its k nearest neighbors. However, this method may be susceptible to challenges such as the curse of dimensionality and label noise, leading to suboptimal performance when handling large-scale and complex textual data.
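The three problem transformation strategies described above differ mainly in how they reshape the targets before a single-label learner is trained. The following is a minimal sketch of that reshaping only, with no classifier attached. The label names, the toy label sets, and the helper function names are illustrative assumptions, not taken from the dataset or from any library.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Toy label universe and label sets (hypothetical, for illustration only).
LABELS = ["component:tap changer", "grade:general", "grade:serious"]
Y: List[Set[str]] = [
    {"component:tap changer", "grade:general"},
    {"grade:serious"},
    {"component:tap changer", "grade:general"},
]


def binary_relevance_targets(y: Sequence[Set[str]],
                             labels: Sequence[str]) -> Dict[str, List[int]]:
    """BR: one independent 0/1 target vector per label."""
    return {lab: [int(lab in s) for s in y] for lab in labels}


def label_powerset_targets(y: Sequence[Set[str]]) -> Tuple[List[int], Dict[frozenset, int]]:
    """LP: every distinct label combination becomes one class id."""
    class_of: Dict[frozenset, int] = {}
    targets: List[int] = []
    for s in y:
        key = frozenset(s)
        if key not in class_of:
            class_of[key] = len(class_of)
        targets.append(class_of[key])
    return targets, class_of


def chain_features(x: Sequence[float], previous: Sequence[int]) -> List[float]:
    """CC: append the labels predicted earlier in the chain to the input features."""
    return list(x) + [float(p) for p in previous]


br = binary_relevance_targets(Y, LABELS)
lp, classes = label_powerset_targets(Y)
print(br["grade:general"])                  # [1, 0, 1]
print(lp)                                   # [0, 1, 0]
print(chain_features([0.2, 0.7], [1, 0]))   # [0.2, 0.7, 1.0, 0.0]
```

In the worst case the number of LP classes grows with the number of distinct label combinations, which is the source of the sparsity problem noted above, while the CC feature vector grows by one entry per earlier label in the chain.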

Deep learning-based methods for multi-label text classification utilize neural network models. Liu et al. proposed a Coattention Network with Label Embedding (CNLE)13. This approach leverages label information, allowing for the co-encoding of text and labels through mutual attention to extract highly differentiated text representations. Zhang et al.14 introduced a novel multi-task learning method called LAbel COrrelation (LACO) for multi-label text classification. This approach utilizes a document-label cross-attention mechanism to generate more discriminative document representations and captures label correlations through subtasks. For long-tail labels in the label space, however, these methods may exhibit subpar performance. Several loss functions can be employed to address class imbalance in multi-label classification. Distribution-balanced loss (DB)15 has been successfully applied in image recognition and natural language processing. Additionally, a later study16 designed a new loss function called class-balanced focal loss with negative-tolerant regularization (CB-NTR). In the context of Extreme Multi-label Text Classification (XMC), existing methods such as AttentionXML17 and X-Transformer18 utilize multiple models for training and prediction. However, these methods often involve static negative label sampling during the training of the label ranking model, which can lead to reduced efficiency and accuracy. To address these challenges, Jiang et al.19 proposed LightXML, an approach that incorporates end-to-end training and dynamic negative label sampling.

Leveraging the aforementioned traditional machine learning and deep learning methods, we conducted initial experiments on the detection of defect types within defect texts, seeking the optimal classification strategy.

We designed several baseline models for comparison, categorizing them into traditional machine learning and deep learning methods. Firstly, we implemented various popular traditional machine learning algorithms, employing problem transformation methods including Binary Relevance with Support Vector Machines (BR-SVM)6, Classifier Chains with Support Vector Machines (CC-SVM)8, Label Powerset with Support Vector Machines (LabelPowerset-SVM)7, and Multi-Label k-Nearest Neighbors (ML-kNN)12.

Additionally, we selected a high-performance deep learning model, CNLE13. The CNLE model employs a collaborative encoding approach to handle text sequences and label sequences, generating target labels for text classification by utilizing cooperative representations of these two sequences. The Text Label Co-attention Encoder (TLCE) takes text sequences and label sequences as input, generating representations that capture their mutual attention. The Adaptive Label Decoder (ALD) utilizes these text and label representations to calculate the probability for each category. Text classification is treated as a generation problem, where a label is generated at each decoding time step.

We also compared several algorithms based on pre-trained language models. Firstly, we explored various loss functions suitable for multi-label classification. Due to the prevalent dominance of the head class and the presence of negative instances, the common Binary Cross-Entropy (BCE) is susceptible to label imbalance. To address the challenges of class imbalance in the long-tail dataset in the context of multi-label text classification, CB20 further reweighted the focal loss, capturing the diminishing marginal returns of data. This method effectively reduces redundancy related to the head class. The DB15 loss function alleviates redundancy in label co-occurrence by merging rebalancing weights and assigns lower weights to easy-to-classify negative instances through Negative Tolerant Regularization (NTR).
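The class-balanced reweighting of the focal loss mentioned above can be illustrated with a small sketch. It covers only the class-balanced focal term, not the negative-tolerant regularization or the DB loss. The weight formula is the commonly used inverse "effective number of samples", (1 - beta) / (1 - beta^n). The value beta = 0.9 follows the experimental settings reported later in this article, while the focusing parameter gamma = 2, the function names, and the toy inputs are assumptions made for illustration.

```python
import math
from typing import Sequence


def class_balanced_weight(n_samples: int, beta: float = 0.9) -> float:
    """Inverse effective number of samples: (1 - beta) / (1 - beta ** n)."""
    return (1.0 - beta) / (1.0 - beta ** n_samples)


def focal_bce(p: float, y: int, gamma: float = 2.0) -> float:
    """Focal binary cross-entropy for one label; gamma = 0 recovers plain BCE."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cb_focal_loss(probs: Sequence[float], targets: Sequence[int],
                  label_counts: Sequence[int],
                  beta: float = 0.9, gamma: float = 2.0) -> float:
    """Mean over labels of the class-balanced weight times the focal BCE."""
    total = 0.0
    for p, y, n in zip(probs, targets, label_counts):
        total += class_balanced_weight(n, beta) * focal_bce(p, y, gamma)
    return total / len(probs)


# A tail label (2 positives) receives a larger weight than a head label (100 positives).
print(class_balanced_weight(2), class_balanced_weight(100))
print(cb_focal_loss([0.9, 0.3], [1, 1], [100, 2]))
```

Because the weight flattens quickly as the count grows, extra samples of an already frequent label contribute little additional weight, which is the diminishing marginal return the text refers to.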

Additionally, we experimented with methods based on document-label joint embedding and correlation-aware multi-task learning14. The multi-label classification, PLCP, and CLCP subtask modules generate probability distributions for their specific target labels based on context-aware representations derived from joint embeddings. The multi-task framework demonstrates strong capabilities in predicting low-frequency labels and learning label correlations.

Finally, we explored the approach of LightXML19 in the context of extreme multi-label text classification using pre-trained models. The Transformer model encodes the original text information into high-dimensional representations, enriching information content and enhancing the model’s generalization ability. Utilizing a collaborative network for label recall and label ranking in end-to-end training effectively reduces training time and model size. In this study, we conducted preliminary experiments on multi-label device defect detection in defect records using these 11 multi-label classification methods.
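The interplay between label recall and label ranking can be sketched in a few lines. This is not the LightXML implementation: in the real model both sets of scores come from networks trained end to end, whereas here they are hard-coded, hypothetical numbers, and the function name, cluster layout, and threshold are assumptions.

```python
from typing import Dict, List, Sequence


def recall_then_rank(cluster_scores: Sequence[float],
                     clusters: Sequence[Sequence[str]],
                     label_scores: Dict[str, float],
                     top_k: int,
                     threshold: float = 0.5) -> List[str]:
    """Two-stage prediction: recall label clusters, then rank labels inside them."""
    # Stage 1 (recall): keep only the top_k highest-scoring clusters.
    order = sorted(range(len(clusters)),
                   key=lambda c: cluster_scores[c], reverse=True)
    recalled = order[:top_k]
    # Stage 2 (ranking): score only the labels inside the recalled clusters.
    candidates = [lab for c in recalled for lab in clusters[c]]
    ranked = sorted(candidates, key=lambda lab: label_scores[lab], reverse=True)
    return [lab for lab in ranked if label_scores[lab] >= threshold]


clusters = [["a", "b"], ["c"], ["d", "e"]]
cluster_scores = [0.9, 0.2, 0.7]
label_scores = {"a": 0.8, "b": 0.3, "c": 0.95, "d": 0.6, "e": 0.1}
print(recall_then_rank(cluster_scores, clusters, label_scores, top_k=2))  # ['a', 'd']
```

Label "c" is never ranked because its cluster was not recalled, which shows why the recall stage has to retain as many relevant labels as possible before ranking narrows them down.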

Three evaluation metrics are commonly used for multi-label classification: Micro-F121, Macro-F1, and absolute accuracy. Micro-F1 pools the true positive, false positive, and false negative counts over all labels before computing the harmonic mean of precision and recall, so frequent labels carry more weight in the score. Macro-F1 computes the F1 score of each label separately and takes their unweighted arithmetic mean, treating all labels equally; it is therefore more sensitive to labels with few samples, which can disproportionately influence the overall score. Absolute accuracy evaluates the correctness of all label predictions for each sample: a prediction counts as correct only if the predicted label set matches the true label set exactly, and any difference makes the prediction incorrect. For category i, the counts of true positives, false positives, and false negatives are denoted \(T{{P}_{i}}\), \(F{{P}_{i}}\) and \(F{{N}_{i}}\) respectively. In their standard form, with N denoting the number of labels, the Micro-F1 and Macro-F1 scores are calculated as follows:

\[ \text{Micro-F1}=\frac{2\sum _{i=1}^{N}T{{P}_{i}}}{2\sum _{i=1}^{N}T{{P}_{i}}+\sum _{i=1}^{N}F{{P}_{i}}+\sum _{i=1}^{N}F{{N}_{i}}} \]

\[ \text{Macro-F1}=\frac{1}{N}\sum _{i=1}^{N}\frac{2T{{P}_{i}}}{2T{{P}_{i}}+F{{P}_{i}}+F{{N}_{i}}} \]
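The three metrics described above can be computed directly from per-label counts. The sketch below uses the standard definitions on a toy example; the function names and the toy predictions are illustrative assumptions, and a label with no positives and no predictions is given an F1 of 0 here, which is one of several possible conventions.

```python
from typing import List, Sequence, Tuple


def micro_macro_f1(y_true: Sequence[Sequence[int]],
                   y_pred: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (Micro-F1, Macro-F1) for binary indicator matrices."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for truth, pred in zip(y_true, y_pred):
        for i in range(n_labels):
            if pred[i] and truth[i]:
                tp[i] += 1
            elif pred[i] and not truth[i]:
                fp[i] += 1
            elif truth[i] and not pred[i]:
                fn[i] += 1
    # Micro: pool the counts over all labels, then compute a single F1.
    micro_den = 2 * sum(tp) + sum(fp) + sum(fn)
    micro = 2 * sum(tp) / micro_den if micro_den else 0.0
    # Macro: compute F1 per label, then take the unweighted mean.
    per_label: List[float] = []
    for i in range(n_labels):
        den = 2 * tp[i] + fp[i] + fn[i]
        per_label.append(2 * tp[i] / den if den else 0.0)
    macro = sum(per_label) / n_labels
    return micro, macro


def absolute_accuracy(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
micro, macro = micro_macro_f1(y_true, y_pred)
acc = absolute_accuracy(y_true, y_pred)
print(round(micro, 3), round(macro, 3), round(acc, 3))  # 0.8 0.667 0.333
```

In this toy case the rare third label is always wrong, which lowers Macro-F1 far more than Micro-F1 and drops absolute accuracy to one sample in three.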

The traditional machine learning approach utilizes support vector machines (SVM) as classifiers, relying on TF-IDF features. TF-IDF captures the importance of individual words in documents in the dataset. The machine learning algorithms are implemented using the scikit-learn library22 and the scikit-multilearn library23 specifically designed for multi-label classification tasks.
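To make the TF-IDF weighting concrete, here is a minimal sketch of the textbook variant, with term frequency normalized by document length and IDF computed as log(N / df). It is not the scikit-learn implementation: scikit-learn's TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its values differ. The tokenized documents and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(docs: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized defect texts.
docs = [
    ["oil", "leak", "valve"],
    ["oil", "level", "low"],
    ["valve", "leak", "leak"],
]
w = tf_idf(docs)
print(w[1])  # "level" and "low" outweigh the more common "oil"
```

Terms that appear in few documents receive higher weights, which is what lets a linear SVM separate defect texts on their distinctive vocabulary.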

The CNLE method processes the labels in a specific way: it randomly orders the labels and concatenates them to form a label sequence. For the word embedding of the text sequence, frozen 300-dimensional Chinese word vectors (merge-sgns-bigram-char300) are used. Out-of-vocabulary words and label embeddings are initialized from a uniform distribution over the range \(\left[ -0.01,0.01 \right] \). The internal model dimension d is set to 256. The output dimension of the BiLSTM is set to d/2, and the final output is obtained by concatenating the outputs of both directions. The co-attention layer has a hidden dimension of 150 and uses 3 attention heads. The decoder uses a greedy search decoding strategy. The model employs the Adam optimizer, with parameters \({{\beta }_{1}}\), \({{\beta }_{2}}\) and \(\varepsilon \) set to 0.9, 0.98 and \({{10}^{-9}}\), respectively. Additionally, gradient clipping is applied during training to rescale the gradient norm into the range \(\left[ 0.0,1.0 \right) \), preventing unstable optimization.
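Two of the settings above, the uniform embedding initialization and gradient clipping, are simple enough to sketch in plain Python. This is an illustration of the general techniques, not the CNLE code: the clipping threshold of 1.0 is an assumption read from the stated range, and the function names and the seed are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def init_embedding(dim: int, rng: random.Random,
                   low: float = -0.01, high: float = 0.01) -> List[float]:
    """Draw one embedding vector from a uniform distribution over [low, high]."""
    return [rng.uniform(low, high) for _ in range(dim)]


def clip_by_global_norm(grads: Sequence[float],
                        max_norm: float = 1.0) -> Tuple[List[float], float]:
    """Rescale the gradient so its L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradient and the original norm.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


rng = random.Random(0)            # fixed seed, for reproducibility only
oov_vector = init_embedding(300, rng)
clipped, original_norm = clip_by_global_norm([3.0, 4.0])
print(original_norm)              # 5.0
print(clipped)                    # approximately [0.6, 0.8]
```

Clipping preserves the gradient direction and only shortens the step, which is why it stabilizes training without biasing the update toward any parameter.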

The methods based on pre-trained language models were fine-tuned from bert-base-chinese. Specifically, the LACO method is implemented in TensorFlow with a batch size of 8 and a maximum total input sequence length of 400. It also utilizes an additional layer window size of 10. The Adam optimizer is employed with a learning rate of 0.00005. In the PLCP task, the ratio \(\gamma \) of co-occurring to non-co-occurring labels is set to 0.5. We used BCE as a baseline and compared it to the DB and CB-NTR loss functions. In the BCE loss function, all instances and labels have the same weight. For CB-NTR, \(\beta \) is 0.9, k is 0.05, and \(\lambda \) is 2. For DB, \(\alpha \) is 0.1, \(\beta \) is 10, \(\mu \) is 0.9, k is 0.05, and \(\lambda \) is 2. The training batch size is 32, and the AdamW optimizer with a weight decay of 0.01 is used. The learning rate is determined through a hyperparameter search. These loss function experiments are implemented in PyTorch. For the LightXML experiment, the batch size is 8, and the maximum input label length is 512. To avoid overfitting, a dropout rate of 0.5 is applied to the text representation, and stochastic weight averaging is used. The AdamW optimizer is used with a learning rate of 0.0001. The defect classification dataset has a limited number of labels, so each label cluster contains only one label. This allows the label recall component to provide the score for each label directly.

To facilitate the training and evaluation process, the dataset is divided into three sets: a training set of 794 samples, a validation set of 250 samples, and a test set of 250 samples. The training set is used to optimize the model. After each training epoch, the F1 score is computed on the validation set. Holding out a validation set in this way supports better generalization than dividing the dataset into training and test sets only, because model selection does not rely on the test set.
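The split sizes above can be reproduced with a short sketch. The article does not state how samples were assigned to each set or how the per-epoch validation F1 was used, so the random shuffle, the seed, the choice of the best epoch by validation F1, and the example F1 values below are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n_total: int, n_train: int, n_val: int,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and cut them into train / validation / test."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def best_epoch(val_f1_per_epoch: Sequence[float]) -> int:
    """Index of the epoch with the highest validation F1 (first one on ties)."""
    return max(range(len(val_f1_per_epoch)), key=lambda e: val_f1_per_epoch[e])


# 794 + 250 + 250 = 1294 samples in total.
train, val, test = split_indices(1294, 794, 250)
print(len(train), len(val), len(test))      # 794 250 250

# Hypothetical validation F1 values for three epochs.
print(best_epoch([0.61, 0.74, 0.72]))       # 1
```

Selecting the checkpoint on the validation set keeps the 250 test samples untouched until the final evaluation.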

In this study, our focus lies in the evaluation of the performance of multi-label classification methods in identifying device defects. Through experimentation with 11 diverse multi-label classification methods, we compile and present their performance outcomes in Table 1.

Through conducting experiments with machine learning approaches for various problem transformation methods, we observed that considering label correlations significantly enhances classification performance compared to situations where label correlations are not taken into account. Specifically, the CC and LP methods, which consider inter-label correlations, outperform the BR method in terms of Macro-F1 and absolute accuracy evaluation metrics. Macro-F1 scores exhibit improvements ranging from 1.5% to 5.6%, whereas absolute accuracy demonstrates enhancements ranging from 17.2% to 19.6%. These findings underscore the importance of considering label correlations, especially in the context of learning low-frequency labels. Additionally, in comparison to the SVM algorithm, ML-kNN exhibits suboptimal performance in handling high-dimensional datasets, and its classification performance regarding labels falls short of that of the SVM algorithm.

Based on the observed experimental results, the CNLE method exhibits a significant improvement in classification effectiveness compared to traditional machine learning approaches, with a 3.0% increase in Micro-F1 and a 3.2% enhancement in absolute accuracy. The CNLE method leverages a shared attention network to handle both text and label sequences, thereby extracting highly differentiated textual representations, contributing to more effective and accurate classification outcomes.

The LACO method employs a document-label cross-attention mechanism to generate more discriminative document representations, achieving a notable improvement in absolute accuracy compared to a baseline model based on BERT, with a 5.6% increase. Furthermore, the multitask framework demonstrates robust capabilities in predicting low-frequency labels and learning label correlations. Specifically, the LACO method enhances Macro-F1 scores by 2.5% compared to the baseline model. Additionally, by considering higher-order correlations in the CLCP task, the LACO method aids in capturing more complex relationships among labels, achieving superior performance in terms of the Macro-F1 metric, with a 0.7% improvement compared to the PLCP task.

Based on the classification results, the introduction of the CB-NTR loss function significantly enhances performance, with an 8% increase in Macro-F1 and a 4.8% improvement in absolute accuracy. CB-NTR reduces redundancy associated with the dominant class, demonstrating advantages in handling class imbalance and enhancing the classification performance of long-tail labels. The experimental results of CB-NTR and DB suggest that balancing the loss function is an effective strategy for addressing class imbalance and label co-occurrence challenges in multi-label text classification for defects. LightXML outperforms all the mentioned methods across three evaluation metrics. Compared to the BERT baseline model, LightXML shows a 1.1% improvement in Micro-F1, a 9.1% improvement in Macro-F1, and an 8% increase in absolute accuracy. LightXML enhances the model’s generalization ability through high-dimensional text representations. Further analysis reveals that the LightXML method achieves micro-precision and micro-recall rates of 92.4% and 93.1%, respectively. This indicates that the recall component can capture as many relevant labels as possible and the ranking stage accurately predicts the most relevant labels. For device defect multi-label classification tasks with numerous labels, the collaboration between label recall and label ranking components significantly enhances classification accuracy.

Accurate and efficient classification of defect grades is essential for the automated assessment of the operational status of power equipment. The defect grade provides crucial information regarding the severity and potential impact of a defect. By categorizing defects into different levels, such as general, serious, and critical, maintenance personnel can prioritize their actions and allocate resources effectively. General defects have a mild severity, with no significant impact on safe operation. Serious defects pose substantial threats to human life or equipment and can be temporarily operated but need prompt attention. Critical defects directly jeopardize safe operation and require immediate attention.

During the comprehensive evaluation of six multi-label classification methods based on BERT, this study placed particular emphasis on three defect severity categories, assessing performance through F1 scores and accuracy. The experimental results, as detailed in Table 2, showcase Micro-F1 scores reaching 89.8%, Macro-F1 scores reaching 88.4%, and accuracy reaching 93.1%. This indicates that multi-label classification methods can achieve satisfactory performance in defect grade recognition.

Notably, the LightXML method outperforms other approaches in terms of Micro-F1 and accuracy. This method not only excels in accurately identifying specific defect types but also achieves optimal performance in defect grade classification. For long-tail distribution labels, such as the tail label “critical,” both CB-NTR and DB methods demonstrated significant performance improvements. The DB method achieved an F1 score of 93.33% and also attained the best result on the Macro-F1 metric. This further validates the effectiveness of the balanced loss function in addressing class imbalance issues.

The automatic classification of defect texts in power equipment is a crucial step in maintenance decision-making. This study aims to enhance researchers’ ability to analyze defect types in defect texts by providing a fine-grained power equipment defect classification dataset and classifier. We constructed a dataset for identifying power equipment defects using annotation guidelines and standard annotation rules. This dataset comprises six classification dimensions: defective components, component types, locations, defect descriptions, defect grades, and classification criteria. Our dataset is made publicly available for researchers interested in studying defect identification in power equipment.

After completing the construction of the multi-label classification dataset for power equipment defects, we conducted a comprehensive evaluation of various multi-label classification methods in the task of defect identification. The experiments revealed that methods considering label correlations significantly improved the performance of the multi-label classification task for equipment defects, particularly excelling in learning low-frequency labels. This indicates that multi-label classification methods can effectively learn the correlations between different labels. For example, certain defect types may be more likely to occur in specific components or locations, thus enhancing classification accuracy. Secondly, the use of a document-label cross-attention mechanism proved beneficial in capturing key information in the document, generating highly differentiated document representations, and significantly improving defect identification accuracy. Furthermore, experimental results demonstrated that balancing the loss function helps address the imbalance in the number of samples across different categories, thereby enhancing performance in power equipment defect identification tasks. Lastly, in the context of multi-label classification tasks with numerous labels, the collaboration between label recall and label ranking components was found to significantly improve classification accuracy. The LightXML method exhibited the best performance in the power equipment defect identification task, particularly outperforming other methods in the defect grade classification task. In summary, through a comprehensive evaluation of various multi-label classification methods, our research has provided valuable insights into the field of power equipment defect identification, laying the foundation for future research and practical applications. 
Our research findings are expected to significantly enhance the maintenance efficiency of power equipment, reduce downtime, and improve equipment safety in practical applications.

However, the limited scale of the dataset may pose challenges to classification performance, especially for categories with insufficient samples. To address this limitation and further enhance classification performance and generalization, future work could employ multi-dimensional classification methods for defect identification in power equipment, encompassing various dimensions such as components, component types, locations, defect descriptions, defect grade, and classification criteria. Multi-dimensional classification methods can facilitate the balancing of the dataset, as classification is independently performed on each dimension, taking into account interactions between different dimensions. On the other hand, expanding the dataset’s scale and diversity is also a crucial direction for improving classification performance.

In this study, we collected defect records related to oil-immersed transformers from the Shandong Power Grid for the years 2021 and 2022 as the research subjects. Samples were selected based on having complete defect content and descriptions. Each defect record encompasses the defect number, defect status, equipment, defect content, equipment type, component, component type, location, manufacturer, defect grade, defect description, maintenance suggestions, classification criteria, and other information. The defect content carries the most information, describing the specific location, phenomenon, and cause of the defect. We established dataset annotation rules for classifying equipment defects according to the standards specified in the “Defects Classification Standard of Primary Transmission and Transformation Equipment”. The classification dimensions and their respective definitions are detailed in Table 3.

Defect content is extracted from defect records as samples, and components, component types, locations, defect grades, defect descriptions, and classification criteria are manually annotated to construct a multi-label classification dataset for power equipment defects. Defect records often contain incomplete defect information, necessitating the reannotation, review, and correction of the extracted data to ensure the accuracy and consistency of the dataset. In this study, to address the challenges of incomplete defect information and misclassification, we designed a standardized annotation process for annotating defect content. The specific annotation process is illustrated in Fig. 1. We collaborated with domain experts to develop annotator guidelines, providing task definitions and examples to assist annotators in understanding the annotation task. The experts involved in the annotation process possess extensive domain experience, particularly in power equipment defect diagnosis and classification. They had previously contributed to the development of standard defect types for transformers, providing valuable guidance for our annotation guidelines. During discussions with the experts, they identified ambiguities in certain definitions within the guidelines and suggested specific annotation examples to enhance clarity. For instance, when analyzing the defect text “Inspection found: main transformer on-load tap changer silicone discoloration exceeding two-thirds,” the experts pointed out that, although the text mentions “on-load tap changer,” the implied defective component is the “tap changer,” with the component type being “on-load switch.” Based on their feedback, we clarified how to handle such implicit information in the annotation guidelines and revised the relevant examples in the guidelines.

Fig. 1. Three-stage annotation process for the power equipment defect classification dataset.

First, through a careful examination of the existing data, we assigned the appropriate component, component type, and location to each defect content, thereby narrowing the potential scope of defect descriptions and ensuring a one-to-one correspondence between defect content and defect types. Second, based on the defective equipment and the classification basis, we identified the defect description and severity level of the equipment. Finally, by combining manual annotation with expert consultation, we ensured the accuracy and consistency of the annotations. Table 4 presents four examples from the constructed dataset, showing the annotations for defect description, severity, and equipment. This annotation approach provides a reliable foundation for subsequent applications of the dataset and supports accurate and consistent results in power equipment defect classification research.
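The annotation scheme above can be sketched as a small data structure. This is a minimal illustration, assuming each annotated dimension contributes at most one label and that a dimension may be left unannotated; the class name, field names, and the second record are hypothetical, and only the first record's text, component, and component type come from the example discussed earlier.

```python
from dataclasses import dataclass
from typing import List, Optional

# The six annotated dimensions named in the text; the field names are assumptions.
DIMENSIONS = (
    "component", "component_type", "location",
    "defect_description", "defect_grade", "classification_criteria",
)

@dataclass
class DefectRecord:
    text: str  # the defect content used as the sample
    component: Optional[str] = None
    component_type: Optional[str] = None
    location: Optional[str] = None
    defect_description: Optional[str] = None
    defect_grade: Optional[str] = None
    classification_criteria: Optional[str] = None

    def labels(self) -> List[str]:
        """Return one 'dimension=value' label per annotated dimension."""
        out = []
        for dim in DIMENSIONS:
            value = getattr(self, dim)
            if value is not None:
                out.append(f"{dim}={value}")
        return out

def build_vocabulary(records: List[DefectRecord]) -> List[str]:
    """Sorted list of every distinct label observed in the records."""
    return sorted({label for r in records for label in r.labels()})

def encode(record: DefectRecord, vocabulary: List[str]) -> List[int]:
    """Multi-hot vector: 1 where the record carries the label, else 0."""
    present = set(record.labels())
    return [1 if label in present else 0 for label in vocabulary]

records = [
    DefectRecord(
        text="Inspection found: main transformer on-load tap changer "
             "silicone discoloration exceeding two-thirds",
        component="tap changer",
        component_type="on-load switch",
    ),
    # Hypothetical second record, included only to make the vocabulary non-trivial.
    DefectRecord(text="(hypothetical defect text)",
                 component="tap changer", defect_grade="general"),
]
vocabulary = build_vocabulary(records)
vectors = [encode(r, vocabulary) for r in records]
```

Prefixing each value with its dimension keeps identical strings in different dimensions from colliding in the flat label space.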

The manually constructed multi-label classification dataset for power equipment defects, built from defect records, comprises 168 distinct labels derived from the six classification dimensions of the classification standard. Metrics for characterizing such data can be grouped into basic features, label distribution data, label relationship indicators, and metrics related to label imbalance [24]. Table 5 provides statistics on the dataset: on average, each label has 40.92 instances and each instance carries 5.23 labels. To better understand the label distribution, we categorized the labels into head labels, with 30 or more instances, and tail labels, with fewer than 30 instances. Fig. 2 illustrates the distribution of the dataset, which follows a long-tail pattern in which only a few labels have a large number of samples, indicating significant label imbalance. This characteristic has a significant impact on model training and evaluation; therefore, label imbalance must be carefully considered in the study.

Fig. 2. The long-tailed distribution of the dataset.
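The statistics described above (average instances per label, average labels per instance, and the head/tail split at 30 instances) can be computed from the per-sample label sets. The following is a minimal sketch on toy data; the function name, dictionary keys, and the max/min imbalance ratio are assumptions made for illustration, not the exact metrics of Table 5.

```python
from collections import Counter

def label_statistics(label_sets, head_threshold=30):
    """Basic multi-label statistics from a list of per-instance label sets.

    Only labels that occur at least once are counted, so labels defined in
    a standard but absent from the data do not affect the averages.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    n_instances = len(label_sets)
    n_labels = len(counts)
    total_assignments = sum(counts.values())
    return {
        "n_instances": n_instances,
        "n_labels": n_labels,
        # Average number of labels attached to each instance.
        "labels_per_instance": total_assignments / n_instances,
        # Average number of instances carrying each label.
        "instances_per_label": total_assignments / n_labels,
        # Head labels meet the threshold; tail labels fall below it.
        "head_labels": sorted(l for l, c in counts.items() if c >= head_threshold),
        "tail_labels": sorted(l for l, c in counts.items() if c < head_threshold),
        # Ratio of the most frequent to the least frequent label.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }

# Toy data with a lowered threshold so the head/tail split is visible.
toy = [{"a", "b"}] * 3 + [{"a"}] + [{"a", "c"}]
toy_stats = label_statistics(toy, head_threshold=3)
```

Both averages share the same numerator (the total number of label assignments). If the two reported averages are taken over the same label matrix, they imply roughly 168 × 40.92 / 5.23 ≈ 1,314 samples; this is an inference from the reported figures, not a number stated in this section.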

All the data used in this study is available from the corresponding author upon request.

Pu, T., Qiao, J., Han, X., Zhang, G. & Wang, X. Research and application of artificial intelligence in operation and maintenance for power equipment. High Volt. Eng. 46, 369–383. https://doi.org/10.13336/j.1003-6520.hve.20200131001 (2020).

Zhou, J., Luo, G., Hu, C. & Chen, Y. A classification model of power equipment defect texts based on convolutional neural network. In Sun, X., Pan, Z. & Bertino, E. (eds.) Artificial Intelligence and Security - 5th International Conference, ICAIS 2019, New York, NY, USA, July 26-28, 2019, Proceedings, Part I, vol. 11632 of Lecture Notes in Computer Science, 475–487, https://doi.org/10.1007/978-3-030-24274-9_43 (Springer, 2019).

Feng, B. et al. Power equipment defect record text mining based on bilstm-attention neural network. Proc. CSEE 40, 1–10. https://doi.org/10.13334/j.0258-8013.pcsee.200530 (2020).

Cheng, H., Gao, I., Yu, H. & Li, P. A determination method of defect grades in electrical equipment based on combination neural network optimized by attention mechanism. Electr. Meas. Instrum. 61, 83–90. https://doi.org/10.19753/j.issn1001-1390.2024.01.013 (2024).

Rei, L. & Mladenić, D. Detecting fine-grained emotions in literature. Appl. Sci. https://doi.org/10.3390/app13137502 (2023).

Boutell, M. R., Luo, J., Shen, X. & Brown, C. M. Learning multi-label scene classification. Pattern Recogn. 37, 1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009 (2004).

Tsoumakas, G. & Vlahavas, I. Random k-labelsets: An ensemble method for multilabel classification. In Kok, J. N. et al. (eds.) Machine Learning: ECML 2007, 406–417, https://doi.org/10.1007/978-3-540-74958-5_38 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2007).

Read, J., Pfahringer, B., Holmes, G. & Frank, E. Classifier chains for multi-label classification. Mach. Learn. 85, 333–359. https://doi.org/10.1007/s10994-011-5256-5 (2011).

Tian, X., Li, C. & Zhao, B. A novel classification model sa-mpcnn for power equipment defect text. ACM Trans. Asian Low-Resour. Lang. Inf. Process. https://doi.org/10.1145/3464380 (2021).

Liu, Z. & Wang, H. Retrieval method for defect records of power equipment based on knowledge graph technology. Autom. Electr. Power Syst. 42, 158–164 (2018).

Shao, G. et al. Precise information identification method of power equipment defect text based on dependency parsing. Autom. Electr. Power Syst. 44, 178–185 (2020).

Zhang, M.-L. & Zhou, Z.-H. Ml-knn: A lazy learning approach to multi-label learning. Pattern Recogn. 40, 2038–2048. https://doi.org/10.1016/j.patcog.2006.12.019 (2007).

Liu, M., Liu, L., Cao, J. & Du, Q. Co-attention network with label embedding for text classification. Neurocomputing 471, 61–69. https://doi.org/10.1016/j.neucom.2021.10.099 (2022).

Zhang, X., Zhang, Q., Yan, Z., Liu, R. & Cao, Y. Enhancing label correlation feedback in multi-label text classification via multi-task learning. In Zong, C., Xia, F., Li, W. & Navigli, R. (eds.) Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, vol. ACL/IJCNLP 2021 of Findings of ACL, 1190–1200, https://doi.org/10.18653/v1/2021.findings-acl.101 (Association for Computational Linguistics, 2021).

Wu, T., Huang, Q., Liu, Z., Wang, Y. & Lin, D. Distribution-balanced loss for multi-label classification in long-tailed datasets. In Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M. (eds.) Computer Vision – ECCV 2020, 162–178, https://doi.org/10.1007/978-3-030-58548-8_10 (Springer International Publishing, Cham, 2020).

Huang, Y., Giledereli, B., Köksal, A., Özgür, A. & Ozkirimli, E. Balancing methods for multi-label text classification with long-tailed class distribution. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 8153–8161, https://doi.org/10.18653/v1/2021.emnlp-main.643 (Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021).

You, R. et al. Attentionxml: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In Wallach, H. M. et al. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 5812–5822 (2019).

Chang, W.-C., Yu, H.-F., Zhong, K., Yang, Y. & Dhillon, I. S. Taming pretrained transformers for extreme multi-label text classification. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’20, 3163–3171, https://doi.org/10.1145/3394486.3403368 (Association for Computing Machinery, New York, NY, USA, 2020).

Jiang, T. et al. Lightxml: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification. Proc. AAAI Conf. Artif. Intell. 35, 7987–7994. https://doi.org/10.1609/aaai.v35i9.16974 (2021).

Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9260–9269, https://doi.org/10.1109/CVPR.2019.00949 (2019).

Manning, C. D., Raghavan, P. & Schütze, H. Introduction to information retrieval (Cambridge University Press, 2008).

Pedregosa, F. et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Szymanski, P. & Kajdanowicz, T. A scikit-based python environment for performing multi-label classification (CoRR, 2017) arXiv:1702.01460.

Herrera, F., Charte, F., Rivera, A. J. & del Jesus, M. J. Multilabel Classification, 17–31 (Springer International Publishing, 2016).

State Grid Shandong Electric Power Research Institute, Jinan, 250002, Shandong, China

Wenjie Zheng, Yi Yang & Fengda Zhang

Beijing University of Posts and Telecommunications, Beijing, 100876, China

Wenxiu Lv

State Grid Shandong Electric Power Company, Jinan, 250001, Shandong, China

Yong Li & Sun Li

Methodology, Z.W.; software, Z.F. and L.W.; validation, L.W.; formal analysis, Z.W. and Z.F.; investigation, L.Y. and L.S.; resources, Y.Y.; data curation, Y.Y.; writing—original draft preparation, Z.F. and L.W.; writing—review and editing, Z.W.; visualization, L.W.; supervision, Y.Y.; project administration, Z.W.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Correspondence to Yi Yang.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Zheng, W., Yang, Y., Zhang, F. et al. Enhancing power equipment defect identification through multi-label classification methods. Sci Rep 14, 21805 (2024). https://doi.org/10.1038/s41598-024-71996-x

Received: 13 July 2024

Accepted: 02 September 2024

Published: 18 September 2024

DOI: https://doi.org/10.1038/s41598-024-71996-x
