An efficient approach on risk factor prediction related to cardiovascular disease around Kumbakonam, Tamil Nadu, India, using unsupervised machine learning techniques.

Journal: Scientific reports
PMID:

Abstract

Nowadays, people suffer from a variety of diseases owing to environmental conditions and their living habits. Cardiovascular diseases (CVDs), that is, heart-related diseases, are the leading cause of death among all diseases. In earlier times, the lack of technological advances cost many human lives: delayed diagnosis led to delayed treatment, which in turn led to loss of life. Predicting disease in advance has therefore become a necessity, since it allows the necessary treatment to be provided in time. This paper addresses risk factor prediction using unsupervised learning methods and uses principal component analysis (PCA) to identify the predominant parameters associated with risk. We collected patient data of size 130 × 12 (130 records, 12 parameters) from four different laboratories in and around Kumbakonam, Tamil Nadu, India. Several clustering techniques, namely k-means clustering, partitioning around medoids (PAM), hierarchical clustering, and fuzzy clustering, were applied to the patient data, and the results show that the data can be grouped into two clusters: "patients with risk" and "patients with no risk". The optimal number of clusters was determined using the elbow and silhouette methods. The clustering tendency and the quality of the clusters were evaluated using the Hopkins statistic, Dunn's index, and average silhouette widths. The computed agglomerative coefficients indicate a strong cluster structure in the dataset. Cluster stability was tested using bootstrap cluster analysis, and the results showed that the clusters are highly stable. Finally, PCA was applied for feature selection; of the 12 parameters, Total Cholesterol was found to be the most highly correlated factor, and it plays an important role in identifying risk among patients.
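The Hopkins statistic mentioned in the abstract measures clustering tendency, that is, whether the data contain any cluster structure before a clustering method is applied. The following is a minimal NumPy sketch, assuming Euclidean distances, a default sample size `m` of about 10% of the points, and the convention H = Σu / (Σu + Σw). Under this convention, values near 0.5 suggest spatially random data and values near 1 suggest clusterable data. Conventions differ between software packages, so it is worth checking which one a given tool reports. The name `hopkins_statistic` is illustrative and is not the authors' code.

```python
import numpy as np


def hopkins_statistic(X, m=None, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)).

    u_i: distance from a uniform random point (drawn inside the bounding box
         of X) to its nearest data point.
    w_i: distance from a randomly sampled data point to its nearest *other*
         data point.
    With this convention H is about 0.5 for random data and near 1 for
    strongly clustered data.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    if m is None:
        m = max(1, n // 10)  # assumption: sample roughly 10% of the points
    lo, hi = X.min(axis=0), X.max(axis=0)
    uniform = rng.uniform(lo, hi, size=(m, d))
    idx = rng.choice(n, size=m, replace=False)

    # u: each uniform point -> nearest data point
    u = np.linalg.norm(uniform[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # w: each sampled data point -> nearest other data point
    dw = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dw[np.arange(m), idx] = np.inf  # exclude the point's distance to itself
    w = dw.min(axis=1)
    return u.sum() / (u.sum() + w.sum())


if __name__ == "__main__":
    # Hypothetical data: two tight blobs vs. uniform noise
    rng = np.random.default_rng(1)
    blobs = np.vstack([rng.normal(0, 0.3, (150, 2)), rng.normal(10, 0.3, (150, 2))])
    noise = rng.uniform(0, 1, (300, 2))
    print("blobs:", round(hopkins_statistic(blobs, m=30), 3))
    print("noise:", round(hopkins_statistic(noise, m=30), 3))
```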
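The elbow and silhouette criteria for choosing the number of clusters, together with Dunn's index, can be illustrated for k-means. In the elbow method, one looks for the k after which the within-cluster sum of squares (WSS) stops dropping sharply. The silhouette method picks the k with the largest average silhouette width. A larger Dunn index indicates compact, well-separated clusters. The NumPy sketch below assumes Euclidean distance, k-means++ seeding, and five restarts. It is not the procedure used in the paper, and the three-blob data in the demo are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding.

    Runs n_init restarts and returns (labels, wss) for the run with the
    lowest total within-cluster sum of squares (WSS).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: choose each new centre with probability proportional to
        # its squared distance from the nearest centre chosen so far
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):
            labels = ((X[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wss = d2.min(axis=1).sum()
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best


def average_silhouette(X, labels):
    """Average silhouette width: mean of s(i) = (b_i - a_i) / max(a_i, b_i)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()


def dunn_index(X, labels):
    """Smallest between-cluster distance / largest within-cluster diameter."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    same = labels[:, None] == labels[None, :]
    return D[~same].min() / D[same].max()


if __name__ == "__main__":
    # Hypothetical data: three well-separated 2-D blobs of 40 points each
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in [(0, 0), (8, 0), (0, 8)]])
    for k in range(2, 6):
        labels, wss = kmeans(X, k)
        print(f"k={k}  WSS={wss:8.1f}  silhouette={average_silhouette(X, labels):.3f}"
              f"  Dunn={dunn_index(X, labels):.3f}")
```

On data like the demo, WSS falls steeply up to the true number of clusters and flattens afterwards, and the average silhouette width peaks at that k. The paper reports k = 2 for its patient data.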
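The abstract reports that bootstrap cluster analysis showed highly stable clusters, but it does not describe the exact procedure. One widely used scheme, in the spirit of `clusterboot` from the R package fpc, works as follows. It resamples the rows with replacement and reclusters each resample. For every original cluster, it then records the best Jaccard similarity with any cluster found in the resample. Mean Jaccard values close to 1 indicate stable clusters. The NumPy sketch below assumes k-means as the clustering method and uses illustrative function names (`kmeans_labels`, `bootstrap_stability`). It is not the authors' implementation.

```python
import numpy as np


def kmeans_labels(X, k, n_init=5, n_iter=100, seed=0):
    """Compact k-means (k-means++ seeding, n_init restarts); returns the
    labels of the run with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):
            labels = ((X[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2)
        if d2.min(axis=1).sum() < best_wss:
            best_labels, best_wss = d2.argmin(axis=1), d2.min(axis=1).sum()
    return best_labels


def bootstrap_stability(X, k, n_boot=50, seed=0):
    """Mean Jaccard similarity between each original cluster and its best
    match among the clusters of bootstrap resamples (near 1 = stable)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    base = kmeans_labels(X, k)
    totals, counts = np.zeros(k), np.zeros(k)
    for b in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)
        boot = np.full(n, -1)  # -1 marks points absent from this resample
        # repeated rows have identical coordinates, hence identical labels
        boot[idx] = kmeans_labels(X[idx], k, seed=b)
        in_sample = boot >= 0
        for j in range(k):
            A = in_sample & (base == j)  # original cluster j, restricted to the resample
            if not A.any():
                continue
            totals[j] += max((A & (boot == c)).sum() / (A | (boot == c)).sum()
                             for c in range(k))
            counts[j] += 1
    return totals / np.maximum(counts, 1)


if __name__ == "__main__":
    # Hypothetical data: three well-separated 2-D blobs
    rng = np.random.default_rng(7)
    X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in [(0, 0), (8, 0), (0, 8)]])
    print("mean Jaccard per cluster:", np.round(bootstrap_stability(X, 3, n_boot=20), 3))
```

Taking the maximum over bootstrap clusters makes the comparison independent of how the labels happen to be numbered in each resample.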
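The abstract does not specify how Total Cholesterol was singled out from the PCA. A common approach has three steps: standardize the variables, compute the principal components, and rank the variables by the magnitude of their loadings on the leading components. The sketch below does this with an SVD in NumPy. The column names `f1` to `f4` and the synthetic data are hypothetical and do not reproduce the patient dataset.

```python
import numpy as np


def pca(X):
    """PCA of standardized columns via SVD.

    Returns (explained, loadings): the proportion of variance explained by
    each component, and a (features x components) matrix whose column j is
    the unit-length eigenvector of the correlation matrix for component j.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each column
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return explained, Vt.T


def dominant_features(loadings, names, pc=0, top=3):
    """Feature names ranked by absolute loading on component `pc`."""
    order = np.argsort(-np.abs(loadings[:, pc]))
    return [names[i] for i in order[:top]]


if __name__ == "__main__":
    # Hypothetical data: f1..f3 share a common latent factor, f4 is pure noise
    rng = np.random.default_rng(3)
    latent = rng.normal(size=200)
    X = np.column_stack([latent + 0.1 * rng.normal(size=200) for _ in range(3)]
                        + [rng.normal(size=200)])
    explained, loadings = pca(X)
    print("explained variance:", np.round(explained, 3))
    print("top features on PC1:", dominant_features(loadings, ["f1", "f2", "f3", "f4"]))
```

The sign of a principal component is arbitrary, which is why the ranking uses absolute loadings.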

Authors

  • K Kannan
    SASTRA Deemed to be University, Kumbakonam, Tamil Nadu, India.
  • A Menaga
    SASTRA Deemed to be University, Kumbakonam, Tamil Nadu, India. anbumenakaa@gmail.com.