Abstract
Data mining is the practice of extracting hidden patterns from large databases by combining approaches from statistical analysis, machine learning, and database technology. Medical data mining is an essential research field because of its importance in developing applications for the growing healthcare domain. Diseases and injuries of bones are the major causes of abnormalities of the human skeletal system. Identifying the possibility of bone disease in a person is a complicated task for medical practitioners because it requires years of experience and intensive medical tests. In this work, four data mining classification algorithms, namely Decision Trees, Support Vector Machine, Logistic Regression, and K-Nearest Neighbor, are used to develop a prediction system that analyzes and predicts the possible type of bone disease. The main objective of this research is to identify the algorithm that gives the best predictive performance, so that permanent damage can be prevented at an earlier stage. The experimental setup evaluates the performance of the algorithms using health records shared by patients suffering from various bone diseases. The Logistic Regression algorithm is found to perform best for bone disease prediction, with 80% precision, when compared to the other algorithms.
Keywords Algorithm, Bone disease, Data mining, Dataset, Logistic Regression.
I. Introduction
Bone diseases occur due to abnormalities in one or more factors such as metabolic disorders, genetic disorders, hormonal imbalance, loss of bone mineral density, endocrine disorders, and nutritional deficiencies. Whether these factors lead to bone disease early or late in one's life depends on gender, age, medical condition, family history, and lifestyle.
Various kinds of disorders are:
- Osteoporosis
- Paget's Disease
- Osteitis fibrosa
- Rickets
- Renal Osteodystrophy
- Osteogenesis imperfecta
Data mining techniques have been widely used in the healthcare domain. Medical data mining can uncover patterns in large volumes of medical data that would otherwise remain undiscovered. Techniques applicable to medical data include association rule mining for finding frequent patterns, prediction, classification, and clustering. These techniques are useful in predicting bone diseases, heart diseases, breast cancer, lung cancer, diabetes, and other conditions.
II. Bone disease
Bone is the base of our Skeletal System. Bone disease is any of the diseases or injuries that affect human bones. Diseases and injuries of bones are major causes of abnormalities of the human skeletal system.
The types of bone diseases are shown in Table 1.

| Type of Bone Disease | Examples |
|---|---|
| Traumatic | Fracture or any bone injury |
| Inflammatory | Septic and Rheumatoid Arthritis, Synovitis |
| Infective | Trauma, Polio |
| Degenerative | Degenerative Arthritis |
| Hormonal/Metabolic | Rickets or Osteomalacia |
| Bone Tumor | Osteochondromas, Chondrosarcoma |
| Congenital | Osteogenesis imperfecta, Clubfoot |

Table 1: Types of Bone Diseases
This research focuses on predicting two common types of bone diseases:
- a) Degenerative bone disease
- b) Traumatic bone disease
- a) Degenerative bone disease: It is a condition in which the protective cartilage that cushions the ends of bones degenerates or wears down with age. Degenerative bone disease is shown in Fig. 1.
Fig.1: Degenerative bone disease
- b) Traumatic bone disease: It is a condition in which the bone is damaged due to some trauma or accident. Traumatic bone disease is shown in Fig. 2.
Fig. 2: Traumatic bone disease
III. Literature survey
Very few works related to bone disease prediction using data mining techniques were found. This section surveys the dataset, the algorithms, and the methodology used by their authors, together with the observed results and the future work proposed for finding efficient models of medical diagnosis for various bone diseases.
According to A. Keerthana [3], there are many models to predict and prevent various bone diseases. In that research, the author uses energy-based models as the preliminaries to the proposed method, and then introduces single-layer and multi-layer learning approaches to construct different disease memories. The proposed model focuses on prediction and informative risk factor selection for bone diseases. The author analyzes the performance of the algorithms through evaluation criteria such as sensitivity to skewed classes, sensitivity to noisy data, and parameter selection.
Saeko Fujiwara [4] focuses mainly on low bone mineral density, which is an important predictor of future fractures. The authors examined the association of Bone Mineral Density (BMD) with the risk of fracture of the spine or hip among a cohort of 2356 men and women aged 47 to 95 years. Follow-up averaged 4 years after baseline measurements of BMD taken with Dual-energy X-ray Absorptiometry (DXA). Vertebral fracture was assessed using semi-quantitative methods, and the diagnosis of hip fracture was based on medical records. Poisson and Cox regression analyses were the models used.
Paul D. Miller M.D. [5] considers low Bone Mineral Density (BMD) as a risk factor for fracture. The author studied the relationship between BMD measurements at peripheral sites and subsequent fracture risk at the hip, wrist/forearm, spine, and rib in 149524 postmenopausal women. The study used T-scores, which are a measure of BMD, for the prediction. The authors considered BMD measurement, questionnaires, and data analysis for the prediction purpose, and analyzed performance through evaluation criteria such as sensitivity and specificity.
Paul Gerdhem [6] studied how different markers of bone turnover predict fracture in 1040 elderly women. The markers considered were serum bone-specific alkaline phosphatase and four different forms of serum osteocalcin (S-OC), along with others as markers of bone resorption. The study considered sampling procedures, bone formation markers, bone resorption markers, urine osteocalcin, and other measurements for the prediction.
Hui Li [7] trained an independent model based on a specific group of patients. The authors considered a Comprehensive Disease Memory (CDM), which captures the characteristics of all patients to predict the disease. The Bone Disease Memory (BDM) memorizes the characteristics of those individuals who suffer from bone diseases. Similarly, the Non-Disease Memory (NDM) memorizes the attributes of non-diseased individuals. They used a shallow Restricted Boltzmann Machine and a 2-layer Deep Belief Network for the prediction.
M Saranya [8] considers risk factor analysis to be the process of finding bone diseases at various stages. In the proposed methodology, the author analyzed the risk factors at two levels. At the first level, disease prediction is done using the relevancies present in different risk factors. At the second level, the Deep Belief Network (DBN) algorithm is applied to two specific forecast tasks: osteoporosis and bone loss rate.
IV. Dataset
The Dataset is collected from Sun Orthopedic Hospital, Mathikere. It has 29 rows and 2 classes. The dataset description is given in Table 2.
| Attribute | Value | Description |
|---|---|---|
| Name | String | Name of the patient |
| Age | Integer | Age of patient entered in years |
| Gender | Boolean | Yes, if Male. No, if Female. |
| Body | Boolean | The region affected due to bone disease is body. (0=No; 1=Yes) |
| Thigh and Knee | Boolean | The region affected due to bone disease is thigh & knee. (0=No; 1=Yes) |
| Ankle and Foot | Boolean | The region affected due to bone disease is ankle & foot. (0=No; 1=Yes) |
| Lumbous Hip and Hip | Boolean | The region affected due to bone disease is lumbous hip & hip. (0=No; 1=Yes) |
| Shoulder and Shoulder Joint | Boolean | The region affected due to bone disease is shoulder & shoulder joint. (0=No; 1=Yes) |
| Wrist and Thumb | Boolean | The region affected due to bone disease is wrist & thumb. (0=No; 1=Yes) |
| Hand | Boolean | The region affected due to bone disease is hand. (0=No; 1=Yes) |
| Leg | Boolean | The region affected due to bone disease is leg. (0=No; 1=Yes) |
| Multiple Joint | Boolean | The region affected due to bone disease is multiple joint. (0=No; 1=Yes) |
| Lower Back | Boolean | The region affected due to bone disease is lower back. (0=No; 1=Yes) |
| Around Neck | Boolean | The region affected due to bone disease is around neck. (0=No; 1=Yes) |
| Spine | Boolean | The region affected due to bone disease is spine. (0=No; 1=Yes) |
| Elbow | Boolean | The region affected due to bone disease is elbow. (0=No; 1=Yes) |
| Pain | Boolean | The symptom that indicates the bone disease is pain in affected region. (0=No; 1=Yes) |
| Buckling | Boolean | The symptom that indicates the bone disease is buckling. (0=No; 1=Yes) |
| Weakness in Muscle | Boolean | The symptom that indicates the bone disease is weakness in muscle. (0=No; 1=Yes) |
| Swelling | Boolean | The sign that indicates the bone disease is swelling. (0=No; 1=Yes) |
| Redness and Sweating | Boolean | The sign that indicates the bone disease is redness & sweating. (0=No; 1=Yes) |
| Itching | Boolean | The sign that indicates the bone disease is itching. (0=No; 1=Yes) |
| Ankle Deformality | Boolean | The sign that indicates the bone disease is ankle deformality. (0=No; 1=Yes) |
| Feverish due to Pain | Boolean | The sign that indicates the bone disease is fever due to pain. (0=No; 1=Yes) |
| Tenderness | Boolean | The sign that indicates the bone disease is tenderness. (0=No; 1=Yes) |
| Water Content in Joint | Boolean | The sign that indicates the bone disease is water content in joint. (0=No; 1=Yes) |
| Bone Dislocation | Boolean | The physical incapacity to do chores due to bone disease is bone dislocation. (0=No; 1=Yes) |
| Difficulty in Doing Daily Activities | Boolean | The functional disability due to bone disease is difficulty in doing daily activities. (0=No; 1=Yes) |
| Difficulty in Movement | Boolean | The functional disability due to bone disease is difficulty in movement. (0=No; 1=Yes) |
| Class | Integer | Indicates the type of bone disease. (1=Traumatic bone disease; 2=Degenerative bone disease) |

Table 2: Bone disease dataset
Since the attribute Name carries no predictive information, it is excluded from the features used for prediction.
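As a rough illustration of how a record from Table 2 could be turned into model input, the sketch below drops Name, separates Class as the label, and casts the remaining values to integers. The record shown is invented for illustration and lists only a few of the attributes; the function name `to_features` and the dictionary layout are assumptions, not the format used by the authors.

```python
def to_features(record):
    """Split one patient record into (feature_vector, class_label).

    Name is dropped because it is not used for prediction;
    Class is the target (1 = traumatic, 2 = degenerative).
    """
    label = record["Class"]
    features = [
        int(value)  # booleans become 0/1, Age stays an integer
        for key, value in record.items()
        if key not in ("Name", "Class")
    ]
    return features, label


# Hypothetical record with only a handful of the Table 2 attributes.
record = {
    "Name": "patient-001",
    "Age": 54,
    "Gender": True,
    "Pain": 1,
    "Swelling": 0,
    "Class": 2,
}
features, label = to_features(record)
```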
V. Methodology
The algorithms used for the prediction of bone diseases are Decision Trees (DT), Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). Each algorithm is described below.
- 1. Decision Trees (DT):
The Decision Trees algorithm can be used for solving both regression and classification problems. A decision tree creates a training model that predicts the class or value of the target variable by learning decision rules inferred from the training data.
- Step 1. Place the best attribute of the dataset at the root of the tree.
- Step 2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
- Step 3. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the branches of the tree.
Fig. 3: Pseudocode of Decision Tree
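Step 1 above depends on what "best attribute" means, which the text does not specify. A common choice is information gain based on entropy, and the minimal sketch below assumes that criterion. The tiny dataset and the function names are hypothetical and serve only to show the selection of a root attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain (Step 1)."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))


# Hypothetical data: attribute 0 separates the classes, attribute 1 does not.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = [1, 1, 2, 2]
root = best_attribute(rows, labels)
```

Steps 2 and 3 would then partition the rows by the value of the chosen attribute and apply the same selection to each subset until every subset contains a single class.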
- 2. Logistic Regression (LR):
Logistic regression is a classification and predictive algorithm. LR is used to describe data and to explain the relationship between one dependent binary variable and one or more independent variables that determine the outcome. The binary logistic model is used to determine the probability of a binary response based on one or more predictors. It predicts a binary outcome such as 0 or 1, which may represent Yes or No, or True or False, given a set of independent variables.[11]
The logistic (sigmoid) function is shown in Equation (1), and its curve is shown in Fig. 4.
$$S(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
- where S(x) represents the sigmoid function,
- x represents a real number.
Fig. 4: Logistic Regression Sigmoid Curve
The basic equation for the generalized linear model is shown in Equation (2):
$$g(E(y)) = \alpha + \beta x_1 + \gamma x_2 \tag{2}$$
In the equation,
- g(): link function
- E(y): the expected value of the target variable
- α, β, and γ: coefficients to be estimated
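To connect Equations (1) and (2), the sketch below assumes the usual logit link, g(p) = log(p / (1 - p)), whose inverse is the sigmoid S(x) = 1 / (1 + e^(-x)). The three coefficients of Equation (2) are named `alpha`, `beta`, and `gamma` here, and the coefficient values are invented for illustration rather than fitted to the bone disease dataset.

```python
from math import exp, log


def sigmoid(x):
    """Equation (1): maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def logit(p):
    """Assumed link function g(): the inverse of the sigmoid."""
    return log(p / (1.0 - p))


def predict_probability(alpha, beta, gamma, x1, x2):
    """Equation (2) solved for E(y): apply the sigmoid to the linear predictor."""
    return sigmoid(alpha + beta * x1 + gamma * x2)


# Hypothetical coefficients and a two-feature input.
p = predict_probability(alpha=-1.0, beta=2.0, gamma=0.5, x1=1, x2=0)
label = 1 if p >= 0.5 else 0  # threshold the probability to get a binary class
```

Applying `logit` to the predicted probability recovers the linear predictor, which is what Equation (2) states.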
- 3. Support Vector Machine (SVM):
Support Vector Machine is used for classification and regression analysis. In the SVM algorithm, each data item is plotted as a point in n-dimensional space, where n is the number of features in the training dataset and the value of each feature is the value of a specific coordinate. Classification is then achieved by constructing the hyperplane that divides the dataset into two classes.[10]
candidateSV = { closest pair from opposite classes }
while there are violating points do
    Find a violator
    candidateSV = candidateSV ∪ violator
    if any α_p < 0 due to addition of c to S then
        candidateSV = candidateSV \ p
        repeat till all such points are pruned
    end if
end while
Fig. 5: Pseudocode of Support Vector Machine
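The separating hyperplane can also be illustrated with a much simpler training procedure than the candidate-set pseudocode in Fig. 5. The sketch below is not that algorithm: it fits a linear SVM by sub-gradient descent on the regularized hinge loss, which is a swapped-in technique chosen for brevity. Classes are encoded as +1 and -1 rather than 1 and 2, and the points, learning rate `lr`, and regularization strength `lam` are assumptions for illustration.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=50):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point violates the margin: pull the hyperplane toward classifying it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is outside the margin: apply only the regularization shrinkage.
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b


def predict(w, b, x):
    """Classify by the side of the hyperplane w.x + b = 0 on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable two-feature data.
points = [(2, 2), (3, 3), (-2, -2), (-3, -1)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```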
- 4. K-Nearest Neighbor (KNN):
The KNN algorithm is a non-parametric method used for classification and regression. The input consists of the k closest training instances in the feature space, and the output depends on whether k-NN is used for classification or regression.[9] The nearest neighbors are determined by a distance function; the distance function considered here is Euclidean distance.
Input: Let k be the number of nearest neighbors and D be the set of training examples.
for each test example z = (x', y') do
    compute d(x', x), the distance between z and every example (x, y) ∈ D
    select Dz ⊆ D, the set of k closest training examples to z
    y' = argmax_v Σ_{(xi, yi) ∈ Dz} I(v = yi)
end for
Fig. 6: Pseudocode of KNN
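The pseudocode in Fig. 6 can be sketched in a few lines: compute the Euclidean distance from the test point to every training example, keep the k closest, and take a majority vote over their labels. The training points below are invented for illustration, and the function name `knn_predict` is an assumption.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train, z, k):
    """Return the majority class among the k training examples closest to z.

    `train` is a list of (feature_tuple, label) pairs.
    """
    neighbors = sorted(train, key=lambda example: dist(example[0], z))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with classes 1 and 2.
train = [
    ((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
    ((5, 5), 2), ((5, 6), 2), ((6, 5), 2),
]
pred_near_origin = knn_predict(train, (0.5, 0.5), k=3)
pred_far = knn_predict(train, (5.5, 5.5), k=3)
```

An odd k avoids tied votes when there are only two classes, as in this study.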
VI. Results
The results of applying the Decision Trees (DT), Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) algorithms are shown in Table 3.

| Model | Accuracy | Error rate |
|---|---|---|
| DT | 0.68 | 0.32 |
| LR | 0.74 | 0.25 |
| SVM | 0.65 | 0.34 |
| KNN | 0.72 | 0.27 |

Table 3: Results
The graph in Fig. 7 shows the comparative performance of the models. The logistic regression (LR) model gives better accuracy than other models.
Fig. 7: Comparative Performance Analysis of models
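For reference, accuracy and error rate are usually computed as in the sketch below, where accuracy is the fraction of correct predictions and the error rate is its complement. The label lists are invented for illustration; the text does not state how the data were split for evaluation, so no attempt is made to reproduce the values in Table 3.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true and predicted classes (1 = traumatic, 2 = degenerative).
y_true = [1, 2, 2, 1, 2]
y_pred = [1, 2, 1, 1, 2]

acc = accuracy(y_true, y_pred)
err = 1 - acc  # error rate is the complement of accuracy
```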
VII. Conclusion
The healthcare industry is facing many challenges, and recent developments in advanced technologies offer broad opportunities for confronting them. In this research, we consider a larger number of features from real patient data and compare algorithms in order to predict bone diseases more accurately. The bone diseases considered in our work are degenerative and traumatic bone diseases. The system evaluates four data mining techniques: Support Vector Machine, Logistic Regression, Decision Trees, and K-Nearest Neighbor. Through our evaluation, we found that the accuracy of the Logistic Regression algorithm is the highest among these algorithms. Hence, the Logistic Regression algorithm is used for predicting the bone diseases. With the training dataset, we develop a model that predicts the type of bone disease. The developed Logistic Regression model classifies the test dataset based on what it has learned from the training dataset. The experimental results of this work indicate that, among the methods evaluated, the proposed methodology provides the best accuracy.
VIII. Acknowledgement
We express our sincere gratitude to Sun Orthopedic Hospital and their team for sharing their pearls of wisdom with us during the course of this research.
IX. References
- [1] Orthopedic Center of Southern Illinois, Accessed: April 15th, 2019, Available: https://orthocenter-si.com/sites/all/files/images/Knee-arthritis-can-cause-pain-inside-knee.jpg
- [2] Christian Nordqvist, December 14th, 2017, Medical News Today, Accessed: April 15th, 2019, Available: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTSUTzf5fVbRdbiAYbhIvi9Qe5FKVaflCAJgwpFqEKnnDK1OtQY
- [3] A Keerthana and Mrs. P Renukadevi, Predict and Prevent the Bone Disease using Data Mining Techniques, IJETCSE, ISSN: 0976-1353, Volume 21, Issue 4, April 2016.
- [4] Saeko Fujiwara, Fracture Prediction From Bone Mineral Density in Japanese Men and Women, Journal of Bone and Mineral Research.
- [5] Paul D. Miller M.D., Prediction of Fracture Risk in Postmenopausal White Women With Peripheral Bone Densitometry: Evidence From the National Osteoporosis Risk Assessment, Journal of Bone and Mineral Research.
- [6] Paul Gerdhem, Biochemical Markers of Bone Metabolism and Prediction of Fracture in Elderly Women, Journal of Bone and Mineral Research.
- [7] Hui Li, Xiaoyi Li, Murali Ramanathan, and Aidong Zhang, Prediction and Informative Risk Factor Selection for Bone Disease, IEEE, DOI 10.1109/TCBB.2014.2330579, 2013.
- [8] M Saranya and Dr. K Sarojini, An Improved and Optimal Prediction of Bone Disease Based on Risk Factors, IJCSIT, ISSN: 0975-9646, Volume 7(2), 2016.
- [9] Belur V. Dasarathy, ed. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. ISBN 978-0-8186-8930-7.
- [10] Data Flair, SVM Support Vector Machine Tutorial for Beginners, Data Flair team, November 19, 2018, [Online]. Available: https://data-flair.training/blogs/svm-support-vector-machine-tutorial/ [Accessed: April 14, 2019].
- [11] Jason Brownlee, Machine Learning Mastery, Logistic Regression Tutorial For Machine Learning, April 4, 2016, [Online]. Available: https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/ [Accessed: April 14, 2019].