R Programming Complete Tutorial

Heart Risk Analysis using Logistic Regression (LR)

Coronary heart disease remains one of the leading causes of death worldwide. One of its biggest contributors is a lack of commitment to a heart-healthy lifestyle and the consequences that follow from it.

This project aims at early detection, that is, finding out whether a patient is at risk of developing coronary heart disease within the next ten years. We use the data set heartdata for this analysis.

Step 1: Split the dataset into a training set and a test set.
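As a minimal sketch of this step in base R: the data frame below is a small synthetic stand-in for heartdata (the values are made up), and the 70/30 split ratio is an assumption, since the ratio actually used is not stated here.

```r
# Synthetic stand-in for heartdata (hypothetical values, not the real data set)
set.seed(42)
n <- 1000
heartdata <- data.frame(
  age        = round(runif(n, 32, 70)),
  sysBP      = round(rnorm(n, 132, 22)),
  TenYearCHD = rbinom(n, 1, 0.15)
)

# Assumed 70/30 split: draw 70% of the row indices at random for training
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- heartdata[train_idx, ]
test  <- heartdata[-train_idx, ]   # all remaining rows form the test set

cat("training rows:", nrow(train), " test rows:", nrow(test), "\n")
```

Setting the seed makes the split reproducible, so the same rows land in the training and test sets on every run.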

Step 2: Train the model using the training set.
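A logistic regression in R is typically fitted with glm and family = binomial. The sketch below assumes that approach and uses a synthetic training set with only two of the predictors (age and sysBP); the simulated coefficients are made up for illustration.

```r
# Synthetic training set (hypothetical, not the real heartdata)
set.seed(1)
n <- 500
train <- data.frame(
  age   = round(runif(n, 32, 70)),
  sysBP = rnorm(n, 132, 22)
)
# Simulated outcome: risk rises with age and sysBP (made-up coefficients)
risk <- plogis(-8 + 0.06 * train$age + 0.02 * train$sysBP)
train$TenYearCHD <- rbinom(n, 1, risk)

# Fit the logistic regression model on the training set
heartLR <- glm(TenYearCHD ~ age + sysBP, data = train, family = binomial)

# Display the coefficient table and fit statistics
print(summary(heartLR))
```

With the real data, the formula TenYearCHD ~ . would include every remaining column as a predictor.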

Model Summary:

The summary function displays the details of the fitted model.

The first column gives the predictor's name. The following columns give the coefficient estimate, the standard error, the z value, and finally the p-value.

We use the conventional significance level of 5% and choose all predictors whose p-value is below it. On that basis, male, age, cigsPerDay, sysBP, totalChol, and glucose are the significant predictors.
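The selection rule above can also be applied programmatically by reading the p-value column from the summary's coefficient table. This is a sketch on synthetic data: the predictor noise is a hypothetical column that is unrelated to the outcome by construction, so it would usually fail the 5% test.

```r
# Synthetic training data (hypothetical, not the real heartdata)
set.seed(7)
n <- 1000
train <- data.frame(
  age   = round(runif(n, 32, 70)),
  noise = rnorm(n)                 # unrelated to the outcome by construction
)
train$TenYearCHD <- rbinom(n, 1, plogis(-6 + 0.08 * train$age))

heartLR <- glm(TenYearCHD ~ age + noise, data = train, family = binomial)

# Coefficient table with columns: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(heartLR))

# Drop the intercept and keep the p-value of each predictor
pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|z|)"]

alpha <- 0.05                      # 5% significance level from the text
significant <- names(pvals)[pvals < alpha]
print(significant)
```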

Step 3: Model Prediction using Test data set.

We will predict TenYearCHD for the separate test dataset using our logistic regression model heartLR. It generates the result below.

Outcome:

For row 17, the model predicts a TenYearCHD probability of 0.1557. For the 300th row, the predicted probability is 0.2739. The model returns a predicted probability of TenYearCHD for every row of the test set.
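A sketch of the prediction step, assuming the model was fitted with glm: predict with type = "response" returns probabilities between 0 and 1, like the values quoted above. The data and the name predictTest are hypothetical stand-ins, so the numbers will not match the tutorial's output.

```r
# Synthetic stand-in for heartdata (hypothetical values)
set.seed(11)
n <- 1000
heartdata <- data.frame(
  age   = round(runif(n, 32, 70)),
  sysBP = rnorm(n, 132, 22)
)
heartdata$TenYearCHD <- rbinom(
  n, 1, plogis(-8 + 0.06 * heartdata$age + 0.02 * heartdata$sysBP)
)

# Assumed 70/30 split into training and test sets
train_idx <- sample(seq_len(n), size = 700)
train <- heartdata[train_idx, ]
test  <- heartdata[-train_idx, ]

heartLR <- glm(TenYearCHD ~ age + sysBP, data = train, family = binomial)

# type = "response" gives probabilities rather than log-odds
predictTest <- predict(heartLR, newdata = test, type = "response")

# One probability per test row, named by the original row number
print(head(round(predictTest, 4)))
```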

Step 4: Model Evaluation using Confusion Matrix and ROC Curve:

In this section, we are going to evaluate the model first using the Confusion Matrix and then using the ROC Curve.

Confusion Matrix: A confusion matrix is a table often used to describe a classification model's performance on a set of test data for which the true values are known. It makes the performance of an algorithm easy to visualize.

We are going to use the table function to create our confusion matrix, comparing the predicted values with the actual values. We first set a threshold to convert the predicted probabilities into classes, and then run the function on the result. It generates the table below.
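The thresholding and the call to table can be sketched on a handful of made-up values. The threshold of 0.5 is an assumption, and the vectors actual and predicted_prob are hypothetical.

```r
# Hypothetical true labels and predicted probabilities for eight patients
actual         <- c(0, 0, 1, 1, 0, 1, 0, 0)
predicted_prob <- c(0.10, 0.62, 0.71, 0.35, 0.20, 0.55, 0.05, 0.48)

# Assumed threshold: probabilities above it are classified as 1
threshold <- 0.5
predicted <- as.integer(predicted_prob > threshold)

# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted
cm <- table(
  Actual    = factor(actual,    levels = c(0, 1)),
  Predicted = factor(predicted, levels = c(0, 1))
)
print(cm)
```

Rows are the actual classes and columns are the predicted classes, so for these values the table holds 4 true negatives, 1 false positive, 1 false negative, and 2 true positives.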

Graphical Representation:

Model Accuracy:

Baseline Accuracy:

Other Parameters:
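The measures named above can all be derived from the four cells of a confusion matrix. The sketch below uses hypothetical counts, and it assumes that the other parameters are sensitivity and specificity, which may differ from the ones originally reported.

```r
# Hypothetical confusion matrix: rows = actual class, columns = predicted class
cm <- matrix(
  c(4, 1,
    1, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(Actual = c("0", "1"), Predicted = c("0", "1"))
)

TN <- cm["0", "0"]; FP <- cm["0", "1"]
FN <- cm["1", "0"]; TP <- cm["1", "1"]

# Model accuracy: share of all cases classified correctly
accuracy <- (TP + TN) / sum(cm)

# Baseline accuracy: always predict the most frequent actual class
baseline <- max(rowSums(cm)) / sum(cm)

# Sensitivity: share of actual positives that were detected
sensitivity <- TP / (TP + FN)

# Specificity: share of actual negatives that were correctly cleared
specificity <- TN / (TN + FP)

print(c(accuracy = accuracy, baseline = baseline,
        sensitivity = sensitivity, specificity = specificity))
```

A model is only useful if its accuracy beats the baseline; here the hypothetical model scores 0.75 against a baseline of 0.625.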

AUC-ROC Curve: The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. The ROC curve plots the true positive rate against the false positive rate at each threshold, and AUC, the area under that curve, is a measure of separability. It tells how well the model is capable of distinguishing between the classes.

ROC-AUC Curve:
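A ROC curve and its AUC can be computed in base R without extra packages. This is a sketch on made-up values: roc_points is a hypothetical helper, and the tutorial's own plot may have been produced with a dedicated package instead.

```r
# Hypothetical true labels and predicted probabilities for eight patients
actual <- c(0, 0, 1, 1, 0, 1, 0, 0)
prob   <- c(0.10, 0.62, 0.71, 0.35, 0.20, 0.55, 0.05, 0.48)

# Hypothetical helper: true/false positive rates at every distinct threshold
roc_points <- function(actual, prob) {
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 1) / sum(actual == 1))
  fpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 0) / sum(actual == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc <- roc_points(actual, prob)

# AUC by the trapezoidal rule over the ROC points
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
print(auc)

# Draw the curve with the diagonal of a random classifier for reference
plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve")
abline(0, 1, lty = 2)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation; these made-up values give 0.8.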