Learning Tool for Data Extraction from Images

Classification Techniques

Once variable extraction is complete, we can build a model to assess how descriptive our new variables are. Classification techniques are methods that allow researchers to assign observations to categories. Once the predicted categories are produced, we can compare this categorization with the actual categories that exist within the data. This allows us to see how effective different variables are at classifying an image.

In the case study, we use logistic regression as our classification technique, fitted with the glm() function. A confusion matrix then tabulates the correct and incorrect classifications made by the logistic regression model. The model correctly predicts 82 of the 89 non-cancerous images and 51 of the 71 cancerous images. Finally, sensitivity (the true positive rate) and specificity (the true negative rate) are calculated, giving a sensitivity of 0.718 (51/71) and a specificity of 0.921 (82/89). The model is therefore highly specific but only moderately sensitive: it misses 20 of the 71 cancerous images. We conclude that further variable extraction and exploration of other classification techniques are needed to obtain adequate sensitivity.

Code for Logistic Regression
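
The case study's own code and data are not reproduced here, so the following is only a minimal sketch of how a logistic regression might be fitted with glm() and turned into a confusion matrix. The data frame `images`, the predictors `mean_intensity` and `edge_density`, the simulated labels, and the 0.5 cutoff are all hypothetical stand-ins rather than the case study's actual variables or settings.

```r
# Hypothetical data: these variable names and values are illustrative
# stand-ins, not the variables extracted in the case study.
set.seed(1)
n <- 160
images <- data.frame(
  mean_intensity = rnorm(n),
  edge_density   = rnorm(n)
)

# Simulate a 0/1 label (1 = cancerous) that depends on both variables.
linear_part <- -0.3 + 1.5 * images$mean_intensity - 1.0 * images$edge_density
images$cancerous <- rbinom(n, size = 1, prob = plogis(linear_part))

# Fit the logistic regression; family = binomial uses the logit link.
fit <- glm(cancerous ~ mean_intensity + edge_density,
           data = images, family = binomial)

# Predicted probability of the positive class for each image.
predicted_prob <- predict(fit, type = "response")

# Assumed cutoff: classify as cancerous when the probability is at least 0.5.
predicted_class <- ifelse(predicted_prob >= 0.5, 1, 0)

# Confusion matrix: rows are actual classes, columns are predicted classes.
# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted.
confusion <- table(
  actual    = factor(images$cancerous, levels = 0:1),
  predicted = factor(predicted_class, levels = 0:1)
)
print(confusion)
```

The diagonal cells of the table count correct classifications and the off-diagonal cells count errors. Lowering the cutoff below 0.5 would typically trade specificity for sensitivity.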

Code to calculate sensitivity and specificity
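
As a sketch of the calculation, the counts reported in the text (82 of 89 non-cancerous and 51 of 71 cancerous images classified correctly) can be placed in a confusion matrix and the two rates read off from it. The object and dimension names below are illustrative choices, not taken from the case study's code.

```r
# Counts reported in the text; the error cells follow by subtraction.
true_negatives  <- 82
false_positives <- 89 - 82   # non-cancerous images predicted as cancerous
true_positives  <- 51
false_negatives <- 71 - 51   # cancerous images predicted as non-cancerous

# matrix() fills column by column: first the "predicted non-cancerous" column,
# then the "predicted cancerous" column.
confusion <- matrix(
  c(true_negatives, false_negatives, false_positives, true_positives),
  nrow = 2,
  dimnames = list(
    actual    = c("non-cancerous", "cancerous"),
    predicted = c("non-cancerous", "cancerous")
  )
)

# Sensitivity: share of actual cancerous images that were predicted cancerous.
sensitivity <- confusion["cancerous", "cancerous"] /
  sum(confusion["cancerous", ])

# Specificity: share of actual non-cancerous images predicted non-cancerous.
specificity <- confusion["non-cancerous", "non-cancerous"] /
  sum(confusion["non-cancerous", ])

print(round(c(sensitivity = sensitivity, specificity = specificity), 3))
```

Rounded to three decimals, these evaluate to 51/71 ≈ 0.718 and 82/89 ≈ 0.921, matching the values quoted above.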

Different Classification Techniques: