Logistic Regression
Logistic regression is a statistical model that estimates the probability of an event occurring by modelling the log-odds of the event as a linear combination of one or more independent variables.
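To make the log-odds formulation concrete, the short sketch below computes the log-odds for a single observation and converts them back to a probability with the logistic (sigmoid) function; the coefficients are illustrative placeholders, not fitted values from this project.

```python
import numpy as np

# Log-odds as a linear combination of the independent variables.
beta_0 = -1.5                      # intercept (illustrative)
beta = np.array([0.8, 2.1])        # one coefficient per feature (illustrative)
x = np.array([1.0, 0.5])           # a single observation

log_odds = beta_0 + beta @ x                   # = log(p / (1 - p))
probability = 1.0 / (1.0 + np.exp(-log_odds))  # inverse logit (sigmoid)

print(f"log-odds = {log_odds:.3f}, P(event) = {probability:.3f}")
```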
The logistic regression model produced the following results:
Training Data Score : 97.45% - Testing Data Score : 98.3%
True Positive : 110 , False Positive : 1 , False Negative : 2 , True Negative : 64
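Scores and confusion-matrix counts of this kind can be produced with scikit-learn along the lines of the sketch below; the synthetic dataset, split, and solver settings are stand-ins rather than the project's actual pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project dataset; replace with the real X and y.
X, y = make_classification(n_samples=700, n_features=10, weights=[0.6, 0.4],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print(f"Training Data Score : {model.score(X_train, y_train):.2%}")
print(f"Testing Data Score  : {model.score(X_test, y_test):.2%}")

# Confusion matrix on the test set (binary case): tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
print(f"TP: {tp}  FP: {fp}  FN: {fn}  TN: {tn}")
```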
Here we tested the logistic regression against different levels of bias in the data by undersampling the majority class to varying degrees. In the end the regular full dataset produced better results more frequently, which we believe is due to the reduced amount of training data available once the classes were balanced. There are more tools available to explore this further, but given our limited time we propose that as a topic for later exploration. Scores averaged in the 96% - 98% range across all three sets. Looking at a confusion matrix, we found that when the logistic regression made errors they tended to be false negatives rather than false positives, with a false negative rate of around 3%.
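A minimal sketch of how such an undersampling comparison could be run is shown below; the synthetic data, the undersample_majority helper, and the fractions of the majority class kept are assumptions for illustration, not the exact procedure used here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in for the project data; swap in the real X, y.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

def undersample_majority(X, y, keep_fraction, rng):
    """Randomly drop majority-class rows, keeping `keep_fraction` of them."""
    majority = np.bincount(y).argmax()
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    kept = rng.choice(maj_idx, size=int(len(maj_idx) * keep_fraction),
                      replace=False)
    idx = np.concatenate([kept, min_idx])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
for frac in (1.0, 0.75, 0.5):  # 1.0 = full dataset, lower = more undersampling
    X_bal, y_bal = undersample_majority(X_train, y_train, frac, rng)
    score = LogisticRegression(max_iter=1000).fit(X_bal, y_bal).score(X_test, y_test)
    print(f"majority kept: {frac:.0%}  test score: {score:.2%}")
```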