One of the biggest headaches for any machine learning engineer is overfitting. The model we trained works perfectly on the training dataset, but when applied to new data it fails miserably. This happens because the classifier has learned the training dataset too closely, noise included, and so fails to generalize to data it has not seen.
One good way to guard against this problem is to train and evaluate the model using the k-fold cross-validation technique, which works as follows:
- Divide the dataset randomly into k partitions.
- Train the classifier on k-1 partitions and keep the remaining partition aside for testing it.
- Repeat this process k times, holding out a different partition each time, so that you end up with k trained classifiers and k performance metrics.
- Take the average of all the metrics. This gives a more reliable estimate of how your classifier will perform on new data than a single train/test split.
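
The steps above can be sketched in plain Python as follows. This is only a minimal illustration under some assumptions: `k_fold_indices`, `cross_validate`, `fit_threshold`, and `accuracy` are hypothetical helper names, the data is synthetic, and the simple threshold classifier is a stand-in for whatever model you are actually training.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, k, fit, score, seed=0):
    """Train k models, each tested on a different held-out fold.

    Returns the list of k scores and their average.
    """
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        scores.append(
            score(model, [X[j] for j in test_idx], [y[j] for j in test_idx])
        )
    return scores, mean(scores)


# Hypothetical toy classifier (an assumption for this sketch):
# a 1-D threshold placed halfway between the two class means.
def fit_threshold(X, y):
    mean_0 = mean(x for x, label in zip(X, y) if label == 0)
    mean_1 = mean(x for x, label in zip(X, y) if label == 1)
    return (mean_0 + mean_1) / 2


def accuracy(threshold, X, y):
    predictions = [1 if x > threshold else 0 for x in X]
    return mean(1.0 if p == t else 0.0 for p, t in zip(predictions, y))


# Synthetic data: class 0 centred at 0.0, class 1 centred at 3.0.
rng = random.Random(42)
X = [rng.gauss(0.0, 1.0) for _ in range(50)] + [rng.gauss(3.0, 1.0) for _ in range(50)]
y = [0] * 50 + [1] * 50

scores, average = cross_validate(X, y, k=5, fit=fit_threshold, score=accuracy)
print("Per-fold accuracy:", scores)
print("Average accuracy:", average)
```

If the per-fold scores vary widely, or the average is much lower than the accuracy on the training data, that is a sign the model is overfitting. In practice you would probably reach for a library rather than hand-rolling this; scikit-learn, for example, provides `KFold` and `cross_val_score` in `sklearn.model_selection`.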