The Pitfalls and Insights of Log Facies Classification for a Machine Learning Contest
Marcelo Guarido, David J. Emery, Marie Macquet, Daniel O. Trad, Kristopher A. H. Innanen
FORCE: Machine Predicted Lithology was a classification contest using well logs from the Norwegian coast of the North Sea. While lithology describes the general physical characteristics of rocks, our machine learning approaches concentrated on the petrology, or composition, of the rocks sample by sample. We tried different solutions for the provided data set and created workflows that clean and complete the data. An additional problem was that the training data were not balanced, with one class making up 62% of the training data and seven of the classes making up less than 4%. We built two different models: one for balanced predictions, using a gradient boosting algorithm, and another focusing on the common classes, using a model that stacks gradient boosting and random forest probability predictions. The primary machine learning pitfall was balancing the petrophysical analysis with the lithofacies-associated training classes. The FORCE training classes also contain a mixture of lithofacies within each class, and thus a high degree of mineralogy variation, or crosstalk in the confusion matrix. A second pitfall was that the contest was scored with a penalty-matrix metric that did not compensate for the imbalance of the input data. The first of our approaches reached a good balanced accuracy score of 0.561 but a poor contest-metric score of -1.35. The second model scored -0.58 on the contest metric, at the cost of a balanced accuracy score reduced to 0.41.
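The tension between the two scores can be illustrated with a small sketch. It assumes that balanced accuracy is the mean of the per-class recalls and that the contest metric has the form of a negative mean penalty looked up by (true class, predicted class). The three-class penalty matrix, the labels, and the two sets of predictions below are invented for illustration only; they are not the FORCE penalty matrix or contest data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def penalty_score(y_true, y_pred, penalty):
    """Negative mean penalty per sample: 0 is perfect, more negative is worse."""
    total = sum(penalty[t][p] for t, p in zip(y_true, y_pred))
    return -total / len(y_true)


# Hypothetical penalty matrix (rows: true class, columns: predicted class).
penalty = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]

# Toy imbalanced labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 8 + [1] + [2]

# Predictions that always choose the common class.
pred_common = [0] * 10

# Predictions that recover the rare classes but mislabel some common samples.
pred_balanced = [0] * 5 + [1] * 2 + [2] + [1] + [2]

for name, pred in [("common", pred_common), ("balanced", pred_balanced)]:
    print(name,
          round(balanced_accuracy(y_true, pred), 3),
          round(penalty_score(y_true, pred, penalty), 3))
```

In this toy case the predictions that recover the rare classes have the higher balanced accuracy, yet they receive the worse penalty score, because every mislabeled sample of the common class is penalized and the metric carries no class weighting.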