Competitions are scored using a variety of functions, and the most common for binary classification tasks with confidence is something called log-loss, which is essentially \(\sum_{i=1}^{n} p_i\cdot\log(p_i)\), where \(p_i\) is your model's claimed confidence for test data point \(i\)'s correct label. Why does Kaggle use this scoring function? Here I'll follow Terry Tao's argument.

Ideally what we'd like is a scoring function \(f(x)\) that yields the maximum expected score precisely when the claimed confidence \(x_i\) in the correct label for \(i\) is actually what the submitter believes is the true probability (or frequency) of that outcome. This means that we want \[L(x)=p\cdot f(x) + (1-p)\cdot f(1-x)\] for fixed \(p\) to be maximized w…