The team then analysed the outcome using a combination of machine learning (ML) and conventional statistics, a method called the random-forest approach, to identify the likely winner. The random-forest approach has emerged in recent years as a powerful way to analyse large amounts of data while avoiding some of the pitfalls of other methods.
This approach is based on the decision-tree method, in which an outcome is calculated at each branch by reference to a set of training data. A single decision tree, however, is prone to overfitting, a problem in which the later stages of the branching process are distorted by the training data and give misleading results. The researchers therefore moved on to the random-forest technique, which, instead of calculating the outcome at every branch, calculates the outcome of randomly selected branches, repeating this across many trees and combining their results.
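The idea can be illustrated with a minimal, from-scratch sketch. It is not the researchers' model: the feature columns (a ranking gap and an age gap), the target values and all function names are hypothetical, chosen only to show the two ingredients of a random forest, namely training each tree on a random resample of the data with random feature choices at each split, and averaging the trees' predictions.

```python
import random
from statistics import mean


def build_tree(rows, targets, depth, n_features, rng):
    """Grow a small regression tree; each split looks at a random feature subset."""
    if depth == 0 or len(set(targets)) == 1:
        return mean(targets)  # leaf: average of the targets that reached it
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            # Sum of squared errors if each side predicts its own mean
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:
        return mean(targets)  # no usable split among the sampled features
    _, f, threshold = best
    lo = [i for i, r in enumerate(rows) if r[f] <= threshold]
    hi = [i for i, r in enumerate(rows) if r[f] > threshold]
    return (
        f,
        threshold,
        build_tree([rows[i] for i in lo], [targets[i] for i in lo], depth - 1, n_features, rng),
        build_tree([rows[i] for i in hi], [targets[i] for i in hi], depth - 1, n_features, rng),
    )


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def build_forest(rows, targets, n_trees=50, depth=3, n_features=1, seed=0):
    """Train each tree on a bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(
            build_tree([rows[i] for i in idx], [targets[i] for i in idx], depth, n_features, rng)
        )
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Made-up training data: [ranking gap, average-age gap] -> goals scored
rows = [[-10, 1.0], [5, -0.5], [20, 0.2], [-3, -1.2], [12, 2.0], [-15, 0.3]]
targets = [2.0, 1.0, 0.0, 1.0, 1.0, 3.0]

forest = build_forest(rows, targets, n_trees=25, seed=1)
prediction = predict_forest(forest, [0, 0.0])
```

Because every tree sees a slightly different version of the data, the quirks that any one tree picks up from its sample tend to cancel out in the average, which is why the combined forest is less prone to overfitting than a single deep tree.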
The technique revealed which factors matter most in determining the outcome and narrowed the field down to two teams favoured to win the World Cup. The data took into consideration economic factors such as each country’s GDP and population, FIFA’s ranking of national teams, and properties of the teams themselves, such as their average age, the number of Champions League players they have and home advantage, among others, to arrive at the two names.
According to the data, Spain is the most likely winner at the start of the tournament, with a probability of 17.8 percent. Given the structure of the tournament and the possibility of upsets, the result may change.
“Spain is slightly favored over Germany mainly due to the fact that Germany has a comparatively high chance to drop out in the round-of-sixteen,” Groll said, adding that based on the entire tournament simulation and on the most probable tournament course, “instead of the Spanish the German team would win the World Cup.”
According to the research, if Germany clears the group phase of the competition, it is more likely to face strong opposition in the 16-team knockout phase. The researchers put Germany’s chances of reaching the quarter-finals at 58 percent.
By contrast, Spain is likely to face weaker opposition in the round of 16 and has a 73 percent chance of reaching the quarter-finals. If both make the quarter-finals, they have a more or less equal chance of winning.
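The arithmetic behind this reasoning can be sketched in a few lines. The calculation below is purely illustrative and rests on a simplifying assumption that is not in the research: that the two teams have an exactly equal chance of winning once they reach the quarter-finals. The function name and the resulting estimate for Germany are hypothetical, not figures from the study.

```python
def win_probability(p_reach_quarter_final, p_win_given_quarter_final):
    """P(win) = P(reach quarter-final) * P(win | reached quarter-final)."""
    return p_reach_quarter_final * p_win_given_quarter_final


# Figures quoted in the article
p_spain_qf = 0.73
p_germany_qf = 0.58
p_spain_win = 0.178

# Implied chance of Spain winning once it is in the quarter-finals
p_win_given_qf = p_spain_win / p_spain_qf

# ASSUMPTION: Germany has exactly the same conditional chance as Spain.
# The result is an illustration only, not the study's number for Germany.
p_germany_win_estimate = win_probability(p_germany_qf, p_win_given_qf)

print(f"Spain, given quarter-final: {p_win_given_qf:.3f}")
print(f"Germany, illustrative overall estimate: {p_germany_win_estimate:.3f}")
```

Under that assumption, the whole gap between the two teams comes from the round of 16: the team with the harder early draw carries the lower overall probability even though the two are evenly matched from the quarter-finals on.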