Azby |

8h ago
[b]For those who really don't have the time:[/b]

The engine currently operates with a discrete grid of 11 xG steps (0.22 to 0.32). Below 0.25 the conversion rate is flat at 25%, above that it rises linearly to 32%.

A patch in April shifted the distribution upwards: from 32% to 71% of chances greater than or equal to 0.28 xG. Result: +20% goals per match, all competitions combined.

My own proposals, which are in line with those of Tomasm and Woods, are to raise the xG ceiling to 0.50 to reward really big chances, and to increase the frequency of chances to reduce the variance of inconsistent matches. The two together, without touching the conversion, which is well done.

_________________

Following the many recent discussions on the match engine, with the traditional carrot topic but also Tomasm's improvement project (https://www.virtuafoot.com/#forum?topic=170250) and Woods' message on the frequency of updates, I wanted to share an analysis based on the data available in the public database.

The idea is not to call into question the work of the MDJ, who has made some defensible and rather elegant game-design choices, as we shall see, but to contribute some statistically analysed elements to the debate. I've cross-checked my results with those of another manager, Misha, who has run his own regression on around 300,000 chances, and our figures converge almost perfectly, which gives me confidence in what follows. Brewen has also worked on this and dug deeper on several points.

This is a 'macro' statistical analysis: we don't go into the details of individual matches. It's an attempt at explanation, not truth.

Apologies in advance: there will be mathematical terms, as is unavoidable in this kind of discussion. Images are included to make things clearer for those less comfortable with the maths.

It's also a hell of a lot of work.

[b]For the method, we worked on around 800,000 events from the public database[/b] (chances, goals, fouls, cards, etc.), with our own time filter: intervals of less than 10 minutes between consecutive events, to avoid half-time and substitution artefacts. The hypothesis behind the filter: a team's xG rises continuously during a match (aggressiveness, fouls, invisible micro-events), and a displayed event merely updates the counter at time T. Over long intervals, the delta computed between two chances is no longer pure chance xG but xG plus latent residue. The 10-minute filter limits this contamination.

I've checked that this choice of filter doesn't bias the results: repeated at 10, 20 and 30 minutes and with no filter at all, the difference between the periods I'm going to compare stays the same to within one point. Note also that a short filter mechanically excludes more events in defensive matches (longer intervals) than in prolific ones. That is a selection bias, but it acts identically before and after the patches.
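To make the filter concrete, here's a minimal sketch in Python. The event structure and field names (`minute`, `xg_total`) are my own assumptions for illustration, not the actual export schema:

```python
# Minimal sketch of the 10-minute filter, assuming one event list per
# match. Field names (minute, xg_total) are hypothetical; the public
# DB export may name them differently.

def xg_deltas(events, max_gap_min=10):
    """xG deltas between consecutive events, keeping only pairs
    separated by less than max_gap_min minutes."""
    events = sorted(events, key=lambda e: e["minute"])
    deltas = []
    for prev, curr in zip(events, events[1:]):
        gap = curr["minute"] - prev["minute"]
        if gap < max_gap_min:  # drop half-time / substitution gaps
            deltas.append(round(curr["xg_total"] - prev["xg_total"], 2))
    return deltas

sample = [
    {"minute": 3,  "xg_total": 0.00},
    {"minute": 12, "xg_total": 0.24},  # 9 min gap  -> kept
    {"minute": 45, "xg_total": 0.60},  # 33 min gap -> dropped
    {"minute": 52, "xg_total": 0.88},  # 7 min gap  -> kept
]
print(xg_deltas(sample))  # [0.24, 0.28]
```

With no filter (a large `max_gap_min`), the 33-minute gap would contribute a delta of 0.36 that mixes real chance xG with latent residue, which is exactly the contamination the filter is meant to limit.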

[b]An important clarification:[/b] for xG values below 0.22, I cannot say that they are "real" chances in the sense of the engine. These are deltas calculated between consecutive events, which may include latent residuals (any event that "raises xG" without triggering a visible chance). On the other hand, on the main grid we'll be looking at, i.e. 0.22-0.32 xG, the figures are striking and reproducible.

We'll start with Misha's data and analysis. The engine works with a discrete grid of xG steps ranging from 0.22 to 0.32 (using steps of 0.01 xG). Here's how the conversion rate behaves as a function of the step:


Table and raw curve of goal conversion per xG step of the chance (from 0.00 to 0.32). Source: Misha's analysis of 300k chances.

This can be summarised in two lines:

Below 0.25 xG: conversion strictly flat at around 25%, regardless of the quality of the chance.
From 0.25 xG upwards: linear progression with a slope of +1 point per 0.01 step, up to 32% at the 0.32 level.
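The two-regime behaviour can be written as a tiny piecewise function. This is a reconstruction from the observed rates, not actual engine code:

```python
# Piecewise model of the conversion grid: flat 25% floor below
# 0.25 xG, then +1 point per 0.01 step, capped at 32% for 0.32 xG.
# A reconstruction from the observed rates, not engine code.

def conversion_rate(xg):
    xg = min(xg, 0.32)  # current engine ceiling
    if xg < 0.25:
        return 0.25     # flat floor: chance quality is ignored here
    steps = round((xg - 0.25) / 0.01)
    return 0.25 + 0.01 * steps

for x in (0.22, 0.25, 0.28, 0.32):
    print(f"{x:.2f} xG -> {conversion_rate(x):.0%}")
```

Note how any xG above the 0.32 ceiling is clamped back down: that clamp is the "missing mathematical space" discussed later in this post.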


Smoothed curve of goal occurrence: you can see the floor around 25%, then the progression from 0.26.

To show this even more clearly, here is the same data broken down into two separate linear regressions:


Left: regression on 0.00-0.24, where R² = 0, completely flat conversion at 25.2%. Right: regression on 0.25-0.32, where R² = 0.99, slope of +1 point per step.

The image speaks for itself: below 0.25 xG, the quality of the chance is mathematically useless. Above 0.25 xG, there is a real linear progression that rewards good chances. If we also look at off-grid values (below 0.22 xG), we find a conversion rate that fluctuates around 25%, with no clear trend. As a reminder, this is precisely the zone I refrain from analysing, because those values are probably a mixture of real mini-chances and indistinguishable latent residues.

This is probably not accidental. It's a deliberate design, which seems to want to balance determinism and chance: enough chance for each opportunity to retain a value, enough slope for the best ones to be rewarded. Mathematically, it's pretty clever. But also frustrating.

Now that that's established, we need to talk about the recent changes. For this I'll switch to my own dataset (covering matches played between 4 March and 7 April 2026). Two patches can be identified. The first, at the end of March, looks like a trial balloon; the second, on 2 April, leads to visible changes in the engine:

Night of 19/20 March: the proportion of chances + goals awarded in the so-called "premium" tiers (≥ 0.28 xG) rises from 32% to 43%.
2 April: another jump, this time from 43% to 71%!

To visualise this change, here's the distribution before and after the patches:


Distribution of around 340,000 chances BEFORE the patches and their conversion rate, 10-minute filter.


Distribution and conversion rate AFTER the patches. The low tiers (0.22-0.24) saw their volume plummet; the premium tiers (0.28-0.32) more than doubled.

This is a real collapse in absolute terms, not a simple redistribution in percentage terms: the low tiers have fallen from 1.85 opportunities + goals per match to 0.36 (-81%), the medium tiers from 1.88 to 0.38 (-80%), and the premium tiers have exploded from 2.93 to 7.22 (+147%). And this lost mass has not gone elsewhere in the form of fouls or cards: their volumes per match are stable to within 1%. The engine simply produces more premium chances per match.
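The percentage changes can be re-derived directly from the per-match volumes. The tier boundaries in the labels below are my reading of the charts, and tiny gaps versus the quoted figures (e.g. +146% vs +147% for premium) come from the inputs themselves being rounded:

```python
# Re-deriving the per-match volume shifts quoted above.
# Tier boundaries in the labels are my reading of the charts.
tiers = {
    "low (0.22-0.24)":     (1.85, 0.36),
    "medium (0.25-0.27)":  (1.88, 0.38),
    "premium (0.28-0.32)": (2.93, 7.22),
}
changes = {}
for name, (before, after) in tiers.items():
    changes[name] = (after - before) / before
    print(f"{name}: {before:.2f} -> {after:.2f} per match "
          f"({changes[name]:+.0%})")
```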

Now let's look at the matches themselves. In the league (ch=1), the overall statistics before and after the patches speak for themselves: from 2.85 goals per match before 20 March to 3.45 after 2 April, an increase of 21% in 13 days. The rate of games without a goal fell from 11.3% to 8.8%, and the proportion of games with 4 goals or more climbed from 33.6% to 46.3%.

There is one caveat, however: we've only just come out of the league break, so the post-patch sample is small, with only 272 matches in my database. For those who remain sceptical about the statistical basis, let's look at friendlies, where the volume is much more solid. In friendlies (ch=0), there are 16,040 matches before 20 March and 5,210 after 2 April. Goals per match rose from 2.93 to 3.52, an increase of 20%. The 0-0 rate fell from 10.8% to 7.7%, and the proportion of prolific matches (4+ goals) rose from 36.4% to 48.1%. The relative amplitude is almost identical to that observed in the league.

Over all matches in all competitions (21,656 before, 6,306 after), the number of goals per match rose from 2.84 to 3.40, again an increase of 20%.

To dig a little deeper, I also looked at the EIs separately. Here the effect is measurably attenuated: from 2.20 to 2.50 goals per game, i.e. +13%. But the EIs (as you can see from the export stats) are the highest-level competition, with teams on average twice as strong as in friendlies. The matches are tighter and more defensive, and that's where the boost from the patches is least felt.

Well, we'll calculate a few other things here... but I think we're pretty well there already ^^.

[b]So once we've said all that, how do we link this to the improvement proposals?[/b]

Tomasm suggested a few weeks ago doubling the xG cap to 0.54 to reduce the carrot feeling. Woods has more recently suggested going back to the old allocation probabilities and increasing the number of updates per match. Both approaches have the same objective: to make matches more readable and less frustrating. And the data I have lends credence to both.

The recent patches have indeed increased the number of goals, but by skewing the xG distribution upwards rather than tackling the underlying problems. The ceiling has barely moved, remaining at 0.32 xG for now. The frequency of chances has barely changed either: the median interval between chances dropped from 388s to 368s, i.e. -5%, and I can't tell whether that's a real effect or just noise. The carrot feeling and the inconsistency remain.

Tomasm is right about the cap. As long as the ceiling stays at 0.32, a big chance is worth 32% success at best. The engine simply doesn't have the mathematical space to reward a clear-cut chance, and that's probably the structural reason we've all howled at the MDJ during a big domination. It's also, in my opinion, what forced the hand of the recent patches: not wanting to raise the ceiling, the MDJ had to compress the whole grid upwards to generate more goals. A clever workaround, but not a permanent solution. Raising the ceiling to 0.50 would give the engine the space it needs to distinguish a clear-cut chance from a half-chance, without having to twist the distribution. The most important point for me is that freeing up the ceiling doesn't eliminate the overall randomness: a shot at 0.5 xG during a period of heavy domination is still missed half the time. You keep all the element of surprise that (in my opinion) is the joy of football.

Woods is right about variance. With a constant frequency of chances (around 4 per team per match), the variance per match is mechanically enormous: with 4 shots at 25%, you come up empty-handed around 32% of the time, whatever your level. Moving to 6-8 chances per team would reduce this variance and generate fewer matches with inconsistent results. Randomness is still present, and there will always be matches that you 'should' have won and will lose, but to a lesser extent.
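Woods' variance argument is easy to verify with a back-of-the-envelope calculation. The sketch below assumes independent chances each converted at the 25% floor, which is a simplification since real conversion varies by tier:

```python
# Probability of a scoreless match from n independent chances,
# each converted at the 25% floor (a simplifying assumption).
def blank_rate(n_chances, p=0.25):
    """Probability of scoring zero goals from n_chances attempts."""
    return (1 - p) ** n_chances

for n in (4, 6, 8):
    print(f"{n} chances: {blank_rate(n):.1%} scoreless")
# 4 -> 31.6%, 6 -> 17.8%, 8 -> 10.0%
```

So going from 4 to 6-8 chances per team cuts the blank-match rate roughly in half to two-thirds, without touching the per-chance randomness at all.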

And for me, the two tracks go together. The ceiling alone opens the door to frustrating "0.5 xG misses" if the volume remains low.

The frequency alone brings us back to the same grid compression that the patches have just produced.

Taken together, they make the engine more readable without reducing its randomness. That seems to me fairly faithful to the spirit of what the MDJ has already built, as far as we can perceive the match engine (which isn't 'rubbish', far from it).

If anyone has a few minutes to take a look at this analysis and check that we don't have any methodological biases... I'd be happy to share other figures if I can. And if other managers have data or thoughts to add, don't hesitate. It's by pooling our analyses that we'll be able to make collective progress.

This message has been translated. (FR) Original message

Galywat |

7h ago

The analysis is interesting (more so than I thought when you brought it up). However, I still have this reservation:

  • there's no analysis (and it's not your fault) of the events where xG is generated without a chance/goal --> that's the whole problem, in my opinion. The chance/goal conversion rate is nice, you learn little tricks, but it allows only limited analysis, because the game is largely made up of events (with xG) that produce no chances. (You only have to look at the number of people complaining that nothing happens in matches, and rightly so, since sometimes nothing is displayed for dozens of minutes.) Yet an event at 0.2 xG that generates no chance is no less dangerous on paper than a chance at 0.1 xG. The chance is just a display.

If there were to be an evolution, I'd tend to agree with Woods. The impact of variance is much less significant when you increase the number of events. And I hope this kind of analysis will make Aymeric want to expose the evolution of xG for each event, not just for each action, even if it means reducing the sample size of matches.



aloisio |

6h ago

Great job Azby, a future data analyst if you're not one already!

In any case, if I've understood correctly, the match engine has recently become much more punishing as soon as a premium chance is created.
That's why I made the link with LR's 260 points: as the dominant cartel, it wouldn't have been surprising if they'd benefited from this finishing 'bonus'.

As for the rest, the options are on the table, even if there may be (i) counter-measures to these options (such as making the distribution of chances on goal more dispersed) or (ii) other options: creating a malus for AFK, fixing the bugs with overly strong attacks down the flanks, etc.

Well done!

