WASP doesn't have much buzz

Over the years, various methods have been used to predict the runs a team will score in a cricket match, with projected scores displayed at various stages of the game. The newest addition is WASP (Winning and Score Predictor), developed by two Kiwis, Seamus Hogan and Scott Brooker.

The models are based on a database of all non-shortened One-Day Internationals and 20-20 games played between the top eight countries since late 2006 (slightly further back for 20-20 games). The first-innings model estimates the additional runs likely to be scored as a function of the number of balls and wickets remaining. The second-innings model estimates the probability of winning as a function of balls and wickets remaining, runs scored to date, and the target score.

The estimates are said to be obtained using a dynamic programme rather than by simply fitting a curve to the data. To illustrate, to calculate the expected additional runs when a given number of balls and wickets remain in the first innings, we could just average the additional runs scored in all matches when that situation arose. This would work fine for situations that have occurred often, such as 1 wicket down after 10 overs or 5 wickets down after 40 overs. For rare situations, such as 5 wickets down after 10 overs or 1 wicket down after 40, it would be problematic. This is partly because small sample sizes give imprecise estimates, but more importantly because those rare situations will be overpopulated with games where there was a mismatch in skills between the two teams.
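The naive averaging approach described above can be sketched in a few lines. This is only an illustration: the function name, the record layout, and the numbers are all invented here, and none of it is real match data.

```python
from collections import defaultdict


def naive_expected_runs(observations):
    """Average the additional runs scored from each (overs, wickets) state.

    observations: iterable of (overs_bowled, wickets_lost, additional_runs).
    Returns {(overs, wickets): (mean_additional_runs, sample_size)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for overs, wickets, runs in observations:
        entry = totals[(overs, wickets)]
        entry[0] += runs  # running sum of additional runs
        entry[1] += 1     # number of innings seen in this state
    return {state: (total / n, n) for state, (total, n) in totals.items()}


# Invented numbers purely for illustration; not real match data.
toy = [(10, 1, 250), (10, 1, 230), (10, 1, 240), (10, 5, 120)]
estimates = naive_expected_runs(toy)
```

In this toy sample the common state (1 wicket down after 10 overs) is averaged over three innings, while the rare state (5 down after 10 overs) rests on a single innings, which is the precision problem the paragraph describes.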

Instead, the model estimates the expected runs and the probability of a wicket falling on the next ball only. Let V(b,w) be the expected additional runs for the rest of the innings when b (legitimate) balls have been bowled and w wickets have been lost, and let r(b,w) and p(b,w) be, respectively, the estimated expected runs and the probability of a wicket on the next ball in that situation. We can then write:

V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1 - p(b,w)) V(b+1,w)

Since V(b*,w) = 0, where b* is the maximum number of legitimate deliveries allowed in the innings (300 in a 50-over game), we can solve the model backwards. This means that the estimate of V(b,w) in a rare situation depends only slightly on the estimated runs and probability of a wicket on that ball, and mostly on the values of V(b+1,w) and V(b+1,w+1), which will be determined mostly by situations with plenty of data. The second-innings model is a bit more complicated, but it uses essentially the same logic.
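The backward solution of the recursion can be sketched as follows. This is a minimal illustration, not the developers' implementation: the function name and the callables r and p are hypothetical stand-ins for the estimated per-ball quantities, and the code additionally assumes that V is zero once all ten wickets have fallen, which the description above does not state explicitly.

```python
def expected_additional_runs(r, p, max_balls=300, max_wickets=10):
    """Solve V(b, w) backwards from the end of the innings.

    r(b, w): hypothetical callable, expected runs on the next ball.
    p(b, w): hypothetical callable, probability of a wicket on the next ball.
    Returns a table V where V[b][w] is the expected additional runs.
    """
    # Boundary conditions: V is 0 when no balls remain (b == max_balls)
    # and, by assumption, when the side is all out (w == max_wickets).
    V = [[0.0] * (max_wickets + 1) for _ in range(max_balls + 1)]
    for b in range(max_balls - 1, -1, -1):   # last ball first
        for w in range(max_wickets):
            wicket = p(b, w)
            V[b][w] = (r(b, w)
                       + wicket * V[b + 1][w + 1]
                       + (1 - wicket) * V[b + 1][w])
    return V


# Toy inputs: 0.8 runs per ball and no chance of a wicket.
V = expected_additional_runs(lambda b, w: 0.8, lambda b, w: 0.0)
```

With these toy inputs V[0][0] comes to roughly 240 (0.8 runs on each of 300 balls). Because each entry is built from the two entries one ball later, a sparsely observed state inherits most of its value from its better-observed successors.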

The WASP developers claim that WASP differs from other forecasts and projections in that its predictions are not forecasts that could be used to set TAB betting odds. Rather, they are estimates of how well an average batting team would do against an average bowling team in the conditions under which the game is being played. That is, the “predictions” are more a measure of how well the teams have done to that point than forecasts of how well they will do from that point on.

As an example, imagine that Zimbabwe were playing Australia and, halfway through the second innings, had done well enough to have their noses in front. WASP might give a winning probability of 55% for Zimbabwe, but based on past performances one would still favour Australia to win the game. That prediction, however, would be using prior information about the ability of the teams, so it is not interesting as a statement about how a specific match is unfolding. Also, the winning probabilities are rounded to the nearest whole percentage, so WASP will likely show a probability of winning of either 0% or 100% before the game actually finishes, even though the result is not literally certain at that point.
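The rounding effect is easy to see with a small sketch. The function name and the probabilities below are made up for illustration, and Python's built-in round is only assumed to behave like whatever rounding WASP applies.

```python
def displayed_percentage(win_probability):
    """Round a probability in [0, 1] to a whole percentage for display."""
    return round(win_probability * 100)


# Hypothetical late-game probabilities, not taken from any real match.
almost_won = displayed_percentage(0.996)
almost_lost = displayed_percentage(0.004)
```

Here a 99.6% chance is displayed as 100 and a 0.4% chance as 0, even though neither outcome is settled.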

Another novelty is that the models include an adjustment for the ease of batting conditions. Without that adjustment, the models would overstate the advantage or disadvantage a team gains from a good or bad start respectively, since those occurrences in the data would be correlated with ground conditions that apply to both teams. WASP is said to estimate ground conditions from historical games and so control for that confounding effect in the estimated models.

All said and done, I still feel there are some flaws with WASP. Firstly, it does not take into account the nature of the batsmen and bowlers in the middle. For example, consider a match situation where Virat Kohli and MS Dhoni are batting, and India needs 40 runs from 5 overs. Given the track records of these two match winners, the match would clearly be in India’s favour in any conditions, on any ground, against any opponent (even one against which India has a poor record).

Now let us consider another match where Ravindra Jadeja and Mohammed Shami are batting, and India needs 40 runs from 10 overs against a relatively weak side, say Bangladesh. Although Bangladesh has a poor record against India, in such a circumstance WASP should ideally favour Bangladesh to win the match, even if it is played on a placid track.

Similarly, South Africa defending 16 runs off the last two overs with Dale Steyn and Morne Morkel to bowl is a very different proposition from Ishant Sharma and Ravichandran Ashwin bowling the last 5 overs with the opposition requiring 50 runs.

Secondly, WASP doesn’t take into account how teams fare in crunch matches. For example, South Africa has often snatched defeat from the jaws of victory in knockout matches against not-so-formidable sides. WASP should therefore have a provision for taking into account the significance of a match (a semi-final or a final as against a group league match) and how the team has fared in similar situations in previous matches.

Thirdly, although WASP takes into account the past batting record of a batsman, what if that batsman bats in a different position on a particular day? Chris Gayle batting in the middle order, as against opening the innings, can produce two very different outcomes. Unfortunately, WASP would average out Gayle’s performance as a batsman and, given his audacious extravagance and great record as an opener, it might overestimate West Indies’ chances in a match where he plays as a middle-order batsman.

Fourthly, the fielding dominance of a team should be taken into consideration. Against a great fielding side like South Africa on a big ground, say the MCG, a team chasing even 240 may find it difficult owing to the great pressure applied by the South Africans.

Finally, the way strike rates for bowlers and batsmen evolve over the course of a match is also not taken into account. A batsman might accelerate at the start only to slow down in the middle, or might start slowly and then explode. Similarly, a bowler’s ability to take wickets differs across the stages of a game, and this should be taken into account while making such predictions.