Data Science Projects (Cricket Analytics Starter Kit)

To me, cricket is a simple game. Keep it simple and just go out and play

Shane Warne

Cricket has a huge following in the subcontinent, with the IPL last valued at 5.3 billion USD. This game of bat and ball, largely prevalent in Commonwealth nations, is not just interesting to watch but also has a fast-growing analytical use case.

Given the discrete nature of the game and the growth of the IPL, the demand for analytics as an edge is all the rage, both for on-field performance and for ancillary services such as growing and engaging the fan base.


  • The tools rely on a D/L (Duckworth-Lewis) based index which combines strike rate and runs for batsmen.
  • The same index works in the opposite direction and combines economy and wickets for bowlers.
  • The idea is to reduce contribution/effectiveness to a single number so that players can be compared.
  • The paper and some of the resources are mentioned towards the end.
  • The tools can be used separately to explore batsmen and bowlers. The data, though, currently comes only from IPL matches of the last few years (2017 onwards, I believe) and might have a few missing values.
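As a rough illustration of the "single number" idea in the list above, the sketch below folds runs and strike rate into one batting figure, and wickets and economy into one bowling figure. The formulas, the baseline constants, the `wicket_value` weight and the function names are all illustrative assumptions; this is not the D/L based index from the paper that the tools rely on.

```python
def batting_index(runs, balls_faced, baseline_strike_rate=130.0):
    """Hypothetical index: runs scored plus runs above a baseline scoring rate."""
    # Runs a baseline batsman would have scored off the same number of balls.
    expected_runs = baseline_strike_rate * balls_faced / 100.0
    return runs + (runs - expected_runs)


def bowling_index(wickets, runs_conceded, balls_bowled,
                  baseline_economy=8.0, wicket_value=20.0):
    """Hypothetical index: wicket credit plus runs saved against a baseline economy."""
    # Runs a baseline bowler would have conceded off the same number of balls.
    expected_runs = baseline_economy * balls_bowled / 6.0
    # Opposite direction to batting: conceding fewer runs than expected is better.
    return wicket_value * wickets + (expected_runs - runs_conceded)


if __name__ == "__main__":
    print(batting_index(runs=50, balls_faced=30))
    print(bowling_index(wickets=2, runs_conceded=24, balls_bowled=24))
```

Both numbers are expressed in runs, which is what makes batsmen and bowlers roughly comparable in this toy version; the baselines would need to be estimated from real data.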

Batsmen Explorer (Tool)

Bowler Explorer (Tool)

Given that my previous startup experience was around trying to cash in on this growing niche, I wanted to recap and document my learnings about the ecosystem in general. Broadly, as mentioned above, the opportunity lies in two directions:

  • Performance Analysis 
  • Fan Engagement & Branding Services 

The fan engagement aspect largely involves the fantasy gaming sites, the IPL teams and any celebrity associations, primarily Bollywood. Because fan engagement data is private, most of this post is about performance analysis data.


Before you can play around with the data, the first question is where to get it. Despite the game's similarity to baseball, which has a whole branch of analytics called Sabermetrics, analytics for cricket is still in its very early stages.

The only open source data set available was at Cricsheet. Unfortunately, it stopped updating in July 2017. An updated version was recently released at White Ball Analytics.


There are a couple of Sabermetrics courses online which should give you an idea of, and some impetus for, getting started with cricket analytics: how to define KPIs and how to think about performance analysis in general.


There are only a few books on the subject, a couple of them by Tinniam V Ganesh, who also authored an R package for the same.

Blogs/ Websites

These are some blogs you can refer to for an idea of the work already done, the approaches that were taken and the challenges involved in the analysis.

Data & Processing

Good analytics depends first on the quality and breadth of the data available. Given the early days of the space, the only freely available data sets are maintained by an Irish and an English gentleman, which is ironic given that India is home to the IPL.

This free historical dataset is limited to a ball-by-ball catalogue of events. In my experience, though, there are a couple of paid vendors with much richer data, including sensor information. Access to a greater diversity of data should make it possible to do a broader range of analytics beyond the obvious metrics.

Paid Historical Data

Source: Agaram Infotech

FYI, the above vendor supplies data to several IPL teams, but their minimum quote of over 3K dollars is pretty steep for analytics startups. This is quite pricey from a subcontinent point of view.

Paid Streaming Data

Source: Cricket API

Streaming data, or a live feed, is used by fantasy sites to run their games and update scores. This kind of service involves hitting a specified API endpoint, which returns updated match info ball by ball.

Stack & Resources

As can be inferred from the two courses mentioned, SQL for data storage and R for basic statistical analysis are more than enough for standalone reporting.
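To make the storage-plus-reporting idea concrete, here is a minimal sketch of ball-by-ball data in a SQL table with one summary query on top. It uses Python's built-in `sqlite3` module in place of R, purely for illustration. The `deliveries` schema, its column names and the sample rows are assumptions, not the layout of any of the data sets mentioned in this post.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical ball-by-ball table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE deliveries (
               match_id INTEGER, over_no INTEGER, ball_no INTEGER,
               batsman TEXT, bowler TEXT, runs INTEGER, is_wicket INTEGER)"""
    )
    rows = [  # made-up sample deliveries
        (1, 0, 1, "Batter A", "Bowler X", 4, 0),
        (1, 0, 2, "Batter A", "Bowler X", 0, 0),
        (1, 0, 3, "Batter A", "Bowler X", 6, 0),
        (1, 0, 4, "Batter B", "Bowler X", 1, 0),
        (1, 0, 5, "Batter B", "Bowler X", 0, 1),
    ]
    conn.executemany("INSERT INTO deliveries VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def batting_summary(conn):
    """Return (batsman, runs, balls, strike_rate) rows, highest scorer first."""
    # Simplification: every row counts as a ball faced (no wides or no-balls).
    query = """
        SELECT batsman,
               SUM(runs) AS runs,
               COUNT(*) AS balls,
               ROUND(100.0 * SUM(runs) / COUNT(*), 1) AS strike_rate
        FROM deliveries
        GROUP BY batsman
        ORDER BY runs DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in batting_summary(build_demo_db()):
        print(row)
```

The same `GROUP BY` pattern extends to bowlers (economy, wickets) by grouping on the `bowler` column instead.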

The typical fantasy game has a simple platform on which you choose your 11 players. Based on the points those players earn, the top fantasy teams are deemed winners and become eligible for prizes.
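The mechanics described above amount to scoring each selected player and ranking teams by their totals. A minimal sketch follows; the scoring rules in `POINTS`, the stat names and the two-player teams are hypothetical and kept small for readability, since real platforms define their own rules and use 11 players.

```python
# Hypothetical scoring rules; real fantasy platforms define their own.
POINTS = {"run": 1, "wicket": 25, "catch": 8}


def player_points(stats):
    """Convert one player's match stats into fantasy points."""
    return sum(POINTS[event] * stats.get(event, 0) for event in POINTS)


def rank_teams(teams, match_stats):
    """Return (team_name, total_points) pairs, best team first.

    teams:       {team_name: [player_name, ...]}
    match_stats: {player_name: {"run": int, "wicket": int, "catch": int}}
    """
    totals = {
        name: sum(player_points(match_stats.get(player, {})) for player in players)
        for name, players in teams.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    match_stats = {
        "P1": {"run": 40},
        "P2": {"wicket": 2, "catch": 1},
        "P3": {"run": 10},
    }
    teams = {"Team A": ["P1", "P2"], "Team B": ["P2", "P3"]}
    print(rank_teams(teams, match_stats))
```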

Building any sophisticated or offbeat fantasy game or analytics over the streaming data had several challenges:

  • The ball update typically had a delay of 5 seconds, which in rare cases would extend to 15 seconds or more. This delay was incredibly volatile and made building a live analytical engine difficult.
  • Data quality in streaming services has its own challenges, including frequent errors that would only be corrected later.
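One way to quantify the delay problem described above is to log, for each ball, when it was bowled and when the update arrived, then summarise the gaps. The sketch below assumes such timestamp pairs are available; the function name, the field layout and the 15-second alert threshold are illustrative assumptions and not part of any provider's API.

```python
from statistics import mean, pstdev


def delay_summary(events, alert_threshold=15.0):
    """Summarise feed latency.

    events: iterable of (bowled_at, received_at) timestamps in seconds.
    """
    delays = [received_at - bowled_at for bowled_at, received_at in events]
    if not delays:
        raise ValueError("no events to summarise")
    return {
        "mean": mean(delays),
        "stdev": pstdev(delays),  # spread of the delay, i.e. how volatile it is
        "max": max(delays),
        "late": sum(1 for d in delays if d >= alert_threshold),
    }


if __name__ == "__main__":
    sample = [(0.0, 5.0), (30.0, 35.0), (60.0, 76.0)]  # made-up timestamps
    print(delay_summary(sample))
```

A live engine could use the `stdev` and `late` figures to decide how long to buffer updates before acting on them.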

There are no known service providers for fan-based engagement numbers.

Use Cases & Stakeholders

The entire idea behind carrying out this analysis is to be able to use them for some purpose. The numbers crunched can be consumed by :

  • Fans: Analytical reports can be a source of engaging news and an alternative medium for fans to ponder over. This is something along the lines of FiveThirtyEight.
  • League Teams: IPL franchises and other T20 leagues are ripe customers for such analytics. Though analytics is already prevalent there, it is largely driven by video analysts who are or were ex-cricketers with no statistical background, resulting in the same old domain knowledge being circulated.
  • Media/ Agencies: Fan engagement numbers and even player performance forecasts etc can be incredibly useful for advertising agencies and celebrity management firms. They can better price their associated players. Firms looking to advertise can make a more scientific assessment of their marketing spends.

Landscape & Opportunities

Despite the growth in tech in recent times, the majority of stakeholders who run the show (BCCI, IPL teams) have been very slow to adopt it and unwilling to bet on newer possibilities. It has to be mentioned, though, that both HotStar and Dream11 have made some serious strategic moves backed by sound technical expertise.

Fantasy and streaming services are the two primary fan endpoints, with Dream11 and HotStar going head to head in terms of their future goals.

  • You have Cricbuzz & Cricinfo dominating the content landscape. They have the largest volume of visits but suffer from poor engagement time and the fact that their offering has no direct monetisation.
  • Dream11 has the numbers in terms of a paying, very fast-growing user base, but poor engagement numbers given the static nature of its game. Its next logical step is to go for some sort of streaming.
  • HotStar has the best of both worlds. As the official streaming partner it already has high engagement numbers, and given its recent foray into fantasy, it might eat into Dream11's pie.

Given the interesting dynamics, it looks like an open fight between Dream11 and HotStar with both Cricbuzz and Cricinfo looking like potential acquisitions.

I built an alternative means to analyse player performance based on this paper: The Best Batsmen And Bowlers in One Day Cricket

