My last few posts have been based around producing a scouting report from a machine learning algorithm I have put together.
The algorithm surfaces the biggest factors causing teams to win or lose games. I take those factors, find data-based evidence to support the algorithm's findings, and then produce a scouting report much like a coaching staff or scouting department would for their team.
These machine learning scouts not only provide the biggest factors for winning and losing, but also give teams specific and tangible targets to hit, e.g. "We are 7-1 this season when we get to the free throw line 18 times a game".
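A record like "7-1 when we get to the line 18 times" is just a conditional win-loss tally over the game log. Here is a minimal sketch of that calculation; the game data below is invented for illustration, not real Heat box scores:

```python
# Hypothetical game log: free-throw attempts (fta) and result per game.
games = [
    {"fta": 20, "won": True},  {"fta": 18, "won": True},
    {"fta": 22, "won": True},  {"fta": 14, "won": False},
    {"fta": 19, "won": True},  {"fta": 12, "won": False},
    {"fta": 25, "won": True},  {"fta": 16, "won": False},
]

def record_when(games, stat, threshold):
    """Win-loss record in games where a stat meets a target threshold."""
    hit = [g for g in games if g[stat] >= threshold]
    wins = sum(g["won"] for g in hit)
    return wins, len(hit) - wins

wins, losses = record_when(games, "fta", 18)
print(f"{wins}-{losses} when attempting 18+ free throws")  # 5-0 on this toy data
```

The same helper works for any counting-stat target by swapping the stat name and threshold.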
Coaches can use these targets to form offensive/defensive schemes or use certain lineups to try and hit these targets.
These targets can also be used to incentivize players before and during games: coaches can track them and continually harp on them throughout the game, putting them up on the whiteboard pre-game and at half time to show how the team is tracking.
To give you an idea of how long these machine learning scouts are taking to put together:
Gather data, format it, run it through the algorithm to get the winning/losing factors = 10 minutes
Find data-backed evidence to support the factors, set targets, write the scout = 45 minutes
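The post doesn't show the algorithm itself, but the "find the biggest factors" step can be sketched as a regression from per-game stats to point margin, ranking factors by coefficient size. This is a noise-free toy example with made-up stat columns, not the author's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 10 games of box-score stats (the real scout
# uses actual game data); columns are per-game team stats.
factors = ["FTA", "3P%", "TOV", "OREB"]
X = rng.normal(size=(10, 4))

# Fabricated point margins driven entirely by FTA and TOV, for illustration.
margin = 3.0 * X[:, 0] - 2.0 * X[:, 2]

# Fit an ordinary least-squares model: margin ~ X @ coef
coef, *_ = np.linalg.lstsq(X, margin, rcond=None)

# Rank factors by absolute coefficient size -- a rough proxy for
# "the biggest factors causing teams to win or lose".
ranked = sorted(zip(factors, coef), key=lambda t: -abs(t[1]))
for name, c in ranked:
    print(f"{name}: {c:+.2f}")
```

On this toy data the fit recovers FTA and TOV as the dominant factors, which is the kind of output the scout-writing step would then back up with evidence and turn into targets.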
All up, the machine learning based scout is put together in under an hour. It would be interesting to compare that against the traditional process scouting departments and coaches go through, especially on back-to-back games where time is of the essence.
To stress test the algorithm a little further, I decided to run it and produce a machine learning scout for the Miami Heat based on their first 10 games of the season. I have then tracked the Heat against the targets the scout set to see how they fared over the next 5 games.
When training the algorithm on the Heat's first 10 games, it produced a mean squared error of 1.84.
For the non-statistics nerds: the algorithm takes the actual statistics from the Heat's games and, without knowing the outcome, comes within roughly 1.84 points of predicting the actual score. I'm pretty surprised with the performance to be honest, as it only has a small sample size of 10 games to work with.
Running the algorithm for other teams after 10 games, it usually comes within 5-10 points of the actual score, which is still pretty accurate but not at the accuracy I am seeing with Miami.
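For readers who want the mechanics: mean squared error is computed by squaring each prediction error and averaging, so its units are squared points; taking the square root (RMSE) puts the figure back in points. A quick sketch with invented scores:

```python
# Invented actual and predicted scores, purely to show the calculation.
actual    = [104, 98, 112, 101, 95]
predicted = [105.2, 97.1, 110.8, 102.3, 94.0]

# Mean squared error: average of squared prediction errors.
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Root mean squared error: back in "points" units.
rmse = mse ** 0.5
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f} points")
```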
Here is the Miami Heat Scout I put together after the first 10 games:
There are 4 targets set for the Heat:
As Miami were playing a rare back-to-back series against Washington, I have also put together a machine learning scout of the Washington Wizards and set the Heat some targets.
Targets for the Heat:
So we have all our targets, but these can't just magically happen for a team. Coaches and staff will need to put things in place to help make them happen and communicate them to the players.
I have provided some data to help:
Take a target such as getting to the free throw line 18 times a game: again, you can't just magically click your fingers and get to the line more often. The Heat staff need to look deeper at why they aren't getting to the line and put things in place to improve it.
So after another 5 games, how did the Heat fare?
There is a pretty clear trend happening. The factors the algorithm found, and the subsequent targets I set, are correlating with the outcome of the game.
When the Heat have hit 2 or fewer targets they have lost the game. When they have hit the majority of the targets they have won.
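The tally behind that observation is simple to automate. The four real targets are in the scout above; the thresholds and game stats below are invented stand-ins purely to show the shape of the check:

```python
# Hypothetical targets (NOT the real scout's targets).
targets = {"fta": 18, "ast": 25, "tov_max": 14, "reb": 45}

def targets_hit(game):
    """Count how many of the four targets a game's stat line hit."""
    hits = 0
    hits += game["fta"] >= targets["fta"]
    hits += game["ast"] >= targets["ast"]
    hits += game["tov"] <= targets["tov_max"]  # turnovers: lower is better
    hits += game["reb"] >= targets["reb"]
    return hits

# Invented per-game stat lines, not real Heat data.
games = [
    {"fta": 21, "ast": 27, "tov": 12, "reb": 48, "won": True},
    {"fta": 15, "ast": 22, "tov": 17, "reb": 41, "won": False},
    {"fta": 19, "ast": 26, "tov": 13, "reb": 43, "won": True},
]

for g in games:
    print(f"targets hit: {targets_hit(g)}/4, won: {g['won']}")
```

Run over a real game log, this kind of tally is what surfaces the "2 or fewer targets hit = loss" pattern.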
The results are looking very favorable and are showing that the algorithm is pointing to the right factors that are causing teams to win and lose.
For my next post, I will re-train the algorithm with the Heat's last 5 games added, see if the scout changes, and monitor how the Heat fare in the next batch of games.