Football Betting Prediction and PythonAnywhere Rollout

Dec 4, 2021

The use case

On a rainy weekend with no golf, I decided to start a project: train an estimator able to predict the outcome of a game based on the odds from several websites.

Storyboard

The user gives a PL (Premier League) game and odds, taken from a betting platform or from their own forecast. The model, which was trained on more than 4000 games with odds from almost 5 betting websites for each game, returns the predicted outcome (1/N/2) and its probability.

Source: Premier League games since 2010.

The modelling part

I created a model based on data collected over past years of betting and results. The dataset contains the teams, the odds from different platforms and the outcome of each game.

Example:

Tottenham — Chelsea (1/N/2) 1.25, 2.35, 1.95 — Chelsea won (2)

The database contains 20K entries like this. I trained a random forest classifier with a multiclass outcome.

The estimator returns a probability for each outcome and the predicted class.

Returns:

[0.3, 0.1, 0.6] and outcome = 2; this means the away team will win with a 60% chance.
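
To make the reading of this output concrete, here is a minimal sketch that turns a probability vector into a 1/N/2 label. The `LABELS` tuple and the `read_prediction` helper are names invented for this illustration; the class order (home win, draw, away win) is assumed from the example above.

```python
# Hypothetical helper: map a probability vector to a 1/N/2 prediction.
LABELS = ("1", "N", "2")  # assumed order: home win, draw, away win


def read_prediction(probabilities):
    """Return the most likely label and its probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best], probabilities[best]


label, confidence = read_prediction([0.3, 0.1, 0.6])
print(f"Predicted outcome: {label} ({confidence:.0%})")
```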

Model

I saved the model using pickle and deployed it on PythonAnywhere.
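
As a rough sketch of the save-and-reload step, the snippet below pickles a small fitted model to a file and loads it back. The toy data and the `model.sav` file name are invented for the example; only the `pickle.dump` / `pickle.load` pattern is the point.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier

# Toy data, invented for the example: three odds per game and a 0/1/2 result.
X = [[1.5, 4.0, 6.0], [3.0, 3.2, 2.4], [2.6, 3.1, 2.8], [5.5, 4.2, 1.6]]
y = [0, 2, 1, 2]

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Save the fitted model to disk, then load it back as a web app would.
path = os.path.join(tempfile.mkdtemp(), "model.sav")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model gives the same predictions as the original.
print(restored.predict(X))
```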

The result is available here: http://resquator.pythonanywhere.com/

Code for the model.

For this model, I trained a multiclass model:

clf = OneVsRestClassifier(RandomForestClassifier(random_state=0))

based on a RandomForestClassifier estimator. This technique allows me to use the predict_proba method of the estimator to get the percentage for each outcome (1/N/2).
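
A small illustration of what the wrapped estimator returns, using synthetic data from `make_classification` as a stand-in for the real odds dataset (the sample sizes and `n_estimators=20` are arbitrary choices for the sketch):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class data standing in for the real odds dataset.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=20, random_state=0))
clf.fit(X, y)

# One row per sample, one column per class (here 0, 1, 2).
# For single-label problems the rows are normalised to sum to 1.
proba = clf.predict_proba(X[:5])
print(proba.shape)
print(np.round(proba, 2))
```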

Notebook to train the model

from sklearn.datasets import make_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.metrics import f1_score
from sklearn.preprocessing import OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.impute import SimpleImputer

I imported three multiclass classifiers but in the end worked only with OneVsRest for this model.

from sklearn.multiclass import OutputCodeClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn.multiclass import OneVsRestClassifier
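
As a side note, a quick check of how two of these wrappers differ on a 3-class problem. This is only a sketch on synthetic data, not a statement of why OneVsRest was chosen here: it counts the binary models each wrapper fits and checks which one exposes `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

base = RandomForestClassifier(n_estimators=10, random_state=0)
ovr = OneVsRestClassifier(base).fit(X, y)
ovo = OneVsOneClassifier(base).fit(X, y)

# One-vs-rest fits one binary model per class; one-vs-one fits one per pair.
print(len(ovr.estimators_), len(ovo.estimators_))

# Of these two, only the one-vs-rest wrapper exposes class probabilities.
print(hasattr(ovr, "predict_proba"), hasattr(ovo, "predict_proba"))
```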

I usually work with pyforest, which is a cool library. The goal is that, with one line of code, many useful libraries are available in the notebook.

import numpy as np
import pyforest

A very important point for model deployment is to match library versions. On PythonAnywhere (at the time of this post), the sklearn library is at version 0.24.1. I used the following command to check the version on my personal MacBook, where I host my model. Luckily, I work with the same version. If not, there are two options: change the version on my local MacBook to match the one on PythonAnywhere, or create a virtual environment on the hosting website.
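
The same check can be scripted. The sketch below is one possible way to do it with the standard library; `installed_version` and `versions_match` are helper names invented for this example, and the hosted version is simply the 0.24.1 quoted above.

```python
from importlib import metadata

HOSTED_VERSION = "0.24.1"  # scikit-learn version quoted above for PythonAnywhere


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(local, hosted=HOSTED_VERSION):
    """True only when the local version string equals the hosted one."""
    return local == hosted


local = installed_version("scikit-learn")
print(f"local={local}, hosted={HOSTED_VERSION}, match={versions_match(local)}")
```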

import sklearn
sklearn.__version__

I will build a final model in a pipeline, with an OrdinalEncoder for the team names.

# build a final model
df = pd.read_csv('bet.csv')
df.shape
df = df.sample(10000)
df['Target'] = df['Result'].map({'H': 0, 'D': 1, 'A': 2})
df = df.drop(['TeamH', 'TeamA', 'Result'], axis=1)
X = df.drop('Target', axis=1)
y = df.Target
categorical_features = [0, 1]
numeric_features = [2, 3, 4]

I defined two structures, one for the categorical features (team names) and one for the numerical features (odds). With those two structures, I can set up a pipeline with preprocessing steps and a classifier.

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OrdinalEncoder())])
# classifier
clf = OneVsRestClassifier(RandomForestClassifier(random_state=0))
# preprocessor steps
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)])

I separated my dataset into a training and test set and finally fit the pipeline (pipe) which contains the preprocessor steps and the classifier based on a RandomForestClassifier.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.pipeline import Pipeline
pipe = Pipeline([('preprocessor', preprocessor), ('classifier', clf)])
pipe.fit(X_train, y_train)
y_pred_proba = pipe.predict_proba(X_test)
y_pred_proba
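
For readers without the bet.csv file, here is a self-contained sketch of the same kind of pipeline on a handful of invented rows. The column names (`home`, `away`, `odd_1`, `odd_N`, `odd_2`) and all values are assumptions made for the illustration and do not reflect the real dataset schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Invented rows; the column names are assumptions, not the real bet.csv schema.
df = pd.DataFrame({
    "home": ["Arsenal", "Chelsea", "Everton", "Arsenal", "Chelsea", "Everton"],
    "away": ["Chelsea", "Everton", "Arsenal", "Everton", "Arsenal", "Chelsea"],
    "odd_1": [2.1, 1.6, 3.4, 1.5, 2.5, 4.0],
    "odd_N": [3.3, 3.9, 3.3, 4.2, 3.2, 3.6],
    "odd_2": [3.5, 5.5, 2.2, 6.5, 2.9, 1.9],
    "Result": ["H", "H", "A", "D", "A", "A"],
})
y = df["Result"].map({"H": 0, "D": 1, "A": 2})
X = df.drop("Result", axis=1)

numeric = Pipeline([("imputer", SimpleImputer(strategy="median")),
                    ("scaler", StandardScaler())])
categorical = Pipeline([
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encoder", OrdinalEncoder())])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric, ["odd_1", "odd_N", "odd_2"]),
    ("cat", categorical, ["home", "away"]),
])
pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", OneVsRestClassifier(
        RandomForestClassifier(n_estimators=20, random_state=0))),
])
pipe.fit(X, y)

# Probabilities for the first game, in the order home win, draw, away win.
proba = pipe.predict_proba(X.iloc[[0]])
print(proba.round(2))
```

One thing to keep in mind with this design: with its default settings, OrdinalEncoder raises an error when it meets a team name it did not see during fit.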

PythonAnywhere

It's quite easy to deploy it on PythonAnywhere, but on a free account you must be aware of the following:

the size of the files (you have only 512 MB free)
the flask_app.py file is quite complicated to fill in
the HTML files must be deployed in the templates folder

I copied and pasted a very simple Flask app.py file. It was my choice to have only a predictorform.html file and not to use a result.html file, but to display the result of the prediction directly by wrapping HTML in my Flask app.py.

from flask import Flask
import flask
import pickle
import sklearn
# Import the os module
import os
import numpy as np

app = Flask(__name__)

#@app.route('/')
@app.route('/', methods=['GET', 'POST'])
def hello_world():

The first part of the app.py file deals with the “GET” method, which is called when the user opens the page. It simply renders the input form.

Screenshot of the input form found on resquator.pythonanywhere.com

    if flask.request.method == 'GET':
        # Get the current working directory
        cwd = os.getcwd()
        print(f'CURRENT FOLDER {cwd}')
        # Render the input form.
        return(flask.render_template('predictorform.html'))

Then, when the user clicks on submit, the “POST” part of the method is called to collect the input features (2 teams and 3 odds), load the model (which has been uploaded to PythonAnywhere), call its predict method and finally format the outcome in an HTML variable to be rendered.

    if flask.request.method == 'POST':
        with open('/home/resquator/mysite/finalized_odds.sav', 'rb') as f:
            model = pickle.load(f)
        print('MODEL LOADED successfully')
        hometeam = flask.request.form['hometeam']
        awayteam = flask.request.form['awayteam']
        odd_1 = np.float64(flask.request.form['1'])
        odd_N = np.float64(flask.request.form['N'])
        # np.float was removed from recent NumPy releases; use np.float64
        odd_2 = np.float64(flask.request.form['2'])
        v = [hometeam, awayteam, odd_1, odd_N, odd_2]
        result = model.predict_proba([v])
        # start an html variable for printing the output
        html = '<html><body><h2>Pronostic for your request</h2>'
        html = html + f'<p>You request a pronostic validation for {hometeam} Vs. {awayteam}</p>'
        html = html + f'Odds given was {odd_1} {hometeam} to win, {odd_N} deuce, {odd_2} {awayteam} to win<br><hr>'
        html = html + f'(1) is {np.round(result[0][0],2)*100}%<br>'
        html = html + f'(N) is {np.round(result[0][1],2)*100}%<br>'
        html = html + f'(2) is {np.round(result[0][2],2)*100}%<br>'
        result = model.predict([v])
        html = html + '<hr><h3>'
        if result[0] == 0:
            html = html + f'Pronostic is <b>{hometeam}</b> to <b>win</b>'
        if result[0] == 1:
            html = html + f'Pronostic is <b>{hometeam} Vs. {awayteam}</b> will <b>share</b>'
        if result[0] == 2:
            html = html + f'Pronostic is <b>{awayteam}</b> to <b>win</b>'
        html = html + '</h3>'
        return f'Hello from Resquator Predictions!<br>{html}'
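
To try the GET/POST pattern without the pickled model or a templates folder, here is a stripped-down sketch. It is not the deployed app: `fake_predict_proba` is a stand-in that returns fixed, made-up probabilities, the form is inlined, the odds in the test request are invented, and the team names are escaped before being echoed back into the page.

```python
from flask import Flask, render_template_string, request
from markupsafe import escape

app = Flask(__name__)

# Inline form so the sketch runs without a templates folder.
FORM = """
<form method="post">
  <input name="hometeam"> <input name="awayteam">
  <input name="1"> <input name="N"> <input name="2">
  <input type="submit">
</form>
"""


def fake_predict_proba(row):
    """Stand-in for model.predict_proba; returns fixed, made-up probabilities."""
    return [[0.29, 0.27, 0.44]]


@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "GET":
        # First visit: show the input form.
        return render_template_string(FORM)
    # Form submitted: read the five inputs and build the feature row.
    form = request.form
    row = [form["hometeam"], form["awayteam"],
           float(form["1"]), float(form["N"]), float(form["2"])]
    p_home, p_draw, p_away = fake_predict_proba(row)[0]
    # Escape user-supplied text before putting it into the HTML response.
    return (f"{escape(row[0])} vs. {escape(row[1])}: "
            f"(1) {p_home:.0%}, (N) {p_draw:.0%}, (2) {p_away:.0%}")


# Exercise both branches without starting a server.
client = app.test_client()
get_response = client.get("/")
post_response = client.post("/", data={
    "hometeam": "Aston Villa", "awayteam": "Tottenham",
    "1": "2.9", "N": "3.4", "2": "2.4"})
print(get_response.status_code)
print(post_response.get_data(as_text=True))
```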

The rendered prediction first recaps the input features: the game (Aston Villa vs. Tottenham) and the odds (1/N/2).

Then the percentage for each outcome is displayed. In our example, Aston Villa has a 29% chance to win at home, Tottenham has a 44% chance to win away, and in 27% of cases the teams will draw.
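
A small formatting detail: multiplying a rounded float by 100 can leave floating-point noise in the output (for example, `0.29 * 100` evaluates to `28.999999999999996` in Python). One possible alternative, sketched here with the percentages from the example above, is to let the format specification do the conversion:

```python
# Probabilities taken from the Aston Villa vs. Tottenham example above.
probabilities = {"1": 0.29, "N": 0.27, "2": 0.44}

lines = []
for label, p in probabilities.items():
    # The ".0%" format multiplies by 100 and rounds in a single step.
    lines.append(f"({label}) is {p:.0%}")

print("\n".join(lines))
```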

Conclusion

I tried to mix a multiclass classification problem with cloud deployment. I worked with an sklearn multiclass meta-estimator to wrap a RandomForestClassifier for 3 classes (1 target value).

PythonAnywhere is a simple and free cloud platform. It is quite easy to use, but the user must have some HTML skills to enjoy the service. My takeaways:

it's free for 1 model and 512 MB of storage (very good for practice)
be aware of the library versions on PythonAnywhere and in your local environment
take time to understand the Flask library and its possibilities
I went back to some HTML (I used to do a lot) to decorate the page

All of that, of course, inspired by watching football on TV

The betting website gives Newcastle as the winner with odds of 2.35 and very poor chances for Burnley to draw or win the game. According to the model, Newcastle has a 46% chance to win the game.