10 minutes, ML model to predict Euromillions Jackpot

Daryl Felix
4 min read · Jul 15, 2022

Sometimes you just want to play with some data and write a little Python without managing a full Machine Learning project. For those who live in Europe, today (15 July 2022) there is a huge lottery draw with a jackpot of more than 200 million euros.

Let’s build a 10-minute Machine Learning notebook to predict a good combination!

Photo by Alejandro Garay on Unsplash

Minute 01' Dataset

Historical data is easy to find on various websites. I downloaded the file from this website: https://www.loterieplus.com/euromillions/services/telechargement-resultat.php

Minute 02' Create a Notebook and import Libraries

I work on a personal MacBook with Visual Studio Code, so it is very easy to start a new project and create a notebook.

import pyforest

Working with pyforest is easy: with this single line of code, most of the usual libraries, such as Pandas and NumPy, become available under their familiar aliases (pd, np) and are imported lazily the first time they are used.
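If you would rather not depend on pyforest, the explicit imports below are a minimal equivalent for this notebook, assuming Pandas and NumPy are already installed:

```python
# Explicit versions of the two aliases this notebook relies on.
import numpy as np
import pandas as pd

# Quick sanity check that both libraries are importable.
print(pd.__version__, np.__version__)
```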

Minute 03' Load file and check data

loro = pd.read_csv('data/euromillions.csv', delimiter=';')
loro.shape

Just read the file and display the size. In this case I got 1546 observations with 48 features.

loro.head()
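A small sketch of the same loading step on an in-memory file, in case you want to try the delimiter and the checks without downloading anything. The two rows are made up for illustration and are not real draw results; the real file has many more columns.

```python
import io

import pandas as pd

# Made-up rows for illustration only; not real draw results.
sample = io.StringIO(
    "DATE;N1;N2;N3;N4;N5;E1;E2\n"
    "2022-01-07;1;2;3;4;5;1;2\n"
    "2022-01-04;6;7;8;9;10;3;4\n"
)

# The file is semicolon-separated, hence delimiter=';'.
demo = pd.read_csv(sample, delimiter=";")

print(demo.shape)              # (rows, columns)
print(demo.dtypes)             # which columns were parsed as numbers
print(demo.isna().sum().sum()) # total count of missing values
```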

Minutes 04' to 06' Look at the data and select features

I decided to work with the past numbers (ball numbers) only. In a nutshell, I want to check whether there is a relation between past lottery numbers and the coming ones, using the shift() method of Pandas!

loro['DATE']=pd.to_datetime(loro['DATE'])
cols = ['N1','N2','N3','N4','N5','E1','E2']
loro.set_index('DATE',inplace=True)
loto = loro[cols].copy()
loto.head()

Minute 07' Create features and split dataset

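# Pair every draw with the row below it; duplicate names get _x (current) and _y (shifted) suffixes
loto = loto.merge(loto.shift(-1), left_index=True, right_index=True)
features = [f for f in loto.columns.tolist() if '_y' in f]
# bet is my game of the day, the one to predict and play!
# A boolean mask keeps bet a DataFrame and also works on recent pandas versions
bet = loto[loto.index == '2022-07-15']
# Everything up to the day before is used for training; dropna() returns a new frame
train = loto[loto.index <= '2022-07-14'].dropna()
train.head()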

I kept the most recent row as my unseen data (the game of the day). Then I created the dataset that will be split into train and test sets for the modeling process.

Look at the .shift() method, which says “… for the output on this day, the past lottery numbers are …”. With a period of -1, each row is paired with the row just below it in the file, which is the previous draw when the file lists the newest draw first, so I only look at the impact of the last game on the current one.

_x columns are the target and _y columns are the features :-)
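To see where the _x and _y suffixes come from, here is a toy sketch with a single column. The values are made up, and the newest-first row order is an assumption about how the file is sorted.

```python
import pandas as pd

# Toy frame with made-up values, newest draw first (assumed file order).
toy = pd.DataFrame(
    {"N1": [30, 20, 10]},
    index=pd.to_datetime(["2022-01-11", "2022-01-07", "2022-01-04"]),
)

# shift(-1) moves every value one row up, so each date sees the row below it.
# Merging on the index makes pandas rename the clashing columns:
# N1_x is the original value, N1_y is the shifted one.
paired = toy.merge(toy.shift(-1), left_index=True, right_index=True)
print(paired)
```

The oldest row has nothing below it, so its N1_y is NaN; that is the row dropna() removes from the training set.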

target = ['N1_x','N2_x','N3_x','N4_x','N5_x','E1_x','E2_x']
y = train[target]
X = train[features]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Minute 08' Modeling

I decided to work with a multi-output linear regression model to predict 5 numbers and 2 stars.

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
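LinearRegression accepts a target with several columns and fits an ordinary least-squares model for all of them in one call. The sketch below shows the same idea with NumPy only, on synthetic data that has nothing to do with the lottery file; X_demo, Y_demo and true_W are names invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 3 features, 2 targets, no noise, intercept of 4.
X_demo = rng.normal(size=(200, 3))
true_W = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y_demo = X_demo @ true_W + 4.0

# Append a column of ones so the last coefficient row is the intercept.
A = np.hstack([X_demo, np.ones((len(X_demo), 1))])

# One least-squares solve handles every target column at once.
coef, *_ = np.linalg.lstsq(A, Y_demo, rcond=None)
W_hat, b_hat = coef[:-1], coef[-1]

print(np.round(W_hat, 3))  # one column of weights per target
print(np.round(b_hat, 3))  # one intercept per target
```

Because the synthetic targets really are linear in the features, the weights are recovered; real lottery draws give the model no such structure to find.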

Minute 09' Score the model

model.score(X_test, y_test)

What a score, anyway! I promised a 10-minute ML project, so let’s make a prediction.
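For a regressor, score() returns the coefficient of determination R²; with several targets, scikit-learn averages it over the target columns (uniformly, as far as I know, in recent versions). A hand-rolled sketch of that computation, where r2_uniform is a hypothetical helper and not a library function:

```python
import numpy as np

def r2_uniform(y_true, y_pred):
    """R^2 per target column, then a plain average (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)               # residual error
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # spread around the mean
    return float(np.mean(1.0 - ss_res / ss_tot))

# Made-up targets with two columns.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Always predicting the column means is the baseline R^2 compares against.
mean_pred = np.tile(y_true.mean(axis=0), (3, 1))

print(r2_uniform(y_true, y_true))     # perfect predictions
print(r2_uniform(y_true, mean_pred))  # baseline predictions
```

Perfect predictions give 1, the mean baseline gives 0, and a model that does worse than the baseline goes negative. That is the scale on which to read the score above.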

Minute 10' Predict & Play !

yhat = model.predict(bet[features])
yhat.astype(int)

So, the numbers to play are 8, 16, 26, 35, 42 and the stars are 3 & 7. Let’s bet on this combination … and pray :-)
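Note that astype(int) simply drops the decimals, and nothing stops a regression from returning the same value twice or a value outside the grid. A possible clean-up step, sketched with two hypothetical helpers and made-up raw values, assuming the current Euromillions grid of 5 numbers from 1 to 50 and 2 stars from 1 to 12:

```python
import numpy as np

def _dedupe(values, upper):
    """Bump repeated values to the next free one, wrapping around at upper."""
    picked = []
    for v in values:
        v = int(v)
        while v in picked:
            v = v + 1 if v < upper else 1
        picked.append(v)
    return sorted(picked)

def to_ticket(pred):
    """Turn 7 raw regression outputs into 5 numbers and 2 stars (hypothetical helper)."""
    pred = np.asarray(pred, dtype=float).ravel()
    # Round to the nearest integer, then force the values into the valid range.
    numbers = np.clip(np.rint(pred[:5]), 1, 50).astype(int)
    stars = np.clip(np.rint(pred[5:7]), 1, 12).astype(int)
    return _dedupe(numbers, 50), _dedupe(stars, 12)

# Made-up raw outputs, not the model's actual prediction.
numbers, stars = to_ticket([8.7, 16.2, 26.9, 35.4, 42.1, 3.6, 3.9])
print(numbers, stars)
```

Rounding instead of truncating is a design choice, not something the original notebook does; it can change the ticket by one on each ball.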

Summary

In this article, I showed:

  • how fun it can be to just play with a dataset: in 10 minutes I collected the file from a website, loaded it, looked at the data and chose my features for the modeling process; then I trained a model, scored it and finally made a prediction.
  • how to use a multi-output regression model; there is plenty of material on the subject, and this article is inspired by: https://machinelearningmastery.com/multi-output-regression-models-with-python/
  • how a simple decision to “shift” my dataset is enough to create features

Daryl Felix

Passionate programmer with more than 30 years of experience, from the first Cobol programs to Python.