Motivation
Gambling is very popular in the Republic of Ireland. Whether online or in person, more and more people are joining gambling communities formed all over the island of Ireland. The majority of these communities bet on horse racing and other sports, but a significant number of people are dedicated to dog racing. This is a multimillion-euro industry that operates both online and face to face.
Objective
Many websites and newspapers publish predictions for these races, but there is no tool that provides a mathematical analysis of them.
For my Data Mining Project I will use a database collected from www.greyhound-data.com. I will then use this data in RapidMiner to generate a random race sample and, finally, predict the winner of the race using the same tool.
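The idea of drawing a random race sample can be illustrated outside RapidMiner with a minimal Python sketch. It is only an illustration, not the process used in the project: the dog names are placeholders, the function name sample_race is hypothetical, and the field size of six runners is an assumption.

```python
import random

# Placeholder names standing in for the 100 dogs in the dataset (assumption).
dogs = [f"Dog {i}" for i in range(1, 101)]


def sample_race(entries, runners=6, seed=None):
    """Draw a random race line-up in which no dog appears twice."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(entries, runners)


race = sample_race(dogs, runners=6, seed=42)
print(race)
```

Sampling without replacement matters here, because the same dog cannot run twice in one race.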
Database
The collected database comprises 100 examples with 11 dimensions:
1. Place – which represents the national rank
2. Name – IE/IE represents the land of standing/land of
3. Land of Birth
4. Land of Standing
5. Year of birth
6. Sex – male or female
7. Sire – father’s name
8. Dam – mother’s name (the last two dimensions are considered important in gambling)
9. Races – the number of races for 2014
10. Points – how many points each dog has accumulated in 2014
11. Avg Dist – the average distance of races.
All the details are based on 2014 statistics collected from the website mentioned above. On top of these dimensions I manually added three more:
1. Weight – in Kg
2. Owner
3. Colour
The last three have missing data, which makes the dataset noisy, but I will try to find the best way to recover the missing data.
After importing the dataset from an Excel file into the data mining tool, I first analysed the data and then separated the clean data from the dirty data (no_missing_attributes function). As a result, only 29 examples were complete, while 71 had missing values (noisy).
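The same separation of complete and incomplete examples can be sketched in plain Python. This is an illustration under stated assumptions, not the RapidMiner process itself: the rows, owner names and values are invented, missing values are represented by None, and split_by_completeness is a hypothetical helper name.

```python
# Invented example rows; None marks a missing value (assumption).
rows = [
    {"Name": "Dog A", "Points": 120, "Weight": 32.5, "Owner": "Owner 1", "Colour": "black"},
    {"Name": "Dog B", "Points": 95, "Weight": None, "Owner": "Owner 2", "Colour": "brindle"},
    {"Name": "Dog C", "Points": 80, "Weight": 30.0, "Owner": None, "Colour": None},
]


def split_by_completeness(records):
    """Return (clean, dirty): rows with no missing values, and all other rows."""
    clean, dirty = [], []
    for record in records:
        if any(value is None for value in record.values()):
            dirty.append(record)  # at least one attribute is missing
        else:
            clean.append(record)  # every attribute is present
    return clean, dirty


clean, dirty = split_by_completeness(rows)
print(len(clean), "complete rows,", len(dirty), "rows with missing values")
```

Applied to the real dataset, a split of this kind would be expected to reproduce the 29 complete and 71 incomplete examples reported above.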
As we can see in the