Data mining

Data Mining Project – Dog Race Prediction

Motivation
Gambling is very popular in the Republic of Ireland. Whether online or not, more and more people are joining gambling communities all over the island of Ireland. Most of these communities focus on horse racing and other sports, but a significant number of people are dedicated to dog racing. It is a multimillion-euro industry, run both online and live (face to face).
Objective
Many websites and newspapers publish race predictions, but none of them offers a mathematical analysis of the races.
For my Data Mining Project, I will use a dataset collected from www.greyhound-data.com. I will then load this data into RapidMiner to generate a random race sample and, finally, predict the winner of that race using the same tool.
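The "random race sample" step can be sketched outside RapidMiner as drawing distinct dogs from the dataset. This is a minimal sketch under stated assumptions: the helper name `random_race`, the placeholder dog names and the field size of 6 runners are mine, not taken from the project.

```python
import random


def random_race(dog_names, runners=6, seed=None):
    """Hypothetical helper: pick `runners` distinct dogs to form one race.

    The default of 6 runners is an assumption, not a value from the report.
    A seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(dog_names, runners)


# Placeholder names standing in for the 100 dogs in the dataset
dogs = [f"Dog {i}" for i in range(1, 101)]
race = random_race(dogs, seed=42)
print(race)
```

Passing a fixed seed means the same race can be regenerated when comparing predictions.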
Database
The collected database comprises 100 examples (dogs) with 11 dimensions:
1. Place – the dog's national rank
2. Name – the dog's name; a suffix such as IE/IE gives the land of standing/land of birth
3. Land of Birth
4. Land of Standing
5. Year of birth
6. Sex – male or female
7. Sire – the father's name
8. Dam – the mother's name (sire and dam are considered important in gambling)
9. Races – the number of races run in 2014
10. Points – the number of points the dog accumulated in 2014
11. Avg Dist – the average distance of the dog's races
All values are 2014 statistics collected from the website mentioned above. On top of these dimensions, I manually added three more:
1. Weight – in kg
2. Owner
3. Colour
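The 14 attributes listed above can be summarised as one record type. The Python sketch below is a hypothetical representation: the field names and types are my assumptions, not RapidMiner's schema, and the three manually added attributes default to `None` because they can be missing.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DogRecord:
    """One example (dog) from the dataset; field names are hypothetical."""
    # 11 attributes from greyhound-data.com (2014 statistics)
    place: int                 # national rank
    name: str
    land_of_birth: str         # country code, e.g. "IE"
    land_of_standing: str      # country code, e.g. "IE"
    year_of_birth: int
    sex: str                   # "male" or "female"
    sire: str                  # father's name
    dam: str                   # mother's name
    races: int                 # number of races in 2014
    points: float              # points accumulated in 2014
    avg_dist: float            # average race distance, in the site's units
    # 3 attributes added manually; None marks a missing value
    weight_kg: Optional[float] = None
    owner: Optional[str] = None
    colour: Optional[str] = None

    def has_missing(self) -> bool:
        """True if any attribute is missing (None)."""
        return any(getattr(self, f.name) is None for f in fields(self))


# Placeholder values for illustration only, not taken from the dataset
example = DogRecord(1, "Example Dog", "IE", "IE", 2011, "male", "Sire A",
                    "Dam B", 10, 100.0, 500.0)
print(example.has_missing())  # True: weight, owner and colour are missing
```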
These last three attributes have missing values, which makes the dataset noisy, but I will try to find the best way to recover the missing data.
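One simple way to recover missing values is imputation. The sketch below assumes rows are Python dicts with `None` for missing values and uses a hypothetical `impute` helper; the strategies (mean for weight, most frequent value for colour, a constant "unknown" for owner) are illustrative choices of mine, not the method used in the project.

```python
from collections import Counter
from statistics import mean


def impute(rows, column, strategy, fill_value=None):
    """Hypothetical helper: replace None in `column` using a simple strategy.

    "mean"     -> average of the known numeric values
    "mode"     -> most frequent known value
    "constant" -> the given fill_value
    Returns new row dicts; the input rows are not modified.
    """
    known = [r[column] for r in rows if r[column] is not None]
    if strategy == "mean":
        value = mean(known)
    elif strategy == "mode":
        value = Counter(known).most_common(1)[0][0]
    elif strategy == "constant":
        value = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]


# Made-up rows for illustration only
rows = [
    {"name": "Dog A", "weight_kg": 32.0, "owner": None, "colour": "black"},
    {"name": "Dog B", "weight_kg": None, "owner": "Owner X", "colour": None},
    {"name": "Dog C", "weight_kg": 34.0, "owner": None, "colour": "black"},
]
rows = impute(rows, "weight_kg", "mean")
rows = impute(rows, "colour", "mode")
rows = impute(rows, "owner", "constant", fill_value="unknown")
print(rows[1])  # Dog B now has weight 33.0 and colour "black"
```

A constant placeholder is used for the owner because, unlike weight or colour, an owner cannot sensibly be guessed from the other dogs.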
After importing the dataset into RapidMiner from an Excel file, I first analysed the data and then separated the clean data from the dirty data using the no_missing_attributes filter. As a result, only 29 examples were complete, while the other 71 had missing values (noisy).
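The clean/dirty split can also be sketched in plain Python. This is a hypothetical equivalent of keeping only the examples with no missing attributes, assuming each row is a dict and `None` marks a missing value; the helper name `split_complete` and the sample rows are illustrative only. On the full dataset, the split described above gave 29 clean and 71 noisy examples.

```python
def split_complete(rows):
    """Hypothetical helper: return (clean, noisy).

    clean: rows where no value is None
    noisy: rows with at least one None value
    """
    clean, noisy = [], []
    for row in rows:
        (noisy if any(v is None for v in row.values()) else clean).append(row)
    return clean, noisy


# Made-up rows for illustration only
rows = [
    {"name": "Dog A", "weight_kg": 32.0, "owner": "Owner X", "colour": "black"},
    {"name": "Dog B", "weight_kg": None, "owner": "Owner Y", "colour": "fawn"},
    {"name": "Dog C", "weight_kg": 30.5, "owner": None, "colour": None},
]
clean, noisy = split_complete(rows)
print(len(clean), len(noisy))  # 1 2
```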

As we can see in the
