Data mining

Data Mining Project – Dogs Race Prediction

Motivation
Gambling is very popular in the Republic of Ireland. Whether online or not, more and more people are joining gambling communities formed all over the island of Ireland. Most of these communities focus on horse racing and other sports, but a significant number of people are dedicated to dog races. This is a multimillion-euro industry, operating both online and live, face to face.
Objective
Many websites and newspapers publish predictions for these races, but there is no tool that provides a mathematical analysis of the races.
For my Data Mining Project I will use a database collected from www.greyhound-data.com. I will then use this data in RapidMiner to generate a random race sample, and finally I will predict the winner of the race using the same tool.
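The sampling step can be sketched outside RapidMiner as follows. This is only a minimal illustration in Python, not the RapidMiner process itself; the field size of six runners, the seed, the function name sample_race, and the placeholder dog names are all assumptions made for the example.

```python
import random


def sample_race(dogs, field_size=6, seed=None):
    """Draw a random race field (without replacement) from the list of dogs."""
    if field_size > len(dogs):
        raise ValueError("field_size cannot exceed the number of dogs")
    rng = random.Random(seed)  # local generator so the draw is reproducible
    return rng.sample(dogs, field_size)


# Placeholder names standing in for the 100 dogs in the dataset (hypothetical).
dogs = [f"Dog {i}" for i in range(1, 101)]
race = sample_race(dogs, field_size=6, seed=42)
print(race)
```

Sampling without replacement ensures that no dog appears twice in the same race, and fixing the seed makes the sample repeatable between runs.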
Database
The collected database consists of 100 examples with 11 dimensions:
1. Place – which represents the national rank
2. Name – IE/IE represents the land of standing/land of
3. Land of Birth
4. Land of Standing
5. Year of birth
6. Sex – male or female
7. Sire – father’s name
8. Dam – mother’s name (the last two dimensions are considered important in gambling)
9. Races – the number of races for 2014
10. Points – how many points each dog accumulated in 2014
11. Avg Dist – the average distance of races.
All the details are based on 2014 statistics collected from the website mentioned above. On top of these dimensions, I manually added three more:
1. Weight – in Kg
2. Owner
3. Colour
The last three have missing data, which makes the dataset noisy, but I will try to find the best way to recover the missing data.
After importing the dataset from an Excel file into RapidMiner, I first analysed the data and then separated the clean examples from the dirty ones (no_missing_attributes function). As a result, only 29 examples were complete, while 71 had missing values (noisy).
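The clean/dirty separation can be illustrated with a short Python sketch. It is not the RapidMiner filter itself, only an approximation of the same idea: a row counts as clean when none of its attributes is missing. The function name split_by_completeness, the rule that None or an empty string means "missing", and the three example rows are assumptions made for the illustration, not values from the real dataset.

```python
def split_by_completeness(rows):
    """Separate rows with no missing attribute values from rows with at least one."""

    def is_missing(value):
        # Assumption: a missing cell is None or an empty/whitespace-only string.
        return value is None or (isinstance(value, str) and value.strip() == "")

    clean, dirty = [], []
    for row in rows:
        if any(is_missing(v) for v in row.values()):
            dirty.append(row)
        else:
            clean.append(row)
    return clean, dirty


# Hypothetical rows using the three manually added attributes.
rows = [
    {"Name": "Dog A", "Weight": 32.5, "Owner": "Owner A", "Colour": "black"},
    {"Name": "Dog B", "Weight": None, "Owner": "Owner B", "Colour": "brindle"},
    {"Name": "Dog C", "Weight": 30.0, "Owner": "", "Colour": None},
]
clean, dirty = split_by_completeness(rows)
print(len(clean), "complete,", len(dirty), "with missing values")
```

Applied to the full dataset, a split of this kind would correspond to the 29 complete and 71 incomplete examples reported above.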

As we can see in the
