What Is 5.5 Experimental Algorithm

5.5 Experimental Setup
Here we briefly describe the classification algorithms applied, the BBC forum dataset [112], and the performance evaluation measures used to analyze the results.
5.5.1 Classification Algorithms
For the classification task in this module, we used four classification algorithms provided in ODM [107]: Support Vector Machine, Decision Tree, Naïve Bayes and Logistic Regression. As discussed earlier, ODM has been used for data mining tasks in a number of existing research works [108-110]. The Minimum Description Length (MDL) algorithm was applied to compute attribute importance, and all the proposed features showed positive attribute importance.
5.5.2 Dataset
The choice of a proper dataset is significant, as it should cover diverse topics from …
In addition, classification performance measures such as Receiver Operating Characteristic (ROC), Area Under the Curve (AUC), Lift and Cost have been used for evaluation. These measures are briefly described as follows:
5.5.4.1 ROC
Receiver Operating Characteristic (ROC) is a metric for comparing actual and predicted values in a classification model. It is applied in binary classification to gain in-depth insight into the decision-making ability of the classification model. ROC is plotted as a curve on X-Y axes, with the false positive rate on the X axis and the true positive rate on the Y axis. The top left corner is the optimal location on an ROC graph, indicating a high true positive rate and a low false positive rate [113, 114].
The ROC graph is defined parametrically as:

$$x = \mathrm{FPrate}(t), \qquad y = \mathrm{TPrate}(t) \tag{21}$$
where t represents the probability threshold value, which by default is …
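As an illustration of the parametric definition (21), the following minimal sketch traces an ROC curve by sweeping the threshold t over the distinct predicted scores and computes AUC with the trapezoidal rule. It is a library-free illustration, not the ODM implementation: the function names `roc_points` and `auc` are hypothetical, and it assumes binary labels (1 = positive) with both classes present and scores that estimate the probability of the positive class.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """Return (FPrate(t), TPrate(t)) for every distinct threshold t.

    labels: 1 for the positive class, 0 for the negative class.
    scores: estimated probability of the positive class.
    A record is predicted positive when its score >= t.
    Assumes both classes occur in `labels`.
    """
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    # Start above the highest score so the curve begins at (0, 0);
    # lowering t can only add predicted positives, so x and y never decrease.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

For this toy data the curve runs from (0, 0) to (1, 1) and the AUC is 8/9 (about 0.889), which equals the fraction of positive-negative pairs in which the positive record receives the higher score.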
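Lift is the ratio of the percentage of correct positive classifications (positive predictions that are actually positive) to the percentage of actual positives in the test data. The lift chart is defined by the parametric definition [113]:

$$x = \mathrm{Yrate}(t) = \frac{TP(t) + FP(t)}{P + N}, \qquad y = TP(t) \tag{23}$$

where TP(t) and FP(t) are the numbers of true and false positives at threshold t, and P and N are the numbers of actual positive and negative records in the test data.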
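The parametric definition (23) can be illustrated with a minimal sketch, assuming binary labels (1 = positive) and scores that estimate the probability of the positive class. The helper names `lift_chart` and `lift` are hypothetical, not ODM functions. Each chart point pairs the fraction of records predicted positive at threshold t with the number of true positives, and `lift` computes the ratio described above at a single threshold.

```python
from typing import List, Tuple


def lift_chart(labels: List[int], scores: List[float]) -> List[Tuple[float, int]]:
    """Return (Yrate(t), TP(t)) points following equation (23).

    labels: 1 for the positive class, 0 for the negative class.
    scores: estimated probability of the positive class; a record is
    predicted positive when its score >= t.
    """
    total = len(labels)  # P + N
    points = [(0.0, 0)]  # no record predicted positive yet
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append(((tp + fp) / total, tp))
    return points


def lift(tp: int, fp: int, p: int, n: int) -> float:
    """Share of positive predictions that are correct, divided by the share
    of actual positives in the test data (assumes tp + fp > 0 and p > 0)."""
    return (tp / (tp + fp)) / (p / (p + n))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(lift_chart(labels, scores))
# At t = 0.8 the two highest-scored records are both positive.
print(lift(tp=2, fp=0, p=3, n=3))
```

At t = 0.8 the lift is 2.0: every positive prediction is correct, while only half of the test records are actually positive, so the model selects positives twice as well as random selection would.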
5.5.7 Cost
Cost is an additional measure provided by Oracle Data Miner. It indicates the damage done by incorrect predictions and is useful for comparing classification models. A lower cost indicates greater confidence in the predictive ability of the classification model.
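The text does not give Oracle Data Miner's exact cost computation, so the sketch below assumes the common cost-matrix formulation: every (actual, predicted) pair has a cost, correct predictions cost nothing, and a model's cost is the sum over the test records. The cost values and the helper `total_cost` are hypothetical, chosen only to show how two models with equal accuracy can differ in cost.

```python
from typing import Dict, List, Tuple

# Hypothetical cost matrix keyed by (actual, predicted); 1 = positive class.
# Correct predictions are assumed to cost nothing.
CostMatrix = Dict[Tuple[int, int], float]


def total_cost(actual: List[int], predicted: List[int], cost: CostMatrix) -> float:
    """Sum the cost of every prediction; pairs missing from `cost` cost 0."""
    return sum(cost.get((a, p), 0.0) for a, p in zip(actual, predicted))


# Assumption for illustration: a missed positive (false negative) is
# five times as damaging as a false alarm (false positive).
cost = {(0, 1): 1.0, (1, 0): 5.0}

actual = [1, 1, 0, 1, 0, 0]
model_a = [1, 1, 1, 0, 0, 0]  # one false positive, one false negative
model_b = [1, 1, 1, 1, 1, 0]  # two false positives, no false negatives
print(total_cost(actual, model_a, cost))
print(total_cost(actual, model_b, cost))
```

Both models make two errors and so have the same accuracy, but under this cost matrix model B has the lower cost (2.0 versus 6.0) and would be preferred.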
5.6 Results and Discussion
The post and thread classification results of the four classification algorithms are compared using the evaluation measures Accuracy, Precision, Recall and F-measure. In addition, ROC, AUC, Lift and Cost are used for in-depth analysis.
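For reference, the four evaluation measures can be computed from the confusion-matrix counts as in the minimal sketch below. It assumes a binary task with 1 marking the positive class and takes F-measure to be F1, the harmonic mean of precision and recall; the function name `classification_metrics` is hypothetical.

```python
from typing import Dict, List


def classification_metrics(actual: List[int], predicted: List[int]) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F-measure (F1) for a binary task.

    1 marks the positive class; ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure taken here as F1, the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


print(classification_metrics([1, 1, 0, 1, 0, 0], [1, 1, 1, 0, 0, 0]))
```

With two true positives, two true negatives, one false positive and one false negative, all four measures equal 2/3 for this toy example.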
5.6.1 Post
