Analysis Of Lazy Super Parent Tree Augmented Naive Bayes

In this section, we propose a heuristic, called Lazy Super Parent Tree Augmented Naive Bayes (LSPTAN), that seeks to solve the problems discussed above, enabling the application of semi-Naive Bayes techniques in large ADC tasks. Thus, we can evaluate whether the premise of independence among attributes, assumed by Naive Bayes, impacts effectiveness in large ADC tasks, an open research problem.\looseness=-1

The Lazy Super Parent TAN (LSPTAN) heuristic is a deferred (lazy) version of SP-TAN that constructs a Tree Augmented Naive Bayes for each test example. Attribute dependencies are generated based on information from the example being classified. To build a lazy version of SP-TAN, we adapted the evaluation and selection of candidates for Super Parent and Favorite Children.\looseness=-1

The SP-TAN algorithm exploits accuracy to select a candidate for Super Parent ($a_{sp}$). In our strategy, we select the candidate $a_{sp}$ whose classification model generates …
Therefore, LSPTAN builds a simpler network than SP-TAN: we select only the best Super Parent for a test document. There is, however, no limitation on the choice of Favorite Children; all child attributes that increase the probability that the document belongs to a class are included in the classification model.\looseness=-1
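The Favorite Children criterion above can be sketched as a simple filter: a term is attached to the Super Parent only when conditioning on $a_{sp}$ increases its probability contribution. The probability values and names below are hypothetical placeholders, not figures from the paper; in practice they come from smoothed training counts.

```python
# Hypothetical smoothed probabilities for one test document, one class c_i,
# and one candidate super parent a_sp; real values come from training counts.
p_child_given_class = {"goal": 0.30, "team": 0.20, "law": 0.10}  # P(a_j | c_i)
p_child_given_sp = {"goal": 0.45, "team": 0.15, "law": 0.25}     # P(a_j | c_i, a_sp)

def favorite_children(terms):
    """Keep a_j as a Favorite Child of a_sp only when conditioning on the
    super parent increases its probability contribution; otherwise a_j
    remains a plain Naive Bayes leaf whose only parent is the class node."""
    return [t for t in terms if p_child_given_sp[t] > p_child_given_class[t]]

chosen = favorite_children(["goal", "team", "law"])
# With the toy tables above, "goal" and "law" benefit from the super
# parent, while "team" does not.
```

Because the filter is applied per test document, the same term can be a Favorite Child for one document and a plain Naive Bayes leaf for another.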

The LSPTAN heuristic initially builds the model based on Naive Bayes and initializes a set of orphans $O$, inserting into $O$ all the terms of the vocabulary. Then, for each test document, the technique evaluates each term as a Super Parent ($a_{sp}$) and, at the end, selects as $a_{sp}$ the term with the highest probability $P(c_i | d_t, a_{sp})$. The $P(c_i | d_t, a_{sp})$ for an $a_{sp}$ is defined by Equation~\ref{eq::lsptan}, where $f$ is the frequency of a term in the document $d_t$.\looseness=-1
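The per-document Super Parent search can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy corpus, Laplace smoothing, and binary per-document co-occurrence counts are all assumptions, and scoring is done in log space for numerical stability.

```python
import math
from collections import defaultdict

# Toy training corpus: (term-frequency dict, class label) pairs.
train = [
    ({"ball": 2, "goal": 1}, "sport"),
    ({"ball": 1, "team": 2}, "sport"),
    ({"vote": 2, "law": 1}, "politics"),
    ({"law": 2, "team": 1}, "politics"),
]
classes = sorted({c for _, c in train})

# Document-level counts needed for P(c_i), P(a_j | c_i), P(a_j | c_i, a_sp).
class_count = defaultdict(int)
term_class = defaultdict(int)   # (term, c) -> docs of class c containing term
pair_class = defaultdict(int)   # (term, sp, c) -> docs of class c with both
for d, c in train:
    class_count[c] += 1
    for t in d:
        term_class[(t, c)] += 1
        for sp in d:
            if sp != t:
                pair_class[(t, sp, c)] += 1

def p_class(c):
    return class_count[c] / len(train)

def p_term(t, c):                # Laplace-smoothed P(a_j | c_i)
    return (term_class[(t, c)] + 1) / (class_count[c] + 2)

def p_term_sp(t, sp, c):         # Laplace-smoothed P(a_j | c_i, a_sp)
    return (pair_class[(t, sp, c)] + 1) / (term_class[(sp, c)] + 2)

def log_score(d, c, sp):
    """log of P(c_i) P(a_sp|c_i) prod_j P(a_j|c_i,a_sp)^{f_j} over terms of d."""
    s = math.log(p_class(c)) + math.log(p_term(sp, c))
    for t, f in d.items():
        if t != sp:
            s += f * math.log(p_term_sp(t, sp, c))
    return s

def best_super_parent(d):
    """Lazily evaluate every term of d as a_sp and keep the (a_sp, class)
    pair with the highest posterior score for this one test document."""
    return max(((sp, c) for sp in d for c in classes),
               key=lambda x: log_score(d, x[1], x[0]))

doc = {"ball": 1, "team": 1}
sp, c = best_super_parent(doc)
```

Note that the structure search runs at classification time, once per test document, which is what makes the heuristic lazy: training only accumulates the count tables.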

\begin{center}
\begin{equation}
\label{eq::lsptan}
P(c_i \mid d_t, a_{sp}) = P(c_i)\, P(a_{sp} \mid c_i) \prod_{\substack{a_j \in d_t \\ a_j \neq a_{sp}}} P(a_j \mid c_i, a_{sp})^{f_j}
\end{equation}
\end{center}