Probability and Statistics Research Project

Name: Lakeisha M. Henderson
ID: @02181956

Spring 2007

Abstract

Table of Contents

Principal Component Analysis (PCA)
    Definition
    Uses of PCA
    Illustrative Example of PCA
    Method to Determine PCA

Basic Analysis of Variance (ANOVA)
    Purpose and Definition of ANOVA
    Illustrative Example of ANOVA

Risk Based Design Concepts
    Definition
    Predictions and Relation to Risk Based Designs

Principal Components Analysis (PCA)

Definition:

Principal Components Analysis is a method that reduces data dimensionality by performing a covariance analysis between factors. As such, it is suitable for data sets in multiple dimensions. It is a way of identifying patterns in data and of expressing the data so as to highlight their similarities and differences. Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analyzing data. The other main advantage of PCA is that once these patterns have been found, the data can be compressed, i.e., the number of dimensions reduced, without much loss of information.
Technically speaking, PCA is an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PCA can be used for dimensionality reduction in a data set while retaining those characteristics of the data set that contribute most to its variance, by keeping lower-order principal components and ignoring higher-order ones. Such low-order components often contain the "most important" aspects of the data.
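The transformation described above can be sketched in a few lines of code. The following Python/NumPy function (an illustrative sketch, not part of the original project; the function and variable names are my own) centers the data, forms the covariance matrix, and eigen-decomposes it so that the eigenvector with the largest eigenvalue becomes the first principal component:

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its leading principal components."""
    # Center the data: PCA analyzes variance about the mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix between the features (factors).
    cov = np.cov(Xc, rowvar=False)
    # Eigen-decomposition; eigh is appropriate because cov is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Order components by descending variance (eigenvalue).
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # Orthogonal linear transformation into the new coordinate system.
    scores = Xc @ components
    return scores, eigvals[order]
```

Keeping only the first few columns of `scores` performs the dimensionality reduction discussed above: the discarded higher-order components carry the least variance, so little information is lost.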



