Statistical Learning and Data Mining
Overview: Efficient asset allocation through statistical learning methods, and a comparison of methods for creating an index-tracking ETF (exchange-traded fund)
Datasets:
The datasets are taken from the website of the book “Statistics and Data Analysis for Financial Engineering” by David Ruppert, which is one of the references for this course. The two datasets chosen are:
1. Stock_FX_Bond.csv
2. Stock_FX_Bond_2004_to_2006.csv
The data include the volumes and adjusted closing prices for GM, F, UTX, CAT, MRK, PFE, MSFT, IBM, C and XOM, as well as for the S&P 500 index. They also include Treasury rates for different maturities, rates on corporate bonds, and foreign exchange rates for the period 1987 to 2006.
Objectives:
1. Optimal portfolios for various levels of risk
Conventional investors seek the maximum rate of return (or alpha, the return in excess of a benchmark) at a level of risk they are comfortable with. Hence, at any given level of risk, we can define the portfolio that generates the maximal expected return. In this project, we aim to identify the composition of the portfolios that achieve this objective.
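As an illustration of this objective, the following is a minimal sketch of classical mean-variance optimization, which is one possible approach and not necessarily the method the project will adopt. It solves the equivalent dual problem (the lowest-risk portfolio for a given target return), allows short sales, and uses synthetic returns rather than the datasets above. The function name `min_variance_weights` and all numeric values are assumptions made for illustration.

```python
import numpy as np

def min_variance_weights(mu, sigma, target_return):
    """Weights minimizing w' Sigma w subject to sum(w) = 1 and w' mu = target_return.

    Closed-form Lagrangian solution; short sales are allowed (weights may be negative).
    """
    ones = np.ones_like(mu)
    inv_ones = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1
    inv_mu = np.linalg.solve(sigma, mu)       # Sigma^{-1} mu
    a = ones @ inv_ones
    b = ones @ inv_mu
    c = mu @ inv_mu
    d = a * c - b ** 2                        # nonzero unless all means are equal
    lam = (c - target_return * b) / d
    gam = (target_return * a - b) / d
    return lam * inv_ones + gam * inv_mu

# Synthetic daily returns for three hypothetical assets (not the project data).
rng = np.random.default_rng(0)
returns = rng.normal(loc=[0.0004, 0.0006, 0.0008],
                     scale=[0.01, 0.015, 0.02],
                     size=(500, 3))
mu = returns.mean(axis=0)               # estimated expected returns
sigma = np.cov(returns, rowvar=False)   # estimated covariance matrix

target = 0.0006
w = min_variance_weights(mu, sigma, target)
risk = np.sqrt(w @ sigma @ w)           # portfolio standard deviation
print("weights:", w, "risk:", risk)
```

Repeating the calculation over a grid of target returns traces out the efficient frontier, from which the best portfolio at each level of risk can be read off.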
Existing models such as CAPM, along with other forms of regression, will be compared against additional methods not covered in the class, in order to identify the better methods of portfolio construction. We will use learning tools and models to predict the rate of return and the risk of each stock, which will allow us to build portfolios suited to different needs. We will carry out uncertainty analysis using resampling techniques and will attempt to use Bayesian methods as well. Performance will be tested using the future rates of return on these portfolios.
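The CAPM regression and the resampling-based uncertainty analysis mentioned above could look roughly like the sketch below. It fits the CAPM line by ordinary least squares and uses a simple i.i.d. bootstrap to obtain a percentile interval for beta. The data are synthetic, the helper names `capm_fit` and `bootstrap_beta` are hypothetical, and the i.i.d. resampling ignores any serial dependence in real returns, so this is an illustration under those assumptions and not the project's final procedure.

```python
import numpy as np

def capm_fit(stock_excess, market_excess):
    """OLS fit of stock_excess = alpha + beta * market_excess; returns (alpha, beta)."""
    X = np.column_stack([np.ones_like(market_excess), market_excess])
    coef, *_ = np.linalg.lstsq(X, stock_excess, rcond=None)
    return coef[0], coef[1]

def bootstrap_beta(stock_excess, market_excess, n_boot=1000, seed=0):
    """Bootstrap distribution of beta, resampling (stock, market) pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(stock_excess)
    betas = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # indices drawn with replacement
        betas[i] = capm_fit(stock_excess[idx], market_excess[idx])[1]
    return betas

# Synthetic excess returns with a true beta of 1.2 (assumed values, not the project data).
rng = np.random.default_rng(1)
market = rng.normal(0.0003, 0.01, size=750)
stock = 0.0001 + 1.2 * market + rng.normal(0.0, 0.008, size=750)

alpha_hat, beta_hat = capm_fit(stock, market)
betas = bootstrap_beta(stock, market)
lo, hi = np.percentile(betas, [2.5, 97.5])   # 95% percentile interval for beta
print("beta:", beta_hat, "interval:", (lo, hi))
```

With real data, `stock` and `market` would be replaced by excess returns computed from the adjusted closing prices and a risk-free rate.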
2. Creation of an index-tracking exchange-traded instrument
An ETF is an investment fund that is comprised of various assets such as stocks, bonds or