The 95 percent confidence interval for the population mean (Z = 1.96, n = 31634, σ = 272.84) is:

108.496 < Population Mean < 114.5094

The margin of error of the mean is small (±3.01), so the sample mean can be assumed to be representative of the population mean.

Number of offline users: n1 = 27781, x1 = 110.79, σ1 = 271.301
Number of online users: n2 = 3853, x2 = 116.67, σ2 = 283.66

$$Z = \frac{(x_1 - x_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

Null Hypothesis: µ1 = µ2
Alternative Hypothesis: µ1 ≠ µ2

Z = -1.212, and Z-critical at the 5% level of significance is ±1.96. Since Z lies in the acceptance region, the null hypothesis is not rejected. Hence, there is no significant difference in the profitability of online and offline users.

We then run a regression with online/offline as the independent variable and profitability as the dependent variable. The output summary is:

Regression Statistics
Multiple R           0.00705
R Square             4.97E-05
Adjusted R Square    1.81E-05
Standard Error       272.8369
Observations         31634

ANOVA
              Df       SS          MS          F          Significance F
Regression    1        117039.3    117039.3    1.572264   0.209887815
Residual      31632    2.35E+09    74439.99
Total         31633    2.35E+09

              Coefficients   Standard Error   t Stat     P-value
Intercept     110.7862       1.636956         67.67821   0
X Variable 1  5.880591       4.689842         1.253899   0.209888

The R Square value is very low, indicating that the model has almost no explanatory power. The F-test cannot reject the null hypothesis even at a significance level of 0.2 (Significance F = 0.2099). The t-statistic for X Variable 1 (P-value = 0.2099) likewise cannot reject the null hypothesis. These results indicate that there is no significant relationship, and hence no evidence of a cause-and-effect relation, between online/offline status and profitability.

To find the relation between the demographics of the customers and their profitability, we perform a regression analysis. To enable the regression analysis, the following steps are taken:

1) Independent variables are online/offline, age, income, tenure and district.
2) Online/offline and age groups are already suitably coded as numbers.
3) Income and district are not coded well. For example, income bucket 2 is