Subject- Business Statistics
Q1. What do you mean by sample survey? What are the different sampling methods? Briefly describe them.
Ans.
A sample is a finite subset of a population, drawn from it to estimate the characteristics of that population. Sampling is a tool that enables us to draw conclusions about the characteristics of the population.
Survey Sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.
A survey may refer to many different types or techniques of observation, but in the context of survey sampling it most often refers to a questionnaire used to measure the characteristics and/or attitudes of people. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census.
A sample survey can also be described as the technique of studying a population with the help of a sample. The population is the totality of all objects about which the study is proposed. A sample is only a portion of this population, selected using statistical principles called sampling designs (which help guarantee that a representative sample is obtained for the study). Once the sample is decided, information is collected from it; this process is called a sample survey.
It is incumbent on the researcher to clearly define the target population. There are no strict rules to follow, and the researcher must rely on logic and judgment. The population is defined in keeping with the objectives of the study.
Sometimes the entire population will be sufficiently small, and the researcher can include the entire population in the study. This type of research is called a census study because data is gathered on every member of the population.
Usually the population is too large for the researchers to attempt to survey all of its members. A small, but carefully chosen sample can be used to represent the population. The sample reflects the characteristics of the population from which it is drawn.
Sampling methods are classified as either probability or non-probability. In probability samples, each member of the population has a non-zero probability of being selected. Probability methods include random sampling, systematic sampling and stratified sampling. In non-probability sampling, members are selected from the population in some non-random manner. These include convenience sampling, judgment sampling, quota sampling, and snowball sampling. The advantage of probability sampling is that sampling error can be calculated. Sampling error is the degree to which a sample might differ from the population. When inferring to the population, results are reported plus or minus the sampling error. In non-probability sampling, the degree to which the sample differs from the population remains unknown.
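The practice of reporting results "plus or minus the sampling error" can be illustrated with a short sketch. It assumes a simple random sample, a numeric measurement, and an approximate 95% confidence level (z = 1.96). The function name and the data values are hypothetical and are not part of the original answer.

```python
import math
import statistics

def margin_of_error(sample, z=1.96):
    """Approximate margin of error for a sample mean: z * s / sqrt(n).

    Assumes a simple random sample; z = 1.96 is roughly 95% confidence.
    """
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return z * s / math.sqrt(n)

# Hypothetical measurements taken from a probability sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(sample)
moe = margin_of_error(sample)
print(f"estimate: {mean:.2f} plus or minus {moe:.2f}")
```

With a non-probability sample the same arithmetic can be carried out, but the result would not be a valid measure of how far the sample may differ from the population.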
Probability Sampling Methods
1. Random Sampling- Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected. When the population is very large, it is often difficult or impossible to identify every member of the population, so the pool of available subjects becomes biased.
2. Systematic Sampling- Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.
3. Stratified Sampling- Stratified sampling is a commonly used probability method that is often superior to random sampling because it reduces sampling error. A stratum is a subset of the population that shares at least one common characteristic. Examples of strata might be males and females, or managers and non-managers. The researcher first identifies the relevant strata and their actual representation in the population. Random sampling is then used to select a sufficient number of subjects from each stratum. “Sufficient” refers to a sample size large enough for us to be reasonably confident that the subjects drawn from a stratum represent that stratum. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
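The three probability methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame (20 managers and 80 non-managers), the function names, the 10% sampling fraction, and the proportional allocation used for the strata are all hypothetical choices, not part of the original answer.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Every member has the same chance of selection; drawn without replacement."""
    return rng.sample(population, n)

def systematic_sample(records, n, rng):
    """Pick a random start, then every k-th record, where k = len(records) // n."""
    k = len(records) // n
    start = rng.randrange(k)
    return records[start::k][:n]

def stratified_sample(population, stratum_of, fraction, rng):
    """Group members into strata, then draw a random sample within each stratum."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # proportional allocation
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 20 managers and 80 non-managers
frame = [("manager", i) for i in range(20)] + [("non-manager", i) for i in range(80)]

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda member: member[0], 0.10, rng)
print(len(srs), len(sys_sample), len(strat))
```

In the stratified draw the low-incidence stratum (managers) is guaranteed two places, whereas a simple random sample of ten could by chance contain no managers at all.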
Non Probability Methods
1. Convenience Sampling- Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation of the truth. As the name implies, sample members are selected because they are convenient to reach. This non-probability method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a random sample.
2. Judgment Sampling- Judgment sampling is a common non-probability method. The researcher selects the sample based on his or her own judgment. This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one “representative city”, even though the population includes all cities. When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.
3. Quota Sampling- Quota sampling is the non-probability equivalent of stratified sampling. As in stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, where the strata are filled by random sampling.
4. Snowball Sampling- Snowball sampling is a special non-probability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. Snowball sampling relies on referrals from initial subjects to generate additional subjects. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross section of the population.
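To make the contrast with stratified sampling concrete, quota sampling (method 3 above) can be sketched as follows. The quotas are filled by convenience, that is, by whoever is encountered first, rather than by random selection. The function name, the respondent stream, and the quota sizes are hypothetical.

```python
def quota_sample(arrivals, stratum_of, quotas):
    """Fill each stratum's quota with whoever comes first (convenience selection)."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        stratum = stratum_of(person)
        if remaining.get(stratum, 0) > 0:
            sample.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample

# Hypothetical stream of respondents in the order they were encountered
arrivals = [("male", i) for i in range(10)] + [("female", i) for i in range(10)]
picked = quota_sample(arrivals, lambda person: person[0], {"male": 3, "female": 2})
print(picked)
```

Because no random mechanism is involved, the selection probabilities are unknown and the sampling error cannot be calculated, even though the stratum proportions match the quotas.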
Q2. What is the difference between correlation and regression? What do you understand by rank correlation? When do we use rank correlation and when do we use the Pearsonian correlation coefficient? Fit a linear regression line to the following data-
X: 12, 15, 18, 20, 27, 34, 28, 48
Y: 123, 150, 158, 170, 180, 184, 176, 130
Ans.
Correlation
When two or more variables move in sympathy with each other, they are said to be correlated. If both variables move in the same direction, they are said to be positively correlated. If the variables move in opposite directions, they are said to be negatively correlated. If they move haphazardly, there is no correlation between them.
Correlation analysis deals with
1. Measuring the relationship between variables.
2. Testing the relationship for its significance.
3. Giving a confidence interval for the population correlation measure.
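The first of these tasks, measuring the relationship, can be illustrated on the X and Y values given in the question. The sketch below computes the Pearson product-moment coefficient from its definition; the function name is hypothetical, and only the standard library is used.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the question
X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]

r = pearson_r(X, Y)
print(f"r = {r:.4f}")
```

For this data the coefficient comes out small and positive (about 0.085), which indicates only a weak linear association over the full set of eight pairs.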
Regression
Regression is defined as “the measure of the average relationship between two or more variables in terms of the original units of the data.” Correlation analysis attempts to study the relationship between the two variables x and y. Regression analysis attempts to predict the average y for a given x. Regression attempts to quantify the dependence of one variable on the other. The dependence is expressed in the form of equations.
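As a worked illustration, the least-squares line of Y on X can be fitted to the data given in the question. This is a minimal sketch that assumes Y is the dependent variable and uses the usual formulas b = Sxy / Sxx and a = mean(Y) - b * mean(X); the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Data from the question
X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]

a, b = fit_line(X, Y)
print(f"Y = {a:.3f} + {b:.4f} X")
```

With these values the fitted line is approximately Y = 154.66 + 0.167 X, so each unit increase in X is associated with an average increase of about 0.167 units in Y.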
Difference between Correlation and Regression
Correlation and Regression are not the same. Consider these differences:
1. Correlation measures the STRENGTH of the linear association between paired variables, say X and Y. Regression, on the other hand, gives the FORM of the linear association that predicts Y from the value of X.
2. (a) Correlation is calculated whenever:
Both X and Y are measured in each subject, and the aim is to quantify how strongly they are linearly associated.
In particular, Pearson’s product-moment correlation coefficient is used when the assumption that both X and Y are sampled from normally distributed populations is satisfied.
Spearman’s rank-order correlation coefficient is used if the assumption of normality is not satisfied.
Correlation is not used when the variables are manipulated, for example, in experiments.
(b) Linear Regression is used whenever:
At least one of the independent variables (Xi’s) is used to predict the dependent variable Y.
Note: Some of the Xi’s are dummy variables, i.e. Xi=0 or 1, which are used to code some nominal variables.
The X variable is manipulated by the researcher, e.g. in an experiment.
3. Linear Regression is not symmetric in X and Y. That is, interchanging X and Y gives a different regression model (X in terms of Y) from the original model (Y in terms of X).
On the other hand, if you interchange variables X and Y in the calculation of correlation coefficient you will get the same value of this correlation coefficient.
4. The “best” linear regression model is obtained by selecting the variables (X’s) with at least strong correlation to Y, i.e. >=0.80 or