Tuesday, May 24, 2011

CHAPTER 1: THE NATURE OF REGRESSION ANALYSIS

Historical Origin of the Term Regression
Galton’s law of universal regression: “regression to mediocrity”
-          He found that the average height of sons of a group of tall fathers was less than their fathers’ height and the average height of sons of a group of short fathers was greater than their fathers’ height, thus “regressing” tall and short sons alike toward the average height of all men.
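The “regressing” effect can be reproduced with a small simulation. This is a sketch, not Galton's data. It assumes that father and son heights follow a bivariate normal distribution with the same mean (68 inches), the same standard deviation (2.7 inches), and a correlation of 0.5. All of these numbers are hypothetical. Under those assumptions, sons of tall fathers average less than their fathers, and sons of short fathers average more. Both groups of sons still stay on their fathers' side of the overall mean.

```python
import random
import statistics

def simulate_heights(n, mean=68.0, sd=2.7, rho=0.5, seed=1):
    """Draw (father, son) height pairs from a hypothetical bivariate normal model.

    Both generations share the same mean and standard deviation; rho < 1 is
    the assumed father-son correlation.
    """
    rng = random.Random(seed)
    noise_sd = sd * (1 - rho ** 2) ** 0.5  # keeps the sons' spread equal to sd
    pairs = []
    for _ in range(n):
        father = rng.gauss(mean, sd)
        son = mean + rho * (father - mean) + rng.gauss(0.0, noise_sd)
        pairs.append((father, son))
    return pairs

def group_means(pairs, keep):
    """Average father and son heights over pairs whose father height passes `keep`."""
    chosen = [(f, s) for f, s in pairs if keep(f)]
    return (statistics.fmean(f for f, _ in chosen),
            statistics.fmean(s for _, s in chosen))

pairs = simulate_heights(20_000)
tall_f, tall_s = group_means(pairs, lambda f: f > 71.0)    # "tall" fathers
short_f, short_s = group_means(pairs, lambda f: f < 65.0)  # "short" fathers
print(f"tall fathers {tall_f:.1f} in, their sons {tall_s:.1f} in")
print(f"short fathers {short_f:.1f} in, their sons {short_s:.1f} in")
```

The regression toward the mean here comes entirely from the correlation being less than one. Nothing in the model makes the sons' heights less variable than their fathers'.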
The Modern Interpretation of Regression
Regression analysis – study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
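Estimating the mean of the dependent variable for each fixed value of the explanatory variable can be illustrated with a few observations. The data below are hypothetical: weekly family income (X) and consumption expenditure (Y), invented for illustration. Each group average is the sample counterpart of the conditional mean E(Y | X = x).

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (income X, consumption Y) observations; X takes a few fixed values.
data = [
    (80, 55), (80, 60), (80, 65),
    (100, 65), (100, 70), (100, 75),
    (120, 79), (120, 84), (120, 89),
]

def conditional_means(observations):
    """Return {x: mean of Y among observations with X == x}, i.e. sample E(Y | X = x)."""
    groups = defaultdict(list)
    for x, y in observations:
        groups[x].append(y)
    return {x: fmean(ys) for x, ys in sorted(groups.items())}

print(conditional_means(data))  # {80: 60.0, 100: 70.0, 120: 84.0}
```

A regression line fitted to such data is a way of summarizing how these conditional means change as X changes.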
Statistical versus Deterministic Relationships
In statistical relationships among variables we essentially deal with random or stochastic variables, that is, variables that have probability distributions.
In functional or deterministic dependency, on the other hand, we also deal with variables, but these variables are not random or stochastic.
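Here is a minimal sketch of the contrast. It assumes a hypothetical linear rule y = 2 + 3x. For the statistical version, it adds a standard normal error term u; that error distribution is an assumption chosen only for illustration.

```python
import random

def deterministic(x):
    """Functional relationship: a given x always yields the same y."""
    return 2.0 + 3.0 * x

def stochastic(x, rng):
    """Statistical relationship: y = 2 + 3x + u, where u ~ N(0, 1) is an assumed error."""
    return 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)

rng = random.Random(0)
print([deterministic(5.0) for _ in range(3)])               # [17.0, 17.0, 17.0]
print([round(stochastic(5.0, rng), 2) for _ in range(3)])   # varies from draw to draw
```

At the same value of x, the deterministic rule always returns the same y. The stochastic one returns a draw from a probability distribution centred on that value.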
Regression versus Causation
A statistical relationship in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations. Thus, one can invoke economic theory in saying that consumption expenditure depends on real income.
Regression versus Correlation
In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables.
In regression analysis, there is an asymmetry in the way the dependent and explanatory variables are treated… The dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling).
In correlation analysis, on the other hand, we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables.
Terminology and Notation
Dependent variable – explained variable, predictand, regressand, response, endogenous, outcome, controlled variable.
Explanatory variable – independent variable, predictor, regressor, stimulus, exogenous, covariate, control variable.
Simple or two-variable regression analysis – studying the dependence of a variable on only a single explanatory variable.
Multiple regression analysis – studying the dependence of one variable on more than one explanatory variable.
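Here is a small least-squares sketch in plain Python that makes the simple versus multiple distinction concrete. The helper `ols` is hypothetical, written for this note rather than taken from a library. It solves the normal equations directly, which is fine for tiny examples but numerically less robust than the QR-based solvers statistical software typically uses. The data are invented so that y = 1 + 2·x1 + 3·x2 exactly.

```python
def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for small illustrations only.
    """
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [sum(X[t][i] * y[t] for t in range(n))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data generated as y = 1 + 2*x1 + 3*x2 (no error term).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

simple = ols([[v] for v in x1], y)       # two-variable regression: y on x1 only
multiple = ols(list(zip(x1, x2)), y)     # multiple regression: y on x1 and x2
print(simple)
print(multiple)  # approximately [1.0, 2.0, 3.0]
```

Fitting y on x1 alone is a simple (two-variable) regression. Adding x2 as a second regressor makes it a multiple regression, which here recovers the coefficients used to generate the data.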
The Nature and Source of Data for Economic Analysis
Types of Data
  • Time Series Data
A time series is a set of observations on the values that a variable takes at different times. Such data may be collected at regular time intervals, such as daily, weekly, monthly, quarterly, annually, quinquennially, or decennially. Loosely speaking, a time series is stationary if its mean and variance do not vary systematically over time.
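One rough, informal way to eyeball this loose definition is to split a series into consecutive windows and compare the window means and variances. This is a sketch only: the helper `window_stats` and both series are hypothetical, and formal stationarity tests are a separate topic.

```python
from statistics import fmean, pvariance

def window_stats(series, width):
    """(mean, variance) of consecutive non-overlapping windows of `width` values.

    For a (loosely) stationary series these should not drift systematically
    from one window to the next.
    """
    starts = range(0, len(series) - width + 1, width)
    return [(fmean(series[i:i + width]), pvariance(series[i:i + width])) for i in starts]

stationary_like = [1, -1] * 10   # hypothetical: fluctuates around a constant level
trending = list(range(20))       # hypothetical: the mean rises over time
print(window_stats(stationary_like, 4))  # every window has mean 0 and variance 1
print(window_stats(trending, 4))         # window means climb steadily
```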
  • Cross-Section Data
Cross-section data are data on one or more variables collected at the same point in time.
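Cross-sectional units are often heterogeneous (for example, states or countries of very different size). When we include such heterogeneous units in a statistical analysis, the size or scale effect must be taken into account.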
  • Pooled Data
In pooled, or combined, data there are elements of both time series and cross-section data.
  • Panel, Longitudinal, or Micropanel Data
This is a special type of pooled data in which the same cross-sectional unit is surveyed over time.
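The difference between panel data and pooled data whose units change from period to period can be sketched as (unit, period, value) records. The helper names and records below are hypothetical. `is_panel` checks whether the very same units appear in every period, that is, whether the data form a balanced panel.

```python
from collections import defaultdict

def units_by_period(records):
    """Map each period to the set of cross-sectional units observed in it."""
    periods = defaultdict(set)
    for unit, period, _value in records:
        periods[period].add(unit)
    return periods

def is_panel(records):
    """True if the same units are surveyed in every period (a balanced panel)."""
    unit_sets = list(units_by_period(records).values())
    return all(s == unit_sets[0] for s in unit_sets)

# Hypothetical records: (unit, year, value)
panel = [("A", 2009, 1.0), ("B", 2009, 2.0), ("A", 2010, 1.5), ("B", 2010, 2.5)]
pooled = [("A", 2009, 1.0), ("B", 2009, 2.0), ("C", 2010, 1.5), ("D", 2010, 2.5)]
print(is_panel(panel), is_panel(pooled))  # True False
```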
The Sources of Data
The Internet has literally revolutionized data gathering. The data collected by various agencies may be experimental or non-experimental.
The Accuracy of Data
Although plenty of data are available for economic research, the quality of data is often not that good. There are several reasons for that:
1.      Most social science data are non-experimental.
2.      Even in experimentally collected data, errors of measurement arise from approximations and round-offs.
3.      In questionnaire-type surveys, the problem of nonresponse can be serious.
4.      The sampling method used in obtaining data may vary so widely that it is often difficult to compare the results obtained from the various samples.
5.      Economic data are generally available at a highly aggregate level.
6.      Because of confidentiality, certain data can be published only in highly aggregate form.
Because of all these and many other problems, the researcher should always keep in mind that the results of research are only as good as the quality of the data.
A Note on the Measurement Scales of Variables
  • Ratio Scale. For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X1 – X2) are meaningful quantities. Also, there is a natural ordering of the values along the scale.
  • Interval Scale. An interval-scale variable has a meaningful distance (X1 – X2) and a natural ordering, but its ratio X1/X2 is not meaningful. For example, the distance between two time periods is meaningful, but not the ratio of two time periods.
  • Ordinal Scale. A variable belongs to this category only if it satisfies the third property of the ratio scale (natural ordering).
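  • Nominal Scale. Variables such as gender and marital status simply denote categories; they have none of the properties of the ratio scale (no meaningful ratio, distance, or natural ordering).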
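The four scales can be summarized by which comparisons of two values X1 and X2 are meaningful on each. The `Scale` enum and `compare` helper below are hypothetical, written only to mirror the definitions above:

- Equality (same category or not) is meaningful on every scale.
- Ordering is meaningful from the ordinal scale up.
- Distance is meaningful from the interval scale up.
- The ratio is meaningful only on the ratio scale.

```python
from enum import Enum

class Scale(Enum):
    RATIO = "ratio"
    INTERVAL = "interval"
    ORDINAL = "ordinal"
    NOMINAL = "nominal"

# Comparisons that are meaningful on each scale, beyond simple equality.
MEANINGFUL = {
    Scale.RATIO: {"ratio", "distance", "ordering"},
    Scale.INTERVAL: {"distance", "ordering"},
    Scale.ORDINAL: {"ordering"},
    Scale.NOMINAL: set(),
}

def compare(x1, x2, scale):
    """Return only the comparisons of X1 and X2 that make sense on `scale`."""
    ops = MEANINGFUL[scale]
    result = {"same_category": x1 == x2}  # meaningful on every scale
    if "ordering" in ops:
        result["x1_greater"] = x1 > x2
    if "distance" in ops:
        result["distance"] = x1 - x2
    if "ratio" in ops:
        result["ratio"] = x1 / x2
    return result

print(compare(2000, 1995, Scale.INTERVAL))
# {'same_category': False, 'x1_greater': True, 'distance': 5}
```

For two years treated as an interval-scale variable, the five-year distance is reported, but the ratio 2000/1995 is deliberately left out.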
