Statistics theory: Difference between revisions

Revision as of 02:08, 11 December 2007

Statistics refers primarily to a branch of mathematics that specializes in enumeration, or counted, data and their relation to measured data. It may also refer to a fact of classification, which is the chief source of all statistics, and has a relationship to psychometric applications in the social sciences.

An individual statistic refers to a derived numerical value, such as a mean, a coefficient of correlation, or some other single concept of descriptive statistics. It may also refer to an idea associated with an average, such as a median or a standard deviation, or to some other value computed from a set of data. ^[1]

More precisely, in mathematical statistics, and in general usage, a statistic is defined as any measurable function of a data sample ^[2]. A data sample is described by instances of a random variable, such as a height, weight, polling results, test performance, etc., obtained by random sampling of a population.
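As a rough illustration of this definition, the sketch below treats a few familiar statistics as ordinary functions applied to one data sample. The sample values (five test scores) and the dictionary name `stats` are invented for the example and do not come from the cited sources.

```python
import statistics

# Hypothetical data sample: five test scores, invented for illustration.
sample = [72, 85, 90, 66, 85]

# Each statistic is simply a function that maps the whole sample to a number.
stats = {
    "mean": statistics.mean,
    "median": statistics.median,
    "maximum": max,
    "range": lambda xs: max(xs) - min(xs),
}

# Evaluate every statistic on the same sample.
values = {name: f(sample) for name, f in stats.items()}
for name, value in values.items():
    print(f"{name}: {value}")
```

Any other function of the sample could be added to the dictionary in the same way; measurability is a technical condition that all of these everyday functions satisfy.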

Illustration of concept

Suppose one wishes to embark on a quantitative study of the height of adult men in some country C. How would one go about doing this, and how can the data be summarized? In statistics, the approach taken is to model the quantity of interest, i.e., "height of adult men from the country C", as a random variable X, say, taking on values in [0,5] (measured in metres) and distributed according to some unknown probability distribution F on [0,5]. One important theme in statistics is the development of theoretically sound methods (firmly grounded in probability theory) for learning something about the postulated random variable X and its distribution F by collecting samples of the heights of a number of men randomly drawn from the adult male population of C.

Suppose that N adult men, labeled $\scriptstyle M_{1},M_{2},\ldots ,M_{N}$, have been randomly drawn and that their heights are $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$. An important yet subtle point to note here is that, because of random sampling, the data sample $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$ obtained is actually an instance, or realization, of a sequence of independent random variables $\scriptstyle X_{1},X_{2},\ldots ,X_{N}$, with each random variable $\scriptstyle X_{i}$ distributed identically according to the distribution of X (that is, each $\scriptstyle X_{i}$ has the distribution F). Such a sequence $\scriptstyle X_{1},X_{2},\ldots ,X_{N}$ is referred to in statistics as independent and identically distributed (i.i.d.) random variables. To clarify this point further, suppose that two other investigators, Tim and Allen, are interested in the same quantitative study and that each in turn randomly samples N adult males from the population of C. If Tim's height data sample is $\scriptstyle y_{1},y_{2},\ldots ,y_{N}$ and Allen's is $\scriptstyle z_{1},z_{2},\ldots ,z_{N}$, then each of these samples is another realization of the i.i.d. sequence $\scriptstyle X_{1},X_{2},\ldots ,X_{N}$, just as the first sample $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$ was.
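The idea that three investigators obtain three different realizations of the same i.i.d. sequence can be sketched with a small simulation. Since F is unknown in the text, the code assumes, purely for illustration, a normal distribution with mean 1.75 m and standard deviation 0.07 m, clamped to [0, 5]. The function name `draw_sample`, the seed and the sample size N = 10 are likewise hypothetical choices.

```python
import random

def draw_sample(n, rng):
    """Return one realization (x_1, ..., x_n) of n i.i.d. heights in metres."""
    # Assumed stand-in for the unknown F: normal(1.75, 0.07), clamped to [0, 5].
    return [min(5.0, max(0.0, rng.gauss(1.75, 0.07))) for _ in range(n)]

rng = random.Random(2007)  # fixed seed so the run is reproducible
N = 10

x = draw_sample(N, rng)  # the first investigator's sample
y = draw_sample(N, rng)  # Tim's sample
z = draw_sample(N, rng)  # Allen's sample

# The three samples differ, yet each is a realization of the same i.i.d. sequence.
for name, sample in (("x", x), ("y", y), ("z", z)):
    print(name, [round(h, 2) for h in sample])
```

The numbers printed differ from sample to sample, but all three lists were produced by the same sampling mechanism, which is what the i.i.d. assumption expresses.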

From a collected data sample, say in this case $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$, one can construct a statistic T as $\scriptstyle T=f(x_{1},x_{2},\ldots ,x_{N})$ for any real-valued function f that is measurable (here with respect to the Borel sets of $\scriptstyle \mathbb {R} ^{N}$). Two examples of commonly used statistics are:

$\scriptstyle T\,=\,{\bar {x}}\,=\,{\frac {x_{1}+x_{2}+\ldots +x_{N}}{N}}$. This statistic is known as the sample mean.
$\scriptstyle T\,=\,{\sqrt {\sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2}/N}}$. This statistic is known as the sample standard deviation. Sometimes the alternative formula $\scriptstyle T\,=\,{\sqrt {{\frac {1}{N-1}}\sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2}}}$ is preferred because its square, the sample variance, is an unbiased estimator of the variance of X.
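The two statistics above can be computed directly from their formulas. The following is a minimal sketch: the five height values are invented for illustration, and the helper names `sample_mean` and `sample_sd` and the parameter `ddof` (the amount subtracted from N in the divisor) are hypothetical. Python's standard `statistics` module is used only as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    """Sample mean: (x_1 + ... + x_N) / N."""
    return sum(xs) / len(xs)

def sample_sd(xs, ddof=0):
    """Square root of the summed squared deviations divided by (N - ddof).

    ddof=0 gives the divisor N; ddof=1 gives the alternative divisor N - 1.
    """
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - ddof))

# Hypothetical heights in metres, invented for illustration.
heights = [1.62, 1.70, 1.75, 1.81, 1.68]

print("sample mean:       ", sample_mean(heights))
print("SD with divisor N: ", sample_sd(heights))
print("SD with divisor N-1:", sample_sd(heights, ddof=1))

# Cross-check against the standard library's implementations.
print(math.isclose(sample_sd(heights), statistics.pstdev(heights)))
print(math.isclose(sample_sd(heights, ddof=1), statistics.stdev(heights)))
```

For any sample with at least two distinct values, the N - 1 version is slightly larger than the N version, and the gap shrinks as N grows.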

References

[1] Guilford, J.P., Fruchter, B. (1978). Fundamental statistics in psychology and education. New York: McGraw-Hill.
[2] Shao, J. (2003). Mathematical Statistics (2nd ed.). Springer Texts in Statistics. New York: Springer-Verlag, p. 100.

@@ Line 10: / Line 10: @@
 Suppose one wishes to embark on a quantitative study of the height of adult men in some country ''C''. How would one go about doing this and how can the data be summarized? In statistics, the approach taken is to assume/model the quantity of in interest, i.e., "height of adult men from   the country ''C''"  as a random variable ''X'', say, taking on values in [0,5] (measured in metres) and distributed according to some ''unknown'' [[probability distribution]] ''F'' on [0,5]. One important theme studied in the realm of statistics is to develop theoretically sound methods (firmly grounded in [[probability theory]]) to learn something about the postulated random variable ''X'' and also its distribution ''F'' by collecting samples of the height of a number of men randomly drawn from the adult male population of ''C''.
-Suppose that ''N'' adult men labeled <math>\scriptstyle M_1,M_2,\ldots,M_N</math> have been randomly drawn whose heights are <math>\scriptstyle x_1,x_2,\ldots,x_N</math>. An important, yet subtle point, to note here is that due to random sampling the data sample <math>\scriptstyle x_1,x_2,\ldots,x_N</math> obtained is actually an ''instance'' or ''realization'' of a sequence of ''independent'' random variables <math>\scriptstyle X_1,X_2,\ldots,X_N</math> with each random variable <math>\scriptstyle X_i</math> being distributed identically according to the distribution of ''X'' (that is, each <math>\scriptstyle X_i</math> has the distribution ''F''). Such a sequence <math>\scriptstyle X_1,X_2,\ldots,X_N</math> is referred to in statistics as ''independent and identically distributed'' (i.i.d) random variables. To further clarify this point, suppose after there are two other investigators, Tim and Allen, also interested in the same quantitative study and they in turn also randomly sample ''N'' adult males from the population of ''C''. Let Tim's height data sample be <math>\scriptstyle y_1,y_2,\ldots,y_N</math> and Allen's be <math>\scriptstyle z_1,z_2,\ldots,z_N</math>, then both samples are another realization of the i.i.d sequence <math>\scriptstyle X_1,X_2,\ldots,X_N</math>, just as the first sample <math>\scriptstyle x_1,x_2,\ldots,x_N</math> was.
+Suppose that ''N'' adult men labeled <math>\scriptstyle M_1,M_2,\ldots,M_N</math> have been randomly drawn whose heights are <math>\scriptstyle x_1,x_2,\ldots,x_N</math>. An important, yet subtle point, to note here is that, due to random sampling, the data sample <math>\scriptstyle x_1,x_2,\ldots,x_N</math> obtained is actually an ''instance'' or ''realization'' of a sequence of ''independent'' random variables <math>\scriptstyle X_1,X_2,\ldots,X_N</math> with each random variable <math>\scriptstyle X_i</math> being distributed identically according to the distribution of ''X'' (that is, each <math>\scriptstyle X_i</math> has the distribution ''F''). Such a sequence <math>\scriptstyle X_1,X_2,\ldots,X_N</math> is referred to in statistics as ''independent and identically distributed'' (i.i.d) random variables. To further clarify this point, suppose that there are two other investigators, Tim and Allen, also interested in the same quantitative study and they in turn also randomly sample ''N'' adult males from the population of ''C''. Let Tim's height data sample be <math>\scriptstyle y_1,y_2,\ldots,y_N</math> and Allen's be <math>\scriptstyle z_1,z_2,\ldots,z_N</math>, then both samples are another realization of the i.i.d sequence <math>\scriptstyle X_1,X_2,\ldots,X_N</math>, just as the first sample <math>\scriptstyle x_1,x_2,\ldots,x_N</math> was.
 From a dats sample collected, say in this case <math>\scriptstyle x_1,x_2,\ldots,x_N</math>, one can construct a statistic ''T'' as <math>\scriptstyle T=f(x_1,x_2,\ldots,x_N)</math> for any real-valued function ''f'' which is [[measurable|measurable function]] (here with respect to the [[Borel set]]s of <math>\scriptstyle \mathbb{R}^N</math>). Two examples of commonly used statistics are:
@@ Line 16: / Line 16: @@
 #<math>\scriptstyle T\,=\,\bar{x}\,=\,\frac{x_1+x_2+\ldots+x_N}{N}</math>. This statistic is known as the ''sample mean''
 #<math>\scriptstyle T\,=\,\sqrt{\sum_{i=1}^{N} (x_i-\bar{x})^2/N}</math>. This statistic is known as the ''sample standard deviation''. Sometimes, the alternative formula <math>\scriptstyle T\,=\,\sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i-\bar{x})^2}</math> is preferred because it is an [[unbiased estimator]] of the [[standard deviation]] of ''X''
 ==See also==
 *[[Coefficient of correlation]]
