Statistics theory: Difference between revisions

Revision as of 23:01, 27 February 2009


Statistics is a mathematical approach to describe something, predict an event, or analyze the relationship between things. Statistical analysis can, for example, describe the average income of a population, test whether two groups have the same average income, or analyze factors that might explain the income level for a particular group.

The theory of statistics refers primarily to a branch of mathematics that specializes in enumeration, or counted, data and their relation to measured data.^[1]^[2] It may also refer to a fact of classification, which is the chief source of all statistics, and has a relationship to psychometric applications in the social sciences.

An individual statistic refers to a derived numerical value, such as a mean, a coefficient of correlation, or some other single concept of descriptive statistics. It may also refer to an idea associated with an average, such as a median or standard deviation, or to some other value computed from a set of data.^[3]

More precisely, in mathematical statistics, and in general usage, a statistic is defined as any measurable function of a data sample.^[4] A data sample is described by instances of a random variable of interest, such as a height, weight, polling results, test performance, etc., obtained by random sampling of a population.

Simple illustration

Suppose one wishes to study the height of adult males in some country C. How should one go about doing this, and how can the data be summarized? In statistics, the approach taken is to model the quantity of interest, i.e., "height of adult men from the country C", as a random variable X, say, taking on values in [0,5] (measured in metres) and distributed according to some unknown probability distribution^[5] F on [0,5]. One important theme in statistics is the development of theoretically sound methods (firmly grounded in probability theory) for learning something about the postulated random variable X and its distribution F by collecting samples; in this example, the heights of a number of men randomly drawn from the adult male population of C.

Suppose that N men labeled $\scriptstyle M_{1},M_{2},\ldots ,M_{N}$ have been drawn by simple random sampling (meaning that each man in the population is equally likely to be selected in the sampling process), and that their heights are $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$, respectively. An important yet subtle point is that, because of the random sampling, the data sample $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$ obtained is actually an instance, or realization, of a sequence of independent random variables $\scriptstyle X_{1},X_{2},\ldots ,X_{N}$, with each random variable $\scriptstyle X_{i}$ distributed identically according to the distribution of $\scriptstyle X$ (that is, each $\scriptstyle X_{i}$ has the distribution F). Such a sequence $\scriptstyle X_{1},X_{2},\ldots ,X_{N}$ is referred to in statistics as independent and identically distributed (i.i.d.) random variables. To clarify this point further, suppose that two other investigators, Tim and Allen, are interested in the same quantitative study and that each of them also randomly samples N adult males from the population of C. Let Tim's height data sample be $\scriptstyle y_{1},y_{2},\ldots ,y_{N}$ and Allen's be $\scriptstyle z_{1},z_{2},\ldots ,z_{N}$; then both samples are also realizations of the i.i.d. sequence $\scriptstyle X_{1},X_{2},\ldots ,X_{N}$, just as the first sample $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$ was.
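The idea that three investigators obtain three different realizations of the same i.i.d. sequence can be sketched in a few lines of Python. This is only an illustration: the true distribution F is unknown in the text, so the normal distribution with mean 1.75 m and standard deviation 0.07 m used below is an assumed stand-in, and the names `draw_sample`, `x`, `y` and `z` are hypothetical.

```python
import random

N = 5
rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_sample(n):
    """Return n independent draws from the assumed stand-in for F.

    The normal(1.75, 0.07) choice is an assumption for illustration only;
    the article treats F as unknown.
    """
    return [rng.gauss(1.75, 0.07) for _ in range(n)]


# Three investigators each obtain their own realization of X_1, ..., X_N.
x = draw_sample(N)  # first investigator
y = draw_sample(N)  # Tim
z = draw_sample(N)  # Allen

for name, sample in (("x", x), ("y", y), ("z", z)):
    print(name, [round(v, 2) for v in sample])
```

Each list has the same length and comes from the same distribution, yet the printed numbers differ from one investigator to the next, which is the sense in which a data sample is only one realization of the random variables.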

From a data sample $\scriptstyle x_{1},x_{2},\ldots ,x_{N}$ one may define a statistic T as $\scriptstyle T=f(x_{1},x_{2},\ldots ,x_{N})$ for some real-valued function f which is measurable (here with respect to the Borel sets of $\scriptstyle \mathbb {R} ^{N}$ ). Two examples of commonly used statistics are:

$T\,=\,{\bar {x}}\,=\,{\frac {x_{1}+x_{2}+\ldots +x_{N}}{N}}$. This statistic is known as the sample mean.
$T\,=\,\sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2}/N$. This statistic is known as the sample variance. Often the alternative definition $T\,=\,{\frac {1}{N-1}}\sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2}$ of sample variance is preferred because it is an unbiased estimator of the variance of X, while the former is a biased estimator.
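The two statistics above, together with the alternative variance definition, can be written directly as functions of a data sample. The sketch below uses only the Python standard library; the function names and the five height values are hypothetical and serve only to show the arithmetic.

```python
from statistics import fmean


def sample_mean(xs):
    """Sample mean: (x_1 + ... + x_N) / N."""
    return fmean(xs)


def variance_biased(xs):
    """Sample variance with divisor N (the biased estimator)."""
    m = fmean(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)


def variance_unbiased(xs):
    """Sample variance with divisor N - 1 (the unbiased estimator)."""
    m = fmean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


heights = [1.70, 1.82, 1.65, 1.78, 1.75]  # hypothetical heights in metres

print(sample_mean(heights))
print(variance_biased(heights))
print(variance_unbiased(heights))
```

For any sample with at least two distinct values, the divisor N - 1 gives a slightly larger result than the divisor N, and the gap narrows as N grows.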

Transforming data

Statisticians may transform data by taking the logarithm, square root, reciprocal, or other function when the data do not fit a normal distribution.^[6]^[7] Confidence intervals calculated on the transformed scale need to be transformed back to the original scale before they are presented.^[8]
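A minimal sketch of this workflow for a logarithmic transformation follows. The data values are hypothetical, and the multiplier 1.96 assumes a normal approximation; for a sample this small a t-based multiplier would normally be used instead. Note that exponentiating the mean of the logarithms gives the geometric mean, so the back-transformed interval is an interval for the geometric mean rather than for the arithmetic mean.

```python
import math
from statistics import fmean, stdev

# Hypothetical right-skewed measurements (illustration only).
data = [1.2, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0]

# Step 1: transform to the log scale.
logs = [math.log(v) for v in data]

# Step 2: mean, standard error and interval on the log scale.
m = fmean(logs)
se = stdev(logs) / math.sqrt(len(logs))
lo_log, hi_log = m - 1.96 * se, m + 1.96 * se

# Step 3: transform back to the original scale for presentation.
geometric_mean = math.exp(m)
ci = (math.exp(lo_log), math.exp(hi_log))

print(geometric_mean, ci)
```

The back-transformed interval is not symmetric about the geometric mean, which is expected: symmetry holds on the log scale, not on the original scale.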

Summary statistics

Descriptive statistics

Measurements of central tendency

Mean
Median

Measurements of variation

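Standard deviation (SD) is a measure of variation or scatter. The standard deviation does not systematically shrink as the sample size increases; a larger sample simply gives a more precise estimate of the scatter in the population.
Variance is the square of the standard deviation:

$s^{2}$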

Standard error of the mean (SEM) measures how accurately you know the mean of a population and, for any sample of more than one observation, is always smaller than the SD.^[9] The SEM becomes smaller as the sample size increases. The sample standard deviation (s) and the SEM are related by:

$SE_{\bar {x}}\ ={\frac {s}{\sqrt {n}}}$

Using the normal approximation, the 95% confidence interval for the mean is ${\bar {x}}\pm 1.96\times SE_{\bar {x}}$.
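The relations between the standard deviation, the variance, the standard error of the mean and the 95% confidence interval can be checked numerically. The sketch below is one possible arrangement using the Python standard library; the function name `summarize` and the data are hypothetical, and the 1.96 multiplier assumes the normal approximation.

```python
import math
from statistics import fmean, stdev


def summarize(xs):
    """Return the mean, SD, variance, SEM and an approximate 95% CI."""
    n = len(xs)
    mean = fmean(xs)
    s = stdev(xs)            # sample SD (divisor n - 1)
    sem = s / math.sqrt(n)   # standard error of the mean
    half_width = 1.96 * sem  # normal-approximation multiplier
    return {
        "n": n,
        "mean": mean,
        "sd": s,
        "variance": s ** 2,
        "sem": sem,
        "ci95": (mean - half_width, mean + half_width),
    }


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
result = summarize(values)
for key, value in result.items():
    print(key, value)
```

Running `summarize` on a longer sample from the same source would be expected to leave the SD roughly unchanged while narrowing the SEM and the confidence interval.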

Inferential statistics and hypothesis testing

For more information, see: Statistical significance.

Problems in reporting of statistics

In medicine, common problems in the reporting and usage of statistics have been inventoried.^[10] These problems tend to exaggerate treatment differences.

References

[1] Trapp, Robert; Beth Dawson (2004). Basic & clinical biostatistics. New York: Lange Medical Books/McGraw-Hill. LCC QH323.5 .D38 LCCN 2005-263. ISBN 0-07-141017-1.
[2] Mosteller, Frederick; Bailar, John Christian (1992). Medical uses of statistics. Boston, Mass: NEJM Books. ISBN 0-910133-36-0.
[3] Guilford, J.P., Fruchter, B. (1978). Fundamental statistics in psychology and education. New York: McGraw-Hill.
[4] Shao, J. (2003). Mathematical Statistics (2 ed.). Springer Texts in Statistics. New York: Springer-Verlag, p. 100.
[5] This is the case in non-parametric statistics. On the other hand, in parametric statistics the underlying distribution is assumed to be of some particular type, say a normal or exponential distribution, but with unknown parameters that are to be estimated.
[6] Bland JM, Altman DG (March 1996). "Transforming data". BMJ 312 (7033): 770. PMID 8605469. PMC 2350481.
[7] Bland JM, Altman DG (May 1996). "The use of transformation when comparing two means". BMJ 312 (7039): 1153. PMID 8620137. PMC 2350653.
[8] Bland JM, Altman DG (April 1996). "Transformations, means, and confidence intervals". BMJ 312 (7038): 1079. PMID 8616417. PMC 2350916.
[9] What is the difference between "standard deviation" and "standard error of the mean"? Which should I show in tables and graphs? Retrieved on 2008-09-18.
[10] Pocock SJ, Hughes MD, Lee RJ (August 1987). "Statistical problems in the reporting of clinical trials. A survey of three medical journals". N. Engl. J. Med. 317 (7): 426–32. PMID 3614286.


@@ Line 1: / Line 1: @@
 {{subpages}}
-The main point of using statistics is either to describe something, predict something, or analyze the relationship between things. Statistics can, for example, describe the average income of a population, test whether two different groups have the same average income, or analyze what kinds of things might explain why a some group has the income level it does.
+'''Statistics''' is a mathematical approach to describe something, predict an event, or analyze the relationship between things. Statistical analysis can, for example, describe the average income of a population, test whether two groups have the same average income, or analyze factors that might explain the income level for a particular group.
-The theory of '''Statistics''' refers primarily to a branch of [[mathematics]] that specializes in enumeration, or counted, [[data]] and their relation to measured [[data]].<ref name="isbn0-07-141017-1">{{cite book |author=Trapp, Robert; Beth Dawson |authorlink= |editor= |others= |title=Basic & clinical biostatistics
+The theory of statistics refers primarily to a branch of [[mathematics]] that specializes in enumeration, or counted, [[data]] and their relation to measured [[data]].<ref name="isbn0-07-141017-1">{{cite book |author=Trapp, Robert; Beth Dawson |authorlink= |editor= |others= |title=Basic & clinical biostatistics
 |chapter=
 |chapterurl=
