Law of the unconscious statistician

In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function $g(X)$ of a random variable $X$ in terms of $g$ and the probability distribution of $X$.

The form of the law depends on the type of random variable $X$ in question. If the distribution of $X$ is discrete and one knows its probability mass function $p_X$, then the expected value of $g(X)$ is

\operatorname {E} [g(X)]=\sum _{x}g(x)p_{X}(x),\,

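where the sum is over all possible values $x$ of $X$. If instead the distribution of $X$ is continuous with probability density function $f_X$, then the expected value of $g(X)$ is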

\operatorname {E} [g(X)]=\int _{-\infty }^{\infty }g(x)f_{X}(x)\,\mathrm {d} x

Both of these special cases can be expressed in terms of the cumulative distribution function $F_X$ of $X$, with the expected value of $g(X)$ now given by the Lebesgue–Stieltjes integral

\operatorname {E} [g(X)]=\int _{-\infty }^{\infty }g(x)\,\mathrm {d} F_{X}(x).

In even greater generality, $X$ could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral. In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.
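As an informal illustration of the discrete and continuous formulas above, the following sketch evaluates both for $g(x) = x^2$. The choices of a fair six-sided die and of a uniform distribution on $[0, 1]$, as well as the helper name `midpoint_integral`, are assumptions made for this example only.

```python
from fractions import Fraction

# Discrete case: X is a fair six-sided die (an illustrative assumption).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def g(x):
    return x * x


# LOTUS: sum g(x) * p_X(x) over the values of X.
# The distribution of g(X) itself is never constructed.
expected_discrete = sum(g(x) * p for x, p in pmf.items())
print(expected_discrete)  # 91/6


def midpoint_integral(h, a, b, n=100_000):
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return width * sum(h(a + (k + 0.5) * width) for k in range(n))


# Continuous case: X is uniform on [0, 1], so f_X(x) = 1 on that interval.
expected_continuous = midpoint_integral(lambda x: g(x) * 1.0, 0.0, 1.0)
print(expected_continuous)  # numerically close to 1/3
```

In both computations only $g$ and the distribution of $X$ are used, which is the content of the law.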

Etymology

This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the identity as the very definition of the expected value, rather than (more formally) as a consequence of its true definition.^[1] The naming is sometimes attributed to Sheldon Ross' textbook Introduction to Probability Models, although he removed the reference in later editions.^[2] Many statistics textbooks do present the result as the definition of expected value.^[3]

Joint distributions

A similar property holds for joint distributions, or equivalently, for random vectors. For discrete random variables $X$ and $Y$, a function $g$ of two variables, and joint probability mass function $p_{X,Y}(x,y)$:^[4]

\operatorname {E} [g(X,Y)]=\sum _{y}\sum _{x}g(x,y)p_{X,Y}(x,y)

In the absolutely continuous case, with $f_{X,Y}(x,y)$ being the joint probability density function,

\operatorname {E} [g(X,Y)]=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }g(x,y)f_{X,Y}(x,y)\,\mathrm {d} x\,\mathrm {d} y
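The discrete joint formula can be checked on a small example. The sketch below assumes, purely for illustration, that $X$ and $Y$ are two independent fair dice and that $g(x, y) = \max(x, y)$; the double sum runs directly over the joint probability mass function.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of two independent fair dice (an illustrative assumption).
joint_pmf = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def g(x, y):
    return max(x, y)


# LOTUS for a random vector: sum g(x, y) * p_{X,Y}(x, y) over all pairs.
expected_max = sum(g(x, y) * p for (x, y), p in joint_pmf.items())
print(expected_max)  # 161/36
```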

Special cases

A number of special cases are given here. In the simplest case, where the random variable $X$ takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if $X$ is a discrete random vector or even a discrete random element.

The case of a continuous random variable is more delicate, since the proof in full generality requires subtle forms of the change-of-variables formula for integration. However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable then follows as a special case by means of the Radon–Nikodym theorem.

Discrete case

Suppose that $X$ is a random variable which takes on only finitely or countably many different values $x_1, x_2, \ldots$, with probabilities $p_1, p_2, \ldots$. Then for any function $g$ of these values, the random variable $g(X)$ has values $g(x_1), g(x_2), \ldots$, although some of these may coincide with each other. For example, this is the case if $X$ can take on both values $1$ and $-1$ and $g(x) = x^2$.

Let $y_1, y_2, \ldots$ enumerate the possible distinct values of $g(X)$, and for each $i$ let $I_i$ denote the collection of all $j$ with $g(x_j) = y_i$. Then, according to the definition of expected value,

\operatorname {E} [g(X)]=\sum _{i}y_{i}p_{g(X)}(y_{i}).

Since a $y_{i}$ can be the image of multiple, distinct $x_{j}$ , it holds that

p_{g(X)}(y_{i})=\sum _{j\in I_{i}}p_{X}(x_{j}).

Then the expected value can be rewritten as

\sum _{i}y_{i}p_{g(X)}(y_{i})=\sum _{i}y_{i}\sum _{j\in I_{i}}p_{X}(x_{j})=\sum _{i}\sum _{j\in I_{i}}g(x_{j})p_{X}(x_{j})=\sum _{x}g(x)p_{X}(x).

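This equality relates the average of the outputs of $g(X)$ as weighted by the probabilities of the outputs themselves to the average of the outputs of $g(X)$ as weighted by the probabilities of the outputs of $X$.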

If $X$ takes on only finitely many possible values, the above is fully rigorous. However, if $X$ takes on countably many values, the last equality given does not always hold, as seen by the Riemann series theorem. Because of this, it is necessary to assume the absolute convergence of the sums in question.^[5]
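The regrouping step in the argument above can be traced on a finite example. The following sketch assumes a hypothetical four-point distribution for $X$ and takes $g(x) = x^2$, so that the values $-1$ and $1$ collide; it builds the probability mass function of $g(X)$ by summing over each index set $I_i$ and compares the two ways of computing the expectation.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical pmf of X, chosen so that g maps -1 and 1 to the same value.
pmf_x = {
    -1: Fraction(1, 2),
    0: Fraction(1, 4),
    1: Fraction(1, 8),
    2: Fraction(1, 8),
}


def g(x):
    return x * x


# p_{g(X)}(y_i) is the sum of p_X(x_j) over all j with g(x_j) = y_i.
pmf_g = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_g[g(x)] += p

# Definition of expected value, applied to the random variable g(X).
by_definition = sum(y * p for y, p in pmf_g.items())

# LOTUS: the same number, computed without forming the pmf of g(X).
by_lotus = sum(g(x) * p for x, p in pmf_x.items())

print(dict(pmf_g))              # masses 5/8 at 1, 1/4 at 0, 1/8 at 4
print(by_definition, by_lotus)  # 9/8 9/8
```

Because the example is finite, no convergence question arises; exact rational arithmetic makes the two sums agree identically.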

Continuous case

Suppose that $X$ is a random variable whose distribution has a continuous density $f$. If $g$ is a general function, then the probability that $g(X)$ is valued in a set of real numbers $K$ equals the probability that $X$ is valued in $g^{-1}(K)$, which is given by

\int _{g^{-1}(K)}f(x)\,\mathrm {d} x.

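Under various conditions on $g$, the change-of-variables formula for integration can be applied to relate this to an integral over $K$, and hence to identify the density of $g(X)$ in terms of the density of $X$. In the simplest case, if $g$ is differentiable with nowhere-vanishing derivative, then the above integral can be written as

\int _{K}f(g^{-1}(y))(g^{-1})'(y)\,\mathrm {d} y,

thereby identifying $g(X)$ as possessing the density $f(g^{-1}(y))(g^{-1})'(y)$. The expected value of $g(X)$ is then identified as

\int _{-\infty }^{\infty }yf(g^{-1}(y))(g^{-1})'(y)\,\mathrm {d} y=\int _{-\infty }^{\infty }g(x)f(x)\,\mathrm {d} x,

where the equality follows by another use of the change-of-variables formula for integration. This shows that the expected value of $g(X)$ is encoded entirely by the function $g$ and the density $f$ of $X$.^[6]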
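The two sides of the change-of-variables identity can be compared numerically. The sketch below assumes, for illustration only, that $X$ has density $f(x) = 2x$ on $[0, 1]$ and that $g(x) = e^x$, which is increasing with nowhere-vanishing derivative; the helper `midpoint_integral` is a hypothetical name for a basic quadrature rule. Both integrals should be close to the exact value $2$.

```python
import math


def midpoint_integral(h, a, b, n=100_000):
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return width * sum(h(a + (k + 0.5) * width) for k in range(n))


def f(x):
    """Assumed density of X: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


g = math.exp           # increasing, derivative never vanishes
g_inverse = math.log   # inverse of g on (0, infinity)


def g_inverse_derivative(y):
    return 1.0 / y


def density_of_g_of_x(y):
    """Density of g(X) given by the change-of-variables formula."""
    return f(g_inverse(y)) * g_inverse_derivative(y)


# g(X) takes values in [g(0), g(1)] = [1, e].
via_density_of_g = midpoint_integral(lambda y: y * density_of_g_of_x(y), 1.0, math.e)

# LOTUS: integrate g(x) f(x) over the range of X instead.
via_lotus = midpoint_integral(lambda x: g(x) * f(x), 0.0, 1.0)

print(via_density_of_g, via_lotus)  # both numerically close to 2
```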

The assumption that $g$ is differentiable with nonvanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as $g(x) = x^2$. The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis such as Sard's theorem and the coarea formula. In even greater generality, using the Lebesgue theory as below, it can be found that the identity

\operatorname {E} [g(X)]=\int _{-\infty }^{\infty }g(x)f(x)\,\mathrm {d} x

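holds true whenever $X$ has a density $f$ (which does not have to be continuous) and whenever $g$ is a measurable function for which $g(X)$ has finite expected value. (Every continuous function is measurable.) Furthermore, without modification to the proof, this holds even if $X$ is a random vector (with density) and $g$ is a multivariable function; the integral is then taken over the multi-dimensional range of values of $X$.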
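As a worked instance of the case $g(x) = x^2$, which the simple change-of-variables argument does not cover, suppose (as an assumption for this example) that $X$ is uniformly distributed on $[-1, 1]$, so that $f(x) = 1/2$ there. The density of $Y = X^2$ can be found by hand from its distribution function, and the two computations of the expected value agree:

```latex
\begin{align*}
P(Y \le y) &= P(-\sqrt{y} \le X \le \sqrt{y}) = \sqrt{y}, \qquad 0 \le y \le 1, \\
f_Y(y) &= \frac{1}{2\sqrt{y}}, \qquad 0 < y < 1, \\
\int_0^1 y \, f_Y(y) \, \mathrm{d}y &= \int_0^1 \frac{\sqrt{y}}{2} \, \mathrm{d}y = \frac{1}{3}, \\
\int_{-1}^{1} g(x) f(x) \, \mathrm{d}x &= \int_{-1}^{1} \frac{x^2}{2} \, \mathrm{d}x = \frac{1}{3}.
\end{align*}
```

The last line is the LOTUS computation, which never requires the density $f_Y$.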

Measure-theoretic formulation

An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral. Here, the setting is that of a measure space $(Ω, μ)$ and a measurable map $X$ from $Ω$ to a measurable space $Ω'$. The theorem then says that for any measurable function $g$ on $Ω'$ which is valued in real numbers (or even the extended real number line), one has

\int _{\Omega }g\circ X\,\mathrm {d} \mu =\int _{\Omega '}g\,\mathrm {d} (X_{\sharp }\mu ),

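(interpreted as saying, in particular, that either side of the equality exists if the other side exists). Here $X_{\sharp}μ$ denotes the pushforward measure on $Ω'$. The 'discrete case' given above is the special case arising when $X$ takes on only countably many values and $μ$ is a probability measure. In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows therefrom by an application of the monotone convergence theorem.^[7] Without any major changes, the result can also be formulated in the setting of outer measures.^[8]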
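On a finite set, the pushforward identity reduces to a rearrangement of a finite sum, which the following sketch checks directly. The three-point space, the weights in `mu` (total mass 3, so not a probability measure), the map `X`, and the function `g` are all hypothetical choices made for this example.

```python
from collections import defaultdict
from fractions import Fraction

# A finite measure space: points of Omega with their masses under mu.
# The total mass is 3, so mu is deliberately not a probability measure.
mu = {"a": Fraction(1, 2), "b": Fraction(1), "c": Fraction(3, 2)}

# A map X from Omega to Omega' = {0, 1}.
X = {"a": 0, "b": 1, "c": 1}


def g(w):
    """A real-valued function on Omega'."""
    return 10 * w + 1


# Pushforward measure: the mass of a point w in Omega' is mu(X^{-1}({w})).
pushforward = defaultdict(Fraction)
for omega, mass in mu.items():
    pushforward[X[omega]] += mass

# Left side: integral of g composed with X, taken over Omega against mu.
integral_over_omega = sum(g(X[omega]) * mass for omega, mass in mu.items())

# Right side: integral of g over Omega' against the pushforward measure.
integral_over_image = sum(g(w) * mass for w, mass in pushforward.items())

print(integral_over_omega, integral_over_image)  # 28 28
```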

If $μ$ is a σ-finite measure, the theory of the Radon–Nikodym derivative is applicable. In the special case that the measure $X_{\sharp}μ$ is absolutely continuous relative to some background σ-finite measure $ν$ on $Ω'$, there is a real-valued function $f_X$ on $Ω'$ representing the Radon–Nikodym derivative of the two measures, and then

\int _{\Omega '}g\,\mathrm {d} (X_{\sharp }\mu )=\int _{\Omega '}gf_{X}\,\mathrm {d} \nu .

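In the further special case that $Ω'$ is the real number line, as in the contexts discussed above, it is natural to take $ν$ to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever $μ$ is a probability measure. (In this special case, the condition of σ-finiteness is vacuous, since Lebesgue measure and every probability measure are σ-finite.)^[9]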

References

[1] DeGroot & Schervish 2014, pp. 213–214.
[2] Casella & Berger 2001, Section 2.2; Ross 2019.
[3] Casella & Berger 2001, Section 2.2.
[4] Ross 2019.
[5] Feller 1968, Section IX.2.
[6] Papoulis & Pillai 2002, Chapter 5.
[7] Bogachev 2007, Section 3.6; Cohn 2013, Section 2.6; Halmos 1950, Section 39.
[8] Federer 1969, Section 2.4.
[9] Halmos 1950, Section 39.

Bogachev, V. I. (2007). Measure theory. Volume I. Berlin: Springer-Verlag. doi:10.1007/978-3-540-34514-5. ISBN 978-3-540-34513-8. MR 2267655. Zbl 1120.28001.
Casella, George; Berger, Roger L. (2001). Statistical inference. Duxbury Advanced Series (Second edition of 1990 original ed.). Pacific Grove, CA: Duxbury. ISBN 0-534-11958-1. Zbl 0699.62001.
Cohn, Donald L. (2013). Measure theory. Birkhäuser Advanced Texts: Basler Lehrbücher (Second edition of 1980 original ed.). New York: Birkhäuser/Springer. doi:10.1007/978-1-4614-6956-8. ISBN 978-1-4614-6955-1. MR 3098996. Zbl 1292.28002.
DeGroot, Morris H.; Schervish, Mark J. (2014). Probability and statistics (Fourth edition of 1975 original ed.). Pearson Education. ISBN 0-321-50046-6. MR 0373075. Zbl 0619.62001.
Federer, Herbert (1969). Geometric measure theory. Die Grundlehren der mathematischen Wissenschaften. Vol. 153. Berlin–Heidelberg–New York: Springer-Verlag. doi:10.1007/978-3-642-62010-2. ISBN 978-3-540-60656-7. MR 0257325. Zbl 0176.00801.
Feller, William (1968). An introduction to probability theory and its applications. Volume I (Third edition of 1950 original ed.). New York–London–Sydney: John Wiley & Sons, Inc. MR 0228020. Zbl 0155.23101.
Halmos, Paul R. (1950). Measure theory. New York: D. Van Nostrand Co., Inc. doi:10.1007/978-1-4684-9440-2. MR 0033869. Zbl 0040.16802.
Papoulis, Athanasios; Pillai, S. Unnikrishna (2002). Probability, random variables, and stochastic processes (Fourth edition of 1965 original ed.). New York: McGraw-Hill. ISBN 0-07-366011-6.
Ross, Sheldon M. (2019). Introduction to probability models (Twelfth edition of 1972 original ed.). London: Academic Press. doi:10.1016/C2017-0-01324-1. ISBN 978-0-12-814346-9. MR 3931305. Zbl 1408.60002.