Suppose \(X_1\), \(\dots\), \(X_n\) is a sample of independent values from a population. The order statistics \(X_{(1)}\), \(\dots\), \(X_{(n)}\) are the sample values arranged in ascending order. The value \(X_{(i)}\) is called the \(i\)-th order statistic.
Example:
Find the \(7\)-th order statistic of the following sample of iris petal lengths:
1.3, 6.6, 1.4, 1.4, 4.5, 5.1, 4.9, 5.4, 5.1, 1.9
Solution:
Rearranging the sample in ascending order yields
1.3, 1.4, 1.4, 1.9, 4.5, 4.9, 5.1, 5.1, 5.4, 6.6
The \(7\)-th order statistic is 5.1, the 7-th value in this ordered set.
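The same lookup can be sketched in Python. This is a minimal illustration, not part of the original lab; the function name `order_statistic` is a hypothetical helper introduced here, and the index `i` is 1-based to match the notation \(X_{(i)}\).

```python
def order_statistic(sample, i):
    """Return the i-th order statistic (1-based) of a sample."""
    if not 1 <= i <= len(sample):
        raise ValueError("i must be between 1 and the sample size")
    # Sort in ascending order, then shift to Python's 0-based indexing.
    return sorted(sample)[i - 1]


petal_lengths = [1.3, 6.6, 1.4, 1.4, 4.5, 5.1, 4.9, 5.4, 5.1, 1.9]
print(order_statistic(petal_lengths, 7))  # 5.1
```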
Suppose \(m\) is the median of the population. For a fixed confidence level \(1-\alpha\) and sample size \(n\), we look for \(i\) and \(j\) such that the probability \(P(X_{(i)}\le m\le X_{(j)})\) that the interval \([X_{(i)}, X_{(j)}]\) of a random sample contains \(m\) is \(1-\alpha\). By symmetry, we may take \(j=n+1-i\).
The interval \([X_{(k)}, X_{(k+1)}]\) of a random sample contains the median \(m\) if and only if exactly \(k\) values in the sample are no greater than the population median. Denote by \(Z\) the number of \(X_i\) with \(X_i\le m\). Then \(Z\) is a binomial random variable with \(n\) mutually independent trials and probability of success \(p=P(X_i\le m)=0.5\). Then $$P(X_{(k)}\le m\le X_{(k+1)})=P(Z=k)={n\choose k}p^k(1-p)^{n-k}={n\choose k}\left(\frac12\right)^n,$$ and $$1-\alpha=P(X_{(i)}\le m\le X_{(n+1-i)})=\sum\limits_{k=i}^{n+1-i}P(Z=k)=\sum\limits_{k=i}^{n+1-i}{n\choose k}\left(\frac12\right)^n.$$
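To determine \(i\) for a given \(n\), we need to calculate the cumulative probability \(\sum\limits_{k=i}^{n+1-i}P(Z=k)\) for the binomial variable \(Z\). The values for \(5\le n\le 20\) are tabulated below; for each \(n\), the interval shown uses the largest \(i\) for which this sum is at least \(0.95\).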
\(n\) | CI | Probability |
---|---|---|
5 | \([X_{(1)}, X_{(5)}]\) | 0.969 |
6 | \([X_{(1)}, X_{(6)}]\) | 0.984 |
7 | \([X_{(1)}, X_{(7)}]\) | 0.992 |
8 | \([X_{(2)}, X_{(7)}]\) | 0.961 |
9 | \([X_{(2)}, X_{(8)}]\) | 0.979 |
10 | \([X_{(2)}, X_{(9)}]\) | 0.988 |
11 | \([X_{(3)}, X_{(9)}]\) | 0.961 |
12 | \([X_{(3)}, X_{(10)}]\) | 0.978 |
13 | \([X_{(3)}, X_{(11)}]\) | 0.987 |
14 | \([X_{(4)}, X_{(11)}]\) | 0.965 |
15 | \([X_{(4)}, X_{(12)}]\) | 0.979 |
16 | \([X_{(5)}, X_{(12)}]\) | 0.951 |
17 | \([X_{(5)}, X_{(13)}]\) | 0.969 |
18 | \([X_{(5)}, X_{(14)}]\) | 0.981 |
19 | \([X_{(6)}, X_{(14)}]\) | 0.959 |
20 | \([X_{(6)}, X_{(15)}]\) | 0.973 |
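The tabulated values can be recomputed with a short Python sketch. It evaluates the binomial sum with the summation limits exactly as written in the formula above (\(k\) from \(i\) to \(n+1-i\)). The function names `binomial_sum` and `largest_index` are hypothetical helpers introduced here, and the 0.95 threshold is an assumption inferred from the table entries rather than stated in the text.

```python
from math import comb


def binomial_sum(n, lo, hi):
    """Sum of P(Z = k) for k = lo..hi, where Z ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n


def largest_index(n, level=0.95):
    """Largest i with binomial_sum(n, i, n + 1 - i) >= level (None if none).

    The threshold `level` is an assumption, not a value given in the text.
    """
    best = None
    for i in range(1, (n + 1) // 2 + 1):
        if binomial_sum(n, i, n + 1 - i) >= level:
            best = i
    return best


# Print n, i, j = n + 1 - i, and the sum for each sample size in the table.
for n in range(5, 21):
    i = largest_index(n)
    print(n, i, n + 1 - i, f"{binomial_sum(n, i, n + 1 - i):.4f}")
```

Because the sum shrinks as \(i\) grows, the loop simply keeps the last \(i\) that still meets the threshold.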
When \(n>21\), the binomial random variable \(Z\) is approximately normal with mean \(\mu=np=0.5n\) and standard deviation \(\sigma=\sqrt{np(1-p)}=\sqrt{0.25n}\).
Then the confidence interval \([X_{(i)}, X_{(j)}]\) at the confidence level \(1-\alpha\) is approximately given by $$i=\lfloor \mu-z_{\alpha/2}\sigma \rfloor=\lfloor 0.5n-z_{\alpha/2}\sqrt{0.25n}\rfloor$$ $$j=\lceil \mu+z_{\alpha/2}\sigma\rceil=\lceil 0.5n+z_{\alpha/2}\sqrt{0.25n}\rceil.$$
For more information, see the discussion of the central limit theorem for sample medians: https://stats.stackexchange.com/questions/45124/central-limit-theorem-for-sample-medians?noredirect=1&lq=1
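The formulas for \(i\) and \(j\) under the normal approximation can be sketched in Python as follows. The function name `median_ci_indices` is a hypothetical helper introduced here, and `statistics.NormalDist` from the standard library is used to obtain \(z_{\alpha/2}\). The sketch does not check that \(i\ge 1\) and \(j\le n\), which is reasonable only for the larger sample sizes the approximation is meant for.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist


def median_ci_indices(n, confidence=0.95):
    """Approximate (i, j) from the normal approximation to Binomial(n, 1/2)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    mu = 0.5 * n                             # mean of Z
    sigma = sqrt(0.25 * n)                   # standard deviation of Z
    i = floor(mu - z * sigma)
    j = ceil(mu + z * sigma)
    return i, j


print(median_ci_indices(24))  # (7, 17)
```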
Example:
The following are prices (in thousands) of 24 randomly selected houses in a certain city. Find a confidence interval for the median house price in that city.
745 | 808 | 899 | 929 | 949 | 988 | 1090 | 1100 | 1130 | 1140 | 1150 | 1190 |
1240 | 1350 | 1430 | 1500 | 1500 | 1600 | 1880 | 2000 | 2450 | 2480 | 3360 | 5600 |
Solution:
Since the sample size is 24, we may use the normal approximation to get the confidence interval. The mean and standard deviation for the normal approximation of the binomial distribution with success probability \(p=0.5\) and \(n=24\) trials are \(\mu=np=12\) and \(\sigma=\sqrt{np(1-p)}=\sqrt{6}\). At the \(1-\alpha=95\%\) confidence level, the \(z\)-score \(z_{\alpha/2}\) is approximately 1.96.
Then the orders of the bounds of the confidence interval are
$$i=\lfloor 12-1.96\times\sqrt{6} \rfloor \approx \lfloor 7.199 \rfloor = 7,$$ $$j=\lceil 12+1.96\times\sqrt{6}\rceil \approx \lceil 16.801 \rceil = 17.$$
Then \(X_{(7)}\) is 1090 and \(X_{(17)}\) is 1500. The confidence interval can be taken as \([1090, 1500]\).
The probability that \(m\) is in this interval is $$P(X_{(7)}\le m\le X_{(17)})\approx 0.977.$$
We are \(97.7\%\) confident that the median house price in this city is between 1090 and 1500 (in thousands).
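The whole calculation for this example can be reproduced with a short Python sketch. It uses the rounded value \(z_{\alpha/2}\approx 1.96\) quoted in the solution and evaluates the binomial sum with \(k\) running from \(i\) to \(j\), which is how the probability 0.977 above is obtained. The variable names are illustrative only.

```python
from math import ceil, comb, floor, sqrt

prices = [745, 808, 899, 929, 949, 988, 1090, 1100, 1130, 1140, 1150, 1190,
          1240, 1350, 1430, 1500, 1500, 1600, 1880, 2000, 2450, 2480, 3360,
          5600]

n = len(prices)
ordered = sorted(prices)
z = 1.96  # rounded z-score for the 95% level, as used in the solution

# Orders of the bounds from the normal approximation.
i = floor(0.5 * n - z * sqrt(0.25 * n))
j = ceil(0.5 * n + z * sqrt(0.25 * n))

# Order statistics X_(i) and X_(j); list indices are 0-based.
lower, upper = ordered[i - 1], ordered[j - 1]

# Binomial sum for k = i..j with Z ~ Binomial(n, 1/2).
prob = sum(comb(n, k) for k in range(i, j + 1)) / 2 ** n

print(i, j, lower, upper, round(prob, 3))  # 7 17 1090 1500 0.977
```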
Lab Instructions in Excel
Let \(Z\) be a standard normal random variable. In Excel, \(P(Z<z)\) is given by NORM.S.DIST(z,TRUE).
Let \(X\) be a normal random variable with mean \(\mu\) and standard deviation \(\sigma\), that is, \(X\sim \mathcal{N}(\mu, \sigma^2)\). In Excel, \(P(X<x)\) is given by NORM.DIST(x,mean,sd,TRUE).
When a cumulative probability \(p=P(X<x)\) of a normal random variable \(X\) is given, we can find \(x\) using NORM.INV(p,mean,sd).
When a cumulative probability \(p=P(Z<z)\) of a standard normal random variable \(Z\) is given, we can find \(z\) using NORM.S.INV(p).
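For readers working outside Excel, the four normal-distribution lookups above have close analogues in Python's standard library class `statistics.NormalDist`. The sketch below is only an illustration; the mean 100 and standard deviation 15 are hypothetical values chosen for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()          # standard normal: mean 0, sd 1
p = std_normal.cdf(1.96)           # analogue of NORM.S.DIST(1.96, TRUE)
z = std_normal.inv_cdf(0.975)      # analogue of NORM.S.INV(0.975)

X = NormalDist(mu=100, sigma=15)   # hypothetical mean and sd
px = X.cdf(130)                    # analogue of NORM.DIST(130, 100, 15, TRUE)
x = X.inv_cdf(0.975)               # analogue of NORM.INV(0.975, 100, 15)

print(round(p, 3), round(z, 2))    # 0.975 1.96
```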
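If a sample of size \(n\) has the sample proportion \(\hat{p}\) (written phat below) and the sampling distribution is approximately normal, the margin of error for the proportion can be obtained by the Excel function
CONFIDENCE.NORM(1-confidence level, SQRT(phat*(1-phat)), n)
Here CONFIDENCE.NORM(alpha, standard_dev, size) returns \(z_{\alpha/2}\cdot\)standard_dev\(/\sqrt{\text{size}}\), so the division by \(\sqrt{n}\) is already built in and the second argument should be \(\sqrt{\hat{p}(1-\hat{p})}\).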
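As a cross-check outside Excel, the margin of error \(z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\) can be computed directly in Python. This is a minimal sketch; the function name `proportion_margin_of_error` and the sample values \(\hat{p}=0.5\), \(n=100\) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin_of_error(phat, n, confidence=0.95):
    """Margin of error z_{alpha/2} * sqrt(phat * (1 - phat) / n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z * sqrt(phat * (1 - phat) / n)


# Hypothetical sample: proportion 0.5 from 100 observations.
print(proportion_margin_of_error(0.5, 100))
```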