Confidence Intervals for Median

Fei Ye

September 2023

1. Learning Goals for Confidence Intervals


2. Order Statistics

Suppose \(X_1\), \(\dots\), \(X_n\) is a sample of independent values from a population. The order statistics \(X_{(1)}\), \(\dots\), \(X_{(n)}\) are the sample values arranged in ascending order. The value \(X_{(i)}\) is called the \(i\)-th order statistic.

Example:

Find the \(7\)-th order statistic of the following sample of iris petal lengths:

1.3, 6.6, 1.4, 1.4, 4.5, 5.1, 4.9, 5.4, 5.1, 1.9

Solution:

Rearranging the sample in ascending order yields

1.3, 1.4, 1.4, 1.9, 4.5, 4.9, 5.1, 5.1, 5.4, 6.6

The \(7\)-th order statistic is 5.1, the 7-th value in this ordered set.
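
The lookup above can be sketched in a few lines of Python. The helper name `order_statistic` is an assumption made for illustration; it sorts the sample and converts the 1-based index used in the text to Python's 0-based indexing.

```python
def order_statistic(sample, i):
    """Return the i-th order statistic (1-indexed) of a sample."""
    if not 1 <= i <= len(sample):
        raise ValueError("i must be between 1 and len(sample)")
    # sorted() arranges the values in ascending order; index i - 1 is the i-th value.
    return sorted(sample)[i - 1]


petal_lengths = [1.3, 6.6, 1.4, 1.4, 4.5, 5.1, 4.9, 5.4, 5.1, 1.9]
print(order_statistic(petal_lengths, 7))  # 5.1
```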


3. Distribution-Free Confidence Intervals for Medians: Method (1 of 2)

Suppose \(m\) is the median of the population. Fix a confidence level \(1-\alpha\) and a sample size \(n\). We look for indices \(i\) and \(j\) such that the probability \(P(X_{(i)}\le m\le X_{(j)})\) that the interval \([X_{(i)}, X_{(j)}]\) from a random sample contains \(m\) is (approximately) \(1-\alpha\). By symmetry, we may take \(j=n+1-i\).

The event that the interval \([X_{(k)}, X_{(k+1)}]\) from a random sample contains the median \(m\) is equivalent to the event that exactly \(k\) values in the sample are no greater than the population median. Denote by \(Z\) the number of \(X_i\) with \(X_i\le m\). Assuming the population distribution is continuous, \(Z\) is a binomial random variable with \(n\) mutually independent trials and probability of success \(p=P(X_i\le m)=0.5\). Then $$P(X_{(k)}\le m\le X_{(k+1)})=P(Z=k)={n\choose k}p^k(1-p)^{n-k}={n\choose k}\left(\frac12\right)^n.$$ The interval \([X_{(i)}, X_{(n+1-i)}]\) is the union of the intervals \([X_{(k)}, X_{(k+1)}]\) for \(k=i,\dots,n-i\), so $$1-\alpha=P(X_{(i)}\le m\le X_{(n+1-i)})=\sum\limits_{k=i}^{n-i}P(Z=k)=\sum\limits_{k=i}^{n-i}{n\choose k}\left(\frac12\right)^n.$$


4. Distribution-Free Confidence Intervals for Medians: Method (2 of 2)

To determine \(i\) for a given \(n\), we need to calculate the cumulative probability \(\sum\limits_{k=i}^{n-i}P(Z=k)\) for the binomial variable \(Z\). Because \(Z\) is discrete, a given level can usually only be matched approximately. For each \(n\), the table lists the symmetric interval whose probability is closest to 0.95.

| \(n\) | Confidence interval | Probability |
|---|---|---|
| 5 | \([X_{(1)}, X_{(5)}]\) | 0.938 |
| 6 | \([X_{(1)}, X_{(6)}]\) | 0.969 |
| 7 | \([X_{(1)}, X_{(7)}]\) | 0.984 |
| 8 | \([X_{(2)}, X_{(7)}]\) | 0.930 |
| 9 | \([X_{(2)}, X_{(8)}]\) | 0.961 |
| 10 | \([X_{(2)}, X_{(9)}]\) | 0.979 |
| 11 | \([X_{(3)}, X_{(9)}]\) | 0.935 |
| 12 | \([X_{(3)}, X_{(10)}]\) | 0.961 |
| 13 | \([X_{(3)}, X_{(11)}]\) | 0.978 |
| 14 | \([X_{(4)}, X_{(11)}]\) | 0.943 |
| 15 | \([X_{(4)}, X_{(12)}]\) | 0.965 |
| 16 | \([X_{(5)}, X_{(12)}]\) | 0.923 |
| 17 | \([X_{(5)}, X_{(13)}]\) | 0.951 |
| 18 | \([X_{(5)}, X_{(14)}]\) | 0.969 |
| 19 | \([X_{(6)}, X_{(14)}]\) | 0.936 |
| 20 | \([X_{(6)}, X_{(15)}]\) | 0.959 |
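
A table of this kind can be computed with a short script. The following is a minimal sketch, assuming a continuous population so that \(X_{(i)}\le m\le X_{(j)}\) holds exactly when \(i\le Z\le j-1\) for \(Z\sim\text{Binomial}(n, 0.5)\). The function names `coverage` and `closest_symmetric_interval`, and the choice of picking the symmetric interval closest to a 0.95 target, are assumptions made for illustration.

```python
from math import comb


def coverage(n, i, j):
    """P(X_(i) <= m <= X_(j)) = P(i <= Z <= j - 1) for Z ~ Binomial(n, 1/2)."""
    # range(i, j) runs over k = i, ..., j - 1.
    return sum(comb(n, k) for k in range(i, j)) / 2**n


def closest_symmetric_interval(n, level=0.95):
    """Return (i, j, probability) with j = n + 1 - i and probability closest to level."""
    candidates = [(i, n + 1 - i) for i in range(1, n // 2 + 1)]
    i, j = min(candidates, key=lambda ij: abs(coverage(n, *ij) - level))
    return i, j, coverage(n, i, j)


for n in range(5, 21):
    i, j, prob = closest_symmetric_interval(n)
    print(f"n={n:2d}  [X_({i}), X_({j})]  {prob:.3f}")
```

If a guaranteed minimum level is required instead, the selection rule could be changed to pick the narrowest symmetric interval whose probability is at least the target.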

5. Distribution-Free Confidence Intervals for Medians by Normal Approximation

When \(n>21\), the binomial random variable \(Z\) is approximately normally distributed with mean \(\mu=np=0.5n\) and standard deviation \(\sigma=\sqrt{np(1-p)}=\sqrt{0.25n}\).

Then the indices of the confidence interval \([X_{(i)}, X_{(j)}]\) at the confidence level \(1-\alpha\) are approximately given by $$i=\lfloor \mu-z_{\alpha/2}\sigma \rfloor=\lfloor 0.5n-z_{\alpha/2}\sqrt{0.25n}\rfloor$$ $$j=\lceil \mu+z_{\alpha/2}\sigma\rceil=\lceil 0.5n+z_{\alpha/2}\sqrt{0.25n}\rceil.$$
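
The index formulas can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `approx_ci_indices` is hypothetical, and clamping the indices to the range \(1,\dots,n\) is an added safeguard that is not part of the formulas above.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist


def approx_ci_indices(n, confidence=0.95):
    """Approximate order-statistic indices (i, j) for a CI for the median."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    mu = 0.5 * n
    sigma = sqrt(0.25 * n)
    i = floor(mu - z * sigma)
    j = ceil(mu + z * sigma)
    # Assumption: keep the indices inside 1..n for small n or high confidence.
    return max(i, 1), min(j, n)


print(approx_ci_indices(24))   # (7, 17)
print(approx_ci_indices(100))  # (40, 60)
```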

For more information, see the discussion on the central limit theorem for sample medians: https://stats.stackexchange.com/questions/45124/central-limit-theorem-for-sample-medians?noredirect=1&lq=1


6. Example: Median House Price (1 of 2)

The following are prices (in thousands) of 24 randomly selected houses in a certain city. Find a confidence interval for the median house price in that city.

745 808 899 929 949 988 1090 1100 1130 1140 1150 1190
1240 1350 1430 1500 1500 1600 1880 2000 2450 2480 3360 5600

Solution:

Since the sample size is 24, we may use the normal approximation to find the confidence interval. The mean and standard deviation for the normal approximation of the binomial distribution with success probability \(p=0.5\) and \(n=24\) trials are \(\mu=np=12\) and \(\sigma=\sqrt{np(1-p)}=\sqrt{6}\). At the \(1-\alpha=95\%\) confidence level, the \(z\)-score \(z_{\alpha/2}\) is approximately 1.96.


7. Example: Median House Price (2 of 2)

Then the indices of the order statistics that bound the confidence interval are

$$i=\lfloor 12-1.96\times\sqrt{6} \rfloor =\lfloor 7.199\ldots \rfloor = 7,$$ $$j=\lceil 12+1.96\times\sqrt{6}\rceil =\lceil 16.800\ldots \rceil = 17.$$

Then \(X_{(7)}\) is 1090 and \(X_{(17)}\) is 1500. The confidence interval can be taken as \([1090, 1500]\).

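The exact probability that \(m\) is in this interval is $$P(X_{(7)}\le m\le X_{(17)})=P(7\le Z\le 16)=\sum\limits_{k=7}^{16}{24\choose k}\left(\frac12\right)^{24}\approx 0.957.$$

We are about \(95.7\%\) confident that the median house price in this city is between 1090 and 1500 (in thousands).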
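
The example can be checked numerically. The sketch below takes the indices \(i=7\) and \(j=17\) from the calculation above, reads the bounds off the sorted data, and evaluates the exact binomial probability, using the fact that \(X_{(7)}\le m\le X_{(17)}\) holds exactly when \(7\le Z\le 16\) for a continuous population. The variable names are illustrative.

```python
from math import comb

prices = [
    745, 808, 899, 929, 949, 988, 1090, 1100, 1130, 1140, 1150, 1190,
    1240, 1350, 1430, 1500, 1500, 1600, 1880, 2000, 2450, 2480, 3360, 5600,
]
n = len(prices)
i, j = 7, 17  # indices from the normal approximation in the text

ordered = sorted(prices)
lower, upper = ordered[i - 1], ordered[j - 1]  # X_(7) and X_(17)

# Exact probability: P(i <= Z <= j - 1) with Z ~ Binomial(n, 1/2).
exact = sum(comb(n, k) for k in range(i, j)) / 2**n

print(lower, upper)     # 1090 1500
print(round(exact, 3))  # 0.957
```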


Lab Instructions in Excel


8. Normal Distributions and Marginal Errors