This document shows how to conduct a t-test in R. We will perform the \(t\)-test both manually and using built-in functions.
The first step is to get the data. The data were discussed in class on Wednesday, Nov 19, and reflect the change in weight of 53 obese children after 10 weeks of a dietary intervention. From our presentation, here are the data, hard-coded:
obs <- c(
-16.7, -14.8, -11.9, -9.7, -9.6, -8.8, -8, -7.1, -6.6, -6.0, -5.6, -5.6, -5.5,
-5.5, -5.1, -5, -5, -4.8, -4.4, -4.4, -4.1, -4, -4, -3.6, -3.5, -3.2, -2.8,
-2, -1.8, -1.8, -1.4, -1.2, -0.2, -0.1, 0, 0.2, 0.6, 1, 1.2, 1.2, 1.4, 1.8, 2,
2.2, 2.5, 2.8, 3.3, 4.2, 5.4, 5.8, 6, 6.4, 8.4
)
The question we would like to answer is:
Q: Does this specific dietary intervention correspond to an average weight-loss for obese children?
To be able to perform a \(t\)-test, we need to check assumptions about the inference.
hist(obs)
boxplot(obs)
qqnorm(obs)
Each of these figures looks reasonable, suggesting the data are at least approximately normal. Remember that we don't need the data to be exactly normal as long as we have enough observations and there aren't any gross violations of the normality assumption. Here, the \(X_i\) look approximately normal, and \(n = 53\), so we are more than comfortable with this assumption.
Next, we might be interested in getting a confidence interval for the mean. Let’s use \(\alpha = 0.05\), meaning that we want a \(95\%\) confidence interval.
Note: We start with a two-sided confidence interval, because this is what we have practiced so far. In lecture on Wednesday, Nov 19, we introduced the idea of a one-sided confidence interval, which would be more appropriate here because the scientific question we are trying to answer is one-sided.
For practice, the two-sided confidence interval can be calculated as:
\[ \bar{X}_n \pm t_{CI:\,95\%,\, df = 52} \times \frac{s}{\sqrt{n}} \]
Calculating the mean and standard deviation in R is an easy task; we have done this many times already this semester. Next, we need to find the \(t\)-value associated with this confidence interval.
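As a refresher, the pieces we need can be computed directly from the obs vector defined above:

```r
# Summary statistics needed for the confidence interval
n <- length(obs)   # 53 observations
xbar <- mean(obs)  # sample mean, about -2.404
s <- sd(obs)       # sample standard deviation, about 5.24
c(n = n, mean = xbar, sd = s)
```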
One way to find the critical value is to look at a \(t\)-distribution table, such as Table C in our textbook for this class.
From Table C, we find the value: \[ t_{CI:\,95\%,\, df = 52} = 2.009 \]
Note that this is a slightly conservative estimate: the table does not have a row for \(\text{df} = 52\), so we rounded down and used 50 degrees of freedom, approximating \(t_{CI:\,95\%,\, df = 52} \approx t_{CI:\,95\%,\, df = 50} = 2.009\).
In R, we can find the exact number using the qt(p, df) function. Here, p is the area to the left that we want, and df is the degrees of freedom. For a \(95\%\) confidence interval (two-sided), we need to find the \(t\) value where \(97.5\%\) of the area is to the left (which leaves \(2.5\%\) of the area to the right), so that the area between the two values is \(95\%\):
qt(0.975, df = 52)
## [1] 2.006647
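As a quick sanity check, qt() can also work from the upper tail via its lower.tail argument; asking for \(2.5\%\) of the area to the right gives the same critical value:

```r
# Same critical value, specified as area to the RIGHT of the cutoff
qt(0.025, df = 52, lower.tail = FALSE)
## [1] 2.006647
```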
Notice how close this is to the value in the table. Using R gives us two advantages: we get the critical value at the exact degrees of freedom (\(df = 52\) rather than \(50\)), and we can compute it for any confidence level, not just those printed in the table.
For arbitrary \(\alpha\), giving a two-sided \(C = 1-\alpha\) level confidence interval, we can calculate this as follows:
alpha <- 0.05
lower <- mean(obs) - qt(1 - alpha / 2, 52) * sd(obs) / sqrt(length(obs))
upper <- mean(obs) + qt(1 - alpha / 2, 52) * sd(obs) / sqrt(length(obs))
c(lower, upper)
## [1] -3.8488267 -0.9587205
# lower <- mean(obs) + qt(0.025, 52) * sd(obs) / sqrt(length(obs))
# upper <- mean(obs) + qt(0.975, 52) * sd(obs) / sqrt(length(obs))
In summary, the two-sided \(95\%\) confidence interval for the mean, rounded to 3 digits, is:
\[ \text{Two-sided } 95\% \text{ confidence interval for } \mu_{\text{change}}: (-3.849, -0.959) \]
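As a check on the hand calculation, R's built-in t.test() function (which we will use in detail below) reports the same two-sided interval:

```r
# The default t.test() call gives the two-sided 95% CI for the mean
t.test(obs, conf.level = 0.95)$conf.int
## [1] -3.8488267 -0.9587205
## attr(,"conf.level")
## [1] 0.95
```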
If we are doing a one-sided test, we might want a one-sided confidence interval, meaning that we only look for an upper or lower bound on the mean. In our example, we want to test whether the intervention helps lower the weight of children, so a one-sided confidence interval will give an upper bound for the mean change.
One problem is that the table we would use from the textbook (Table C) only includes \(t\)-values for two-sided confidence intervals. To get a one-sided confidence interval, we need to find the value \(t\) such that \(95\%\) of the area is to the left of \(t\). If \(95\%\) of the area is to the left, then \(5\%\) is to the right (and, consequently, \(90\%\) of the area is in the middle). Thus, the one-sided \(t\) value for a \(95\%\) confidence interval is the same as the \(t\)-value for a \(90\%\), two-sided confidence interval.
Again, we don’t have the exact correct degrees of freedom here, so we approximate using \(df = 50\).
\[ t_{\text{one-sided CI}:\, 95\%,\, df = 50} = 1.676 \]
In R, we can calculate this exactly. Remember: in R, the default output of qt() always corresponds to the area to the left:
qt(0.95, 52)
## [1] 1.674689
Now we can combine everything to get a \(95\%\) confidence upper bound for the mean change \(\mu_{\text{change}}\).
Our \(95\%\), one-sided (upper-bound) confidence interval is calculated using:
# Remember alpha = 0.05
mean(obs) + qt(1 - alpha, 52) * sd(obs) / sqrt(length(obs))
## [1] -1.197774
Thus, we are \(95\%\) confident that the true mean lies in:
\[ \text{One-sided } 95\% \text{ CI for } \mu_{\text{change}}: (-\infty, -1.198) \]
Our hypothesis that we would like to test is the following: \[ H_0:\, \mu_{\text{change}} = 0 \quad\quad H_{A}: \mu_{\text{change}} < 0. \] This is a one-sided test. We will also try to conduct this test at the \(\alpha = 0.05\) level, a choice picked before performing the test.
Now we will perform a \(t\)-test. To do this manually, there are two approaches: (1) calculate the P-value and compare it to \(\alpha\), or (2) compare the sample \(t\) statistic to a critical value.
The second approach is more popular if we are using \(t\)-tables (like Table C) rather than software, because it can be difficult to calculate the exact P-value from a table. From the table, we found the critical value of \(t\) to be \[ t_{\text{one-sided } \alpha = 0.05,\, df = 50} = 1.676 \] Again, the exact \(t\)-value calculated using software is:
qt(1-alpha, 52)
## [1] 1.674689
Thus, we will reject the null hypothesis if we calculate a sample \(t\) value smaller than \(-1.675\). Our sample \(t\)-value, which we will denote \(t^*\), is calculated as:
\[ t^* = \frac{\bar{X}_n - \mu_0}{\big(s / \sqrt{n}\big)} = \frac{-2.404}{\big(5.243/\sqrt{53}\big)} = -3.338 \] Here’s the calculation in R:
# Since mu_0 = 0, the numerator (mean(obs) - 0) is just mean(obs)
mean(obs) / (sd(obs)/sqrt(length(obs)))
## [1] -3.337957
Thus, because \(t^* = -3.338 < -1.675\), we will reject the null hypothesis and conclude that there is sufficient evidence to support the claim that \(\mu_{\text{change}} < 0\).
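The rejection decision itself can be expressed in R as a comparison of the sample \(t\) statistic against the (negative of the) critical value:

```r
# Reject H0 if the sample t statistic falls below the lower critical value
tval <- mean(obs) / (sd(obs) / sqrt(length(obs)))
tval < -qt(1 - 0.05, df = 52)
## [1] TRUE
```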
The above result may not feel satisfactory, because we don't have a P-value. The conclusion is the same (we still end up rejecting the null), but the P-value is hard to calculate exactly using Table C because there isn't a \(t\)-value on the table that corresponds to \(t = -3.338\). Remember, we are "stuck" in the row \(df = 50\) of the table, because that's the closest we can get to the true \(df = 52\).
We can find the exact P-value in R:
tval <- mean(obs) / (sd(obs)/sqrt(length(obs)))
# Area to the left (ONE-SIDED TEST):
pt(tval, df = 52)
## [1] 0.0007825569
Thus, the exact P-value is:
\[ \text{P-value}=7.8\times 10^{-4} < 0.05 \]
Now that we have some extensive practice with this one-sided test, we can use R to do most of this work for us, beyond just finding the \(t\) and P values. As a reminder, the data are saved in an object called obs:
obs
## [1] -16.7 -14.8 -11.9 -9.7 -9.6 -8.8 -8.0 -7.1 -6.6 -6.0 -5.6 -5.6
## [13] -5.5 -5.5 -5.1 -5.0 -5.0 -4.8 -4.4 -4.4 -4.1 -4.0 -4.0 -3.6
## [25] -3.5 -3.2 -2.8 -2.0 -1.8 -1.8 -1.4 -1.2 -0.2 -0.1 0.0 0.2
## [37] 0.6 1.0 1.2 1.2 1.4 1.8 2.0 2.2 2.5 2.8 3.3 4.2
## [49] 5.4 5.8 6.0 6.4 8.4
and the hypothesis that we are testing is the following:
\[ H_0:\, \mu_{\text{change}} = 0 \quad\quad H_{A}: \mu_{\text{change}} < 0. \]
We will use the t.test function to conduct the test. Please take a look at the help page ?t.test for specific details.
t.test(x = obs, alternative = "less", mu = 0)
##
## One Sample t-test
##
## data: obs
## t = -3.338, df = 52, p-value = 0.0007826
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
## -Inf -1.197774
## sample estimates:
## mean of x
## -2.403774
Notice that the output above matches what we calculated by hand.
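One practical note: t.test() returns a list, so the individual pieces of the output can be extracted by name rather than read off the printout:

```r
res <- t.test(x = obs, alternative = "less", mu = 0)
res$statistic  # the t value, about -3.338
res$p.value    # the one-sided P-value, about 0.00078
res$conf.int   # the one-sided CI: (-Inf, -1.198)
```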