This document shows how to conduct a t-test in R. We will perform the \(t\)-test both manually and using built-in functions.
The first step is to get the data. The data were discussed in class on Wednesday, Nov 19, and reflect the change in weight of 53 obese children after 10 weeks of a dietary intervention. From our presentation, here are the data, hard-coded:
obs <- c(
-16.7, -14.8, -11.9, -9.7, -9.6, -8.8, -8, -7.1, -6.6, -6.0, -5.6, -5.6, -5.5,
-5.5, -5.1, -5, -5, -4.8, -4.4, -4.4, -4.1, -4, -4, -3.6, -3.5, -3.2, -2.8,
-2, -1.8, -1.8, -1.4, -1.2, -0.2, -0.1, 0, 0.2, 0.6, 1, 1.2, 1.2, 1.4, 1.8, 2,
2.2, 2.5, 2.8, 3.3, 4.2, 5.4, 5.8, 6, 6.4, 8.4
)
The question we would like to answer is:
Q: Does this specific dietary intervention correspond to an average weight-loss for obese children?
To be able to perform a \(t\)-test, we need to check assumptions about the inference.
hist(obs)
boxplot(obs)
qqnorm(obs)
Each of these figures looks reasonable, suggesting the data are at least approximately normal. Remember that we don't need the data to be exactly normal as long as we have enough observations and there aren't any gross violations of the normality assumption. Here, the \(X_i\) look approximately normal, and \(n = 53\), so we are more than comfortable with this assumption.
Next, we might be interested in getting a confidence interval for the mean. Let’s use \(\alpha = 0.05\), meaning that we want a \(95\%\) confidence interval.
Note: We start with a two-sided confidence interval, because this is what we have practiced so far. In lecture on Wednesday, Nov 19, we introduced the idea of a one-sided confidence interval, which would be more appropriate here because the scientific question we are trying to answer is one-sided.
For practice, the two-sided confidence interval can be calculated as:
\[ \bar{X}_n \pm t_{CI:\,95\%,\, df = 52} \times \frac{s}{\sqrt{n}} \]
Calculating the mean and standard deviation in R is an easy task; we have done this many times already this semester. Next, we need to find the \(t\)-value associated with this confidence interval.
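As a refresher, the pieces we need can be computed directly from the obs vector defined above:

```r
# Summary statistics needed for the confidence interval
n <- length(obs)   # 53 observations
xbar <- mean(obs)  # sample mean, about -2.404
s <- sd(obs)       # sample standard deviation, about 5.24
c(n = n, mean = xbar, sd = s)
```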
One way to find the critical value is to look at a \(t\)-distribution table, such as Table C in our textbook for this class.
From Table C, we find the value: \[ t_{CI:\,95\%,\, df = 52} = 2.009 \]
Note that this is a slightly conservative estimate: the table does not have a row for \(\text{df} = 52\), so we rounded down and used 50 degrees of freedom, approximating \(t_{CI:\,95\%,\, df = 52} \approx t_{CI:\,95\%,\, df = 50} = 2.009\).
In R, we can find the exact number using the qt(p, df) function. Here, p is the area to the left that we want, and df is the degrees of freedom. For a \(95\%\) confidence interval (two-sided), we need to find the \(t\) value where \(97.5\%\) of the area is to the left (which leaves \(2.5\%\) of the area to the right), so that the area between the two values is \(95\%\):
qt(0.975, df = 52)
## [1] 2.006647
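As a quick sanity check, qt() can also work from the upper tail via its lower.tail argument; asking for \(2.5\%\) of the area to the right gives the same critical value:

```r
# Same critical value, specified as area to the RIGHT of the cutoff
qt(0.025, df = 52, lower.tail = FALSE)
## [1] 2.006647
```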
Notice how close this is to the value in the table. Using R gives us two advantages: we get the critical value at the exact degrees of freedom (\(df = 52\) rather than \(50\)), and we can compute it for any confidence level, not just those printed in the table.
For arbitrary \(\alpha\), giving a two-sided \(C = 1-\alpha\) level confidence interval, we can calculate this as follows:
alpha <- 0.05
lower <- mean(obs) - qt(1 - alpha / 2, 52) * sd(obs) / sqrt(length(obs))
upper <- mean(obs) + qt(1 - alpha / 2, 52) * sd(obs) / sqrt(length(obs))
c(lower, upper)
## [1] -3.8488267 -0.9587205
# lower <- mean(obs) + qt(0.025, 52) * sd(obs) / sqrt(length(obs))
# upper <- mean(obs) + qt(0.975, 52) * sd(obs) / sqrt(length(obs))
In summary, the two-sided \(95\%\) confidence interval for the mean, rounded to 3 digits, is:
\[ \text{Two-sided } 95\% \text{ confidence interval for } \mu_{\text{change}}: (-3.849, -0.959) \]
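As a check on the hand calculation, R's built-in t.test() function (which we will use in detail below) reports the same two-sided interval:

```r
# The default t.test() call gives the two-sided 95% CI for the mean
t.test(obs, conf.level = 0.95)$conf.int
## [1] -3.8488267 -0.9587205
## attr(,"conf.level")
## [1] 0.95
```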
If we are doing a one-sided test, we might want a one-sided confidence interval, meaning that we only look for an upper or lower bound on the mean. In our example, we want to test whether the intervention helps lower the weight of children, so a one-sided confidence interval will give an upper bound for the mean change.
One problem is that the table we would use from the textbook (Table C) only includes \(t\)-values for two-sided confidence intervals. To get a one-sided confidence interval, we need to find the value \(t\) such that \(95\%\) of the area is to the left of \(t\). If \(95\%\) of the area is to the left, then \(5\%\) is to the right (and, consequently, \(90\%\) of the area is in the middle). Thus, the one-sided \(t\) value for a \(95\%\) confidence interval is the same as the \(t\)-value for a \(90\%\), two-sided confidence interval.
Again, we don’t have the exact correct degrees of freedom here, so we approximate using \(df = 50\).
\[ t_{\text{one-sided CI}:\, 95\%,\, df = 50} = 1.676 \]
In R, we can calculate this exactly. Remember: in R, the default output of qt() always corresponds to the area to the left:
qt(0.95, 52)
## [1] 1.674689
Now we can combine everything to get a \(95\%\) confidence upper bound for the mean change \(\mu_{\text{change}}\).
Our \(95\%\), one-sided (upper-bound) confidence interval is calculated using:
# Remember alpha = 0.05
mean(obs) + qt(1 - alpha, 52) * sd(obs) / sqrt(length(obs))
## [1] -1.197774
Thus, we are \(95\%\) confident that the true mean lies in:
\[ \text{One-sided } 95\% \text{ CI for } \mu_{\text{change}}: (-\infty, -1.198) \]
Our hypothesis that we would like to test is the following: \[ H_0:\, \mu_{\text{change}} = 0 \quad\quad H_{A}: \mu_{\text{change}} < 0. \] This is a one-sided test. We will also try to conduct this test at the \(\alpha = 0.05\) level, a choice picked before performing the test.
Now we will perform a \(t\)-test. To do this manually, there are two approaches: (1) calculate the P-value and compare it to \(\alpha\), or (2) compare the sample \(t\) statistic to a critical value.
The second approach is more popular if we are using \(t\)-tables (like Table C) rather than software, because it can be difficult to calculate the exact P-value from a table. From the table, we found the critical value of \(t\) to be \[ t_{\text{one-sided } \alpha = 0.05,\, df = 50} = 1.676 \] Again, the exact \(t\)-value calculated using software is:
qt(1-alpha, 52)
## [1] 1.674689
Thus, we will reject the null hypothesis if we calculate a sample \(t\) value smaller than \(-1.675\). Our sample \(t\)-value, which we will denote \(t^*\), is calculated as:
\[ t^* = \frac{\bar{X}_n - \mu_0}{\big(s / \sqrt{n}\big)} = \frac{-2.404}{\big(5.243/\sqrt{53}\big)} = -3.338 \] Here’s the calculation in R:
# Since mu_0 = 0, the numerator (mean(obs) - 0) is just mean(obs)
mean(obs) / (sd(obs)/sqrt(length(obs)))
## [1] -3.337957
Thus, because \(t^* = -3.338 < -1.675\), we will reject the null hypothesis and conclude that there is sufficient evidence to support the claim that \(\mu_{\text{change}} < 0\).
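The rejection decision itself can be expressed in R as a comparison of the sample \(t\) statistic against the (negative of the) critical value:

```r
# Reject H0 if the sample t statistic falls below the lower critical value
tval <- mean(obs) / (sd(obs) / sqrt(length(obs)))
tval < -qt(1 - 0.05, df = 52)
## [1] TRUE
```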
The above result may not feel satisfactory, because we don't have a P-value. The conclusion is the same (we still end up rejecting the null), but the P-value is hard to calculate exactly using Table C because there isn't a \(t\)-value on the table that corresponds to \(t = -3.338\). Remember, we are "stuck" in the row \(df = 50\) of the table, because that's the closest we can get to the true \(df = 52\).
We can find the exact P-value in R:
tval <- mean(obs) / (sd(obs)/sqrt(length(obs)))
# Area to the left (ONE-SIDED TEST):
pt(tval, df = 52)
## [1] 0.0007825569
Thus, the exact P-value is:
\[ \text{P-value}=7.8\times 10^{-4} < 0.05 \]
Now that we have some extensive practice with this one-sided test, we can use R to do most of this work for us, beyond just finding the \(t\) and P values. As a reminder, the data are saved in an object called obs:
obs
## [1] -16.7 -14.8 -11.9 -9.7 -9.6 -8.8 -8.0 -7.1 -6.6 -6.0 -5.6 -5.6
## [13] -5.5 -5.5 -5.1 -5.0 -5.0 -4.8 -4.4 -4.4 -4.1 -4.0 -4.0 -3.6
## [25] -3.5 -3.2 -2.8 -2.0 -1.8 -1.8 -1.4 -1.2 -0.2 -0.1 0.0 0.2
## [37] 0.6 1.0 1.2 1.2 1.4 1.8 2.0 2.2 2.5 2.8 3.3 4.2
## [49] 5.4 5.8 6.0 6.4 8.4
and the hypothesis that we are testing is the following:
\[ H_0:\, \mu_{\text{change}} = 0 \quad\quad H_{A}: \mu_{\text{change}} < 0. \]
We will use the t.test function to conduct the test. Please take a look at the help page ?t.test for specific details.
t.test(x = obs, alternative = "less", mu = 0)
##
## One Sample t-test
##
## data: obs
## t = -3.338, df = 52, p-value = 0.0007826
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
## -Inf -1.197774
## sample estimates:
## mean of x
## -2.403774
Notice that the output above matches what we calculated by hand.
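One practical note: t.test() returns a list, so the individual pieces of the output can be extracted by name rather than read off the printout:

```r
res <- t.test(x = obs, alternative = "less", mu = 0)
res$statistic  # the t value, about -3.338
res$p.value    # the one-sided P-value, about 0.00078
res$conf.int   # the one-sided CI: (-Inf, -1.198)
```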