Researchers recorded resting heart rates (beats per minute) for three
groups of adults: nonsmokers, light smokers, and heavy
smokers. We wish to determine whether mean heart rate differs
among the three groups.
# -------------------------------
# Dataset
# -------------------------------
# Resting heart rate (bpm) by smoking category
nonSmoker = c(70, 58, 51, 56, 53, 53, 65)
lightSmoker = c(67, 75, 65, 78, 62, 70, 73)
heavySmoker = c(79, 80, 77, 77, 86, 68, 83)
# Response variable: Heart Rate
heartRate = c(nonSmoker, lightSmoker, heavySmoker)
# Categorical variable: People (with 7 observations each)
people = factor(rep(c("nonSmoker", "lightSmoker", "heavySmoker"), each = 7))
# Data frame for plotting and ANOVA
heartRateDataFrame = data.frame(people, heartRate)
# Display the first few rows of the data to verify structure
head(heartRateDataFrame)
## people heartRate
## 1 nonSmoker 70
## 2 nonSmoker 58
## 3 nonSmoker 51
## 4 nonSmoker 56
## 5 nonSmoker 53
## 6 nonSmoker 53
# Extract data by group
nonSmoker = heartRateDataFrame$heartRate[heartRateDataFrame$people == "nonSmoker"]
lightSmoker = heartRateDataFrame$heartRate[heartRateDataFrame$people == "lightSmoker"]
heavySmoker = heartRateDataFrame$heartRate[heartRateDataFrame$people == "heavySmoker"]
# Statistics of each group
# nonSmokers
numOfNonSmokers = length(nonSmoker)
meanOfNonSmokers = mean(nonSmoker)
varianceOfNonSmokers = var(nonSmoker)
stdevOfNonSmokers = sd(nonSmoker)
# lightSmokers
numOfLightSmokers = length(lightSmoker)
meanOfLightSmokers = mean(lightSmoker)
varianceOfLightSmokers = var(lightSmoker)
stdevOfLightSmokers = sd(lightSmoker)
# heavySmokers
numOfHeavySmokers = length(heavySmoker)
meanOfHeavySmokers = mean(heavySmoker)
varianceOfHeavySmokers = var(heavySmoker)
stdevOfHeavySmokers = sd(heavySmoker)
# Summary Table
summaryTable = data.frame(
Group = c("nonSmokers", "lightSmokers", "heavySmokers"),
n = c(numOfNonSmokers, numOfLightSmokers, numOfHeavySmokers),
Mean = round(c(meanOfNonSmokers, meanOfLightSmokers, meanOfHeavySmokers), 3),
Variance = round(c(varianceOfNonSmokers, varianceOfLightSmokers, varianceOfHeavySmokers), 3),
StdDev = round(c(stdevOfNonSmokers, stdevOfLightSmokers, stdevOfHeavySmokers), 3)
)
# Print the summary table
print(summaryTable)
## Group n Mean Variance StdDev
## 1 nonSmokers 7 58.000 49.333 7.024
## 2 lightSmokers 7 70.000 32.667 5.715
## 3 heavySmokers 7 78.571 32.286 5.682
Key Insights:
# Fit a one-way ANOVA model
# Response variable: heartRate
# Predictor (grouping factor): people
heartRateANOVA = aov(heartRate ~ people, data = heartRateDataFrame)
# Display the ANOVA table
# This output includes:
# - Degrees of Freedom (DF)
# - Sum of Squares (SS)
# - Mean Squares (MS)
# - F-statistic
# - P-value
summary(heartRateANOVA)
## Df Sum Sq Mean Sq F value Pr(>F)
## people 2 1494.9 747.4 19.62 3.01e-05 ***
## Residuals 18 685.7 38.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation
Degrees of Freedom (Df):
Sum of Squares (SS):
Mean Squares (MS):
F-value: 19.62
P-value: 0.0000301
Interpretation of p-value:
Assuming the null hypothesis is true (that all three groups of people have the same mean heart rate)
Ho = HnonSmoker = HlightSmoker = HheavySmoker
Ha = At least one of the means differ
Since the p-value (0.0000301) is less than 0.05, we reject the null hypothesis and conclude that at least one group of people has a significantly different mean heart rate.
Continued:
To determine which specific pairs of mean heart rates differ, a
Tukey HSD post-hoc test will be performed.
Before performing the Tukey HSD post-hoc test, we first create a
boxplot to visually confirm the ANOVA results.
The boxplot shows the distribution of mean heart rate for each group of
people, helping us see differences in medians, variability, and
potential outliers.
# Create a horizontal boxplot to compare shear bond strength across the four repair kits
boxplot(heartRate ~ people,
data = heartRateDataFrame, # Data frame containing the variables
horizontal = TRUE, # Display boxes horizontally for better readability
xlab = "Mean Heart Rates (bpm)", # Label for the x-axis
ylab = "", # Leave y-axis label blank since kit names are already shown
main = "Heart Rates by Group of People") # Main title of the plot
# Add points to represent the group means on top of each box
points(tapply(heartRateDataFrame$heartRate, heartRateDataFrame$people, mean), # Compute mean for each group of people
1:length(levels(heartRateDataFrame$people)), # Match the position of each mean to the box location
pch = 19) # Solid circles (pch = 19) for visibility
The boxplot results support the ANOVA findings that
at least one group mean differs.
However, while the boxplot suggests where differences may exist, a
statistical test such as Tukey’s HSD is required to
confirm which specific pairs of groups differ.
The Q-Q (quantile-quantile) plot is used to visually assess whether
the residuals from the ANOVA model are approximately normally
distributed.
If the residuals are normal, the points should closely follow the
diagonal reference line.
# Extract residuals from the ANOVA model
residualsANOVA = residuals(heartRateANOVA)
# Create a Q-Q plot
qqnorm(residualsANOVA,
main = "Q-Q Plot of ANOVA Residuals",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
pch = 19, col = "blue")
# Reference line
qqline(residualsANOVA, col = "red", lwd = 3)
#### Interpretation of Q-Q Plot
Interpretation:
This indicates that the residuals are approximately normal, satisfying
the normality assumption.
tukeyResult = TukeyHSD(heartRateANOVA) # Run Tukey's HSD post-hoc test
print(tukeyResult) # Print the Tukey test results
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = heartRate ~ people, data = heartRateDataFrame)
##
## $people
## diff lwr upr p adj
## lightSmoker-heavySmoker -8.571429 -16.99138 -0.1514738 0.0456328
## nonSmoker-heavySmoker -20.571429 -28.99138 -12.1514738 0.0000200
## nonSmoker-lightSmoker -12.000000 -20.41995 -3.5800453 0.0050912
# Visualize Tukey HSD results
par(mar = c(8, 10, 5, 3)) # Increase left margin space
plot(tukeyResult, las = 1, col = "blue") # Extra space for labels
p = 0.0000200 → Significant difference, heavy smoker is
higherp = 0.0050912 → Significant difference, light smoker is
higher.p = 0.045
→ Significant difference, heavy smoker is slightly higher# Assigning information to combine into a data frame
Location = c(rep("Chicago", 6), rep("Bolingbrook", 6), rep("Peoria", 6))
Service = c(rep("Specialty Chain", 3), rep("General Service", 3), rep("Specialty Chain", 3),
rep("General Service", 3), rep("Specialty Chain", 3), rep("General Service", 3))
Price = c(79.80, 111.80, 95.96, 87.96, 107.80, 99.80, 95.96, 119.80, 115.96, 89.80, 119.80, 112.52, 99.96, 107.96, 79.80, 91.96, 99.80, 111.96)
# Combine into a data frame
meanOilChangePriceDataFrame = data.frame(Location, Service, Price)
print(meanOilChangePriceDataFrame)
## Location Service Price
## 1 Chicago Specialty Chain 79.80
## 2 Chicago Specialty Chain 111.80
## 3 Chicago Specialty Chain 95.96
## 4 Chicago General Service 87.96
## 5 Chicago General Service 107.80
## 6 Chicago General Service 99.80
## 7 Bolingbrook Specialty Chain 95.96
## 8 Bolingbrook Specialty Chain 119.80
## 9 Bolingbrook Specialty Chain 115.96
## 10 Bolingbrook General Service 89.80
## 11 Bolingbrook General Service 119.80
## 12 Bolingbrook General Service 112.52
## 13 Peoria Specialty Chain 99.96
## 14 Peoria Specialty Chain 107.96
## 15 Peoria Specialty Chain 79.80
## 16 Peoria General Service 91.96
## 17 Peoria General Service 99.80
## 18 Peoria General Service 111.96
# Running the Two-Way ANOVA in R
oilTwoWayANOVA=aov(Price ~ Location * Service, data = meanOilChangePriceDataFrame)
summary(oilTwoWayANOVA)
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 2 498.0 249.01 1.388 0.287
## Service 1 11.5 11.52 0.064 0.804
## Location:Service 2 57.2 28.59 0.159 0.854
## Residuals 12 2153.0 179.41
Interaction (Location × Service): Not statistically significant (\(p = 0.854\)).
Main effect of Location: Not statistically significant (\(p = 0.287\)).
Main effect of Service: Not statistically significant (\(p = 0.804\)).
The effects of location and service do not change across the levels of each other.
# Interaction plot in R
interaction.plot(
x.factor = meanOilChangePriceDataFrame$Location, # Factor on x-axis
trace.factor = meanOilChangePriceDataFrame$Service, # Groups traced by color
response = meanOilChangePriceDataFrame$Price, # Response variable
fun = mean, # Use means for each group
type = "b", # Points and lines ("b" = both)
col = c("red", "blue", "green"), # Colors for Drug levels
pch = 16, # Solid points
xlab = "Location",
ylab = "Mean Oil Change Price",
trace.label = "Service provided",
main = "Interaction Plot for Oil Change Prices"
)
The plot shows that there is an interaction but it is not statistically significant.
This visually supports the non-significant interaction term (p = 0.854) found in the ANOVA table.