R_Designパッケージ
重回帰分析のための Design パッケージの利用例
_ 重回帰分析と非線形回帰
Baayen のAnalyzing Linguistic Data pp.185--
Ordinary least squares regression
関数としては ols() を使う以外は lm() と変わるところはない. ただし,2次の項を含めるなど,非線形分析を行うときには記法が異なる.
# lm() 関数を使う
english.lm <- lm(RTlexdec ~ WrittenFrequency + I(WrittenFrequency^2) + AgeSubject + LengthInLetters, data = english) summary(english.lm) anova(english.lm)
# ols() 関数を使う.p.192
english.ols <- ols(RTlexdec ~ pol(WrittenFrequency, 2) + AgeSubject + LengthInLetters, english) summary(english.ols) anova(english.ols)
ただし anova の出力はそれぞれ異なる.
> anova(english.lm) Analysis of Variance Table Response: RTlexdec Df Sum Sq Mean Sq F value Pr(>F) WrittenFrequency 1 21.261 21.261 2890.361 < 2.2e-16 I(WrittenFrequency^2) 1 1.407 1.407 191.246 < 2.2e-16 AgeSubject 1 56.141 56.141 7632.096 < 2.2e-16 LengthInLetters 1 0.081 0.081 11.032 0.0009026 Residuals 4563 33.565 0.007 > anova(english.ols) Analysis of Variance Response: RTlexdec Factor d.f. Partial SS MS F P WrittenFrequency 2 22.4709395 11.235469735 1527.40 <.0001 Nonlinear 1 1.4391888 1.439188764 195.65 <.0001 AgeSubject 1 56.1411838 56.141183791 7632.10 <.0001 LengthInLetters 1 0.0811495 0.081149504 11.03 9e-04 REGRESSION 4 78.8904273 19.722606822 2681.18 <.0001 ERROR 4563 33.5651224 0.007355933
標準パッケージの lm() 関数による分析結果に anova() を適用した場合は Sequential な Anova 表,つまり表の上に位置する要因を次々と階層的に組み込んでいき,それぞれの段階での予測子の効果を計っている.これに対して ols() 関数の結果にanova()を適用した結果は Non-Sequential な Anova 表を出力する.つまり,他の変数がすべてモデルに組み込まれている場合の当該予測子の効果を見ることができる.すなわち他の要因を全て定数とした場合の Partial Effect を調べることができる.
_ Spline
rcs() 関数による smoothing を行った例.Baayen pp.191
# p.192
&english.olsB <- ols(RTlexdec ~ pol(WrittenFrequency, 2) + AgeSubject + LengthInLetters, english) summary(english.ols) anova(english.ols) # p.193 english.olsC <- ols(RTlexdec ~ rcs(WrittenFrequency, 3) + AgeSubject + LengthInLetters, english) english.olsE <- ols(RTlexdec ~ rcs(WrittenFrequency, 5) + AgeSubject + LengthInLetters, english) english.olsD <- ols(RTlexdec ~ rcs(WrittenFrequency, 7) + AgeSubject + LengthInLetters, english) # p.195 par(mfrow = c(2,2)) plot(english.olsC, WrittenFrequency = NA, ylim = c(6.5, 7.0), conf.int = F) plot(english.olsB, WrittenFrequency = NA, add = T, lty = 2, conf.int = F) mtext("3 knots, undersmoothing", 3, 1, cex = .8) plot(english.olsD, WrittenFrequency = NA, ylim = c(6.5, 7.0), conf.int = F) plot(english.olsB, WrittenFrequency = NA, add = T, lty = 2, conf.int = F) mtext("7 knots, undersmoothing", 3, 1, cex = .8) plot(english.olsE, WrittenFrequency = NA, ylim = c(6.5, 7.0), conf.int = F) plot(english.olsB, WrittenFrequency = NA, add = T, lty = 2, conf.int = F) mtext("5 knots, undersmoothing", 3, 1, cex = .8)
# p.195 さらに交互作用を組み込んだ
english.olsE2 <- ols(RTlexdec ~ rcs(WrittenFrequency, 5) + AgeSubject + LengthInLetters + rcs(WrittenFrequency,5):AgeSubject, english) anova(english.olsE2) # p.198 par(mfrow = c(2,2), bg = "white") plot(english.olsE2, WrittenFrequency = NA, ylim = c(6.2, 7.0) ) plot(english.olsE2, WrittenFrequency = NA, AgeSubject = "young", add = T, col = "blue") plot(english.olsE2, LengthInLetters = NA, ylim = c(6.2, 7.0) ) plot(english.olsE2, AgeSubject = NA, ylim = c(6.2, 7.0) )
_ lrm を使ったロジスティック回帰分析と,モデルの partial effect のグラフ.
Baayen のAnalyzing Linguistic Data
regularity.dd <- datadist(regularity)
options(datadist = "regularity.dd")
xtabs(~ regularity$Regularity)
regularity.lrm <- lrm(Regularity ~ WrittenFrequency? + rcs(FamilySize?, 3) + NcountStem? + InflectionalEntropy? + Auxiliary + Valency + NVratio + WrittenSpokenRatio?, data = regularity, x = T, y = T)
# p.225
pentrace(regularity.lrm, seq(0, 0.8, 0.05))
# p.226
regularity.lrm.pen <- update(regularity.lrm, penalty = 0.6)
# p.227
par(mfrow = c(3,3), bg = "white")
plot(regularity.lrm.pen, fun = plogis, ylab ="Pr(regular", adj.subtitle = F, ylim=c(0,1))