R_Designパッケージ の変更点 - アールメカブ

アールメカブ


R_Designパッケージ の変更点


重回帰分析のための Design パッケージの利用例

#contents

* 重回帰分析と非線形回帰 [#hd889da1]
[[Baayen:http://www.mpi.nl/world/persons/private/baayen/]] の[[Analyzing Linguistic Data:http://www.amazon.co.jp/Analyzing-Linguistic-Data-Introduction-Statistics/dp/0521882591/]]  pp.185--

Ordinary least squares regression

関数としては ols() を使う以外は lm() と変わるところはない.
ただし,2次の項を含めるなど,非線形分析を行うときには記法が異なる.

# lm() 関数を使う
 english.lm <- lm(RTlexdec ~ WrittenFrequency +
 I(WrittenFrequency^2) + AgeSubject  + LengthInLetters,
 data =  english)
 summary(english.lm)
 anova(english.lm)
 
# ols() 関数を使う.p.192
 english.ols <- ols(RTlexdec ~ pol(WrittenFrequency, 2)
  + AgeSubject  + LengthInLetters, english)
 summary(english.ols)
 anova(english.ols)

ただし anova の出力はそれぞれ異なる.
 > anova(english.lm)
 Analysis of Variance Table
 Response: RTlexdec
                        Df Sum Sq Mean Sq  F value    Pr(>F)
 WrittenFrequency       1 21.261  21.261 2890.361 < 2.2e-16
 I(WrittenFrequency^2) 1  1.407   1.407  191.246 < 2.2e-16
 AgeSubject               1 56.141  56.141 7632.096 < 2.2e-16
 LengthInLetters         1  0.081   0.081   11.032 0.0009026
 Residuals             4563 33.565   0.007                   
  
 > anova(english.ols)
 Analysis of Variance          Response: RTlexdec 
 Factor           d.f. Partial SS MS           F       P     
 WrittenFrequency 2 22.4709395 11.235469735 1527.40 <.0001
 Nonlinear          1  1.4391888  1.439188764  195.65 <.0001
 AgeSubject         1 56.1411838 56.141183791 7632.10 <.0001
 LengthInLetters   1  0.0811495  0.081149504   11.03 9e-04 
 REGRESSION         4 78.8904273 19.722606822 2681.18 <.0001
 ERROR            4563 33.5651224  0.007355933               


標準パッケージの lm() 関数による分析結果に anova() を適用した場合は Sequential な Anova 表,つまり表の上に位置する要因を次々と階層的に組み込んでいき,それぞれの段階での予測子の効果を計っている.これに対して ols() 関数の結果にanova()を適用した結果は Non-Sequential な Anova 表を出力する.つまり,他の変数がすべてモデルに組み込まれている場合の当該予測子の効果を見ることができる.すなわち他の要因を全て定数とした場合の Partial Effect を調べることができる.

* Spline [#o0bf80e7]
&aname(spline);
rcs() 関数による smoothing を行った例.Baayen pp.191

# p.192
 &english.olsB <- ols(RTlexdec ~
   pol(WrittenFrequency, 2) + AgeSubject +
    LengthInLetters, english)
 summary(english.ols)
 anova(english.ols)
 # p.193
 english.olsC <- ols(RTlexdec ~
   rcs(WrittenFrequency, 3) + AgeSubject +
    LengthInLetters, english)
 english.olsE <- ols(RTlexdec ~
   rcs(WrittenFrequency, 5) + AgeSubject +
    LengthInLetters, english)
 english.olsD <- ols(RTlexdec ~
   rcs(WrittenFrequency, 7) + AgeSubject +
    LengthInLetters, english)
 # p.195
 par(mfrow = c(2,2))
 plot(english.olsC, WrittenFrequency = NA, 
   ylim = c(6.5, 7.0), conf.int = F)
 plot(english.olsB, WrittenFrequency = NA,
    add = T, lty = 2, conf.int = F)
 mtext("3 knots, undersmoothing", 3, 1, cex = .8)
 plot(english.olsD, WrittenFrequency = NA, 
   ylim = c(6.5, 7.0), conf.int = F)
 plot(english.olsB, WrittenFrequency = NA, add = T, 
   lty = 2, conf.int = F)
 mtext("7 knots, undersmoothing", 3, 1, cex = .8)
 plot(english.olsE, WrittenFrequency = NA, 
   ylim = c(6.5, 7.0), conf.int = F)
 plot(english.olsB, WrittenFrequency = NA, add = T,
 lty = 2, conf.int = F)
 mtext("5 knots, undersmoothing", 3, 1, cex = .8)

#ref(baayen196.png,center,nowrap,nolink,スムージングの例)

# p.195 さらに交互作用を組み込んだ
 english.olsE2 <- ols(RTlexdec ~
   rcs(WrittenFrequency, 5) + AgeSubject +
    LengthInLetters +
    rcs(WrittenFrequency,5):AgeSubject, english)
 anova(english.olsE2)
 # p.198
 par(mfrow = c(2,2), bg = "white")
 plot(english.olsE2, WrittenFrequency = NA,
   ylim = c(6.2, 7.0) )
 plot(english.olsE2, WrittenFrequency = NA,
   AgeSubject = "young", add = T, col = "blue")
 plot(english.olsE2, LengthInLetters = NA,
   ylim = c(6.2, 7.0) )
 plot(english.olsE2, AgeSubject = NA,
   ylim = c(6.2, 7.0) )

#ref(baayen199.png,center,nowrap,nolink,スムージングの例)


* lrm を使ったロジスティック回帰分析と,モデルの partial effect のグラフ. [#x9f6ea71]
[[Baayen:http://www.mpi.nl/world/persons/private/baayen/]] の[[Analyzing Linguistic Data:http://www.amazon.co.jp/Analyzing-Linguistic-Data-Introduction-Statistics/dp/0521882591/]] 

regularity.dd <- datadist(regularity)
options(datadist = "regularity.dd")
regularity.dd <- datadist(regularity)~
options(datadist = "regularity.dd")~
xtabs(~ regularity$Regularity)~
regularity.lrm <- lrm(Regularity ~ WrittenFrequency + rcs(FamilySize, 3) + NcountStem + InflectionalEntropy + Auxiliary + Valency + NVratio + WrittenSpokenRatio, data = regularity, x = T, y = T)~

xtabs(~ regularity$Regularity)
# p.225~
pentrace(regularity.lrm, seq(0, 0.8, 0.05))~

regularity.lrm <- lrm(Regularity ~ WrittenFrequency + rcs(FamilySize, 3) + NcountStem + InflectionalEntropy + Auxiliary + Valency + NVratio + WrittenSpokenRatio, data = regularity, x = T, y = T)
# p.226~
regularity.lrm.pen <- update(regularity.lrm, penalty = 0.6)~

# p.225
pentrace(regularity.lrm, seq(0, 0.8, 0.05))
# p.227~
par(mfrow = c(3,3), bg = "white")~
plot(regularity.lrm.pen, fun = plogis, ylab ="Pr(regular", adj.subtitle = F, ylim=c(0,1))~

# p.226
regularity.lrm.pen <- update(regularity.lrm, penalty = 0.6)

# p.227
par(mfrow = c(3,3), bg = "white")
plot(regularity.lrm.pen, fun = plogis, ylab ="Pr(regular", adj.subtitle = F, ylim=c(0,1))