Backup of R_fromOldHtml3_2 (No.4) - アールメカブ



These are personal notes that I am still organizing, and they probably contain many inaccurate statements. If you notice one, please contact ishida-m.at.ias.tokushima-u.ac.jp.

R_fromOldHtml3

_ What model comparison with anova means 2007 08 01

Verzani, Using R for Introductory Statistics, p.307

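anova compares two nested models.

RSS, the residual sum of squares, measures the discrepancy between the model and the data. Suppose RSS(p), from a model with p variables, is only slightly smaller than RSS(k), from a model with k variables, where p > k.

If the variables added to the k-variable model are not very important, RSS(k) - RSS(p) should be small. If they are important, the difference should be large. We therefore look at the ratio

 ( RSS(k) - RSS(p) ) / RSS(p)

Dividing the numerator and the denominator by their degrees of freedom gives

 F = {( RSS(k) - RSS(p) ) / (p - k)} / { RSS(p) / (n - (p + 1)) }
   = {( RSS(k) - RSS(p) ) / (p - k)} / sigma.hat^2

where sigma.hat^2 = RSS(p) / (n - (p + 1)) is the residual variance estimated from the larger model. When the added variables have no effect, F follows an F distribution with (p - k) and (n - (p + 1)) degrees of freedom.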
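The F statistic above can be checked by hand against anova(). The following is a minimal sketch that uses the built-in mtcars data as a stand-in (the choice of data and predictors is an assumption made for illustration, not taken from the book).

```r
# Nested linear models on the built-in mtcars data (illustrative choice)
fit_k <- lm(mpg ~ wt, data = mtcars)              # smaller model, k = 1 variable
fit_p <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # larger model,  p = 3 variables

n <- nrow(mtcars)
k <- 1
p <- 3
rss_k <- sum(resid(fit_k)^2)
rss_p <- sum(resid(fit_p)^2)

# F = {(RSS(k) - RSS(p)) / (p - k)} / {RSS(p) / (n - (p + 1))}
f_manual <- ((rss_k - rss_p) / (p - k)) / (rss_p / (n - (p + 1)))
p_manual <- pf(f_manual, p - k, n - (p + 1), lower.tail = FALSE)

# The same comparison done by anova()
tab <- anova(fit_k, fit_p)
f_anova <- tab$F[2]
p_anova <- tab[["Pr(>F)"]][2]

c(manual = f_manual, anova = f_anova)
```

The two values should agree up to floating-point error, which confirms that anova() on two nested lm fits computes exactly this ratio.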

_ Environment settings: Renviron and Rprofile 2007 08 02

Put .Renviron in your home folder and write, for example,

 R_LIBS=etc/R

In .Rprofile, write settings such as

 grDevices::ps.options(family = "Japan1")

_ Example of 3D graphics with rgl 2007 08 02

 # 
 life<- source(
 "chap4lifeexp.dat")$value 
 #
 attach(life)
 #
 # three-factor model
 life.fa3<- factanal(life, factors = 3, scores = "regression")
 life.fa3
 # install.packages("rgl")
 library(rgl)
 rgl.open()   # open the device first, then clear it
 rgl.clear()
 #rgl.clear(type="lights")
 #rgl.clear(type="bbox")
 rgl.bg(color=c("white","black"))
 rgl.spheres( life.fa3$scores[,"Factor1"], 
          life.fa3$scores[,"Factor2"], 
          life.fa3$scores[,"Factor3"], 
          radius=0.05, color = 1: 
          length(life.fa3$scores[,"Factor1"]))

 rgl.bbox(color="#112233", 
  emission="#90ee90",specular="#556677", 
  shininess=8,alpha=0.8)

 rgl.texts( life.fa3$scores[,"Factor1"],
          life.fa3$scores[,"Factor2"], 
          life.fa3$scores[,"Factor3"], 
          abbreviate(
             names(life.fa3$scores[,"Factor1"])), 
          color =  1: length(
          life.fa3$scores[,"Factor1"])) 
      # , adj = "left" )

 rgl.postscript("filename.eps", fmt="eps" )
  # save in EPS format

 rgl.close()
 rgl.quit()

_ Naming the elements of a vector 2007 08 02

 > x<- c(1,2,3)
 > x
 [1] 1 2 3
 > names(x)<- c("A","B","C")
 > x
 A B C 
 1 2 3 
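Once a vector has names, its elements can be looked up by name. A short sketch follows (the variable names other than x are hypothetical):

```r
x <- c(1, 2, 3)
names(x) <- c("A", "B", "C")

b_value <- x[["B"]]         # a single element, name dropped
ac      <- x[c("A", "C")]   # a sub-vector that keeps its names
plain   <- unname(x)        # the same values with the names removed
```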

_ The difference between [] and [,] 2007 08 02

 > mode(iris[5])
 [1] "list"
 > mode(iris[,5])
 [1] "numeric"
 > is.factor(iris[,5]) 
 [1] TRUE
 > is.vector(iris[,5]) 
 [1] FALSE
 # 
 > is.data.frame(iris[5]) 
 [1] TRUE
 > is.list(iris[5]) 
 [1] TRUE
 >  
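The transcript above shows that iris[5] stays a data frame while iris[,5] drops down to the column itself. The sketch below adds two related forms, [[ ]] and drop = FALSE, for comparison (the variable names are hypothetical).

```r
col_df   <- iris[5]                   # one-column data frame (list-style indexing)
col_vec  <- iris[, 5]                 # the column itself: a factor
col_el   <- iris[[5]]                 # same result as iris[, 5]
col_keep <- iris[, 5, drop = FALSE]   # matrix-style indexing, but kept as a data frame

sapply(list(col_df, col_vec, col_el, col_keep), function(obj) class(obj)[1])
```

Using drop = FALSE is a common way to keep code working when the number of selected columns can vary between one and many.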

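_ Temporarily changing the locale with Sys.setlocale 2007 08 05

This information comes from RjpWiki.

 lc <- Sys.getlocale("LC_CTYPE")
 Sys.setlocale("LC_CTYPE", "C")
 # n1, n2, n3 are placeholders: give each field width in bytes
 df <- read.fwf("hoge.dat", widths = c(n1, n2, n3))
 Sys.setlocale("LC_CTYPE", lc)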
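The save, change, restore pattern above can be wrapped in a small helper so that the locale is restored even if the read fails. This is a minimal sketch: with_ctype is a hypothetical helper name, and the two-line fixed-width file is made up for the example.

```r
# Evaluate `expr` with LC_CTYPE temporarily set to `locale`.
# on.exit() restores the previous locale even when `expr` raises an error.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", locale)
  force(expr)  # lazy evaluation: expr is evaluated here, under the new locale
}

# A made-up fixed-width file: fields of 2, 2 and 3 bytes
tmp <- tempfile()
writeLines(c("ab12xyz", "cd34uvw"), tmp)

before <- Sys.getlocale("LC_CTYPE")
df <- with_ctype("C", read.fwf(tmp, widths = c(2, 2, 3)))
after <- Sys.getlocale("LC_CTYPE")
unlink(tmp)
```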

_ Data address for The R Book (reading data directly from a URL) 2008 08 09

http://www.bio.ic.ac.uk/research/mjcraw/therbook/

 [ishida@amd64 crawley]$ pwd
 research/statistics/book/crawley
 [ishida@amd64 crawley]$ ls
 therbook.zip

To read data directly from this URL:

 gain <- read.table(
"http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/Gain.txt", 
 header = T)

_ Extracting rows from a data.frame with an OR condition (the condition vector is recycled) 2008 08 11

 name.data<- c("Michiko", "Taro", "Masako", 
   "Jiro","Aiko","Santa")
 math.data<- c(50, 60, 70, 80, 90, 100)
 name.math<- data.frame(students = name.data, 
   math = math.data) 
 (gen.data<- rep(c("female", "mdd", "male"), 2)) 
  name.math$gen<- gen.data
 is.character(name.math$gen)
 # [1] TRUE
 levels(name.math$gen)
 #NULL

# The following condition specifications do not work as intended

 name.math[name.math$gen == c("female", "male"),]
 #  students math    gen
 #1  Michiko   50 female
 #6    Santa  100   male
 # 
 name.math[name.math$math == c(50, 60),]
 #  students math    gen
 #1  Michiko   50 female
 #2     Taro   60    mdd
 name.math[name.math$math == c(60,70),]
 #[1] students math     gen     
 #<0 rows> (or 0-length row.names)

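The reason is clear from the following comparisons:

  name.math$math == c(50, 60)
 # [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
  name.math$math == c(60, 70)
 # [1] FALSE FALSE FALSE FALSE FALSE FALSE 

The vectors c("female", "male") and c(50, 60) are automatically recycled to match the number of rows, so the conditions are evaluated as

 name.math$gen == c("female", "male", 
  "female", "male", "female", "male") 
 name.math$math == c(50, 60, 50, 60, 50, 60) 

That is, each row is compared only with the element at the same position, not with every candidate value.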
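To express "equal to any of these values", %in% (or an explicit |) avoids the recycling problem. A minimal sketch that rebuilds the same data frame as above:

```r
name.math <- data.frame(
  students = c("Michiko", "Taro", "Masako", "Jiro", "Aiko", "Santa"),
  math     = c(50, 60, 70, 80, 90, 100),
  gen      = rep(c("female", "mdd", "male"), 2),
  stringsAsFactors = FALSE
)

# %in% tests each row against every candidate value
by_gen  <- name.math[name.math$gen %in% c("female", "male"), ]
by_math <- name.math[name.math$math %in% c(60, 70), ]

# The same OR condition written out explicitly
by_or <- name.math[name.math$math == 60 | name.math$math == 70, ]
```

Here by_gen contains the four rows whose gen is female or male, and by_math contains Taro and Masako, the rows that the == version failed to find.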

_ Searching the mailing lists with RSiteSearch 2007 08 15

  RSiteSearch("X11 Ubuntu")

_ aov output reports that the design is unbalanced 2007 08 16

http://permalink.gmane.org/gmane.comp.lang.r.general/23317

 Dear Prof Ripley

 Thanks for your reply and clarification.  However:

 1.  Regarding model.tables() returning 
"Design is unbalanced".  
 Setting contrasts to Helmert does indeed
 make the design balanced, but model.tables() still returns 
 "Design is unbalanced":

 > options()$contrasts
         unordered           ordered 
 "contr.treatment"      "contr.poly" 
 > aov(S~rep+trt1*trt2*trt3, data=dummy.data)
 Call:
 ...
 Residual standard error: 14.59899 
 Estimated effects may be unbalanced
 > options(contrasts=c("contr.helmert", "contr.treatment"))
 > aov(S~rep+trt1*trt2*trt3, data=dummy.data)
 Call:
  ...
 Residual standard error: 14.59899 
 Estimated effects are balanced
 > model.tables(aov(S~rep+trt1*trt2*trt3, 
 data=dummy.data), se=T)
 Design is unbalanced - use se.contrasts for se's
 Tables of effects
 ...

 However, this is a relatively minor issue, 
 and covered in ?model.tables which clearly states that "The
 implementation is incomplete, 
 and only the simpler cases have been tested thoroughly."

 2.  You point out that "In either case you can 
 predict something 
 you want to estimate and use
 predict(, se=TRUE)."  
 Doesn't this give the standard error of 
 the predicted value, 
 rather than the mean
 for, say, trt1 level 0?  For example:
 > predict(temp.lm, newdata=data.frame(rep='1', 
   trt1='0', trt2='1', trt3='0'), se=T)
 $fit
 [1] 32

 $se.fit
 [1] 10.53591

 $df
 [1] 23

 $residual.scale
 [1] 14.59899

 Whereas from the analysis of variance table 
 we can get the standard error of the mean for 
 trt1 as being
 sqrt(anova(temp.lm)[9,3]/12) = 4.214365.  
 It is the equivalent of this latter value that 
 I'm after in the
 glm() case.

 >>> Prof Brian Ripley<ripley<at> 03/08/04 18:10:56 >>>
 On Tue, 3 Aug 2004, Peter Alspach wrote:

 [Lines wrapped for legibility.]

 > I'm having a little difficulty getting the correct 
 standard errors from
 > a glm.object (R 1.9.0 under Windows XP 5.1).  
 predict() will gives
 > standard errors of the predicted values, 
 but I am wanting the standard
 > errors of the mean.
 > 
 > To clarify:
 > 
 > Assume I have a 4x3x2 factorial with 2 complete 
 replications (i.e. 48
 > observations, I've appended a dummy set of data at 
 the end of this
 > message).  Call the treatments trt1 (4 levels), trt2   
 (3 levels) and trt3
 > (2 levels) and the replications rep - all are  
 factors.  The observed
 > data is S.  Then:
 > 
 > temp.aov<- aov(S~rep+trt1*trt2*trt3, data=dummy.data)
 > model.tables(temp.aov, type='mean', se=T)
 > 
 > Returns the means, but states "Design is unbalanced - 
 use se.contrasts
 > for se's" which is a little surprising since the 
 design is balanced.  
 
 If you used the default treatment contrasts, 
 it is not.  Try Helmert 
 contrasts with aov().

 > Nevertheless, se.contrast gives what I'd expect:
 > 
 > se.contrast(temp.aov, list(trt1==0, trt1==1), 
 data=dummy.data)
 > [1] 5.960012
 > 
 > i.e. standard error of mean is 5.960012/sqrt(2) = 
  4.214, which is the
 > sqrt(anova(temp.aov)[9,3]/12) as expected.  
 Similarly for interactions,
 > e.g.:
 > 
 > se.contrast(temp.aov, list(trt1==0 & trt2==0, 
 trt1==1 & trt2==1), data=dummy.data)/sqrt(2)
 > [1]  7.299494
 > 
 > How do I get the equivalent of these standard errors 
 if I have used
 > lm(), and by extension glm()?  I think I should be 
 able to get these
 > using predict(..., type='terms', se=T) or 
 coef(summary()) but can't
 > quite see how.

 In either case you can predict something you want to 
 estimate and use
 predict(, se=TRUE).

_ How to use kkmeans, part 1 2007 08 18

library(kernlab)
sc<- kkmeans(as.matrix(iris[,-5]), centers = 3)
slotNames(sc)
sc@centers
sc@.Data

table(iris$Species, sc@.Data)

kkmeans does not provide a predict function.

_ Building a string kernel, part 1 2007 08 18

# building a string kernel
library(kernlab)
# sample, Lodhi p.423
test.str<- list("science is organized knowledge",
  "wisdom is organized life")
str.kern.list<- NULL
for(i in 1:6){
  str.dot<- stringdot(length = i, lambda = 0.5, 
            type = "sequence", normalized = TRUE)
  str.kern.list[i]<- list(kernelMatrix(str.dot, test.str))
}
 str.kern.list

_ How to use the tm package 2007 08 18

## tm package

 library(tm)
 data("acq")
 summary(acq)
 show(acq)
 inspect(acq[1])
 # acq<- tmMap(acq, asPlain, convertReut21578XMLPlain)
 # summary(acq)
 # inspect(acq[1])

# whitespace handling

 acq<- tmMap(acq, stripWhitespace)
 inspect(acq[1])

# stopword removal

 acq<- tmMap(acq, removeWords, stopwords("english"))
 inspect(acq[1])

# stemming

 acq<- tmMap(acq, stemDoc)
 inspect(acq[1])

# conversion to lower case

 acq<- tmMap(acq, tmTolower)
 inspect(acq[1])

# issuing a query on the tag (metadata) information

 query<- "identifier == '10'"
 tmFilter(acq, query)

# full-text search

 kekka<- tmFilter(acq, FUN = searchFullText, 
 "comput", doclevel = TRUE)
 inspect(kekka)

_ How to use ksvm (kernel support vector machine) 2007 08 18

 ## ksvm
 tmp<- sample(1:150, 100)
 iris.x<- iris[tmp,]
 iris.y<- iris[-tmp,]
 results<- ksvm(Species ~., data = iris.x)
 slotNames(results)
 slot(results, "SVindex")
 results2<- predict(results, newdata = iris.y)
 table(iris.y$Species, results2)

_ How to use specc (kernel spectral clustering) 2007 08 18

td<- tempfile()
dir.create(td)
 write(c("Human machine interface for ABC computer applications"), 
       file=paste(td, "D1", sep="/") )
 write(c("A survey of user opinion of computer system response time"), 
       file=paste(td, "D2", sep="/") )
 write(c("The EPS user interface management system"), 
       file=paste(td, "D3", sep="/") )
 write(c("System and human system engineering testing of EPS"), 
       file=paste(td, "D4", sep="/") )
 write(c("Relation of user perceived response time" ,
         "to error measurement"), 
       file=paste(td, "D5", sep="/") )
 write(c("The intersection graph of paths in trees"), 
       file=paste(td, "D6", sep="/") )
 write(c("Graph minors IV: Widths of trees and well-quasi-ordering"), 
       file=paste(td, "D7", sep="/") )
 write(c("The generation of random, binary,",  
         "ordered trees"), 
       file=paste(td, "D8", sep="/") )
 write(c("Graph minors: A survey"), 
       file=paste(td, "D9", sep="/") )
 td.doc <- TextDocCol(DirSource(td), # tm package
            readerControl = list(reader = readPlain,
	     language = "en_US", load = TRUE))
 summary(td.doc)
 inspect(td.doc)
 str.dot<- stringdot(length = 4, lambda = 0.5, 
    type = "sequence",
 normalized = TRUE)      # kernlab package
 test.kern<- kernelMatrix(str.dot, td.doc)
 td.doc.specc<- specc(td.doc, centers = 2, 
    kernel = "stringdot")

_ Using the lsa package 2007 08 18

lc <- Sys.getlocale("LC_CTYPE") # for environments other than UTF-8
Sys.setlocale("LC_CTYPE","C")
 library(lsa)

When experimenting with temporary files:

td<- tempfile()
dir.create(td)
 write(c("Human", "machine", "interface", "for", "ABC",
   "computer", "applications"), 
     file = paste(td, "D1", sep="/") )
 write(c("A", "survey", "of", "user", "opinion", "of",
   "computer", "system", "response", "time"), 
     file = paste(td, "D2", sep="/") )
 write(c("The", "EPS", "user", "interface", "management", 
        "system"), file=paste(td, "D3", sep="/") )
 write(c("System", "and", "human", "system", "engineering",
   "testing", "of", "EPS"), 
     file = paste(td, "D4", sep="/") )
 write(c("Relation", "of", "user", "perceived", 
   "response", "time" ,"to", "error", "measurement"), 
   file = paste(td, "D5", sep="/") )
 write(c("The", "intersection", "graph", "of", 
   "paths", "in" ,"trees"), 
   file = paste(td, "D6", sep="/") )
 write(c("Graph", "minors", "IV:", "Widths", "of",  
  "trees",   "and", "well-quasi-ordering"), 
  file = paste(td, "D7", sep="/") )
 write( c("The", "generation", "of", "random,", 
   "binary,",    "ordered", "trees"),
   file = paste(td, "D8", sep="/") )
 write(c("Graph", "minors:", "A", "survey"), 
   file = paste(td, "D9", sep="/") )
 ####################

Build a plain document-term matrix

 myMatrix<- textmatrix(td)

Load the stopwords

 data(stopwords_en)

Build the document-term matrix with stopwords and stemming specified

 myMatrix<- textmatrix(td, stopwords = stopwords_en, 
    stemming = TRUE)

Apply weights if needed

 # myMatrix = lw_logtf(myMatrix) * gw_idf(myMatrix)

Set up the raw query terms

 myQuery<- query("user interface", rownames(myMatrix), 
    stemming = TRUE  )
 myMat.Que<-  cbind(myMatrix, myQuery)
 as.matrix(round(cosine(myMat.Que), dig = 2)[,10])

Plain singular value decomposition

 # myLSAraw<- lsa(myMatrix, dims = dimcalc_raw())
 # reconstruct the original matrix
 # round(myLSAraw$tk %*% diag(myLSAraw$sk) %*% 
 #       t(myLSAraw$dk), digits = 2)

Run LSA. dimcalc_share(0.4) determines how many singular values are kept.

 myLSAspace<- lsa(myMatrix, dims = dimcalc_share(0.4))
 myLSAspace 
 # index weights are computed even for elements that are 0 in the original document matrix
 round(myLSAspace$tk, digits= 2)

Approximate the original document vectors in 3 dimensions

 new3Doc<- t(myLSAspace$tk) %*% myMatrix
 #
 plot(new3Doc[1,], new3Doc[2,])
 library(rgl)
 rgl.open() 
 rgl.bg(color=c("white", "black"))
 rgl.spheres(new3Doc[1,],
             new3Doc[2,],
             new3Doc[3,],
             radius = 0.01,color = 1: ncol(new3Doc))
 rgl.bbox(color= "#112233", emission = 
    "#90ee90",specular = "#556677",
          shininess = 8, alpha = 0.8)
 rgl.texts(new3Doc[1,],
             new3Doc[2,],
             new3Doc[3,],
            rownames(myLSAspace$dk), 
            color =  1:ncol(new3Doc), cex = 1.2) 
            # , adj = "left" )
 # rgl.viewpoint
 # rgl.snapshot(file = "sla.png", fmt = "png")
 rgl.postscript("sla.eps", fmt="eps" ) 
 for (i in  seq(2,20,2)) {
        rgl.viewpoint(i,20)
        filename<- paste("lsa-",formatC(i, digits=2, 
        flag="0"),".eps",sep="")
        rgl.postscript(filename, fmt="eps" )
      }
 rgl.close()

Search using the document matrix compressed to 3 dimensions

 # query("user interface",  rownames(myLSAspace$tk), 
 #     stemming = TRUE  )
 ## myQuery2<- query("user interface", rownames(myLSAspace$tk),
 ##   stemming = TRUE  )
 ## myMat.Que2<-  cbind(myLSAspace$tk, myQuery2)
 ## cosine(myMat.Que2 ) #  judge by the degree of correlation with the USER INTERFACE column
 ## nrow( myQuery2 )
 ## ncol( myQuery2 )
 myQuery3<- query("user interface", 
     rownames(myLSAspace$tk), stemming = TRUE  )
 new3Query<- t(myLSAspace$tk) %*%  myQuery3
 myMat.Que3<-  cbind(new3Doc, new3Query)
 as.matrix(round(cosine(myMat.Que3), dig = 2)[,10])
 unlink(td, recursive=TRUE)
 Sys.setlocale("LC_CTYPE",lc)

_ Processing with Rinternals.h 2007 08 20

Quoted, with thanks, from this site.

  /*
  g++ -O2 `mecab-config --cflags` myfunc.c -o myfunc 
  `mecab-config --libs`
  -I/usr/local/lib64/R/include
  	*/
  #include<R.h>
  #include<Rinternals.h>
  SEXP myfunc(SEXP param, SEXP vecparam, SEXP aa)
  {
   SEXP ans;
   double a = REAL(param)[0];
   int len1 = length(param);
   int len2 = length(vecparam);
   int p1 = INTEGER(vecparam)[0];
   int p2 = INTEGER(vecparam)[1];
   const char* str = CHAR(STRING_ELT(aa,0));  /* CHAR() returns a const pointer */
   Rprintf("%s\n",str);
   Rprintf("length of 1: %d\n",len1);
   Rprintf("length of 2: %d\n",len2);
   Rprintf("input param: %lf, %d, %d\n",a,p1,p2);
   PROTECT(ans = allocVector(INTSXP, p1*p2));
   for (int i = 0; i< p1*p2; i++)
     INTEGER(ans)[i] = i;
   UNPROTECT(1);
   return(ans);
 }
 [ishida@amd64 myRcode]$ R CMD SHLIB myfunc.c
 [ishida@amd64 myRcode]$ R

Testing the C program

 > dyn.load("myfunc.so")
 #
 > ret = .Call("myfunc",1.15,as.integer(c(2,3)),
    "hogeほげ")

_ Getting MeCab results into R 2007 08 21

# As a C program

 Create the file ishida/research/statistics/myRcode/mecab.c
  #include<R.h>
  #include<Rdefines.h>
  #include<Rinternals.h>
  #include<mecab.h>
  #include<stdio.h>
  /* On failure: report the MeCab error, free the tagger, and return
     NULL to R (the function returns a SEXP, so "return -1" is wrong) */
  #define CHECK(eval) if (! eval) { \
     fprintf (stderr, "Exception:%s\n", mecab_strerror(mecab)); \
     mecab_destroy(mecab); \
     return R_NilValue; }
  SEXP mecab(SEXP aa){ 
  SEXP parsed;
  const char* input = CHAR(STRING_ELT(aa,0)); 
  mecab_t *mecab;
  const char *result;
  /* mecab_new2() takes an option string, not the text to parse */
  mecab = mecab_new2 ("");
  CHECK(mecab);
  result = mecab_sparse_tostr(mecab, input);
  CHECK(result);
  Rprintf ("INPUT: %s\n", input);
  Rprintf ("RESULT:\n%s", result);
  PROTECT(parsed = allocVector(STRSXP,1));
  /* mkChar copies the string, so destroying the tagger afterwards is safe */
  SET_STRING_ELT(parsed, 0, mkChar(result));
  //PROTECT(parsed = mkString(result));
  UNPROTECT(1);
  mecab_destroy(mecab);
  return(parsed); 
  }

Compile with

  % R CMD SHLIB mecab.c -L/usr/local/lib/ -lmecab -I/usr/local/include

# On the R side

  dyn.load("research/statistics/myRcode/mecab.so")

  kekka<- .Call("mecab","すもももももももものうち")
  kekka2<- NULL
  kekka2<- unlist(strsplit(kekka, "\n"))

  reg<- NULL
  kekka3<- NULL

  for(i in 1 :length(kekka2)){
   reg<- regexpr("^(\\w+)\t(\\w+)", kekka2[i])
   kekka3<- c(kekka3, substring(kekka2[i], reg[1], 
     attributes(reg)[[1]]))
  }

  kekka3
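The loop above can also be written without regular expressions by splitting each line on the tab character. The sketch below runs without MeCab installed: the kekka string is a hand-written stand-in for MeCab output in the usual "surface TAB comma-separated features" layout (an assumption about the dictionary format), and the names surface and pos are hypothetical.

```r
# Hand-written stand-in for the string returned by .Call("mecab", ...)
kekka <- paste(
  "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ",
  "も\t助詞,係助詞,*,*,*,*,も,モ,モ",
  "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ",
  "EOS",
  sep = "\n"
)

lines <- unlist(strsplit(kekka, "\n", fixed = TRUE))
lines <- lines[nzchar(lines) & lines != "EOS"]   # drop the end-of-sentence marker

fields  <- strsplit(lines, "\t", fixed = TRUE)
surface <- vapply(fields, function(f) f[1], character(1))
# first comma-separated feature = part of speech
pos <- vapply(fields,
              function(f) strsplit(f[2], ",", fixed = TRUE)[[1]][1],
              character(1))

# Same "surface<TAB>part of speech" strings that the loop collects in kekka3
kekka3 <- paste(surface, pos, sep = "\t")
```

Splitting on fixed delimiters does not depend on how \w treats Japanese characters in the current locale, which is one reason to prefer it here.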

_ fligner.test: a nonparametric test for homogeneity of variances across several groups 2007 08 22

Crawley The R Book p.293
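A minimal sketch of a call, using the built-in InsectSprays data rather than the example in the book:

```r
# Fligner-Killeen test: do the six spray groups share the same variance?
res <- fligner.test(count ~ spray, data = InsectSprays)

res$statistic   # test statistic
res$parameter   # degrees of freedom = number of groups - 1
res$p.value     # a small p-value suggests the variances differ
```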

_ Building a string kernel 2007 08 23

# building a string kernel

  library(kernlab)
  # sample, Lodhi p.423
  test.str<- list("science is organized knowledge",
       "wisdom is organized life")


  str.kern.list<- NULL
  for(i in 1:6){
   str.dot<- stringdot(length = i, lambda = 0.5, 
     type = "sequence", normalized = TRUE)
   str.kern.list[i]<- list(kernelMatrix(str.dot, 
      test.str))
  }

## The Japanese strings are treated as 6 bytes each in the computation

  test.str.jp<- list("これと","これは")

  str.kern.list.jp<- NULL
  for(i in 1:6){
   str.dot<- stringdot(length = i, lambda = 0.5, type = 
      "sequence", 
                        normalized = TRUE)
   str.kern.list.jp[i]<- list(kernelMatrix(str.dot, 
      test.str.jp))
  }

  test.str<- list("car","cat")

  str.kern.list <- NULL
  for(i in 1:6){
   str.dot<- stringdot(length = i, lambda = 0.5, type = 
         "sequence",    normalized = TRUE)
   str.kern.list[i]<- list(kernelMatrix(str.dot, 
      test.str))
  }

_ tick marks: setting axis labels 2007 08 25

Crawley The R Book p.293 p. 146

 plot(0:10, 0:10, xlab = "", ylab = "", xaxt = "n", yaxt = "n")
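The call above only suppresses the default axes. A sketch of the usual next step, drawing custom tick marks and labels with axis(), follows. The tick positions and label texts are made up for illustration, and pdf(NULL) is used only so that the sketch runs without opening a window.

```r
pdf(NULL)  # null device; omit this line (and dev.off()) in an interactive session
plot(0:10, 0:10, xlab = "", ylab = "", xaxt = "n", yaxt = "n")

# x axis: three ticks with text labels
x_at <- seq(0, 10, by = 5)
axis(1, at = x_at, labels = c("low", "mid", "high"))

# y axis: ticks at the two ends only, labels drawn horizontally (las = 1)
axis(2, at = c(0, 10), labels = c("min", "max"), las = 1)
dev.off()
```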

_ Contrasts revisited 2007 08 29

John Fox p.127 -- 153

_ se in summary.lm: standard errors of regression coefficients 2007 08 28

Crawley The R Book p.365
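As a reminder of where these standard errors come from, they are the square roots of the diagonal of sigma.hat^2 (X'X)^-1. The sketch below checks this against summary() on the built-in mtcars data (the data and model are chosen for illustration, not taken from the book).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

X      <- model.matrix(fit)                      # design matrix, incl. intercept column
sigma2 <- sum(resid(fit)^2) / df.residual(fit)   # residual variance estimate

# Standard errors: sqrt of the diagonal of sigma^2 * (X'X)^-1
se_manual  <- sqrt(diag(sigma2 * solve(crossprod(X))))
se_summary <- coef(summary(fit))[, "Std. Error"]

cbind(se_manual, se_summary)
```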

_ Four-dimensional arrays 2007 08 29

mosaicplot(Titanic[c("1st","2nd","3rd"),,"Adult",],
  main = "Survival on the Titanic", shade = T)
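The indexing in the mosaicplot call is easier to read with the array's dimensions in view. A short sketch follows (sub and class_by_survival are hypothetical names):

```r
dim(Titanic)               # 4 2 2 2
names(dimnames(Titanic))   # "Class" "Sex" "Age" "Survived"

# Same subset as in the mosaicplot call: three passenger classes, adults only.
# Fixing Age at "Adult" drops that dimension, leaving a 3 x 2 x 2 array.
sub <- Titanic[c("1st", "2nd", "3rd"), , "Adult", ]
dim(sub)                   # 3 2 2

# Collapse over Sex and Age to get a Class x Survived table
class_by_survival <- apply(Titanic, c(1, 4), sum)
class_by_survival
```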

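_ Meaning of the summary.lm parameters in a two-way model 2007 08 30

Judging from the structural model in John Verzani p.336, every parameter is a difference in intercept from the baseline (Intercept).

Two-way analysis of variance

 frogs3<- read.csv(
  "http://150.59.18.68/frogs3.csv", header = FALSE)
 frogs3 # header = FALSE tells R that the file has no column names

When column names are undefined, names such as V1, V2, V3 are assigned automatically. When there are two factors, specify them on the right-hand side of the tilde, joined with +.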

 frogs3.aov<- aov(V1 ~ V2 + V3, data = frogs3)
 summary(frogs3.aov)
 summary.lm(frogs3.aov)

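Intercept corresponds to V2 = 12H and V3 = 100ug, with 3 replicates. Its standard error is sqrt(7.51/6). Is the 6 the product of the degrees of freedom of V2 and V3? The second row, V224H, has standard error sqrt(2 * 7.51/9). Is the 9 the number of observations at each level of V2?

The second row, V224H, is the difference between the V2 = 24H case and the Intercept (V2 = 12H and V3 = 100ug).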
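The frogs3 file is not available here, so the sketch below uses the built-in, balanced warpbreaks data as a stand-in to show where such denominators come from. In a balanced additive two-way model with treatment contrasts, the standard error of a factor-level difference is sqrt(2 * s2 / m), where m is the number of observations at each level of that factor, and the standard error of the Intercept is sqrt(s2 * (a + b - 1) / N), where a and b are the numbers of levels and N is the total number of observations. Whether frogs3 matches this layout is an assumption.

```r
fit <- aov(breaks ~ wool + tension, data = warpbreaks)

s2 <- sum(resid(fit)^2) / df.residual(fit)   # residual mean square
N  <- nrow(warpbreaks)                       # 54 observations
a  <- nlevels(warpbreaks$wool)               # 2 levels
b  <- nlevels(warpbreaks$tension)            # 3 levels

se_intercept <- sqrt(s2 * (a + b - 1) / N)
se_wool      <- sqrt(2 * s2 / (N / a))       # N / a observations per wool level
se_tension   <- sqrt(2 * s2 / (N / b))       # N / b observations per tension level

se_table <- coef(summary.lm(fit))[, "Std. Error"]
rbind(by_hand = c(se_intercept, se_wool, se_tension, se_tension),
      summary = se_table)
```

If frogs3 is balanced in the same way, the 9 in sqrt(2 * 7.51/9) would be the number of observations at each level of V2, and the 6 in sqrt(7.51/6) would be N / (a + b - 1) rather than a product of degrees of freedom.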

Likewise, according to p.332, in analysis of covariance the coefficient of a continuous variable represents a slope.

 regrowth<- read.table(
  "http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/ipomopsis.txt", 
  header = T)
 # pass the data frame explicitly; Fruit, Grazing and Root are its columns
 ancova1<- lm(Fruit ~ Grazing * Root, data = regrowth)
 summary(ancova1)
 anova(ancova1)

_ Computing the standard error of each parameter in analysis of covariance

Crawley The R Book

p. 492 - 498

Data borrowed from Faraway (2006)

 babyfood<- read.table(file = 
    "http://150.59.18.68/babyfood.txt")
 babyfood

# Compute the disease proportion for each factor combination and tabulate it with xtabs()

 xtabs(disease/(disease+nondisease) ~ sex + food, babyfood)

# Run a logistic regression

# This is a generalized linear model, glm(), with a binomial response

 model1<- glm(cbind(disease, nondisease) ~ 
            sex + food, family = binomial, data = babyfood)

# glm fits generalized linear models; family specifies the distribution

 summary(model1)              # look at the summary
 drop1(model1, test = "Chi")  # is each term significant?
 exp(-.669)                # check the effect of breast feeding
 model.matrix(model1)
 # Intercept  is Boy with Bottle
 # sexGirl    is the difference from the Intercept for Girl
 # foodBreast is the difference relative to the Intercept (Boy, Bottle)
 # foodSuppl  is the difference relative to the Intercept (Boy, Bottle)
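The exp(-.669) step converts a coefficient on the log-odds scale into an odds ratio. A small worked sketch using only the value quoted in the note (the variable names are hypothetical):

```r
log_odds   <- -0.669           # value passed to exp() in the note
odds_ratio <- exp(log_odds)    # about 0.51

# Percentage change in the odds relative to the baseline (Bottle)
percent_change <- (odds_ratio - 1) * 100   # roughly -49

round(c(odds_ratio = odds_ratio, percent_change = percent_change), 3)
```

Read this way, the coefficient says that the odds of disease under breast feeding are about half of those under bottle feeding, with sex held fixed. For a fitted model, exp(coef(model1)) applies the same conversion to every coefficient at once.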