R_fromOldHtml3_2 の変更点 - アールメカブ

アールメカブ


R_fromOldHtml3_2 の変更点


個人メモを整理中です.不適切な記述が多々あるかと思います.お気づきの際は,ishida-mこの部分を"@"に変更下さいias.tokushima-u.ac.jp
 までご連絡ください.
個人メモを整理中です.不適切な記述が多々あるかと思います.お気づきの際は,ishida-m(この部分を"@"に変更下さい)ias.tokushima-u.ac.jp
までご連絡ください.


[[R_fromOldHtml3]]


#contents




*  anova によるモデル比較の意味 2007 08 01 [#ad7d4e6c]


Crawley, Using R for Introductory Statistics,p.307

anova は二つの入れ子になったモデルを比較する.

RSS 残差平方和は,モデルとデータの差をはかった尺度である.
いま,p 個の変数モデルの RSS(p) が k 個の変数モデルの RSS(k) よりわずかに少ないとする.ただし p > k とする.

k に加えられた新しい変数がさほど重要でないのならば,RSS(k) - RSS(p)は小さいはずである.逆に重要ならば,その差は大きいはずである.そこで

  ( RSS(k) - RSS(k) ) / RSS(p)
 
  F = {( RSS(k) - RSS(k) ) / (p - k )} / { RSS(p) 
       / (n - (p + 1))} 
    = {( RSS(k) - RSS(k) ) / (p - k )} / sigma^2

は自由度 (p - k ),  (n - (p + 1) の F 分布に従う.








*  環境設定 Renviron Rprofile 2007 08 02 [#x05a0451]


.Renviron をホームフォルダに置き,例えば
  R_LIBS=etc/R

.Rprofile では
  	grDevices::ps.options(family= "Japan1")
などを書く.






*  rgl による 3D グラフィックスの作成例 2007 08 02 [#ee686f14]


  # 
  life<- source(
  "chap4lifeexp.dat")$value 
  #
  attach(life)
  #
  # 3 因子モデル
  life.fa3<- factanal(life, factors = 3, scores = "regression")
  life.fa3
  # install.packages("rgl")
  library(rgl)
  rgl.clear()
  #rgl.clear(type="lights")
  #rgl.clear(type="bbox")
  rgl.open() 
  rgl.bg(color=c("white","black"))

  rgl.spheres( life.fa3$scores[,"Factor1"], 
           life.fa3$scores[,"Factor2"], 
           life.fa3$scores[,"Factor3"], 
           radius=0.05, color = 1: 
           length(life.fa3$scores[,"Factor1"]))
 
  rgl.bbox(color="#112233", 
   emission="#90ee90",specular="#556677", 
   shinines=8,alpha=0.8)
 
  rgl.texts( life.fa3$scores[,"Factor1"],
           life.fa3$scores[,"Factor2"], 
           life.fa3$scores[,"Factor3"], 
           abbreviate(
              names(life.fa3$scores[,"Factor1"])), 
           color =  1: length(
           life.fa3$scores[,"Factor1"])) 
       # , adj = "left" )
 
  rgl.postscript("filename.eps", fmt="eps" )
   # eps 形式で保存
 
  rgl.close()
 rgl.quit()





*  ベクトルに名前を付ける 2007 08 02 [#f56966a2]



  > x<- c(1,2,3)
  > x
  [1] 1 2 3
  > names(x)<- c("A","B","C")
  > x
  A B C 
  1 2 3 


*  [] と [,] の違い 2007 08 02 [#cbb7aa96]


  > mode(iris[5])
  [1] "list"
  > mode(iris[,5])
  [1] "numeric"
  > is.factor(iris[,5]) 
  [1] TRUE
  > is.vector(iris[,5]) 
  [1] FALSE
  # 
  > is.data.frame(iris[5]) 
  [1] TRUE
  > is.list(iris[5]) 
  [1] TRUE
  >  






*  一時的にロケールを変更 Sys.setlocale 2007 08 05 [#le4d2174]
この情報は[[RjoWiki:http://www.okada.jp.org/RWiki/?%BD%E9%B5%E9%A3%D1%A1%F5%A3%C1%20%A5%A2%A1%BC%A5%AB%A5%A4%A5%D6(6)]] より入手

  lc<-Sys.getlocale("LC_CTYPE")
  Sys.setlocale("LC_CTYPE","C")
  df<-read.fwf("hoge.dat",width=c(バイト,バイト,バイト))
  Sys.setlocale("LC_CTYPE",lc)


*  The R Book のデータアドレス (URL から直接データを読む) 2008 08 09 [#x3d64c5d]

http://www.bio.ic.ac.uk/research/mjcraw/therbook/


  [ishida@amd64 crawley]$ pwd
  research/statistics/book/crawley
  [ishida@amd64 crawley]$ ls
  therbook.zip

この url から直接データを読むには

  gain <- read.table(
 "http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/Gain.txt", 
  header = T)






*  data.frame の or 条件抽出 (条件指定がリサイクルされる recycle) 2008 08 11 [#v6604d91]


  name.data<- c("Michiko", "Taro", "Masako", 
    "Jiro","Aiko","Santa")
  math.data<- c(50, 60, 70, 80, 90, 100)
  name.math<- data.frame(students = name.data, 
    math = math.data) 
  (gen.data<- rep(c("female", "mdd", "male"), 2)) 
   name.math$gen<- gen.data
  is.character(name.math$gen)
  # [1] TRUE
  levels(name.math$gen)
  #NULL
#  次の二つの条件指定はうまくいかない
  name.math[name.math$gen == c("female", "male"),]
  #  students math    gen
  #1  Michiko   50 female
  #6    Santa  100   male
  # 
  name.math[name.math$math == c(50, 60),]
  #  students math    gen
  #1  Michiko   50 female
  #2     Taro   60    mdd
  name.math[name.math$math == c(60,70),]
  #[1] students math     gen     
  #<0 rows> (or 0-length row.names)

これは以下の例からも明らかなように
   name.math$math == c(50, 60)
  # [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
   name.math$math == c(60, 70)
  # [1] FALSE FALSE FALSE FALSE FALSE FALSE 

c("female", "male") や c(50,60) を
自動的に行数にあわせて リカーシブに繰り返し
  name.math[name.math$gen == c("female", "male", 
   "female", "male", "female", "male") 
  name.math$math == c(50, 60, 50, 60, 50, 70) 

として全行を一対一で条件判定しているため




*   RSiteSearch によるメーリングリスト検索 2007 08 15 [#kcf94e43]


   RSiteSearch("X11 Ubuntu")


*  aov の結果で design unbalanced と表示される 2007 08 16 [#jcd27af3]


http://permalink.gmane.org/gmane.comp.lang.r.general/23317
 
  Dear Prof Ripley
 
  Thanks for your reply and clarification.  However:
 
  1.  Regarding model.tables() returning 
 "Design is unbalanced".  
  Setting contrasts to Helmert does indeed
  make the design balanced, but model.tables() still returns 
  "Design is unbalanced":
 
  > options()$contrasts
          unordered           ordered 
  "contr.treatment"      "contr.poly" 
  > aov(S~rep+trt1*trt2*trt3, data=dummy.data)
  Call:
  ...
  Residual standard error: 14.59899 
  Estimated effects may be unbalanced
  > options(contrasts=c("contr.helmert", "contr.treatment"))
  > aov(S~rep+trt1*trt2*trt3, data=dummy.data)
  Call:
   ...
  Residual standard error: 14.59899 
  Estimated effects are balanced
  > model.tables(aov(S~rep+trt1*trt2*trt3, 
  data=dummy.data), se=T)
  Design is unbalanced - use se.contrasts for se's
  Tables of effects
  ...
 
  However, this is a relatively minor issue, 
  and covered in ?model.tables which clearly states that "The
  implementation is incomplete, 
  and only the simpler cases have been tested thoroughly."
 
  2.  You point out that "In either case you can 
  predict something 
  you want to estimate and use
  predict(, se=TRUE)."  
  Doesn't this give the standard error of 
  the predicted value, 
  rather than the mean
  for, say, trt1 level 0?  For example:
  > predict(temp.lm, newdata=data.frame(rep='1', 
    trt1='0', trt2='1', trt3='0'), se=T)
  $fit
  [1] 32
 
  $se.fit
  [1] 10.53591
 
  $df
  [1] 23
 
  $residual.scale
  [1] 14.59899
 
  Whereas from the analysis of variance table 
  we can get the standard error of the mean for 
  trt1 as being
  sqrt(anova(temp.lm)[9,3]/12) = 4.214365.  
  It is the equivalent of this latter value that 
  I'm after in the
  glm() case.
 
  >>> Prof Brian Ripley<ripley<at> 03/08/04 18:10:56 >>>
  On Tue, 3 Aug 2004, Peter Alspach wrote:
 
  [Lines wrapped for legibility.]
 
  > I'm having a little difficulty getting the correct 
  standard errors from
  > a glm.object (R 1.9.0 under Windows XP 5.1).  
  predict() will gives
  > standard errors of the predicted values, 
  but I am wanting the standard
  > errors of the mean.
  > 
  > To clarify:
  > 
  > Assume I have a 4x3x2 factorial with 2 complete 
  replications (i.e. 48
  > observations, I've appended a dummy set of data at 
  the end of this
  > message).  Call the treatments trt1 (4 levels), trt2   
  (3 levels) and trt3
  > (2 levels) and the replications rep - all are  
  factors.  The observed
  > data is S.  Then:
  > 
  > temp.aov<- aov(S~rep+trt1*trt2*trt3, data=dummy.data)
  > model.tables(temp.aov, type='mean', se=T)
  > 
  > Returns the means, but states "Design is unbalanced - 
  use se.contrasts
  > for se's" which is a little surprising since the 
  design is balanced.  
  
  If you used the default treatment contrasts, 
  it is not.  Try Helmert 
  contrasts with aov().
 
  > Nevertheless, se.contrast gives what I'd expect:
  > 
  > se.contrast(temp.aov, list(trt1==0, trt1==1), 
  data=dummy.data)
  > [1] 5.960012
  > 
  > i.e. standard error of mean is 5.960012/sqrt(2) = 
   4.214, which is the
  > sqrt(anova(temp.aov)[9,3]/12) as expected.  
  Similarly for interactions,
  > e.g.:
  > 
  > se.contrast(temp.aov, list(trt1==0 & trt2==0, 
  trt1==1 & trt2==1), data=dummy.data)/sqrt(2)
  > [1]  7.299494
  > 
  > How do I get the equivalent of these standard errors 
  if I have used
  > lm(), and by extension glm()?  I think I should be 
  able to get these
  > using predict(..., type='terms', se=T) or 
  coef(summary()) but can't
  > quite see how.
 
  In either case you can predict something you want to 
  estimate and use
  predict(, se=TRUE).








*  kkmeans の使い方 1 2007 08 18 [#ya340229]

 sc<- kkmeans(as.matrix(iris[,-5]), center = 3)
 slotNames(sc)
 sc@centers
 sc@.Data
 
 table(iris$Species, sc@.Data)

kkmeans には predict 関数は用意されていない








*  string kernel の作成 その 1  2007 08 18 [#e90e8286]


 # string kernel の作成
 library(kernlab)
 # sample, Lodhi p.423
 test.str<- list("science is organized knowledge",
   "wisdom is organized life")
 str.kern.list<- NULL
 for(i in 1:6){
   str.dot<- stringdot(length = i, lambda = 0.5, 
             type = "sequence", normalized = TRUE)
   str.kern.list[i]<- list(kernelMatrix(str.dot, test.str))
 }
  str.kern.list







*  tm パッケージの使い方 2007 08 18 [#ead6f07a]


##      tm パッケージ
  library(tm)
  data("acq")
  summary(acq)
  show(acq)
  inspect(acq[1])
  # acq<- tmMap(acq, asPlain, convertReut21578XMLPlain)
  # summary(acq)
  # inspect(acq[1])

# 空白の処理
  acq<- tmMap(acq, stripWhitespace)
  inspect(acq[1])


# stopWords 処理
  acq<- tmMap(acq, removeWords, stopwords("english"))
  inspect(acq[1])

# stemming 処理
  acq<- tmMap(acq, stemDoc)
  inspect(acq[1])

# 小文字への変換
  acq<- tmMap(acq, tmTolower)
  inspect(acq[1])

# タグ情報に関するクエリーの発行
  query<- "identifier == '10'"
  tmFilter(acq, query)

# フルサーチ
  kekka<- tmFilter(acq, FUN = searchFullText, 
  "comput", doclevel = TRUE)
  inspect(kekka)








*  kvsm カーネル サポートベクターの使い方 2007 08 18 [#c45094c2]


  ## ksvm
  tmp<- sample(1:150, 100)
  iris.x<- iris[tmp,]
  iris.y<- iris[-tmp,]

  results<- ksvm(Species ~., data = iris.x)
  slotNames(results)
  slot(results, "SVindex")
  results2<- predict(results, new = iris.y)

  table(iris.y$Species, results2)







*  specc カーネル サポートベクターの使い方 2007 08 18 [#f1b4c096]

 td<- tempfile()
 dir.create(td)
  write(c("Human machine interface for ABC computer 
   applications"), 
        file=paste(td, "D1", sep="/") )
  write(c("A survey of user opinion of computer system 
   response time"), 
        file=paste(td, "D2", sep="/") )
  write(c("The EPS user interface management system"), 
        file=paste(td, "D3", sep="/") )
  write(c("System and human system engineering testing 
    of EPS"), 
        file=paste(td, "D4", sep="/") )
  write(c("Relation of user perceived response time" ,
          "to error measurement"), 
        file=paste(td, "D5", sep="/") )
  write(c("The intersection graph of paths in trees"), 
        file=paste(td, "D6", sep="/") )
  write(c("Graph minors IV: Widths of trees and 
    well-quasi-ordering"), 
        file=paste(td, "D7", sep="/") )
  write(c("The generation of random, binary,",  
     "ordered trees"), 
        file=paste(td, "D8", sep="/") )
  write(c("Graph minors: A survey"), 
        file=paste(td, "D9", sep="/") )
  td.doc <- TextDocCol(DirSource(td), # tm パッケージ
             readerControl = list(reader = readPlain,
	     language = "en_US", load = TRUE))
  summary(td.doc)
  inspect(td.doc)


  str.dot<- stringdot(length = 4, lambda = 0.5, 
     type = "sequence",
  normalized = TRUE)      # kernlab パッケージ

  test.kern<- kernelMatrix(str.dot, td.doc)
  td.doc.specc<- specc(td.doc, centers = 2, 
     kernel = "stringdot")









* lsa パッケージの利用法 2007 08 18 [#t487cb21]

 lc <- Sys.getlocale("LC_CTYPE")# utf-8 以外の環境の場合
 Sys.setlocale("LC_CTYPE","C")


  library(lsa)

一時ファイルで実験する場合

 td<- tempfile()
 dir.create(td)
  write(c("Human", "machine", "interface", "for", "ABC",
    "computer", "applications"), 
      file = paste(td, "D1", sep="/") )
  write(c("A", "survey", "of", "user", "opinion", "of",
    "computer", "system", "response", "time"), 
      file = paste(td, "D2", sep="/") )
  write(c("The", "EPS", "user", "interface", "management", 
         "system"), file=paste(td, "D3", sep="/") )
  write(c("System", "and", "human", "system", "engineering",
    "testing", "of", "EPS"), 
      file = paste(td, "D4", sep="/") )
  write(c("Relation", "of", "user", "perceived", 
    "response", "time" ,"to", "error", "measurement"), 
    file = paste(td, "D5", sep="/") )
  write(c("The", "intersection", "graph", "of", 
    "paths", "in" ,"trees"), 
    file = paste(td, "D6", sep="/") )
  write(c("Graph", "minors", "IV:", "Widths", "of",  
   "trees",   "and", "well-quasi-ordering"), 
   file = paste(td, "D7", sep="/") )
  write( c("The", "generation", "of", "random,", 
    "binary,",    "ordered", "trees"),
    file = paste(td, "D8", sep="/") )
  write(c("Graph", "minors:", "A", "survey"), 
    file = paste(td, "D9", sep="/") )
  ####################


単純に文章ターム行列を作ってみる
  myMatrix<- textmatrix(td)

stopword をロード
  data(stopwords_en)

stopword と stemming を指定しての文書・ターム行列作成
  myMatrix<- textmatrix(td, stopwords = stopwords_en, 
     stemming = TRUE)
必要なら重みを付け
  # myMatrix = lw_logtf(myMatrix) * gw_idf(myMatrix)
生の検索語の設定
  myQuery<- query("user interface", rownames(myMatrix), 
     stemming = TRUE  )
  myMat.Que<-  cbind(myMatrix, myQuery)
  as.matrix(round(cosine(myMat.Que), dig = 2)[,10])

単純な特異値分解
  # myLSAraw<- lsa(myMatrix, dims = dimcalc_raw())
  # 復元
  # round(myLSAraw$tk %*% diag(myLSAraw$sk) %*% 
        t(myLSAraw$dk), digit = 2)


LSA を実行してみる dimcalc_share(0.4) は許容する特異値の数を指定

  myLSAspace<- lsa(myMatrix, dims = dimcalc_share(0.4))

  myLSAspace 
  # もとの文書行列では 0 の要素にも索引重みが計算されている
  round(myLSAspace$tk, digits= 2)



もとの文書ベクトルを 3 次元で近似する

  new3Doc<- t(myLSAspace$tk) %*% myMatrix
  #
  plot(new3Doc[1,], new3Doc[2,])

  library(rgl)

  rgl.open() 
  rgl.bg(color=c("white", "black"))

  rgl.spheres(new3Doc[1,],
              new3Doc[2,],
              new3Doc[3,],
              radius = 0.01,color = 1: ncol(new3Doc))

  rgl.bbox(color= "#112233", emission = 
     "#90ee90",specular = "#556677",
           shinines = 8, alpha = 0.8)

  rgl.texts(new3Doc[1,],
              new3Doc[2,],
              new3Doc[3,],
             rownames(myLSAspace$dk), 
             color =  1:ncol(new3Doc), cex = 1.2) 
             # , adj = "left" )

  # rgl.viewpoint
  # rgl.snapshot(file = "sla.png", fmt = "png")


  rgl.postscript("sla.eps", fmt="eps" ) 

  for (i in  seq(2,20,2)) {
         rgl.viewpoint(i,20)
         filename<- paste("lsa-",formatC(i, digits=2, 
         flag="0"),".eps",sep="")
         rgl.postscript(filename, fmt="eps" )
       }

  rgl.close()

3 次元に圧縮した文書行列による検索
この結果を使って検索

  # query("user interface",  rownames(myLSAspace$tk), 
      stemming = TRUE  )

  ## myQuery2<- query("user interface", rownames(myLSAspace$tk),
    stemming = TRUE  )
  ## myMat.Que2<-  cbind(myLSAspace$tk, myQuery2)
  ## cosine(myMat.Que2 ) #  USER INTERFACE 列との相関の程度で
  ## nrow( myQuery2 )
  ## ncol( myQuery2 )
  myQuery3<- query("user interface", 
      rownames(myLSAspace$tk), stemming = TRUE  )
  new3Query<- t(myLSAspace$tk) %*%  myQuery3
  myMat.Que3<-  cbind(new3Doc, new3Query)
  as.matrix(round(cosine(myMat.Que3), dig = 2)[,10])

  unlink(td, recursive=TRUE)
  Sys.setlocale("LC_CTYPE",lc)








* Rinternals.h を使った処理 2007 08 20 [#c3c46f3c]
結局,http://cran.md.tsukuba.ac.jp/doc/manuals/R-exts.pdf を熟読するに限るのだが,

[[このサイト:http://noplans.org/~1gac/d/blosxom.py/software/R/7.html]]
は分かり易い.以下も引用させていただきました.

   /*
   g++ -O2 `mecab-config --cflags` myfunc.c -o myfunc 
   `mecab-config --libs`
   -I/usr/local/lib64/R/include
   	*/
   #include<R.h>
   #include<Rinternals.h>
   SEXP myfunc(SEXP param, SEXP vecparam, SEXP aa)
   {
    SEXP ans;
    double a = REAL(param)[0];
    int len1 = length(param);
    int len2 = length(vecparam);
    int p1 = INTEGER(vecparam)[0];
    int p2 = INTEGER(vecparam)[1];
    char* str = CHAR(STRING_ELT(aa,0));
    Rprintf("%s\n",str);
    Rprintf("length of 1: %d\n",len1);
    Rprintf("length of 2: %d\n",len2);
    Rprintf("input param: %lf, %d, %d\n",a,p1,p2);
    PROTECT(ans = allocVector(INTSXP, p1*p2));
    for (int i = 0; i< p1*p2; i++)
      INTEGER(ans)[i] = i;
    UNPROTECT(1);
    return(ans);
  }


  [ishida@amd64 myRcode]$ R CMD SHLIB myfunc.c
  [ishida@amd64 myRcode]$ R
c プログラムテスト
  > dyn.load("myfunc.so")
  #
  > ret = .Call("myfunc",1.15,as.integer(c(2,3)),
     "hogeほげ")


* mecab の処理結果を R で取得する 2007 08 21 [#i291931b]



# C プログラムとして
  ishida/research/statistics/myRcode/mecab.c
  ファイルを作成

   #include<R.h>
   #include<Rdefines.h>
   #include<Rinternals.h>
   #include<mecab.h>
   #include<stdio.h>
   #define CHECK(eval) if (! eval) { \
      fprintf (stderr, "Exception:%s\n", mecab_strerror 
     (mecab)); \
      mecab_destroy(mecab); \
      return -1; }
   SEXP mecab(SEXP aa){ 
   SEXP parsed;
   const char* input = CHAR(STRING_ELT(aa,0)); 
   mecab_t *mecab;
   mecab_node_t *node;
   const char *result;
   int i;
   mecab = mecab_new2 (input);
   CHECK(mecab);
   result = mecab_sparse_tostr(mecab, input);
   CHECK(result);
   Rprintf ("INPUT: %s\n", input);
   Rprintf ("RESULT:\n%s", result);
   PROTECT(parsed = allocVector(STRSXP,1));
   SET_STRING_ELT(parsed, 0, mkChar(result));
   //PROTECT(parsed = mkString(result));
   UNPROTECT(1);
   mecab_destroy(mecab);
   return(parsed); 
   }


コンパイルは

   % R CMD SHLIB chartest.c -L/usr/local/lib/ -lmecab 
        -I/usr/local/include

#  R 側で

   dyn.load("research/statistics/myRcode/mecab.so")
 
   kekka<- .Call("mecab","すもももももももものうち")
   kekka2<- NULL
   kekka2<- unlist(strsplit(kekka, "\n"))
 
   reg<- NULL
   kekka3<- NULL
 
   for(i in 1 :length(kekka2)){
    reg<- regexpr("^(\\w+)\t(\\w+)", kekka2[i])
    kekka3<- c(kekka3, substring(kekka2[i], reg[1], 
      attributes(reg)[[1]]))
   }
 
   kekka3









* fligner.test 多群の等分散性を検定するノンパラメトリックな方法 2007 08 22 [#x93cf322]


[[Crawley The R Book:http://www.amazon.co.jp/R-Book-Michael-J-Crawley/dp/0470510242/]]  p.293 









* stringkernel の作成 2007 08 23 [#yd069bf1]


# string kernel の作成

   library(kernlab)
   # sample, Lodhi p.423
   test.str<- list("science is organized 
        knowledge","wisdom is organized life")
 
 
   str.kern.list<- NULL
   for(i in 1:6){
    str.dot<- stringdot(length = i, lambda = 0.5, 
      type = "sequence", normalized = TRUE)
    str.kern.list[i]<- list(kernelMatrix(str.dot, 
       test.str))
   }

## 日本語の方は 6 バイト扱いで計算している

   test.str.jp<- list("これと","これは")
 
   str.kern.list.jp<- NULL
   for(i in 1:6){
    str.dot<- stringdot(length = i, lambda = 0.5, type = 
       "sequence", 
                         normalized = TRUE)
    str.kern.list.jp[i]<- list(kernelMatrix(str.dot, 
       test.str.jp))
   }
 
   test.str<- list("car","cat")
 
   str.kern.list <- NULL
   for(i in 1:6){
    str.dot<- stringdot(length = i, lambda = 0.5, type = 
          "sequence",    normalized = TRUE)
    str.kern.list[i]<- list(kernelMatrix(str.dot, 
       test.str))
   }









* tick.mark 座標ラベルの設定 2007 08 25 [#yaf0a2ee]

[[Crawley The R Book:http://www.amazon.co.jp/R-Book-Michael-J-Crawley/dp/0470510242/]]  p.293  p. 146
  plot(0:10, 0:10, xlab = "", ylab = "", xaxt = "n", yaxt = "n")








* コントラスト再考 2007 08 29 [#k6ca4baa]


[[John Fox:http://socserv.mcmaster.ca/jfox/]] p.127 -- 153










* se in summary.lm 回帰係数の標準誤差 2007 08 28 [#v38c3d80]


[[Crawley The R Book:http://www.amazon.co.jp/R-Book-Michael-J-Crawley/dp/0470510242/]]   p.365








* 4 次元配列 2007 08 29 [#m1ac51cd]

 mosaicplot(Titanic[c("1st","2nd","3rd"),,"Adult",],
   main = "Survival on the Titanic", shade = T)







* 二元配置モデルでの summary.lm のパラメータの意味 2007 08 30 [#mde70ada]


John Verzani p.336 の構造モデルから判断すると,すべてベース (Intercept) との切片の差ということになる.

二元配置の分散分析	
  frogs3<- read.csv(
   "http://150.59.18.68/frogs3.csv", header = FALSE)
  frogs3 # header = FALSE で,列名はファイルに未設定と指示
なお列名が未定義の場合,自動的に V1, V2, V3 などの名前が付加される
二つの要因がある場合,それらをチルダ記号の右側に + 記号で指定する
  frogs3.aov<- aov(V1 ~ V2 + V3, data = frogs3)
  summary(frogs3.aov)
  summary.lm(frogs3.aov)
Intercept は V2 = 12H かつ V3 = 100ug の場合.繰り返し数 3
この標準偏差は sqrt(7.51/6).これは V2 V3 の自由度の積か
2行目の V224H は  sqrt(2 *7.51/9).9 は V2 の繰り返し数か

Intercept は V2 が 12H で V3 が 100 ug の場合
2行目 V224H は V2 が 24H の場合の Intercept(V2=12Hかつ V3=100ugの場合) との差


同じく,p.332 によれば共分散分析では,連続量はスロープを表す.

  regrowth<- read.table(
  "http://www.bio.ic.ac.uk/research/mjcraw/
    therbook/data/ipomopsis.txt", 
   header = T)
  ancova1<- lm(Fruit ~ Grazing * Root)
  summary(ancova1)
  anova(ancova1)



*  共分散分析での各パラメータの標準誤差の計算 [#z681a9dc]

[[Crawley The R Book:http://www.amazon.co.jp/R-Book-Michael-J-Crawley/dp/0470510242/]] [#ha05dc7e]
 p. 492 - 498

[[Faraway:http://www.amazon.co.jp/Extending-Linear-Model-Generalized-Nonparametric/dp/158488424X/]] (2006) よりデータを借用
  babyfood<- read.table(file = 
     "http://150.59.18.68/babyfood.txt")
  babyfood
# データから要因別に罹患比率を求めて分割表にする.xtabs() 関数を利用
  xtabs(disease/(disease+nondisease) ~ sex + food, babyfood)

# ロジスティック回帰分析を実行する
# 目的変数を 2 項分布とした一般化線形モデル glm() による
  model1<- glm(cbind(disease, nondisease) ~ 
             sex + food, family = binomial, data = babyfood)
# glm は一般化線形モデルを実行する関数.family は分布を指定する
  summary(model1)              # 要約を見る
  drop1(model1, test = "Chi")  # 各項は有意か
  exp(-.669)                # 母乳の効果を確認する
  model.matrix(model1)
  # Intercept  は Boy で Bottle
  # sexGirl    は Girl の場合の Intercept との差
  # foodBreast は Intercept (Boy Bottle) の場合に比べての差
  # foodSuppl  は Intercept (Boy Bottle) の場合に比べての差