Backup (No.1) of R_頻度区間調整 (R: frequency interval adjustment)

- 1 (2007-10-08 (Mon) 14:55:44)

For example:

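writers <- read.csv(file = "corpusWork2007/test.csv")
# Suppose the file contains only these two lines:
#   Writer,Text,Length
#   akutagawa,rashom,151
text0 <- as.character(writers$Text[1])
# Read the file that lists, sentence by sentence from the start of the text,
# the length of each sentence counted in phrases
buntyo <- read.csv(file = paste("CAB/", text0, ".csv", sep = ""), header = FALSE)

# Build a frequency table of the phrase counts from that file.
# Naming the argument fixes the column name as "buntyo" (the other column is "Freq").
bun.orig.df <- data.frame(table(buntyo = buntyo[[1]]))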

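###

### In the upper range the categories are not contiguous (some values never occur),
### so fill in the missing categories with a frequency of 0.

# y <- min(buntyo):max(buntyo)
y <- 0:max(buntyo)
bun.df <- data.frame(cate = y, freq = rep(0, length(y)))
# The first column of the frequency table is a factor; turn its labels back into numbers
orig.cate <- as.numeric(as.character(bun.orig.df[[1]]))
z <- 0  # number of categories skipped so far because the original table lacks them
for (i in seq_len(nrow(bun.df))) {
  if (orig.cate[i - z] == bun.df$cate[i]) {
    # Category exists in the original table: copy its frequency
    bun.df$freq[i] <- bun.orig.df$Freq[i - z]
  } else {
    # Category is missing: leave freq at 0 and stay on the same original row
    z <- z + 1
  }
}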
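
The same gap filling can also be done without a loop. The following is a minimal sketch, not the method used above: it assumes the phrase counts are non-negative integers held in a plain vector (the toy vector `x` and the name `bun.toy.df` are made up for illustration), and it relies on `table()` counting every level of a factor, including levels that never occur.

```r
# Toy phrase counts (hypothetical data, not taken from the corpus)
x <- c(2, 3, 3, 5, 5, 5, 8)

# Declaring every value from 0 to max(x) as a factor level makes table()
# report a count of 0 for values that never occur in x
filled <- table(factor(x, levels = 0:max(x)))

# Convert to a data frame with the same column names as bun.df
bun.toy.df <- data.frame(cate = as.integer(names(filled)),
                         freq = as.vector(filled))
print(bun.toy.df)
```

Because the levels are fixed in advance, the result always has one row per category from 0 to the maximum, whatever values happen to be present in the data.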

This gives the following result. Rows with no value under Original are the gaps that were filled with 0.

	Modified	Original
1	0
2	1	1
3	4	4
4	3	3
5	8	8
6	6	6
7	15	15
8	12	12
9	19	19
10	17	17
11	7	7
12	12	12
13	5	5
14	9	9
15	8	8
16	8	8
17	2	2
18	4	4
19	4	4
20	0
21	1	1
22	1	1
23	0
24	0
25	2	2
26	1	1
27	0
28	2	2
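
A side-by-side table like the one above can be put together with `merge()`. This is only a sketch on made-up data; the names `orig`, `modified` and `both` are illustrative and do not come from the original code. Categories missing from the original table come out as `NA` in the `Original` column rather than blank.

```r
# Toy original table (hypothetical data) with gaps at categories 1 and 4
orig <- data.frame(cate = c(2L, 3L, 5L), Original = c(1, 4, 8))

# Gap-filled version covering every category from 1 to 5
modified <- data.frame(cate = 1:5, Modified = 0)
idx <- match(orig$cate, modified$cate)
modified$Modified[idx] <- orig$Original

# all.x = TRUE keeps every row of `modified`; rows without a match in
# `orig` get NA in the Original column
both <- merge(modified, orig, by = "cate", all.x = TRUE)
print(both)
```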
