R_Reading data
Partly as practice in writing wiki pages...
this | is | a | data |
171 | 62.6 | 23 | 1 |
158 | 49.8 | 25 | 0 |
163 | 68.7 | 42 | 1 |
178 | 75.4 | 33 | 1 |
163 | 55.7 | 28 | 0 |
Suppose this data is saved as 2data.txt. Its first line ("this is a data") is not data. Then:
> z <- scan("2data.txt",skip=1)
Read 20 items
> z
 [1] 171.0  62.6  23.0   1.0 158.0  49.8  25.0   0.0 163.0  68.7  42.0   1.0
[13] 178.0  75.4  33.0   1.0 163.0  55.7  28.0   0.0
skip=1 skips the header line, and all 20 values end up in one flat vector, so the row structure is lost.
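Because scan() returns one flat numeric vector here, one way to get the rows back is to fold it into a matrix. This is a minimal sketch that assumes the file has exactly four whitespace-separated columns. The temporary file is a hypothetical stand-in for 2data.txt so that the block runs on its own.

```r
# Hypothetical stand-in for 2data.txt: a header line plus 5 rows of 4 columns
path <- tempfile(fileext = ".txt")
writeLines(c("this is a data",
             "171 62.6 23 1",
             "158 49.8 25 0",
             "163 68.7 42 1",
             "178 75.4 33 1",
             "163 55.7 28 0"), path)

# skip = 1 drops the header line; all remaining values go into one numeric vector
z <- scan(path, skip = 1, quiet = TRUE)

# The file is read row by row, so byrow = TRUE restores the original rows
m <- matrix(z, ncol = 4, byrow = TRUE)
m[1, ]  # first record: 171, 62.6, 23, 1
```

If the number of columns is not known in advance, count.fields() on the file can tell you how many fields each line has.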
> z <- scan("2data.txt",list(height="",weight="",sex=""), skip=1)
Read 7 records
> z
$height
[1] "171" "1" "25" "68.7" "178" "1" "28"
$weight
[1] "62.6" "158" "0" "42" "75.4" "163" "0"
$sex
[1] "23" "49.8" "163" "1" "33" "55.7" ""
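The numbers are read as strings too. The list also has only three fields while each row of the file has four columns, so the values shift across fields (height receives "1", "25", ...), and the last record is padded with "".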
> z <- scan("2data.txt",list(height=0,weight=0,sex=""), skip=1)
Read 7 records
> z
$height
[1] 171.0 1.0 25.0 68.7 178.0 1.0 28.0
$weight
[1] 62.6 158.0 0.0 42.0 75.4 163.0 0.0
$sex
[1] "23" "49.8" "163" "1" "33" "55.7" ""
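This time height and weight are read as numeric data. The type of each field is taken from its template value (0 means numeric, "" means character), and apparently 1 works as the template value too. The values are still shifted, because there are still only three fields for four columns.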
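To avoid the shifting, give what= one template element per column. This is a sketch that assumes the same four-column layout. The names age and sex for the last two columns are assumptions borrowed from the second data set below, and the temporary file is again a hypothetical stand-in for 2data.txt.

```r
# Hypothetical stand-in for 2data.txt
path <- tempfile(fileext = ".txt")
writeLines(c("this is a data",
             "171 62.6 23 1",
             "158 49.8 25 0",
             "163 68.7 42 1",
             "178 75.4 33 1",
             "163 55.7 28 0"), path)

# Four template elements for four columns; the numeric template 0 reads each field as numeric
z <- scan(path, what = list(height = 0, weight = 0, age = 0, sex = 0),
          skip = 1, quiet = TRUE)

z$height  # 171 158 163 178 163
z$sex     # 1 0 1 1 0
```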
Now consider the following data set.
this | is | a | data |
171 | 62.6 | 23 | M |
158 | 49.8 | 25 | F |
163 | 68.7 | 42 | M |
178 | 75.4 | 33 | F |
163 | 55.7 | 28 | M |
> z <- scan("2data.txt",list(height=0,weight=0,age=0,sex=""), skip=1)
Read 5 records
> z
$height
[1] 171 158 163 178 163
$weight
[1] 62.6 49.8 68.7 75.4 55.7
$age
[1] 23 25 42 33 28
$sex
[1] "M" "F" "M" "F" "M"
The last field, sex, has been read as character data. For convenience in later processing, convert it to a factor object:
> z$sex <- factor(z$sex)
> z
$height
[1] 171 158 163 178 163
$weight
[1] 62.6 49.8 68.7 75.4 55.7
$age
[1] 23 25 42 33 28
$sex
[1] M F M F M
Levels: F M
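factor() sorts the levels alphabetically by default, which is why F comes before M above. This is a small sketch of checking the levels, counting per level, and fixing a different level order. The vector sex is typed in by hand to match the scan() result above.

```r
sex <- c("M", "F", "M", "F", "M")   # the character vector returned by scan()

f <- factor(sex)
levels(f)        # "F" "M": sorted alphabetically
table(f)         # counts per level: F 2, M 3

# To put M first, give the level order explicitly
f2 <- factor(sex, levels = c("M", "F"))
levels(f2)       # "M" "F"
```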
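However, at this stage z is still a list (which is what scan() returns), not a data frame. Its components can be taken out with [[ ]], but matrix-style indexing such as [1,] fails: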
> summary(z)
       Length Class  Mode   
height 5      -none- numeric
weight 5      -none- numeric
age    5      -none- numeric
sex    5      factor numeric
> z[[1]]
[1] 171 158 163 178 163
> z[1,]
Error in z[1, ] : incorrect number of dimensions
> z[,1]
Error in z[, 1] : incorrect number of dimensions
> z[[2]]
[1] 62.6 49.8 68.7 75.4 55.7
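The "incorrect number of dimensions" error occurs because a list has no dim attribute, so there are no rows or columns to index. This is a sketch of how a list like z can be accessed instead. The two-component list is typed in by hand here as a shortened stand-in for z.

```r
z <- list(height = c(171, 158, 163, 178, 163),
          weight = c(62.6, 49.8, 68.7, 75.4, 55.7))

is.list(z)      # TRUE
dim(z)          # NULL: no rows or columns, hence the error for z[1, ]
z[["height"]]   # same as z[[1]] and z$height
z$weight[2]     # 2nd element of the weight component: 49.8

# The "first record" has to be collected from each component
first <- sapply(z, `[`, 1)
first           # height 171, weight 62.6
```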
So build a data frame from the components:
> df <- data.frame(z$height, z$weight, z$age, z$sex)
> df
  z.height z.weight z.age z.sex
1      171     62.6    23     M
2      158     49.8    25     F
3      163     68.7    42     M
4      178     75.4    33     F
5      163     55.7    28     M
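Passing each component separately makes data.frame() derive column names from the expressions, which gives the z. prefix. A sketch of an alternative: convert the whole list at once so the component names become the column names. The list z is rebuilt by hand so that the block runs on its own.

```r
z <- list(height = c(171, 158, 163, 178, 163),
          weight = c(62.6, 49.8, 68.7, 75.4, 55.7),
          age    = c(23, 25, 42, 33, 28),
          sex    = factor(c("M", "F", "M", "F", "M")))

# Converting the whole list keeps the component names as column names
df <- as.data.frame(z)
names(df)   # "height" "weight" "age" "sex"
df[1, ]     # first row, now indexable like a matrix
```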
Note that [1,] selects the whole first row and [,1] the whole first column.
> df[1,]
  z.height z.weight z.age z.sex
1      171     62.6    23     M
> df[,1]
[1] 171 158 163 178 163
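The same [row, column] notation also takes single cells, column names, logical conditions, and vectors of indices. This is a sketch on a data frame typed in by hand to match df above.

```r
df <- data.frame(z.height = c(171, 158, 163, 178, 163),
                 z.weight = c(62.6, 49.8, 68.7, 75.4, 55.7),
                 z.age    = c(23, 25, 42, 33, 28),
                 z.sex    = factor(c("M", "F", "M", "F", "M")))

df[2, 3]                             # row 2, column 3: 25
df[, "z.weight"]                     # a column by name, same as df[, 2]
df[df$z.sex == "F", ]                # rows where the condition is TRUE (rows 2 and 4)
df[c(1, 3), c("z.height", "z.age")]  # several rows and columns at once
```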
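summary() gives summary statistics for each column (quartiles and mean for the numeric columns, counts per level for the factor):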
> summary(df)
    z.height        z.weight         z.age      z.sex
 Min.   :158.0   Min.   :49.80   Min.   :23.0   F:2  
 1st Qu.:163.0   1st Qu.:55.70   1st Qu.:25.0   M:3  
 Median :163.0   Median :62.60   Median :28.0        
 Mean   :166.6   Mean   :62.44   Mean   :30.2        
 3rd Qu.:171.0   3rd Qu.:68.70   3rd Qu.:33.0        
 Max.   :178.0   Max.   :75.40   Max.   :42.0        
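Individual figures from summary() can also be computed directly. This is a sketch that reproduces the Mean row and the factor counts, again on a data frame typed in by hand to match df above.

```r
df <- data.frame(z.height = c(171, 158, 163, 178, 163),
                 z.weight = c(62.6, 49.8, 68.7, 75.4, 55.7),
                 z.age    = c(23, 25, 42, 33, 28),
                 z.sex    = factor(c("M", "F", "M", "F", "M")))

# Means of the numeric columns: the "Mean" row of summary()
means <- colMeans(df[, c("z.height", "z.weight", "z.age")])
means    # 166.6, 62.44, 30.2

# Counts per level: the z.sex column of summary()
counts <- table(df$z.sex)
counts   # F 2, M 3
```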
For a comma-separated file like the following (gesetz.csv),
lineNr, | Words |
00001, | 6 |
00002, | 15 |
00003, | 13 |
..., | ... |
00027, | 3 |
00028, | 23 |
> gesetz <- read.table("gesetz.csv",sep=",", header=TRUE)
header=TRUE must be given explicitly so that the first line is used as the column names.
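read.csv() is essentially read.table() with sep="," and header=TRUE as defaults. One thing to watch with this file is that a column like lineNr (00001, 00002, ...) is converted to integer, so the leading zeros disappear. This is a sketch on a hypothetical three-row file of the same shape as gesetz.csv. colClasses keeps the column as text.

```r
# Hypothetical small file with the same layout as gesetz.csv
path <- tempfile(fileext = ".csv")
writeLines(c("lineNr,Words",
             "00001,6",
             "00002,15",
             "00003,13"), path)

g1 <- read.csv(path)
g1$lineNr   # 1 2 3: converted to integer, leading zeros lost

# Read lineNr as character so that "00001" is kept as is
g2 <- read.csv(path, colClasses = c("character", "integer"))
g2$lineNr   # "00001" "00002" "00003"
```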
If the first line contains something other than data, such as a title, skip it with skip=1:
> gesetz <- read.table("gesetz.csv",sep=",", header=TRUE, skip=1)
Here too header=TRUE has to be given explicitly. It then applies to the first line after the skipped one.
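This is a sketch of the skip=1 case on a hypothetical file whose first line is a title. The title text is made up for illustration.

```r
# Hypothetical file: a title line, then the header, then data
path <- tempfile(fileext = ".csv")
writeLines(c("Word counts per line",
             "lineNr,Words",
             "00001,6",
             "00002,15"), path)

# skip = 1 discards the title; header = TRUE then takes the next line as column names
g <- read.table(path, sep = ",", header = TRUE, skip = 1)
names(g)   # "lineNr" "Words"
nrow(g)    # 2
```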
Last-modified: 2007-09-25 (Tue) 09:57:42