docMatrix {RMeCab}    R Documentation

docMatrix

Description

Creates a document-term matrix from all text files in a given directory.

Usage

docMatrix(mydir, pos = "Default", minFreq = 1, weight = "no", kigo = 0, co = 0)
docVector(filename, pos, posN, minFreq, kigo, dic = "", mecabrc = "", etc = "")

Arguments

filename filename (may include path).
mydir path to the directory containing the text files.
pos specifies which parts of speech should be extracted. The default is nouns and adjectives.
posN specifies the number of parts of speech to be extracted (the length of pos).
minFreq words appearing fewer than minFreq times within a document are ignored for that document.
weight specifies the weighting applied to the document-term matrix. The default, "no", leaves the raw frequencies unweighted.
kigo set kigo = 1 if the totals must include the number of symbols. The default is 0.
co controls whether a term co-occurrence matrix is retrieved. The default is 0.
dic specifies a user dictionary, e.g. ishida.dic.
mecabrc specifies a MeCab resource file.
etc other options passed to MeCab.

Details

All text files in the specified directory are read in and combined into a single matrix. Each cell of the matrix holds the raw frequency of one word in one document.

docVector() is a supporting function that creates the term frequency list for a single file.
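
The two steps described above (a per-file frequency list, then a combined matrix) can be illustrated with a minimal base-R sketch. It is not RMeCab's implementation: whitespace and punctuation splitting of English text stands in for MeCab's morphological analysis, the helper names term_freq and term_doc_matrix are hypothetical, and the orientation (terms in rows, documents in columns) is an assumption. Only the per-document minFreq filter follows the argument description.

```r
# Base-R sketch only; RMeCab itself tokenizes Japanese text with MeCab.

# Hypothetical helper: term frequencies for one file, dropping terms
# that occur fewer than minFreq times in that file.
term_freq <- function(filename, minFreq = 1) {
  text <- readLines(filename, warn = FALSE)
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens <- tokens[nzchar(tokens)]
  freq <- table(tokens)
  freq[freq >= minFreq]
}

# Hypothetical helper: combine the per-file frequencies into one matrix.
# Assumed orientation: terms in rows, documents in columns.
term_doc_matrix <- function(mydir, minFreq = 1) {
  files <- sort(list.files(mydir, full.names = TRUE))
  freqs <- lapply(files, term_freq, minFreq = minFreq)
  terms <- sort(unique(unlist(lapply(freqs, names))))
  m <- matrix(0L, nrow = length(terms), ncol = length(files),
              dimnames = list(terms, basename(files)))
  for (j in seq_along(freqs)) {
    m[names(freqs[[j]]), j] <- as.integer(freqs[[j]])
  }
  m
}

# Two small text files in a fresh temporary directory.
mydir <- tempfile("docmatrix_demo")
dir.create(mydir)
writeLines("the cat saw the dog", file.path(mydir, "doc1.txt"))
writeLines("the dog chased the cat and the bird", file.path(mydir, "doc2.txt"))

m <- term_doc_matrix(mydir)
print(m)
```

Calling term_doc_matrix(mydir, minFreq = 2) on the same directory keeps only the terms that occur at least twice within a document. With RMeCab and MeCab installed, the corresponding call on a directory of Japanese text files would take the form docMatrix(mydir, minFreq = 2), following the Usage section.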

Value

docMatrix the document-term matrix

Author(s)

Motohiro ISHIDA ishida.motohiro@gmail.com

References

Ishida, Motohiro (2008). Rによるテキストマイニング入門 [Introduction to Text Mining with R]. Morikita Publishing.


[Package RMeCab version 0.94 Index]