docMatrix {RMeCab}    R Documentation

docMatrix

Description

Creates a document-term matrix from all text files in a given directory.

Usage

docMatrix(mydir, pos = "Default", minFreq = 1, weight = "no", kigo = 0, co = 0)
docVector(filename, pos, posN, minFreq, kigo, dic = "", mecabrc = "", etc = "")

Arguments

filename

Name of the text file to analyze; may include a path.

mydir

Path to the directory containing the text files.

pos

Specifies which parts of speech to extract. By default, nouns and adjectives are extracted.

posN

Number of parts of speech given in pos, i.e. the length of pos.

minFreq

Words that appear fewer than minFreq times within a document are ignored for that document.

weight

Selects a weighting scheme for the document-term matrix. The default, "no", leaves the cells as raw frequencies.

kigo

Set kigo = 1 to include symbols in the total count; the default, 0, excludes them.

co

Set co = 1 to retrieve a term co-occurrence matrix; the default is 0.

dic

Path to a user dictionary, e.g. ishida.dic.

mecabrc

Path to a MeCab resource file (mecabrc).

etc

Other options passed to MeCab.
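The weighting schemes that docMatrix supports are not listed on this page. As an illustration only, the following base R sketch applies one common scheme, tf-idf, to a small hypothetical count matrix. The matrix, the layout with terms as rows and documents as columns, and the use of the natural logarithm are all assumptions; this is not necessarily what docMatrix computes.

```r
# Hypothetical count matrix: rows are terms, columns are documents (an assumption)
counts <- matrix(c(2, 0, 1,
                   1, 1, 1,
                   0, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("data", "text", "mining"),
                                 c("doc1", "doc2", "doc3")))

n_docs   <- ncol(counts)
doc_freq <- rowSums(counts > 0)     # number of documents containing each term
idf      <- log(n_docs / doc_freq)  # inverse document frequency (natural log)

# Multiply each row (term) by its idf; the vector is recycled down the columns
tfidf <- counts * idf
print(round(tfidf, 3))
```

A term such as "text" that occurs in every document gets an idf of 0, so its weight vanishes, while terms concentrated in few documents are emphasized.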
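This page does not define how the co-occurrence counts are obtained. One common definition, sketched here in base R on a hypothetical count matrix, counts for each pair of terms the number of documents in which both occur. Treat it as an illustration of the idea, not as the computation performed by docMatrix(co = 1).

```r
# Hypothetical count matrix: rows are terms, columns are documents (an assumption)
counts <- matrix(c(2, 0, 1,
                   1, 1, 1,
                   0, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("data", "text", "mining"),
                                 c("doc1", "doc2", "doc3")))

# 1 if the term occurs in the document, 0 otherwise
presence <- (counts > 0) * 1

# Entry [a, b] is the number of documents containing both term a and term b;
# the diagonal holds each term's document frequency
co <- presence %*% t(presence)
print(co)
```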

Details

All text files in the specified directory are read and combined into a single matrix. Each cell holds the raw frequency of a word in the corresponding document.
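The procedure described above can be sketched in base R without MeCab. The sketch assumes the files are already split into words by spaces (a stand-in for MeCab's morphological analysis), writes two hypothetical files to a temporary directory, and arranges terms as rows and files as columns. It shows the idea only, not RMeCab's implementation.

```r
# Create a temporary directory with two small, already space-separated text files
mydir <- file.path(tempdir(), "dtm_demo")
dir.create(mydir, showWarnings = FALSE)
writeLines("data mining text data", file.path(mydir, "a.txt"))
writeLines("text mining text", file.path(mydir, "b.txt"))

files <- list.files(mydir, full.names = TRUE)

# Split each file into tokens on whitespace
tokens <- lapply(files, function(f) scan(f, what = character(), quiet = TRUE))
vocab  <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every file; each cell holds the raw frequency
dtm <- sapply(tokens, function(tk) table(factor(tk, levels = vocab)))
dimnames(dtm) <- list(vocab, basename(files))
print(dtm)
```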

docVector() is a helper function that creates a term frequency list for a single document.
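A frequency list of this kind, together with the minFreq cut-off, can be sketched in base R. The token vector is hypothetical and assumed to come from an earlier morphological analysis; the output format (a data frame with term and Freq columns) is an illustrative choice and may differ from what docVector() actually returns.

```r
# Hypothetical tokens of one document (in RMeCab these would come from MeCab)
tokens <- c("data", "mining", "data", "text", "mining", "data")

# Count each term and turn the table into a two-column frequency list
freq_list <- as.data.frame(table(term = tokens), stringsAsFactors = FALSE)

# Drop terms occurring fewer than minFreq times, as the minFreq argument describes
minFreq <- 2
freq_list <- freq_list[freq_list$Freq >= minFreq, ]
print(freq_list)
```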

Value

docMatrix

the document-term matrix

Author(s)

Motohiro ISHIDA ishida.motohiro@gmail.com

References

Motohiro Ishida, 『Rによるテキストマイニング入門』 (Introduction to Text Mining with R), Morikita Publishing, 2008.


[Package RMeCab version 0.97 Index]