docMatrix2 {RMeCab}    R Documentation

docMatrix2

Description

Creates a document-term matrix from a single file or from all text files in a given directory.

Usage

docMatrix2(directory, pos = "Default", minFreq = 1, weight = "no",
           kigo = 0, co = 0, dic = "", mecabrc = "", etc = "")

Arguments

directory

Path to a directory, or the name of a single file (which may include a path).

pos

Parts of speech to extract. By default, nouns and adjectives are extracted.

minFreq

Minimum within-document frequency. Terms that occur fewer than minFreq times in a document are ignored for that document.

weight

Weighting scheme applied to the document-term matrix. The default, "no", applies no weighting.

kigo

Set kigo = 1 to include symbols (e.g. punctuation marks) in the counts, including the total token count. The default is 0.

co

Set co = 1 to retrieve a term co-occurrence matrix. The default is 0.

dic

Path to a user dictionary, e.g. ishida.dic.

mecabrc

Path to a MeCab resource file (mecabrc).

etc

Other options passed to MeCab.
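The effect of minFreq, weight and co can be illustrated on a small term-by-document count matrix. The base-R sketch below is not RMeCab's implementation: the helpers apply_min_freq, tf_idf and cooccurrence are hypothetical, tf*idf is used only as one common example of a weighting scheme (this page does not list the values weight accepts), and co-occurrence is counted per document, which may differ from what the co option actually returns.

```r
# Hypothetical base-R illustrations of minFreq, weight and co; not RMeCab code.

# Example counts: terms in rows, documents in columns (filled column by column)
dtm <- matrix(c(0L, 1L, 2L,   # doc1
                1L, 1L, 0L),  # doc2
              nrow = 3,
              dimnames = list(c("matrix", "mining", "text"), c("doc1", "doc2")))

# minFreq: within each document, counts below min_freq are ignored (set to 0)
apply_min_freq <- function(dtm, min_freq = 1) {
  dtm[dtm < min_freq] <- 0L
  dtm
}

# weight: one common scheme, tf * idf with a natural log; not necessarily
# RMeCab's formula. Assumes every term occurs in at least one document.
tf_idf <- function(dtm) {
  df <- rowSums(dtm > 0)           # number of documents containing each term
  idf <- log(ncol(dtm) / df)       # one idf value per term
  dtm * idf                        # element [i, j] is multiplied by idf[i]
}

# co: document-level co-occurrence; cell [a, b] counts the documents that
# contain both term a and term b (RMeCab's definition may differ).
cooccurrence <- function(dtm) {
  present <- (dtm > 0) * 1L        # binary term-by-document incidence matrix
  tcrossprod(present)              # present %*% t(present)
}

print(apply_min_freq(dtm, min_freq = 2))
print(round(tf_idf(dtm), 3))
print(cooccurrence(dtm))
```

tcrossprod(present) is used because it computes present %*% t(present) in one call and keeps the term names on both dimensions.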

Details

All text files in the specified directory (or the single specified file) are read, and a document-term matrix is built from them. With the default weight = "no", each cell holds the raw frequency of a term in a document.
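The counting step described above can be sketched in base R. This sketch assumes each document has already been split into terms (docMatrix2 does that with MeCab and applies the pos filter, neither of which is reproduced here), and build_dtm is a hypothetical helper, not part of RMeCab.

```r
# Hypothetical sketch: build a term-by-document frequency matrix from
# documents that are already tokenized (docMatrix2 uses MeCab for that step).
build_dtm <- function(docs) {
  terms <- sort(unique(unlist(docs)))   # vocabulary across all documents
  # For each document, count every vocabulary term (0 when absent)
  counts <- vapply(docs, function(tokens) {
    as.integer(table(factor(tokens, levels = terms)))
  }, integer(length(terms)))
  rownames(counts) <- terms             # columns are named after the documents
  counts
}

docs <- list(doc1 = c("text", "mining", "text"),
             doc2 = c("mining", "matrix"))
print(build_dtm(docs))
```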

Value

docMatrix2

The document-term matrix.

Author(s)

Motohiro ISHIDA ishida.motohiro@gmail.com

References

Motohiro Ishida, Introduction to Text Mining with R (Rによるテキストマイニング入門), Morikita Publishing, 2008.


[Package RMeCab version 0.97 Index]