RCaBoChaMx {RCaBoCha}    R Documentation

RCaBoChaMx

Description

Creates a document-term matrix from a single file or from all text files in a given directory.

Usage

RCaBoChaMx(directory, conj = 1, str2 = "",  pos = "DEFAULT",  minFreq = 1, weight = "no", mecabrc = "" , cabocharc = "")

Arguments

directory path to a directory, or a file name (may include a path).
conj conjugated word form: 0 or 1 (default).
str2 a Japanese word.
pos parts of speech to extract; the default is noun, adjective, and verb.
minFreq minimum frequency; words appearing fewer than minFreq times within a document are ignored (default 1).
weight weighting option used to compute a weighted document-term matrix (default "no").
mecabrc path to the MeCab resource file (mecabrc).
cabocharc path to the CaboCha resource file (cabocharc).

Details

The specified file, or every text file in the specified directory, is read in and a matrix is composed. Each cell of the matrix holds the raw frequency of a word in a document.
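
The shape of the result can be illustrated with a minimal base R sketch. It does not call RCaBoChaMx and is not the package's implementation: whitespace splitting stands in for the MeCab/CaboCha segmentation, the names docs and min_freq are hypothetical, and placing documents in rows is an assumption about the layout.

```r
# Toy documents, already separated into words by spaces. RCaBoChaMx relies on
# MeCab/CaboCha for segmentation; whitespace splitting is only a stand-in here.
docs <- c(
  doc1.txt = "cat dog cat bird",
  doc2.txt = "dog dog fish"
)

min_freq <- 2  # hypothetical threshold playing the role of minFreq

# Count each word within each document.
tokens <- strsplit(docs, " ", fixed = TRUE)
counts <- lapply(tokens, table)

# Drop words that occur fewer than min_freq times within a document.
counts <- lapply(counts, function(tb) tb[tb >= min_freq])

# Assemble the matrix over the surviving vocabulary
# (documents in rows is an assumption, not confirmed by this page).
terms <- sort(unique(unlist(lapply(counts, names))))
dtm <- matrix(0L, nrow = length(docs), ncol = length(terms),
              dimnames = list(names(docs), terms))
for (d in names(counts)) {
  dtm[d, names(counts[[d]])] <- as.integer(counts[[d]])
}
dtm
```

With min_freq set to 1 every word is kept, which corresponds to the default minFreq = 1 in the usage line.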

Value

RCaBoChaMx the document-term matrix

Author(s)

Motohiro ISHIDA ishida.motohiro@gmail.com

References

Motohiro Ishida, Rによるテキストマイニング入門 [Introduction to Text Mining with R], Morikita Publishing, 2008.
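
Examples

The following is a hedged sketch of a typical call, based only on the usage line of this page. It assumes that RCaBoCha, MeCab and CaboCha are installed and that the MeCab dictionary expects UTF-8 input; the directory name rcabocha_demo and the sample sentences are made up for illustration. The call is skipped when the package is not available.

```r
# Write two small Japanese text files into a temporary directory.
doc_dir <- file.path(tempdir(), "rcabocha_demo")
dir.create(doc_dir, showWarnings = FALSE)
writeLines(enc2utf8("私は本を読む。"),
           file.path(doc_dir, "doc1.txt"), useBytes = TRUE)
writeLines(enc2utf8("彼は新しい本を買った。"),
           file.path(doc_dir, "doc2.txt"), useBytes = TRUE)

# Call RCaBoChaMx only when the package can be loaded; the arguments shown
# are the defaults listed in the usage line.
if (requireNamespace("RCaBoCha", quietly = TRUE)) {
  mx <- RCaBoCha::RCaBoChaMx(doc_dir, minFreq = 1, weight = "no")
  print(dim(mx))
}
```

Passing the path of a single file instead of doc_dir should build the matrix from that file alone, as stated in the description of the directory argument.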


[Package RCaBoCha version 0.29 Index]