RCaBoChaDF {RCaBoCha} | R Documentation |
Creates a document-term matrix from a column of a data frame.
RCaBoChaDF(charVec = c("CaBoCha"), str2 = "", pos = "DEFAULT", minFreq = 1, weight = "no", mecabrc = "", cabocharc = "")
charVec |
A column of character strings taken from a data frame. |
str2 |
A Japanese word. |
pos |
Specifies which parts of speech should be extracted. |
minFreq |
Words that appear fewer than minFreq times within a document are ignored for that document. |
weight |
Weighting option used to calculate a weighted document-term matrix. |
mecabrc |
Specifies the MeCab resource file (mecabrc). |
cabocharc |
Specifies the CaBoCha resource file (cabocharc). |
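The effect of a per-document frequency threshold such as minFreq can be illustrated in base R. This is only a sketch under stated assumptions: the count matrix is made-up data, apply_min_freq() is a hypothetical helper that is not part of RCaBoCha, and setting sub-threshold counts to zero is one plausible reading of "ignored", not a description of the package internals.

```r
# Hypothetical per-document term counts (rows = documents, columns = terms).
counts <- matrix(
  c(2L, 1L, 0L,
    0L, 1L, 3L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("text", "mining", "data"))
)

# apply_min_freq() is a hypothetical helper, not an RCaBoCha function.
# Counts below min_freq within a document are set to zero (assumed behaviour).
apply_min_freq <- function(m, min_freq = 1) {
  m[m < min_freq] <- 0L
  m
}

filtered <- apply_min_freq(counts, min_freq = 2)
print(filtered)
```

With min_freq = 1 the matrix is returned unchanged, which matches the documented default of minFreq = 1.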
The specified column of the data frame is read in and a matrix is composed. Each cell of the matrix holds the actual frequency of a word in a document.
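How such a frequency matrix is composed can be sketched in base R. The tokens below are supplied by hand, whereas RCaBoChaDF would obtain them from CaBoCha; build_dtm() is a hypothetical helper, and the layout with documents as rows and terms as columns is an assumption rather than a confirmed property of the returned object.

```r
# Hypothetical pre-tokenized documents; RCaBoChaDF would obtain tokens
# from CaBoCha instead of taking them as input like this.
docs <- list(
  doc1 = c("text", "mining", "text"),
  doc2 = c("mining", "data"),
  doc3 = c("text", "data", "data", "data")
)

# build_dtm() is a hypothetical helper, not an RCaBoCha function.
build_dtm <- function(token_list) {
  terms <- sort(unique(unlist(token_list)))
  dtm <- matrix(
    0L,
    nrow = length(token_list), ncol = length(terms),
    dimnames = list(names(token_list), terms)
  )
  for (i in seq_along(token_list)) {
    # Count each vocabulary term in document i; absent terms count as 0.
    counts <- table(factor(token_list[[i]], levels = terms))
    dtm[i, ] <- as.integer(counts)
  }
  dtm
}

dtm <- build_dtm(docs)
print(dtm)
```

Each cell then holds the raw count of one term in one document, for example dtm["doc3", "data"] is 3.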
RCaBoChaDF |
The document-term matrix. |
Motohiro ISHIDA ishida.motohiro@gmail.com
Motohiro Ishida, "Rによるテキストマイニング入門" (Introduction to Text Mining with R), Morikita Publishing, 2008.
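An illustrative call, based only on the usage line shown on this page. It assumes that RCaBoCha, CaBoCha and MeCab are installed, that RCaBoChaDF is exported, and that charVec accepts a data frame column passed as a character vector; the data frame reviews and its column text are made-up names. The call is skipped when the package is not available.

```r
# Hypothetical input: a data frame with one text column.
reviews <- data.frame(
  text = c("今日は良い天気です。", "明日は雨が降るでしょう。"),
  stringsAsFactors = FALSE
)

# Run only if RCaBoCha is installed; CaBoCha and MeCab must also be set up.
if (requireNamespace("RCaBoCha", quietly = TRUE)) {
  # Assumption: charVec takes the column as a character vector.
  # All other arguments are left at the defaults shown in the usage line.
  dtm <- RCaBoCha::RCaBoChaDF(charVec = reviews$text, minFreq = 1)
  print(dtm)
}
```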