docMatrix {RMeCab} | R Documentation |
Creates a document-term matrix from all text files in a given directory.
docMatrix(mydir, pos = "Default", minFreq = 1, weight = "no", kigo = 0, co = 0)
docVector(filename, pos, posN, minFreq, kigo)
filename |
filename (may include path). |
mydir |
the path to the directory containing the text files. |
pos |
specifies which parts of speech should be extracted. Defaults to nouns and adjectives. |
posN |
specifies the number of parts of speech that should be extracted. |
minFreq |
words appearing fewer than minFreq times within a document will be ignored for that document. |
weight |
specifies the weighting used to calculate a weighted document-term matrix. Default being "no". |
kigo |
set kigo = 1 if the total must include the number of symbols. Default being 0. |
co |
retrieves a co-occurrence term matrix. Default being 0. |
All text files in the specified directory are read in and a matrix is composed. Every cell of the matrix shows the actual frequency of a word in a document.
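To make the shape of such a matrix concrete, the following is a minimal base-R sketch and does not use RMeCab. It rests on several assumptions: the toy corpus `docs` is hypothetical, whitespace splitting stands in for MeCab's morphological analysis, the orientation (terms in rows, documents in columns) is chosen for illustration only, and the `min_freq` step is one possible reading of the minFreq argument, not the package's actual implementation.

```r
# Hypothetical toy corpus; each element stands in for one text file.
docs <- c(doc1 = "the cat sat on the mat",
          doc2 = "the dog sat",
          doc3 = "cat and dog and cat")

# Whitespace tokenization is only a stand-in for MeCab's analysis.
tokens <- strsplit(docs, " ", fixed = TRUE)

# Vocabulary: every distinct token, sorted for a stable row order.
vocab <- sort(unique(unlist(tokens)))

# Count how often each vocabulary term occurs in one document.
count_terms <- function(tk) as.integer(table(factor(tk, levels = vocab)))

# One column of counts per document (assumed orientation: terms x documents).
dtm <- vapply(tokens, count_terms, integer(length(vocab)))
rownames(dtm) <- vocab
colnames(dtm) <- names(docs)

# Assumed reading of minFreq: counts below the threshold are ignored
# within each document, and terms left with no counts are dropped.
min_freq <- 2L
filtered <- dtm
filtered[filtered < min_freq] <- 0L
filtered <- filtered[rowSums(filtered) > 0, , drop = FALSE]

print(dtm)
print(filtered)
```

Each cell of `dtm` holds the raw count of one term in one document. `filtered` keeps only the counts that reach `min_freq` within a single document.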
docVector() is a supporting function that creates a document-term frequency list.
docMatrix |
the document-term matrix |
Motohiro ISHIDA ishida.motohiro@gmail.com
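A call might look like the following sketch, which uses only the arguments listed in the usage above. The directory name `mydocs` is hypothetical, and the code assumes that the RMeCab package and a working MeCab installation are available. The guard skips the call when either the package or the directory is missing.

```r
# Hypothetical directory holding the plain-text files to analyse.
target_dir <- "mydocs"

# Run only when RMeCab is installed and the directory exists.
if (requireNamespace("RMeCab", quietly = TRUE) && dir.exists(target_dir)) {
  # Ignore words that appear fewer than 2 times within a document.
  res <- RMeCab::docMatrix(target_dir, pos = "Default", minFreq = 2)
  print(dim(res))
}
```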