docMatrix {RMeCab}    R Documentation

docMatrix

Description

Creates a document-term matrix from all text files in a given directory.

Usage

docMatrix(mydir, pos = "Default", minFreq = 1, weight = "no", kigo = 0, co = 0)
docVector(filename, pos, posN, minFreq, kigo)

Arguments

filename the name of a text file (may include a path).
mydir the path to the directory containing the text files.
pos specifies which parts of speech should be extracted. Defaults to nouns and adjectives.
posN specifies the length of pos, i.e. the number of parts of speech to extract.
minFreq words appearing fewer than minFreq times within a document are ignored for that document.
weight selects a weighting scheme for the document-term matrix. The default "no" leaves the frequencies unweighted.
kigo set kigo = 1 if the total token count should include symbols. Defaults to 0.
co set to retrieve a term co-occurrence matrix. Defaults to 0.

Details

All text files in the specified directory are read and combined into a single matrix. Each cell of the matrix holds the frequency of one word in one document.

docVector() is a supporting function that creates a document-term frequency list.
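
The counting step described above can be illustrated in base R without MeCab. This is only a sketch under stated assumptions, not RMeCab's implementation: the documents are hypothetical and already tokenized, min_freq is a stand-in for the minFreq argument, below-threshold counts are simply set to zero, and the orientation (terms in rows, documents in columns) is a choice made for this example.

```r
# Toy, pre-tokenized documents (hypothetical data). RMeCab would obtain the
# tokens by running MeCab on each text file; here they are given directly.
docs <- list(
  doc1.txt = c("cat", "dog", "cat", "bird"),
  doc2.txt = c("dog", "dog", "fish")
)

min_freq <- 2  # hypothetical stand-in for the minFreq argument

# Fix a common vocabulary so every document is counted over the same terms.
vocab <- sort(unique(unlist(docs)))

# One column of counts per document; rows are terms.
freq <- sapply(docs, function(tokens) {
  as.vector(table(factor(tokens, levels = vocab)))
})
rownames(freq) <- vocab

# Within each document, discard counts below the threshold.
freq[freq < min_freq] <- 0L

# Remove terms that no longer occur in any document.
freq <- freq[rowSums(freq) > 0, , drop = FALSE]
print(freq)
```

With these toy inputs only "cat" (twice in doc1.txt) and "dog" (twice in doc2.txt) reach the threshold, so the resulting matrix has two rows.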

Value

docMatrix the document-term matrix

Author(s)

Motohiro ISHIDA ishida.motohiro@gmail.com
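
Examples

A minimal call might look like the following sketch. The directory path is hypothetical, the argument names and defaults are copied from the Usage section, and the call is guarded so that nothing runs unless RMeCab (which in turn needs a working MeCab installation) and the directory are both available.

```r
# Hypothetical directory; replace with a folder containing plain-text files.
text_dir <- "path/to/textfiles"

if (requireNamespace("RMeCab", quietly = TRUE) && dir.exists(text_dir)) {
  # pos is omitted so that the package default is used.
  dtm <- RMeCab::docMatrix(text_dir, minFreq = 1, weight = "no", kigo = 0, co = 0)
  print(dim(dtm))
} else {
  message("RMeCab or the example directory is not available; skipping.")
}
```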


[Package RMeCab version 0.84 Index]