data mining - R: Why is removeSparseTerms() not doing anything?


Why is removeSparseTerms() not removing any terms? Words with a single occurrence (etc.) should be removed.
(R v. 3.2)

    > docs <- tm_map(docs, stemDocument)
    > dtm <- DocumentTermMatrix(docs)
    > freq <- colSums(as.matrix(dtm))
    > ord <- order(freq)
    > freq[tail(ord)]
    1 experi     can lucid dream
    287   312   363   452   1018   2413
    > freq[head(ord)]
    abbey abdomin   abdu abraham absent   abus
    1       1       1       1       1       1
    > dim(dtm)
    [1]   1 5265
    > dtms <- removeSparseTerms(dtm, 0.1)
    > dim(dtms)
    [1]   1 5265
    > dtms <- removeSparseTerms(dtm, 0.001)
    > dim(dtms)
    [1]   1 5265
    > dtms <- removeSparseTerms(dtm, 0.9)
    > dim(dtms)
    [1]   1 5265

(The corpus is a single document, the text version of a book.)

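The reason is that you have only 1 document, so the sparseness doesn't change when you change the threshold. removeSparseTerms() looks at the share of documents in which a term is absent, not at how often the term occurs overall. With a single document, every term in the matrix appears in that document, so each term has a sparsity of 0 and passes any threshold. Run these lines to see the effect: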

    library(tm)
    data("crude")
    tdm <- TermDocumentMatrix(crude)
    dtm <- DocumentTermMatrix(crude[1])  # pick first article (document, chapter)
    dim(dtm)
    (twenty <- removeSparseTerms(dtm, 0.2))
    (forty <- removeSparseTerms(dtm, 0.4))
    (sixty <- removeSparseTerms(dtm, 0.6))
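To make the point about sparsity concrete, here is a minimal base R sketch that imitates the idea on a toy count matrix. It does not call tm at all: the data are made up, and `term_sparsity()` and `keep_below()` are hypothetical helpers that only approximate what removeSparseTerms() does as far as I understand it (keep terms whose share of empty documents is below the threshold).

```r
# Toy document-term counts (made-up data): rows are documents, columns are terms.
counts <- matrix(
  c(3, 1, 2,
    2, 0, 0,
    5, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("dream", "abbey", "lucid"))
)

# Hypothetical helper: share of documents in which each term does not occur.
term_sparsity <- function(m) 1 - colMeans(m > 0)

# Hypothetical helper: keep only terms whose sparsity is below the threshold.
keep_below <- function(m, sparse) {
  m[, term_sparsity(m) < sparse, drop = FALSE]
}

# Three documents: "abbey" is missing from two of them, so it is dropped.
three_docs <- keep_below(counts, 0.5)

# One document: every term is present, so sparsity is 0 for all terms
# and even a very strict threshold keeps everything.
one_doc <- counts[1, , drop = FALSE]
one_doc_kept <- keep_below(one_doc, 0.001)

# Filtering by total frequency instead removes the single-occurrence term.
one_doc_frequent <- one_doc[, colSums(one_doc) > 1, drop = FALSE]
```

For the single-document corpus in the question, the last step is the relevant one: filter on the term frequencies rather than on sparsity. With tm, something like `dtm[, findFreqTerms(dtm, lowfreq = 2)]` should have the same effect, though I have not run it against this corpus.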
