data mining - R: Why is removeSparseTerms() not doing anything?
Why is removeSparseTerms() not removing any terms? Words with a single occurrence (etc.) should be removed.
(R v. 3.2)
    > docs <- tm_map(docs, stemDocument)
    > dtm <- DocumentTermMatrix(docs)
    > freq <- colSums(as.matrix(dtm))
    > ord <- order(freq)
    > freq[tail(ord)]
    1 experi can lucid dream
    287 312 363 452 1018 2413
    > freq[head(ord)]
      abbey abdomin    abdu abraham  absent    abus
          1       1       1       1       1       1
    > dim(dtm)
    [1]    1 5265
    > dtms <- removeSparseTerms(dtm, 0.1)
    > dim(dtms)
    [1]    1 5265
    > dtms <- removeSparseTerms(dtm, 0.001)
    > dim(dtms)
    [1]    1 5265
    > dtms <- removeSparseTerms(dtm, 0.9)
    > dim(dtms)
    [1]    1 5265
    >
(The corpus is a single document: the text version of a book.)
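The reason is that you have only one document, so the sparsity of the terms does not change when you change the threshold. A term's sparsity is the fraction of documents that do not contain it. With a single document, every term in the matrix occurs in that document and has sparsity 0, so removeSparseTerms() keeps all of them whatever value of sparse you pass. Run these lines and see the effect: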
    library(tm)
    data("crude")
    tdm <- TermDocumentMatrix(crude)
    dtm <- DocumentTermMatrix(crude[1])  # pick the first article (document, chapter)
    dim(dtm)
    # each of these keeps every term: with one document, no term is sparse
    (twenty <- removeSparseTerms(dtm, 0.2))
    (forty  <- removeSparseTerms(dtm, 0.4))
    (sixty  <- removeSparseTerms(dtm, 0.6))
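To see why the threshold only matters once there are several documents, here is a minimal base-R sketch of the sparsity rule on a plain count matrix (rows = documents, columns = terms). It assumes the rule tm applies is roughly "keep a term if it appears in more than nDocs * (1 - sparse) documents". `keep_dense_terms()`, `one_doc` and `chapters` are hypothetical names made up for illustration, not part of tm. The second case hints at the usual fix: split the book into chapters or fixed-size chunks so that each one becomes its own document.

```r
# Hypothetical helper (NOT part of tm): approximates the sparsity rule
# on a plain document-by-term count matrix.
keep_dense_terms <- function(m, sparse) {
  stopifnot(is.matrix(m), sparse > 0, sparse < 1)
  doc_freq <- colSums(m > 0)  # number of documents containing each term
  # keep terms that occur in more than nrow(m) * (1 - sparse) documents
  m[, doc_freq > nrow(m) * (1 - sparse), drop = FALSE]
}

# One document (the whole book): every term has doc_freq 1, and
# 1 > 1 * (1 - sparse) for any sparse in (0, 1), so nothing is removed.
one_doc <- matrix(c(3, 1, 1), nrow = 1,
                  dimnames = list("book", c("dream", "abbey", "abus")))
sapply(c(0.001, 0.1, 0.9), function(s) ncol(keep_dense_terms(one_doc, s)))
# [1] 3 3 3

# The same book split into four chapters: now the threshold has an effect.
chapters <- rbind(ch1 = c(dream = 2, abbey = 1, abus = 0),
                  ch2 = c(dream = 5, abbey = 0, abus = 0),
                  ch3 = c(dream = 1, abbey = 0, abus = 1),
                  ch4 = c(dream = 4, abbey = 0, abus = 0))
colnames(keep_dense_terms(chapters, 0.5))  # only "dream" is in > 2 chapters
ncol(keep_dense_terms(chapters, 0.9))      # lenient threshold keeps all 3
```

With four documents, `sparse = 0.5` drops "abbey" and "abus" because each appears in only one chapter, which is the behaviour the question expected to see.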