nlp - How to treat <s> and </s> when calculating a unigram LM?

July 15, 2012

I'm a beginner in NLP, and I'm confused about how to treat the <s> and </s> symbols when calculating counts for a unigram model. Should I count them or ignore them?
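To make the two options concrete, here is a minimal sketch of unigram counting that can either include or skip the boundary symbols. It assumes whitespace tokenization and the literal marker strings "<s>" and "</s>"; the function name `unigram_counts` and its `include_markers` flag are hypothetical, chosen only for illustration.

```python
from collections import Counter


def unigram_counts(sentences, include_markers=False):
    """Count unigrams over whitespace-tokenized sentences.

    If include_markers is True, each sentence is wrapped in the
    (assumed) boundary tokens "<s>" and "</s>" before counting.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if include_markers:
            tokens = ["<s>"] + tokens + ["</s>"]
        counts.update(tokens)
    return counts


corpus = ["hello world", "hello"]
print(unigram_counts(corpus))                        # ordinary words only
print(unigram_counts(corpus, include_markers=True))  # words plus boundary markers
```

With markers enabled, "<s>" and "</s>" each get a count of exactly 2 here, one per sentence.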

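If I understand correctly, <s> and </s> are special (fake) unigrams marking the first and last positions of each text (strictly speaking, the position before the first token and the position after the last one). There seems to be no need for them in a unigram model, because every string contains exactly one of each, so they provide no additional information.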
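One way to check the "no additional information" argument is to compute maximum-likelihood unigram probabilities, P(w) = count(w) / N, with and without the markers. The sketch below (same whitespace-tokenization and marker-string assumptions as before; `mle_unigram_probs` is a hypothetical name) shows that each marker's count is fixed by the number of sentences, and that adding the markers only enlarges the normalizing total N, so the ratios between ordinary words stay the same.

```python
from collections import Counter


def mle_unigram_probs(sentences, include_markers=False):
    """Maximum-likelihood unigram probabilities: count(w) / total tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if include_markers:
            # Assumed marker strings; one of each per sentence.
            tokens = ["<s>"] + tokens + ["</s>"]
        counts.update(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


corpus = ["the cat sat", "the dog"]
plain = mle_unigram_probs(corpus)
marked = mle_unigram_probs(corpus, include_markers=True)

# Marker counts equal the number of sentences, whatever the sentences contain.
# The ratio P("the") / P("cat") is the same in both models; only N differs.
print(plain["the"] / plain["cat"], marked["the"] / marked["cat"])
```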

Such special unigrams can be useful for higher-order n-grams. For example, they allow extracting 2 bigrams from the one-word string "hello": <s> hello and hello </s>; or 3 trigrams: <s0> <s1> hello, <s1> hello </s1>, and hello </s1> </s0>.
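The padding described above can be sketched as follows: pad the token list with n-1 boundary markers on each side, then slide a window of width n. The marker naming (plain "<s>"/"</s>" for bigrams, indexed "<s0>", "<s1>", ... for larger n) mirrors the examples in the question and is otherwise an assumption; many toolkits instead repeat a single marker. `padded_ngrams` is a hypothetical helper.

```python
def padded_ngrams(tokens, n):
    """Return all n-grams of `tokens` after padding with n-1 markers per side.

    Marker names follow the question's examples (an illustrative choice):
    n == 2 uses "<s>" / "</s>"; n >= 3 uses "<s0>", "<s1>", ... on the left
    and the matching closers in reverse order on the right.
    """
    if n < 2:
        # Unigrams: padding would add no information, so none is applied.
        return [(t,) for t in tokens]
    if n == 2:
        left, right = ["<s>"], ["</s>"]
    else:
        left = [f"<s{i}>" for i in range(n - 1)]
        right = [f"</s{i}>" for i in reversed(range(n - 1))]
    padded = left + list(tokens) + right
    # Slide a window of width n over the padded sequence.
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


print(padded_ngrams(["hello"], 2))
print(padded_ngrams(["hello"], 3))
```

For a one-word string this padding always yields exactly n n-grams, which matches the counts in the question (2 bigrams, 3 trigrams).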
