
Thread: n-grams for language model?

  1. #1 n-grams for language model?
    New Member · Joined Jul 2005 · Posts: 2
    Hello

    If I want to estimate unigrams for a closed vocabulary 'a' and 'b' (i.e. I want p(a) and p(b)) using the sentences of this corpus:

    ab
    ba

    I will have p(a) = 0.5 and p(b) = 0.5

    Some authors claim that the sum of the probabilities of the sentences should be 1, i.e.

    p(ab) + p(ba) = p(a)*p(b) + p(b)*p(a) = 1

    But with my unigrams I have p(ab) + p(ba) = 0.5 * 0.5 + 0.5 * 0.5 = 0.5
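
    To make the arithmetic explicit, here is a minimal Python sketch (my own illustration, not taken from any reference) that reproduces these numbers, assuming maximum-likelihood unigram estimates and sentence probabilities computed as the product of the token probabilities:

    # Minimal sketch: ML unigram estimates from the two-sentence corpus,
    # then the sum of the probabilities of the observed sentences.
    from collections import Counter

    corpus = ["ab", "ba"]                      # the two training sentences
    tokens = [t for sent in corpus for t in sent]
    counts = Counter(tokens)                   # {'a': 2, 'b': 2}
    total = sum(counts.values())

    # Maximum-likelihood unigram estimates: p(a) = p(b) = 0.5
    p = {w: c / total for w, c in counts.items()}

    def sentence_prob(sentence):
        """Unigram probability of a sentence: product of its token probabilities."""
        prob = 1.0
        for t in sentence:
            prob *= p[t]
        return prob

    # Sum over the two observed sentences: 0.25 + 0.25 = 0.5
    print(sum(sentence_prob(s) for s in corpus))   # -> 0.5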

    The sum would be even lower with an open vocabulary, where some probability mass is assigned to unseen words.

    The question is: what is wrong with the constraint that the sentence probabilities should sum to 1?

    Thanks

