author | Kelly Rauchenberger <fefferburbia@gmail.com> | 2016-02-01 09:30:04 -0500 |
---|---|---|
committer | Kelly Rauchenberger <fefferburbia@gmail.com> | 2016-02-01 09:30:04 -0500 |
commit | 617155fe562652c859a380d85cc5710783d79448 (patch) | |
tree | f5eee89b0fa4b3c9dfe7187ca78916a71b59045e /kgramstats.h | |
parent | b316e309559d7176af6cf0bb7dcd6dbaa83c01cd (diff) | |
Added emoji freevar
Runs of emoji are tokenized separately from everything else and added to an emoticon freevar, where they are mixed in with regular emoticons such as :P. This breaks old-style freevars such as $name$ and $noun$, so some legacy support is left in for compatibility; eventually, $name$ should become a proper freevar of its own. Emoji data is from gemoji (https://github.com/github/gemoji).
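To illustrate what "tokenized separately" could look like, here is a minimal sketch that splits text on whitespace and additionally breaks out any run of emoji as its own token. It is not the code from this commit: `nextCodePoint`, `isEmojiCodePoint`, and `splitEmojiRuns` are hypothetical names, and the emoji check is a rough code point range test standing in for the gemoji data the commit actually uses. It also assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode one UTF-8 code point starting at s[i] and
// advance i past it. Assumes well-formed UTF-8; no validation is done.
static char32_t nextCodePoint(const std::string& s, std::size_t& i)
{
  unsigned char c = static_cast<unsigned char>(s[i]);
  int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : (c >= 0xC0) ? 1 : 0;
  char32_t cp = (extra == 0) ? c : (c & (0x3F >> extra));
  ++i;
  for (int k = 0; k < extra && i < s.size(); ++k, ++i)
  {
    cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
  }
  return cp;
}

// Hypothetical stand-in for a gemoji lookup: a coarse range test that
// covers common pictographs but is NOT a complete emoji definition.
static bool isEmojiCodePoint(char32_t cp)
{
  return (cp >= 0x1F300 && cp <= 0x1FAFF) || (cp >= 0x2600 && cp <= 0x27BF);
}

// Split text on whitespace, and also start a new token whenever the input
// switches between emoji and non-emoji characters.
std::vector<std::string> splitEmojiRuns(const std::string& text)
{
  std::vector<std::string> tokens;
  std::string current;
  bool currentIsEmoji = false;

  auto flush = [&] {
    if (!current.empty())
    {
      tokens.push_back(current);
      current.clear();
    }
  };

  std::size_t i = 0;
  while (i < text.size())
  {
    std::size_t start = i;
    char32_t cp = nextCodePoint(text, i);

    if (cp == U' ' || cp == U'\n' || cp == U'\t')
    {
      flush();
      continue;
    }

    bool emoji = isEmojiCodePoint(cp);
    if (!current.empty() && emoji != currentIsEmoji)
    {
      flush();
    }

    currentIsEmoji = emoji;
    current.append(text, start, i - start); // copy the raw UTF-8 bytes
  }

  flush();
  return tokens;
}
```

With this sketch, an input such as `nice👌😂 ok :P` would yield the tokens `nice`, `👌😂`, `ok`, and `:P`. A caller could then route the emoji token to the emoticon freevar alongside text emoticons, which is the behavior the commit message describes.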
Diffstat (limited to 'kgramstats.h')
-rw-r--r-- | kgramstats.h | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/kgramstats.h b/kgramstats.h
index a97d7bf..4acde65 100644
--- a/kgramstats.h
+++ b/kgramstats.h
@@ -112,8 +112,11 @@ private:
 
   int maxK;
   std::map<kgram, std::map<int, token_data> > stats;
-  word hashtags {"#hashtag"};
+
+  // Words
   std::map<std::string, word> words;
+  word hashtags {"#hashtag"};
+  word emoticons {"👌"};
 };
 
 void printKgram(kgram k);