| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
The logic in rawr::randomSentence with the cuts might be slightly different now but who even knows what's going on there.
|
|
|
|
| |
This means the bot is also now single-threaded.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\ |
|
| | |
|
|/ |
|
| |
|
|
|
|
| |
It also now does not reply to itself.
|
| |
|
| |
|
| |
|
|
|
|
| |
Now using libtwitter++ instead of twitcurl!
|
| |
|
| |
|
| |
|
|
|
|
| |
Strings of emojis are tokenized separately from anything else and added to an emoticon freevar, which is mixed in with regular emoticons like :P. This breaks old-style freevars like $name$ and $noun$, so some legacy support is left in for compatibility, but eventually $name$ should be made into an actual new freevar. Emoji data is from gemoji (https://github.com/github/gemoji).
|
|
|
|
|
|
| |
A 'word' is now an object that contains a distribution of forms that word can take. For now, most words contain just one form, the canonical one. The only special use is currently hashtags.
Malapropisms have been disabled because of compatibility issues and because an upcoming feature is planned to replace them.
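
A minimal sketch of what a 'word' holding a distribution of forms could look like. This is not the code from the commit: the class name `word`, its members `add_form`, `pick_form` and `form_count`, and the idea of sampling a form in proportion to how often it was seen are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>

// Hypothetical sketch: every name here is illustrative, not taken from the bot.
class word {
public:
  explicit word(std::string canonical) : canonical_(std::move(canonical)) {}

  // Record one observed occurrence of a surface form of this word.
  void add_form(const std::string& form) {
    ++forms_[form];
    ++total_;
  }

  const std::string& canonical() const { return canonical_; }

  // Number of distinct forms recorded so far.
  std::size_t form_count() const { return forms_.size(); }

  // Pick a form with probability proportional to its recorded count.
  // Assumption: a word with no recorded forms prints its canonical form.
  template <class Rng>
  std::string pick_form(Rng& rng) const {
    if (total_ == 0) return canonical_;
    std::uniform_int_distribution<std::size_t> dist(0, total_ - 1);
    std::size_t roll = dist(rng);
    for (const auto& entry : forms_) {
      if (roll < entry.second) return entry.first;
      roll -= entry.second;
    }
    return canonical_;  // not reached while counts and total_ agree
  }

private:
  std::string canonical_;
  std::map<std::string, std::size_t> forms_;
  std::size_t total_ = 0;
};
```

Under this sketch an ordinary word never calls `add_form` and always prints its canonical form, while a hashtag word could collect every hashtag seen in the corpus and print one of them at random.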
|
|
|
|
| |
the config file
|
|
|
|
|
|
| |
The algorithm still treats most tokens literally, but now groups together tokens that terminate a clause somehow (that is, tokens containing any of .?!,), without distinguishing between the different terminating characters. For each word that can terminate a sentence, the algorithm creates a histogram of the terminating strings seen for that word, keyed on both the characters and how many times they occur (counting occurrences allows things like um???? and um,,,,, to still be folded down into um.). Then, when the terminating version of that token is invoked, a random terminating string is added to that token based on the histogram for that word (again, to allow things like the desu-ly use of multiple commas to end clauses).
The algorithm now also has a slightly advanced kgram structure: a special "sentence wildcard" kgram value, set aside from normal strings of tokens, can match any terminating token. This kgram value is never printed. It is only ever present in the query kgrams and cannot be present in the histograms, since it is of a different datatype. It is used at the beginning of sentence generation to make sure that the first couple of words generated actually form the beginning of a sentence instead of picking up somewhere in the middle of one. It is also used to reset sentence generation on the rare occasion that the end of the corpus is reached.
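
The terminator histogram described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's actual code: `split_terminator`, `terminator_table` and its members are hypothetical names, only trailing runs of `.?!,` are treated as terminators, and falling back to a plain `.` for an unseen word is a choice made here. The sentence wildcard kgram is not modelled.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>

// Characters that end a clause, per the commit message.
const std::string kTerminators = ".?!,";

// Split "um????" into ("um", "????"). A token with no trailing
// terminator run yields an empty second element.
std::pair<std::string, std::string> split_terminator(const std::string& token) {
  std::size_t last = token.find_last_not_of(kTerminators);
  std::size_t cut = (last == std::string::npos) ? 0 : last + 1;
  return std::make_pair(token.substr(0, cut), token.substr(cut));
}

// Hypothetical per-word histogram of terminating strings.
class terminator_table {
public:
  // Record the token's terminating string (if any) and return its stem.
  std::string observe(const std::string& token) {
    std::pair<std::string, std::string> parts = split_terminator(token);
    if (!parts.second.empty()) ++table_[parts.first][parts.second];
    return parts.first;
  }

  // How many times `suffix` was seen ending `stem`.
  std::size_t count(const std::string& stem, const std::string& suffix) const {
    auto it = table_.find(stem);
    if (it == table_.end()) return 0;
    auto jt = it->second.find(suffix);
    return jt == it->second.end() ? 0 : jt->second;
  }

  // Append a terminating string drawn in proportion to its count.
  // Assumption: a stem never seen terminating gets a plain ".".
  template <class Rng>
  std::string terminate(const std::string& stem, Rng& rng) const {
    auto it = table_.find(stem);
    if (it == table_.end()) return stem + ".";
    std::size_t total = 0;
    for (const auto& e : it->second) total += e.second;
    std::uniform_int_distribution<std::size_t> dist(0, total - 1);
    std::size_t roll = dist(rng);
    for (const auto& e : it->second) {
      if (roll < e.second) return stem + e.first;
      roll -= e.second;
    }
    return stem + ".";  // not reached while counts are consistent
  }

private:
  std::map<std::string, std::map<std::string, std::size_t>> table_;
};
```

With this layout, `um????` and `um,,,,,` both map to the stem `um` for kgram matching, while the exact strings survive in the histogram and can be reattached when text is generated.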
|
|
|
|
|
|
|
| |
this time it's a literal algorithm again
not canonizing away punctuation
newlines are actually considered new sentences now
we look for the end of a sentence and then start after that
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Also changed default delay from 15 minutes to 1 hour
|
|
|
|
| |
Arbitrary variable tokens can now be defined (though at this point only in the code itself) as a pair of a variable name and a filename pointing to a plain text file containing a newline-delimited list of elements. When a token of the form $name$ (where name is the name of a variable) is encountered, the output will include a random element from the appropriate list. The variables $name$ and $noun$ are hard-coded at this point, but the program will not crash if names.txt and nouns.txt do not exist and will instead just silently ignore the variables.
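
A rough sketch of how such variable tokens could be loaded and expanded. The names `variable_table`, `load_list` and `expand_token` are hypothetical and do not come from the commit. Leaving a `$name$` token unchanged when its list is missing or empty is one possible reading of "silently ignore", assumed here.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical: maps a variable name to its list of possible elements.
using variable_table = std::map<std::string, std::vector<std::string>>;

// Load a newline-delimited list from a plain text file. A missing or
// unreadable file yields an empty list rather than an error.
std::vector<std::string> load_list(const std::string& filename) {
  std::vector<std::string> items;
  std::ifstream in(filename);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty()) items.push_back(line);
  }
  return items;
}

// Replace a token of the form $name$ with a random element of the
// matching list. Other tokens, unknown variables, and variables with
// empty lists pass through unchanged (an assumption of this sketch).
template <class Rng>
std::string expand_token(const std::string& token, const variable_table& vars,
                         Rng& rng) {
  if (token.size() < 3 || token.front() != '$' || token.back() != '$') {
    return token;
  }
  auto it = vars.find(token.substr(1, token.size() - 2));
  if (it == vars.end() || it->second.empty()) return token;
  std::uniform_int_distribution<std::size_t> dist(0, it->second.size() - 1);
  return it->second[dist(rng)];
}
```

Under these assumptions, the hard-coded variables would be set up as `vars["name"] = load_list("names.txt");` and `vars["noun"] = load_list("nouns.txt");`, and a missing file simply leaves the corresponding list empty.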
|
|
Also wrote README
|