Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Removed aspell session editing | Kelly Rauchenberger | 2016-02-28 | 1 | -4/+0 |
| | | | | This wasn't really necessary since it was completely automated anyway, and it caused crashes for reasons that I haven't looked into with some bad corpuses. | ||||
* | Added yaml-cpp as a vendor submodule | Kelly Rauchenberger | 2016-02-28 | 3 | -5/+11 |
| | |||||
* | Reverted to an older kgram cut rate | Kelly Rauchenberger | 2016-02-20 | 1 | -13/+9 |
| | |||||
* | Added percentage display to preprocessing stage | Kelly Rauchenberger | 2016-02-20 | 1 | -4/+52 |
| | |||||
* | Modified kgram cut rate. It's do or die. | Kelly Rauchenberger | 2016-02-17 | 1 | -10/+13 |
| | |||||
* | Attempted to fix line-endings for Windows | Kelly Rauchenberger | 2016-02-17 | 3 | -0/+20 |
| | |||||
* | Fixed issue when names.txt was not present | Kelly Rauchenberger | 2016-02-15 | 1 | -24/+13 |
| | | | | Also removed any code mentioning $noun$ because it turns out the current version of the canonical corpus doesn't even use it anymore. | ||||
* | Tweaked kgram cut rate some more (it never ends) | Kelly Rauchenberger | 2016-02-15 | 1 | -1/+1 |
| | |||||
* | Tweaked kgram cut rate AGAIN | Kelly Rauchenberger | 2016-02-14 | 1 | -1/+2 |
| | |||||
* | Fixed incorrect diversity of tokens containing the letters aemnou | Kelly Rauchenberger | 2016-02-14 | 1 | -1/+1 |
| | |||||
* | Tweaked kgram cut rate again | Kelly Rauchenberger | 2016-02-14 | 1 | -2/+2 |
| | |||||
* | Fixed problem wherein "$name$'s" was considered a form of "name's" | Kelly Rauchenberger | 2016-02-14 | 1 | -8/+6 |
| | |||||
* | Fixed issue where queries with both the wildcard token and a terminating token would reset the prefix | Kelly Rauchenberger | 2016-02-13 | 1 | -14/+5 |
| | |||||
* | Merge in changes to older kgram cutting strategy | Kelly Rauchenberger | 2016-02-09 | 1 | -9/+21 |
|\ | |||||
| * | Tweaked kgram cut rate again | Kelly Rauchenberger | 2016-02-09 | 1 | -4/+8 |
| | | |||||
* | | Tweaked the kgram cutting rate again | Kelly Rauchenberger | 2016-02-07 | 1 | -1/+1 |
| | | |||||
* | | Changed how kgram cutting works | Kelly Rauchenberger | 2016-02-06 | 1 | -17/+9 |
|/ | | | | Whereas cutting occurred randomly before, now a token will be cut from the search kgram whenever the previously generated token was guaranteed by its search kgram (that is, it was the only token that could follow that specific query). | ||||
* | Merge branch 'master' of http://github.com/hatkirby/rawr-ebooks | Kelly Rauchenberger | 2016-02-03 | 1 | -0/+1 |
|\ | |||||
| * | Added #include <cstring> to kgramstats | Kelly Rauchenberger | 2016-02-03 | 1 | -0/+1 |
| | | |||||
* | | Added some more emoticons | Kelly Rauchenberger | 2016-02-03 | 1 | -1/+5 |
| | | |||||
* | | Declared old-style $name$ and $noun$ canonical | Kelly Rauchenberger | 2016-02-03 | 1 | -0/+6 |
|/ | | | | Without this, they get mixed in by the spell checker with "name" and "noun." | ||||
* | Token generator now uses aspell to link different spellings of a word | Kelly Rauchenberger | 2016-02-03 | 2 | -4/+58 |
| | | | | This is the grand scheme for the multi-formed word design. | ||||
* | Terminator characters in the middle of tokens are no longer stripped | Kelly Rauchenberger | 2016-02-03 | 2 | -13/+25 |
| | | | | Emoticon checking is also now case sensitive, and a few more emoticons were added to the list. | ||||
* | Fixed issue where closing opened delimiters wouldn't pop them off the stack | Kelly Rauchenberger | 2016-02-01 | 1 | -0/+2 |
| | | | | This would cause a random quotation mark, for instance, to appear at the end of a tweet if a quote had been opened and closed naturally within the tweet. | ||||
* | Switched to pkg-config for finding libcurl | Kelly Rauchenberger | 2016-02-01 | 1 | -5/+5 |
| | |||||
* | Added emoji freevar | Kelly Rauchenberger | 2016-02-01 | 13 | -65/+1064 |
| | | | | Strings of emojis are tokenized separately from anything else, and added to an emoticon freevar, which is mixed in with regular emoticons like :P. This breaks old-style freevars like $name$ and $noun$ so some legacy support for compatibility is left in but eventually $name$ should be made into an actual new freevar. Emoji data is from gemoji (https://github.com/github/gemoji). | ||||
* | Rewrote how tokens are handled | Kelly Rauchenberger | 2016-01-29 | 8 | -266/+406 |
| | | | | | | A 'word' is now an object that contains a distribution of forms that word can take. For now, most words contain just one form, the canonical one. The only special use is currently hashtags. Malapropisms have been disabled because of compatibility issues and because an upcoming feature is planned to replace them. | ||||
* | rawr-ebooks no longer generates Twitter access tokens and requires one in the config file | Kelly Rauchenberger | 2016-01-27 | 1 | -118/+13 |
| | |||||
* | You can now make rawr-gen even if you don't have curl or yaml-cpp | Kelly Rauchenberger | 2016-01-27 | 1 | -6/+10 |
| | |||||
* | hashtags are now randomized | Kelly Rauchenberger | 2016-01-25 | 2 | -37/+102 |
| | |||||
* | switched to cmake | Kelly Rauchenberger | 2016-01-24 | 7 | -382/+21 |
| | |||||
* | Merge branch 'master' of http://github.com/hatkirby/rawr-ebooks | Kelly Rauchenberger | 2016-01-05 | 1 | -1/+1 |
|\ | |||||
| * | Merge branch 'master' of https://github.com/hatkirby/rawr-ebooks | Kelly Rauchenberger | 2016-01-04 | 5 | -502/+439 |
| |\ | | | | | | | | | | | | | Conflicts: malaprop.cpp | ||||
| * \ | Merge branch 'master' of https://github.com/hatkirby/rawr-ebooks | Kelly Rauchenberger | 2015-12-30 | 3 | -45/+84 |
| |\ \ | |||||
| * | | | ???? | Kelly Rauchenberger | 2015-11-26 | 1 | -0/+6 |
| | | | | |||||
* | | | | Did you know you can put comments in front of ascii art (https://twitter.com/rawr_ebooks/status/684376473369706498) | Kelly Rauchenberger | 2016-01-05 | 1 | -0/+34 |
| |_|/ |/| | | | | | | | | | ||||
* | | | Rewrote quite a bit of kgramstats | Kelly Rauchenberger | 2016-01-04 | 5 | -501/+444 |
| |/ |/| | | | | | | | | | The algorithm still treats most tokens literally, but now groups together tokens that terminate a clause somehow (so, contain .?!,), without distinguishing between the different terminating characters. For each word that can terminate a sentence, the algorithm creates a histogram of the terminating characters and number of occurrences of those characters for that word (number of occurrences is to allow things like um???? and um,,,,, to still be folded down into um.). Then, when the terminating version of that token is invoked, a random terminating string is added to that token based on the histogram for that word (again, to allow things like the desu-ly use of multiple commas to end clauses). The algorithm now also has a slightly advanced kgram structure; a special "sentence wildcard" kgram value, which can match any terminating token, is set aside from normal strings of tokens. This kgram value is never printed: it is only ever present in the query kgrams and, being of a different datatype, cannot actually be present in the histograms. It is used at the beginning of sentence generation to make sure that the first couple of words generated actually form the beginning of a sentence instead of picking up somewhere in the middle of a sentence. It is also used to reset sentence generation on the rare occasion that the end of the corpus is reached. | ||||
* | | guess what! the algorithm | Kelly Rauchenberger | 2015-12-30 | 3 | -45/+84 |
|/ | | | | | | this time it's a literal algorithm again; not canonizing away punctuation; newlines are actually considered new sentences now; we look for the end of a sentence and then start after that | ||||
* | You guessed it,,, twerked the algo | Kelly Rauchenberger | 2015-11-23 | 1 | -44/+41 |
| | |||||
* | std::set seems to have some problem with inserting an empty string. Warrants further investigation. | Kelly Rauchenberger | 2015-11-23 | 1 | -2/+7 |
| | |||||
* | Fixed std namespace references in ebooks.cpp | Kelly Rauchenberger | 2015-11-23 | 1 | -10/+10 |
| | |||||
* | Added malapropisms | Kelly Rauchenberger | 2015-11-22 | 9 | -117/+293 |
| | |||||
* | I may have made things better. I may have made things worse. | Kelly Rauchenberger | 2015-11-22 | 3 | -10/+22 |
| | |||||
* | Added some newline recognition | Kelly Rauchenberger | 2015-07-24 | 1 | -31/+55 |
| | |||||
* | Took into account question marks and exclamation marks | Kelly Rauchenberger | 2015-07-19 | 1 | -2/+2 |
| | |||||
* | Stopped using C++11 because yamlcpp didn't like it | Kelly Rauchenberger | 2015-07-19 | 2 | -5/+7 |
| | |||||
* | Kerjiggered the algorithms | Kelly Rauchenberger | 2015-07-19 | 3 | -22/+173 |
| | |||||
* | Modified kgram shortening rate | Kelly Rauchenberger | 2014-04-22 | 1 | -1/+1 |
| | |||||
* | Fixed typo with respect to reading delay time from config.yml | Feffernoose | 2013-10-08 | 1 | -1/+1 |
| | |||||
* | Added user-configurable delay between tweets in rawr-ebooks | Feffernoose | 2013-10-08 | 3 | -3/+8 |
| | | | | Also changed default delay from 15 minutes to 1 hour |
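
The terminator histogram described in the 2016-01-04 entry "Rewrote quite a bit of kgramstats" can be sketched roughly as follows. This is only an illustration of the idea in that commit message, not the code from the repository: the names `TerminatorStats`, `splitTerminator`, `observe`, `sample` and `count` are hypothetical, and keying the histogram on the whole terminating string (rather than on a character plus a repeat count) is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>

// Characters that end a clause, as listed in the commit message (".?!,").
const std::string kTerminators = ".?!,";

// Hypothetical helper: splits "um????" into ("um", "????").
// Only trailing terminators are split off; ones in the middle of a token stay.
std::pair<std::string, std::string> splitTerminator(const std::string& token)
{
  std::size_t last = token.find_last_not_of(kTerminators);
  std::size_t cut = (last == std::string::npos) ? 0 : last + 1;
  return std::make_pair(token.substr(0, cut), token.substr(cut));
}

// Hypothetical container: one histogram of terminating strings per word.
class TerminatorStats {
public:
  // Records a corpus token and returns the word with its terminator folded away.
  std::string observe(const std::string& token)
  {
    std::pair<std::string, std::string> parts = splitTerminator(token);
    if (!parts.second.empty()) {
      ++hist_[parts.first][parts.second];
    }
    return parts.first;
  }

  // Number of times `word` was seen ending in exactly `term`.
  int count(const std::string& word, const std::string& term) const
  {
    auto w = hist_.find(word);
    if (w == hist_.end()) return 0;
    auto t = w->second.find(term);
    return (t == w->second.end()) ? 0 : t->second;
  }

  // Picks a terminating string for `word`, weighted by observed frequency.
  // Precondition: `word` was observed with a terminator at least once.
  template <class Rng>
  std::string sample(const std::string& word, Rng& rng) const
  {
    auto w = hist_.find(word);
    assert(w != hist_.end() && "word never terminated a clause");

    int total = 0;
    for (const auto& entry : w->second) total += entry.second;

    std::uniform_int_distribution<int> dist(0, total - 1);
    int roll = dist(rng);
    for (const auto& entry : w->second) {
      if (roll < entry.second) return entry.first;
      roll -= entry.second;
    }
    return w->second.begin()->first; // not reached when counts are consistent
  }

private:
  std::map<std::string, std::map<std::string, int>> hist_;
};
```

Under these assumptions, observing `um????` and `um,,,,,` both fold down to the word `um`, and a later `sample("um", rng)` returns either `????` or `,,,,,`, each in proportion to how often it appeared in the corpus.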