rawr-ebooks - Generates nonsense statistically similar to an input corpus

	Commit message	Author	Age	Files	Lines
*	Tweaked kgram cut rate again	Kelly Rauchenberger	2016-02-09	1	-4/+8
\|
*	Merge branch 'master' of http://github.com/hatkirby/rawr-ebooks	Kelly Rauchenberger	2016-02-03	1	-0/+1
\|\
\| *	Added #include <cstring> to kgramstats	Kelly Rauchenberger	2016-02-03	1	-0/+1
\| \|
* \|	Added some more emoticons	Kelly Rauchenberger	2016-02-03	1	-1/+5
\| \|
* \|	Declared old-style $name$ and $noun$ canonical	Kelly Rauchenberger	2016-02-03	1	-0/+6
\|/ \| \| \|	Without this, they get mixed in by the spell checker with "name" and "noun."
*	Token generator now uses aspell to link different spellings of a word	Kelly Rauchenberger	2016-02-03	2	-4/+58
\| \| \| \|	This is the grand scheme for the multi-formed word design.
*	Terminator characters in the middle of tokens are no longer stripped	Kelly Rauchenberger	2016-02-03	2	-13/+25
\| \| \| \|	Emoticon checking is also now case sensitive, and a few more emoticons were added to the list.
*	Fixed issue where closing opened delimiters wouldn't pop them off the stack	Kelly Rauchenberger	2016-02-01	1	-0/+2
\| \| \| \|	This would cause a random quotation mark, for instance, to appear at the end of a tweet if a quote had been opened and closed naturally within the tweet.
*	Switched to pkg-config for finding libcurl	Kelly Rauchenberger	2016-02-01	1	-5/+5
\|
*	Added emoji freevar	Kelly Rauchenberger	2016-02-01	13	-65/+1064
\| \| \| \|	Strings of emojis are tokenized separately from anything else and added to an emoticon freevar, which is mixed in with regular emoticons like :P. This breaks old-style freevars like $name$ and $noun$, so some legacy support is left in for compatibility, but eventually $name$ should be made into an actual new freevar. Emoji data is from gemoji (https://github.com/github/gemoji).
*	Rewrote how tokens are handled	Kelly Rauchenberger	2016-01-29	8	-266/+406
\| \| \| \| \| \|	A 'word' is now an object that contains a distribution of forms that word can take. For now, most words contain just one form, the canonical one. The only special use is currently hashtags. Malapropisms have been disabled because of compatibility issues and because an upcoming feature is planned to replace them.
*	rawr-ebooks no longer generates Twitter access tokens and requires one in the config file	Kelly Rauchenberger	2016-01-27	1	-118/+13
\|
*	You can now make rawr-gen even if you don't have curl or yaml-cpp	Kelly Rauchenberger	2016-01-27	1	-6/+10
\|
*	hashtags are now randomized	Kelly Rauchenberger	2016-01-25	2	-37/+102
\|
*	switched to cmake	Kelly Rauchenberger	2016-01-24	7	-382/+21
\|
*	Merge branch 'master' of http://github.com/hatkirby/rawr-ebooks	Kelly Rauchenberger	2016-01-05	1	-1/+1
\|\
\| *	Merge branch 'master' of https://github.com/hatkirby/rawr-ebooks	Kelly Rauchenberger	2016-01-04	5	-502/+439
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: malaprop.cpp
\| * \	Merge branch 'master' of https://github.com/hatkirby/rawr-ebooks	Kelly Rauchenberger	2015-12-30	3	-45/+84
\| \|\ \
\| * \| \|	????	Kelly Rauchenberger	2015-11-26	1	-0/+6
\| \| \| \|
* \| \| \|	Did you know you can put comments in front of ascii art (https://twitter.com/rawr_ebooks/status/684376473369706498)	Kelly Rauchenberger	2016-01-05	1	-0/+34
\| \|_\|/ \|/\| \| \| \| \| \| \| \|
* \| \|	Rewrote quite a bit of kgramstats	Kelly Rauchenberger	2016-01-04	5	-501/+444
\| \|/ \|/\| \| \| \| \| \| \| \| \|	The algorithm still treats most tokens literally, but now groups together tokens that terminate a clause (that is, tokens containing any of .?!,), without distinguishing between the different terminating characters. For each word that can terminate a sentence, the algorithm creates a histogram of the terminating characters and the number of occurrences of those characters for that word (the count allows things like um???? and um,,,,, to still be folded down into um.). Then, when the terminating version of that token is invoked, a random terminating string is added to that token based on the histogram for that word (again, to allow things like the desu-ly use of multiple commas to end clauses). The algorithm now also has a slightly more advanced kgram structure: a special "sentence wildcard" kgram value, which can match any terminating token, is set aside from normal strings of tokens. This kgram value is never printed; it is only ever present in the query kgrams and, being of a different datatype, cannot actually be present in the histograms. It is used at the beginning of sentence generation to make sure that the first couple of words generated actually form the beginning of a sentence instead of picking up somewhere in the middle of one. It is also used to reset sentence generation on the rare occasion that the end of the corpus is reached.
* \|	guess what! the algorithm	Kelly Rauchenberger	2015-12-30	3	-45/+84
\|/ \| \| \| \| \| \|	This time it's a literal algorithm again, not canonizing away punctuation. Newlines are actually considered new sentences now. We look for the end of a sentence and then start after that.
*	You guessed it,,, twerked the algo	Kelly Rauchenberger	2015-11-23	1	-44/+41
\|
*	std::set seems to have some problem with inserting an empty string. Warrants further investigation.	Kelly Rauchenberger	2015-11-23	1	-2/+7
\|
*	Fixed std namespace references in ebooks.cpp	Kelly Rauchenberger	2015-11-23	1	-10/+10
\|
*	Added malapropisms	Kelly Rauchenberger	2015-11-22	9	-117/+293
\|
*	I may have made things better. I may have made things worse.	Kelly Rauchenberger	2015-11-22	3	-10/+22
\|
*	Added some newline recognition	Kelly Rauchenberger	2015-07-24	1	-31/+55
\|
*	Took into account question marks and exclamation marks	Kelly Rauchenberger	2015-07-19	1	-2/+2
\|
*	Stopped using C++11 because yamlcpp didn't like it	Kelly Rauchenberger	2015-07-19	2	-5/+7
\|
*	Kerjiggered the algorithms	Kelly Rauchenberger	2015-07-19	3	-22/+173
\|
*	Modified kgram shortening rate	Kelly Rauchenberger	2014-04-22	1	-1/+1
\|
*	Fixed typo with respect to reading delay time from config.yml	Feffernoose	2013-10-08	1	-1/+1
\|
*	Added user-configurable delay between tweets in rawr-ebooks	Feffernoose	2013-10-08	3	-3/+8
\| \| \| \|	Also changed default delay from 15 minutes to 1 hour
*	Fixed a few minor compile errors in freevars	Feffernoose	2013-10-07	2	-1/+2
\|
*	Implemented freevars	Feffernoose	2013-10-07	5	-4/+82
\| \| \| \|	Arbitrary variable tokens can now be defined (though at this point only in the code itself) as a pair of a variable name and a filename pointing to a plain text file containing a newline-delimited list of elements. When a token of the form $name$ (where name is the name of a variable) is encountered, the output will include a random element from the appropriate list. The variables $name$ and $noun$ are hard-coded at this point, but the program will not crash if names.txt and nouns.txt do not exist and will instead just silently ignore the variables.
*	Removed yamlcpp dependency from rawr-gen	Feffernoose	2013-10-06	3	-15/+28
\| \| \| \|	rawr-gen now takes the input corpus as a command-line argument, which makes it easier to use. It also now shows a usage message if given a non-existent file or no argument.
*	Merge branch 'master' of http://github.com/hatkirby/rawr-ebooks	Feffernoose	2013-10-06	1	-2/+8
\|\
\| *	Merge branch 'master' of https://github.com/hatkirby/rawr-ebooks	Feffernoose	2013-10-06	1	-11/+11
\| \|\
\| * \|	Stripped empty tokens from corpus	Feffernoose	2013-10-06	1	-2/+8
\| \| \|
* \| \|	Split rawr-ebooks and rawr-gen	Feffernoose	2013-10-06	4	-7/+94
\| \|/ \|/\| \| \| \| \|	Also wrote README
* \|	Program no longer recalculates kgramstats repeatedly within each run	Feffernoose	2013-10-05	1	-11/+11
\|/
*	Rewrote weighted random number generator	Feffernoose	2013-10-05	2	-34/+39
\| \| \| \| \| \|	The previous method of picking the next token was flawed in some mysterious way: it ended up picking various words that occurred only once in the input corpus as the first word of the generated output (most notably, "hysterically," "Anarchy," "Yorkshire," and "impunity.").
*	Changed incidence of random kgram-trimming	Feffernoose	2013-10-04	1	-4/+10
\| \| \| \|	Also added better terminal output
*	Weighed token casing and presence of periods	Feffernoose	2013-10-01	2	-28/+76
\| \| \| \| \| \| \| \|	Tokens which differ only by casing or the presence of an ending period are now considered the same token. When tokens are generated, they are cased based on the prevalence of Upper/Title/Lower casing of the token in the input corpus, and similarly, a period is added to the end of the word based on how often the same token was ended with a period in the input corpus.
*	Wrote program	Feffernoose	2013-10-01	6	-1/+336
\|
*	Started automake stuff	Feffernoose	2013-09-30	4	-0/+38
\|
*	Initial commit	Kelly Rauchenberger	2013-09-30	3	-0/+356
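
The freevar mechanism described in the "Implemented freevars" commit (2013-10-07) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the repository: the names `FreeVars`, `loadList`, and `expandToken` are hypothetical, and "silently ignore the variables" is interpreted here as leaving the `$name$` token unchanged when its list file is missing or empty.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical type: maps a variable name (e.g. "name") to its list of elements.
using FreeVars = std::map<std::string, std::vector<std::string>>;

// Load a newline-delimited list of elements from a plain text file.
// A missing or unreadable file simply yields an empty list, so the
// variable ends up being ignored rather than causing a crash.
std::vector<std::string> loadList(const std::string& path)
{
  std::vector<std::string> items;
  std::ifstream in(path);
  std::string line;

  while (std::getline(in, line))
  {
    // Tolerate Windows line endings and skip blank lines.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (!line.empty()) items.push_back(line);
  }

  return items;
}

// If the token has the form $var$ and var has a non-empty list, return a
// uniformly random element of that list. Otherwise return the token as-is
// (assumption: this is what "ignoring" an unavailable variable means).
std::string expandToken(const std::string& token,
                        const FreeVars& vars,
                        std::mt19937& rng)
{
  if (token.size() < 3 || token.front() != '$' || token.back() != '$')
  {
    return token;
  }

  auto it = vars.find(token.substr(1, token.size() - 2));
  if (it == vars.end() || it->second.empty())
  {
    return token;
  }

  std::uniform_int_distribution<std::size_t> pick(0, it->second.size() - 1);
  return it->second[pick(rng)];
}
```

A caller would fill a `FreeVars` map once at startup, for example with `vars["name"] = loadList("names.txt")`, and then pass every generated token through `expandToken` before printing it.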