| Commit message | Author | Age | Files | Lines |
... | |
|
The previous method of picking the next token was flawed in some mysterious way: it kept choosing words
that occur only once in the input corpus as the first word of the generated output
(most notably "hysterically," "Anarchy," "Yorkshire," and "impunity").
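For illustration only, here is a minimal sketch of frequency-weighted token selection, under which a word seen once in the corpus should rarely be picked first. It is not the code from this commit: the names `count_tokens` and `pick_weighted` are hypothetical, and weighting the first word by overall token frequency is an assumption.

```python
import random
from collections import Counter


def count_tokens(tokens):
    """Count how often each token occurs in the corpus."""
    return Counter(tokens)


def pick_weighted(counts, rng=random):
    """Pick one token with probability proportional to its count.

    Returns None when there is nothing to choose from.
    """
    if not counts:
        return None
    candidates = list(counts)
    weights = [counts[token] for token in candidates]
    # random.choices performs weighted sampling with replacement.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With this scheme a token seen once among a hundred is chosen about one time in a hundred. The same helper could be applied to a per-token table of successors to pick each following word.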
|
Also improved the terminal output.
|
Tokens that differ only by casing or by a trailing period are now treated
as the same token. When a token is generated, its casing is chosen based
on the prevalence of Upper/Title/Lower casing of that token in the input
corpus. Similarly, a period is appended based on how often the token ended
with a period in the input corpus.
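The normalization described above could be sketched roughly as follows. This is an assumption-laden illustration, not the commit's implementation: `casing_of`, `normalize`, `collect_stats`, and `render` are hypothetical names, and the casing rules are simplified (mixed-case words such as "McDonald" are treated as Title case).

```python
import random
from collections import Counter, defaultdict


def casing_of(word):
    """Classify a word as 'upper', 'title', or 'lower' (simplified rules)."""
    if word.isupper():
        return "upper"
    if word[:1].isupper():
        return "title"
    return "lower"


def normalize(raw):
    """Split a raw token into (lowercased key, casing style, had trailing period)."""
    has_period = raw.endswith(".")
    core = raw[:-1] if has_period else raw
    return core.lower(), casing_of(core), has_period


def collect_stats(raw_tokens):
    """Record, per normalized token, how it was cased and how often it ended in a period."""
    stats = defaultdict(lambda: {"casing": Counter(), "period": 0, "total": 0})
    for raw in raw_tokens:
        key, casing, has_period = normalize(raw)
        entry = stats[key]
        entry["casing"][casing] += 1
        entry["period"] += int(has_period)
        entry["total"] += 1
    return stats


def render(key, stats, rng=random):
    """Re-case a normalized token and maybe append a period, following corpus frequencies."""
    entry = stats[key]
    styles = list(entry["casing"])
    weights = [entry["casing"][style] for style in styles]
    style = rng.choices(styles, weights=weights, k=1)[0]
    word = {"upper": key.upper(), "title": key.capitalize(), "lower": key}[style]
    # Append a period with the same relative frequency seen in the corpus.
    if rng.random() < entry["period"] / entry["total"]:
        word += "."
    return word
```

In this sketch, "The", "the", "THE", and "the." all map to the key `the`, so the generator sees one token with four occurrences instead of four tokens with one occurrence each.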
|