verbly - Natural language generation library

	Commit message	Author	Age	Files	Lines
*	Added word frequency information	Star Rauchenberger	2023-02-03	1	-0/+4
\|
*	Generator now splits ImageNet list into per-notion files	Star Rauchenberger	2022-12-09	1	-1/+4
\|
*	Added a bunch of stuff for making LINGO puzzles	Star Rauchenberger	2022-12-08	1	-1/+10
\|
*	De-duped pronunciations in generated database hkutil	Star Rauchenberger	2022-11-30	1	-0/+1
\|	Identical pronunciations will now share an ID and be re-used by multiple forms. This has a negligible effect on database size, but it's useful for writing queries looking for words with the exact same pronunciations. This constitutes a minor database update, which we will call d1.2.
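A minimal sketch of why de-duplicated pronunciations make "same pronunciation" queries easy, using Python's standard `sqlite3` module. The table and column names (`forms`, `pronunciations`, `forms_pronunciations`) and the sample phoneme strings are assumptions made for illustration; they are not taken from the verbly schema.

```python
import sqlite3

# Hypothetical schema: each distinct phoneme string gets exactly one row,
# and forms point at it through a join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forms (form_id INTEGER PRIMARY KEY, form TEXT NOT NULL);
    CREATE TABLE pronunciations (
        pronunciation_id INTEGER PRIMARY KEY,
        phonemes TEXT NOT NULL UNIQUE
    );
    CREATE TABLE forms_pronunciations (
        form_id INTEGER NOT NULL,
        pronunciation_id INTEGER NOT NULL,
        PRIMARY KEY (form_id, pronunciation_id)
    );
""")


def add_form(form, phonemes):
    """Insert a form, re-using an existing pronunciation row when one matches."""
    form_id = conn.execute(
        "INSERT INTO forms (form) VALUES (?)", (form,)
    ).lastrowid
    conn.execute(
        "INSERT OR IGNORE INTO pronunciations (phonemes) VALUES (?)", (phonemes,)
    )
    (pron_id,) = conn.execute(
        "SELECT pronunciation_id FROM pronunciations WHERE phonemes = ?",
        (phonemes,),
    ).fetchone()
    conn.execute(
        "INSERT INTO forms_pronunciations VALUES (?, ?)", (form_id, pron_id)
    )


def homophones(form):
    """Return other forms that share a pronunciation row with `form`."""
    rows = conn.execute(
        """
        SELECT other.form
        FROM forms AS f
        JOIN forms_pronunciations AS fp ON fp.form_id = f.form_id
        JOIN forms_pronunciations AS op
          ON op.pronunciation_id = fp.pronunciation_id
        JOIN forms AS other ON other.form_id = op.form_id
        WHERE f.form = ? AND other.form_id != f.form_id
        ORDER BY other.form
        """,
        (form,),
    ).fetchall()
    return [row[0] for row in rows]


add_form("pair", "P EH1 R")
add_form("pear", "P EH1 R")
add_form("peer", "P IH1 R")
```

Because "pair" and "pear" point at the same pronunciation row, the lookup is a plain equality join on the ID rather than a string comparison.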
*	Removed unnecessary ROWIDs from database schema	Kelly Rauchenberger	2018-09-26	1	-1/+1
\|	The generator also now sorts and uniq's the WordNet files for antonymy, classification, and pertainymy/mannernymy, because those files contained duplicate rows, and the join tables without ROWIDs now enforce a uniqueness constraint. This constitutes a minor database update -- the new database is compatible with d1.0, but is ~12MB smaller. refs #6
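The interaction described above can be sketched with Python's `sqlite3` module. SQLite requires a `WITHOUT ROWID` table to declare a primary key, so a join table keyed on both columns rejects duplicate rows. The table name, column names, and sample rows below are hypothetical, and the Python `sorted(set(...))` call stands in for the sort-and-uniq step; none of this is the generator's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A WITHOUT ROWID table must have a PRIMARY KEY; here it covers both columns,
# so a repeated (word_1_id, word_2_id) pair violates the key.
conn.execute("""
    CREATE TABLE antonymy (
        word_1_id INTEGER NOT NULL,
        word_2_id INTEGER NOT NULL,
        PRIMARY KEY (word_1_id, word_2_id)
    ) WITHOUT ROWID
""")

# Illustrative input containing one duplicate row.
raw_rows = [(2, 1), (1, 2), (1, 2), (3, 4)]


def try_insert(rows):
    """Insert all rows in one transaction; roll back and report on failure."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany("INSERT INTO antonymy VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


loaded_raw = try_insert(raw_rows)        # fails on the duplicate (1, 2)
unique_rows = sorted(set(raw_rows))      # the "sort and uniq" step
loaded_unique = try_insert(unique_rows)  # succeeds
row_count = conn.execute("SELECT COUNT(*) FROM antonymy").fetchone()[0]
```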
*	Migrated generator to hkutil	Kelly Rauchenberger	2018-03-31	1	-1/+1
\|
*	Started migrating to hkutil (does not build)	Kelly Rauchenberger	2018-03-30	1	-1/+1
\|
*	Created database versioning system d1.0	Kelly Rauchenberger	2017-11-08	1	-0/+4
\|	Also added an ANALYZE statement to the end of the datafile generation process. This generates information that allows SQLite to sometimes come up with a better query plan, and in many cases can significantly speed up queries. This constitutes a minor database update, but because this is the first version that uses the database versioning system, older versions are essentially incompatible. refs #2
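As a rough illustration of what running ANALYZE at the end of generation does, the sketch below builds a small throwaway table with Python's `sqlite3` module and then inspects `sqlite_stat1`, the table in which SQLite stores the statistics its query planner reads. The `words` table, its columns, and the index name are assumptions made for the example, not part of the verbly datafile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (word_id INTEGER PRIMARY KEY, lemma TEXT, pos TEXT)"
)
conn.execute("CREATE INDEX words_by_pos ON words (pos)")

# Populate the table first; ANALYZE only describes data that already exists,
# which is why it belongs at the end of the generation process.
conn.executemany(
    "INSERT INTO words (lemma, pos) VALUES (?, ?)",
    [(f"word{i}", "noun" if i % 10 else "verb") for i in range(1000)],
)
conn.commit()

# Gather statistics for every table and index in the database.
conn.execute("ANALYZE")
conn.commit()

# The planner's statistics are stored in sqlite_stat1 inside the same file,
# so they ship with the generated database.
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'words_by_pos'"
).fetchall()
```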
*	Flattened selrestrs	Kelly Rauchenberger	2017-02-05	1	-3/+0
\|	Selrestrs are no longer logically a tree of positive/negative restrictions that are ANDed/ORed together; they are now a flat set of positive restrictions that are ORed together. They are stored as strings in a table called selrestrs, just like synrestrs, which also makes them a lot more queryable. This required some changes to the VerbNet data, because we needed to consolidate any ANDed clauses into single selrestrs, as well as convert any negative selrestrs into positive ones. The changes made are detailed on the wiki. Preposition choices are now encoded as comma-separated lists instead of using JSON. This change, along with the selrestrs one, allows us to remove verbly's dependency on nlohmann::json.
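A small sketch of the two encodings described above, written in Python for illustration only. The function names, the feature strings, and the choice to treat an empty selrestr set as "no restriction" are all assumptions; verbly's own handling is not shown in this log.

```python
def parse_prepositions(field):
    """Split a comma-separated preposition list, ignoring blank entries."""
    return [p.strip() for p in field.split(",") if p.strip()]


def satisfies_selrestrs(noun_features, selrestrs):
    """Check a noun against a flat set of positive selrestrs.

    The set is ORed: matching any single restriction is enough.
    An empty set is treated as unrestricted (an assumption of this sketch).
    """
    if not selrestrs:
        return True
    return any(restriction in noun_features for restriction in selrestrs)


# Hypothetical data for illustration.
prepositions = parse_prepositions("in, on,at")
accepts_person = satisfies_selrestrs({"animate"}, {"animate", "organization"})
accepts_rock = satisfies_selrestrs({"concrete"}, {"animate", "organization"})
```

With no nesting and no negation, the check reduces to a set-membership test, and the preposition field needs only a string split rather than a JSON parser.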
*	Restructured verb frame schema to be more queryable	Kelly Rauchenberger	2017-01-28	1	-2/+1
\|	Groups are much less significant now, and they no longer have a database table, nor are they considered a top-level object anymore. Instead of containing their own role data, that data is folded into the frames so that it's easier to query; as a result, each group has its own copy of the frames that it contains.
\|	Additionally, parts are considered top-level objects now, and you can query for frames based on attributes of their indexed parts. Synrestrs are also contained in their own table now, so that parts can be filtered against their synrestrs; they are, however, not considered top-level objects.
\|	Created a new type of field, the "join where" or "condition join" field, which is a normal join field that has a built-in condition on a specified field. This is used to allow creating multiple distinct join fields from one object to another. This is required for the lemma::form and frame::part joins, because filters for forms of separate inflections should not be coalesced; similarly, filters on differently indexed frame parts should not be coalesced.
\|	Queries can now be ordered, ascending or descending, by a field, in addition to randomly as before. This is necessary for accessing the parts of a verb frame in the correct order, but may be useful to an end user as well.
\|	Fixed a bug in statement generation where condition groups were not surrounded by parentheses, which made mixing OR groups and AND groups generate inaccurate statements. Additionally, parentheses are not placed around the top-level condition, and nested condition groups with the same logic type are coalesced, to make query strings as easy to read as possible.
\|	Also simplified the form::lemma field; it no longer conditions on the inflection of the form like the lemma::form field does.
\|	Also added a debug flag to statement::getQueryString that makes it return a query string with all of the bindings filled in, for debug use only.
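The parenthesization rules and the debug flag described in this commit can be sketched as follows. This is an independent Python illustration, not verbly's C++ implementation: the `Cond` and `Group` classes, the `flatten`, `render`, and `query_string` functions, and the sample column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Cond:
    column: str
    value: object


@dataclass
class Group:
    op: str  # "AND" or "OR"
    children: List[Union["Cond", "Group"]] = field(default_factory=list)


def flatten(node):
    """Merge nested groups that use the same logic type as their parent."""
    if isinstance(node, Cond):
        return node
    merged = []
    for child in node.children:
        child = flatten(child)
        if isinstance(child, Group) and child.op == node.op:
            merged.extend(child.children)
        else:
            merged.append(child)
    return Group(node.op, merged)


def render(node, bindings, top_level=True):
    """Render SQL, parenthesizing every group except the top-level one."""
    if isinstance(node, Cond):
        bindings.append(node.value)
        return f"{node.column} = ?"
    body = f" {node.op} ".join(render(c, bindings, False) for c in node.children)
    return body if top_level else f"({body})"


def query_string(node, debug=False):
    """Return (sql, bindings); with debug=True the placeholders are filled in.

    The filled-in string is for display only and is not safely escaped.
    """
    bindings = []
    sql = render(flatten(node), bindings)
    if debug:
        parts = sql.split("?")
        sql = "".join(p + repr(v) for p, v in zip(parts, bindings)) + parts[-1]
    return sql, bindings


tree = Group("AND", [
    Cond("pos", "noun"),
    Group("AND", [Cond("length", 4)]),  # same type as parent: coalesced
    Group("OR", [Cond("lemma", "pair"), Cond("lemma", "pear")]),
])
sql, bindings = query_string(tree)
debug_sql, _ = query_string(tree, debug=True)
```

Without the parentheses around the OR group, SQL's precedence rules would bind AND more tightly than OR and change the meaning of the condition, which is the kind of inaccurate statement the fix addresses.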
*	Whitespace changes	Kelly Rauchenberger	2017-01-24	1	-54/+54
\|
*	Moved some generator classes into the main namespace	Kelly Rauchenberger	2017-01-21	1	-3/+4
\|
*	Started structural rewrite	Kelly Rauchenberger	2017-01-16	1	-0/+151
	The new object structure was designed to build on the existing WordNet structure, while also adding in all of the data that we get from other sources. More information about this can be found on the project wiki.
	The generator has already been completely rewritten to generate a datafile that uses the new structure. In addition, a number of indexes are created, which does double the size of the datafile, but also allows for much faster lookups. Finally, the new generator is written modularly and is a lot more readable than the old one.
	The verbly interface to the new object structure has mostly been completed, but has not been tested fully. There is a completely new search API which utilizes a lot of operator overloading; documentation on how to use it should go up at some point.
	Token processing and verb frames are currently unimplemented. Source for these has been left in the repository for now.