Commit message | Author | Age | Files | Lines
* Added word::synonyms join field (BAD) | Kelly Rauchenberger | 2017-02-16 | 2 | -0/+32
|   Note that this is not a great implementation; the generated filter is mergeable with unrelated filters and may produce misleading results.
* Fixed weird filter normalization crash | Kelly Rauchenberger | 2017-02-16 | 1 | -13/+13
|
* Tweaked database indexes | Kelly Rauchenberger | 2017-02-13 | 1 | -3/+3
|   `rhymes_with` now also contains `prerhyme` so that rhyming joins can be covering. `notions_lemmas` and `lemmas_notions` have been created to facilitate "jumping" over `words` when it's only needed as a many-to-many through table. Because `notion_words` and `lemma_words` are prefixes of these new indexes, they have been removed.
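The two ideas in this commit (a covering index, and dropping an index that is a prefix of a wider one) can be checked with SQLite's `EXPLAIN QUERY PLAN`. The sketch below is only illustrative: the table layouts and column names are assumptions made for the example, not verbly's actual schema; only the index names come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables; the real schema is not shown in this log.
conn.executescript("""
    CREATE TABLE pronunciations (
        pronunciation_id INTEGER PRIMARY KEY,
        rhyme TEXT,
        prerhyme TEXT
    );
    CREATE INDEX rhymes_with ON pronunciations(rhyme, prerhyme);

    CREATE TABLE words (
        word_id INTEGER PRIMARY KEY,
        notion_id INTEGER,
        lemma_id INTEGER
    );
    CREATE INDEX notions_lemmas ON words(notion_id, lemma_id);
""")

def plan_for(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("x",)).fetchall()
    return " ".join(row[3] for row in rows)

# Both columns the query touches live in the index, so the table itself
# never has to be read: the index is "covering" for this query.
rhyme_plan = plan_for("SELECT prerhyme FROM pronunciations WHERE rhyme = ?")

# A lookup on notion_id alone can use the (notion_id, lemma_id) index,
# which is why a separate index on just the prefix column is redundant.
jump_plan = plan_for("SELECT lemma_id FROM words WHERE notion_id = ?")

print(rhyme_plan)
print(jump_plan)
```

Both plans should mention `COVERING INDEX`; the exact wording of the plan text varies between SQLite versions.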
* Expanded some indexes | Kelly Rauchenberger | 2017-02-11 | 1 | -34/+34
|   These modifications can make some queries run significantly faster.
* Fixed statement generation involving two CTEs for the same table | Kelly Rauchenberger | 2017-02-11 | 1 | -2/+10
|
* Added negative filter conversions to objects | Kelly Rauchenberger | 2017-02-10 | 5 | -40/+103
|
* Renamed object validity checks | Kelly Rauchenberger | 2017-02-10 | 7 | -7/+7
|   The bool conversion operator was unfortunately Very Confusing so I've just renamed the methods to isValid.
* Made pronunciation::rhymes join dynamic | Kelly Rauchenberger | 2017-02-06 | 24 | -365/+574
|   This involved adding a new type of filter: one that compares a field (currently only for equality or inequality) with another field located in an enclosing join context.
|   In the process, it was discovered that simplifying the lemma::forms join field earlier actually made some queries return inaccurate results, because the inflection of the form was being ignored and anything in the lemma would be used because of the inner join. Because the existing condition join did not allow for the condition field to be on the from side of the join, two things were done: a condition version of joinThrough was made, and lemma was finally eliminated as a top-level object, replaced instead with a condition join between word and form through lemmas_forms.
|   Queries are also now grouped by the first select field (assumed to be the primary ID) of the top table, in order to eliminate duplicates created by inner joins, so that there is a uniform distribution between results for random queries.
|   Created a database index on pronunciations(rhyme), which decreases query time for rhyming filters. The new database version is backwards-compatible because no data or structure changed.
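To see why grouping by the primary ID matters for random queries, consider a word with several forms: an inner join repeats that word once per form, so an unweighted random pick over the joined rows favours it. A minimal sketch using Python's `sqlite3` follows; the tables and data are hypothetical and much simpler than the real datafile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one row per word, several forms per word.
conn.executescript("""
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE forms (form_id INTEGER PRIMARY KEY, word_id INTEGER, text TEXT);
    INSERT INTO words VALUES (1, 'go'), (2, 'cat');
    INSERT INTO forms VALUES
        (1, 1, 'go'), (2, 1, 'goes'), (3, 1, 'went'), (4, 1, 'gone'),
        (5, 2, 'cat');
""")

join = "FROM words w INNER JOIN forms f ON f.word_id = w.word_id"

# Without grouping, 'go' appears four times and 'cat' once.
ungrouped = conn.execute(f"SELECT w.word_id {join}").fetchall()

# Grouping by the top table's primary ID collapses the duplicates.
grouped = conn.execute(f"SELECT w.word_id {join} GROUP BY w.word_id").fetchall()

# A random pick over the grouped rows gives each word the same chance.
random_word = conn.execute(
    f"SELECT w.word_id {join} GROUP BY w.word_id ORDER BY RANDOM() LIMIT 1"
).fetchone()

print(len(ungrouped), len(grouped), random_word)
```

With the ungrouped rows, a uniform pick would return word 1 four times out of five; after grouping, each word is equally likely.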
* Fixed error with notion::words join | Kelly Rauchenberger | 2017-02-05 | 1 | -1/+1
|
* Added some missing includes | Kelly Rauchenberger | 2017-02-05 | 2 | -0/+2
|
* Flattened selrestrs | Kelly Rauchenberger | 2017-02-05 | 20 | -12732/+468
|   Selrestrs are now a flat set of positive restrictions that are ORed together, rather than logically being a tree of positive/negative restrictions that are ANDed/ORed together. They are stored as strings in a table called selrestrs, just like synrestrs, which also makes them a lot more queryable.
|   This change required some changes to the VerbNet data, because we needed to consolidate any ANDed clauses into single selrestrs, as well as convert any negative selrestrs into positive ones. The changes made are detailed on the wiki.
|   Preposition choices are now encoded as comma-separated lists instead of using JSON. This change, along with the selrestrs one, allows us to remove verbly's dependency on nlohmann::json.
* Renamed object join fields to prevent conflicts with class names | Kelly Rauchenberger | 2017-02-03 | 15 | -52/+52
|   This was not a problem with clang but it caused compilation errors with gcc.
* Fixed reference to local address bug with gcc | Kelly Rauchenberger | 2017-02-03 | 1 | -3/+6
|   Using the ternary operator appeared to cause a reference-to-local-address bug with field::getConditionField. Replacing the ternary operator with an if statement fixes the problem. This problem did not occur with clang.
* Fixed statement generation re negative joins without a through table | Kelly Rauchenberger | 2017-02-03 | 1 | -1/+1
|   Previously, a simple negative join directly to an object rather than through another table would error because the statement generator would attempt to instantiate a CTE on the field's through table, which is undefined. Now, the proper table is used regardless of whether a through table is defined.
* Added enum inequality matches to field | Kelly Rauchenberger | 2017-02-03 | 3 | -1/+25
|
* Restructured verb frame schema to be more queryable | Kelly Rauchenberger | 2017-01-28 | 35 | -649/+885
|   Groups are much less significant now, and they no longer have a database table, nor are they considered a top level object anymore. Instead of containing their own role data, that data is folded into the frames so that it's easier to query; as a result, each group has its own copy of the frames that it contains. Additionally, parts are considered top level objects now, and you can query for frames based on attributes of their indexed parts. Synrestrs are also contained in their own table now, so that parts can be filtered against their synrestrs; they are however not considered top level objects.
|   Created a new type of field, the "join where" or "condition join" field, which is a normal join field that has a built in condition on a specified field. This is used to allow creating multiple distinct join fields from one object to another. This is required for the lemma::form and frame::part joins, because filters for forms of separate inflections should not be coalesced; similarly, filters on differently indexed frame parts should not be coalesced.
|   Queries can now be ordered, ascending or descending, by a field, in addition to randomly as before. This is necessary for accessing the parts of a verb frame in the correct order, but may be useful to an end user as well.
|   Fixed a bug with statement generation in that condition groups were not being surrounded in parentheses, which made mixing OR groups and AND groups generate inaccurate statements. Additionally, parentheses are not placed around the top level condition, and nested condition groups with the same logic type are coalesced, to make query strings as easy to read as possible.
|   Also simplified the form::lemma field; it no longer conditions on the inflection of the form like the lemma::form field does.
|   Also added a debug flag to statement::getQueryString that makes it return a query string with all of the bindings filled in, for debug use only.
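The parenthesization rules described in this commit (wrap nested condition groups, leave the top level bare, and coalesce nested groups that share a logic type) can be sketched in a few lines. This is not verbly's statement generator; `Cond`, `Group`, `flatten`, and `render` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Cond:
    sql: str  # a single condition such as "a = ?"

@dataclass
class Group:
    op: str  # "AND" or "OR"
    children: List[Union["Cond", "Group"]]

def flatten(node):
    """Coalesce nested groups that use the same logic type as their parent."""
    if isinstance(node, Cond):
        return node
    merged = []
    for child in map(flatten, node.children):
        if isinstance(child, Group) and child.op == node.op:
            merged.extend(child.children)  # same operator: lift the children
        else:
            merged.append(child)
    return Group(node.op, merged)

def render(node, top=True):
    """Render a condition tree; only non-top-level groups get parentheses."""
    if isinstance(node, Cond):
        return node.sql
    body = f" {node.op} ".join(render(c, top=False) for c in node.children)
    return body if top else f"({body})"

tree = Group("AND", [
    Cond("a = ?"),
    Group("AND", [Cond("b = ?"), Group("OR", [Cond("c = ?"), Cond("d = ?")])]),
])
query = render(flatten(tree))
print(query)  # a = ? AND b = ? AND (c = ? OR d = ?)
```

Without the parentheses around the OR group, SQL's precedence (AND binds tighter than OR) would change the meaning of the condition, which is the kind of inaccurate statement the commit describes.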
* Removed some debug output | Kelly Rauchenberger | 2017-01-24 | 3 | -4/+0
|
* Whitespace changes | Kelly Rauchenberger | 2017-01-24 | 32 | -801/+801
|
* Added word::getGroup | Kelly Rauchenberger | 2017-01-24 | 2 | -3/+35
|
* Fixed behavior of normalizing grouped hierarchal joins | Kelly Rauchenberger | 2017-01-24 | 2 | -57/+82
|   Previously, we did not merge grouped hierarchal joins; however, because of the way statements are compiled, we do need to merge ORed positive hierarchal joins and ANDed negative hierarchal joins.
|   Also made some whitespace changes.
* Fixed notion::prepositionGroup field | Kelly Rauchenberger | 2017-01-23 | 1 | -13/+13
|
* Added ability to filter on existence of inflection | Kelly Rauchenberger | 2017-01-23 | 4 | -3/+17
|
* Added form::startsWithVowelSound convenience method | Kelly Rauchenberger | 2017-01-23 | 2 | -51/+77
|
* Rewrote tokens | Kelly Rauchenberger | 2017-01-23 | 4 | -650/+405
|
* Added verb frame parsing | Kelly Rauchenberger | 2017-01-23 | 9 | -260/+643
|
* Added filter compacting | Kelly Rauchenberger | 2017-01-23 | 3 | -42/+76
|   Before statement compilation, empty filters are removed from group filters, and childless group filters become empty filters.
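The compaction rule above is naturally recursive: compact the children first, drop the ones that turned out empty, and collapse the group itself if nothing is left. A minimal sketch, assuming a hypothetical `Filter` type with three kinds (this is not verbly's actual filter class):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Filter:
    kind: str  # "empty", "singleton", or "group"
    children: List["Filter"] = field(default_factory=list)
    label: str = ""  # stands in for a singleton's actual condition

def compact(f):
    """Remove empty filters from groups; childless groups become empty."""
    if f.kind != "group":
        return f
    kept = [c for c in map(compact, f.children) if c.kind != "empty"]
    if not kept:
        return Filter("empty")
    return Filter("group", kept)

nested = Filter("group", [
    Filter("empty"),
    Filter("singleton", label="rhyme = ?"),
    Filter("group", [Filter("empty"), Filter("group", [])]),
])
compacted = compact(nested)
print(compacted)
```

Here the inner group contains only an empty filter and a childless group, so it collapses to an empty filter and is then removed from the outer group, leaving just the singleton.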
* Fixed normalization of negative join filters | Kelly Rauchenberger | 2017-01-23 | 1 | -183/+192
|   Previously, negative join filters were folded in with positive joins by AND/ORing them together and negating the negative joins. Checking for the existence of something that doesn't match a condition is different from checking for the non-existence of something that does match a condition, so now normalization considers positive and negative join filters to be distinct classes of filters and does not fold them together.
|   Also made some whitespace changes.
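The distinction drawn here ("something exists that doesn't match" versus "nothing exists that does match") is easy to demonstrate with two SQL queries. The sketch below uses Python's `sqlite3` with made-up tables and data; it does not reproduce the statements verbly generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: 'run' has a form ending in "s" and one that doesn't.
conn.executescript("""
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE forms (word_id INTEGER, text TEXT);
    INSERT INTO words VALUES (1, 'run'), (2, 'sheep'), (3, 'scissors');
    INSERT INTO forms VALUES
        (1, 'run'), (1, 'runs'), (2, 'sheep'), (3, 'scissors');
""")

# Positive join with a negated condition:
# words that have SOME form not ending in "s".
has_non_matching = [row[0] for row in conn.execute("""
    SELECT lemma FROM words w
    WHERE EXISTS (SELECT 1 FROM forms f
                  WHERE f.word_id = w.word_id AND f.text NOT LIKE '%s')
    ORDER BY lemma
""")]

# Negative join with the original condition:
# words that have NO form ending in "s".
has_no_matching = [row[0] for row in conn.execute("""
    SELECT lemma FROM words w
    WHERE NOT EXISTS (SELECT 1 FROM forms f
                      WHERE f.word_id = w.word_id AND f.text LIKE '%s')
    ORDER BY lemma
""")]

print(has_non_matching)  # ['run', 'sheep']
print(has_no_matching)   # ['sheep']
```

The word 'run' separates the two cases: it has a form that fails the condition, but it also has one that satisfies it, so negating the condition inside a positive join is not equivalent to a negative join.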
* Fixed nullity/non-nullity filters on join fields | Kelly Rauchenberger | 2017-01-23 | 1 | -2/+12
|
* Whitespace changes | Kelly Rauchenberger | 2017-01-23 | 1 | -19/+19
|
* Fixed generator ignoring multiple inflection variants | Kelly Rauchenberger | 2017-01-22 | 1 | -257/+268
|   Previously, the generator would recognize at most one form per inflection per lemma; now, the generator adds all variants in AGID to the database.
* Removed underscores in two-word literal prepositions in verb frames | Kelly Rauchenberger | 2017-01-22 | 1 | -2/+11
|
* Fixed statement generation involving negative subqueries | Kelly Rauchenberger | 2017-01-21 | 2 | -57/+234
|   Previously, we generated negative subqueries by integrating them into the main statement normally, and then making the connecting join be a LEFT JOIN instead of an INNER JOIN, and by adding a condition that the join column be NULL. The problem with this is that if the top table of the subquery joins against any other table (which join throughs always do), then no rows will be returned. This was solved by putting the subquery into a CTE and then LEFT JOINing as before with the CTE.
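The failure mode and the CTE fix can be reproduced with a small SQLite database. The sketch below is an illustration under assumed table names and data (a through table `lemmas_forms` between `words` and `forms`); the SQL is hand-written for the example and is not the output of verbly's statement generator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: words reach forms through a join-through table.
conn.executescript("""
    CREATE TABLE words (word_id INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE forms (form_id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE lemmas_forms (word_id INTEGER, form_id INTEGER);
    INSERT INTO words VALUES (1, 'go'), (2, 'cat'), (3, 'ghost');
    INSERT INTO forms VALUES (1, 'go'), (2, 'went'), (3, 'cat');
    INSERT INTO lemmas_forms VALUES (1, 1), (1, 2), (2, 3);
""")

# Goal: words that have NO form with text 'went'.

# Old approach: LEFT JOIN the subquery's top table and require NULL, but the
# subquery's own INNER JOIN discards exactly the NULL rows we wanted to keep.
broken = conn.execute("""
    SELECT w.word_id
    FROM words w
    LEFT JOIN lemmas_forms lf ON lf.word_id = w.word_id
    INNER JOIN forms f ON f.form_id = lf.form_id AND f.text = 'went'
    WHERE lf.word_id IS NULL
""").fetchall()

# Fix: evaluate the subquery on its own in a CTE, then LEFT JOIN against it.
fixed = conn.execute("""
    WITH matching AS (
        SELECT lf.word_id
        FROM lemmas_forms lf
        INNER JOIN forms f ON f.form_id = lf.form_id
        WHERE f.text = 'went'
    )
    SELECT w.word_id
    FROM words w
    LEFT JOIN matching m ON m.word_id = w.word_id
    WHERE m.word_id IS NULL
    ORDER BY w.word_id
""").fetchall()

print(broken)  # []
print(fixed)   # [(2,), (3,)]
```

Because the CTE is fully evaluated before the LEFT JOIN, its internal INNER JOIN can no longer remove rows from the outer query.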
* Fixed statement generation involving nullity/non-nullity | Kelly Rauchenberger | 2017-01-21 | 1 | -1/+20
|
* Moved some generator classes into the main namespace | Kelly Rauchenberger | 2017-01-21 | 16 | -431/+479
|
* Fixed instances of prep "off of" in VerbNet data | Kelly Rauchenberger | 2017-01-21 | 1 | -0/+13
|
* Started structural rewrite | Kelly Rauchenberger | 2017-01-16 | 78 | -8696/+8971
|   The new object structure was designed to build on the existing WordNet structure, while also adding in all of the data that we get from other sources. More information about this can be found on the project wiki.
|   The generator has already been completely rewritten to generate a datafile that uses the new structure. In addition, a number of indexes are created, which does double the size of the datafile, but also allows for much faster lookups. Finally, the new generator is written modularly and is a lot more readable than the old one.
|   The verbly interface to the new object structure has mostly been completed, but has not been tested fully. There is a completely new search API which utilizes a lot of operator overloading; documentation on how to use it should go up at some point.
|   Token processing and verb frames are currently unimplemented. Source for these has been left in the repository for now.
* Updated to nlohmann/json 2.0.9 | Kelly Rauchenberger | 2016-12-28 | 1 | -291/+1698
|
* Removed nlohmann/json submodule | Kelly Rauchenberger | 2016-11-27 | 4 | -4/+10795
|   The submodule contained around 73MB of benchmarks and tests that are not necessary for inclusion in this project. Thus, the submodule has been removed, and the 2.0.7 release of nlohmann/json has been added to the repository.
* Added pronunciation syllable count and stress structure | Kelly Rauchenberger | 2016-05-30 | 11 | -12/+330
|   Also updated CMakeLists.txt so that projects that include verbly don't have to include sqlite3 themselves.
* Added debug print method for token type | Kelly Rauchenberger | 2016-05-16 | 2 | -0/+17
|
* Fixed token extra functionality in copying | Kelly Rauchenberger | 2016-05-16 | 1 | -0/+2
|
* Added rhymes_with predicate based on rhymes rather than words | Kelly Rauchenberger | 2016-05-16 | 8 | -0/+32
|
* Implemented some accidentally unimplemented adjective_query predicates | Kelly Rauchenberger | 2016-05-10 | 1 | -0/+21
|
* Merge branch 'master' of https://github.com/hatkirby/verbly | Kelly Rauchenberger | 2016-05-02 | 18 | -366/+980
|\
| * Fixed problem with words containing certain characters | Kelly Rauchenberger | 2016-04-18 | 1 | -1/+6
| |   The generator previously had a problem wherein it would ignore WordNet lemmas containing certain non-alpha characters (hyphens, slashes, numbers, apostrophes). In addition to these words not being included in the generated datafile, it had the side effect of causing relationships involving the ignored words (e.g. hypernymy, synonymy, etc.) to instead be related to the word with id 0, which did not exist. This rarely caused a failure with direct queries, but it caused hierarchal queries (most notably full hyponymy, which is where the error was noticed) to potentially permit far more lemmas than they should have, because a very large number of words could be transitively reached through the sentinel word id 0.
| |   The generator has been fixed to not ignore the words containing special characters, which removed the word id 0 from most relationships and therefore fixed hierarchal queries. The only remaining word id 0s are as a synonym of "free-flying" (synset 301380571) and as an anti-mannernym of "aerially" (synset 400202718). This is because the WordNet data is malformed in the definitions of two words: "aerial" (synset 301380267) and "marine" (synset 301380721). The generator ignored those two lines, causing the described error, although the latter word being ignored did not cause any other errors.
| |   The bug was discovered when the Twitter bot difference (https://github.com/hatkirby/difference) generated a tweet (https://twitter.com/differencebot/status/722084219925700613) as a result of returning the noun "tearaway" in a full hyponym query of "artifact".
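The mechanism behind this bug can be shown with a toy hyponymy graph: when every skipped lemma resolves to the same sentinel id 0, unrelated branches of the hierarchy become connected through it. The sketch below is purely illustrative; the relations, ids, and helper names are invented for the example and are not real WordNet data (only the words "artifact" and "tearaway" come from the commit message).

```python
def build_edges(relations, ids):
    """Map (hypernym, hyponym) lemma pairs to id pairs.

    Lemmas missing from `ids` fall back to 0, mimicking the old generator.
    """
    return [(ids.get(hyper, 0), ids.get(hypo, 0)) for hyper, hypo in relations]

def full_hyponyms(edges, root):
    """Collect every id transitively reachable from `root`."""
    children = {}
    for hyper, hypo in edges:
        children.setdefault(hyper, set()).add(hypo)
    seen, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

# Invented relations for illustration only.
relations = [
    ("artifact", "x-ray machine"),
    ("person", "ne'er-do-well"),
    ("ne'er-do-well", "tearaway"),
]

# Old behaviour: lemmas with hyphens or apostrophes were skipped entirely.
buggy_ids = {"artifact": 1, "person": 2, "tearaway": 3}
# Fixed behaviour: every lemma gets its own id.
fixed_ids = dict(buggy_ids, **{"x-ray machine": 4, "ne'er-do-well": 5})

buggy = full_hyponyms(build_edges(relations, buggy_ids), buggy_ids["artifact"])
fixed = full_hyponyms(build_edges(relations, fixed_ids), fixed_ids["artifact"])

print(buggy)  # contains 3 ("tearaway"), reached through sentinel id 0
print(fixed)  # contains only 4 ("x-ray machine")
```

In the buggy graph, both skipped lemmas collapse into id 0, so "tearaway" becomes reachable from "artifact" even though the two are unrelated in the source relations.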
| * Fixed perfect rhyming | Kelly Rauchenberger | 2016-04-17 | 15 | -75/+442
| |   Rhyme detection now ensures that any rhymes it finds are perfect rhymes and not identical rhymes. Rhyme detection is also now a lot faster because additional information is stored in the datafile.
| |   Also fixed a bug in the query interface (and the generator) that could cause incorrect queries to be executed.
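One common way to separate perfect rhymes from identical rhymes is to compare both the rhyming part of a pronunciation and the sound just before it. The sketch below assumes ARPABET-style phoneme lists, takes the rhyme to start at the last primary-stressed vowel, and calls the phoneme before it the prerhyme. These splitting rules and the function names are assumptions made for the example; the log does not show how verbly computes them.

```python
def split_rhyme(phonemes):
    """Return (prerhyme, rhyme) for a list of ARPABET-style phonemes.

    Assumption: the rhyme starts at the last vowel carrying primary
    stress (marked with a trailing "1").
    """
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):
            prerhyme = phonemes[i - 1] if i > 0 else ""
            return prerhyme, tuple(phonemes[i:])
    return "", tuple(phonemes)

def is_perfect_rhyme(a, b):
    """Same rhyme but a different preceding sound."""
    pre_a, rhyme_a = split_rhyme(a)
    pre_b, rhyme_b = split_rhyme(b)
    return rhyme_a == rhyme_b and pre_a != pre_b

# Illustrative transcriptions.
late = ["L", "EY1", "T"]
gate = ["G", "EY1", "T"]
relate = ["R", "IH0", "L", "EY1", "T"]

print(is_perfect_rhyme(late, gate))    # True: same rhyme, different onset
print(is_perfect_rhyme(late, relate))  # False: identical rhyme ("-late")
```

Storing the rhyme and prerhyme alongside each pronunciation would let such a check be expressed as a simple equality and inequality comparison, rather than recomputing the split for every candidate pair.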
| * Added support for ImageNet and fixed bug with query interface | Kelly Rauchenberger | 2016-04-15 | 13 | -309/+551
| |   Datafile change: nouns now know how many images are associated with them on ImageNet, and also have their WordNet synset ID saved so that you can query for images of that noun via the ImageNet API. So far, verbly only exposes the ImageNet API URL, and doesn't actually interact with it itself. This may be changed in the future.
| |   The query interface had a huge issue in which multiple instances of the same condition would overwrite each other. This has been fixed.
* | Added "requires plural form" noun query predicate | Kelly Rauchenberger | 2016-05-02 | 2 | -0/+16
|/
* Added sqlite3 version restriction (for WITH clauses) | Kelly Rauchenberger | 2016-03-29 | 1 | -2/+2
|
* Added prefix/suffix search, and word complexity search for nouns, adjectives, and adverbs | Kelly Rauchenberger | 2016-03-27 | 10 | -12/+452
|   Word complexity refers to the number of words in a noun, adjective, or adverb.