All database access goes through hatkirby::database now.
verbly::token, verbly::statement::condition, and verbly::part have been converted to use mpark::variant now. verbly::binding has been deleted and replaced with an mpark::variant typedef in statement.h. This means that the only remaining tagged union class is verbly::generator::part.
refs #5
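The sketch below shows roughly what replacing a hand-written tagged union with a variant looks like. It is a minimal sketch that uses C++17 `std::variant`, whose interface mpark::variant mirrors for older compilers. The `literal_token` and `word_token` types, the `token` alias, and `render` are hypothetical stand-ins, not verbly's actual definitions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical alternatives; verbly's real token types carry much more state.
struct literal_token {
  std::string text;
};

struct word_token {
  std::string base_form;
};

// A variant replaces a class holding an enum tag plus a union.
// mpark::variant offers the same interface for pre-C++17 compilers.
using token = std::variant<literal_token, word_token>;

// std::visit dispatches on whichever alternative is currently active.
std::string render(const token& t) {
  return std::visit(
      [](const auto& alt) -> std::string {
        using T = std::decay_t<decltype(alt)>;
        if constexpr (std::is_same_v<T, literal_token>) {
          return alt.text;
        } else {
          return alt.base_form;
        }
      },
      t);
}
```

The compiler now tracks which alternative is active. It also rejects a visitor that fails to handle one, so there is no manual tag bookkeeping to get wrong.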
* * *
The generator also now sorts and uniq's the WordNet files for antonymy, classification, and pertainymy/mannernymy, because those files contained duplicate rows, and the join tables without ROWIDs now enforce a uniqueness constraint.
This constitutes a minor database update -- the new database is compatible with d1.0, but is ~12MB smaller.
refs #6
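SQLite requires a `WITHOUT ROWID` table to declare a PRIMARY KEY. Inserting a duplicate row into such a join table therefore fails, so the rows must be deduplicated first. The step is the in-memory equivalent of piping a file through `sort | uniq`. This is a minimal sketch that assumes each row has already been read into a string; `sortAndUniq` is a hypothetical name, not the generator's real function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sort the rows, then drop adjacent duplicates (the `sort | uniq` idiom).
// After sorting, identical rows are adjacent, so std::unique can remove them.
std::vector<std::string> sortAndUniq(std::vector<std::string> rows) {
  std::sort(rows.begin(), rows.end());
  rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
  return rows;
}
```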
* * *
refs #1
* * *
Also added an ANALYZE statement to the end of the datafile generation
process. This gathers statistics that sometimes allow SQLite to choose a
better query plan, and in many cases it can significantly speed up
queries. This constitutes a minor database update, but because this is
the first version that uses the database versioning system, older
versions are essentially incompatible.
refs #2
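The commit messages imply two rules. Minor database updates (such as added ANALYZE statistics or indexes) stay compatible, while datafiles from before the versioning system are not. A check along those lines could look like the following. This is a minimal sketch: the `db_version` struct, `isCompatible`, and the exact rule (only the major number must match) are assumptions for illustration, not verbly's real code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical schema version; how verbly actually stores it is not shown here.
struct db_version {
  int major;
  int minor;
};

// Assumed rule: minor updates keep the schema readable, so only the major
// number has to match. A datafile created before the versioning system has
// no version at all (std::nullopt) and is rejected.
bool isCompatible(const std::optional<db_version>& fileVersion, int requiredMajor) {
  return fileVersion.has_value() && fileVersion->major == requiredMajor;
}
```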
* * *
This token wraps the inner token in two provided delimiters.
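The rendering side of such a wrapper can be sketched as below. This is a minimal sketch: `surround_token` and `compile` are hypothetical names, and the inner token is represented by its already-rendered text rather than by a real verbly token.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper token: renders the inner text between two delimiters.
class surround_token {
public:
  surround_token(std::string open, std::string close, std::string inner)
      : open_(std::move(open)), close_(std::move(close)), inner_(std::move(inner)) {}

  // Opening delimiter, then the inner text, then the closing delimiter.
  std::string compile() const { return open_ + inner_ + close_; }

private:
  std::string open_;
  std::string close_;
  std::string inner_;  // stands in for an already-compiled inner token
};
```

Taking separate opening and closing delimiters covers both symmetric cases (quotes) and asymmetric ones (parentheses, brackets).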
* * *
Word tokens and literal tokens that contained more than one word would
only capitalize the first word; this has been fixed.
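The fix amounts to capitalizing at every word boundary instead of only at the start of the string. Here is a minimal sketch, assuming a "capitalize each word" mode and words separated by spaces; `capitalizeEachWord` is a hypothetical helper, not verbly's actual implementation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Uppercase the first letter of every space-separated word, rather than
// only the first character of the whole string.
std::string capitalizeEachWord(std::string text) {
  bool atWordStart = true;
  for (char& c : text) {
    if (c == ' ') {
      atWordStart = true;  // the next non-space character begins a word
    } else if (atWordStart) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
      atWordStart = false;
    }
  }
  return text;
}
```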
* * *
This commit contains a database update.
* * *
Note that this is not a great implementation; the filter generated is
mergeable with unrelated filters and may produce misleading results.
* * *
The bool conversion operator was unfortunately Very Confusing, so I've
just renamed the methods to isValid.
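A non-explicit `operator bool()` has a well-known hazard in C++: the object silently converts in arithmetic and comparisons. For example, `w + 1` compiles because the bool promotes to int. A named method avoids that ambiguity. The sketch below uses a hypothetical `word_handle` class to show the pattern; it is not a verbly type.

```cpp
#include <cassert>

// Hypothetical handle type. Validity is exposed through a named method
// instead of a conversion operator, so it cannot be triggered implicitly.
class word_handle {
public:
  word_handle() = default;
  explicit word_handle(int id) : id_(id), valid_(true) {}

  bool isValid() const { return valid_; }
  int getId() const { return id_; }

private:
  int id_ = 0;
  bool valid_ = false;  // default-constructed handles are invalid
};
```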
* * *
This involved adding a new type of filter: one that compares (currently
only for equality and inequality) a field with another field located in
an enclosing join context.
In the process, it was discovered that simplifying the lemma::forms join
field earlier actually made some queries return inaccurate results
because the inflection of the form was being ignored and anything in the
lemma would be used because of the inner join. Because the existing
condition join did not allow for the condition field to be on the from
side of the join, two things were done: a condition version of
joinThrough was made, and lemma was finally eliminated as a top-level
object, replaced instead with a condition join between word and form
through lemmas_forms.
Queries are also now grouped by the first select field (assumed to be
the primary ID) of the top table, in order to eliminate duplicates
created by inner joins, so that there is a uniform distribution between
results for random queries.
Created a database index on pronunciations(rhyme) which decreases query
time for rhyming filters. The new database version is
backwards-compatible because no data or structure changed.
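A field-to-field filter differs from an ordinary one: it compares two columns instead of a column and a bound value. In the rendered SQL, the right-hand side refers to the alias of the enclosing join. This is a minimal sketch of that idea; the struct, enum, function names, and table aliases are hypothetical, and verbly's real statement compiler is considerably more involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical comparison kinds; the commit says only equality and
// inequality are currently supported.
enum class comparison { equals, does_not_equal };

// Hypothetical filter comparing a column of the current table with a
// column of an enclosing join context (referenced by its table alias).
struct field_comparison_filter {
  std::string table;
  std::string column;
  comparison op;
  std::string contextTable;
  std::string contextColumn;
};

// Render the filter as a SQL condition, e.g. "inner_t.a = outer_t.b".
std::string toSql(const field_comparison_filter& f) {
  const char* op = (f.op == comparison::equals) ? " = " : " != ";
  return f.table + "." + f.column + op + f.contextTable + "." + f.contextColumn;
}
```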
* * *
Selrestrs are now, instead of logically being a tree of
positive/negative restrictions that are ANDed/ORed together, a flat set
of positive restrictions that are ORed together. They are stored as
strings in a table called selrestrs, just like synrestrs,
which makes them a lot more queryable now as well. This change required
some changes to the VerbNet data, because we needed to consolidate any
ANDed clauses into single selrestrs, as well as convert any negative
selrestrs into positive ones. The changes made are detailed on the wiki.
Preposition choices are now encoded as comma-separated lists instead of
using JSON. This change, along with the selrestrs one, allows us to
remove verbly's dependency on nlohmann::json.
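Decoding a comma-separated preposition list needs only the standard library, which is why the JSON dependency could go. Here is a minimal sketch, assuming no preposition contains a comma and items carry no surrounding whitespace; `splitPrepositions` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split an encoded list such as "in,on,at" into its items.
// Assumes items contain no commas and no surrounding whitespace.
std::vector<std::string> splitPrepositions(const std::string& encoded) {
  std::vector<std::string> result;
  std::istringstream stream(encoded);
  std::string item;
  while (std::getline(stream, item, ',')) {
    result.push_back(item);
  }
  return result;
}
```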
* * *
This was not a problem with clang but it caused compilation errors with gcc.
* * *
Using the ternary operator appeared to cause a "reference to local address" bug in field::getConditionField. Replacing the ternary operator with an if statement fixes the problem, which did not occur with clang.
* * *
Previously, a simple negative join directly to an object rather than
through another table would error because the statement generator would
attempt to instantiate a CTE on the field's through table, which is
undefined. Now, the proper table is used regardless of whether a through
table is defined.
* * *
Groups are much less significant now, and they no longer have a database
table, nor are they considered a top level object anymore. Instead of
containing their own role data, that data is folded into the frames so
that it's easier to query; as a result, each group has its own copy of
the frames that it contains. Additionally, parts are considered top
level objects now, and you can query for frames based on attributes of
their indexed parts. Synrestrs are also contained in their own table
now, so that parts can be filtered against their synrestrs; they are
not, however, considered top-level objects.
Created a new type of field, the "join where" or "condition join" field,
which is a normal join field that has a built in condition on a
specified field. This is used to allow creating multiple distinct join
fields from one object to another. This is required for the lemma::form
and frame::part joins, because filters for forms of separate inflections
should not be coalesced; similarly, filters on differently indexed frame
parts should not be coalesced.
Queries can now be ordered, ascending or descending, by a field, in
addition to randomly as before. This is necessary for accessing the
parts of a verb frame in the correct order, but may be useful to an end
user as well.
Fixed a bug in statement generation: condition groups were not being
surrounded by parentheses, so mixing OR groups and AND groups generated
inaccurate statements. Additionally, parentheses are not placed around
the top-level condition, and nested condition groups with the same logic
type are coalesced, to make query strings as easy to read as possible.
Also simplified the form::lemma field; it no longer conditions on the
inflection of the form like the lemma::form field does.
Also added a debug flag to statement::getQueryString that makes it
return a query string with all of the bindings filled in, for debug use
only.
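The parenthesization rules for condition groups can be illustrated with a small recursive renderer. It follows three rules: no parentheses at the top level, parentheses around nested groups, and flattening of a child group that uses the same logic type as its parent. This is a minimal sketch over a hypothetical `condition` tree and assumes non-empty groups. It is not verbly's actual `statement::condition`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical condition tree: a leaf holds SQL text; a group joins its
// children with AND or OR.
struct condition {
  enum class kind { leaf, and_group, or_group } type = kind::leaf;
  std::string text;                 // used when type == leaf
  std::vector<condition> children;  // used for groups
};

// Nested groups are parenthesized, the top level is not, and a child group
// with the same logic type as its parent is coalesced into the parent.
std::string renderCondition(const condition& c, bool topLevel = true) {
  if (c.type == condition::kind::leaf) {
    return c.text;
  }
  const std::string sep = (c.type == condition::kind::and_group) ? " AND " : " OR ";
  std::string out;
  for (const condition& child : c.children) {
    if (!out.empty()) out += sep;
    // Same logic type: render without parentheses so it merges into this group.
    out += renderCondition(child, child.type == c.type);
  }
  return topLevel ? out : "(" + out + ")";
}
```

Without the parentheses, `a AND (b OR c)` would be emitted as `a AND b OR c`. SQL parses that as `(a AND b) OR c`, which is the inaccuracy described above.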
* * *
Previously, we did not merge grouped hierarchical joins; however,
because of the way statements are compiled, we do need to merge ORed
positive hierarchical joins and ANDed negative hierarchical joins.
Also made some whitespace changes.
* * *
Before statement compilation, empty filters are removed from group
filters, and childless group filters become empty filters.
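The normalization step can be sketched as a recursive pass over a filter tree. This is a minimal sketch over a hypothetical `filter` type. It omits the AND/OR logic type and the many filter kinds verbly actually has.

```cpp
#include <cassert>
#include <vector>

// Hypothetical filter tree: an empty filter, a leaf, or a group of children.
struct filter {
  enum class kind { empty, leaf, group } type = kind::empty;
  int leafId = 0;                // identifies a leaf condition in this sketch
  std::vector<filter> children;  // used when type == group
};

// Recursively drop empty filters from groups; a group left with no
// children becomes an empty filter itself.
filter normalize(const filter& f) {
  if (f.type != filter::kind::group) {
    return f;
  }
  filter result;
  result.type = filter::kind::group;
  for (const filter& child : f.children) {
    filter normalized = normalize(child);
    if (normalized.type != filter::kind::empty) {
      result.children.push_back(normalized);
    }
  }
  if (result.children.empty()) {
    return filter{};  // childless group collapses to an empty filter
  }
  return result;
}
```

Normalizing children first means that a group containing only empty subgroups also collapses. The emptiness then propagates upward in a single pass.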
* * *
Previously, negative join filters were folded in with positive joins by
AND/ORing them together and negating the negative joins. Checking for
the existence of something that doesn't match a condition is different
from checking for the non-existence of something that does match a
condition, so now normalization considers positive and negative join
filters to be distinct classes of filters and does not fold them
together.
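The difference between the two questions shows up on concrete data. Take a word with the forms "run", "ran", and "running", and the condition "form is 'ran'". Some form does not match, yet it is not true that no form matches. The function names below are hypothetical and only illustrate the two semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using forms = std::vector<std::string>;

// What folding a negated join into a positive one effectively asks:
// is there some related row that does NOT match?
bool existsNonMatching(const forms& related, const std::string& value) {
  return std::any_of(related.begin(), related.end(),
                     [&](const std::string& f) { return f != value; });
}

// What a true negative join asks: is there NO related row that matches?
bool noneMatching(const forms& related, const std::string& value) {
  return std::none_of(related.begin(), related.end(),
                      [&](const std::string& f) { return f == value; });
}
```

The two also disagree when there are no related rows. An empty set has no non-matching row, but it does vacuously satisfy "none match".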
Also made some whitespace changes.
* * *
Previously, we generated negative subqueries by integrating them into
the main statement normally, and then making the connecting join be a
LEFT JOIN instead of an INNER JOIN, and by adding a condition that the
join column be NULL. The problem with this is that if the top table of
the subquery joins against any other table (which join throughs always
do), then no rows will be returned. This was solved by putting the
subquery into a CTE and then LEFT JOINing as before with the CTE.
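The shape of the resulting SQL is shown below. The subquery is isolated in a CTE, so its own joins cannot eliminate rows of the main query. The main table then LEFT JOINs the CTE and keeps only rows with no partner. This is a minimal sketch: `negativeJoinQuery` and the CTE name `excluded` are hypothetical, and verbly's generated statements differ in detail.

```cpp
#include <cassert>
#include <string>

// Hypothetical builder for the negative-join pattern: put the subquery in a
// CTE, LEFT JOIN it, and keep rows whose join column found no match (NULL).
std::string negativeJoinQuery(const std::string& topTable,
                              const std::string& joinColumn,
                              const std::string& subquery) {
  return "WITH excluded AS (" + subquery + ") "
         "SELECT " + topTable + ".* FROM " + topTable +
         " LEFT JOIN excluded ON excluded." + joinColumn + " = " +
         topTable + "." + joinColumn +
         " WHERE excluded." + joinColumn + " IS NULL";
}
```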
* * *
The new object structure was designed to build on the existing WordNet
structure, while also adding in all of the data that we get from other sources.
More information about this can be found on the project wiki.
The generator has already been completely rewritten to generate a
datafile that uses the new structure. In addition, a number of indexes
are created, which does double the size of the datafile, but also allows
for much faster lookups. Finally, the new generator is written modularly
and is a lot more readable than the old one.
The verbly interface to the new object structure has mostly been
completed, but has not been tested fully. There is a completely new
search API which utilizes a lot of operator overloading; documentation
on how to use it should go up at some point.
Token processing and verb frames are currently unimplemented. Source for
these has been left in the repository for now.
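The idea behind an operator-overloading search API is that comparison and logical operators build filter objects instead of evaluating to bool. The classes and operators below are hypothetical and much simpler than verbly's real ones. For brevity, filters here are rendered straight to SQL text.

```cpp
#include <cassert>
#include <string>

// Hypothetical types; filters are kept as rendered SQL text for brevity.
struct filter {
  std::string sql;
};

struct field {
  std::string column;
};

// Comparing a field with a value yields a filter, not a bool.
filter operator==(const field& f, int value) {
  return {f.column + " = " + std::to_string(value)};
}

// Combining filters yields a parenthesized group.
filter operator&&(const filter& a, const filter& b) {
  return {"(" + a.sql + " AND " + b.sql + ")"};
}

filter operator||(const filter& a, const filter& b) {
  return {"(" + a.sql + " OR " + b.sql + ")"};
}
```

With these overloads, `(id == 5) || (tag == 0)` produces the filter `(word_id = 5 OR tag_count = 0)`, given `field id{"word_id"}` and `field tag{"tag_count"}`. Overloaded `&&` and `||` do not short-circuit. That is harmless here, since both operands only build data.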
* * *
Also updated CMakeLists.txt such that including projects don't have to include sqlite3.