From 13ace4d50e9a35090be4914775e30e76ffed393f Mon Sep 17 00:00:00 2001
From: Star Rauchenberger
Date: Tue, 3 Oct 2023 22:17:22 +0000
Subject: Imported docs from Github

---
 docs/new_object_structure.md | 72 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 docs/new_object_structure.md

(limited to 'docs/new_object_structure.md')

diff --git a/docs/new_object_structure.md b/docs/new_object_structure.md
new file mode 100644
index 0000000..9e25615
--- /dev/null
+++ b/docs/new_object_structure.md
@@ -0,0 +1,72 @@
+# New object structure
+
+The rewrite of verbly uses a completely redesigned object structure that was designed to build off of the already-existing WordNet structure and add to it the data we are getting from other sources.
+
+## notion
+Something that can be expressed with words. fields: part of speech, wnid (WordNet ID, optional). nouns also have images field (number of images ImageNet has for this notion). has many words. related to each other through hypernymy, meronymy, synonymy, etc. parts of speech are:
+- noun {0}
+- adjective {1}
+- adverb {2}
+- verb {3}
+- preposition {4}
+
+relations are:
+- hypernymy (noun/noun and verb/verb)
+- instantiation (noun/noun)
+- meronymy (noun/noun)
+- variation (noun/adjective)
+- similarity (adjective/adjective) [symmetric]
+- entailment (verb/verb)
+- causality (verb/verb)
+
+notion also has a special relation "is a" between a preposition and a string group name
+
+## word
+An expression of a concept. belongs to a notion. belongs to a lemma. tag count (optional). adjectives also have position field. verbs optionally belong to groups. has several relations to itself:
+- antonymy (noun/noun, adjective/adjective, adverb/adverb, verb/verb) [symmetric]
+- specification (adjective/adjective, verb/verb)
+- pertainymy (noun/adjective)
+- mannernymy (adjective/adverb)
+- usage (noun/noun, noun/adjective, noun/adverb, noun/verb)
+- topicality (noun/noun, noun/adjective, noun/adverb, noun/verb)
+- regionality (noun/noun, noun/adjective, noun/adverb, noun/verb)
+
+adjective positions are:
+- predicate {0}
+- attributive {1}
+- postnominal {2}
+
+## lemma
+A lexical set that can be used to represent words. has many inflections (including the base inflection). has many words (that it represents). relations with itself:
+- derivation [not implemented yet]
+
+in implementation, this object has no fields, and thus it does not need a table. uniquely identifiable by base form. constructible from base form.
+
+## lemma/form
+The inflection relationship relates an uninflected lemma to its inflected forms. there can potentially be multiple ways to inflect a lemma, so the tuple (lemma_id, category) is not necessarily unique. field: type of inflection. ex: "care" is a singular (base) inflection of a noun, and a base inflection of a verb. "cares" is both a plural and an s form inflection of "care". the types of inflection are:
+- base {0}
+- plural (nouns) {1}
+- comparative (adjectives and adverbs) {2}
+- superlative (adjectives and adverbs) {3}
+- past tense (verbs) {4}
+- past participle (verbs) {5}
+- ing form (verbs) {6}
+- s form (verbs) {7}
+
+## form
+An inflection of a lemma. fields: text form, complexity (number of spaces plus one), proper (true if there is at least one capital letter, false otherwise). uniquely identifiable by text form. constructible from text form. has many and belongs to many pronunciations.
+
+## form/pronunciation
+One spelling of a word can have multiple pronunciations (whether by homography or speaker variation), but multiple words can also have the same pronunciation (homophony). the current data we have doesn't tell us which pronunciations go with which words, so we just associate all pronunciations of a form with the form.
+
+## pronunciation
+Fields: phonemes, rhyme phonemes, prerhyme, syllables, stress structure. has many and belongs to many forms.
+
+## frame
+A verb frame. belongs to a group. has many parts.
+
+## group (word/frame)
+A collection of verb frames. has many frames. has many words. this is not really an object per-se, more rather the name given to the cross join between sets of words and sets of frames. in implementation, this join has no fields, and thus it does not need a table.
+
+## part
+An ordered element of a verb frame. belongs to a frame. fields: index (position in the frame), and type. the tuple (frame_id, index) is unique. there are additional fields depending on the type of the frame. noun phrases have role and selrestrs. prepositions have prepositions and preposition_literality. literals have literal_value. in addition, noun phrases have synrestrs, which, in order to be queryable, are located in a separate table called "synrestrs".
--
cgit 1.4.1
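
Reviewer's illustration (not part of the patch): the derived `form` fields and the `lemma/form` inflection relation described in the added document can be sketched roughly as below. This is a minimal Python sketch, not verbly's actual implementation. The names `PartOfSpeech`, `InflectionType`, `Form` and `care_inflections` are hypothetical, and it assumes the numbers in braces in the document are the integer codes stored for each value.

```python
from dataclasses import dataclass
from enum import IntEnum


class PartOfSpeech(IntEnum):
    # Assumption: the {n} markers in the doc are the stored integer codes.
    NOUN = 0
    ADJECTIVE = 1
    ADVERB = 2
    VERB = 3
    PREPOSITION = 4


class InflectionType(IntEnum):
    # Same assumption as above for the lemma/form inflection types.
    BASE = 0
    PLURAL = 1
    COMPARATIVE = 2
    SUPERLATIVE = 3
    PAST_TENSE = 4
    PAST_PARTICIPLE = 5
    ING_FORM = 6
    S_FORM = 7


@dataclass(frozen=True)
class Form:
    """Hypothetical stand-in for a form; identified by its text alone."""

    text: str

    @property
    def complexity(self) -> int:
        # "number of spaces plus one"
        return self.text.count(" ") + 1

    @property
    def proper(self) -> bool:
        # "true if there is at least one capital letter, false otherwise"
        return any(ch.isupper() for ch in self.text)


# The "care" example from the doc as (lemma base form, inflection type, form)
# rows. "cares" appears twice: once as a plural, once as an s form.
care_inflections = [
    ("care", InflectionType.BASE, Form("care")),
    ("care", InflectionType.PLURAL, Form("cares")),
    ("care", InflectionType.S_FORM, Form("cares")),
]
```

Because `Form` is compared by text only, the two "cares" rows point at the same form, which matches the document's statement that a form is uniquely identifiable by its text while a lemma can reach it through more than one inflection type.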