# New object structure

The rewrite of verbly uses a redesigned object structure that builds on the existing WordNet structure and extends it with the data we are getting from other sources.

## notion
Something that can be expressed with words. fields: part of speech, wnid (WordNet ID, optional). nouns also have an images field (the number of images ImageNet has for this notion). has many words. notions are related to each other through hypernymy, meronymy, synonymy, etc. parts of speech are:
- noun {0}
- adjective {1}
- adverb {2}
- verb {3}
- preposition {4}
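
As a rough illustration of the fields above, here is a minimal Python sketch of a notion record. The class names, the field types, and the check that rejects an images count on non-nouns are assumptions made for the example, not verbly's actual implementation. Only the numeric part of speech codes come from the list above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class PartOfSpeech(IntEnum):
    # Numeric codes as listed in this document.
    NOUN = 0
    ADJECTIVE = 1
    ADVERB = 2
    VERB = 3
    PREPOSITION = 4


@dataclass
class Notion:
    part_of_speech: PartOfSpeech
    wnid: Optional[int] = None    # WordNet ID, optional (integer type is an assumption)
    images: Optional[int] = None  # ImageNet image count, nouns only

    def __post_init__(self):
        # Assumed validation: only noun notions may carry an images count.
        if self.images is not None and self.part_of_speech != PartOfSpeech.NOUN:
            raise ValueError("only noun notions have an images field")
```

Constructing `Notion(PartOfSpeech.NOUN, images=5)` succeeds, while passing `images` for any other part of speech raises `ValueError`.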

relations are:
- hypernymy (noun/noun and verb/verb)
- instantiation (noun/noun)
- meronymy (noun/noun)
- variation (noun/adjective)
- similarity (adjective/adjective) [symmetric]
- entailment (verb/verb)
- causality (verb/verb)

notion also has a special relation "is a" between a preposition and a string group name
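
The part of speech constraints on notion relations can be written down as a lookup table. The sketch below is hypothetical: the names `NOTION_RELATIONS`, `relation_allowed` and `edges_for` are made up for the example, the order within each pair is assumed to be the order written in the list above, and expanding a symmetric relation into edges in both directions is one possible way to handle symmetry, not necessarily what verbly does.

```python
# Allowed (first, second) part-of-speech pairs per notion relation,
# taken from the list in this section.
NOTION_RELATIONS = {
    "hypernymy": {("noun", "noun"), ("verb", "verb")},
    "instantiation": {("noun", "noun")},
    "meronymy": {("noun", "noun")},
    "variation": {("noun", "adjective")},
    "similarity": {("adjective", "adjective")},
    "entailment": {("verb", "verb")},
    "causality": {("verb", "verb")},
}

# Relations marked [symmetric] in the list.
SYMMETRIC = {"similarity"}


def relation_allowed(relation, first_pos, second_pos):
    """Return True if the relation may link notions with these parts of speech."""
    return (first_pos, second_pos) in NOTION_RELATIONS[relation]


def edges_for(relation, a, b):
    """Return the directed edges implied by relating notion a to notion b.

    Assumption: a symmetric relation is expanded into both directions.
    """
    if relation in SYMMETRIC:
        return [(a, b), (b, a)]
    return [(a, b)]
```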

## word
An expression of a concept. belongs to a notion. belongs to a lemma. field: tag count (optional). adjectives also have a position field. verbs optionally belong to groups. words are related to other words through:
- antonymy (noun/noun, adjective/adjective, adverb/adverb, verb/verb) [symmetric]
- specification (adjective/adjective, verb/verb)
- pertainymy (noun/adjective)
- mannernymy (adjective/adverb)
- usage (noun/noun, noun/adjective, noun/adverb, noun/verb)
- topicality (noun/noun, noun/adjective, noun/adverb, noun/verb)
- regionality (noun/noun, noun/adjective, noun/adverb, noun/verb)

adjective positions are:
- predicate {0}
- attributive {1}
- postnominal {2}

## lemma
A lexical set that can be used to represent words. has many inflections (including the base inflection). has many words (that it represents). relations with itself:
- derivation [not implemented yet]

in implementation, this object has no fields, and thus it does not need a table. uniquely identifiable by base form. constructible from base form.
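
One way to picture an object with no fields of its own is as a value type whose only identity is its base form. This is a hypothetical sketch, and the class name `Lemma` and its attribute are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    # The base form is the lemma's whole identity, so two lemmas built
    # from the same base form compare equal and hash identically.
    base_form: str
```

Because equality and hashing derive from `base_form`, `Lemma("care") == Lemma("care")` holds and no separate identifier needs to be stored.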

## lemma/form
The inflection relationship relates an uninflected lemma to its inflected forms. a lemma can potentially be inflected in more than one way for the same type of inflection, so the tuple (lemma_id, category) is not necessarily unique. field: category (the type of inflection). e.g. "care" is a singular (base) inflection of a noun, and a base inflection of a verb. "cares" is both a plural and an s form inflection of "care". the types of inflection are:
- base {0}
- plural (nouns) {1}
- comparative (adjectives and adverbs) {2}
- superlative (adjectives and adverbs) {3}
- past tense (verbs) {4}
- past participle (verbs) {5}
- ing form (verbs) {6}
- s form (verbs) {7}
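
The "care" example can be written out as rows of the inflection relationship. This is a sketch only: the tuple layout (lemma, category, form) and the helper names are assumptions, and text is used in place of ids for readability. The numeric codes come from the list above.

```python
from enum import IntEnum


class InflectionType(IntEnum):
    # Numeric codes as listed in this document.
    BASE = 0
    PLURAL = 1
    COMPARATIVE = 2
    SUPERLATIVE = 3
    PAST_TENSE = 4
    PAST_PARTICIPLE = 5
    ING_FORM = 6
    S_FORM = 7


# (lemma, category, form) rows for the "care" example in this section.
INFLECTIONS = [
    ("care", InflectionType.BASE, "care"),
    ("care", InflectionType.PLURAL, "cares"),
    ("care", InflectionType.S_FORM, "cares"),
]


def forms_of(lemma, category):
    """Return every form for (lemma, category); a list, since the pair is not unique."""
    return [f for (l, c, f) in INFLECTIONS if l == lemma and c == category]


def categories_of(lemma, form):
    """Return every inflection type under which `form` inflects `lemma`."""
    return [c for (l, c, f) in INFLECTIONS if l == lemma and f == form]
```

Here `categories_of("care", "cares")` returns both `PLURAL` and `S_FORM`, matching the example in the text.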

## form
An inflection of a lemma. fields: text form, complexity (number of spaces plus one), proper (true if there is at least one capital letter, false otherwise). uniquely identifiable by text form. constructible from text form. has many and belongs to many pronunciations.
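
Both derived fields follow directly from the text form, so a form can be built from its text alone. The sketch below assumes that "capital letter" means any character for which Python's `str.isupper()` is true. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    text: str
    complexity: int  # number of spaces plus one
    proper: bool     # True if the text contains at least one capital letter

    @classmethod
    def from_text(cls, text: str) -> "Form":
        # Derive both fields from the text form, as described above.
        return cls(
            text=text,
            complexity=text.count(" ") + 1,
            proper=any(ch.isupper() for ch in text),
        )
```

For example, `Form.from_text("New York")` has complexity 2 and is proper, while `Form.from_text("care")` has complexity 1 and is not.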

## form/pronunciation
One spelling of a word can have multiple pronunciations (whether by homography or speaker variation), but multiple words can also have the same pronunciation (homophony). the current data we have doesn't tell us which pronunciations go with which words, so we just associate all pronunciations of a form with the form.
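
The many-to-many association described here can be sketched as a pair of indexes, one per direction. The class name and the use of plain strings for forms and phoneme sequences are assumptions for illustration only.

```python
from collections import defaultdict


class FormPronunciations:
    """Many-to-many association between form text and pronunciations."""

    def __init__(self):
        self._by_form = defaultdict(set)
        self._by_pronunciation = defaultdict(set)

    def link(self, form, phonemes):
        # Record the association in both directions.
        self._by_form[form].add(phonemes)
        self._by_pronunciation[phonemes].add(form)

    def pronunciations(self, form):
        """All pronunciations associated with a form (homography or variation)."""
        return set(self._by_form.get(form, ()))

    def forms(self, phonemes):
        """All forms sharing a pronunciation (homophony)."""
        return set(self._by_pronunciation.get(phonemes, ()))
```

Linking "read" to two phoneme strings and "red" to one of them would make `pronunciations("read")` return both, and `forms(...)` for the shared string return both spellings. The index deliberately does not say which pronunciation goes with which word.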

## pronunciation
Fields: phonemes, rhyme phonemes, prerhyme, syllables, stress structure. has many and belongs to many forms.

## frame
A verb frame. belongs to a group. has many parts.

## group (word/frame)
A collection of verb frames. has many frames. has many words. this is not really an object per se, but rather the name given to the cross join between sets of words and sets of frames. in implementation, this join has no fields, and thus it does not need a table.
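
Since the join has no fields, it can be computed from a group identifier shared by words and frames rather than stored. The sketch below assumes that both words and frames carry a `group_id` attribute. That attribute name and the record types are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    id: int
    group_id: Optional[int]  # verbs optionally belong to a group


@dataclass(frozen=True)
class Frame:
    id: int
    group_id: int  # a frame belongs to a group


def word_frame_pairs(words: List[Word], frames: List[Frame]) -> List[Tuple[int, int]]:
    """Cross join words and frames within each group, without a join table."""
    return [
        (w.id, f.id)
        for w in words
        if w.group_id is not None
        for f in frames
        if f.group_id == w.group_id
    ]
```

Every word in a group is paired with every frame in the same group, and words without a group produce no pairs.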

## part
An ordered element of a verb frame. belongs to a frame. fields: index (position in the frame), and type. the tuple (frame_id, index) is unique. there are additional fields depending on the type of the part. noun phrases have role and selrestrs. prepositions have prepositions and preposition_literality. literals have literal_value. in addition, noun phrases have synrestrs, which, in order to be queryable, are located in a separate table called "synrestrs".
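
The type-dependent fields can be pictured as a tagged union. This is a sketch under several assumptions: only the three part types named above are modelled (others may exist), the Python types chosen for each field are guesses, and `ordered_parts` is a hypothetical helper. The field names themselves come from this section.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class NounPhrase:
    role: str
    selrestrs: Tuple[str, ...] = ()
    synrestrs: Tuple[str, ...] = ()  # kept in a separate "synrestrs" table in the real schema


@dataclass(frozen=True)
class Preposition:
    prepositions: Tuple[str, ...]
    preposition_literality: bool  # boolean type is an assumption


@dataclass(frozen=True)
class Literal:
    literal_value: str


@dataclass(frozen=True)
class Part:
    frame_id: int
    index: int  # position in the frame
    data: Union[NounPhrase, Preposition, Literal]


def ordered_parts(parts: List[Part], frame_id: int) -> List[Part]:
    """Return one frame's parts in order, enforcing unique (frame_id, index)."""
    selected = [p for p in parts if p.frame_id == frame_id]
    indices = [p.index for p in selected]
    if len(indices) != len(set(indices)):
        raise ValueError("duplicate (frame_id, index)")
    return sorted(selected, key=lambda p: p.index)
```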