author Kelly Rauchenberger <fefferburbia@gmail.com> 2017-01-16 18:02:50 -0500
committer Kelly Rauchenberger <fefferburbia@gmail.com> 2017-01-16 18:02:50 -0500
commit 6746da6edd7d9d50efe374eabbb79a3cac882d81
tree ff20917e08b08d36b9541c1371106596e7bec442 /generator/generator.h
parent 4af7e55733098ca42f75a4ffaca1b0f6bab4dd36
Started structural rewrite
The new object structure was designed to build on the existing WordNet
structure, while also adding in all of the data that we get from other sources.
More information about this can be found on the project wiki.

The generator has already been completely rewritten to generate a
datafile that uses the new structure. In addition, a number of indexes
are created, which doubles the size of the datafile but allows for
much faster lookups. Finally, the new generator is written modularly
and is a lot more readable than the old one.

The verbly interface to the new object structure has mostly been
completed, but has not been tested fully. There is a completely new
search API which utilizes a lot of operator overloading; documentation
on how to use it should go up at some point.

Token processing and verb frames are currently unimplemented. The source for
these has been left in the repository for now.
Diffstat (limited to 'generator/generator.h')
-rw-r--r-- generator/generator.h 151
1 file changed, 151 insertions, 0 deletions
diff --git a/generator/generator.h b/generator/generator.h
new file mode 100644
index 0000000..e2a7404
--- /dev/null
+++ b/generator/generator.h
@@ -0,0 +1,151 @@
#ifndef GENERATOR_H_5B61CBC5
#define GENERATOR_H_5B61CBC5

#include <string>
#include <map>
#include <list>
#include <set>
#include <libxml/parser.h>
#include "database.h"
#include "notion.h"
#include "word.h"
#include "lemma.h"
#include "form.h"
#include "pronunciation.h"
#include "group.h"
#include "frame.h"

namespace verbly {
  namespace generator {

    enum class part_of_speech;
    class selrestr;

    class generator {
    public:

      // Constructor

      generator(
        std::string verbNetPath,
        std::string agidPath,
        std::string wordNetPath,
        std::string cmudictPath,
        std::string imageNetPath,
        std::string outputPath);

      // Action

      void run();

    private:

      // Subroutines

      void readWordNetSynsets();

      void readAdjectivePositioning();

      void readImageNetUrls();

      void readWordNetSenseKeys();

      void readVerbNet();

      void readAgidInflections();

      void readPrepositions();

      void readCmudictPronunciations();

      void writeSchema();

      void dumpObjects();

      void readWordNetAntonymy();

      void readWordNetVariation();

      void readWordNetClasses();

      void readWordNetCausality();

      void readWordNetEntailment();

      void readWordNetHypernymy();

      void readWordNetInstantiation();

      void readWordNetMemberMeronymy();

      void readWordNetPartMeronymy();

      void readWordNetSubstanceMeronymy();

      void readWordNetPertainymy();

      void readWordNetSpecification();

      void readWordNetSimilarity();

      // Helpers

      std::list<std::string> readFile(std::string path);

      inline part_of_speech partOfSpeechByWnid(int wnid);

      notion& createNotion(part_of_speech partOfSpeech);

      notion& lookupOrCreateNotion(int wnid);

      lemma& lookupOrCreateLemma(std::string base_form);

      form& lookupOrCreateForm(std::string text);

      template <typename... Args> word& createWord(Args&&... args);

      group& createGroup(xmlNodePtr top);

      selrestr parseSelrestr(xmlNodePtr top);

      // Input

      std::string verbNetPath_;
      std::string agidPath_;
      std::string wordNetPath_;
      std::string cmudictPath_;
      std::string imageNetPath_;

      // Output

      database db_;

      // Data

      std::list<notion> notions_;
      std::list<word> words_;
      std::list<lemma> lemmas_;
      std::list<form> forms_;
      std::list<pronunciation> pronunciations_;
      std::list<frame> frames_;
      std::list<group> groups_;

      // Indexes

      std::map<int, notion*> notionByWnid_;
      std::map<int, std::set<word*>> wordsByWnid_;
      std::map<std::pair<int, int>, word*> wordByWnidAndWnum_;
      std::map<std::string, std::set<word*>> wordsByBaseForm_;
      std::map<std::string, lemma*> lemmaByBaseForm_;
      std::map<std::string, form*> formByText_;

      // Caches

      std::map<std::string, word*> wnSenseKeys_;

    };

  };
};

#endif /* end of include guard: GENERATOR_H_5B61CBC5 */
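
The header pairs each `std::list` of objects with one or more `std::map` indexes holding raw pointers, and declares `lookupOrCreate...` helpers over them. The implementation is not part of this diff, so the following is only a minimal sketch of how such a helper could work. `Form` and `FormStore` are hypothetical stand-ins, not verbly's actual classes. The sketch relies on the fact that inserting into a `std::list` never invalidates pointers or references to existing elements, which is what makes storing `Form*` in the index safe.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for verbly's form class (the real one is in form.h).
struct Form {
  explicit Form(std::string text) : text(std::move(text)) {}
  std::string text;
};

// Hypothetical container illustrating the list-plus-index layout.
class FormStore {
public:
  // Returns the existing Form for `text`, or creates and indexes a new one.
  Form& lookupOrCreate(const std::string& text) {
    auto it = formByText_.find(text);
    if (it != formByText_.end()) {
      return *it->second;
    }

    forms_.emplace_back(text);
    Form& created = forms_.back();
    formByText_[text] = &created;
    return created;
  }

  std::size_t size() const { return forms_.size(); }

private:
  // Owns the objects; element addresses stay stable as the list grows.
  std::list<Form> forms_;

  // Non-owning index into forms_, keyed by text.
  std::map<std::string, Form*> formByText_;
};
```

Under these assumptions, calling `lookupOrCreate("run")` twice returns a reference to the same object both times, and only one `Form` is stored. A `std::vector` would not be a drop-in replacement for the owning container here, since reallocation on growth would leave the indexed pointers dangling.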