grammar_cop 0.1.0
This diff shows the content of publicly available package versions released to one of the supported registries, as they appear in those public registries; it is provided for informational purposes only.
- data/.DS_Store +0 -0
- data/.gitignore +4 -0
- data/Gemfile +4 -0
- data/Rakefile +8 -0
- data/data/.DS_Store +0 -0
- data/data/Makefile +511 -0
- data/data/Makefile.am +4 -0
- data/data/Makefile.in +511 -0
- data/data/de/.DS_Store +0 -0
- data/data/de/4.0.affix +7 -0
- data/data/de/4.0.dict +474 -0
- data/data/de/Makefile +387 -0
- data/data/de/Makefile.am +9 -0
- data/data/de/Makefile.in +387 -0
- data/data/en/.DS_Store +0 -0
- data/data/en/4.0.affix +26 -0
- data/data/en/4.0.batch +1002 -0
- data/data/en/4.0.biolg.batch +411 -0
- data/data/en/4.0.constituent-knowledge +127 -0
- data/data/en/4.0.dict +8759 -0
- data/data/en/4.0.dict.m4 +6928 -0
- data/data/en/4.0.enwiki.batch +14 -0
- data/data/en/4.0.fixes.batch +2776 -0
- data/data/en/4.0.knowledge +306 -0
- data/data/en/4.0.regex +225 -0
- data/data/en/4.0.voa.batch +114 -0
- data/data/en/Makefile +554 -0
- data/data/en/Makefile.am +19 -0
- data/data/en/Makefile.in +554 -0
- data/data/en/README +173 -0
- data/data/en/tiny.dict +157 -0
- data/data/en/words/.DS_Store +0 -0
- data/data/en/words/Makefile +456 -0
- data/data/en/words/Makefile.am +78 -0
- data/data/en/words/Makefile.in +456 -0
- data/data/en/words/currency +205 -0
- data/data/en/words/currency.p +28 -0
- data/data/en/words/entities.given-bisex.sing +39 -0
- data/data/en/words/entities.given-female.sing +4141 -0
- data/data/en/words/entities.given-male.sing +1633 -0
- data/data/en/words/entities.locations.sing +68 -0
- data/data/en/words/entities.national.sing +253 -0
- data/data/en/words/entities.organizations.sing +7 -0
- data/data/en/words/entities.us-states.sing +11 -0
- data/data/en/words/units.1 +45 -0
- data/data/en/words/units.1.dot +4 -0
- data/data/en/words/units.3 +2 -0
- data/data/en/words/units.4 +5 -0
- data/data/en/words/units.4.dot +1 -0
- data/data/en/words/words-medical.adv.1 +1191 -0
- data/data/en/words/words-medical.prep.1 +67 -0
- data/data/en/words/words-medical.v.4.1 +2835 -0
- data/data/en/words/words-medical.v.4.2 +2848 -0
- data/data/en/words/words-medical.v.4.3 +3011 -0
- data/data/en/words/words-medical.v.4.4 +3036 -0
- data/data/en/words/words-medical.v.4.5 +3050 -0
- data/data/en/words/words.adj.1 +6794 -0
- data/data/en/words/words.adj.2 +638 -0
- data/data/en/words/words.adj.3 +667 -0
- data/data/en/words/words.adv.1 +1573 -0
- data/data/en/words/words.adv.2 +67 -0
- data/data/en/words/words.adv.3 +157 -0
- data/data/en/words/words.adv.4 +80 -0
- data/data/en/words/words.n.1 +11464 -0
- data/data/en/words/words.n.1.wiki +264 -0
- data/data/en/words/words.n.2.s +2017 -0
- data/data/en/words/words.n.2.s.biolg +1 -0
- data/data/en/words/words.n.2.s.wiki +298 -0
- data/data/en/words/words.n.2.x +65 -0
- data/data/en/words/words.n.2.x.wiki +10 -0
- data/data/en/words/words.n.3 +5717 -0
- data/data/en/words/words.n.t +23 -0
- data/data/en/words/words.v.1.1 +1038 -0
- data/data/en/words/words.v.1.2 +1043 -0
- data/data/en/words/words.v.1.3 +1052 -0
- data/data/en/words/words.v.1.4 +1023 -0
- data/data/en/words/words.v.1.p +17 -0
- data/data/en/words/words.v.10.1 +14 -0
- data/data/en/words/words.v.10.2 +15 -0
- data/data/en/words/words.v.10.3 +88 -0
- data/data/en/words/words.v.10.4 +17 -0
- data/data/en/words/words.v.2.1 +1253 -0
- data/data/en/words/words.v.2.2 +1304 -0
- data/data/en/words/words.v.2.3 +1280 -0
- data/data/en/words/words.v.2.4 +1285 -0
- data/data/en/words/words.v.2.5 +1287 -0
- data/data/en/words/words.v.4.1 +2472 -0
- data/data/en/words/words.v.4.2 +2487 -0
- data/data/en/words/words.v.4.3 +2441 -0
- data/data/en/words/words.v.4.4 +2478 -0
- data/data/en/words/words.v.4.5 +2483 -0
- data/data/en/words/words.v.5.1 +98 -0
- data/data/en/words/words.v.5.2 +98 -0
- data/data/en/words/words.v.5.3 +103 -0
- data/data/en/words/words.v.5.4 +102 -0
- data/data/en/words/words.v.6.1 +388 -0
- data/data/en/words/words.v.6.2 +401 -0
- data/data/en/words/words.v.6.3 +397 -0
- data/data/en/words/words.v.6.4 +405 -0
- data/data/en/words/words.v.6.5 +401 -0
- data/data/en/words/words.v.8.1 +117 -0
- data/data/en/words/words.v.8.2 +118 -0
- data/data/en/words/words.v.8.3 +118 -0
- data/data/en/words/words.v.8.4 +119 -0
- data/data/en/words/words.v.8.5 +119 -0
- data/data/en/words/words.y +104 -0
- data/data/lt/.DS_Store +0 -0
- data/data/lt/4.0.affix +6 -0
- data/data/lt/4.0.constituent-knowledge +24 -0
- data/data/lt/4.0.dict +135 -0
- data/data/lt/4.0.knowledge +38 -0
- data/data/lt/Makefile +389 -0
- data/data/lt/Makefile.am +11 -0
- data/data/lt/Makefile.in +389 -0
- data/ext/.DS_Store +0 -0
- data/ext/link_grammar/.DS_Store +0 -0
- data/ext/link_grammar/extconf.rb +2 -0
- data/ext/link_grammar/link-grammar/.DS_Store +0 -0
- data/ext/link_grammar/link-grammar/.deps/analyze-linkage.Plo +198 -0
- data/ext/link_grammar/link-grammar/.deps/and.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/api.Plo +244 -0
- data/ext/link_grammar/link-grammar/.deps/build-disjuncts.Plo +212 -0
- data/ext/link_grammar/link-grammar/.deps/command-line.Plo +201 -0
- data/ext/link_grammar/link-grammar/.deps/constituents.Plo +201 -0
- data/ext/link_grammar/link-grammar/.deps/count.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/disjunct-utils.Plo +126 -0
- data/ext/link_grammar/link-grammar/.deps/disjuncts.Plo +123 -0
- data/ext/link_grammar/link-grammar/.deps/error.Plo +121 -0
- data/ext/link_grammar/link-grammar/.deps/expand.Plo +133 -0
- data/ext/link_grammar/link-grammar/.deps/extract-links.Plo +198 -0
- data/ext/link_grammar/link-grammar/.deps/fast-match.Plo +200 -0
- data/ext/link_grammar/link-grammar/.deps/idiom.Plo +200 -0
- data/ext/link_grammar/link-grammar/.deps/jni-client.Plo +217 -0
- data/ext/link_grammar/link-grammar/.deps/link-parser.Po +1 -0
- data/ext/link_grammar/link-grammar/.deps/massage.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/post-process.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/pp_knowledge.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/pp_lexer.Plo +201 -0
- data/ext/link_grammar/link-grammar/.deps/pp_linkset.Plo +200 -0
- data/ext/link_grammar/link-grammar/.deps/prefix.Plo +102 -0
- data/ext/link_grammar/link-grammar/.deps/preparation.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/print-util.Plo +200 -0
- data/ext/link_grammar/link-grammar/.deps/print.Plo +201 -0
- data/ext/link_grammar/link-grammar/.deps/prune.Plo +202 -0
- data/ext/link_grammar/link-grammar/.deps/read-dict.Plo +223 -0
- data/ext/link_grammar/link-grammar/.deps/read-regex.Plo +123 -0
- data/ext/link_grammar/link-grammar/.deps/regex-morph.Plo +131 -0
- data/ext/link_grammar/link-grammar/.deps/resources.Plo +203 -0
- data/ext/link_grammar/link-grammar/.deps/spellcheck-aspell.Plo +1 -0
- data/ext/link_grammar/link-grammar/.deps/spellcheck-hun.Plo +115 -0
- data/ext/link_grammar/link-grammar/.deps/string-set.Plo +198 -0
- data/ext/link_grammar/link-grammar/.deps/tokenize.Plo +160 -0
- data/ext/link_grammar/link-grammar/.deps/utilities.Plo +222 -0
- data/ext/link_grammar/link-grammar/.deps/word-file.Plo +201 -0
- data/ext/link_grammar/link-grammar/.deps/word-utils.Plo +212 -0
- data/ext/link_grammar/link-grammar/.libs/analyze-linkage.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/and.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/api.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/build-disjuncts.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/command-line.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/constituents.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/count.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/disjunct-utils.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/disjuncts.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/error.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/expand.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/extract-links.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/fast-match.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/idiom.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/jni-client.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-java-symbols.expsym +31 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-java.4.dylib +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-java.4.dylib.dSYM/Contents/Info.plist +20 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-java.4.dylib.dSYM/Contents/Resources/DWARF/liblink-grammar-java.4.dylib +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-java.a +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-java.dylib +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar-symbols.expsym +194 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.4.dylib +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.4.dylib.dSYM/Contents/Info.plist +20 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.4.dylib.dSYM/Contents/Resources/DWARF/liblink-grammar.4.dylib +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.a +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.dylib +0 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.la +41 -0
- data/ext/link_grammar/link-grammar/.libs/liblink-grammar.lai +41 -0
- data/ext/link_grammar/link-grammar/.libs/massage.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/post-process.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/pp_knowledge.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/pp_lexer.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/pp_linkset.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/prefix.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/preparation.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/print-util.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/print.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/prune.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/read-dict.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/read-regex.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/regex-morph.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/resources.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/spellcheck-aspell.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/spellcheck-hun.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/string-set.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/tokenize.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/utilities.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/word-file.o +0 -0
- data/ext/link_grammar/link-grammar/.libs/word-utils.o +0 -0
- data/ext/link_grammar/link-grammar/Makefile +900 -0
- data/ext/link_grammar/link-grammar/Makefile.am +202 -0
- data/ext/link_grammar/link-grammar/Makefile.in +900 -0
- data/ext/link_grammar/link-grammar/analyze-linkage.c +1317 -0
- data/ext/link_grammar/link-grammar/analyze-linkage.h +24 -0
- data/ext/link_grammar/link-grammar/and.c +1603 -0
- data/ext/link_grammar/link-grammar/and.h +27 -0
- data/ext/link_grammar/link-grammar/api-structures.h +362 -0
- data/ext/link_grammar/link-grammar/api-types.h +72 -0
- data/ext/link_grammar/link-grammar/api.c +1887 -0
- data/ext/link_grammar/link-grammar/api.h +96 -0
- data/ext/link_grammar/link-grammar/autoit/.DS_Store +0 -0
- data/ext/link_grammar/link-grammar/autoit/README +10 -0
- data/ext/link_grammar/link-grammar/autoit/_LGTest.au3 +22 -0
- data/ext/link_grammar/link-grammar/autoit/_LinkGrammar.au3 +545 -0
- data/ext/link_grammar/link-grammar/build-disjuncts.c +487 -0
- data/ext/link_grammar/link-grammar/build-disjuncts.h +21 -0
- data/ext/link_grammar/link-grammar/command-line.c +458 -0
- data/ext/link_grammar/link-grammar/command-line.h +15 -0
- data/ext/link_grammar/link-grammar/constituents.c +1836 -0
- data/ext/link_grammar/link-grammar/constituents.h +26 -0
- data/ext/link_grammar/link-grammar/corpus/.DS_Store +0 -0
- data/ext/link_grammar/link-grammar/corpus/.deps/cluster.Plo +1 -0
- data/ext/link_grammar/link-grammar/corpus/.deps/corpus.Plo +1 -0
- data/ext/link_grammar/link-grammar/corpus/Makefile +527 -0
- data/ext/link_grammar/link-grammar/corpus/Makefile.am +46 -0
- data/ext/link_grammar/link-grammar/corpus/Makefile.in +527 -0
- data/ext/link_grammar/link-grammar/corpus/README +17 -0
- data/ext/link_grammar/link-grammar/corpus/cluster.c +286 -0
- data/ext/link_grammar/link-grammar/corpus/cluster.h +32 -0
- data/ext/link_grammar/link-grammar/corpus/corpus.c +483 -0
- data/ext/link_grammar/link-grammar/corpus/corpus.h +46 -0
- data/ext/link_grammar/link-grammar/count.c +828 -0
- data/ext/link_grammar/link-grammar/count.h +25 -0
- data/ext/link_grammar/link-grammar/disjunct-utils.c +261 -0
- data/ext/link_grammar/link-grammar/disjunct-utils.h +27 -0
- data/ext/link_grammar/link-grammar/disjuncts.c +138 -0
- data/ext/link_grammar/link-grammar/disjuncts.h +13 -0
- data/ext/link_grammar/link-grammar/error.c +92 -0
- data/ext/link_grammar/link-grammar/error.h +35 -0
- data/ext/link_grammar/link-grammar/expand.c +67 -0
- data/ext/link_grammar/link-grammar/expand.h +13 -0
- data/ext/link_grammar/link-grammar/externs.h +22 -0
- data/ext/link_grammar/link-grammar/extract-links.c +625 -0
- data/ext/link_grammar/link-grammar/extract-links.h +16 -0
- data/ext/link_grammar/link-grammar/fast-match.c +309 -0
- data/ext/link_grammar/link-grammar/fast-match.h +17 -0
- data/ext/link_grammar/link-grammar/idiom.c +373 -0
- data/ext/link_grammar/link-grammar/idiom.h +15 -0
- data/ext/link_grammar/link-grammar/jni-client.c +779 -0
- data/ext/link_grammar/link-grammar/jni-client.h +236 -0
- data/ext/link_grammar/link-grammar/liblink-grammar-java.la +42 -0
- data/ext/link_grammar/link-grammar/liblink-grammar.la +41 -0
- data/ext/link_grammar/link-grammar/link-features.h +37 -0
- data/ext/link_grammar/link-grammar/link-features.h.in +37 -0
- data/ext/link_grammar/link-grammar/link-grammar-java.def +31 -0
- data/ext/link_grammar/link-grammar/link-grammar.def +194 -0
- data/ext/link_grammar/link-grammar/link-includes.h +465 -0
- data/ext/link_grammar/link-grammar/link-parser.c +849 -0
- data/ext/link_grammar/link-grammar/massage.c +329 -0
- data/ext/link_grammar/link-grammar/massage.h +13 -0
- data/ext/link_grammar/link-grammar/post-process.c +1113 -0
- data/ext/link_grammar/link-grammar/post-process.h +45 -0
- data/ext/link_grammar/link-grammar/pp_knowledge.c +376 -0
- data/ext/link_grammar/link-grammar/pp_knowledge.h +14 -0
- data/ext/link_grammar/link-grammar/pp_lexer.c +1920 -0
- data/ext/link_grammar/link-grammar/pp_lexer.h +19 -0
- data/ext/link_grammar/link-grammar/pp_linkset.c +158 -0
- data/ext/link_grammar/link-grammar/pp_linkset.h +20 -0
- data/ext/link_grammar/link-grammar/prefix.c +482 -0
- data/ext/link_grammar/link-grammar/prefix.h +139 -0
- data/ext/link_grammar/link-grammar/preparation.c +412 -0
- data/ext/link_grammar/link-grammar/preparation.h +20 -0
- data/ext/link_grammar/link-grammar/print-util.c +87 -0
- data/ext/link_grammar/link-grammar/print-util.h +32 -0
- data/ext/link_grammar/link-grammar/print.c +1085 -0
- data/ext/link_grammar/link-grammar/print.h +16 -0
- data/ext/link_grammar/link-grammar/prune.c +1864 -0
- data/ext/link_grammar/link-grammar/prune.h +17 -0
- data/ext/link_grammar/link-grammar/read-dict.c +1785 -0
- data/ext/link_grammar/link-grammar/read-dict.h +29 -0
- data/ext/link_grammar/link-grammar/read-regex.c +161 -0
- data/ext/link_grammar/link-grammar/read-regex.h +12 -0
- data/ext/link_grammar/link-grammar/regex-morph.c +126 -0
- data/ext/link_grammar/link-grammar/regex-morph.h +17 -0
- data/ext/link_grammar/link-grammar/resources.c +180 -0
- data/ext/link_grammar/link-grammar/resources.h +23 -0
- data/ext/link_grammar/link-grammar/sat-solver/.DS_Store +0 -0
- data/ext/link_grammar/link-grammar/sat-solver/.deps/fast-sprintf.Plo +1 -0
- data/ext/link_grammar/link-grammar/sat-solver/.deps/sat-encoder.Plo +1 -0
- data/ext/link_grammar/link-grammar/sat-solver/.deps/util.Plo +1 -0
- data/ext/link_grammar/link-grammar/sat-solver/.deps/variables.Plo +1 -0
- data/ext/link_grammar/link-grammar/sat-solver/.deps/word-tag.Plo +1 -0
- data/ext/link_grammar/link-grammar/sat-solver/Makefile +527 -0
- data/ext/link_grammar/link-grammar/sat-solver/Makefile.am +29 -0
- data/ext/link_grammar/link-grammar/sat-solver/Makefile.in +527 -0
- data/ext/link_grammar/link-grammar/sat-solver/clock.hpp +33 -0
- data/ext/link_grammar/link-grammar/sat-solver/fast-sprintf.cpp +26 -0
- data/ext/link_grammar/link-grammar/sat-solver/fast-sprintf.hpp +7 -0
- data/ext/link_grammar/link-grammar/sat-solver/guiding.hpp +244 -0
- data/ext/link_grammar/link-grammar/sat-solver/matrix-ut.hpp +79 -0
- data/ext/link_grammar/link-grammar/sat-solver/sat-encoder.cpp +2811 -0
- data/ext/link_grammar/link-grammar/sat-solver/sat-encoder.h +11 -0
- data/ext/link_grammar/link-grammar/sat-solver/sat-encoder.hpp +381 -0
- data/ext/link_grammar/link-grammar/sat-solver/trie.hpp +118 -0
- data/ext/link_grammar/link-grammar/sat-solver/util.cpp +23 -0
- data/ext/link_grammar/link-grammar/sat-solver/util.hpp +14 -0
- data/ext/link_grammar/link-grammar/sat-solver/variables.cpp +5 -0
- data/ext/link_grammar/link-grammar/sat-solver/variables.hpp +829 -0
- data/ext/link_grammar/link-grammar/sat-solver/word-tag.cpp +159 -0
- data/ext/link_grammar/link-grammar/sat-solver/word-tag.hpp +162 -0
- data/ext/link_grammar/link-grammar/spellcheck-aspell.c +148 -0
- data/ext/link_grammar/link-grammar/spellcheck-hun.c +136 -0
- data/ext/link_grammar/link-grammar/spellcheck.h +34 -0
- data/ext/link_grammar/link-grammar/string-set.c +169 -0
- data/ext/link_grammar/link-grammar/string-set.h +16 -0
- data/ext/link_grammar/link-grammar/structures.h +498 -0
- data/ext/link_grammar/link-grammar/tokenize.c +1049 -0
- data/ext/link_grammar/link-grammar/tokenize.h +15 -0
- data/ext/link_grammar/link-grammar/utilities.c +847 -0
- data/ext/link_grammar/link-grammar/utilities.h +281 -0
- data/ext/link_grammar/link-grammar/word-file.c +124 -0
- data/ext/link_grammar/link-grammar/word-file.h +15 -0
- data/ext/link_grammar/link-grammar/word-utils.c +526 -0
- data/ext/link_grammar/link-grammar/word-utils.h +152 -0
- data/ext/link_grammar/link_grammar.c +202 -0
- data/ext/link_grammar/link_grammar.h +99 -0
- data/grammar_cop.gemspec +24 -0
- data/lib/.DS_Store +0 -0
- data/lib/grammar_cop.rb +9 -0
- data/lib/grammar_cop/.DS_Store +0 -0
- data/lib/grammar_cop/dictionary.rb +19 -0
- data/lib/grammar_cop/linkage.rb +30 -0
- data/lib/grammar_cop/parse_options.rb +32 -0
- data/lib/grammar_cop/sentence.rb +36 -0
- data/lib/grammar_cop/version.rb +3 -0
- data/test/.DS_Store +0 -0
- data/test/grammar_cop_test.rb +27 -0
- metadata +407 -0
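The diff hunks that follow show three of the bundled link-grammar C sources compiled by the native extension under data/ext/link_grammar. For orientation, here is a minimal sketch of how a caller typically drives that parser through the public C API declared in link-includes.h. The function names are the standard link-grammar 4.x entry points and are an assumption about this bundled copy, not code taken from the gem itself:

#include <stdio.h>
#include "link-includes.h"   /* bundled under data/ext/link_grammar/link-grammar/ */

int main(void)
{
    /* Assumed link-grammar 4.x API: load the default (English) dictionary. */
    Dictionary    dict = dictionary_create_default_lang();
    Parse_Options opts = parse_options_create();
    Sentence      sent = sentence_create("The quick brown fox jumps over the lazy dog", dict);

    if (sentence_parse(sent, opts) > 0) {
        /* At least one complete linkage was found. */
        printf("linkages found: %d\n", sentence_num_linkages_found(sent));
    } else {
        printf("no complete linkage found\n");
    }

    sentence_delete(sent);
    parse_options_delete(opts);
    dictionary_delete(dict);
    return 0;
}

The Ruby classes listed under data/lib/grammar_cop (Dictionary, ParseOptions, Sentence, Linkage) presumably wrap these same calls via the extension in data/ext/link_grammar/link_grammar.c.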
data/ext/link_grammar/link-grammar/massage.c
@@ -0,0 +1,329 @@
+/*************************************************************************/
+/* Copyright (c) 2004 */
+/* Daniel Sleator, David Temperley, and John Lafferty */
+/* All rights reserved */
+/* */
+/* Use of the link grammar parsing system is subject to the terms of the */
+/* license set forth in the LICENSE file included with this software, */
+/* and also available at http://www.link.cs.cmu.edu/link/license.html */
+/* This license allows free redistribution and use in source and binary */
+/* forms, with or without modification, subject to certain conditions. */
+/* */
+/*************************************************************************/
+
+#include "api.h"
+#include "disjunct-utils.h"
+
+/* This file contains the functions for massaging disjuncts of the
+   sentence in special ways having to do with conjunctions.
+   The only function called from the outside world is
+   install_special_conjunctive_connectors()
+
+   It would be nice if this code was written more transparently. In
+   other words, there should be some fairly general functions that
+   manipulate disjuncts, and take words like "neither" etc as input
+   parameters, so as to encapsulate the changes being made for special
+   words. This would not be too hard to do, but it's not a high priority.
+   -DS 3/98
+*/
+
+#define COMMA_LABEL (-2) /* to hook the comma to the following "and" */
+#define EITHER_LABEL (-3) /* to connect the "either" to the following "or" */
+#define NEITHER_LABEL (-4) /* to connect the "neither" to the following "nor"*/
+#define NOT_LABEL (-5) /* to connect the "not" to the following "but"*/
+#define NOTONLY_LABEL (-6) /* to connect the "not" to the following "only"*/
+#define BOTH_LABEL (-7) /* to connect the "both" to the following "and"*/
+
+/* There's a problem with installing "...but...", "not only...but...", and
+   "not...but...", which is that the current comma mechanism will allow
+   a list separated by commas. "Not only John, Mary but Jim came"
+   The best way to prevent this is to make it impossible for the comma
+   to attach to the "but", via some sort of additional subscript on commas.
+
+   I can't think of a good way to prevent this.
+*/
+
+/* The following functions all do slightly different variants of the
+   following thing:
+
+   Catenate to the disjunct list pointed to by d, a new disjunct list.
+   The new list is formed by copying the old list, and adding the new
+   connector somewhere in the old disjunct, for disjuncts that satisfy
+   certain conditions
+*/
+
+/**
+ * glom_comma_connector() --
+ * In this case the connector is to connect to the comma to the
+ * left of an "and" or an "or". Only gets added next to a fat link
+ */
+static Disjunct * glom_comma_connector(Disjunct * d)
+{
+    Disjunct * d_list, * d1, * d2;
+    Connector * c, * c1;
+    d_list = NULL;
+    for (d1 = d; d1!=NULL; d1=d1->next) {
+        if (d1->left == NULL) continue;
+        for (c = d1->left; c->next != NULL; c = c->next)
+            ;
+        if (c->label < 0) continue; /* last one must be a fat link */
+
+        d2 = copy_disjunct(d1);
+        d2->next = d_list;
+        d_list = d2;
+
+        c1 = connector_new();
+        c1->label = COMMA_LABEL;
+
+        c->next = c1;
+    }
+    return catenate_disjuncts(d, d_list);
+}
+
+/**
+ * In this case the connector is to connect to the "either", "neither",
+ * "not", or some auxilliary d to the current which is a conjunction.
+ * Only gets added next to a fat link, but before it (not after it)
+ * In the case of "nor", we don't create new disjuncts, we merely modify
+ * existing ones. This forces the fat link uses of "nor" to
+ * use a neither. (Not the case with "or".) If necessary=FALSE, then
+ * duplication is done, otherwise it isn't
+ */
+static Disjunct * glom_aux_connector(Disjunct * d, int label, int necessary)
+{
+    Disjunct * d_list, * d1, * d2;
+    Connector * c, * c1, *c2;
+    d_list = NULL;
+    for (d1 = d; d1!=NULL; d1=d1->next) {
+        if (d1->left == NULL) continue;
+        for (c = d1->left; c->next != NULL; c = c->next)
+            ;
+        if (c->label < 0) continue; /* last one must be a fat link */
+
+        if (!necessary) {
+            d2 = copy_disjunct(d1);
+            d2->next = d_list;
+            d_list = d2;
+        }
+
+        c1 = connector_new();
+        c1->label = label;
+        c1->next = c;
+
+        if (d1->left == c) {
+            d1->left = c1;
+        } else {
+            for (c2 = d1->left; c2->next != c; c2 = c2->next)
+                ;
+            c2->next = c1;
+        }
+    }
+    return catenate_disjuncts(d, d_list);
+}
+
+/**
+ * This adds one connector onto the beginning of the left (or right)
+ * connector list of d. The label and string of the connector are
+ * specified
+ */
+static Disjunct * add_one_connector(int label, int dir, const char *cs, Disjunct * d)
+{
+    Connector * c;
+
+    c = connector_new();
+    c->string = cs;
+    c->label = label;
+
+    if (dir == '+') {
+        c->next = d->right;
+        d->right = c;
+    } else {
+        c->next = d->left;
+        d->left = c;
+    }
+    return d;
+}
+
+/**
+ * special_disjunct() --
+ * Builds a new disjunct with one connector pointing in direction dir
+ * (which is '+' or '-'). The label and string of the connector
+ * are specified, as well as the string of the disjunct.
+ * The next pointer of the new disjunct set to NULL, so it can be
+ * regarded as a list.
+ */
+static Disjunct * special_disjunct(int label, int dir, const char *cs, const char * ds)
+{
+    Disjunct * d1;
+    Connector * c;
+    d1 = (Disjunct *) xalloc(sizeof(Disjunct));
+    d1->cost = 0;
+    d1->string = ds;
+    d1->next = NULL;
+
+    c = connector_new();
+    c->string = cs;
+    c->label = label;
+
+    if (dir == '+') {
+        d1->left = NULL;
+        d1->right = c;
+    } else {
+        d1->right = NULL;
+        d1->left = c;
+    }
+    return d1;
+}
+
+/**
+ * Finds all places in the sentence where a comma is followed by
+ * a conjunction ("and", "or", "but", or "nor"). It modifies these comma
+ * disjuncts, and those of the following word, to allow the following
+ * word to absorb the comma (if used as a conjunction).
+ */
+static void construct_comma(Sentence sent)
+{
+    int w;
+    for (w=0; w<sent->length-1; w++) {
+        if ((strcmp(sent->word[w].string, ",")==0) && sent->is_conjunction[w+1]) {
+            sent->word[w].d = catenate_disjuncts(special_disjunct(COMMA_LABEL,'+',"", ","), sent->word[w].d);
+            sent->word[w+1].d = glom_comma_connector(sent->word[w+1].d);
+        }
+    }
+}
+
+
+/** Returns TRUE if one of the words in the sentence is s */
+static int sentence_contains(Sentence sent, const char * s)
+{
+    int w;
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, s) == 0) return TRUE;
+    }
+    return FALSE;
+}
+
+/**
+ * The functions below put the special connectors on certain auxilliary
+   words to be used with conjunctions. Examples: either, neither,
+   both...and..., not only...but...
+   XXX FIXME: This routine uses "sentence_contains" to test for explicit
+   English words, and clearly this fails for other langauges!! XXX FIXME!
+*/
+
+static void construct_either(Sentence sent)
+{
+    int w;
+    if (!sentence_contains(sent, "either")) return;
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "either") != 0) continue;
+        sent->word[w].d = catenate_disjuncts(
+            special_disjunct(EITHER_LABEL,'+',"", "either"),
+            sent->word[w].d);
+    }
+
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "or") != 0) continue;
+        sent->word[w].d = glom_aux_connector
+            (sent->word[w].d, EITHER_LABEL, FALSE);
+    }
+}
+
+static void construct_neither(Sentence sent)
+{
+    int w;
+    if (!sentence_contains(sent, "neither")) {
+        /* I don't see the point removing disjuncts on "nor". I
+           Don't know why I did this. What's the problem keeping the
+           stuff explicitely defined for "nor" in the dictionary? --DS 3/98 */
+#if 0
+        for (w=0; w<sent->length; w++) {
+            if (strcmp(sent->word[w].string, "nor") != 0) continue;
+            free_disjuncts(sent->word[w].d);
+            sent->word[w].d = NULL; /* a nor with no neither is dead */
+        }
+#endif
+        return;
+    }
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "neither") != 0) continue;
+        sent->word[w].d = catenate_disjuncts(
+            special_disjunct(NEITHER_LABEL,'+',"", "neither"),
+            sent->word[w].d);
+    }
+
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "nor") != 0) continue;
+        sent->word[w].d = glom_aux_connector
+            (sent->word[w].d, NEITHER_LABEL, TRUE);
+    }
+}
+
+static void construct_notonlybut(Sentence sent)
+{
+    int w;
+    Disjunct *d;
+    if (!sentence_contains(sent, "not")) {
+        return;
+    }
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "not") != 0) continue;
+        sent->word[w].d = catenate_disjuncts(
+            special_disjunct(NOT_LABEL,'+',"", "not"),
+            sent->word[w].d);
+        if (w<sent->length-1 && strcmp(sent->word[w+1].string, "only")==0) {
+            sent->word[w+1].d = catenate_disjuncts(
+                special_disjunct(NOTONLY_LABEL, '-',"","only"),
+                sent->word[w+1].d);
+            d = special_disjunct(NOTONLY_LABEL, '+', "","not");
+            d = add_one_connector(NOT_LABEL,'+',"", d);
+            sent->word[w].d = catenate_disjuncts(d, sent->word[w].d);
+        }
+    }
+    /* The code below prevents sentences such as the following from
+       parsing:
+       it was not carried out by Serbs but by Croats */
+
+
+    /* We decided that this is a silly thing to. Here's the bug report
+       caused by this:
+
+       Bug with conjunctions. Some that work with "and" but they don't work
+       with "but". "He was not hit by John and by Fred".
+       (Try replacing "and" by "but" and it does not work.
+       It's getting confused by the "not".)
+    */
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "but") != 0) continue;
+        sent->word[w].d = glom_aux_connector
+            (sent->word[w].d, NOT_LABEL, FALSE);
+        /* The above line use to have a TRUE in it */
+    }
+}
+
+static void construct_both(Sentence sent)
+{
+    int w;
+    if (!sentence_contains(sent, "both")) return;
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "both") != 0) continue;
+        sent->word[w].d = catenate_disjuncts(
+            special_disjunct(BOTH_LABEL,'+',"", "both"),
+            sent->word[w].d);
+    }
+
+    for (w=0; w<sent->length; w++) {
+        if (strcmp(sent->word[w].string, "and") != 0) continue;
+        sent->word[w].d = glom_aux_connector(sent->word[w].d, BOTH_LABEL, FALSE);
+    }
+}
+
+void install_special_conjunctive_connectors(Sentence sent)
+{
+    construct_either(sent); /* special connectors for "either" */
+    construct_neither(sent); /* special connectors for "neither" */
+    construct_notonlybut(sent); /* special connectors for "not..but.." */
+                                /* and "not only..but.." */
+    construct_both(sent); /* special connectors for "both..and.." */
+    construct_comma(sent); /* special connectors for extra comma */
+}
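The hunk above (the 329-line massage.c from the manifest) repeatedly applies one pattern described in its own comments: walk a disjunct list, copy the disjuncts that satisfy some condition, attach an extra connector to the copies, and catenate the copies back onto the original list. A self-contained sketch of that copy-then-catenate idiom on a generic singly linked list, using hypothetical node types rather than the library's Disjunct/Connector structures, looks like this:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type standing in for Disjunct. */
typedef struct node {
    int value;
    int tagged;            /* stands in for "has the extra connector" */
    struct node *next;
} node;

static node *new_node(int value, int tagged, node *next)
{
    node *n = malloc(sizeof *n);
    n->value = value;
    n->tagged = tagged;
    n->next = next;
    return n;
}

/* Append list b onto the end of list a (either may be NULL). */
static node *catenate(node *a, node *b)
{
    node *p;
    if (a == NULL) return b;
    for (p = a; p->next != NULL; p = p->next) ;
    p->next = b;
    return a;
}

/* For every node satisfying the condition, add a tagged copy;
   the originals stay unchanged, as in the glom_* functions above. */
static node *glom_tagged_copies(node *list)
{
    node *copies = NULL, *p;
    for (p = list; p != NULL; p = p->next) {
        if (p->value % 2 != 0) continue;        /* the "certain conditions" */
        copies = new_node(p->value, 1, copies); /* copy plus extra marker */
    }
    return catenate(list, copies);
}

int main(void)
{
    node *list = new_node(1, 0, new_node(2, 0, new_node(3, 0, NULL)));
    node *p;
    list = glom_tagged_copies(list);
    for (p = list; p != NULL; p = p->next)
        printf("%d%s ", p->value, p->tagged ? "*" : "");
    printf("\n");   /* prints: 1 2 3 2* */
    return 0;
}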
data/ext/link_grammar/link-grammar/massage.h
@@ -0,0 +1,13 @@
+/*************************************************************************/
+/* Copyright (c) 2004 */
+/* Daniel Sleator, David Temperley, and John Lafferty */
+/* All rights reserved */
+/* */
+/* Use of the link grammar parsing system is subject to the terms of the */
+/* license set forth in the LICENSE file included with this software, */
+/* and also available at http://www.link.cs.cmu.edu/link/license.html */
+/* This license allows free redistribution and use in source and binary */
+/* forms, with or without modification, subject to certain conditions. */
+/* */
+/*************************************************************************/
+void install_special_conjunctive_connectors(Sentence sent);
data/ext/link_grammar/link-grammar/post-process.c
@@ -0,0 +1,1113 @@
+/*************************************************************************/
+/* Copyright (c) 2004 */
+/* Daniel Sleator, David Temperley, and John Lafferty */
+/* All rights reserved */
+/* */
+/* Use of the link grammar parsing system is subject to the terms of the */
+/* license set forth in the LICENSE file included with this software, */
+/* and also available at http://www.link.cs.cmu.edu/link/license.html */
+/* This license allows free redistribution and use in source and binary */
+/* forms, with or without modification, subject to certain conditions. */
+/* */
+/*************************************************************************/
+
+/* see bottom of file for comments on post processing */
+
+#include <stdarg.h>
+#include <memory.h>
+#include "api.h"
+#include "error.h"
+
+#define PP_MAX_DOMAINS 128
+
+/***************** utility routines (not exported) ***********************/
+
+static int string_in_list(const char * s, const char * a[])
+{
+    /* returns FALSE if the string s does not match anything in
+       the array. The array elements are post-processing symbols */
+    int i;
+    for (i=0; a[i] != NULL; i++)
+        if (post_process_match(a[i], s)) return TRUE;
+    return FALSE;
+}
+
+static int find_domain_name(Postprocessor *pp, const char *link)
+{
+    /* Return the name of the domain associated with the provided starting
+       link. Return -1 if link isn't associated with a domain. */
+    int i,domain;
+    StartingLinkAndDomain *sllt = pp->knowledge->starting_link_lookup_table;
+    for (i=0;;i++)
+    {
+        domain = sllt[i].domain;
+        if (domain==-1) return -1; /* hit the end-of-list sentinel */
+        if (post_process_match(sllt[i].starting_link, link)) return domain;
+    }
+}
+
+static int contained_in(Domain * d1, Domain * d2, Sublinkage *sublinkage)
+{
+    /* returns TRUE if domain d1 is contained in domain d2 */
+    char mark[MAX_LINKS];
+    List_o_links * lol;
+    memset(mark, 0, sublinkage->num_links*(sizeof mark[0]));
+    for (lol=d2->lol; lol != NULL; lol = lol->next)
+        mark[lol->link] = TRUE;
+    for (lol=d1->lol; lol != NULL; lol = lol->next)
+        if (!mark[lol->link]) return FALSE;
+    return TRUE;
+}
+
+static int link_in_domain(int link, Domain * d)
+{
+    /* returns the predicate "the given link is in the given domain" */
+    List_o_links * lol;
+    for (lol = d->lol; lol != NULL; lol = lol->next)
+        if (lol->link == link) return TRUE;
+    return FALSE;
+}
+
+/* #define CHECK_DOMAIN_NESTING */
+
+#if defined(CHECK_DOMAIN_NESTING)
+/* Although this is no longer used, I'm leaving the code here for future reference --DS 3/98 */
+
+static int check_domain_nesting(Postprocessor *pp, int num_links)
+{
+    /* returns TRUE if the domains actually form a properly nested structure */
+    Domain * d1, * d2;
+    int counts[4];
+    char mark[MAX_LINKS];
+    List_o_links * lol;
+    int i;
+    for (d1=pp->pp_data.domain_array; d1 < pp->pp_data.domain_array + pp->pp_data.N_domains; d1++) {
+        for (d2=d1+1; d2 < pp->pp_data.domain_array + pp->pp_data.N_domains; d2++) {
+            memset(mark, 0, num_links*(sizeof mark[0]));
+            for (lol=d2->lol; lol != NULL; lol = lol->next) {
+                mark[lol->link] = 1;
+            }
+            for (lol=d1->lol; lol != NULL; lol = lol->next) {
+                mark[lol->link] += 2;
+            }
+            counts[0] = counts[1] = counts[2] = counts[3] = 0;
+            for (i=0; i<num_links; i++)
+                counts[(int)mark[i]]++;/* (int) cast avoids compiler warning DS 7/97 */
+            if ((counts[1] > 0) && (counts[2] > 0) && (counts[3] > 0))
+                return FALSE;
+        }
+    }
+    return TRUE;
+}
+#endif
+
+/**
+ * Free the list of links pointed to by lol
+ * (does not free any strings)
+ */
+static void free_List_o_links(List_o_links *lol)
+{
+    List_o_links * xlol;
+    while(lol != NULL) {
+        xlol = lol->next;
+        xfree(lol, sizeof(List_o_links));
+        lol = xlol;
+    }
+}
+
+static void free_D_tree_leaves(DTreeLeaf *dtl)
+{
+    DTreeLeaf * xdtl;
+    while(dtl != NULL) {
+        xdtl = dtl->next;
+        xfree(dtl, sizeof(DTreeLeaf));
+        dtl = xdtl;
+    }
+}
+
+/**
+ * Gets called after every invocation of post_process()
+ */
+void post_process_free_data(PP_data * ppd)
+{
+    int w, d;
+    for (w = 0; w < ppd->length; w++)
+    {
+        free_List_o_links(ppd->word_links[w]);
+        ppd->word_links[w] = NULL;
+    }
+    for (d = 0; d < ppd->N_domains; d++)
+    {
+        free_List_o_links(ppd->domain_array[d].lol);
+        ppd->domain_array[d].lol = NULL;
+        free_D_tree_leaves(ppd->domain_array[d].child);
+        ppd->domain_array[d].child = NULL;
+    }
+    free_List_o_links(ppd->links_to_ignore);
+    ppd->links_to_ignore = NULL;
+}
+
+#ifdef THIS_FUNCTION_IS_NOT_CURRENTLY_USED
+static void connectivity_dfs(Postprocessor *pp, Sublinkage *sublinkage,
+                             int w, pp_linkset *ls)
+{
+    List_o_links *lol;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next)
+    {
+        if (!pp->visited[lol->word] &&
+            !pp_linkset_match(ls, sublinkage->link[lol->link]->name))
+            connectivity_dfs(pp, sublinkage, lol->word, ls);
+    }
+}
+#endif /* THIS_FUNCTION_IS_NOT_CURRENTLY_USED */
+
+static void mark_reachable_words(Postprocessor *pp, int w)
+{
+    List_o_links *lol;
+    if (pp->visited[w]) return;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next)
+        mark_reachable_words(pp, lol->word);
+}
+
+static int is_connected(Postprocessor *pp)
+{
+    /* Returns true if the linkage is connected, considering words
+       that have at least one edge....this allows conjunctive sentences
+       not to be thrown out. */
+    int i;
+    for (i=0; i<pp->pp_data.length; i++)
+        pp->visited[i] = (pp->pp_data.word_links[i] == NULL);
+    mark_reachable_words(pp, 0);
+    for (i=0; i<pp->pp_data.length; i++)
+        if (!pp->visited[i]) return FALSE;
+    return TRUE;
+}
+
+
+static void build_type_array(Postprocessor *pp)
+{
+    D_type_list * dtl;
+    int d;
+    List_o_links * lol;
+    for (d=0; d<pp->pp_data.N_domains; d++)
+    {
+        for (lol=pp->pp_data.domain_array[d].lol; lol != NULL; lol = lol->next)
+        {
+            dtl = (D_type_list *) xalloc(sizeof(D_type_list));
+            dtl->next = pp->pp_node->d_type_array[lol->link];
+            pp->pp_node->d_type_array[lol->link] = dtl;
+            dtl->type = pp->pp_data.domain_array[d].type;
+        }
+    }
+}
+
+void free_d_type(D_type_list * dtl)
+{
+    D_type_list * dtlx;
+    for (; dtl!=NULL; dtl=dtlx) {
+        dtlx = dtl->next;
+        xfree((void*) dtl, sizeof(D_type_list));
+    }
+}
+
+D_type_list * copy_d_type(D_type_list * dtl)
+{
+    D_type_list *dtlx, *dtlcurr=NULL, *dtlhead=NULL;
+    for (; dtl!=NULL; dtl=dtl->next)
+    {
+        dtlx = (D_type_list *) xalloc(sizeof(D_type_list));
+        *dtlx = *dtl;
+        if (dtlhead == NULL)
+        {
+            dtlhead = dtlx;
+            dtlcurr = dtlx;
+        }
+        else
+        {
+            dtlcurr->next = dtlx;
+            dtlcurr = dtlx;
+        }
+    }
+    return dtlhead;
+}
+
+/** free the pp node from last time */
+static void free_pp_node(Postprocessor *pp)
+{
+    int i;
+    PP_node *ppn = pp->pp_node;
+    pp->pp_node = NULL;
+    if (ppn == NULL) return;
+
+    for (i=0; i<MAX_LINKS; i++)
+    {
+        free_d_type(ppn->d_type_array[i]);
+    }
+    xfree((void*) ppn, sizeof(PP_node));
+}
+
+
+/** set up a fresh pp_node for later use */
+static void alloc_pp_node(Postprocessor *pp)
+{
+    int i;
+    pp->pp_node=(PP_node *) xalloc(sizeof(PP_node));
+    pp->pp_node->violation = NULL;
+    for (i=0; i<MAX_LINKS; i++)
+        pp->pp_node->d_type_array[i] = NULL;
+}
+
+static void reset_pp_node(Postprocessor *pp)
+{
+    free_pp_node(pp);
+    alloc_pp_node(pp);
+}
+
+/************************ rule application *******************************/
+
+static int apply_rules(Postprocessor *pp,
+                       int (applyfn) (Postprocessor *,Sublinkage *,pp_rule *),
+                       Sublinkage *sublinkage,
+                       pp_rule *rule_array,
+                       const char **msg)
+{
+    int i;
+    for (i=0; (*msg=rule_array[i].msg)!=NULL; i++)
+        if (!applyfn(pp, sublinkage, &(rule_array[i]))) return 0;
+    return 1;
+}
+
+static int
+apply_relevant_rules(Postprocessor *pp,
+                     int(applyfn)(Postprocessor *pp,Sublinkage*,pp_rule *rule),
+                     Sublinkage *sublinkage,
+                     pp_rule *rule_array,
+                     int *relevant_rules,
+                     const char **msg)
+{
+    int i, idx;
+
+    /* if we didn't accumulate link names for this sentence, we need to apply
+       all rules */
+    if (pp_linkset_population(pp->set_of_links_of_sentence)==0) {
+        return apply_rules(pp, applyfn, sublinkage, rule_array, msg);
+    }
+
+    /* we did, and we don't */
+    for (i=0; (idx=relevant_rules[i])!=-1; i++) {
+        *msg = rule_array[idx].msg; /* Adam had forgotten this -- DS 4/9/98 */
+        if (!applyfn(pp, sublinkage, &(rule_array[idx]))) return 0;
+    }
+    return 1;
+}
+
+static int
+apply_contains_one(Postprocessor *pp, Sublinkage *sublinkage, pp_rule *rule)
+{
+    /* returns TRUE if and only if all groups containing the specified link
+       contain at least one from the required list. (as determined by exact
+       string matching) */
+    DTreeLeaf * dtl;
+    int d, count;
+    for (d=0; d<pp->pp_data.N_domains; d++)
+    {
+        for (dtl = pp->pp_data.domain_array[d].child;
+             dtl != NULL &&
+             !post_process_match(rule->selector,
+                                 sublinkage->link[dtl->link]->name);
+             dtl = dtl->next) {}
+        if (dtl != NULL)
+        {
+            /* selector link of rule appears in this domain */
+            count=0;
+            for (dtl = pp->pp_data.domain_array[d].child; dtl != NULL; dtl = dtl->next)
+                if (string_in_list(sublinkage->link[dtl->link]->name,
+                                   rule->link_array))
+                {
+                    count=1;
+                    break;
+                }
+            if (count == 0) return FALSE;
+        }
+    }
+    return TRUE;
+}
+
+
+static int
+apply_contains_none(Postprocessor *pp,Sublinkage *sublinkage,pp_rule *rule)
+{
+    /* returns TRUE if and only if:
+       all groups containing the selector link do not contain anything
+       from the link_array contained in the rule. Uses exact string matching. */
+    DTreeLeaf * dtl;
+    int d;
+    for (d=0; d<pp->pp_data.N_domains; d++)
+    {
+        for (dtl = pp->pp_data.domain_array[d].child;
+             dtl != NULL &&
+             !post_process_match(rule->selector,
+                                 sublinkage->link[dtl->link]->name);
+             dtl = dtl->next) {}
+        if (dtl != NULL)
+        {
+            /* selector link of rule appears in this domain */
+            for (dtl = pp->pp_data.domain_array[d].child; dtl != NULL; dtl = dtl->next)
+                if (string_in_list(sublinkage->link[dtl->link]->name,
+                                   rule->link_array))
+                    return FALSE;
+        }
+    }
+    return TRUE;
+}
+
+static int
+apply_contains_one_globally(Postprocessor *pp,Sublinkage *sublinkage,pp_rule *rule)
+{
+    /* returns TRUE if and only if
+       (1) the sentence doesn't contain the selector link for the rule, or
+       (2) it does, and it also contains one or more from the rule's link set */
+
+    int i,j,count;
+    for (i=0; i<sublinkage->num_links; i++) {
+        if (sublinkage->link[i]->l == -1) continue;
+        if (post_process_match(rule->selector,sublinkage->link[i]->name)) break;
+    }
+    if (i==sublinkage->num_links) return TRUE;
+
+    /* selector link of rule appears in sentence */
+    count=0;
+    for (j=0; j<sublinkage->num_links && count==0; j++) {
+        if (sublinkage->link[j]->l == -1) continue;
+        if (string_in_list(sublinkage->link[j]->name, rule->link_array))
+        {
+            count=1;
+            break;
+        }
+    }
+    if (count==0) return FALSE; else return TRUE;
+}
+
+static int
+apply_connected(Postprocessor *pp, Sublinkage *sublinkage, pp_rule *rule)
+{
+    /* There is actually just one (or none, if user didn't specify it)
+       rule asserting that linkage is connected. */
+    if (!is_connected(pp)) return 0;
+    return 1;
+}
+
+#if 0
+/* replaced in 3/98 with a slightly different algorithm shown below ---DS*/
+static int
+apply_connected_without(Postprocessor *pp,Sublinkage *sublinkage,pp_rule *rule)
+{
+    /* Returns true if the linkage is connected when ignoring the links
+       whose names are in the given list of link names.
+       Actually, what it does is this: it returns FALSE if the connectivity
+       of the subgraph reachable from word 0 changes as a result of deleting
+       these links. */
+    int i;
+    memset(pp->visited, 0, pp->pp_data.length*(sizeof pp->visited[0]));
+    mark_reachable_words(pp, 0);
+    for (i=0; i<pp->pp_data.length; i++)
+        pp->visited[i] = !pp->visited[i];
+    connectivity_dfs(pp, sublinkage, 0, rule->link_set);
+    for (i=0; i<pp->pp_data.length; i++)
+        if (pp->visited[i] == FALSE) return FALSE;
+    return TRUE;
+}
+#else
+
+/* Here's the new algorithm: For each link in the linkage that is in the
+   must_form_a_cycle list, we want to make sure that that link
+   is in a cycle. We do this simply by deleting the link, then seeing if the
+   end points of that link are still connected.
+*/
+
+static void reachable_without_dfs(Postprocessor *pp, Sublinkage *sublinkage, int a, int b, int w) {
+    /* This is a depth first search of words reachable from w, excluding any direct edge
+       between word a and word b. */
+    List_o_links *lol;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if (!pp->visited[lol->word] && !(w == a && lol->word == b) && ! (w == b && lol->word == a)) {
+            reachable_without_dfs(pp, sublinkage, a, b, lol->word);
+        }
+    }
+}
+
+/**
+ * Returns TRUE if the linkage is connected when ignoring the links
+ * whose names are in the given list of link names.
+ * Actually, what it does is this: it returns FALSE if the connectivity
+ * of the subgraph reachable from word 0 changes as a result of deleting
+ * these links.
+ */
+static int
+apply_must_form_a_cycle(Postprocessor *pp,Sublinkage *sublinkage,pp_rule *rule)
+{
+    List_o_links *lol;
+    int w;
+    for (w=0; w<pp->pp_data.length; w++) {
+        for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+            if (w > lol->word) continue; /* only consider each edge once */
+            if (!pp_linkset_match(rule->link_set, sublinkage->link[lol->link]->name)) continue;
+            memset(pp->visited, 0, pp->pp_data.length*(sizeof pp->visited[0]));
+            reachable_without_dfs(pp, sublinkage, w, lol->word, w);
+            if (!pp->visited[lol->word]) return FALSE;
+        }
+    }
+
+    for (lol = pp->pp_data.links_to_ignore; lol != NULL; lol = lol->next) {
+        w = sublinkage->link[lol->link]->l;
+        /* (w, lol->word) are the left and right ends of the edge we're considering */
+        if (!pp_linkset_match(rule->link_set, sublinkage->link[lol->link]->name)) continue;
+        memset(pp->visited, 0, pp->pp_data.length*(sizeof pp->visited[0]));
+        reachable_without_dfs(pp, sublinkage, w, lol->word, w);
+        if (!pp->visited[lol->word]) return FALSE;
+    }
+
+    return TRUE;
+}
+
+#endif
+
+static int
+apply_bounded(Postprocessor *pp,Sublinkage *sublinkage,pp_rule *rule)
+{
+    /* Checks to see that all domains with this name have the property that
+       all of the words that touch a link in the domain are not to the left
+       of the root word of the domain. */
+    int d, lw, d_type;
+    List_o_links * lol;
+    d_type = rule->domain;
+    for (d=0; d<pp->pp_data.N_domains; d++) {
+        if (pp->pp_data.domain_array[d].type != d_type) continue;
+        lw = sublinkage->link[pp->pp_data.domain_array[d].start_link]->l;
+        for (lol = pp->pp_data.domain_array[d].lol; lol != NULL; lol = lol->next) {
+            if (sublinkage->link[lol->link]->l < lw) return FALSE;
+        }
+    }
+    return TRUE;
+}
+
+/********************* various non-exported functions ***********************/
+
+static void build_graph(Postprocessor *pp, Sublinkage *sublinkage)
+{
+    /* fill in the pp->pp_data.word_links array with a list of words neighboring each
+       word (actually a list of links). The dir fields are not set, since this
+       (after fat-link-extraction) is an undirected graph. */
+    int i, link;
+    List_o_links * lol;
+
+    for (i=0; i<pp->pp_data.length; i++)
+        pp->pp_data.word_links[i] = NULL;
+
+    for (link=0; link<sublinkage->num_links; link++)
+    {
+        if (sublinkage->link[link]->l == -1) continue;
+        if (pp_linkset_match(pp->knowledge->ignore_these_links, sublinkage->link[link]->name)) {
+            lol = (List_o_links *) xalloc(sizeof(List_o_links));
+            lol->next = pp->pp_data.links_to_ignore;
+            pp->pp_data.links_to_ignore = lol;
+            lol->link = link;
+            lol->word = sublinkage->link[link]->r;
+            continue;
+        }
+
+        lol = (List_o_links *) xalloc(sizeof(List_o_links));
+        lol->next = pp->pp_data.word_links[sublinkage->link[link]->l];
+        pp->pp_data.word_links[sublinkage->link[link]->l] = lol;
+        lol->link = link;
+        lol->word = sublinkage->link[link]->r;
+
+        lol = (List_o_links *) xalloc(sizeof(List_o_links));
+        lol->next = pp->pp_data.word_links[sublinkage->link[link]->r];
+        pp->pp_data.word_links[sublinkage->link[link]->r] = lol;
+        lol->link = link;
+        lol->word = sublinkage->link[link]->l;
+    }
+}
+
+static void setup_domain_array(Postprocessor *pp,
+                               int n, const char *string, int start_link)
+{
+    /* set pp->visited[i] to FALSE */
+    memset(pp->visited, 0, pp->pp_data.length*(sizeof pp->visited[0]));
+    pp->pp_data.domain_array[n].string = string;
+    pp->pp_data.domain_array[n].lol = NULL;
+    pp->pp_data.domain_array[n].size = 0;
+    pp->pp_data.domain_array[n].start_link = start_link;
+}
+
+static void add_link_to_domain(Postprocessor *pp, int link)
+{
+    List_o_links *lol;
+    lol = (List_o_links *) xalloc(sizeof(List_o_links));
+    lol->next = pp->pp_data.domain_array[pp->pp_data.N_domains].lol;
+    pp->pp_data.domain_array[pp->pp_data.N_domains].lol = lol;
+    pp->pp_data.domain_array[pp->pp_data.N_domains].size++;
+    lol->link = link;
+}
+
+static void depth_first_search(Postprocessor *pp, Sublinkage *sublinkage,
+                               int w, int root,int start_link)
+{
+    List_o_links *lol;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if (lol->word < w && lol->link != start_link) {
+            add_link_to_domain(pp, lol->link);
+        }
+    }
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if (!pp->visited[lol->word] && (lol->word != root) &&
+            !(lol->word < root && lol->word < w &&
+              pp_linkset_match(pp->knowledge->restricted_links,
+                               sublinkage->link[lol->link]->name)))
+            depth_first_search(pp, sublinkage, lol->word, root, start_link);
+    }
+}
+
+static void bad_depth_first_search(Postprocessor *pp, Sublinkage *sublinkage,
+                                   int w, int root, int start_link)
+{
+    List_o_links * lol;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if ((lol->word < w) && (lol->link != start_link) && (w != root)) {
+            add_link_to_domain(pp, lol->link);
+        }
+    }
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if ((!pp->visited[lol->word]) && !(w == root && lol->word < w) &&
+            !(lol->word < root && lol->word < w &&
+              pp_linkset_match(pp->knowledge->restricted_links,
+                               sublinkage->link[lol->link]->name)))
+            bad_depth_first_search(pp, sublinkage, lol->word, root, start_link);
+    }
+}
+
+static void d_depth_first_search(Postprocessor *pp, Sublinkage *sublinkage,
+                                 int w, int root, int right, int start_link)
+{
+    List_o_links * lol;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if ((lol->word < w) && (lol->link != start_link) && (w != root)) {
+            add_link_to_domain(pp, lol->link);
+        }
+    }
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if (!pp->visited[lol->word] && !(w == root && lol->word >= right) &&
+            !(w == root && lol->word < root) &&
+            !(lol->word < root && lol->word < w &&
+              pp_linkset_match(pp->knowledge->restricted_links,
+                               sublinkage->link[lol->link]->name)))
+            d_depth_first_search(pp,sublinkage,lol->word,root,right,start_link);
+    }
+}
+
+static void left_depth_first_search(Postprocessor *pp, Sublinkage *sublinkage,
+                                    int w, int right,int start_link)
+{
+    List_o_links *lol;
+    pp->visited[w] = TRUE;
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if (lol->word < w && lol->link != start_link) {
+            add_link_to_domain(pp, lol->link);
+        }
+    }
+    for (lol = pp->pp_data.word_links[w]; lol != NULL; lol = lol->next) {
+        if (!pp->visited[lol->word] && (lol->word != right))
+            depth_first_search(pp, sublinkage, lol->word, right, start_link);
+    }
+}
+
+static int domain_compare(const Domain * d1, const Domain * d2)
+{ return (d1->size-d2->size); /* for sorting the domains by size */ }
+
+static void build_domains(Postprocessor *pp, Sublinkage *sublinkage)
+{
+    int link, i, d;
+    const char *s;
+    pp->pp_data.N_domains = 0;
+
+    for (link = 0; link<sublinkage->num_links; link++) {
+        if (sublinkage->link[link]->l == -1) continue;
+        s = sublinkage->link[link]->name;
+
+        if (pp_linkset_match(pp->knowledge->ignore_these_links, s)) continue;
+        if (pp_linkset_match(pp->knowledge->domain_starter_links, s))
+        {
+            setup_domain_array(pp, pp->pp_data.N_domains, s, link);
+            if (pp_linkset_match(pp->knowledge->domain_contains_links, s))
+                add_link_to_domain(pp, link);
+            depth_first_search(pp,sublinkage,sublinkage->link[link]->r,
+                               sublinkage->link[link]->l, link);
+
+            pp->pp_data.N_domains++;
+            assert(pp->pp_data.N_domains<PP_MAX_DOMAINS, "raise value of PP_MAX_DOMAINS");
+        }
+        else {
+            if (pp_linkset_match(pp->knowledge->urfl_domain_starter_links,s))
+            {
+                setup_domain_array(pp, pp->pp_data.N_domains, s, link);
+                /* always add the starter link to its urfl domain */
+                add_link_to_domain(pp, link);
+                bad_depth_first_search(pp,sublinkage,sublinkage->link[link]->r,
+                                       sublinkage->link[link]->l,link);
+                pp->pp_data.N_domains++;
+                assert(pp->pp_data.N_domains<PP_MAX_DOMAINS,"raise PP_MAX_DOMAINS value");
+            }
+            else
+            if (pp_linkset_match(pp->knowledge->urfl_only_domain_starter_links,s))
+            {
+                setup_domain_array(pp, pp->pp_data.N_domains, s, link);
+                /* do not add the starter link to its urfl_only domain */
+                d_depth_first_search(pp,sublinkage, sublinkage->link[link]->l,
+                                     sublinkage->link[link]->l,
+                                     sublinkage->link[link]->r,link);
+                pp->pp_data.N_domains++;
+                assert(pp->pp_data.N_domains<PP_MAX_DOMAINS,"raise PP_MAX_DOMAINS value");
+            }
+            else
+            if (pp_linkset_match(pp->knowledge->left_domain_starter_links,s))
+            {
+                setup_domain_array(pp, pp->pp_data.N_domains, s, link);
+                /* do not add the starter link to a left domain */
+                left_depth_first_search(pp,sublinkage, sublinkage->link[link]->l,
+                                        sublinkage->link[link]->r,link);
+                pp->pp_data.N_domains++;
+                assert(pp->pp_data.N_domains<PP_MAX_DOMAINS,"raise PP_MAX_DOMAINS value");
+            }
+        }
+    }
+
+    /* sort the domains by size */
+    qsort((void *) pp->pp_data.domain_array,
+          pp->pp_data.N_domains,
+          sizeof(Domain),
+          (int (*)(const void *, const void *)) domain_compare);
+
+    /* sanity check: all links in all domains have a legal domain name */
+    for (d=0; d<pp->pp_data.N_domains; d++) {
+        i = find_domain_name(pp, pp->pp_data.domain_array[d].string);
+        if (i == -1)
+            prt_error("Error: post_process(): Need an entry for %s in LINK_TYPE_TABLE",
702
|
+
pp->pp_data.domain_array[d].string);
|
703
|
+
pp->pp_data.domain_array[d].type = i;
|
704
|
+
}
|
705
|
+
}
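The qsort() call above casts domain_compare, whose parameters are Domain pointers, to the generic comparison type that qsort() expects; calling through an incompatible function-pointer type is formally undefined in ISO C, even though it usually works. A more portable idiom keeps the comparator's signature generic and casts inside the body. The stand-alone sketch below illustrates that pattern with a made-up ToyDomain struct (it is not the library's Domain type):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct that is sorted by a "size" field. */
typedef struct { const char *string; int size; } ToyDomain;

/* Comparator with the exact signature qsort() expects; the casts happen
 * inside, so no function-pointer cast is needed at the call site. */
static int toy_domain_compare(const void *a, const void *b)
{
  const ToyDomain *d1 = (const ToyDomain *) a;
  const ToyDomain *d2 = (const ToyDomain *) b;
  return d1->size - d2->size;   /* ascending by size, as in build_domains() */
}

int main(void)
{
  ToyDomain domains[] = { {"m", 7}, {"s", 2}, {"r", 4} };
  size_t n = sizeof(domains) / sizeof(domains[0]);
  size_t i;

  qsort(domains, n, sizeof(ToyDomain), toy_domain_compare);

  for (i = 0; i < n; i++)
    printf("%s (size %d)\n", domains[i].string, domains[i].size);
  return 0;
}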
+
+static void build_domain_forest(Postprocessor *pp, Sublinkage *sublinkage)
+{
+  int d, d1, link;
+  DTreeLeaf * dtl;
+  if (pp->pp_data.N_domains > 0)
+    pp->pp_data.domain_array[pp->pp_data.N_domains-1].parent = NULL;
+  for (d = 0; d < pp->pp_data.N_domains-1; d++) {
+    for (d1 = d+1; d1 < pp->pp_data.N_domains; d1++) {
+      if (contained_in(&pp->pp_data.domain_array[d], &pp->pp_data.domain_array[d1], sublinkage))
+      {
+        pp->pp_data.domain_array[d].parent = &pp->pp_data.domain_array[d1];
+        break;
+      }
+    }
+    if (d1 == pp->pp_data.N_domains) {
+      /* we know this domain is a root of a new tree */
+      pp->pp_data.domain_array[d].parent = NULL;
+      /* It's now ok for this to happen. It used to do:
+         printf("I can't find a parent domain for this domain\n");
+         print_domain(d);
+         exit(1); */
+    }
+  }
+  /* the parent links of domain nodes have been established.
+     now do the leaves */
+  for (d = 0; d < pp->pp_data.N_domains; d++) {
+    pp->pp_data.domain_array[d].child = NULL;
+  }
+  for (link = 0; link < sublinkage->num_links; link++) {
+    if (sublinkage->link[link]->l == -1) continue; /* probably not necessary */
+    for (d = 0; d < pp->pp_data.N_domains; d++) {
+      if (link_in_domain(link, &pp->pp_data.domain_array[d])) {
+        dtl = (DTreeLeaf *) xalloc(sizeof(DTreeLeaf));
+        dtl->link = link;
+        dtl->parent = &pp->pp_data.domain_array[d];
+        dtl->next = pp->pp_data.domain_array[d].child;
+        pp->pp_data.domain_array[d].child = dtl;
+        break;
+      }
+    }
+  }
+}
+
+static int
+internal_process(Postprocessor *pp, Sublinkage *sublinkage, const char **msg)
+{
+  int i;
+  /* quick test: try applying just the relevant global rules */
+  if (!apply_relevant_rules(pp, apply_contains_one_globally,
+                            sublinkage,
+                            pp->knowledge->contains_one_rules,
+                            pp->relevant_contains_one_rules, msg)) {
+    for (i = 0; i < pp->pp_data.length; i++)
+      pp->pp_data.word_links[i] = NULL;
+    pp->pp_data.N_domains = 0;
+    return -1;
+  }
+
+  /* build graph; confirm that it's legally connected */
+  build_graph(pp, sublinkage);
+  build_domains(pp, sublinkage);
+  build_domain_forest(pp, sublinkage);
+
+#if defined(CHECK_DOMAIN_NESTING)
+  /* These messages were deemed to not be useful, so
+     this code is commented out. See comment above. */
+  if (!check_domain_nesting(pp, sublinkage->num_links))
+    printf("WARNING: The domains are not nested.\n");
+#endif
+
+  /* The order below should be optimal for most cases */
+  if (!apply_relevant_rules(pp, apply_contains_one, sublinkage,
+                            pp->knowledge->contains_one_rules,
+                            pp->relevant_contains_one_rules, msg)) return 1;
+  if (!apply_relevant_rules(pp, apply_contains_none, sublinkage,
+                            pp->knowledge->contains_none_rules,
+                            pp->relevant_contains_none_rules, msg)) return 1;
+  if (!apply_rules(pp, apply_must_form_a_cycle, sublinkage,
+                   pp->knowledge->form_a_cycle_rules, msg)) return 1;
+  if (!apply_rules(pp, apply_connected, sublinkage,
+                   pp->knowledge->connected_rules, msg)) return 1;
+  if (!apply_rules(pp, apply_bounded, sublinkage,
+                   pp->knowledge->bounded_rules, msg)) return 1;
+  return 0; /* This linkage satisfied all the rules */
+}
+
+
+/**
+ * Call this (a) after having called post_process_scan_linkage() on all
+ * generated linkages, but (b) before calling post_process() on any
+ * particular linkage. Here we mark all rules which we know (from having
+ * accumulated a set of link names appearing in *any* linkage) won't
+ * ever be needed.
+ */
+static void prune_irrelevant_rules(Postprocessor *pp)
+{
+  pp_rule *rule;
+  int coIDX, cnIDX, rcoIDX = 0, rcnIDX = 0;
+
+  /* If we didn't scan any linkages, there's no pruning to be done. */
+  if (pp_linkset_population(pp->set_of_links_of_sentence) == 0) return;
+
+  for (coIDX = 0; ; coIDX++)
+  {
+    rule = &(pp->knowledge->contains_one_rules[coIDX]);
+    if (rule->msg == NULL) break;
+    if (pp_linkset_match_bw(pp->set_of_links_of_sentence, rule->selector))
+    {
+      /* mark rule as being relevant to this sentence */
+      pp->relevant_contains_one_rules[rcoIDX++] = coIDX;
+      pp_linkset_add(pp->set_of_links_in_an_active_rule, rule->selector);
+    }
+  }
+  pp->relevant_contains_one_rules[rcoIDX] = -1; /* end sentinel */
+
+  for (cnIDX = 0; ; cnIDX++)
+  {
+    rule = &(pp->knowledge->contains_none_rules[cnIDX]);
+    if (rule->msg == NULL) break;
+    if (pp_linkset_match_bw(pp->set_of_links_of_sentence, rule->selector))
+    {
+      pp->relevant_contains_none_rules[rcnIDX++] = cnIDX;
+      pp_linkset_add(pp->set_of_links_in_an_active_rule, rule->selector);
+    }
+  }
+  pp->relevant_contains_none_rules[rcnIDX] = -1;
+
+  if (verbosity > 1)
+  {
+    printf("Saw %i unique link names in all linkages.\n",
+           pp_linkset_population(pp->set_of_links_of_sentence));
+    printf("Using %i 'contains one' rules and %i 'contains none' rules\n",
+           rcoIDX, rcnIDX);
+  }
+}
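The relevant_contains_one_rules and relevant_contains_none_rules arrays built above are lists of rule indices terminated by a -1 sentinel; presumably apply_relevant_rules() walks them in exactly that fashion. As a hedge, here is a minimal stand-alone sketch of consuming such a sentinel-terminated index list (the rule table and names are made up for illustration):

#include <stdio.h>

/* Toy rule table; in the real code this would be pp->knowledge->contains_one_rules. */
static const char *toy_rule_msgs[] = { "rule A", "rule B", "rule C", "rule D" };

/* Walk a -1-terminated list of indices into the rule table, mirroring
 * the sentinel convention used by prune_irrelevant_rules(). */
static void apply_marked_rules(const int *relevant)
{
  int i;
  for (i = 0; relevant[i] != -1; i++)
    printf("applying %s\n", toy_rule_msgs[relevant[i]]);
}

int main(void)
{
  int relevant[] = { 0, 2, -1 };   /* only rules 0 and 2 survived pruning */
  apply_marked_rules(relevant);
  return 0;
}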
+
+
+/***************** definitions of exported functions ***********************/
+
+/**
+ * read rules from path and initialize the appropriate fields in
+ * a postprocessor structure, a pointer to which is returned.
+ */
+Postprocessor * post_process_open(const char *path)
+{
+  Postprocessor *pp;
+  if (path == NULL) return NULL;
+
+  pp = (Postprocessor *) xalloc(sizeof(Postprocessor));
+  pp->knowledge = pp_knowledge_open(path);
+  pp->sentence_link_name_set = string_set_create();
+  pp->set_of_links_of_sentence = pp_linkset_open(1024);
+  pp->set_of_links_in_an_active_rule = pp_linkset_open(1024);
+  pp->relevant_contains_one_rules =
+    (int *) xalloc((pp->knowledge->n_contains_one_rules + 1)
+                   * (sizeof pp->relevant_contains_one_rules[0]));
+  pp->relevant_contains_none_rules =
+    (int *) xalloc((pp->knowledge->n_contains_none_rules + 1)
+                   * (sizeof pp->relevant_contains_none_rules[0]));
+  pp->relevant_contains_one_rules[0] = -1;
+  pp->relevant_contains_none_rules[0] = -1;
+  pp->pp_node = NULL;
+  pp->pp_data.links_to_ignore = NULL;
+  pp->n_local_rules_firing = 0;
+  pp->n_global_rules_firing = 0;
+  return pp;
+}
+
+void post_process_close(Postprocessor *pp)
+{
+  /* frees up memory associated with pp, previously allocated by open */
+  if (pp == NULL) return;
+  string_set_delete(pp->sentence_link_name_set);
+  pp_linkset_close(pp->set_of_links_of_sentence);
+  pp_linkset_close(pp->set_of_links_in_an_active_rule);
+  xfree(pp->relevant_contains_one_rules,
+        (1 + pp->knowledge->n_contains_one_rules)
+        * (sizeof pp->relevant_contains_one_rules[0]));
+  xfree(pp->relevant_contains_none_rules,
+        (1 + pp->knowledge->n_contains_none_rules)
+        * (sizeof pp->relevant_contains_none_rules[0]));
+  pp_knowledge_close(pp->knowledge);
+  free_pp_node(pp);
+  xfree(pp, sizeof(Postprocessor));
+}
+
+void post_process_close_sentence(Postprocessor *pp)
+{
+  if (pp == NULL) return;
+  pp_linkset_clear(pp->set_of_links_of_sentence);
+  pp_linkset_clear(pp->set_of_links_in_an_active_rule);
+  string_set_delete(pp->sentence_link_name_set);
+  pp->sentence_link_name_set = string_set_create();
+  pp->n_local_rules_firing = 0;
+  pp->n_global_rules_firing = 0;
+  pp->relevant_contains_one_rules[0] = -1;
+  pp->relevant_contains_none_rules[0] = -1;
+  free_pp_node(pp);
+}
+
+/**
+ * During a first pass (prior to actual post-processing of the linkages
+ * of a sentence), call this once for every generated linkage. Here we
+ * simply maintain a set of "seen" link names for rule pruning later on
+ */
+void post_process_scan_linkage(Postprocessor *pp, Parse_Options opts,
+                               Sentence sent, Sublinkage *sublinkage)
+{
+  const char *p;
+  int i;
+  if (pp == NULL) return;
+  if (sent->length < opts->twopass_length) return;
+  for (i = 0; i < sublinkage->num_links; i++)
+  {
+    if (sublinkage->link[i]->l == -1) continue;
+    p = string_set_add(sublinkage->link[i]->name, pp->sentence_link_name_set);
+    pp_linkset_add(pp->set_of_links_of_sentence, p);
+  }
+}
+
+/**
+ * Takes a sublinkage and returns:
+ * . for each link, the domain structure of that link
+ * . a list of the violation strings
+ * NB: sublinkage->link[i]->l=-1 means that this connector
+ * is to be ignored
+ */
+PP_node *post_process(Postprocessor *pp, Parse_Options opts,
+                      Sentence sent, Sublinkage *sublinkage, int cleanup)
+{
+  const char *msg;
+
+  if (pp == NULL) return NULL;
+
+  pp->pp_data.links_to_ignore = NULL;
+  pp->pp_data.length = sent->length;
+
+  /* In the name of responsible memory management, we retain a copy of the
+   * returned data structure pp_node as a field in pp, so that we can clear
+   * it out after every call, without relying on the user to do so. */
+  reset_pp_node(pp);
+
+  /* The first time we see a sentence, prune the rules which we won't be
+   * needing during postprocessing the linkages of this sentence */
+  if (sent->q_pruned_rules == FALSE && sent->length >= opts->twopass_length)
+    prune_irrelevant_rules(pp);
+  sent->q_pruned_rules = TRUE;
+
+  switch (internal_process(pp, sublinkage, &msg))
+  {
+  case -1:
+    /* some global test failed even before we had to build the domains */
+    pp->n_global_rules_firing++;
+    pp->pp_node->violation = msg;
+    return pp->pp_node;
+    break;
+  case 1:
+    /* one of the "normal" post processing tests failed */
+    pp->n_local_rules_firing++;
+    pp->pp_node->violation = msg;
+    break;
+  case 0:
+    /* This linkage is legal according to the post processing rules */
+    pp->pp_node->violation = NULL;
+    break;
+  }
+
+  build_type_array(pp);
+  if (cleanup)
+  {
+    post_process_free_data(&pp->pp_data);
+  }
+  return pp->pp_node;
+}
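Taken together with the doc comments above, the exported calls are meant to be used in two passes over a sentence's linkages: scan every linkage first, then post-process each one, then close out the sentence. A hypothetical driver is sketched below; the header name, the linkage array, and the way results are reported are assumptions made for illustration, while the five post_process_* calls and the PP_node violation field come from this file:

/* Hypothetical driver; "post-process.h" and the linkages array are assumptions. */
#include <stdio.h>
#include "post-process.h"

static void check_linkages(const char *knowledge_path, Parse_Options opts,
                           Sentence sent, Sublinkage **linkages, int n_linkages)
{
  int i;
  Postprocessor *pp = post_process_open(knowledge_path);
  if (pp == NULL) return;

  /* Pass 1: record every link name that occurs in any linkage,
   * so that irrelevant rules can be pruned later. */
  for (i = 0; i < n_linkages; i++)
    post_process_scan_linkage(pp, opts, sent, linkages[i]);

  /* Pass 2: apply the post-processing rules to each linkage. */
  for (i = 0; i < n_linkages; i++) {
    PP_node *ppn = post_process(pp, opts, sent, linkages[i], TRUE);
    if (ppn != NULL && ppn->violation != NULL)
      printf("linkage %d rejected: %s\n", i, ppn->violation);
  }

  post_process_close_sentence(pp);
  post_process_close(pp);
}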
+
+/*
+  string comparison in postprocessing. The first parameter is a
+  post-processing symbol. The second one is a connector name from a link. The
+  upper case parts must match. We imagine that the first arg is padded with an
+  infinite sequence of "#" and that the 2nd one is padded with "*". "#"
+  matches anything, but "*" is just like an ordinary char for matching
+  purposes. For efficiency sake there are several different versions of these
+  functions
+*/
+
+int post_process_match(const char *s, const char *t)
+{
+  char c;
+  while (isupper((int)*s) || isupper((int)*t))
+  {
+    if (*s != *t) return FALSE;
+    s++;
+    t++;
+  }
+  while (*s != '\0')
+  {
+    if (*s != '#')
+    {
+      if (*t == '\0') c = '*'; else c = *t;
+      if (*s != c) return FALSE;
+    }
+    s++;
+    if (*t != '\0') t++;
+  }
+  return TRUE;
+}
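To make the padding rule concrete, the stand-alone program below restates the same comparison (it is an illustrative copy of the logic, not the library routine) and shows a few matches and mismatches; the specific link names are made up:

#include <ctype.h>
#include <stdio.h>

/* Local restatement of the padding rule described above, for illustration only. */
static int match_demo(const char *s, const char *t)
{
  char c;
  while (isupper((unsigned char)*s) || isupper((unsigned char)*t)) {
    if (*s != *t) return 0;           /* upper-case prefixes must agree exactly */
    s++; t++;
  }
  while (*s != '\0') {
    if (*s != '#') {                  /* '#' in the rule symbol matches anything */
      c = (*t == '\0') ? '*' : *t;    /* the link name is padded with '*' */
      if (*s != c) return 0;
    }
    s++;
    if (*t != '\0') t++;
  }
  return 1;                           /* rule symbol exhausted: it matches */
}

int main(void)
{
  printf("%d\n", match_demo("S",   "Ss"));    /* 1: implicit '#' padding matches 's'   */
  printf("%d\n", match_demo("Ss#", "Ss*b"));  /* 1: '#' matches '*', padding eats 'b'  */
  printf("%d\n", match_demo("Ss",  "Sp"));    /* 0: lower-case 's' vs 'p' differ       */
  printf("%d\n", match_demo("MV",  "EV"));    /* 0: upper-case parts must be equal     */
  return 0;
}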
+
+/* OLD COMMENTS (OUT OF DATE):
+  This file does the post-processing.
+  The main routine is "post_process()". It uses the link names only,
+  and not the connectors.
+
+  A domain is a set of links. Each domain has a defining link.
+  Only certain types of links serve to define a domain. These
+  parameters are set by the lists of link names in a separate,
+  human-readable file referred to herein as the 'knowledge file.'
+
+  The domains are nested: given two domains, either they're disjoint,
+  or one contains the other, i.e. they're tree structured. The set of links
+  in a domain (but in no smaller domain) is called the "group" of the
+  domain. Data structures are built to store all this stuff.
+  The tree structured property is not mathematically guaranteed by
+  the domain construction algorithm. Davy simply claims that because
+  of how he built the dictionary, the domains will always be so
+  structured. The program checks this and gives an error message
+  if it's violated.
+
+  Define the "root word" of a link (or domain) to be the word at the
+  left end of the link. The other end of the defining link is called
+  the "right word".
+
+  The domain corresponding to a link is defined to be the set of links
+  reachable by starting from the right word, following links and never
+  using the root word or any word to its left.
+
+  There are some minor exceptions to this. The "restricted_links" list
+  names those connectors that, even if they point back before the root word,
+  are included in the domain. Some of the starting links are included
+  in their domain; these are listed in the "domain_contains_links" list.
+
+  Such was the way it was. Now Davy tells me there should be another type
+  of domain that's quite different. Let's call these "urfl" domains.
+  Certain types of connectors start urfl domains. They're listed below.
+  In a urfl domain, the search includes the root word. It does a separate
+  search to find urfl domains.
+
+  Restricted links should work just as they do with ordinary domains. If they
+  come out of the right word, or anything to the right of it (that's
+  in the domain), they should be included but should not be traced
+  further. If they come out of the root word, they should not be
+  included.
+*/
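A worked toy example of the basic rule above may help. The sketch below re-runs the depth_first_search() logic from this file on a made-up five-word linkage, with the restricted-link and domain_contains_links refinements omitted: starting from the right word of the defining link, it collects every leftward link it encounters and never recurses into the root word. The sentence, links, and names are invented for illustration, not parser output:

#include <stdio.h>

#define N_WORDS 5
#define N_LINKS 4

/* Made-up linkage: D(the,dog) S(dog,chased) O(chased,cat) D(the,cat). */
static const char *words[N_WORDS]     = { "the", "dog", "chased", "the", "cat" };
static const int   link_l[N_LINKS]    = { 0, 1, 2, 3 };
static const int   link_r[N_LINKS]    = { 1, 2, 4, 4 };
static const char *link_name[N_LINKS] = { "D", "S", "O", "D" };

static int visited[N_WORDS];
static int in_domain[N_LINKS];

static void dfs(int w, int root, int start_link)
{
  int i, other;
  visited[w] = 1;
  for (i = 0; i < N_LINKS; i++) {          /* pass 1: collect leftward links */
    if (link_l[i] != w && link_r[i] != w) continue;
    other = (link_l[i] == w) ? link_r[i] : link_l[i];
    if (other < w && i != start_link) in_domain[i] = 1;
  }
  for (i = 0; i < N_LINKS; i++) {          /* pass 2: recurse, never into the root */
    if (link_l[i] != w && link_r[i] != w) continue;
    other = (link_l[i] == w) ? link_r[i] : link_l[i];
    if (!visited[other] && other != root) dfs(other, root, start_link);
  }
}

int main(void)
{
  int i, start_link = 2;                   /* the O link: chased -- cat */
  dfs(link_r[start_link], link_l[start_link], start_link);
  for (i = 0; i < N_LINKS; i++)
    if (in_domain[i])
      printf("%s(%s, %s) is in the O domain\n",
             link_name[i], words[link_l[i]], words[link_r[i]]);
  return 0;                                /* prints only: D(the, cat) */
}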
+
+/*
+  I also, unfortunately, want to propose a new type of domain. These
+  would include everything that can be reached from the root word of the
+  link, to the right, that is closer than the right word of the link.
+  (They would not include the link itself.)
+
+  In the following sentence, then, the "Urfl_Only Domain" of the G link
+  would include only the "O" link:
+
+        +-----G----+
+        +---O--+   +-AI+
+        |      |   |   |
+     hitting dogs is fun.a
+
+  In the following sentence it would include the "O", the "TT", the "I",
+  the second "O", and the "A".
+
+        +-----------------G-----------------+
+        +-----TT-----+  +-----O-----+       |
+        +---O---+    +-I+    +---A--+   +-AI+
+        |       |    |  |    |      |   |   |
+     telling people to do stupid things is fun.a
+
+  This would allow us to judge the following:
+
+    kicking dogs bores me
+    *kicking dogs kicks dogs
+    explaining the program is easy
+    *explaining the program is running
+
+  (These are distinctions that I thought we would never be able to make,
+  so I told myself they were semantic rather than syntactic. But with
+  domains, they should be easy.)
+*/
+
+/* Modifications, 6/96 ALB:
+  1) Rules and link sets are relegated to a separate, user-written
+     file(s), herein referred to as the 'knowledge file'
+  2) This information is read by a lexer, in pp_lexer.l (lex code)
+     whose exported routines are all prefixed by 'pp_lexer'
+  3) when postprocessing a sentence, the links of each domain are
+     placed in a set for quick lookup ('contains one' and 'contains none')
+  4) Functions which were never called have been eliminated:
+     link_inhabits(), match_in_list(), group_type_contains(),
+     group_type_contains_one(), group_type_contains_all()
+  5) Some 'one-by-one' initializations have been replaced by faster
+     block memory operations (memset etc.)
+  6) The above comments are correct but incomplete! (1/97)
+  7) observation: the 'contains one' is, empirically, by far the most
+     violated rule, so it should come first in applying the rules.
+
+  Modifications, 9/97 ALB:
+  Deglobalization. Made code consistent with the API.
+*/