wukong 3.0.0.pre → 3.0.0.pre2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (476)
  1. data/.gitignore +46 -33
  2. data/.gitmodules +3 -0
  3. data/.rspec +1 -1
  4. data/.travis.yml +8 -1
  5. data/.yardopts +0 -13
  6. data/Guardfile +4 -6
  7. data/{LICENSE.textile → LICENSE.md} +43 -55
  8. data/README-old.md +422 -0
  9. data/README.md +279 -418
  10. data/Rakefile +21 -5
  11. data/TODO.md +6 -6
  12. data/bin/wu-clean-encoding +31 -0
  13. data/bin/wu-lign +2 -2
  14. data/bin/wu-local +69 -0
  15. data/bin/wu-server +70 -0
  16. data/examples/Gemfile +38 -0
  17. data/examples/README.md +9 -0
  18. data/examples/dataflow/apache_log_line.rb +64 -25
  19. data/examples/dataflow/fibonacci_series.rb +101 -0
  20. data/examples/dataflow/parse_apache_logs.rb +37 -7
  21. data/examples/{dataflow.rb → dataflow/scraper_macro_flow.rb} +0 -0
  22. data/examples/dataflow/simple.rb +4 -4
  23. data/examples/geo.rb +4 -0
  24. data/examples/geo/geo_grids.numbers +0 -0
  25. data/examples/geo/geolocated.rb +331 -0
  26. data/examples/geo/quadtile.rb +69 -0
  27. data/examples/geo/spec/geolocated_spec.rb +247 -0
  28. data/examples/geo/tile_fetcher.rb +77 -0
  29. data/examples/graph/minimum_spanning_tree.rb +61 -61
  30. data/examples/jabberwocky.txt +36 -0
  31. data/examples/models/wikipedia.rb +20 -0
  32. data/examples/munging/Gemfile +8 -0
  33. data/examples/munging/airline_flights/airline.rb +57 -0
  34. data/examples/munging/airline_flights/airline_flights.rake +83 -0
  35. data/{lib/wukong/settings.rb → examples/munging/airline_flights/airplane.rb} +0 -0
  36. data/examples/munging/airline_flights/airport.rb +211 -0
  37. data/examples/munging/airline_flights/airport_id_unification.rb +129 -0
  38. data/examples/munging/airline_flights/airport_ok_chars.rb +4 -0
  39. data/examples/munging/airline_flights/flight.rb +156 -0
  40. data/examples/munging/airline_flights/models.rb +4 -0
  41. data/examples/munging/airline_flights/parse.rb +26 -0
  42. data/examples/munging/airline_flights/reconcile_airports.rb +142 -0
  43. data/examples/munging/airline_flights/route.rb +35 -0
  44. data/examples/munging/airline_flights/tasks.rake +83 -0
  45. data/examples/munging/airline_flights/timezone_fixup.rb +62 -0
  46. data/examples/munging/airline_flights/topcities.rb +167 -0
  47. data/examples/munging/airports/40_wbans.txt +40 -0
  48. data/examples/munging/airports/filter_weather_reports.rb +37 -0
  49. data/examples/munging/airports/join.pig +31 -0
  50. data/examples/munging/airports/to_tsv.rb +33 -0
  51. data/examples/munging/airports/usa_wbans.pig +19 -0
  52. data/examples/munging/airports/usa_wbans.txt +2157 -0
  53. data/examples/munging/airports/wbans.pig +19 -0
  54. data/examples/munging/airports/wbans.txt +2310 -0
  55. data/examples/munging/geo/geo_json.rb +54 -0
  56. data/examples/munging/geo/geo_models.rb +69 -0
  57. data/examples/munging/geo/geonames_models.rb +78 -0
  58. data/examples/munging/geo/iso_codes.rb +172 -0
  59. data/examples/munging/geo/reconcile_countries.rb +124 -0
  60. data/examples/munging/geo/tasks.rake +71 -0
  61. data/examples/munging/rake_helper.rb +62 -0
  62. data/examples/munging/weather/.gitignore +1 -0
  63. data/examples/munging/weather/Gemfile +4 -0
  64. data/examples/munging/weather/Rakefile +28 -0
  65. data/examples/munging/weather/extract_ish.rb +13 -0
  66. data/examples/munging/weather/models/weather.rb +119 -0
  67. data/examples/munging/weather/utils/noaa_downloader.rb +46 -0
  68. data/examples/munging/wikipedia/README.md +34 -0
  69. data/examples/munging/wikipedia/Rakefile +193 -0
  70. data/examples/munging/wikipedia/articles/extract_articles-parsed.rb +79 -0
  71. data/examples/munging/wikipedia/articles/extract_articles-templated.rb +136 -0
  72. data/examples/munging/wikipedia/articles/textualize_articles.rb +54 -0
  73. data/examples/munging/wikipedia/articles/verify_structure.rb +43 -0
  74. data/examples/munging/wikipedia/articles/wp2txt-LICENSE.txt +22 -0
  75. data/examples/munging/wikipedia/articles/wp2txt_article.rb +259 -0
  76. data/examples/munging/wikipedia/articles/wp2txt_utils.rb +452 -0
  77. data/examples/munging/wikipedia/dbpedia/dbpedia_common.rb +4 -0
  78. data/examples/munging/wikipedia/dbpedia/dbpedia_extract_geocoordinates.rb +78 -0
  79. data/examples/munging/wikipedia/dbpedia/extract_links.rb +193 -0
  80. data/examples/munging/wikipedia/dbpedia/sameas_extractor.rb +20 -0
  81. data/examples/munging/wikipedia/n1_subuniverse/n1_nodes.pig +18 -0
  82. data/examples/munging/wikipedia/page_metadata/extract_page_metadata.rb +21 -0
  83. data/examples/munging/wikipedia/page_metadata/extract_page_metadata.rb.old +27 -0
  84. data/examples/munging/wikipedia/pagelinks/augment_pagelinks.pig +29 -0
  85. data/examples/munging/wikipedia/pagelinks/extract_pagelinks.rb +14 -0
  86. data/examples/munging/wikipedia/pagelinks/extract_pagelinks.rb.old +25 -0
  87. data/examples/munging/wikipedia/pagelinks/undirect_pagelinks.pig +29 -0
  88. data/examples/munging/wikipedia/pageviews/augment_pageviews.pig +32 -0
  89. data/examples/munging/wikipedia/pageviews/extract_pageviews.rb +85 -0
  90. data/examples/munging/wikipedia/pig_style_guide.md +25 -0
  91. data/examples/munging/wikipedia/redirects/redirects_page_metadata.pig +19 -0
  92. data/examples/munging/wikipedia/subuniverse/sub_articles.pig +23 -0
  93. data/examples/munging/wikipedia/subuniverse/sub_page_metadata.pig +24 -0
  94. data/examples/munging/wikipedia/subuniverse/sub_pagelinks_from.pig +22 -0
  95. data/examples/munging/wikipedia/subuniverse/sub_pagelinks_into.pig +22 -0
  96. data/examples/munging/wikipedia/subuniverse/sub_pagelinks_within.pig +26 -0
  97. data/examples/munging/wikipedia/subuniverse/sub_pageviews.pig +29 -0
  98. data/examples/munging/wikipedia/subuniverse/sub_undirected_pagelinks_within.pig +24 -0
  99. data/examples/munging/wikipedia/utils/get_namespaces.rb +86 -0
  100. data/examples/munging/wikipedia/utils/munging_utils.rb +68 -0
  101. data/examples/munging/wikipedia/utils/namespaces.json +1 -0
  102. data/examples/rake_helper.rb +85 -0
  103. data/examples/server_logs/geo_ip_mapping/munge_geolite.rb +82 -0
  104. data/examples/server_logs/logline.rb +95 -0
  105. data/examples/server_logs/models.rb +66 -0
  106. data/examples/server_logs/page_counts.pig +48 -0
  107. data/examples/server_logs/server_logs-01-parse-script.rb +13 -0
  108. data/examples/server_logs/server_logs-02-histograms-full.rb +33 -0
  109. data/examples/server_logs/server_logs-02-histograms-mapper.rb +14 -0
  110. data/{old/examples/server_logs/breadcrumbs.rb → examples/server_logs/server_logs-03-breadcrumbs-full.rb} +26 -30
  111. data/examples/server_logs/server_logs-04-page_page_edges-full.rb +40 -0
  112. data/examples/string_reverser.rb +26 -0
  113. data/examples/text/pig_latin.rb +2 -2
  114. data/examples/text/regional_flavor/README.md +14 -0
  115. data/examples/text/regional_flavor/article_wordbags.pig +39 -0
  116. data/examples/text/regional_flavor/j01-article_wordbags.rb +4 -0
  117. data/examples/text/regional_flavor/simple_pig_script.pig +27 -0
  118. data/examples/word_count/accumulator.rb +26 -0
  119. data/examples/word_count/tokenizer.rb +13 -0
  120. data/examples/word_count/word_count.rb +6 -0
  121. data/examples/workflow/cherry_pie.dot +97 -0
  122. data/examples/workflow/cherry_pie.png +0 -0
  123. data/examples/workflow/cherry_pie.rb +61 -26
  124. data/lib/hanuman.rb +34 -7
  125. data/lib/hanuman/graph.rb +55 -31
  126. data/lib/hanuman/graphvizzer.rb +199 -178
  127. data/lib/hanuman/graphvizzer/gv_models.rb +161 -0
  128. data/lib/hanuman/graphvizzer/gv_presenter.rb +97 -0
  129. data/lib/hanuman/link.rb +35 -0
  130. data/lib/hanuman/registry.rb +46 -0
  131. data/lib/hanuman/stage.rb +76 -32
  132. data/lib/wukong.rb +23 -24
  133. data/lib/wukong/boot.rb +87 -0
  134. data/lib/wukong/configuration.rb +8 -0
  135. data/lib/wukong/dataflow.rb +45 -78
  136. data/lib/wukong/driver.rb +99 -0
  137. data/lib/wukong/emitter.rb +22 -0
  138. data/lib/wukong/model/faker.rb +24 -24
  139. data/lib/wukong/model/flatpack_parser/flat.rb +60 -0
  140. data/lib/wukong/model/flatpack_parser/flatpack.rb +4 -0
  141. data/lib/wukong/model/flatpack_parser/lang.rb +46 -0
  142. data/lib/wukong/model/flatpack_parser/parser.rb +55 -0
  143. data/lib/wukong/model/flatpack_parser/tokens.rb +130 -0
  144. data/lib/wukong/processor.rb +60 -114
  145. data/lib/wukong/spec_helpers.rb +81 -0
  146. data/lib/wukong/spec_helpers/integration_driver.rb +144 -0
  147. data/lib/wukong/spec_helpers/integration_driver_matchers.rb +219 -0
  148. data/lib/wukong/spec_helpers/processor_helpers.rb +95 -0
  149. data/lib/wukong/spec_helpers/processor_methods.rb +108 -0
  150. data/lib/wukong/spec_helpers/shared_examples.rb +15 -0
  151. data/lib/wukong/spec_helpers/spec_driver.rb +28 -0
  152. data/lib/wukong/spec_helpers/spec_driver_matchers.rb +195 -0
  153. data/lib/wukong/version.rb +2 -1
  154. data/lib/wukong/widget/filters.rb +311 -0
  155. data/lib/wukong/widget/processors.rb +156 -0
  156. data/lib/wukong/widget/reducers.rb +7 -0
  157. data/lib/wukong/widget/reducers/accumulator.rb +73 -0
  158. data/lib/wukong/widget/reducers/bin.rb +318 -0
  159. data/lib/wukong/widget/reducers/count.rb +61 -0
  160. data/lib/wukong/widget/reducers/group.rb +85 -0
  161. data/lib/wukong/widget/reducers/group_concat.rb +70 -0
  162. data/lib/wukong/widget/reducers/moments.rb +72 -0
  163. data/lib/wukong/widget/reducers/sort.rb +130 -0
  164. data/lib/wukong/widget/serializers.rb +287 -0
  165. data/lib/wukong/widget/sink.rb +10 -52
  166. data/lib/wukong/widget/source.rb +7 -113
  167. data/lib/wukong/widget/utils.rb +46 -0
  168. data/lib/wukong/widgets.rb +6 -0
  169. data/spec/examples/dataflow/fibonacci_series_spec.rb +18 -0
  170. data/spec/examples/dataflow/parsing_spec.rb +12 -11
  171. data/spec/examples/dataflow/simple_spec.rb +32 -6
  172. data/spec/examples/dataflow/telegram_spec.rb +36 -36
  173. data/spec/examples/graph/minimum_spanning_tree_spec.rb +30 -31
  174. data/spec/examples/munging/airline_flights/identifiers_spec.rb +16 -0
  175. data/spec/examples/munging/airline_flights_spec.rb +202 -0
  176. data/spec/examples/text/pig_latin_spec.rb +13 -16
  177. data/spec/examples/workflow/cherry_pie_spec.rb +34 -4
  178. data/spec/hanuman/graph_spec.rb +27 -2
  179. data/spec/hanuman/hanuman_spec.rb +10 -0
  180. data/spec/hanuman/registry_spec.rb +123 -0
  181. data/spec/hanuman/stage_spec.rb +61 -7
  182. data/spec/spec_helper.rb +29 -19
  183. data/spec/support/hanuman_test_helpers.rb +14 -12
  184. data/spec/support/shared_context_for_reducers.rb +37 -0
  185. data/spec/support/shared_examples_for_builders.rb +101 -0
  186. data/spec/support/shared_examples_for_shortcuts.rb +57 -0
  187. data/spec/support/wukong_test_helpers.rb +37 -11
  188. data/spec/wukong/dataflow_spec.rb +77 -55
  189. data/spec/wukong/local_runner_spec.rb +24 -24
  190. data/spec/wukong/model/faker_spec.rb +132 -131
  191. data/spec/wukong/runner_spec.rb +8 -8
  192. data/spec/wukong/widget/filters_spec.rb +61 -0
  193. data/spec/wukong/widget/processors_spec.rb +126 -0
  194. data/spec/wukong/widget/reducers/bin_spec.rb +92 -0
  195. data/spec/wukong/widget/reducers/count_spec.rb +11 -0
  196. data/spec/wukong/widget/reducers/group_spec.rb +20 -0
  197. data/spec/wukong/widget/reducers/moments_spec.rb +36 -0
  198. data/spec/wukong/widget/reducers/sort_spec.rb +26 -0
  199. data/spec/wukong/widget/serializers_spec.rb +92 -0
  200. data/spec/wukong/widget/sink_spec.rb +15 -15
  201. data/spec/wukong/widget/source_spec.rb +65 -41
  202. data/spec/wukong/wukong_spec.rb +10 -0
  203. data/wukong.gemspec +17 -10
  204. metadata +359 -335
  205. data/.document +0 -5
  206. data/VERSION +0 -1
  207. data/bin/hdp-bin +0 -44
  208. data/bin/hdp-bzip +0 -23
  209. data/bin/hdp-cat +0 -3
  210. data/bin/hdp-catd +0 -3
  211. data/bin/hdp-cp +0 -3
  212. data/bin/hdp-du +0 -86
  213. data/bin/hdp-get +0 -3
  214. data/bin/hdp-kill +0 -3
  215. data/bin/hdp-kill-task +0 -3
  216. data/bin/hdp-ls +0 -11
  217. data/bin/hdp-mkdir +0 -2
  218. data/bin/hdp-mkdirp +0 -12
  219. data/bin/hdp-mv +0 -3
  220. data/bin/hdp-parts_to_keys.rb +0 -77
  221. data/bin/hdp-ps +0 -3
  222. data/bin/hdp-put +0 -3
  223. data/bin/hdp-rm +0 -32
  224. data/bin/hdp-sort +0 -40
  225. data/bin/hdp-stream +0 -40
  226. data/bin/hdp-stream-flat +0 -22
  227. data/bin/hdp-stream2 +0 -39
  228. data/bin/hdp-sync +0 -17
  229. data/bin/hdp-wc +0 -67
  230. data/bin/wu-flow +0 -10
  231. data/bin/wu-map +0 -17
  232. data/bin/wu-red +0 -17
  233. data/bin/wukong +0 -17
  234. data/data/CREDITS.md +0 -355
  235. data/data/graph/airfares.tsv +0 -2174
  236. data/data/text/gift_of_the_magi.txt +0 -225
  237. data/data/text/jabberwocky.txt +0 -36
  238. data/data/text/rectification_of_names.txt +0 -33
  239. data/data/twitter/a_atsigns_b.tsv +0 -64
  240. data/data/twitter/a_follows_b.tsv +0 -53
  241. data/data/twitter/tweet.tsv +0 -167
  242. data/data/twitter/twitter_user.tsv +0 -55
  243. data/data/wikipedia/dbpedia-sentences.tsv +0 -1000
  244. data/docpages/INSTALL.textile +0 -92
  245. data/docpages/LICENSE.textile +0 -107
  246. data/docpages/README-elastic_map_reduce.textile +0 -377
  247. data/docpages/README-performance.textile +0 -90
  248. data/docpages/README-wulign.textile +0 -65
  249. data/docpages/UsingWukong-part1-get_ready.textile +0 -17
  250. data/docpages/UsingWukong-part2-ThinkingBigData.textile +0 -75
  251. data/docpages/UsingWukong-part3-parsing.textile +0 -138
  252. data/docpages/_config.yml +0 -39
  253. data/docpages/avro/avro_notes.textile +0 -56
  254. data/docpages/avro/performance.textile +0 -36
  255. data/docpages/avro/tethering.textile +0 -19
  256. data/docpages/bigdata-tips.textile +0 -143
  257. data/docpages/code/api_response_example.txt +0 -20
  258. data/docpages/code/parser_skeleton.rb +0 -38
  259. data/docpages/diagrams/MapReduceDiagram.graffle +0 -0
  260. data/docpages/favicon.ico +0 -0
  261. data/docpages/gem.css +0 -16
  262. data/docpages/hadoop-tips.textile +0 -83
  263. data/docpages/index.textile +0 -92
  264. data/docpages/intro.textile +0 -8
  265. data/docpages/moreinfo.textile +0 -174
  266. data/docpages/news.html +0 -24
  267. data/docpages/pig/PigLatinExpressionsList.txt +0 -122
  268. data/docpages/pig/PigLatinReferenceManual.txt +0 -1640
  269. data/docpages/pig/commandline_params.txt +0 -26
  270. data/docpages/pig/cookbook.html +0 -481
  271. data/docpages/pig/images/hadoop-logo.jpg +0 -0
  272. data/docpages/pig/images/instruction_arrow.png +0 -0
  273. data/docpages/pig/images/pig-logo.gif +0 -0
  274. data/docpages/pig/piglatin_ref1.html +0 -1103
  275. data/docpages/pig/piglatin_ref2.html +0 -14340
  276. data/docpages/pig/setup.html +0 -505
  277. data/docpages/pig/skin/basic.css +0 -166
  278. data/docpages/pig/skin/breadcrumbs.js +0 -237
  279. data/docpages/pig/skin/fontsize.js +0 -166
  280. data/docpages/pig/skin/getBlank.js +0 -40
  281. data/docpages/pig/skin/getMenu.js +0 -45
  282. data/docpages/pig/skin/images/chapter.gif +0 -0
  283. data/docpages/pig/skin/images/chapter_open.gif +0 -0
  284. data/docpages/pig/skin/images/current.gif +0 -0
  285. data/docpages/pig/skin/images/external-link.gif +0 -0
  286. data/docpages/pig/skin/images/header_white_line.gif +0 -0
  287. data/docpages/pig/skin/images/page.gif +0 -0
  288. data/docpages/pig/skin/images/pdfdoc.gif +0 -0
  289. data/docpages/pig/skin/images/rc-b-l-15-1body-2menu-3menu.png +0 -0
  290. data/docpages/pig/skin/images/rc-b-r-15-1body-2menu-3menu.png +0 -0
  291. data/docpages/pig/skin/images/rc-b-r-5-1header-2tab-selected-3tab-selected.png +0 -0
  292. data/docpages/pig/skin/images/rc-t-l-5-1header-2searchbox-3searchbox.png +0 -0
  293. data/docpages/pig/skin/images/rc-t-l-5-1header-2tab-selected-3tab-selected.png +0 -0
  294. data/docpages/pig/skin/images/rc-t-l-5-1header-2tab-unselected-3tab-unselected.png +0 -0
  295. data/docpages/pig/skin/images/rc-t-r-15-1body-2menu-3menu.png +0 -0
  296. data/docpages/pig/skin/images/rc-t-r-5-1header-2searchbox-3searchbox.png +0 -0
  297. data/docpages/pig/skin/images/rc-t-r-5-1header-2tab-selected-3tab-selected.png +0 -0
  298. data/docpages/pig/skin/images/rc-t-r-5-1header-2tab-unselected-3tab-unselected.png +0 -0
  299. data/docpages/pig/skin/print.css +0 -54
  300. data/docpages/pig/skin/profile.css +0 -181
  301. data/docpages/pig/skin/screen.css +0 -587
  302. data/docpages/pig/tutorial.html +0 -1059
  303. data/docpages/pig/udf.html +0 -1509
  304. data/docpages/tutorial.textile +0 -283
  305. data/docpages/usage.textile +0 -195
  306. data/docpages/wutils.textile +0 -263
  307. data/examples/dataflow/complex.rb +0 -11
  308. data/examples/dataflow/donuts.rb +0 -13
  309. data/examples/tiny_count/jabberwocky_output.tsv +0 -92
  310. data/examples/word_count.rb +0 -48
  311. data/examples/workflow/fiddle.rb +0 -24
  312. data/lib/away/escapement.rb +0 -129
  313. data/lib/away/exe.rb +0 -11
  314. data/lib/away/experimental.rb +0 -5
  315. data/lib/away/from_file.rb +0 -52
  316. data/lib/away/job.rb +0 -56
  317. data/lib/away/job/rake_compat.rb +0 -17
  318. data/lib/away/registry.rb +0 -79
  319. data/lib/away/runner.rb +0 -276
  320. data/lib/away/runner/execute.rb +0 -121
  321. data/lib/away/script.rb +0 -161
  322. data/lib/away/script/hadoop_command.rb +0 -240
  323. data/lib/away/source/file_list_source.rb +0 -15
  324. data/lib/away/source/looper.rb +0 -18
  325. data/lib/away/task.rb +0 -219
  326. data/lib/hanuman/action.rb +0 -21
  327. data/lib/hanuman/chain.rb +0 -4
  328. data/lib/hanuman/graphviz.rb +0 -74
  329. data/lib/hanuman/resource.rb +0 -6
  330. data/lib/hanuman/slot.rb +0 -87
  331. data/lib/hanuman/slottable.rb +0 -220
  332. data/lib/wukong/bad_record.rb +0 -15
  333. data/lib/wukong/event.rb +0 -44
  334. data/lib/wukong/local_runner.rb +0 -55
  335. data/lib/wukong/mapred.rb +0 -3
  336. data/lib/wukong/universe.rb +0 -48
  337. data/lib/wukong/widget/filter.rb +0 -81
  338. data/lib/wukong/widget/gibberish.rb +0 -123
  339. data/lib/wukong/widget/monitor.rb +0 -26
  340. data/lib/wukong/widget/reducer.rb +0 -66
  341. data/lib/wukong/widget/stringifier.rb +0 -50
  342. data/lib/wukong/workflow.rb +0 -22
  343. data/lib/wukong/workflow/command.rb +0 -42
  344. data/old/config/emr-example.yaml +0 -48
  345. data/old/examples/README.txt +0 -17
  346. data/old/examples/contrib/jeans/README.markdown +0 -165
  347. data/old/examples/contrib/jeans/data/normalized_sizes +0 -3
  348. data/old/examples/contrib/jeans/data/orders.tsv +0 -1302
  349. data/old/examples/contrib/jeans/data/sizes +0 -3
  350. data/old/examples/contrib/jeans/normalize.rb +0 -20
  351. data/old/examples/contrib/jeans/sizes.rb +0 -55
  352. data/old/examples/corpus/bnc_word_freq.rb +0 -44
  353. data/old/examples/corpus/bucket_counter.rb +0 -47
  354. data/old/examples/corpus/dbpedia_abstract_to_sentences.rb +0 -86
  355. data/old/examples/corpus/sentence_bigrams.rb +0 -53
  356. data/old/examples/corpus/sentence_coocurrence.rb +0 -66
  357. data/old/examples/corpus/stopwords.rb +0 -138
  358. data/old/examples/corpus/words_to_bigrams.rb +0 -53
  359. data/old/examples/emr/README.textile +0 -110
  360. data/old/examples/emr/dot_wukong_dir/credentials.json +0 -7
  361. data/old/examples/emr/dot_wukong_dir/emr.yaml +0 -69
  362. data/old/examples/emr/dot_wukong_dir/emr_bootstrap.sh +0 -33
  363. data/old/examples/emr/elastic_mapreduce_example.rb +0 -28
  364. data/old/examples/network_graph/adjacency_list.rb +0 -74
  365. data/old/examples/network_graph/breadth_first_search.rb +0 -72
  366. data/old/examples/network_graph/gen_2paths.rb +0 -68
  367. data/old/examples/network_graph/gen_multi_edge.rb +0 -112
  368. data/old/examples/network_graph/gen_symmetric_links.rb +0 -64
  369. data/old/examples/pagerank/README.textile +0 -6
  370. data/old/examples/pagerank/gen_initial_pagerank_graph.pig +0 -57
  371. data/old/examples/pagerank/pagerank.rb +0 -72
  372. data/old/examples/pagerank/pagerank_initialize.rb +0 -42
  373. data/old/examples/pagerank/run_pagerank.sh +0 -21
  374. data/old/examples/sample_records.rb +0 -33
  375. data/old/examples/server_logs/apache_log_parser.rb +0 -15
  376. data/old/examples/server_logs/nook.rb +0 -48
  377. data/old/examples/server_logs/nook/faraday_dummy_adapter.rb +0 -94
  378. data/old/examples/server_logs/user_agent.rb +0 -40
  379. data/old/examples/simple_word_count.rb +0 -82
  380. data/old/examples/size.rb +0 -61
  381. data/old/examples/stats/avg_value_frequency.rb +0 -86
  382. data/old/examples/stats/binning_percentile_estimator.rb +0 -140
  383. data/old/examples/stats/data/avg_value_frequency.tsv +0 -3
  384. data/old/examples/stats/rank_and_bin.rb +0 -173
  385. data/old/examples/stupidly_simple_filter.rb +0 -40
  386. data/old/examples/word_count.rb +0 -75
  387. data/old/graph/graphviz_builder.rb +0 -580
  388. data/old/graph_easy/Attributes.pm +0 -4181
  389. data/old/graph_easy/Graphviz.pm +0 -2232
  390. data/old/wukong.rb +0 -18
  391. data/old/wukong/and_pig.rb +0 -38
  392. data/old/wukong/bad_record.rb +0 -18
  393. data/old/wukong/datatypes.rb +0 -24
  394. data/old/wukong/datatypes/enum.rb +0 -127
  395. data/old/wukong/datatypes/fake_types.rb +0 -17
  396. data/old/wukong/decorator.rb +0 -28
  397. data/old/wukong/encoding/asciize.rb +0 -108
  398. data/old/wukong/extensions.rb +0 -16
  399. data/old/wukong/extensions/array.rb +0 -18
  400. data/old/wukong/extensions/blank.rb +0 -93
  401. data/old/wukong/extensions/class.rb +0 -189
  402. data/old/wukong/extensions/date_time.rb +0 -53
  403. data/old/wukong/extensions/emittable.rb +0 -69
  404. data/old/wukong/extensions/enumerable.rb +0 -79
  405. data/old/wukong/extensions/hash.rb +0 -167
  406. data/old/wukong/extensions/hash_keys.rb +0 -16
  407. data/old/wukong/extensions/hash_like.rb +0 -150
  408. data/old/wukong/extensions/hashlike_class.rb +0 -47
  409. data/old/wukong/extensions/module.rb +0 -2
  410. data/old/wukong/extensions/pathname.rb +0 -27
  411. data/old/wukong/extensions/string.rb +0 -65
  412. data/old/wukong/extensions/struct.rb +0 -17
  413. data/old/wukong/extensions/symbol.rb +0 -11
  414. data/old/wukong/filename_pattern.rb +0 -74
  415. data/old/wukong/helper.rb +0 -7
  416. data/old/wukong/helper/stopwords.rb +0 -195
  417. data/old/wukong/helper/tokenize.rb +0 -35
  418. data/old/wukong/logger.rb +0 -38
  419. data/old/wukong/periodic_monitor.rb +0 -72
  420. data/old/wukong/schema.rb +0 -269
  421. data/old/wukong/script.rb +0 -286
  422. data/old/wukong/script/avro_command.rb +0 -5
  423. data/old/wukong/script/cassandra_loader_script.rb +0 -40
  424. data/old/wukong/script/emr_command.rb +0 -168
  425. data/old/wukong/script/hadoop_command.rb +0 -237
  426. data/old/wukong/script/local_command.rb +0 -41
  427. data/old/wukong/store.rb +0 -10
  428. data/old/wukong/store/base.rb +0 -27
  429. data/old/wukong/store/cassandra.rb +0 -10
  430. data/old/wukong/store/cassandra/streaming.rb +0 -75
  431. data/old/wukong/store/cassandra/struct_loader.rb +0 -21
  432. data/old/wukong/store/cassandra_model.rb +0 -91
  433. data/old/wukong/store/chh_chunked_flat_file_store.rb +0 -37
  434. data/old/wukong/store/chunked_flat_file_store.rb +0 -48
  435. data/old/wukong/store/conditional_store.rb +0 -57
  436. data/old/wukong/store/factory.rb +0 -8
  437. data/old/wukong/store/flat_file_store.rb +0 -89
  438. data/old/wukong/store/key_store.rb +0 -51
  439. data/old/wukong/store/null_store.rb +0 -15
  440. data/old/wukong/store/read_thru_store.rb +0 -22
  441. data/old/wukong/store/tokyo_tdb_key_store.rb +0 -33
  442. data/old/wukong/store/tyrant_rdb_key_store.rb +0 -57
  443. data/old/wukong/store/tyrant_tdb_key_store.rb +0 -20
  444. data/old/wukong/streamer.rb +0 -30
  445. data/old/wukong/streamer/accumulating_reducer.rb +0 -83
  446. data/old/wukong/streamer/base.rb +0 -126
  447. data/old/wukong/streamer/counting_reducer.rb +0 -25
  448. data/old/wukong/streamer/filter.rb +0 -20
  449. data/old/wukong/streamer/instance_streamer.rb +0 -15
  450. data/old/wukong/streamer/json_streamer.rb +0 -21
  451. data/old/wukong/streamer/line_streamer.rb +0 -12
  452. data/old/wukong/streamer/list_reducer.rb +0 -31
  453. data/old/wukong/streamer/rank_and_bin_reducer.rb +0 -145
  454. data/old/wukong/streamer/record_streamer.rb +0 -14
  455. data/old/wukong/streamer/reducer.rb +0 -11
  456. data/old/wukong/streamer/set_reducer.rb +0 -14
  457. data/old/wukong/streamer/struct_streamer.rb +0 -48
  458. data/old/wukong/streamer/summing_reducer.rb +0 -29
  459. data/old/wukong/streamer/uniq_by_last_reducer.rb +0 -51
  460. data/old/wukong/typed_struct.rb +0 -12
  461. data/spec/away/encoding_spec.rb +0 -32
  462. data/spec/away/exe_spec.rb +0 -20
  463. data/spec/away/flow_spec.rb +0 -82
  464. data/spec/away/graph_spec.rb +0 -6
  465. data/spec/away/job_spec.rb +0 -15
  466. data/spec/away/rake_compat_spec.rb +0 -9
  467. data/spec/away/script_spec.rb +0 -81
  468. data/spec/hanuman/graphviz_spec.rb +0 -29
  469. data/spec/hanuman/slot_spec.rb +0 -2
  470. data/spec/support/examples_helper.rb +0 -10
  471. data/spec/support/streamer_test_helpers.rb +0 -6
  472. data/spec/support/wukong_widget_helpers.rb +0 -66
  473. data/spec/wukong/processor_spec.rb +0 -109
  474. data/spec/wukong/widget/filter_spec.rb +0 -99
  475. data/spec/wukong/widget/stringifier_spec.rb +0 -51
  476. data/spec/wukong/workflow/command_spec.rb +0 -5
@@ -1,2232 +0,0 @@
1
- #############################################################################
2
- # Parse graphviz/dot text into a Graph::Easy object
3
- #
4
- #############################################################################
5
-
6
- package Graph::Easy::Parser::Graphviz;
7
-
8
- $VERSION = '0.17';
9
- use Graph::Easy::Parser;
10
- @ISA = qw/Graph::Easy::Parser/;
11
-
12
- use strict;
13
- use utf8;
14
- use constant NO_MULTIPLES => 1;
15
-
16
- sub _init
17
- {
18
- my $self = shift;
19
-
20
- $self->SUPER::_init(@_);
21
- $self->{attr_sep} = '=';
22
- # remove " <p1> " from autosplit (shape=record) labels
23
- $self->{_qr_part_clean} = qr/\s*<([^>]*)>/;
24
-
25
- $self;
26
- }
27
-
28
- sub reset
29
- {
30
- my $self = shift;
31
-
32
- $self->SUPER::reset(@_);
33
-
34
- # set some default attributes on the graph object, because graphviz has
35
- # different defaults as Graph::Easy
36
- my $g = $self->{_graph};
37
-
38
- $g->set_attribute('colorscheme','x11');
39
- $g->set_attribute('flow','south');
40
- $g->set_attribute('edge','arrow-style', 'filled');
41
- $g->set_attribute('group','align', 'center');
42
- $g->set_attribute('group','fill', 'inherit');
43
-
44
- $self->{scope_stack} = [];
45
-
46
- # allow some temp. values during parsing
47
- $g->_allow_special_attributes(
48
- {
49
- node => {
50
- shape => [
51
- "",
52
- [ qw/ circle diamond edge ellipse hexagon house invisible
53
- invhouse invtrapezium invtriangle octagon parallelogram pentagon
54
- point triangle trapezium septagon rect rounded none img record Mrecord/ ],
55
- '',
56
- '',
57
- undef,
58
- ],
59
- },
60
- } );
61
-
62
- $g->{_warn_on_unknown_attributes} = 1;
63
-
64
- $self;
65
- }
66
-
67
- # map "&tilde;" to "~"
68
- my %entities = (
69
- 'amp' => '&',
70
- 'quot' => '"',
71
- 'lt' => '<',
72
- 'gt' => '>',
73
- 'nbsp' => ' ', # this is a non-break-space between '' here!
74
- 'iexcl' => '¡',
75
- 'cent' => '¢',
76
- 'pound' => '£',
77
- 'curren' => '¤',
78
- 'yen' => '¥',
79
- 'brvbar' => '¦',
80
- 'sect' => '§',
81
- 'uml' => '¨',
82
- 'copy' => '©',
83
- 'ordf' => 'ª',
84
- 'ordf' => 'ª',
85
- 'laquo' => '«',
86
- 'not' => '¬',
87
- 'shy' => "\x{00AD}", # soft-hyphen
88
- 'reg' => '®',
89
- 'macr' => '¯',
90
- 'deg' => '°',
91
- 'plusmn' => '±',
92
- 'sup2' => '²',
93
- 'sup3' => '³',
94
- 'acute' => '´',
95
- 'micro' => 'µ',
96
- 'para' => '¶',
97
- 'midot' => '·',
98
- 'cedil' => '¸',
99
- 'sup1' => '¹',
100
- 'ordm' => 'º',
101
- 'raquo' => '»',
102
- 'frac14' => '¼',
103
- 'frac12' => '½',
104
- 'frac34' => '¾',
105
- 'iquest' => '¿',
106
- 'Agrave' => 'À',
107
- 'Aacute' => 'Á',
108
- 'Acirc' => 'Â',
109
- 'Atilde' => 'Ã',
110
- 'Auml' => 'Ä',
111
- 'Aring' => 'Å',
112
- 'Aelig' => 'Æ',
113
- 'Ccedil' => 'Ç',
114
- 'Egrave' => 'È',
115
- 'Eacute' => 'É',
116
- 'Ecirc' => 'Ê',
117
- 'Euml' => 'Ë',
118
- 'Igrave' => 'Ì',
119
- 'Iacute' => 'Í',
120
- 'Icirc' => 'Î',
121
- 'Iuml' => 'Ï',
122
- 'ETH' => 'Ð',
123
- 'Ntilde' => 'Ñ',
124
- 'Ograve' => 'Ò',
125
- 'Oacute' => 'Ó',
126
- 'Ocirc' => 'Ô',
127
- 'Otilde' => 'Õ',
128
- 'Ouml' => 'Ö',
129
- 'times' => '×',
130
- 'Oslash' => 'Ø',
131
- 'Ugrave' => 'Ù',
132
- 'Uacute' => 'Ù',
133
- 'Ucirc' => 'Û',
134
- 'Uuml' => 'Ü',
135
- 'Yacute' => 'Ý',
136
- 'THORN' => 'Þ',
137
- 'szlig' => 'ß',
138
- 'agrave' => 'à',
139
- 'aacute' => 'á',
140
- 'acirc' => 'â',
141
- 'atilde' => 'ã',
142
- 'auml' => 'ä',
143
- 'aring' => 'å',
144
- 'aelig' => 'æ',
145
- 'ccedil' => 'ç',
146
- 'egrave' => 'è',
147
- 'eacute' => 'é',
148
- 'ecirc' => 'ê',
149
- 'euml' => 'ë',
150
- 'igrave' => 'ì',
151
- 'iacute' => 'í',
152
- 'icirc' => 'î',
153
- 'iuml' => 'ï',
154
- 'eth' => 'ð',
155
- 'ntilde' => 'ñ',
156
- 'ograve' => 'ò',
157
- 'oacute' => 'ó',
158
- 'ocirc' => 'ô',
159
- 'otilde' => 'õ',
160
- 'ouml' => 'ö',
161
- 'divide' => '÷',
162
- 'oslash' => 'ø',
163
- 'ugrave' => 'ù',
164
- 'uacute' => 'ú',
165
- 'ucirc' => 'û',
166
- 'uuml' => 'ü',
167
- 'yacute' => 'ý',
168
- 'thorn' => 'þ',
169
- 'yuml' => 'ÿ',
170
- 'Oelig' => 'Œ',
171
- 'oelig' => 'œ',
172
- 'Scaron' => 'Š',
173
- 'scaron' => 'š',
174
- 'Yuml' => 'Ÿ',
175
- 'fnof' => 'ƒ',
176
- 'circ' => '^',
177
- 'tilde' => '~',
178
- 'Alpha' => 'Α',
179
- 'Beta' => 'Β',
180
- 'Gamma' => 'Γ',
181
- 'Delta' => 'Δ',
182
- 'Epsilon'=> 'Ε',
183
- 'Zeta' => 'Ζ',
184
- 'Eta' => 'Η',
185
- 'Theta' => 'Θ',
186
- 'Iota' => 'Ι',
187
- 'Kappa' => 'Κ',
188
- 'Lambda' => 'Λ',
189
- 'Mu' => 'Μ',
190
- 'Nu' => 'Ν',
191
- 'Xi' => 'Ξ',
192
- 'Omicron'=> 'Ο',
193
- 'Pi' => 'Π',
194
- 'Rho' => 'Ρ',
195
- 'Sigma' => 'Σ',
196
- 'Tau' => 'Τ',
197
- 'Upsilon'=> 'Υ',
198
- 'Phi' => 'Φ',
199
- 'Chi' => 'Χ',
200
- 'Psi' => 'Ψ',
201
- 'Omega' => 'Ω',
202
- 'alpha' => 'α',
203
- 'beta' => 'β',
204
- 'gamma' => 'γ',
205
- 'delta' => 'δ',
206
- 'epsilon'=> 'ε',
207
- 'zeta' => 'ζ',
208
- 'eta' => 'η',
209
- 'theta' => 'θ',
210
- 'iota' => 'ι',
211
- 'kappa' => 'κ',
212
- 'lambda' => 'λ',
213
- 'mu' => 'μ',
214
- 'nu' => 'ν',
215
- 'xi' => 'ξ',
216
- 'omicron'=> 'ο',
217
- 'pi' => 'π',
218
- 'rho' => 'ρ',
219
- 'sigma' => 'σ',
220
- 'tau' => 'τ',
221
- 'upsilon'=> 'υ',
222
- 'phi' => 'φ',
223
- 'chi' => 'χ',
224
- 'psi' => 'ψ',
225
- 'omega' => 'ω',
226
- 'thetasym'=>'ϑ',
227
- 'upsih' => 'ϒ',
228
- 'piv' => 'ϖ',
229
- 'ensp' => "\x{2003}", # normal wide space
230
- 'emsp' => "\x{2004}", # wide space
231
- 'thinsp' => "\x{2009}", # very thin space
232
- 'zwnj' => "\x{200c}", # zero-width-non-joiner
233
- 'zwj' => "\x{200d}", # zero-width-joiner
234
- 'lrm' => "\x{200e}", # left-to-right
235
- 'rlm' => "\x{200f}", # right-to-left
236
- 'ndash' => '–',
237
- 'mdash' => '—',
238
- 'lsquo' => '‘',
239
- 'rsquo' => '’',
240
- 'sbquo' => '‚',
241
- 'ldquo' => '“',
242
- 'rdquo' => '”',
243
- 'bdquo' => '„',
244
- 'dagger' => '†',
245
- 'Dagger' => '‡',
246
- 'bull' => '•',
247
- 'hellip' => '…',
248
- 'permil' => '‰',
249
- 'prime' => '′',
250
- 'Prime' => '′',
251
- 'lsaquo' => '‹',
252
- 'rsaquo' => '›',
253
- 'oline' => '‾',
254
- 'frasl' => '⁄',
255
- 'euro' => '€',
256
- 'image' => 'ℑ',
257
- 'weierp' => '℘',
258
- 'real' => 'ℜ',
259
- 'trade' => '™',
260
- 'alefsym'=> 'ℵ',
261
- 'larr' => '←',
262
- 'uarr' => '↑',
263
- 'rarr' => '→',
264
- 'darr' => '↓',
265
- 'harr' => '↔',
266
- 'crarr' => '↵',
267
- 'lArr' => '⇐',
268
- 'uArr' => '⇑',
269
- 'rArr' => '⇒',
270
- 'dArr' => '⇓',
271
- 'hArr' => '⇔',
272
- 'forall' => '∀',
273
- 'part' => '∂',
274
- 'exist' => '∃',
275
- 'empty' => '∅',
276
- 'nabla' => '∇',
277
- 'isin' => '∈',
278
- 'notin' => '∉',
279
- 'ni' => '∋',
280
- 'prod' => '∏',
281
- 'sum' => '∑',
282
- 'minus' => '−',
283
- 'lowast' => '∗',
284
- 'radic' => '√',
285
- 'prop' => '∝',
286
- 'infin' => '∞',
287
- 'ang' => '∠',
288
- 'and' => '∧',
289
- 'or' => '∨',
290
- 'cap' => '∩',
291
- 'cup' => '∪',
292
- 'int' => '∫',
293
- 'there4' => '∴',
294
- 'sim' => '∼',
295
- 'cong' => '≅',
296
- 'asymp' => '≈',
297
- 'ne' => '≠',
298
- 'eq' => '=',
299
- 'le' => '≤',
300
- 'ge' => '≥',
301
- 'sub' => '⊂',
302
- 'sup' => '⊃',
303
- 'nsub' => '⊄',
304
- 'nsup' => '⊅',
305
- 'sube' => '⊆',
306
- 'supe' => '⊇',
307
- 'oplus' => '⊕',
308
- 'otimes' => '⊗',
309
- 'perp' => '⊥',
310
- 'sdot' => '⋅',
311
- 'lceil' => '⌈',
312
- 'rceil' => '⌉',
313
- 'lfloor' => '⌊',
314
- 'rfloor' => '⌋',
315
- 'lang' => '〈',
316
- 'rang' => '〉',
317
- 'loz' => '◊',
318
- 'spades' => '♠',
319
- 'clubs' => '♣',
320
- 'diamonds'=>'♦',
321
- 'hearts' => '♥',
322
- );
323
-
324
- sub _unquote_attribute
325
- {
326
- my ($self,$name,$val) = @_;
327
-
328
- my $html_like = 0;
329
- if ($name eq 'label')
330
- {
331
- $html_like = 1 if $val =~ /^\s*<\s*</;
332
- # '< >' => ' ', ' < a > ' => ' a '
333
- if ($html_like == 0 && $val =~ /\s*<(.*)>\s*\z/)
334
- {
335
- $val = $1; $val = ' ' if $val eq '';
336
- }
337
- }
338
-
339
- my $v = $self->_unquote($val);
340
-
341
- # Now HTML labels always start with "<", while non-HTML labels
342
- # start with " <" or anything else.
343
- if ($html_like == 0)
344
- {
345
- $v = ' ' . $v if $v =~ /^</;
346
- }
347
- else
348
- {
349
- $v =~ s/^\s*//; $v =~ s/\s*\z//;
350
- }
351
-
352
- $v;
353
- }
354
-
355
- sub _unquote
356
- {
357
- my ($self, $name) = @_;
358
-
359
- $name = '' unless defined $name;
360
-
361
- # string concat
362
- # "foo" + " bar" => "foo bar"
363
- $name =~ s/^
364
- "((?:\\"|[^"])*)" # "foo"
365
- \s*\+\s*"((?:\\"|[^"])*)" # followed by ' + "bar"'
366
- /"$1$2"/x
367
- while $name =~ /^
368
- "(?:\\"|[^"])*" # "foo"
369
- \s*\+\s*"(?:\\"|[^"])*" # followed by ' + "bar"'
370
- /x;
371
-
372
- # map "&!;" to "!"
373
- $name =~ s/&(.);/$1/g;
374
-
375
- # map "&amp;" to "&"
376
- $name =~ s/&([^;]+);/$entities{$1} || '';/eg;
377
-
378
- # "foo bar" => foo bar
379
- $name =~ s/^"\s*//; # remove left-over quotes
380
- $name =~ s/\s*"\z//;
381
-
382
- # unquote special chars
383
- $name =~ s/\\([\[\(\{\}\]\)#"])/$1/g;
384
-
385
- $name;
386
- }
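The unquoting steps in the removed `_unquote` (merge `"a" + "b"` concatenations, decode named entities, strip the outer quotes, unescape special characters) can be tried in isolation. What follows is a minimal standalone sketch, not the module's code: `unquote_dot` and the three-entry `%entities` table are hypothetical stand-ins, and the single-character `&!;` step is left out.

```perl
use strict;
use warnings;

# Hypothetical, cut-down entity table (the original lists far more entries).
my %entities = ( amp => '&', lt => '<', gt => '>' );

# Standalone sketch of the unquoting steps; not the module's own method.
sub unquote_dot
{
  my ($name) = @_;
  $name = '' unless defined $name;

  # merge '"foo" + " bar"' into '"foo bar"' until no concatenation is left
  my $str = qr/"((?:\\"|[^"])*)"/;
  1 while $name =~ s/^$str\s*\+\s*$str/"$1$2"/;

  # decode named entities; unknown names are dropped
  $name =~ s/&([^;]+);/defined $entities{$1} ? $entities{$1} : ''/eg;

  # strip the outer quotes, then unescape special characters
  $name =~ s/^"\s*//;
  $name =~ s/\s*"\z//;
  $name =~ s/\\([\[\(\{\}\]\)#"])/$1/g;

  $name;
}

print unquote_dot('"foo" + " bar"'), "\n";
print unquote_dot('"a &amp; b"'), "\n";
```

The `1 while ...` loop repeats the substitution, so a chain such as `"a" + "b" + "c"` collapses pair by pair.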
387
-
388
- sub _clean_line
389
- {
390
- # do some cleanups on a line before handling it
391
- my ($self,$line) = @_;
392
-
393
- chomp($line);
394
-
395
- # collapse white space at start
396
- $line =~ s/^\s+//;
397
- # line ending in '\' means a continuation
398
- $line =~ s/\\\z//;
399
-
400
- $line;
401
- }
402
-
403
- sub _line_insert
404
- {
405
- # "a1 -> a2\na3 -> a4" => "a1 -> a2 a3 -> a4"
406
- ' ';
407
- }
408
-
409
- #############################################################################
410
-
411
- sub _match_boolean
412
- {
413
- # not used yet, match a boolean value
414
- qr/(true|false|\d+)/;
415
- }
416
-
417
- sub _match_comment
418
- {
419
- # match the start of a comment
420
-
421
- # // comment
422
- qr#(:[^\\]|)//#;
423
- }
424
-
425
- sub _match_multi_line_comment
426
- {
427
- # match a multi line comment
428
-
429
- # /* * comment * */
430
- qr#(?:\s*/\*.*?\*/\s*)+#;
431
- }
432
-
433
- sub _match_optional_multi_line_comment
434
- {
435
- # match an optional multi line comment
436
-
437
- # "/* * comment * */" or /* a */ /* b */ or ""
438
- qr#(?:(?:\s*/\*.*?\*/\s*)*|\s+)#;
439
- }
440
-
441
- sub _match_name
442
- {
443
- # Return a regexp that matches an ID in the DOT language.
444
- # See http://www.graphviz.org/doc/info/lang.html for reference.
445
-
446
- # "node", "graph", "edge", "digraph", "subgraph" and "strict" are reserved:
447
- qr/\s*
448
- (
449
- # double quoted string
450
- "(?:\\"|[^"])*" # "foo"
451
- (?:\s*\+\s*"(?:\\"|[^"])*")* # followed by 0 or more ' + "bar"'
452
- |
453
- # number
454
- -? # optional minus sign
455
- (?: # non-capture group
456
- \.[0-9]+ # .00019
457
- | # or
458
- [0-9]+(?:\.[0-9]*)? # 123 or 123.1
459
- )
460
- |
461
- # plain node name (a-z0-9_+)
462
- (?!(?i:node|edge|digraph|subgraph|graph|strict)\s)[\w]+
463
- )/xi;
464
- }
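To see which inputs the removed ID pattern accepts, it may help to run a reduced form of it. The sketch below assumes a hypothetical variable `$qr_id` and drops the `"a" + "b"` concatenation branch. It keeps the three remaining alternatives (quoted string, number, plain name) and the look-ahead that rejects reserved words followed by whitespace.

```perl
use strict;
use warnings;

# Reduced sketch of a DOT ID matcher; $qr_id is a hypothetical name.
my $qr_id = qr/\s*
  (
    "(?:\\"|[^"])*"                     # double quoted string
  |
    -?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)  # number such as -.5, 123 or 123.1
  |
    (?!(?i:node|edge|digraph|subgraph|graph|strict)\s)\w+   # plain name
  )/x;

for my $input ('"Bonn am Rhein" -> x', '-.5', 'bonn:f1', 'node [color=red]')
{
  if ($input =~ /^$qr_id/) { print "'$input' => id '$1'\n"; }
  else                     { print "'$input' => no id\n"; }
}
```

Only the last input is rejected, because `node` followed by a space is treated as a keyword. Anchoring with `^` matters here: without it the pattern could match from a later position inside the keyword.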
465
-
466
- sub _match_node
467
- {
468
- # Return a regexp that matches something like '"bonn"' or 'bonn' or 'bonn:f1'
469
- my $self = shift;
470
-
471
- my $qr_n = $self->_match_name();
472
-
473
- # Examples: "bonn", "Bonn":f1, "Bonn":"f1", "Bonn":"port":"w", Bonn:port:w
474
- qr/
475
- $qr_n # node name (see _match_name)
476
- (?:
477
- :$qr_n
478
- (?: :(n|ne|e|se|s|sw|w|nw) )? # :port:compass_direction
479
- |
480
- :(n|ne|e|se|s|sw|w|nw) # :compass_direction
481
- )? # optional
482
- /x;
483
- }
484
-
485
- sub _match_group_start
486
- {
487
- # match a subgraph at the beginning (f.i. "graph { ")
488
- my $self = shift;
489
- my $qr_n = $self->_match_name();
490
-
491
- qr/^\s*(?:strict\s+)?(?:(?i)digraph|subgraph|graph)\s+$qr_n\s*\{/i;
492
- }
493
-
494
- sub _match_pseudo_group_start_at_beginning
495
- {
496
- # match an anonymous group start at the beginning (aka " { ")
497
- qr/^\s*\{/;
498
- }
499
-
500
- sub _match_pseudo_group_start
501
- {
502
- # match an anonymous group start (aka " { ")
503
- qr/\s*\{/;
504
- }
505
-
506
- sub _match_group_end
507
- {
508
- # return a regexp that matches something like " }" or "} ;".
509
- qr/^\s*\}\s*;?\s*/;
510
- }
511
-
512
- sub _match_edge
513
- {
514
- # Matches an edge
515
- qr/\s*(->|--)/;
516
- }
517
-
518
- sub _match_html_regexps
519
- {
520
- # Return hash with regexps matching different parts of an HTML label.
521
- my $qr =
522
- {
523
- # BORDER="2"
524
- attribute => qr/\s*([A-Za-z]+)\s*=\s*"((?:\\"|[^"])*)"/,
525
- # BORDER="2" COLSPAN="2"
526
- attributes => qr/(?:\s+(?:[A-Za-z]+)\s*=\s*"(?:\\"|[^"])*")*/,
527
- text => qr/.*?/,
528
- tr => qr/\s*<TR>/i,
529
- tr_end => qr/\s*<\/TR>/i,
530
- td => qr/\s*<TD[^>]*>/i,
531
- td_tag => qr/\s*<TD\s*/i,
532
- td_end => qr/\s*<\/TD>/i,
533
- table => qr/\s*<TABLE[^>]*>/i,
534
- table_tag => qr/\s*<TABLE\s*/i,
535
- table_end => qr/\s*<\/TABLE>/i,
536
- };
537
- $qr->{row} = qr/$qr->{tr}(?:$qr->{td}$qr->{text}$qr->{td_end})*$qr->{tr_end}/;
538
-
539
- $qr;
540
- }
541
-
542
- sub _match_html
543
- {
544
- # build a giant regular expression that matches an HTML label
545
-
546
- # label=<
547
- # <TABLE BORDER="2" CELLBORDER="1" CELLSPACING="0" BGCOLOR="#ffffff">
548
- # <TR><TD PORT="portname" COLSPAN="3" BGCOLOR="#aabbcc" ALIGN="CENTER">port</TD></TR>
549
- # <TR><TD PORT="port2" COLSPAN="2" ALIGN="LEFT">port2</TD><TD PORT="port3" ALIGN="LEFT">port3</TD></TR>
550
- # </TABLE>>
551
-
552
- my $qr = _match_html_regexps();
553
-
554
- # < <TABLE> .. </TABLE> >
555
- qr/<$qr->{table}(?:$qr->{row})*$qr->{table_end}\s*>/;
556
- }
557
-
558
- sub _match_single_attribute
559
- {
560
- my $qr_html = _match_html();
561
-
562
- qr/\s*(\w+)\s*=\s* # the attribute name (label=")
563
- (
564
- "(?:\\"|[^"])*" # "foo"
565
- (?:\s*\+\s*"(?:\\"|[^"])*")* # followed by 0 or more ' + "bar"'
566
- |
567
- $qr_html # or < <TABLE>..<\/TABLE> >
568
- |
569
- <[^>]*> # or something like < a >
570
- |
571
- [^<][^,\]\}\n\s;]* # or simple 'fooobar'
572
- )
573
- [,\]\n\}\s;]?\s*/x; # possible ",", "\n" etc.
574
- }
575
-
576
- sub _match_special_attribute
577
- {
578
- # match boolean attributes, these can appear without a value
579
- qr/\s*(
580
- center|
581
- compound|
582
- concentrate|
583
- constraint|
584
- decorate|
585
- diredgeconstraints|
586
- fixedsize|
587
- headclip|
588
- labelfloat|
589
- landscape|
590
- mosek|
591
- nojustify|
592
- normalize|
593
- overlap|
594
- pack|
595
- pin|
596
- regular|
597
- remincross|
598
- root|
599
- splines|
600
- tailclip|
601
- truecolor
602
- )[,;\s]?\s*/x;
603
- }
604
-
605
- sub _match_attributes
606
- {
607
- # return a regexp that matches something like " [ color=red; ]" and returns
608
- # the inner text without the []
609
-
610
- my $qr_att = _match_single_attribute();
611
- my $qr_satt = _match_special_attribute();
612
- my $qr_cmt = _match_multi_line_comment();
613
-
614
- qr/\s*\[\s*((?:$qr_att|$qr_satt|$qr_cmt)*)\s*\];?/;
615
- }
616
-
617
- sub _match_graph_attribute
618
- {
619
- # return a regexp that matches something like " color=red; " for attributes
620
- # that apply to a graph/subgraph
621
- qr/^\s*(\w+\s*=\s*("[^"]+"|[^;\n\s]+))([;\n\s]\s*|\z)/;
622
- }
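The graph-attribute pattern returned by the removed `_match_graph_attribute` consumes one `name=value` pair plus its separator per match. A small sketch, assuming the pattern is stored in a hypothetical variable `$qr_gatr` and applied repeatedly to a sample line:

```perl
use strict;
use warnings;

# Same pattern as in the removed sub, stored under a hypothetical name.
my $qr_gatr = qr/^\s*(\w+\s*=\s*("[^"]+"|[^;\n\s]+))([;\n\s]\s*|\z)/;

my $line = 'rankdir=LR; label="City map"; size=4';
my @found;

# strip one attribute from the front of the line per iteration
while ($line =~ s/$qr_gatr//)
{
  push @found, $1;
}
print "$_\n" for @found;
```

The quoted alternative is what lets `label="City map"` survive as one attribute even though it contains a space.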
623
-
624
- sub _match_optional_attributes
625
- {
626
- # return a regexp that matches something like " [ color=red; ]" and returns
627
- # the inner text with the []
628
-
629
- my $qr_att = _match_single_attribute();
630
- my $qr_satt = _match_special_attribute();
631
- my $qr_cmt = _match_multi_line_comment();
632
-
633
- qr/\s*(\[\s*((?:$qr_att|$qr_satt|$qr_cmt)*)\s*\])?;?/;
634
- }
635
-
636
- sub _clean_attributes
637
- {
638
- my ($self,$text) = @_;
639
-
640
- $text =~ s/^\s*\[\s*//; # remove left-over "[" and spaces
641
- $text =~ s/\s*;?\s*\]\s*\z//; # remove left-over "]" and spaces
642
-
643
- $text;
644
- }
645
-
646
- #############################################################################
647
-
648
- sub _new_scope
649
- {
650
- # create a new scope, with attributes from current scope
651
- my ($self, $is_group) = @_;
652
-
653
- my $scope = {};
654
-
655
- if (@{$self->{scope_stack}} > 0)
656
- {
657
- my $old_scope = $self->{scope_stack}->[-1];
658
-
659
- # make a copy of the old scope's attributes
660
- for my $t (keys %$old_scope)
661
- {
662
- next if $t =~ /^_/;
663
- my $s = $old_scope->{$t};
664
- $scope->{$t} = {} unless ref $scope->{$t}; my $sc = $scope->{$t};
665
- for my $k (keys %$s)
666
- {
667
- # skip things like "_is_group"
668
- $sc->{$k} = $s->{$k} unless $k =~ /^_/;
669
- }
670
- }
671
- }
672
- $scope->{_is_group} = 1 if defined $is_group;
673
-
674
- push @{$self->{scope_stack}}, $scope;
675
- $scope;
676
- }
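The scope handling in the removed `_new_scope` copies the attributes of the enclosing scope one level deep, so a nested `{ ... }` inherits defaults but can override them without touching the outer scope. A free-standing sketch of that idea, with a hypothetical function `push_scope` operating on a plain array reference instead of `$self->{scope_stack}`:

```perl
use strict;
use warnings;

# Hypothetical free-standing version of the scope copy: $stack is an array ref
# of scopes, each scope a hash of per-class attribute hashes.
sub push_scope
{
  my ($stack, $is_group) = @_;
  my $scope = {};

  if (@$stack)
  {
    my $old = $stack->[-1];
    for my $class (grep { !/^_/ } keys %$old)
    {
      # copy one level deep so the new scope can be changed independently
      my $attrs = $old->{$class};
      my %copy;
      for my $k (grep { !/^_/ } keys %$attrs) { $copy{$k} = $attrs->{$k}; }
      $scope->{$class} = \%copy;
    }
  }
  $scope->{_is_group} = 1 if defined $is_group;

  push @$stack, $scope;
  $scope;
}

my @stack;
my $outer = push_scope(\@stack, 1);
$outer->{node} = { color => 'red' };
my $inner = push_scope(\@stack);
$inner->{node}->{color} = 'blue';
print "outer: $outer->{node}->{color}, inner: $inner->{node}->{color}\n";
```

Keys starting with `_` (such as `_is_group`) are skipped, so the group flag is not inherited by nested scopes.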
677
-
678
- sub _add_group_match
679
- {
680
- # register handlers for group start/end
681
- my $self = shift;
682
-
683
- my $qr_pseudo_group_start = $self->_match_pseudo_group_start_at_beginning();
684
- my $qr_group_start = $self->_match_group_start();
685
- my $qr_group_end = $self->_match_group_end();
686
- my $qr_edge = $self->_match_edge();
687
- my $qr_ocmt = $self->_match_optional_multi_line_comment();
688
-
689
- # "subgraph G {"
690
- $self->_register_handler( $qr_group_start,
691
- sub
692
- {
693
- my $self = shift;
694
- my $graph = $self->{_graph};
695
- my $gn = $self->_unquote($1);
696
- print STDERR "# Parser: found subcluster '$gn'\n" if $self->{debug};
697
- push @{$self->{group_stack}}, $self->_new_group($gn);
698
- $self->_new_scope( 1 );
699
- 1;
700
- } );
701
-
702
- # "{ "
703
- $self->_register_handler( $qr_pseudo_group_start,
704
- sub
705
- {
706
- my $self = shift;
707
- print STDERR "# Parser: Creating new scope\n" if $self->{debug};
708
- $self->_new_scope();
709
- # forget the left side
710
- $self->{left_edge} = undef;
711
- $self->{left_stack} = [ ];
712
- 1;
713
- } );
714
-
715
- # "} -> " group/cluster/scope end with an edge
716
- $self->_register_handler( qr/$qr_group_end$qr_ocmt$qr_edge/,
717
- sub
718
- {
719
- my $self = shift;
720
-
721
- my $scope = pop @{$self->{scope_stack}};
722
- return $self->parse_error(0) if !defined $scope;
723
-
724
- if ($scope->{_is_group} && @{$self->{group_stack}})
725
- {
726
- print STDERR "# Parser: end subcluster '$self->{group_stack}->[-1]->{name}'\n" if $self->{debug};
727
- pop @{$self->{group_stack}};
728
- }
729
- else { print STDERR "# Parser: end scope\n" if $self->{debug}; }
730
-
731
- 1;
732
- },
733
- sub
734
- {
735
- my ($self, $line) = @_;
736
- $line =~ qr/$qr_group_end$qr_edge/;
737
- $1 . ' ';
738
- } );
739
-
740
- # "}" group/cluster/scope end
741
- $self->_register_handler( $qr_group_end,
742
- sub
743
- {
744
- my $self = shift;
745
-
746
- my $scope = pop @{$self->{scope_stack}};
747
- return $self->parse_error(0) if !defined $scope;
748
-
749
- if ($scope->{_is_group} && @{$self->{group_stack}})
750
- {
751
- print STDERR "# Parser: end subcluster '$self->{group_stack}->[-1]->{name}'\n" if $self->{debug};
752
- pop @{$self->{group_stack}};
753
- }
754
- # always reset the stack
755
- $self->{stack} = [ ];
756
- 1;
757
- } );
758
- }
759
-
760
- sub _edge_style
761
- {
762
- # To convert "--" or "->" we simply do nothing, since the edge style in
763
- # Graphviz can only be set via the attribute "style"
764
- my ($self, $ed) = @_;
765
-
766
- 'solid';
767
- }
768
-
769
- sub _new_nodes
770
- {
771
- my ($self, $name, $group_stack, $att, $port, $stack) = @_;
772
-
773
- $port = '' unless defined $port;
774
- my @rc = ();
775
- # "name1" => "name1"
776
- if ($port ne '')
777
- {
778
- # create a special node
779
- $name =~ s/^"//; $name =~ s/"\z//;
780
- $port =~ s/^"//; $port =~ s/"\z//;
781
- # XXX TODO: find unique name?
782
- @rc = $self->_new_node ($self->{_graph}, "$name:$port", $group_stack, $att, $stack);
783
- my $node = $rc[0];
784
- $node->{_graphviz_portlet} = $port;
785
- $node->{_graphviz_basename} = $name;
786
- }
787
- else
788
- {
789
- @rc = $self->_new_node ($self->{_graph}, $name, $group_stack, $att, $stack);
790
- }
791
- @rc;
792
- }
793
-
794
- sub _build_match_stack
795
- {
796
- my $self = shift;
797
-
798
- my $qr_node = $self->_match_node();
799
- my $qr_name = $self->_match_name();
800
- my $qr_cmt = $self->_match_multi_line_comment();
801
- my $qr_ocmt = $self->_match_optional_multi_line_comment();
802
- my $qr_attr = $self->_match_attributes();
803
- my $qr_gatr = $self->_match_graph_attribute();
804
- my $qr_oatr = $self->_match_optional_attributes();
805
- my $qr_edge = $self->_match_edge();
806
- my $qr_pgr = $self->_match_pseudo_group_start();
807
-
808
- # remove multi line comments /* comment */
809
- $self->_register_handler( qr/^$qr_cmt/, undef );
810
-
811
- # remove single line comment // comment
812
- $self->_register_handler( qr/^\s*\/\/.*/, undef );
813
-
814
- # simply remove the graph start, but remember that we did this
815
- $self->_register_handler( qr/^\s*((?i)strict)?$qr_ocmt((?i)digraph|graph)$qr_ocmt$qr_node$qr_ocmt\{/,
816
- sub
817
- {
818
- my $self = shift;
819
- return $self->parse_error(6) if @{$self->{scope_stack}} > 0;
820
- $self->{_graphviz_graph_name} = $3;
821
- $self->_new_scope(1);
822
- $self->{_graph}->set_attribute('type','undirected') if lc($2) eq 'graph';
823
- 1;
824
- } );
825
-
826
- # simply remove the graph start, but remember that we did this
827
- $self->_register_handler( qr/^\s*(strict)?$qr_ocmt(di)?graph$qr_ocmt\{/i,
828
- sub
829
- {
830
- my $self = shift;
831
- return $self->parse_error(6) if @{$self->{scope_stack}} > 0;
832
- $self->{_graphviz_graph_name} = 'unnamed';
833
- $self->_new_scope(1);
834
- $self->{_graph}->set_attribute('type','undirected') if lc($2) ne 'di';
835
- 1;
836
- } );
837
-
838
- # end-of-statement
839
- $self->_register_handler( qr/^\s*;/, undef );
840
-
841
- # cluster/subgraph "subgraph G { .. }"
842
- # scope (dummy group): "{ .. }"
843
- # scope/group/subgraph end: "}"
844
- $self->_add_group_match();
845
-
846
- # node [ color="red" ] etc.
847
- # The "(?i)" makes the keywords match case-insensitive.
848
- $self->_register_handler( qr/^\s*((?i)node|graph|edge)$qr_ocmt$qr_attr/,
849
- sub
850
- {
851
- my $self = shift;
852
- my $type = lc($1 || '');
853
- my $att = $self->_parse_attributes($2 || '', $type, NO_MULTIPLES );
854
- return undef unless defined $att; # error in attributes?
855
-
856
- if ($type ne 'graph')
857
- {
858
- # apply the attributes to the current scope
859
- my $scope = $self->{scope_stack}->[-1];
860
- $scope->{$type} = {} unless ref $scope->{$type};
861
- my $s = $scope->{$type};
862
- for my $k (keys %$att)
863
- {
864
- $s->{$k} = $att->{$k};
865
- }
866
- }
867
- else
868
- {
869
- my $graph = $self->{_graph};
870
- $graph->set_attributes ($type, $att);
871
- }
872
-
873
- # forget stacks
874
- $self->{stack} = [];
875
- $self->{left_edge} = undef;
876
- $self->{left_stack} = [];
877
- 1;
878
- } );
879
-
880
- # color=red; (for graphs or subgraphs)
881
- $self->_register_attribute_handler($qr_gatr, 'parent');
882
- # [ color=red; ] (for nodes/edges)
883
- $self->_register_attribute_handler($qr_attr);
884
-
885
- # node chain continued like "-> { ... "
886
- $self->_register_handler( qr/^$qr_edge$qr_ocmt$qr_pgr/,
887
- sub
888
- {
889
- my $self = shift;
890
-
891
- return if @{$self->{stack}} == 0; # only match this if stack non-empty
892
-
893
- my $graph = $self->{_graph};
894
- my $eg = $1; # entire edge ("->" etc)
895
-
896
- my $edge_un = 0; $edge_un = 1 if $eg eq '--'; # undirected edge?
897
-
898
- # need to defer edge attribute parsing until the edge exists
899
- # if inside a scope, set the scope attributes, too:
900
- my $scope = $self->{scope_stack}->[-1] || {};
901
- my $edge_atr = $scope->{edge} || {};
902
-
903
- # create a new scope
904
- $self->_new_scope();
905
-
906
- # remember the left side
907
- $self->{left_edge} = [ 'solid', '', $edge_atr, 0, $edge_un ];
908
- $self->{left_stack} = $self->{stack};
909
-
910
- # forget stack and remember the right side instead
911
- $self->{stack} = [];
912
-
913
- 1;
914
- } );
915
-
916
- # "Berlin"
917
- $self->_register_handler( qr/^$qr_node/,
918
- sub
919
- {
920
- my $self = shift;
921
- my $graph = $self->{_graph};
922
-
923
- # only match this inside a "{ }" (normal, non-group) scope
924
- return if exists $self->{scope_stack}->[-1]->{_is_group};
925
-
926
- my $n1 = $1;
927
- my $port = $2;
928
- push @{$self->{stack}},
929
- $self->_new_nodes ($n1, $self->{group_stack}, {}, $port, $self->{stack});
930
-
931
- if (defined $self->{left_edge})
932
- {
933
- my $e = $self->{use_class}->{edge};
934
- my ($style, $edge_label, $edge_atr, $edge_bd, $edge_un) = @{$self->{left_edge}};
935
-
936
- foreach my $node (@{$self->{left_stack}})
937
- {
938
- my $edge = $e->new( { style => $style, name => $edge_label } );
939
-
940
- # if inside a scope, set the scope attributes, too:
941
- my $scope = $self->{scope_stack}->[-1];
942
- $edge->set_attributes($scope->{edge}) if $scope;
943
-
944
- # override with the local attributes
945
- # 'string' => [ 'string' ]
946
- # [ { hash }, 'string' ] => [ { hash }, 'string' ]
947
- my $e = $edge_atr; $e = [ $edge_atr ] unless ref($e) eq 'ARRAY';
948
-
949
- for my $a (@$e)
950
- {
951
- if (ref $a)
952
- {
953
- $edge->set_attributes($a);
954
- }
955
- else
956
- {
957
- # deferred parsing with the object as param:
958
- my $out = $self->_parse_attributes($a, $edge, NO_MULTIPLES);
959
- return undef unless defined $out; # error in attributes?
960
- $edge->set_attributes($out);
961
- }
962
- }
963
-
964
- # "<--->": bidirectional
965
- $edge->bidirectional(1) if $edge_bd;
966
- $edge->undirected(1) if $edge_un;
967
- $graph->add_edge ( $node, $self->{stack}->[-1], $edge );
968
- }
969
- }
970
- 1;
971
- } );
972
-
973
- # "Berlin" [ color=red ] or "Bonn":"a" [ color=red ]
974
- $self->_register_handler( qr/^$qr_node$qr_oatr/,
975
- sub
976
- {
977
- my $self = shift;
978
- my $name = $1;
979
- my $port = $2;
980
- my $compass = $4 || ''; $port .= ":$compass" if $compass;
981
-
982
- $self->{stack} = [ $self->_new_nodes ($name, $self->{group_stack}, {}, $port ) ];
983
-
984
- # defer attribute parsing until object exists
985
- my $node = $self->{stack}->[0];
986
- my $a1 = $self->_parse_attributes($5||'', $node);
987
- return undef if $self->{error};
988
- $node->set_attributes($a1);
989
-
990
- # forget left stack
991
- $self->{left_edge} = undef;
992
- $self->{left_stack} = [];
993
- 1;
994
- } );
995
-
996
- # Things like ' "Node" ' will be consumed before, so we do not need a case
997
- # for '"Bonn" -> "Berlin"'
998
-
999
- # node chain continued like "-> "Kassel" [ ... ]"
1000
- $self->_register_handler( qr/^$qr_edge$qr_ocmt$qr_node$qr_ocmt$qr_oatr/,
1001
- sub
1002
- {
1003
- my $self = shift;
1004
-
1005
- return if @{$self->{stack}} == 0; # only match this if stack non-empty
1006
-
1007
- my $graph = $self->{_graph};
1008
- my $eg = $1; # entire edge ("->" etc)
1009
- my $n = $2; # node name
1010
- my $port = $3;
1011
- my $compass = $4 || $5 || ''; $port .= ":$compass" if $compass;
1012
-
1013
- my $edge_un = 0; $edge_un = 1 if $eg eq '--'; # undirected edge?
1014
-
1015
- my $scope = $self->{scope_stack}->[-1] || {};
1016
-
1017
- # need to defer edge attribute parsing until the edge exists
1018
- my $edge_atr = [ $6||'', $scope->{edge} || {} ];
1019
-
1020
- # the right side nodes:
1021
- my $nodes_b = [ $self->_new_nodes ($n, $self->{group_stack}, {}, $port) ];
1022
-
1023
- my $style = $self->_link_lists( $self->{stack}, $nodes_b,
1024
- '--', '', $edge_atr, 0, $edge_un);
1025
-
1026
- # remember the left side
1027
- $self->{left_edge} = [ $style, '', $edge_atr, 0, $edge_un ];
1028
- $self->{left_stack} = $self->{stack};
1029
-
1030
- # forget stack and remember the right side instead
1031
- $self->{stack} = $nodes_b;
1032
- 1;
1033
- } );
1034
-
1035
- $self;
1036
- }
1037
-
1038
- sub _add_node
1039
- {
1040
- # add a node to the graph, overridable by subclasses
1041
- my ($self, $graph, $name) = @_;
1042
-
1043
- # "a -- clusterB" should not create a spurious node named "clusterB"
1044
- my @groups = $graph->groups();
1045
- for my $g (@groups)
1046
- {
1047
- return $g if $g->{name} eq $name;
1048
- }
1049
-
1050
- my $node = $graph->node($name);
1051
-
1052
- if (!defined $node)
1053
- {
1054
- $node = $graph->add_node($name); # add
1055
-
1056
- # apply attributes from the current scope (only for new nodes)
1057
- my $scope = $self->{scope_stack}->[-1];
1058
- return $self->error("Scope stack is empty!") unless defined $scope;
1059
-
1060
- my $is_group = $scope->{_is_group};
1061
- delete $scope->{_is_group};
1062
- $node->set_attributes($scope->{node});
1063
- $scope->{_is_group} = $is_group if $is_group;
1064
- }
1065
-
1066
- $node;
1067
- }
1068
-
1069
- #############################################################################
1070
- # attribute remapping
1071
-
1072
- # undef => drop that attribute
1073
- # not listed attributes will result in "x-dot-$attribute" and a warning
1074
-
1075
- my $remap = {
1076
- 'node' => {
1077
- 'distortion' => 'x-dot-distortion',
1078
-
1079
- 'fixedsize' => undef,
1080
- 'group' => 'x-dot-group',
1081
- 'height' => 'x-dot-height',
1082
-
1083
- # XXX TODO: ignore non-node attributes set in a scope
1084
- 'dir' => undef,
1085
-
1086
- 'layer' => 'x-dot-layer',
1087
- 'margin' => 'x-dot-margin',
1088
- 'orientation' => \&_from_graphviz_node_orientation,
1089
- 'peripheries' => \&_from_graphviz_node_peripheries,
1090
- 'pin' => 'x-dot-pin',
1091
- 'pos' => 'x-dot-pos',
1092
- # XXX TODO: rank=0 should make that node the root node
1093
- # 'rank' => undef,
1094
- 'rects' => 'x-dot-rects',
1095
- 'regular' => 'x-dot-regular',
1096
- # 'root' => undef,
1097
- 'sides' => 'x-dot-sides',
1098
- 'shapefile' => 'x-dot-shapefile',
1099
- 'shape' => \&_from_graphviz_node_shape,
1100
- 'skew' => 'x-dot-skew',
1101
- 'style' => \&_from_graphviz_style,
1102
- 'width' => 'x-dot-width',
1103
- 'z' => 'x-dot-z',
1104
- },
1105
-
1106
- 'edge' => {
1107
- 'arrowsize' => 'x-dot-arrowsize',
1108
- 'arrowhead' => \&_from_graphviz_arrow_style,
1109
- 'arrowtail' => 'x-dot-arrowtail',
1110
- # important for color lists like "red:red" => double edge
1111
- 'color' => \&_from_graphviz_edge_color,
1112
- 'constraint' => 'x-dot-constraint',
1113
- 'dir' => \&_from_graphviz_edge_dir,
1114
- 'decorate' => 'x-dot-decorate',
1115
- 'f' => 'x-dot-f',
1116
- 'headclip' => 'x-dot-headclip',
1117
- 'headhref' => 'headlink',
1118
- 'headurl' => 'headlink',
1119
- 'headport' => \&_from_graphviz_headport,
1120
- 'headlabel' => 'headlabel',
1121
- 'headtarget' => 'x-dot-headtarget',
1122
- 'headtooltip' => 'headtitle',
1123
- 'labelangle' => 'x-dot-labelangle',
1124
- 'labeldistance' => 'x-dot-labeldistance',
1125
- 'labelfloat' => 'x-dot-labelfloat',
1126
- 'labelfontcolor' => \&_from_graphviz_color,
1127
- 'labelfontname' => 'font',
1128
- 'labelfontsize' => 'font-size',
1129
- 'layer' => 'x-dot-layer',
1130
- 'len' => 'x-dot-len',
1131
- 'lhead' => 'x-dot-lhead',
1132
- 'ltail' => 'x-dot-tail',
1133
- 'minlen' => \&_from_graphviz_edge_minlen,
1134
- 'pos' => 'x-dot-pos',
1135
- 'samehead' => 'x-dot-samehead',
1136
- 'samearrowhead' => 'x-dot-samearrowhead',
1137
- 'sametail' => 'x-dot-sametail',
1138
- 'style' => \&_from_graphviz_edge_style,
1139
- 'tailclip' => 'x-dot-tailclip',
1140
- 'tailhref' => 'taillink',
1141
- 'tailurl' => 'taillink',
1142
- 'tailport' => \&_from_graphviz_tailport,
1143
- 'taillabel' => 'taillabel',
1144
- 'tailtarget' => 'x-dot-tailtarget',
1145
- 'tailtooltip' => 'tailtitle',
1146
- 'weight' => 'x-dot-weight',
1147
- },
1148
-
1149
- 'graph' => {
1150
- 'damping' => 'x-dot-damping',
1151
- 'K' => 'x-dot-k',
1152
- 'bb' => 'x-dot-bb',
1153
- 'center' => 'x-dot-center',
1154
- # will be handled automatically:
1155
- 'charset' => undef,
1156
- 'clusterrank' => 'x-dot-clusterrank',
1157
- 'compound' => 'x-dot-compound',
1158
- 'concentrate' => 'x-dot-concentrate',
1159
- 'defaultdist' => 'x-dot-defaultdist',
1160
- 'dim' => 'x-dot-dim',
1161
- 'dpi' => 'x-dot-dpi',
1162
- 'epsilon' => 'x-dot-epsilon',
1163
- 'esep' => 'x-dot-esep',
1164
- 'fontpath' => 'x-dot-fontpath',
1165
- 'labeljust' => \&_from_graphviz_graph_labeljust,
1166
- 'labelloc' => \&_from_graphviz_labelloc,
1167
- 'landscape' => 'x-dot-landscape',
1168
- 'layers' => 'x-dot-layers',
1169
- 'layersep' => 'x-dot-layersep',
1170
- 'levelsgap' => 'x-dot-levelsgap',
1171
- 'margin' => 'x-dot-margin',
1172
- 'maxiter' => 'x-dot-maxiter',
1173
- 'mclimit' => 'x-dot-mclimit',
1174
- 'mindist' => 'x-dot-mindist',
1175
- 'minquit' => 'x-dot-minquit',
1176
- 'mode' => 'x-dot-mode',
1177
- 'model' => 'x-dot-model',
1178
- 'nodesep' => 'x-dot-nodesep',
1179
- 'normalize' => 'x-dot-normalize',
1180
- 'nslimit' => 'x-dot-nslimit',
1181
- 'nslimit1' => 'x-dot-nslimit1',
1182
- 'ordering' => 'x-dot-ordering',
1183
- 'orientation' => 'x-dot-orientation',
1184
- 'output' => 'output',
1185
- 'outputorder' => 'x-dot-outputorder',
1186
- 'overlap' => 'x-dot-overlap',
1187
- 'pack' => 'x-dot-pack',
1188
- 'packmode' => 'x-dot-packmode',
1189
- 'page' => 'x-dot-page',
1190
- 'pagedir' => 'x-dot-pagedir',
1191
- 'pencolor' => \&_from_graphviz_color,
1192
- 'quantum' => 'x-dot-quantum',
1193
- 'rankdir' => \&_from_graphviz_graph_rankdir,
1194
- 'ranksep' => 'x-dot-ranksep',
1195
- 'ratio' => 'x-dot-ratio',
1196
- 'remincross' => 'x-dot-remincross',
1197
- 'resolution' => 'x-dot-resolution',
1198
- 'rotate' => 'x-dot-rotate',
1199
- 'samplepoints' => 'x-dot-samplepoints',
1200
- 'searchsize' => 'x-dot-searchsize',
1201
- 'sep' => 'x-dot-sep',
1202
- 'size' => 'x-dot-size',
1203
- 'splines' => 'x-dot-splines',
1204
- 'start' => 'x-dot-start',
1205
- 'style' => \&_from_graphviz_style,
1206
- 'stylesheet' => 'x-dot-stylesheet',
1207
- 'truecolor' => 'x-dot-truecolor',
1208
- 'viewport' => 'x-dot-viewport',
1209
- 'voro-margin' => 'x-dot-voro-margin',
1210
- },
1211
-
1212
- 'group' => {
1213
- 'labeljust' => \&_from_graphviz_graph_labeljust,
1214
- 'labelloc' => \&_from_graphviz_labelloc,
1215
- 'pencolor' => \&_from_graphviz_color,
1216
- 'style' => \&_from_graphviz_style,
1217
- 'K' => 'x-dot-k',
1218
- },
1219
-
1220
- 'all' => {
1221
- 'color' => \&_from_graphviz_color,
1222
- 'colorscheme' => 'x-colorscheme',
1223
- 'bgcolor' => \&_from_graphviz_color,
1224
- 'fillcolor' => \&_from_graphviz_color,
1225
- 'fontsize' => \&_from_graphviz_font_size,
1226
- 'fontcolor' => \&_from_graphviz_color,
1227
- 'fontname' => 'font',
1228
- 'lp' => 'x-dot-lp',
1229
- 'nojustify' => 'x-dot-nojustify',
1230
- 'rank' => 'x-dot-rank',
1231
- 'showboxes' => 'x-dot-showboxes',
1232
- 'target' => 'x-dot-target',
1233
- 'tooltip' => 'title',
1234
- 'URL' => 'link',
1235
- 'href' => 'link',
1236
- },
1237
- };
1238
-
1239
- sub _remap { $remap; }
1240
-
1241
- my $rankdir = {
1242
- 'LR' => 'east',
1243
- 'RL' => 'west',
1244
- 'TB' => 'south',
1245
- 'BT' => 'north',
1246
- };
1247
-
1248
- sub _from_graphviz_graph_rankdir
1249
- {
1250
- my ($self, $name, $dir, $object) = @_;
1251
-
1252
- my $d = $rankdir->{$dir} || 'east';
1253
-
1254
- ('flow', $d);
1255
- }
1256
-
1257
- my $shapes = {
1258
- box => 'rect',
1259
- polygon => 'rect',
1260
- egg => 'rect',
1261
- rectangle => 'rect',
1262
- mdiamond => 'diamond',
1263
- msquare => 'rect',
1264
- plaintext => 'none',
1265
- none => 'none',
1266
- #
1267
- mrecord => 'record',
1268
- Mrecord => 'record',
1269
- square => 'rect',
1270
- triangle => 'diamond',
1271
- };
1272
-
1273
- sub _from_graphviz_node_shape
1274
- {
1275
- my ($self, $name, $shape) = @_;
1276
-
1277
- my @rc;
1278
- my $s = lc($shape);
1279
- if ($s =~ /^(triple|double)/)
1280
- {
1281
- $s =~ s/^(triple|double)//;
1282
- push @rc, ('border-style','double');
1283
- }
1284
-
1285
- # map the name to what Graph::Easy expects (ellipse stays as ellipse f.i.)
1286
- $s = $shapes->{$s} || $s;
1287
-
1288
- (@rc, $name, $s);
1289
- }
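The shape remapping in the removed `_from_graphviz_node_shape` works in two steps: a `double` or `triple` prefix becomes a double border, and the remaining name is looked up in a table, falling through unchanged when there is no entry. A minimal sketch with a hypothetical function `map_shape` and a shortened table:

```perl
use strict;
use warnings;

# Shortened, hypothetical copy of the shape table.
my %shapes = ( box => 'rect', square => 'rect', plaintext => 'none', triangle => 'diamond' );

# Returns a flat list of attribute name/value pairs.
sub map_shape
{
  my ($shape) = @_;
  my @rc;
  my $s = lc($shape);

  # "doublecircle" => double border plus shape "circle"
  push @rc, ('border-style', 'double') if $s =~ s/^(?:triple|double)//;

  (@rc, 'shape', $shapes{$s} || $s);
}

print join(', ', map_shape('doublecircle')), "\n";
print join(', ', map_shape('Box')), "\n";
```

Because the name is lower-cased before the lookup, only lower-case keys in the table can ever match.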
1290
-
1291
- sub _from_graphviz_style
1292
- {
1293
- my ($self, $name, $style, $class) = @_;
1294
-
1295
- my @styles = split /\s*,\s*/, $style;
1296
-
1297
- my $is_node = 0;
1298
- $is_node = 1 if ref($class) && !$class->isa('Graph::Easy::Group');
1299
- $is_node = 1 if !ref($class) && defined $class && $class eq 'node';
1300
-
1301
- my @rc;
1302
- for my $s (@styles)
1303
- {
1304
- @rc = ('shape', 'rounded') if $s eq 'rounded';
1305
- @rc = ('shape', 'invisible') if $s eq 'invis';
1306
- @rc = ('border', 'black ' . $1) if $s =~ /^(bold|dotted|dashed)\z/;
1307
- if ($is_node != 0)
1308
- {
1309
- @rc = ('shape', 'rect') if $s eq 'filled';
1310
- }
1311
- # convert "setlinewidth(12)" =>
1312
- if ($s =~ /setlinewidth\((\d+|\d*\.\d+)\)/)
1313
- {
1314
- my $width = abs($1 || 1);
1315
- my $style = '';
1316
- $style = 'wide'; # >= 11
1317
- $style = 'solid' if $width < 3;
1318
- $style = 'bold' if $width >= 3 && $width < 5;
1319
- $style = 'broad' if $width >= 5 && $width < 11;
1320
- push @rc, ('borderstyle',$style);
1321
- }
1322
- }
1323
-
1324
- @rc;
1325
- }
1326
-
1327
- sub _from_graphviz_node_orientation
1328
- {
1329
- my ($self, $name, $o) = @_;
1330
-
1331
- my $r = int($o);
1332
-
1333
- return (undef,undef) if $r == 0;
1334
-
1335
- # 1.0 => 1
1336
- ('rotate', $r);
1337
- }
1338
-
1339
- my $port_remap = {
1340
- n => 'north',
1341
- e => 'east',
1342
- w => 'west',
1343
- s => 'south',
1344
- };
1345
-
1346
- sub _from_graphviz_headport
1347
- {
1348
- my ($self, $name, $compass) = @_;
1349
-
1350
- # XXX TODO
1351
- # handle "port:compass" too
1352
-
1353
- # one of "n","ne","e","se","s","sw","w","nw"
1354
- # "ne => n"
1355
- my $c = $port_remap->{ substr(lc($compass),0,1) } || 'east';
1356
-
1357
- ('end', $c);
1358
- }
1359
-
1360
- sub _from_graphviz_tailport
1361
- {
1362
- my ($self, $name, $compass) = @_;
1363
-
1364
- # XXX TODO
1365
- # handle "port:compass" too
1366
-
1367
- # one of "n","ne","e","se","s","sw","w","nw"
1368
- # "ne => n" => "north"
1369
- my $c = $port_remap->{ substr(lc($compass),0,1) } || 'east';
1370
-
1371
- ('start', $c);
1372
- }
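Both port handlers reduce a compass point to one of four sides by looking only at its first letter, with `east` as the fallback. A short sketch of that reduction, using a hypothetical helper `compass_to_side`:

```perl
use strict;
use warnings;

my %side = ( n => 'north', e => 'east', s => 'south', w => 'west' );

# Hypothetical helper: reduce a compass point to a side by its first letter,
# falling back to 'east' as the removed code does.
sub compass_to_side
{
  my ($compass) = @_;
  $side{ substr(lc($compass), 0, 1) } || 'east';
}

print "$_ => ", compass_to_side($_), "\n" for qw/n NE sw w x/;
```

Diagonal points lose their second component under this scheme, so `ne` and `nw` both end up as `north`.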
1373
-
1374
- sub _from_graphviz_node_peripheries
1375
- {
1376
- my ($self, $name, $cnt) = @_;
1377
-
1378
- return (undef,undef) if $cnt < 2;
1379
-
1380
- # peripheries = 2 => double border
1381
- ('border-style', 'double');
1382
- }
1383
-
1384
- sub _from_graphviz_edge_minlen
1385
- {
1386
- my ($self, $name, $len) = @_;
1387
-
1388
- # 1 => 1, 2 => 3, 3 => 5 etc
1389
- $len = $len * 2 - 1;
1390
- ($name, $len);
1391
- }
1392
-
1393
- sub _from_graphviz_font_size
1394
- {
1395
- my ($self, $f, $size) = @_;
1396
-
1397
- # 20 => 20px
1398
- $size = $size . 'px' if $size =~ /^\d+(\.\d+)?\z/;
1399
-
1400
- ('fontsize', $size);
1401
- }
1402
-
1403
- sub _from_graphviz_labelloc
1404
- {
1405
- my ($self, $name, $loc) = @_;
1406
-
1407
- my $l = 'top';
1408
- $l = 'bottom' if $loc =~ /^b/;
1409
-
1410
- ('labelpos', $l);
1411
- }
1412
-
1413
- sub _from_graphviz_edge_dir
1414
- {
1415
- my ($self, $name, $dir, $edge) = @_;
1416
-
1417
- # Modify the edge, depending on dir
1418
- if (ref($edge))
1419
- {
1420
- # "forward" is the default and ignored
1421
- $edge->flip() if $dir eq 'back';
1422
- $edge->bidirectional(1) if $dir eq 'both';
1423
- $edge->undirected(1) if $dir eq 'none';
1424
- }
1425
-
1426
- (undef, undef);
1427
- }
1428
-
1429
- sub _from_graphviz_edge_style
1430
- {
1431
- my ($self, $name, $style, $object) = @_;
1432
-
1433
- # input: solid dashed dotted bold invis
1434
- $style = 'invisible' if $style eq 'invis';
1435
-
1436
- # although "normal" is not documented, it occurs in the wild
1437
- $style = 'solid' if $style eq 'normal';
1438
-
1439
- # convert "setlinewidth(12)" =>
1440
- if ($style =~ /setlinewidth\((\d+|\d*\.\d+)\)/)
1441
- {
1442
- my $width = abs($1 || 1);
1443
- $style = 'wide'; # > 11
1444
- $style = 'solid' if $width < 3;
1445
- $style = 'bold' if $width >= 3 && $width < 5;
1446
- $style = 'broad' if $width >= 5 && $width < 11;
1447
- }
1448
-
1449
- ($name, $style);
1450
- }
1451
-
1452
- sub _from_graphviz_arrow_style
1453
- {
1454
- my ($self, $name, $shape, $object) = @_;
1455
-
1456
- my $style = 'open';
1457
-
1458
- $style = 'closed' if $shape =~ /^(empty|onormal)\z/;
1459
- $style = 'filled' if $shape eq 'normal' || $shape eq 'normalnormal';
1460
- $style = 'open' if $shape eq 'vee' || $shape eq 'veevee';
1461
- $style = 'none' if $shape eq 'none' || $shape eq 'nonenone';
1462
-
1463
- ('arrow-style', $style);
1464
- }
1465
-
1466
- my $color_atr_map = {
1467
- fontcolor => 'color',
1468
- bgcolor => 'background',
1469
- fillcolor => 'fill',
1470
- pencolor => 'bordercolor',
1471
- labelfontcolor => 'labelcolor',
1472
- color => 'color',
1473
- };
1474
-
1475
- sub _from_graphviz_color
1476
- {
1477
- # Remap the color name and value
1478
- my ($self, $name, $color) = @_;
1479
-
1480
- # "//red" => "red"
1481
- $color =~ s/^\/\///;
1482
-
1483
- my $colorscheme = 'x11';
1484
- if ($color =~ /^\//)
1485
- {
1486
- # "/set9/red" => "red"
1487
- $color =~ s/^\/([^\/]+)\///;
1488
- $colorscheme = $1;
1489
- # map the color to the right color according to the colorscheme
1490
- $color = Graph::Easy->color_value($color,$colorscheme) || 'black';
1491
- }
1492
-
1493
- # "#AA BB CC" => "#AABBCC"
1494
- $color =~ s/\s+//g if $color =~ /^#/;
1495
-
1496
- # "0.1 0.4 0.5" => "hsv(0.1,0.4,0.5)"
1497
- $color =~ s/\s+/,/g if $color =~ /\s/;
1498
- $color = 'hsv(' . $color . ')' if $color =~ /,/;
1499
-
1500
- ($color_atr_map->{$name}, $color);
1501
- }
1502
-
1503
- sub _from_graphviz_edge_color
1504
- {
1505
- # remap the color name and value
1506
- my ($self, $name, $color) = @_;
1507
-
1508
- my @colors = split /:/, $color;
1509
-
1510
- for my $c (@colors)
1511
- {
1512
- $c = Graph::Easy::Parser::Graphviz::_from_graphviz_color($self,$name,$c);
1513
- }
1514
-
1515
- my @rc;
1516
- if (@colors > 1)
1517
- {
1518
- # 'red:blue' => "style: double; color: red"
1519
- push @rc, 'style', 'double';
1520
- }
1521
-
1522
- (@rc, $color_atr_map->{$name}, $colors[0]);
1523
- }
1524
-
1525
- sub _from_graphviz_graph_labeljust
1526
- {
1527
- my ($self, $name, $l) = @_;
1528
-
1529
- # input: "l" "r" or "c", output "left", "right" or "center"
1530
- my $a = 'center';
1531
- $a = 'left' if $l eq 'l';
1532
- $a = 'right' if $l eq 'r';
1533
-
1534
- ('align', $a);
1535
- }
1536
-
1537
- #############################################################################
1538
-
1539
- sub _remap_attributes
1540
- {
1541
- my ($self, $att, $object, $r) = @_;
1542
-
1543
- if ($self->{debug})
1544
- {
1545
- my $o = ''; $o = " for $object" if $object;
1546
- print STDERR "# remapping attributes '$att'$o\n";
1547
- require Data::Dumper; print STDERR "#" , Data::Dumper::Dumper($att),"\n";
1548
- }
1549
-
1550
- $r = $self->_remap() unless defined $r;
1551
-
1552
- $self->{_graph}->_remap_attributes($object, $att, $r, 'noquote', undef, undef);
1553
- }
1554
-
1555
- #############################################################################
1556
-
1557
- my $html_remap = {
1558
- 'table' => {
1559
- 'align' => 'align',
1560
- 'balign' => undef,
1561
- 'bgcolor' => 'fill',
1562
- 'border' => 'border',
1563
- # XXX TODO
1564
- 'cellborder' => 'border',
1565
- 'cellspacing' => undef,
1566
- 'cellpadding' => undef,
1567
- 'fixedsize' => undef,
1568
- 'height' => undef,
1569
- 'href' => 'link',
1570
- 'port' => undef,
1571
- 'target' => undef,
1572
- 'title' => 'title',
1573
- 'tooltip' => 'title',
1574
- 'valign' => undef,
1575
- 'width' => undef,
1576
- },
1577
- 'td' => {
1578
- 'align' => 'align',
1579
- 'balign' => undef,
1580
- 'bgcolor' => 'fill',
1581
- 'border' => 'border',
1582
- 'cellspacing' => undef,
1583
- 'cellpadding' => undef,
1584
- 'colspan' => 'columns',
1585
- 'fixedsize' => undef,
1586
- 'height' => undef,
1587
- 'href' => 'link',
1588
- 'port' => undef,
1589
- 'rowspan' => 'rows',
1590
- 'target' => undef,
1591
- 'title' => 'title',
1592
- 'tooltip' => 'title',
1593
- 'valign' => undef,
1594
- 'width' => undef,
1595
- },
1596
- };
1597
-
1598
- sub _parse_html_attributes
1599
- {
1600
- my ($self, $text, $qr, $tag) = @_;
1601
-
1602
- # "<TD ...>" => " ..."
1603
- $text =~ s/^$qr->{td_tag}//;
1604
- $text =~ s/\s*>\z//;
1605
-
1606
- my $attr = {};
1607
- while ($text ne '')
1608
- {
1609
-
1610
- return $self->error("HTML-like attribute '$text' doesn't look valid to me.")
1611
- unless $text =~ s/^($qr->{attribute})//;
1612
-
1613
- my $name = lc($2); my $value = $3;
1614
-
1615
- $self->_unquote($value);
1616
- $value = lc($value) if $name eq 'align';
1617
- $self->error ("Unknown attribute '$name' in HTML-like label") unless exists $html_remap->{$tag}->{$name};
1618
- # filter out attributes we do not yet support
1619
- $attr->{$name} = $value if defined $html_remap->{$tag}->{$name};
1620
- }
1621
-
1622
- $attr;
1623
- }
1624
-
1625
- sub _html_per_table
1626
- {
1627
- # take the HTML-like attributes found per TABLE and create a hash with them
1628
- # so they can be applied as default to each node
1629
- my ($self, $attributes) = @_;
1630
-
1631
- $self->_remap_attributes($attributes,'table',$html_remap);
1632
- }
1633
-
1634
- sub _html_per_node
1635
- {
1636
- # take the HTML-like attributes found per TD and apply them to the node
1637
- my ($self, $attr, $node) = @_;
1638
-
1639
- my $c = $attr->{colspan} || 1;
1640
- $node->set_attribute('columns',$c) if $c != 1;
1641
-
1642
- my $r = $attr->{rowspan} || 1;
1643
- $node->set_attribute('rows',$r) if $r != 1;
1644
-
1645
- $node->{autosplit_portname} = $attr->{port} if exists $attr->{port};
1646
-
1647
- for my $k (qw/port colspan rowspan/)
1648
- {
1649
- delete $attr->{$k};
1650
- }
1651
-
1652
- my $att = $self->_remap_attributes($attr,$node,$html_remap);
1653
-
1654
- $node->set_attributes($att);
1655
-
1656
- $self;
1657
- }
1658
-
1659
- sub _parse_html
1660
- {
1661
- # Given an HTML label, parses that into the individual parts. Returns a
1662
- # list of nodes.
1663
- my ($self, $n, $qr) = @_;
1664
-
1665
- my $graph = $self->{_graph};
1666
-
1667
- my $label = $n->label(1); $label = '' unless defined $label;
1668
- my $org_label = $label;
1669
-
1670
- # print STDERR "# 1 HTML-like label is now: $label\n";
1671
-
1672
- # "unquote" the HTML-like label
1673
- $label =~ s/^<\s*//;
1674
- $label =~ s/\s*>\z//;
1675
-
1676
- # print STDERR "# 2 HTML-like label is now: $label\n";
1677
-
1678
- # remove the table end (at the end)
1679
- $label =~ s/$qr->{table_end}\s*\z//;
1680
- # print STDERR "# 2.a HTML-like label is now: $label\n";
1681
- # remove the table start
1682
- $label =~ s/($qr->{table})//;
1683
-
1684
- # print STDERR "# 3 HTML-like label is now: $label\n";
1685
-
1686
- my $table_tag = $1 || '';
1687
- $table_tag =~ /$qr->{table_tag}(.*?)>/;
1688
- my $table_attr = $self->_parse_html_attributes($1 || '', $qr, 'table');
1689
-
1690
- # use Data::Dumper;
1691
- # print STDERR "# 3 HTML-like table-tag attributes are: ", Dumper($table_attr),"\n";
1692
-
1693
- # generate the base name from the actual graphviz node name to allow links to
1694
- # it
1695
- my $base_name = $n->{name};
1696
-
1697
- my $class = $self->{use_class}->{node};
1698
-
1699
- my $raw_attributes = $n->raw_attributes();
1700
- delete $raw_attributes->{label};
1701
- delete $raw_attributes->{shape};
1702
-
1703
- my @rc; my $first_in_row;
1704
- my $x = 0; my $y = 0; my $idx = 0;
1705
- while ($label ne '')
1706
- {
1707
- $label =~ s/^\s*($qr->{row})//;
1708
-
1709
- return $self->error ("Cannot parse HTML-like label: '$label'")
1710
- unless defined $1;
1711
-
1712
- # we now got one row:
1713
- my $row = $1;
1714
-
1715
- # print STDERR "# 3 HTML-like row is $row\n";
1716
-
1717
- # remove <TR>
1718
- $row =~ s/^\s*$qr->{tr}\s*//;
1719
- # remove </TR>
1720
- $row =~ s/\s*$qr->{tr_end}\s*\z//;
1721
-
1722
- my $first = 1;
1723
- while ($row ne '')
1724
- {
1725
- # remove one TD from the current row text
1726
- $row =~ s/^($qr->{td})($qr->{text})$qr->{td_end}//;
1727
- return $self->error ("Cannot parse HTML-like row: '$row'")
1728
- unless defined $1;
1729
-
1730
- my $node_label = $2;
1731
- my $attr_txt = $1;
1732
-
1733
- # convert "<BR/>" etc. to line breaks
1734
- # XXX TODO apply here the default of BALIGN
1735
- $node_label =~ s/<BR\s*\/?>/\\n/gi;
1736
-
1737
- # if the font covers the entire node, set "font" attribute
1738
- my $font_face = undef;
1739
- if ($node_label =~ /^[ ]*<FONT FACE="([^"]+)">(.*)<\/FONT>[ ]*\z/i)
1740
- {
1741
- $node_label = $2; $font_face = $1;
1742
- }
1743
- # XXX TODO if not, allow inline font changes
1744
- $node_label =~ s/<FONT[^>]+>(.*)<\/FONT>/$1/ig;
1745
-
1746
- my $node_name = $base_name . '.' . $idx;
1747
-
1748
- # if it doesn't exist, add it, otherwise retrieve node object to $node
1749
-
1750
- my $node = $graph->node($node_name);
1751
- if (!defined $node)
1752
- {
1753
- # create node object from the correct class
1754
- $node = $class->new($node_name);
1755
- $graph->add_node($node);
1756
- $node->set_attributes($raw_attributes);
1757
- $node->{autosplit_portname} = $idx; # some sensible default
1758
- }
1759
-
1760
- # apply the default attributes from the table
1761
- $node->set_attributes($table_attr);
1762
- # if found a global font attribute, override the font attribute with it
1763
- $node->set_attribute('font',$font_face) if defined $font_face;
1764
-
1765
- # parse the attributes and apply them to the node
1766
- $self->_html_per_node( $self->_parse_html_attributes($attr_txt,$qr,'td'), $node );
1767
-
1768
- # print STDERR "# Created $node_name\n";
1769
-
1770
- $node->{autosplit_label} = $node_label;
1771
- $node->{autosplit_basename} = $base_name;
1772
-
1773
- push @rc, $node;
1774
- if (@rc == 1)
1775
- {
1776
- # for correct as_txt output
1777
- $node->{autosplit} = $org_label;
1778
- $node->{autosplit} =~ s/\s+\z//; # strip trailing spaces
1779
- $node->{autosplit} =~ s/^\s+//; # strip leading spaces
1780
- $first_in_row = $node;
1781
- }
1782
- else
1783
- {
1784
- # second, third etc. get previous as origin
1785
- my ($sx,$sy) = (1,0);
1786
- my $origin = $rc[-2];
1787
- # the first node in one row is relative to the first node in the
1788
- # prev row
1789
- if ($first == 1)
1790
- {
1791
- ($sx,$sy) = (0,1); $origin = $first_in_row;
1792
- $first_in_row = $node;
1793
- $first = 0;
1794
- }
1795
- $node->relative_to($origin,$sx,$sy);
1796
- # suppress as_txt output for other parts
1797
- $node->{autosplit} = undef;
1798
- }
1799
- # necessary for border-collapse
1800
- $node->{autosplit_xy} = "$x,$y";
1801
-
1802
- $idx++; # next node ID
1803
- $x++;
1804
- }
1805
-
1806
- # next row
1807
- $y++;
1808
- }
1809
-
1810
- # return created nodes
1811
- @rc;
1812
- }
1813
-
1814
- #############################################################################
1815
-
1816
- sub _parser_cleanup
1817
- {
1818
- # After initial parsing, do cleanup, e.g. autosplit nodes with shape record,
1819
- # parse HTML-like labels, re-connect edges to the parts etc.
1820
- my ($self) = @_;
1821
-
1822
- print STDERR "# Parser cleanup pass\n" if $self->{debug};
1823
-
1824
- my $g = $self->{_graph};
1825
- my @nodes = $g->nodes();
1826
-
1827
- # For all nodes that have a shape of "record", break down their label into
1828
- # parts and create these as autosplit nodes.
1829
- # For all nodes that have a label starting with "<", parse it as HTML.
1830
-
1831
- # keep a record of all nodes to be deleted later:
1832
- my $delete = {};
1833
-
1834
- my $html_regexps = $self->_match_html_regexps();
1835
- my $graph_flow = $g->attribute('flow');
1836
- for my $n (@nodes)
1837
- {
1838
- my $label = $n->label(1);
1839
- # we can get away with a direct lookup, since DOT does not have classes
1840
- my $shape = $n->{att}->{shape} || 'rect';
1841
-
1842
- if ($shape !~ /record/ && $label =~ /^<\s*<.*>\z/)
1843
- {
1844
- print STDERR "# HTML-like label found: $label\n" if $self->{debug};
1845
- my @nodes = $self->_parse_html($n, $html_regexps);
1846
- # remove the temp. and spurious node
1847
- $delete->{$n->{name}} = undef;
1848
- my @edges = $n->edges();
1849
- # reconnect the found edges to the new autosplit parts
1850
- for my $e (@edges)
1851
- {
1852
- # XXX TODO: connect to better suited parts based on flow?
1853
- $e->start_at($nodes[0]) if ($e->{from} == $n);
1854
- $e->end_at($nodes[0]) if ($e->{to} == $n);
1855
- }
1856
- $g->del_node($n);
1857
- next;
1858
- }
1859
-
1860
- if ($shape =~ /record/ && $label =~ /\|/)
1861
- {
1862
- my $att = {};
1863
- # create basename only when node name differs from label
1864
- $att->{basename} = $n->{name};
1865
- if ($n->{name} ne $label)
1866
- {
1867
- $att->{basename} = $n->{name};
1868
- }
1869
- # XXX TODO: autosplit needs to handle nesting like "{}".
1870
-
1871
- # Replace "{ ... | ... | ... }" with "...|| ... || ...." as a cheat
1872
- # to fix some common cases
1873
- if ($label =~ /^\s*\{[^\{\}]+\}\s*\z/)
1874
- {
1875
- $label =~ s/[\{\}]//g; # {..|..} => ..|..
1876
- # if flow up/down: {A||B} => "[ A|| || B ]"
1877
- $label =~ s/\|/\|\| /g # ..|.. => ..|| ..
1878
- if ($graph_flow =~ /^(east|west)/);
1879
- # if flow left/right: {A||B} => "[ A| |B ]"
1880
- $label =~ s/\|\|/\| \|/g # ..|.. => ..| |..
1881
- if ($graph_flow =~ /^(north|south)/);
1882
- }
1883
- my @rc = $self->_autosplit_node($g, $label, $att, 0 );
1884
- my $group = $n->group();
1885
- $n->del_attribute('label');
1886
-
1887
- my $qr_clean = $self->{_qr_part_clean};
1888
- # clean the base name of ports:
1889
- # "<f1> test | <f2> test" => "test|test"
1890
- $rc[0]->{autosplit} =~ s/(^|\|)$qr_clean/$1/g;
1891
- $rc[0]->{att}->{basename} =~ s/(^|\|)$qr_clean/$1/g;
1892
- $rc[0]->{autosplit} =~ s/^\s*//;
1893
- $rc[0]->{att}->{basename} =~ s/^\s*//;
1894
- # '| |' => '| |' to avoid empty parts via as_txt() => as_ascii()
1895
- $rc[0]->{autosplit} =~ s/\|\s\|/\| \|/g;
1896
- $rc[0]->{att}->{basename} =~ s/\|\s\|/\| \|/g;
1897
- $rc[0]->{autosplit} =~ s/\|\s\|/\| \|/g;
1898
- $rc[0]->{att}->{basename} =~ s/\|\s\|/\| \|/g;
1899
- delete $rc[0]->{att}->{basename} if $rc[0]->{att}->{basename} eq $rc[0]->{autosplit};
1900
-
1901
- for my $n1 (@rc)
1902
- {
1903
- $n1->add_to_group($group) if $group;
1904
- $n1->set_attributes($n->{att});
1905
- # remove the temp. "shape=record"
1906
- $n1->del_attribute('shape');
1907
- }
1908
-
1909
- # If the helper node has edges, reconnect them to the first
1910
- # part of the autosplit node (dot seems to render them arbitrarily
1911
- # on the autosplit node):
1912
-
1913
- for my $e (values %{$n->{edges}})
1914
- {
1915
- $e->start_at($rc[0]) if $e->{from} == $n;
1916
- $e->end_at($rc[0]) if $e->{to} == $n;
1917
- }
1918
- # remove the temp. and spurious node
1919
- $delete->{$n->{name}} = undef;
1920
- $g->del_node($n);
1921
- }
1922
- }
1923
-
1924
- # During parsing, "bonn:f1" -> "berlin:f2" results in "bonn:f1" and
1925
- # "berlin:f2" as nodes, plus an edge connecting them
1926
-
1927
- # We find all of these nodes, move the edges to the freshly created
1928
- # autosplit parts above, then delete the superfluous temporary nodes.
1929
-
1930
- # if we looked up "Bonn:f1", remember it here to save time:
1931
- my $node_cache = {};
1932
-
1933
- my @edges = $g->edges();
1934
- @nodes = $g->nodes(); # get a fresh list of nodes after split
1935
- for my $e (@edges)
1936
- {
1937
- # do this for both the "from" and "to" side of the edge:
1938
- for my $side ('from','to')
1939
- {
1940
- my $n = $e->{$side};
1941
- next unless defined $n->{_graphviz_portlet};
1942
-
1943
- my $port = $n->{_graphviz_portlet};
1944
- my $base = $n->{_graphviz_basename};
1945
-
1946
- my $compass = '';
1947
- if ($port =~ s/:(n|ne|e|se|s|sw|w|nw)\z//)
1948
- {
1949
- $compass = $1;
1950
- }
1951
- # "Bonn:w" is port "w", and only "west" when that port doesn't exist
1952
-
1953
- # look it up in the cache first
1954
- my $node = $node_cache->{"$base:$port"};
1955
-
1956
- my $p = undef;
1957
- if (!defined $node)
1958
- {
1959
- # go through all nodes and see if we find one with the right port name
1960
- for my $na (@nodes)
1961
- {
1962
- next unless exists $na->{autosplit_portname} && exists $na->{autosplit_basename};
1963
- next unless $na->{autosplit_basename} eq $base;
1964
- next unless $na->{autosplit_portname} eq $port;
1965
- # cache result
1966
- $node_cache->{"$base:$port"} = $na;
1967
- $node = $na;
1968
- $p = $port_remap->{substr($compass,0,1)} if $compass; # ne => n => north
1969
- }
1970
- }
1971
-
1972
- if (!defined $node)
1973
- {
1974
- # Still not defined?
1975
- # port looks like a compass node?
1976
- if ($port =~ /^(n|ne|e|se|s|sw|w|nw)\z/)
1977
- {
1978
- # get the first node matching the base
1979
- for my $na (@nodes)
1980
- {
1981
- #print STDERR "# evaluating $na ($na->{name} $na->{autosplit_basename}) ($base)\n";
1982
- next unless exists $na->{autosplit_basename};
1983
- next unless $na->{autosplit_basename} eq $base;
1984
- # cache result
1985
- $node_cache->{"$base:$port"} = $na;
1986
- $node = $na;
1987
- }
1988
- if (!defined $node)
1989
- {
1990
- return $self->error("Cannot find autosplit node for $base:$port on edge $e->{id}");
1991
- }
1992
- $p = $port_remap->{substr($port,0,1)}; # ne => n => north
1993
- }
1994
- else
1995
- {
1996
- # uhoh...
1997
- return $self->error("Cannot find autosplit node for $base:$port on edge $e->{id}");
1998
- }
1999
- }
2000
-
2001
- if ($side eq 'from')
2002
- {
2003
- $delete->{$e->{from}->{name}} = undef;
2004
- print STDERR "# Setting new edge start point to $node->{name}\n" if $self->{debug};
2005
- $e->start_at($node);
2006
- print STDERR "# Setting new edge end point to start at $p\n" if $self->{debug} && $p;
2007
- $e->set_attribute('start', $p) if $p;
2008
- }
2009
- else
2010
- {
2011
- $delete->{$e->{to}->{name}} = undef;
2012
- print STDERR "# Setting new edge end point to $node->{name}\n" if $self->{debug};
2013
- $e->end_at($node);
2014
- print STDERR "# Setting new edge end point to end at $p\n" if $self->{debug} && $p;
2015
- $e->set_attribute('end', $p) if $p;
2016
- }
2017
-
2018
- } # end for side "from" and "to"
2019
- # we have reconnected this edge
2020
- }
2021
-
2022
- # after reconnecting all edges, we can delete temp. nodes:
2023
- for my $n (@nodes)
2024
- {
2025
- next unless exists $n->{_graphviz_portlet};
2026
- # "c:w" => "c"
2027
- my $name = $n->{name}; $name =~ s/:.*?\z//;
2028
- # add "c" unless we should delete the base node (this deletes record
2029
- # and autosplit nodes, but keeps loners like "c:w" around as "c":
2030
- $g->add_node($name) unless exists $delete->{$name};
2031
- # delete "c:w"
2032
- $g->del_node($n);
2033
- }
2034
-
2035
- # if the graph doesn't have a title, set the graph name as title
2036
- $g->set_attribute('title', $self->{_graphviz_graph_name})
2037
- unless defined $g->raw_attribute('title');
2038
-
2039
- # cleanup if there are no groups
2040
- if ($g->groups() == 0)
2041
- {
2042
- $g->del_attribute('group', 'align');
2043
- $g->del_attribute('group', 'fill');
2044
- }
2045
- $g->{_warn_on_unknown_attributes} = 0; # reset to die again
2046
-
2047
- $self;
2048
- }
2049
-
2050
- 1;
2051
- __END__
2052
-
2053
- =head1 NAME
2054
-
2055
- Graph::Easy::Parser::Graphviz - Parse Graphviz text into Graph::Easy
2056
-
2057
- =head1 SYNOPSIS
2058
-
2059
- # creating a graph from a textual description
2060
-
2061
- use Graph::Easy::Parser::Graphviz;
2062
- my $parser = Graph::Easy::Parser::Graphviz->new();
2063
-
2064
- my $graph = $parser->from_text(
2065
- "digraph MyGraph { \n" .
2066
- " Bonn -> \"Berlin\" \n }"
2067
- );
2068
- print $graph->as_ascii();
2069
-
2070
- print $parser->from_file('mygraph.dot')->as_ascii();
2071
-
2072
- =head1 DESCRIPTION
2073
-
2074
- C<Graph::Easy::Parser::Graphviz> parses the text format from the DOT language
2075
- used by Graphviz and constructs a C<Graph::Easy> object from it.
2076
-
2077
- The resulting object can then be used to lay out and output the graph
2078
- in various formats.
2079
-
2080
- Please see the Graphviz manual for a full description of the syntax
2081
- rules of the DOT language.
2082
-
2083
- =head2 Output
2084
-
2085
- The output will be a L<Graph::Easy|Graph::Easy> object (unless overridden
2086
- with C<use_class()>), see the documentation for Graph::Easy for what you can do
2087
- with it.
2088
-
2089
- =head2 Attributes
2090
-
2091
- Attributes will be remapped to the proper Graph::Easy attribute names and
2092
- values, as much as possible.
2093
-
2094
- Anything else will be converted to custom attributes starting with "x-dot-".
2095
- So "ranksep: 2" will become "x-dot-ranksep: 2".
2096
-
2097
- =head1 METHODS
2098
-
2099
- C<Graph::Easy::Parser::Graphviz> supports the same methods
2100
- as its parent class C<Graph::Easy::Parser>:
2101
-
2102
- =head2 new()
2103
-
2104
- use Graph::Easy::Parser::Graphviz;
2105
- my $parser = Graph::Easy::Parser::Graphviz->new();
2106
-
2107
- Creates a new parser object. There are two valid parameters:
2108
-
2109
- debug
2110
- fatal_errors
2111
-
2112
- Both take either a false or a true value.
2113
-
2114
- my $parser = Graph::Easy::Parser::Graphviz->new( debug => 1 );
2115
- $parser->from_text('digraph G { A -> B }');
2116
-
2117
- =head2 reset()
2118
-
2119
- $parser->reset();
2120
-
2121
- Reset the status of the parser, clear errors etc. Automatically called
2122
- when you call any of the C<from_XXX()> methods below.
2123
-
2124
- =head2 use_class()
2125
-
2126
- $parser->use_class('node', 'Graph::Easy::MyNode');
2127
-
2128
- Override the class to be used to construct objects while parsing.
2129
-
2130
- See L<Graph::Easy::Parser> for further information.
2131
-
2132
- =head2 from_text()
2133
-
2134
- my $graph = $parser->from_text( $text );
2135
-
2136
- Create a L<Graph::Easy|Graph::Easy> object from the textual description in C<$text>.
2137
-
2138
- Returns undef for error, you can find out what the error was
2139
- with L<error()>.
2140
-
2141
- This method will reset any previous error, and thus the C<$parser> object
2142
- can be re-used to parse different texts by just calling C<from_text()>
2143
- multiple times.
2144
-
2145
- =head2 from_file()
2146
-
2147
- my $graph = $parser->from_file( $filename );
2148
- my $graph = Graph::Easy::Parser->from_file( $filename );
2149
-
2150
- Creates a L<Graph::Easy|Graph::Easy> object from the textual description in the file
2151
- C<$filename>.
2152
-
2153
- The second calling style will create a temporary parser object,
2154
- parse the file and return the resulting C<Graph::Easy> object.
2155
-
2156
- Returns undef for error, you can find out what the error was
2157
- with L<error()> when using the first calling style.
2158
-
2159
- =head2 error()
2160
-
2161
- my $error = $parser->error();
2162
-
2163
- Returns the last error, or the empty string if no error occurred.
2164
-
2165
- =head2 parse_error()
2166
-
2167
- $parser->parse_error( $msg_nr, @params);
2168
-
2169
- Sets an error message from a message number and replaces embedded
2170
- templates like C<##param1##> with the passed parameters.
2171
-
2172
- =head1 CAVEATS
2173
-
2174
- The parser has problems with the following things:
2175
-
2176
- =over 12
2177
-
2178
- =item encoding and charset attribute
2179
-
2180
- The parser assumes the input to be C<utf-8>. Input files in C<Latin1>
2181
- are not parsed properly, even when they have the charset attribute set.
2182
-
2183
- =item shape=record
2184
-
2185
- Nodes with shape record are only parsed properly when the label does not
2186
- contain groups delimited by "{" and "}", so the following is parsed
2187
- wrongly:
2188
-
2189
- node1 [ shape=record, label="A|{B|C}" ]
2190
-
2191
- =item default shape
2192
-
2193
- The default shape for a node is 'rect', as opposed to the 'ellipse' that dot
2194
- uses by default.
2195
-
2196
- =item attributes
2197
-
2198
- Some attributes are B<not> remapped properly to what Graph::Easy expects, thus
2199
- losing information, either because Graph::Easy doesn't support this feature
2200
- yet, or because the mapping is incomplete.
2201
-
2202
- Some attributes meant only for nodes or edges etc. might be incorrectly applied
2203
- to other objects, resulting in unnecessary warnings while parsing.
2204
-
2205
- Attributes not valid in the original DOT language are silently ignored by dot,
2206
- but result in a warning when parsing under Graph::Easy. This helps catch all
2207
- these pesky misspellings, but it's not yet possible to disable these warnings.
2208
-
2209
- =item comments
2210
-
2211
- Comments written in the source code itself are discarded. If you want to have
2212
- comments on the graph, clusters, nodes or edges, use the attribute C<comment>.
2213
- These are correctly read in and stored, and then output into the different
2214
- formats, too.
2215
-
2216
- =back
2217
-
2218
- =head1 EXPORT
2219
-
2220
- Exports nothing.
2221
-
2222
- =head1 SEE ALSO
2223
-
2224
- L<Graph::Easy>, L<Graph::Reader::Dot>.
2225
-
2226
- =head1 AUTHOR
2227
-
2228
- Copyright (C) 2005 - 2007 by Tels L<http://bloodgate.com>
2229
-
2230
- See the LICENSE file for information.
2231
-
2232
- =cut