rroonga 6.0.7-x64-mingw32 → 6.0.9-x64-mingw32

Files changed (797)
  1. checksums.yaml +4 -4
  2. data/doc/text/cross-compile.md +24 -23
  3. data/doc/text/news.md +10 -0
  4. data/ext/groonga/rb-grn-database.c +33 -0
  5. data/ext/groonga/rb-grn-id.c +19 -0
  6. data/ext/groonga/rb-grn-table.c +3 -1
  7. data/ext/groonga/rb-grn.h +1 -1
  8. data/lib/2.1/groonga.so +0 -0
  9. data/lib/2.2/groonga.so +0 -0
  10. data/lib/2.3/groonga.so +0 -0
  11. data/rroonga-build.rb +3 -3
  12. data/test/test-database.rb +21 -1
  13. data/test/test-id.rb +16 -0
  14. data/vendor/local/bin/grndb.exe +0 -0
  15. data/vendor/local/bin/groonga-benchmark.exe +0 -0
  16. data/vendor/local/bin/groonga-suggest-create-dataset.exe +0 -0
  17. data/vendor/local/bin/groonga.exe +0 -0
  18. data/vendor/local/bin/libgroonga-0.dll +0 -0
  19. data/vendor/local/bin/libmecab-2.dll +0 -0
  20. data/vendor/local/bin/libmsgpackc.dll +0 -0
  21. data/vendor/local/bin/libonig-5.dll +0 -0
  22. data/vendor/local/bin/libpcre-1.dll +0 -0
  23. data/vendor/local/bin/libpcrecpp-0.dll +0 -0
  24. data/vendor/local/bin/libpcreposix-0.dll +0 -0
  25. data/vendor/local/bin/lz4.exe +0 -0
  26. data/vendor/local/bin/lz4c.exe +0 -0
  27. data/vendor/local/bin/lz4cat +0 -0
  28. data/vendor/local/bin/mecab.exe +0 -0
  29. data/vendor/local/bin/pcre-config +133 -0
  30. data/vendor/local/bin/pcregrep.exe +0 -0
  31. data/vendor/local/bin/pcretest.exe +0 -0
  32. data/vendor/local/bin/zlib1.dll +0 -0
  33. data/vendor/local/include/groonga/groonga/db.h +22 -0
  34. data/vendor/local/include/groonga/groonga/groonga.h +21 -1
  35. data/vendor/local/include/groonga/groonga/id.h +1 -0
  36. data/vendor/local/include/pcre.h +677 -0
  37. data/vendor/local/include/pcre_scanner.h +172 -0
  38. data/vendor/local/include/pcre_stringpiece.h +180 -0
  39. data/vendor/local/include/pcrecpp.h +710 -0
  40. data/vendor/local/include/pcrecpparg.h +174 -0
  41. data/vendor/local/include/pcreposix.h +146 -0
  42. data/vendor/local/lib/groonga/plugins/functions/number.a +0 -0
  43. data/vendor/local/lib/groonga/plugins/functions/number.dll +0 -0
  44. data/vendor/local/lib/groonga/plugins/functions/number.dll.a +0 -0
  45. data/vendor/local/lib/groonga/plugins/functions/string.a +0 -0
  46. data/vendor/local/lib/groonga/plugins/functions/string.dll +0 -0
  47. data/vendor/local/lib/groonga/plugins/functions/string.dll.a +0 -0
  48. data/vendor/local/lib/groonga/plugins/functions/time.a +0 -0
  49. data/vendor/local/lib/groonga/plugins/functions/time.dll +0 -0
  50. data/vendor/local/lib/groonga/plugins/functions/time.dll.a +0 -0
  51. data/vendor/local/lib/groonga/plugins/functions/vector.a +0 -0
  52. data/vendor/local/lib/groonga/plugins/functions/vector.dll +0 -0
  53. data/vendor/local/lib/groonga/plugins/functions/vector.dll.a +0 -0
  54. data/vendor/local/lib/groonga/plugins/query_expanders/tsv.a +0 -0
  55. data/vendor/local/lib/groonga/plugins/query_expanders/tsv.dll +0 -0
  56. data/vendor/local/lib/groonga/plugins/query_expanders/tsv.dll.a +0 -0
  57. data/vendor/local/lib/groonga/plugins/sharding/logical_table_remove.rb +253 -23
  58. data/vendor/local/lib/groonga/plugins/suggest/suggest.a +0 -0
  59. data/vendor/local/lib/groonga/plugins/suggest/suggest.dll +0 -0
  60. data/vendor/local/lib/groonga/plugins/suggest/suggest.dll.a +0 -0
  61. data/vendor/local/lib/groonga/plugins/table/table.a +0 -0
  62. data/vendor/local/lib/groonga/plugins/table/table.dll +0 -0
  63. data/vendor/local/lib/groonga/plugins/table/table.dll.a +0 -0
  64. data/vendor/local/lib/groonga/plugins/token_filters/stop_word.a +0 -0
  65. data/vendor/local/lib/groonga/plugins/token_filters/stop_word.dll +0 -0
  66. data/vendor/local/lib/groonga/plugins/token_filters/stop_word.dll.a +0 -0
  67. data/vendor/local/lib/groonga/plugins/tokenizers/mecab.a +0 -0
  68. data/vendor/local/lib/groonga/plugins/tokenizers/mecab.dll +0 -0
  69. data/vendor/local/lib/groonga/plugins/tokenizers/mecab.dll.a +0 -0
  70. data/vendor/local/lib/groonga/scripts/ruby/context.rb +19 -0
  71. data/vendor/local/lib/groonga/scripts/ruby/context/rc.rb +12 -4
  72. data/vendor/local/lib/groonga/scripts/ruby/database.rb +36 -18
  73. data/vendor/local/lib/groonga/scripts/ruby/scan_info_data.rb +13 -10
  74. data/vendor/local/lib/libgroonga.a +0 -0
  75. data/vendor/local/lib/libgroonga.dll.a +0 -0
  76. data/vendor/local/lib/liblz4.a +0 -0
  77. data/vendor/local/lib/liblz4.dll +0 -0
  78. data/vendor/local/lib/liblz4.dll.1 +0 -0
  79. data/vendor/local/lib/liblz4.dll.1.5.0 +0 -0
  80. data/vendor/local/lib/libmecab.a +0 -0
  81. data/vendor/local/lib/libmecab.dll.a +0 -0
  82. data/vendor/local/lib/libmsgpackc.a +0 -0
  83. data/vendor/local/lib/libmsgpackc.dll.a +0 -0
  84. data/vendor/local/lib/libonig.a +0 -0
  85. data/vendor/local/lib/libonig.dll.a +0 -0
  86. data/vendor/local/lib/libpcre.a +0 -0
  87. data/vendor/local/lib/libpcre.dll.a +0 -0
  88. data/vendor/local/lib/libpcre.la +41 -0
  89. data/vendor/local/lib/libpcrecpp.a +0 -0
  90. data/vendor/local/lib/libpcrecpp.dll.a +0 -0
  91. data/vendor/local/lib/libpcrecpp.la +41 -0
  92. data/vendor/local/lib/libpcreposix.a +0 -0
  93. data/vendor/local/lib/libpcreposix.dll.a +0 -0
  94. data/vendor/local/lib/libpcreposix.la +41 -0
  95. data/vendor/local/lib/libz.a +0 -0
  96. data/vendor/local/lib/libz.dll.a +0 -0
  97. data/vendor/local/lib/pkgconfig/groonga.pc +2 -2
  98. data/vendor/local/lib/pkgconfig/libpcre.pc +13 -0
  99. data/vendor/local/lib/pkgconfig/libpcrecpp.pc +12 -0
  100. data/vendor/local/lib/pkgconfig/libpcreposix.pc +13 -0
  101. data/vendor/local/libexec/mecab/mecab-cost-train.exe +0 -0
  102. data/vendor/local/libexec/mecab/mecab-dict-gen.exe +0 -0
  103. data/vendor/local/libexec/mecab/mecab-dict-index.exe +0 -0
  104. data/vendor/local/libexec/mecab/mecab-system-eval.exe +0 -0
  105. data/vendor/local/libexec/mecab/mecab-test-gen.exe +0 -0
  106. data/vendor/local/sbin/groonga-httpd.exe +0 -0
  107. data/vendor/local/share/doc/groonga/en/html/.buildinfo +1 -1
  108. data/vendor/local/share/doc/groonga/en/html/_sources/install/centos.txt +3 -3
  109. data/vendor/local/share/doc/groonga/en/html/_sources/install/debian.txt +3 -3
  110. data/vendor/local/share/doc/groonga/en/html/_sources/install/fedora.txt +3 -3
  111. data/vendor/local/share/doc/groonga/en/html/_sources/install/mac_os_x.txt +3 -3
  112. data/vendor/local/share/doc/groonga/en/html/_sources/install/others.txt +3 -3
  113. data/vendor/local/share/doc/groonga/en/html/_sources/install/solaris.txt +3 -3
  114. data/vendor/local/share/doc/groonga/en/html/_sources/install/ubuntu.txt +3 -3
  115. data/vendor/local/share/doc/groonga/en/html/_sources/install/windows.txt +9 -9
  116. data/vendor/local/share/doc/groonga/en/html/_sources/limitations.txt +24 -5
  117. data/vendor/local/share/doc/groonga/en/html/_sources/news.txt +156 -4
  118. data/vendor/local/share/doc/groonga/en/html/_sources/reference/commands/lock_acquire.txt +1 -1
  119. data/vendor/local/share/doc/groonga/en/html/_sources/reference/commands/lock_release.txt +1 -1
  120. data/vendor/local/share/doc/groonga/en/html/_sources/reference/commands/logical_table_remove.txt +86 -0
  121. data/vendor/local/share/doc/groonga/en/html/_sources/reference/commands/object_list.txt +23 -11
  122. data/vendor/local/share/doc/groonga/en/html/_sources/reference/commands/table_copy.txt +64 -0
  123. data/vendor/local/share/doc/groonga/en/html/_sources/reference/tables.txt +88 -45
  124. data/vendor/local/share/doc/groonga/en/html/characteristic.html +5 -5
  125. data/vendor/local/share/doc/groonga/en/html/client.html +5 -5
  126. data/vendor/local/share/doc/groonga/en/html/community.html +5 -5
  127. data/vendor/local/share/doc/groonga/en/html/contribution.html +5 -5
  128. data/vendor/local/share/doc/groonga/en/html/contribution/development.html +5 -5
  129. data/vendor/local/share/doc/groonga/en/html/contribution/development/build.html +5 -5
  130. data/vendor/local/share/doc/groonga/en/html/contribution/development/build/unix_autotools.html +5 -5
  131. data/vendor/local/share/doc/groonga/en/html/contribution/development/build/unix_cmake.html +5 -5
  132. data/vendor/local/share/doc/groonga/en/html/contribution/development/build/windows_cmake.html +5 -5
  133. data/vendor/local/share/doc/groonga/en/html/contribution/development/com.html +5 -5
  134. data/vendor/local/share/doc/groonga/en/html/contribution/development/cooperation.html +5 -5
  135. data/vendor/local/share/doc/groonga/en/html/contribution/development/query.html +5 -5
  136. data/vendor/local/share/doc/groonga/en/html/contribution/development/release.html +5 -5
  137. data/vendor/local/share/doc/groonga/en/html/contribution/development/repository.html +5 -5
  138. data/vendor/local/share/doc/groonga/en/html/contribution/development/test.html +5 -5
  139. data/vendor/local/share/doc/groonga/en/html/contribution/documentation.html +5 -5
  140. data/vendor/local/share/doc/groonga/en/html/contribution/documentation/c-api.html +5 -5
  141. data/vendor/local/share/doc/groonga/en/html/contribution/documentation/i18n.html +5 -5
  142. data/vendor/local/share/doc/groonga/en/html/contribution/documentation/introduction.html +5 -5
  143. data/vendor/local/share/doc/groonga/en/html/contribution/report.html +5 -5
  144. data/vendor/local/share/doc/groonga/en/html/development.html +5 -5
  145. data/vendor/local/share/doc/groonga/en/html/development/travis-ci.html +5 -5
  146. data/vendor/local/share/doc/groonga/en/html/genindex.html +5 -5
  147. data/vendor/local/share/doc/groonga/en/html/index.html +15 -14
  148. data/vendor/local/share/doc/groonga/en/html/install.html +5 -5
  149. data/vendor/local/share/doc/groonga/en/html/install/centos.html +8 -8
  150. data/vendor/local/share/doc/groonga/en/html/install/debian.html +8 -8
  151. data/vendor/local/share/doc/groonga/en/html/install/fedora.html +8 -8
  152. data/vendor/local/share/doc/groonga/en/html/install/mac_os_x.html +8 -8
  153. data/vendor/local/share/doc/groonga/en/html/install/others.html +8 -8
  154. data/vendor/local/share/doc/groonga/en/html/install/solaris.html +8 -8
  155. data/vendor/local/share/doc/groonga/en/html/install/ubuntu.html +8 -8
  156. data/vendor/local/share/doc/groonga/en/html/install/windows.html +14 -14
  157. data/vendor/local/share/doc/groonga/en/html/limitations.html +28 -9
  158. data/vendor/local/share/doc/groonga/en/html/news.html +196 -61
  159. data/vendor/local/share/doc/groonga/en/html/news/0.x.html +5 -5
  160. data/vendor/local/share/doc/groonga/en/html/news/1.0.x.html +5 -5
  161. data/vendor/local/share/doc/groonga/en/html/news/1.1.x.html +5 -5
  162. data/vendor/local/share/doc/groonga/en/html/news/1.2.x.html +5 -5
  163. data/vendor/local/share/doc/groonga/en/html/news/1.3.x.html +5 -5
  164. data/vendor/local/share/doc/groonga/en/html/news/2.x.html +5 -5
  165. data/vendor/local/share/doc/groonga/en/html/news/3.x.html +5 -5
  166. data/vendor/local/share/doc/groonga/en/html/news/4.x.html +5 -5
  167. data/vendor/local/share/doc/groonga/en/html/news/5.x.html +5 -5
  168. data/vendor/local/share/doc/groonga/en/html/news/senna.html +5 -5
  169. data/vendor/local/share/doc/groonga/en/html/objects.inv +0 -0
  170. data/vendor/local/share/doc/groonga/en/html/reference.html +15 -14
  171. data/vendor/local/share/doc/groonga/en/html/reference/alias.html +5 -5
  172. data/vendor/local/share/doc/groonga/en/html/reference/api.html +5 -5
  173. data/vendor/local/share/doc/groonga/en/html/reference/api/global_configurations.html +5 -5
  174. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_cache.html +5 -5
  175. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_column.html +5 -5
  176. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_command_version.html +5 -5
  177. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_content_type.html +5 -5
  178. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_ctx.html +5 -5
  179. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_db.html +5 -5
  180. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_encoding.html +5 -5
  181. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_expr.html +5 -5
  182. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_geo.html +5 -5
  183. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_hook.html +5 -5
  184. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_ii.html +5 -5
  185. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_index_cursor.html +5 -5
  186. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_info.html +5 -5
  187. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_match_escalation.html +5 -5
  188. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_obj.html +5 -5
  189. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_proc.html +5 -5
  190. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_search.html +5 -5
  191. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_table.html +5 -5
  192. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_table_cursor.html +5 -5
  193. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_thread.html +5 -5
  194. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_type.html +5 -5
  195. data/vendor/local/share/doc/groonga/en/html/reference/api/grn_user_data.html +5 -5
  196. data/vendor/local/share/doc/groonga/en/html/reference/api/overview.html +5 -5
  197. data/vendor/local/share/doc/groonga/en/html/reference/api/plugin.html +5 -5
  198. data/vendor/local/share/doc/groonga/en/html/reference/cast.html +5 -5
  199. data/vendor/local/share/doc/groonga/en/html/reference/column.html +5 -5
  200. data/vendor/local/share/doc/groonga/en/html/reference/columns/index.html +5 -5
  201. data/vendor/local/share/doc/groonga/en/html/reference/columns/pseudo.html +5 -5
  202. data/vendor/local/share/doc/groonga/en/html/reference/columns/scalar.html +5 -5
  203. data/vendor/local/share/doc/groonga/en/html/reference/columns/vector.html +5 -5
  204. data/vendor/local/share/doc/groonga/en/html/reference/command.html +15 -14
  205. data/vendor/local/share/doc/groonga/en/html/reference/command/command_version.html +5 -5
  206. data/vendor/local/share/doc/groonga/en/html/reference/command/output_format.html +5 -5
  207. data/vendor/local/share/doc/groonga/en/html/reference/command/pretty_print.html +5 -5
  208. data/vendor/local/share/doc/groonga/en/html/reference/command/request_id.html +5 -5
  209. data/vendor/local/share/doc/groonga/en/html/reference/command/request_timeout.html +5 -5
  210. data/vendor/local/share/doc/groonga/en/html/reference/command/return_code.html +5 -5
  211. data/vendor/local/share/doc/groonga/en/html/reference/commands/cache_limit.html +5 -5
  212. data/vendor/local/share/doc/groonga/en/html/reference/commands/check.html +5 -5
  213. data/vendor/local/share/doc/groonga/en/html/reference/commands/clearlock.html +5 -5
  214. data/vendor/local/share/doc/groonga/en/html/reference/commands/column_copy.html +5 -5
  215. data/vendor/local/share/doc/groonga/en/html/reference/commands/column_create.html +5 -5
  216. data/vendor/local/share/doc/groonga/en/html/reference/commands/column_list.html +5 -5
  217. data/vendor/local/share/doc/groonga/en/html/reference/commands/column_remove.html +5 -5
  218. data/vendor/local/share/doc/groonga/en/html/reference/commands/column_rename.html +5 -5
  219. data/vendor/local/share/doc/groonga/en/html/reference/commands/config_delete.html +5 -5
  220. data/vendor/local/share/doc/groonga/en/html/reference/commands/config_get.html +5 -5
  221. data/vendor/local/share/doc/groonga/en/html/reference/commands/config_set.html +5 -5
  222. data/vendor/local/share/doc/groonga/en/html/reference/commands/database_unmap.html +5 -5
  223. data/vendor/local/share/doc/groonga/en/html/reference/commands/define_selector.html +5 -5
  224. data/vendor/local/share/doc/groonga/en/html/reference/commands/defrag.html +5 -5
  225. data/vendor/local/share/doc/groonga/en/html/reference/commands/delete.html +5 -5
  226. data/vendor/local/share/doc/groonga/en/html/reference/commands/dump.html +5 -5
  227. data/vendor/local/share/doc/groonga/en/html/reference/commands/io_flush.html +5 -5
  228. data/vendor/local/share/doc/groonga/en/html/reference/commands/load.html +5 -5
  229. data/vendor/local/share/doc/groonga/en/html/reference/commands/lock_acquire.html +6 -6
  230. data/vendor/local/share/doc/groonga/en/html/reference/commands/lock_clear.html +5 -5
  231. data/vendor/local/share/doc/groonga/en/html/reference/commands/lock_release.html +6 -6
  232. data/vendor/local/share/doc/groonga/en/html/reference/commands/log_level.html +5 -5
  233. data/vendor/local/share/doc/groonga/en/html/reference/commands/log_put.html +5 -5
  234. data/vendor/local/share/doc/groonga/en/html/reference/commands/log_reopen.html +5 -5
  235. data/vendor/local/share/doc/groonga/en/html/reference/commands/logical_count.html +5 -5
  236. data/vendor/local/share/doc/groonga/en/html/reference/commands/logical_parameters.html +5 -5
  237. data/vendor/local/share/doc/groonga/en/html/reference/commands/logical_range_filter.html +5 -5
  238. data/vendor/local/share/doc/groonga/en/html/reference/commands/logical_select.html +5 -5
  239. data/vendor/local/share/doc/groonga/en/html/reference/commands/logical_shard_list.html +5 -5
  240. data/vendor/local/share/doc/groonga/en/html/reference/commands/logical_table_remove.html +98 -8
  241. data/vendor/local/share/doc/groonga/en/html/reference/commands/normalize.html +5 -5
  242. data/vendor/local/share/doc/groonga/en/html/reference/commands/normalizer_list.html +5 -5
  243. data/vendor/local/share/doc/groonga/en/html/reference/commands/object_exist.html +5 -5
  244. data/vendor/local/share/doc/groonga/en/html/reference/commands/object_inspect.html +5 -5
  245. data/vendor/local/share/doc/groonga/en/html/reference/commands/object_list.html +32 -18
  246. data/vendor/local/share/doc/groonga/en/html/reference/commands/object_remove.html +5 -5
  247. data/vendor/local/share/doc/groonga/en/html/reference/commands/plugin_register.html +5 -5
  248. data/vendor/local/share/doc/groonga/en/html/reference/commands/plugin_unregister.html +5 -5
  249. data/vendor/local/share/doc/groonga/en/html/reference/commands/query_expand.html +5 -5
  250. data/vendor/local/share/doc/groonga/en/html/reference/commands/quit.html +5 -5
  251. data/vendor/local/share/doc/groonga/en/html/reference/commands/range_filter.html +5 -5
  252. data/vendor/local/share/doc/groonga/en/html/reference/commands/register.html +5 -5
  253. data/vendor/local/share/doc/groonga/en/html/reference/commands/reindex.html +5 -5
  254. data/vendor/local/share/doc/groonga/en/html/reference/commands/request_cancel.html +5 -5
  255. data/vendor/local/share/doc/groonga/en/html/reference/commands/ruby_eval.html +5 -5
  256. data/vendor/local/share/doc/groonga/en/html/reference/commands/ruby_load.html +5 -5
  257. data/vendor/local/share/doc/groonga/en/html/reference/commands/schema.html +5 -5
  258. data/vendor/local/share/doc/groonga/en/html/reference/commands/select.html +5 -5
  259. data/vendor/local/share/doc/groonga/en/html/reference/commands/shutdown.html +5 -5
  260. data/vendor/local/share/doc/groonga/en/html/reference/commands/status.html +5 -5
  261. data/vendor/local/share/doc/groonga/en/html/reference/commands/suggest.html +10 -10
  262. data/vendor/local/share/doc/groonga/en/html/reference/commands/table_copy.html +200 -0
  263. data/vendor/local/share/doc/groonga/en/html/reference/commands/table_create.html +52 -52
  264. data/vendor/local/share/doc/groonga/en/html/reference/commands/table_list.html +25 -25
  265. data/vendor/local/share/doc/groonga/en/html/reference/commands/table_remove.html +41 -41
  266. data/vendor/local/share/doc/groonga/en/html/reference/commands/table_rename.html +31 -31
  267. data/vendor/local/share/doc/groonga/en/html/reference/commands/table_tokenize.html +41 -41
  268. data/vendor/local/share/doc/groonga/en/html/reference/commands/thread_limit.html +31 -31
  269. data/vendor/local/share/doc/groonga/en/html/reference/commands/tokenize.html +43 -43
  270. data/vendor/local/share/doc/groonga/en/html/reference/commands/tokenizer_list.html +25 -25
  271. data/vendor/local/share/doc/groonga/en/html/reference/commands/truncate.html +25 -25
  272. data/vendor/local/share/doc/groonga/en/html/reference/configuration.html +5 -5
  273. data/vendor/local/share/doc/groonga/en/html/reference/executables.html +5 -5
  274. data/vendor/local/share/doc/groonga/en/html/reference/executables/grndb.html +5 -5
  275. data/vendor/local/share/doc/groonga/en/html/reference/executables/grnslap.html +5 -5
  276. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga-benchmark.html +5 -5
  277. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga-httpd.html +5 -5
  278. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga-server-http.html +5 -5
  279. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga-suggest-create-dataset.html +5 -5
  280. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga-suggest-httpd.html +5 -5
  281. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga-suggest-learner.html +5 -5
  282. data/vendor/local/share/doc/groonga/en/html/reference/executables/groonga.html +5 -5
  283. data/vendor/local/share/doc/groonga/en/html/reference/function.html +5 -5
  284. data/vendor/local/share/doc/groonga/en/html/reference/functions/between.html +5 -5
  285. data/vendor/local/share/doc/groonga/en/html/reference/functions/edit_distance.html +5 -5
  286. data/vendor/local/share/doc/groonga/en/html/reference/functions/fuzzy_search.html +5 -5
  287. data/vendor/local/share/doc/groonga/en/html/reference/functions/geo_distance.html +5 -5
  288. data/vendor/local/share/doc/groonga/en/html/reference/functions/geo_in_circle.html +5 -5
  289. data/vendor/local/share/doc/groonga/en/html/reference/functions/geo_in_rectangle.html +5 -5
  290. data/vendor/local/share/doc/groonga/en/html/reference/functions/highlight_full.html +5 -5
  291. data/vendor/local/share/doc/groonga/en/html/reference/functions/highlight_html.html +5 -5
  292. data/vendor/local/share/doc/groonga/en/html/reference/functions/html_untag.html +5 -5
  293. data/vendor/local/share/doc/groonga/en/html/reference/functions/in_values.html +5 -5
  294. data/vendor/local/share/doc/groonga/en/html/reference/functions/now.html +5 -5
  295. data/vendor/local/share/doc/groonga/en/html/reference/functions/number_classify.html +5 -5
  296. data/vendor/local/share/doc/groonga/en/html/reference/functions/prefix_rk_search.html +5 -5
  297. data/vendor/local/share/doc/groonga/en/html/reference/functions/query.html +5 -5
  298. data/vendor/local/share/doc/groonga/en/html/reference/functions/rand.html +5 -5
  299. data/vendor/local/share/doc/groonga/en/html/reference/functions/record_number.html +5 -5
  300. data/vendor/local/share/doc/groonga/en/html/reference/functions/snippet_html.html +5 -5
  301. data/vendor/local/share/doc/groonga/en/html/reference/functions/string_substring.html +5 -5
  302. data/vendor/local/share/doc/groonga/en/html/reference/functions/sub_filter.html +5 -5
  303. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_day.html +5 -5
  304. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_hour.html +5 -5
  305. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_minute.html +5 -5
  306. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_month.html +5 -5
  307. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_second.html +5 -5
  308. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_week.html +5 -5
  309. data/vendor/local/share/doc/groonga/en/html/reference/functions/time_classify_year.html +5 -5
  310. data/vendor/local/share/doc/groonga/en/html/reference/functions/vector_size.html +5 -5
  311. data/vendor/local/share/doc/groonga/en/html/reference/functions/vector_slice.html +5 -5
  312. data/vendor/local/share/doc/groonga/en/html/reference/grn_expr.html +5 -5
  313. data/vendor/local/share/doc/groonga/en/html/reference/grn_expr/query_syntax.html +5 -5
  314. data/vendor/local/share/doc/groonga/en/html/reference/grn_expr/script_syntax.html +5 -5
  315. data/vendor/local/share/doc/groonga/en/html/reference/indexing.html +5 -5
  316. data/vendor/local/share/doc/groonga/en/html/reference/log.html +5 -5
  317. data/vendor/local/share/doc/groonga/en/html/reference/normalizers.html +5 -5
  318. data/vendor/local/share/doc/groonga/en/html/reference/operations.html +5 -5
  319. data/vendor/local/share/doc/groonga/en/html/reference/operations/geolocation_search.html +5 -5
  320. data/vendor/local/share/doc/groonga/en/html/reference/operations/prefix_rk_search.html +5 -5
  321. data/vendor/local/share/doc/groonga/en/html/reference/output.html +5 -5
  322. data/vendor/local/share/doc/groonga/en/html/reference/query_expanders.html +5 -5
  323. data/vendor/local/share/doc/groonga/en/html/reference/query_expanders/tsv.html +5 -5
  324. data/vendor/local/share/doc/groonga/en/html/reference/regular_expression.html +5 -5
  325. data/vendor/local/share/doc/groonga/en/html/reference/scorer.html +5 -5
  326. data/vendor/local/share/doc/groonga/en/html/reference/scorers/scorer_tf_at_most.html +5 -5
  327. data/vendor/local/share/doc/groonga/en/html/reference/scorers/scorer_tf_idf.html +5 -5
  328. data/vendor/local/share/doc/groonga/en/html/reference/sharding.html +5 -5
  329. data/vendor/local/share/doc/groonga/en/html/reference/suggest.html +5 -5
  330. data/vendor/local/share/doc/groonga/en/html/reference/suggest/completion.html +5 -5
  331. data/vendor/local/share/doc/groonga/en/html/reference/suggest/correction.html +5 -5
  332. data/vendor/local/share/doc/groonga/en/html/reference/suggest/introduction.html +5 -5
  333. data/vendor/local/share/doc/groonga/en/html/reference/suggest/suggestion.html +5 -5
  334. data/vendor/local/share/doc/groonga/en/html/reference/tables.html +41 -34
  335. data/vendor/local/share/doc/groonga/en/html/reference/token_filters.html +5 -5
  336. data/vendor/local/share/doc/groonga/en/html/reference/tokenizers.html +5 -5
  337. data/vendor/local/share/doc/groonga/en/html/reference/tuning.html +5 -5
  338. data/vendor/local/share/doc/groonga/en/html/reference/types.html +9 -9
  339. data/vendor/local/share/doc/groonga/en/html/search.html +5 -5
  340. data/vendor/local/share/doc/groonga/en/html/searchindex.js +1 -1
  341. data/vendor/local/share/doc/groonga/en/html/server.html +5 -5
  342. data/vendor/local/share/doc/groonga/en/html/server/gqtp.html +5 -5
  343. data/vendor/local/share/doc/groonga/en/html/server/http.html +5 -5
  344. data/vendor/local/share/doc/groonga/en/html/server/http/comparison.html +5 -5
  345. data/vendor/local/share/doc/groonga/en/html/server/http/groonga-httpd.html +5 -5
  346. data/vendor/local/share/doc/groonga/en/html/server/http/groonga.html +5 -5
  347. data/vendor/local/share/doc/groonga/en/html/server/memcached.html +5 -5
  348. data/vendor/local/share/doc/groonga/en/html/server/package.html +5 -5
  349. data/vendor/local/share/doc/groonga/en/html/spec.html +5 -5
  350. data/vendor/local/share/doc/groonga/en/html/spec/gqtp.html +5 -5
  351. data/vendor/local/share/doc/groonga/en/html/spec/search.html +5 -5
  352. data/vendor/local/share/doc/groonga/en/html/troubleshooting.html +5 -5
  353. data/vendor/local/share/doc/groonga/en/html/troubleshooting/different_results_with_the_same_keyword.html +5 -5
  354. data/vendor/local/share/doc/groonga/en/html/troubleshooting/mmap_cannot_allocate_memory.html +5 -5
  355. data/vendor/local/share/doc/groonga/en/html/tutorial.html +5 -5
  356. data/vendor/local/share/doc/groonga/en/html/tutorial/data.html +5 -5
  357. data/vendor/local/share/doc/groonga/en/html/tutorial/drilldown.html +5 -5
  358. data/vendor/local/share/doc/groonga/en/html/tutorial/index.html +5 -5
  359. data/vendor/local/share/doc/groonga/en/html/tutorial/introduction.html +5 -5
  360. data/vendor/local/share/doc/groonga/en/html/tutorial/lexicon.html +5 -5
  361. data/vendor/local/share/doc/groonga/en/html/tutorial/match_columns.html +5 -5
  362. data/vendor/local/share/doc/groonga/en/html/tutorial/micro_blog.html +5 -5
  363. data/vendor/local/share/doc/groonga/en/html/tutorial/network.html +5 -5
  364. data/vendor/local/share/doc/groonga/en/html/tutorial/patricia_trie.html +5 -5
  365. data/vendor/local/share/doc/groonga/en/html/tutorial/query_expansion.html +5 -5
  366. data/vendor/local/share/doc/groonga/en/html/tutorial/search.html +5 -5
  367. data/vendor/local/share/doc/groonga/ja/html/.buildinfo +1 -1
  368. data/vendor/local/share/doc/groonga/ja/html/_sources/install/centos.txt +3 -3
  369. data/vendor/local/share/doc/groonga/ja/html/_sources/install/debian.txt +3 -3
  370. data/vendor/local/share/doc/groonga/ja/html/_sources/install/fedora.txt +3 -3
  371. data/vendor/local/share/doc/groonga/ja/html/_sources/install/mac_os_x.txt +3 -3
  372. data/vendor/local/share/doc/groonga/ja/html/_sources/install/others.txt +3 -3
  373. data/vendor/local/share/doc/groonga/ja/html/_sources/install/solaris.txt +3 -3
  374. data/vendor/local/share/doc/groonga/ja/html/_sources/install/ubuntu.txt +3 -3
  375. data/vendor/local/share/doc/groonga/ja/html/_sources/install/windows.txt +9 -9
  376. data/vendor/local/share/doc/groonga/ja/html/_sources/limitations.txt +24 -5
  377. data/vendor/local/share/doc/groonga/ja/html/_sources/news.txt +156 -4
  378. data/vendor/local/share/doc/groonga/ja/html/_sources/reference/commands/lock_acquire.txt +1 -1
  379. data/vendor/local/share/doc/groonga/ja/html/_sources/reference/commands/lock_release.txt +1 -1
  380. data/vendor/local/share/doc/groonga/ja/html/_sources/reference/commands/logical_table_remove.txt +86 -0
  381. data/vendor/local/share/doc/groonga/ja/html/_sources/reference/commands/object_list.txt +23 -11
  382. data/vendor/local/share/doc/groonga/ja/html/_sources/reference/commands/table_copy.txt +64 -0
  383. data/vendor/local/share/doc/groonga/ja/html/_sources/reference/tables.txt +88 -45
  384. data/vendor/local/share/doc/groonga/ja/html/characteristic.html +5 -5
  385. data/vendor/local/share/doc/groonga/ja/html/client.html +5 -5
  386. data/vendor/local/share/doc/groonga/ja/html/community.html +5 -5
  387. data/vendor/local/share/doc/groonga/ja/html/contribution.html +5 -5
  388. data/vendor/local/share/doc/groonga/ja/html/contribution/development.html +5 -5
  389. data/vendor/local/share/doc/groonga/ja/html/contribution/development/build.html +5 -5
  390. data/vendor/local/share/doc/groonga/ja/html/contribution/development/build/unix_autotools.html +5 -5
  391. data/vendor/local/share/doc/groonga/ja/html/contribution/development/build/unix_cmake.html +5 -5
  392. data/vendor/local/share/doc/groonga/ja/html/contribution/development/build/windows_cmake.html +5 -5
  393. data/vendor/local/share/doc/groonga/ja/html/contribution/development/com.html +5 -5
  394. data/vendor/local/share/doc/groonga/ja/html/contribution/development/cooperation.html +5 -5
  395. data/vendor/local/share/doc/groonga/ja/html/contribution/development/query.html +5 -5
  396. data/vendor/local/share/doc/groonga/ja/html/contribution/development/release.html +5 -5
  397. data/vendor/local/share/doc/groonga/ja/html/contribution/development/repository.html +5 -5
  398. data/vendor/local/share/doc/groonga/ja/html/contribution/development/test.html +5 -5
  399. data/vendor/local/share/doc/groonga/ja/html/contribution/documentation.html +5 -5
  400. data/vendor/local/share/doc/groonga/ja/html/contribution/documentation/c-api.html +5 -5
  401. data/vendor/local/share/doc/groonga/ja/html/contribution/documentation/i18n.html +5 -5
  402. data/vendor/local/share/doc/groonga/ja/html/contribution/documentation/introduction.html +5 -5
  403. data/vendor/local/share/doc/groonga/ja/html/contribution/report.html +5 -5
  404. data/vendor/local/share/doc/groonga/ja/html/development.html +5 -5
  405. data/vendor/local/share/doc/groonga/ja/html/development/travis-ci.html +5 -5
  406. data/vendor/local/share/doc/groonga/ja/html/genindex.html +5 -5
  407. data/vendor/local/share/doc/groonga/ja/html/index.html +15 -14
  408. data/vendor/local/share/doc/groonga/ja/html/install.html +5 -5
  409. data/vendor/local/share/doc/groonga/ja/html/install/centos.html +8 -8
  410. data/vendor/local/share/doc/groonga/ja/html/install/debian.html +8 -8
  411. data/vendor/local/share/doc/groonga/ja/html/install/fedora.html +8 -8
  412. data/vendor/local/share/doc/groonga/ja/html/install/mac_os_x.html +8 -8
  413. data/vendor/local/share/doc/groonga/ja/html/install/others.html +8 -8
  414. data/vendor/local/share/doc/groonga/ja/html/install/solaris.html +8 -8
  415. data/vendor/local/share/doc/groonga/ja/html/install/ubuntu.html +8 -8
  416. data/vendor/local/share/doc/groonga/ja/html/install/windows.html +14 -14
  417. data/vendor/local/share/doc/groonga/ja/html/limitations.html +21 -8
  418. data/vendor/local/share/doc/groonga/ja/html/news.html +185 -61
  419. data/vendor/local/share/doc/groonga/ja/html/news/0.x.html +5 -5
  420. data/vendor/local/share/doc/groonga/ja/html/news/1.0.x.html +5 -5
  421. data/vendor/local/share/doc/groonga/ja/html/news/1.1.x.html +5 -5
  422. data/vendor/local/share/doc/groonga/ja/html/news/1.2.x.html +5 -5
  423. data/vendor/local/share/doc/groonga/ja/html/news/1.3.x.html +5 -5
  424. data/vendor/local/share/doc/groonga/ja/html/news/2.x.html +5 -5
  425. data/vendor/local/share/doc/groonga/ja/html/news/3.x.html +5 -5
  426. data/vendor/local/share/doc/groonga/ja/html/news/4.x.html +5 -5
  427. data/vendor/local/share/doc/groonga/ja/html/news/5.x.html +5 -5
  428. data/vendor/local/share/doc/groonga/ja/html/news/senna.html +5 -5
  429. data/vendor/local/share/doc/groonga/ja/html/objects.inv +0 -0
  430. data/vendor/local/share/doc/groonga/ja/html/reference.html +15 -14
  431. data/vendor/local/share/doc/groonga/ja/html/reference/alias.html +5 -5
  432. data/vendor/local/share/doc/groonga/ja/html/reference/api.html +5 -5
  433. data/vendor/local/share/doc/groonga/ja/html/reference/api/global_configurations.html +5 -5
  434. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_cache.html +5 -5
  435. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_column.html +5 -5
  436. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_command_version.html +5 -5
  437. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_content_type.html +5 -5
  438. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_ctx.html +5 -5
  439. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_db.html +5 -5
  440. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_encoding.html +5 -5
  441. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_expr.html +5 -5
  442. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_geo.html +5 -5
  443. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_hook.html +5 -5
  444. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_ii.html +5 -5
  445. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_index_cursor.html +5 -5
  446. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_info.html +5 -5
  447. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_match_escalation.html +5 -5
  448. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_obj.html +5 -5
  449. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_proc.html +5 -5
  450. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_search.html +5 -5
  451. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_table.html +5 -5
  452. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_table_cursor.html +5 -5
  453. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_thread.html +5 -5
  454. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_type.html +5 -5
  455. data/vendor/local/share/doc/groonga/ja/html/reference/api/grn_user_data.html +5 -5
  456. data/vendor/local/share/doc/groonga/ja/html/reference/api/overview.html +5 -5
  457. data/vendor/local/share/doc/groonga/ja/html/reference/api/plugin.html +5 -5
  458. data/vendor/local/share/doc/groonga/ja/html/reference/cast.html +5 -5
  459. data/vendor/local/share/doc/groonga/ja/html/reference/column.html +5 -5
  460. data/vendor/local/share/doc/groonga/ja/html/reference/columns/index.html +5 -5
  461. data/vendor/local/share/doc/groonga/ja/html/reference/columns/pseudo.html +5 -5
  462. data/vendor/local/share/doc/groonga/ja/html/reference/columns/scalar.html +5 -5
  463. data/vendor/local/share/doc/groonga/ja/html/reference/columns/vector.html +5 -5
  464. data/vendor/local/share/doc/groonga/ja/html/reference/command.html +15 -14
  465. data/vendor/local/share/doc/groonga/ja/html/reference/command/command_version.html +5 -5
  466. data/vendor/local/share/doc/groonga/ja/html/reference/command/output_format.html +5 -5
  467. data/vendor/local/share/doc/groonga/ja/html/reference/command/pretty_print.html +5 -5
  468. data/vendor/local/share/doc/groonga/ja/html/reference/command/request_id.html +5 -5
  469. data/vendor/local/share/doc/groonga/ja/html/reference/command/request_timeout.html +5 -5
  470. data/vendor/local/share/doc/groonga/ja/html/reference/command/return_code.html +5 -5
  471. data/vendor/local/share/doc/groonga/ja/html/reference/commands/cache_limit.html +5 -5
  472. data/vendor/local/share/doc/groonga/ja/html/reference/commands/check.html +5 -5
  473. data/vendor/local/share/doc/groonga/ja/html/reference/commands/clearlock.html +5 -5
  474. data/vendor/local/share/doc/groonga/ja/html/reference/commands/column_copy.html +5 -5
  475. data/vendor/local/share/doc/groonga/ja/html/reference/commands/column_create.html +5 -5
  476. data/vendor/local/share/doc/groonga/ja/html/reference/commands/column_list.html +5 -5
  477. data/vendor/local/share/doc/groonga/ja/html/reference/commands/column_remove.html +5 -5
  478. data/vendor/local/share/doc/groonga/ja/html/reference/commands/column_rename.html +5 -5
  479. data/vendor/local/share/doc/groonga/ja/html/reference/commands/config_delete.html +5 -5
  480. data/vendor/local/share/doc/groonga/ja/html/reference/commands/config_get.html +5 -5
  481. data/vendor/local/share/doc/groonga/ja/html/reference/commands/config_set.html +5 -5
  482. data/vendor/local/share/doc/groonga/ja/html/reference/commands/database_unmap.html +5 -5
  483. data/vendor/local/share/doc/groonga/ja/html/reference/commands/define_selector.html +5 -5
  484. data/vendor/local/share/doc/groonga/ja/html/reference/commands/defrag.html +5 -5
  485. data/vendor/local/share/doc/groonga/ja/html/reference/commands/delete.html +5 -5
  486. data/vendor/local/share/doc/groonga/ja/html/reference/commands/dump.html +5 -5
  487. data/vendor/local/share/doc/groonga/ja/html/reference/commands/io_flush.html +5 -5
  488. data/vendor/local/share/doc/groonga/ja/html/reference/commands/load.html +5 -5
  489. data/vendor/local/share/doc/groonga/ja/html/reference/commands/lock_acquire.html +6 -6
  490. data/vendor/local/share/doc/groonga/ja/html/reference/commands/lock_clear.html +5 -5
  491. data/vendor/local/share/doc/groonga/ja/html/reference/commands/lock_release.html +6 -6
  492. data/vendor/local/share/doc/groonga/ja/html/reference/commands/log_level.html +5 -5
  493. data/vendor/local/share/doc/groonga/ja/html/reference/commands/log_put.html +5 -5
  494. data/vendor/local/share/doc/groonga/ja/html/reference/commands/log_reopen.html +5 -5
  495. data/vendor/local/share/doc/groonga/ja/html/reference/commands/logical_count.html +5 -5
  496. data/vendor/local/share/doc/groonga/ja/html/reference/commands/logical_parameters.html +5 -5
  497. data/vendor/local/share/doc/groonga/ja/html/reference/commands/logical_range_filter.html +5 -5
  498. data/vendor/local/share/doc/groonga/ja/html/reference/commands/logical_select.html +5 -5
  499. data/vendor/local/share/doc/groonga/ja/html/reference/commands/logical_shard_list.html +5 -5
  500. data/vendor/local/share/doc/groonga/ja/html/reference/commands/logical_table_remove.html +88 -8
  501. data/vendor/local/share/doc/groonga/ja/html/reference/commands/normalize.html +5 -5
  502. data/vendor/local/share/doc/groonga/ja/html/reference/commands/normalizer_list.html +5 -5
  503. data/vendor/local/share/doc/groonga/ja/html/reference/commands/object_exist.html +5 -5
  504. data/vendor/local/share/doc/groonga/ja/html/reference/commands/object_inspect.html +5 -5
  505. data/vendor/local/share/doc/groonga/ja/html/reference/commands/object_list.html +103 -103
  506. data/vendor/local/share/doc/groonga/ja/html/reference/commands/object_remove.html +6 -6
  507. data/vendor/local/share/doc/groonga/ja/html/reference/commands/plugin_register.html +5 -5
  508. data/vendor/local/share/doc/groonga/ja/html/reference/commands/plugin_unregister.html +5 -5
  509. data/vendor/local/share/doc/groonga/ja/html/reference/commands/query_expand.html +5 -5
  510. data/vendor/local/share/doc/groonga/ja/html/reference/commands/quit.html +5 -5
  511. data/vendor/local/share/doc/groonga/ja/html/reference/commands/range_filter.html +5 -5
  512. data/vendor/local/share/doc/groonga/ja/html/reference/commands/register.html +5 -5
  513. data/vendor/local/share/doc/groonga/ja/html/reference/commands/reindex.html +5 -5
  514. data/vendor/local/share/doc/groonga/ja/html/reference/commands/request_cancel.html +5 -5
  515. data/vendor/local/share/doc/groonga/ja/html/reference/commands/ruby_eval.html +5 -5
  516. data/vendor/local/share/doc/groonga/ja/html/reference/commands/ruby_load.html +5 -5
  517. data/vendor/local/share/doc/groonga/ja/html/reference/commands/schema.html +5 -5
  518. data/vendor/local/share/doc/groonga/ja/html/reference/commands/select.html +5 -5
  519. data/vendor/local/share/doc/groonga/ja/html/reference/commands/shutdown.html +5 -5
  520. data/vendor/local/share/doc/groonga/ja/html/reference/commands/status.html +5 -5
  521. data/vendor/local/share/doc/groonga/ja/html/reference/commands/suggest.html +10 -10
  522. data/vendor/local/share/doc/groonga/ja/html/reference/commands/table_copy.html +201 -0
  523. data/vendor/local/share/doc/groonga/ja/html/reference/commands/table_create.html +52 -52
  524. data/vendor/local/share/doc/groonga/ja/html/reference/commands/table_list.html +25 -25
  525. data/vendor/local/share/doc/groonga/ja/html/reference/commands/table_remove.html +41 -41
  526. data/vendor/local/share/doc/groonga/ja/html/reference/commands/table_rename.html +31 -31
  527. data/vendor/local/share/doc/groonga/ja/html/reference/commands/table_tokenize.html +41 -41
  528. data/vendor/local/share/doc/groonga/ja/html/reference/commands/thread_limit.html +31 -31
  529. data/vendor/local/share/doc/groonga/ja/html/reference/commands/tokenize.html +43 -43
  530. data/vendor/local/share/doc/groonga/ja/html/reference/commands/tokenizer_list.html +25 -25
  531. data/vendor/local/share/doc/groonga/ja/html/reference/commands/truncate.html +25 -25
  532. data/vendor/local/share/doc/groonga/ja/html/reference/configuration.html +5 -5
  533. data/vendor/local/share/doc/groonga/ja/html/reference/executables.html +5 -5
  534. data/vendor/local/share/doc/groonga/ja/html/reference/executables/grndb.html +5 -5
  535. data/vendor/local/share/doc/groonga/ja/html/reference/executables/grnslap.html +5 -5
  536. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga-benchmark.html +5 -5
  537. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga-httpd.html +5 -5
  538. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga-server-http.html +5 -5
  539. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga-suggest-create-dataset.html +5 -5
  540. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga-suggest-httpd.html +6 -6
  541. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga-suggest-learner.html +5 -5
  542. data/vendor/local/share/doc/groonga/ja/html/reference/executables/groonga.html +5 -5
  543. data/vendor/local/share/doc/groonga/ja/html/reference/function.html +5 -5
  544. data/vendor/local/share/doc/groonga/ja/html/reference/functions/between.html +5 -5
  545. data/vendor/local/share/doc/groonga/ja/html/reference/functions/edit_distance.html +5 -5
  546. data/vendor/local/share/doc/groonga/ja/html/reference/functions/fuzzy_search.html +5 -5
  547. data/vendor/local/share/doc/groonga/ja/html/reference/functions/geo_distance.html +5 -5
  548. data/vendor/local/share/doc/groonga/ja/html/reference/functions/geo_in_circle.html +5 -5
  549. data/vendor/local/share/doc/groonga/ja/html/reference/functions/geo_in_rectangle.html +5 -5
  550. data/vendor/local/share/doc/groonga/ja/html/reference/functions/highlight_full.html +5 -5
  551. data/vendor/local/share/doc/groonga/ja/html/reference/functions/highlight_html.html +5 -5
  552. data/vendor/local/share/doc/groonga/ja/html/reference/functions/html_untag.html +5 -5
  553. data/vendor/local/share/doc/groonga/ja/html/reference/functions/in_values.html +5 -5
  554. data/vendor/local/share/doc/groonga/ja/html/reference/functions/now.html +5 -5
  555. data/vendor/local/share/doc/groonga/ja/html/reference/functions/number_classify.html +5 -5
  556. data/vendor/local/share/doc/groonga/ja/html/reference/functions/prefix_rk_search.html +5 -5
  557. data/vendor/local/share/doc/groonga/ja/html/reference/functions/query.html +5 -5
  558. data/vendor/local/share/doc/groonga/ja/html/reference/functions/rand.html +5 -5
  559. data/vendor/local/share/doc/groonga/ja/html/reference/functions/record_number.html +5 -5
  560. data/vendor/local/share/doc/groonga/ja/html/reference/functions/snippet_html.html +5 -5
  561. data/vendor/local/share/doc/groonga/ja/html/reference/functions/string_substring.html +5 -5
  562. data/vendor/local/share/doc/groonga/ja/html/reference/functions/sub_filter.html +5 -5
  563. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_day.html +5 -5
  564. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_hour.html +5 -5
  565. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_minute.html +5 -5
  566. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_month.html +5 -5
  567. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_second.html +5 -5
  568. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_week.html +5 -5
  569. data/vendor/local/share/doc/groonga/ja/html/reference/functions/time_classify_year.html +5 -5
  570. data/vendor/local/share/doc/groonga/ja/html/reference/functions/vector_size.html +5 -5
  571. data/vendor/local/share/doc/groonga/ja/html/reference/functions/vector_slice.html +5 -5
  572. data/vendor/local/share/doc/groonga/ja/html/reference/grn_expr.html +5 -5
  573. data/vendor/local/share/doc/groonga/ja/html/reference/grn_expr/query_syntax.html +5 -5
  574. data/vendor/local/share/doc/groonga/ja/html/reference/grn_expr/script_syntax.html +5 -5
  575. data/vendor/local/share/doc/groonga/ja/html/reference/indexing.html +5 -5
  576. data/vendor/local/share/doc/groonga/ja/html/reference/log.html +5 -5
  577. data/vendor/local/share/doc/groonga/ja/html/reference/normalizers.html +5 -5
  578. data/vendor/local/share/doc/groonga/ja/html/reference/operations.html +5 -5
  579. data/vendor/local/share/doc/groonga/ja/html/reference/operations/geolocation_search.html +5 -5
  580. data/vendor/local/share/doc/groonga/ja/html/reference/operations/prefix_rk_search.html +5 -5
  581. data/vendor/local/share/doc/groonga/ja/html/reference/output.html +5 -5
  582. data/vendor/local/share/doc/groonga/ja/html/reference/query_expanders.html +5 -5
  583. data/vendor/local/share/doc/groonga/ja/html/reference/query_expanders/tsv.html +5 -5
  584. data/vendor/local/share/doc/groonga/ja/html/reference/regular_expression.html +5 -5
  585. data/vendor/local/share/doc/groonga/ja/html/reference/scorer.html +5 -5
  586. data/vendor/local/share/doc/groonga/ja/html/reference/scorers/scorer_tf_at_most.html +5 -5
  587. data/vendor/local/share/doc/groonga/ja/html/reference/scorers/scorer_tf_idf.html +5 -5
  588. data/vendor/local/share/doc/groonga/ja/html/reference/sharding.html +5 -5
  589. data/vendor/local/share/doc/groonga/ja/html/reference/suggest.html +5 -5
  590. data/vendor/local/share/doc/groonga/ja/html/reference/suggest/completion.html +5 -5
  591. data/vendor/local/share/doc/groonga/ja/html/reference/suggest/correction.html +5 -5
  592. data/vendor/local/share/doc/groonga/ja/html/reference/suggest/introduction.html +5 -5
  593. data/vendor/local/share/doc/groonga/ja/html/reference/suggest/suggestion.html +5 -5
  594. data/vendor/local/share/doc/groonga/ja/html/reference/tables.html +33 -10
  595. data/vendor/local/share/doc/groonga/ja/html/reference/token_filters.html +5 -5
  596. data/vendor/local/share/doc/groonga/ja/html/reference/tokenizers.html +5 -5
  597. data/vendor/local/share/doc/groonga/ja/html/reference/tuning.html +5 -5
  598. data/vendor/local/share/doc/groonga/ja/html/reference/types.html +9 -9
  599. data/vendor/local/share/doc/groonga/ja/html/search.html +5 -5
  600. data/vendor/local/share/doc/groonga/ja/html/searchindex.js +1 -1
  601. data/vendor/local/share/doc/groonga/ja/html/server.html +5 -5
  602. data/vendor/local/share/doc/groonga/ja/html/server/gqtp.html +5 -5
  603. data/vendor/local/share/doc/groonga/ja/html/server/http.html +5 -5
  604. data/vendor/local/share/doc/groonga/ja/html/server/http/comparison.html +5 -5
  605. data/vendor/local/share/doc/groonga/ja/html/server/http/groonga-httpd.html +5 -5
  606. data/vendor/local/share/doc/groonga/ja/html/server/http/groonga.html +5 -5
  607. data/vendor/local/share/doc/groonga/ja/html/server/memcached.html +5 -5
  608. data/vendor/local/share/doc/groonga/ja/html/server/package.html +5 -5
  609. data/vendor/local/share/doc/groonga/ja/html/spec.html +5 -5
  610. data/vendor/local/share/doc/groonga/ja/html/spec/gqtp.html +5 -5
  611. data/vendor/local/share/doc/groonga/ja/html/spec/search.html +5 -5
  612. data/vendor/local/share/doc/groonga/ja/html/troubleshooting.html +5 -5
  613. data/vendor/local/share/doc/groonga/ja/html/troubleshooting/different_results_with_the_same_keyword.html +5 -5
  614. data/vendor/local/share/doc/groonga/ja/html/troubleshooting/mmap_cannot_allocate_memory.html +5 -5
  615. data/vendor/local/share/doc/groonga/ja/html/tutorial.html +5 -5
  616. data/vendor/local/share/doc/groonga/ja/html/tutorial/data.html +5 -5
  617. data/vendor/local/share/doc/groonga/ja/html/tutorial/drilldown.html +5 -5
  618. data/vendor/local/share/doc/groonga/ja/html/tutorial/index.html +5 -5
  619. data/vendor/local/share/doc/groonga/ja/html/tutorial/introduction.html +5 -5
  620. data/vendor/local/share/doc/groonga/ja/html/tutorial/lexicon.html +5 -5
  621. data/vendor/local/share/doc/groonga/ja/html/tutorial/match_columns.html +5 -5
  622. data/vendor/local/share/doc/groonga/ja/html/tutorial/micro_blog.html +5 -5
  623. data/vendor/local/share/doc/groonga/ja/html/tutorial/network.html +5 -5
  624. data/vendor/local/share/doc/groonga/ja/html/tutorial/patricia_trie.html +5 -5
  625. data/vendor/local/share/doc/groonga/ja/html/tutorial/query_expansion.html +5 -5
  626. data/vendor/local/share/doc/groonga/ja/html/tutorial/search.html +5 -5
  627. data/vendor/local/share/doc/pcre/AUTHORS +45 -0
  628. data/vendor/local/share/doc/pcre/COPYING +5 -0
  629. data/vendor/local/share/doc/pcre/ChangeLog +6010 -0
  630. data/vendor/local/share/doc/pcre/LICENCE +93 -0
  631. data/vendor/local/share/doc/pcre/NEWS +725 -0
  632. data/vendor/local/share/doc/pcre/README +1002 -0
  633. data/vendor/local/share/doc/pcre/html/NON-AUTOTOOLS-BUILD.txt +772 -0
  634. data/vendor/local/share/doc/pcre/html/README.txt +1002 -0
  635. data/vendor/local/share/doc/pcre/html/index.html +185 -0
  636. data/vendor/local/share/doc/pcre/html/pcre-config.html +109 -0
  637. data/vendor/local/share/doc/pcre/html/pcre.html +224 -0
  638. data/vendor/local/share/doc/pcre/html/pcre16.html +384 -0
  639. data/vendor/local/share/doc/pcre/html/pcre32.html +382 -0
  640. data/vendor/local/share/doc/pcre/html/pcre_assign_jit_stack.html +76 -0
  641. data/vendor/local/share/doc/pcre/html/pcre_compile.html +111 -0
  642. data/vendor/local/share/doc/pcre/html/pcre_compile2.html +115 -0
  643. data/vendor/local/share/doc/pcre/html/pcre_config.html +94 -0
  644. data/vendor/local/share/doc/pcre/html/pcre_copy_named_substring.html +65 -0
  645. data/vendor/local/share/doc/pcre/html/pcre_copy_substring.html +61 -0
  646. data/vendor/local/share/doc/pcre/html/pcre_dfa_exec.html +129 -0
  647. data/vendor/local/share/doc/pcre/html/pcre_exec.html +111 -0
  648. data/vendor/local/share/doc/pcre/html/pcre_free_study.html +46 -0
  649. data/vendor/local/share/doc/pcre/html/pcre_free_substring.html +46 -0
  650. data/vendor/local/share/doc/pcre/html/pcre_free_substring_list.html +46 -0
  651. data/vendor/local/share/doc/pcre/html/pcre_fullinfo.html +118 -0
  652. data/vendor/local/share/doc/pcre/html/pcre_get_named_substring.html +68 -0
  653. data/vendor/local/share/doc/pcre/html/pcre_get_stringnumber.html +57 -0
  654. data/vendor/local/share/doc/pcre/html/pcre_get_stringtable_entries.html +60 -0
  655. data/vendor/local/share/doc/pcre/html/pcre_get_substring.html +64 -0
  656. data/vendor/local/share/doc/pcre/html/pcre_get_substring_list.html +61 -0
  657. data/vendor/local/share/doc/pcre/html/pcre_jit_exec.html +108 -0
  658. data/vendor/local/share/doc/pcre/html/pcre_jit_stack_alloc.html +55 -0
  659. data/vendor/local/share/doc/pcre/html/pcre_jit_stack_free.html +48 -0
  660. data/vendor/local/share/doc/pcre/html/pcre_maketables.html +48 -0
  661. data/vendor/local/share/doc/pcre/html/pcre_pattern_to_host_byte_order.html +58 -0
  662. data/vendor/local/share/doc/pcre/html/pcre_refcount.html +51 -0
  663. data/vendor/local/share/doc/pcre/html/pcre_study.html +68 -0
  664. data/vendor/local/share/doc/pcre/html/pcre_utf16_to_host_byte_order.html +57 -0
  665. data/vendor/local/share/doc/pcre/html/pcre_utf32_to_host_byte_order.html +57 -0
  666. data/vendor/local/share/doc/pcre/html/pcre_version.html +46 -0
  667. data/vendor/local/share/doc/pcre/html/pcreapi.html +2921 -0
  668. data/vendor/local/share/doc/pcre/html/pcrebuild.html +534 -0
  669. data/vendor/local/share/doc/pcre/html/pcrecallout.html +286 -0
  670. data/vendor/local/share/doc/pcre/html/pcrecompat.html +235 -0
  671. data/vendor/local/share/doc/pcre/html/pcrecpp.html +368 -0
  672. data/vendor/local/share/doc/pcre/html/pcredemo.html +426 -0
  673. data/vendor/local/share/doc/pcre/html/pcregrep.html +759 -0
  674. data/vendor/local/share/doc/pcre/html/pcrejit.html +452 -0
  675. data/vendor/local/share/doc/pcre/html/pcrelimits.html +90 -0
  676. data/vendor/local/share/doc/pcre/html/pcrematching.html +242 -0
  677. data/vendor/local/share/doc/pcre/html/pcrepartial.html +509 -0
  678. data/vendor/local/share/doc/pcre/html/pcrepattern.html +3273 -0
  679. data/vendor/local/share/doc/pcre/html/pcreperform.html +195 -0
  680. data/vendor/local/share/doc/pcre/html/pcreposix.html +290 -0
  681. data/vendor/local/share/doc/pcre/html/pcreprecompile.html +163 -0
  682. data/vendor/local/share/doc/pcre/html/pcresample.html +110 -0
  683. data/vendor/local/share/doc/pcre/html/pcrestack.html +225 -0
  684. data/vendor/local/share/doc/pcre/html/pcresyntax.html +561 -0
  685. data/vendor/local/share/doc/pcre/html/pcretest.html +1158 -0
  686. data/vendor/local/share/doc/pcre/html/pcreunicode.html +262 -0
  687. data/vendor/local/share/doc/pcre/pcre-config.txt +86 -0
  688. data/vendor/local/share/doc/pcre/pcre.txt +10454 -0
  689. data/vendor/local/share/doc/pcre/pcregrep.txt +741 -0
  690. data/vendor/local/share/doc/pcre/pcretest.txt +1087 -0
  691. data/vendor/local/share/groonga/html/admin.old/js/groonga-admin.ja.js +11 -6
  692. data/vendor/local/share/groonga/html/admin.old/js/groonga-admin.js +11 -6
  693. data/vendor/local/share/license/pcre/LICENCE +93 -0
  694. data/vendor/local/share/man/man1/pcre-config.1 +92 -0
  695. data/vendor/local/share/man/man1/pcregrep.1 +683 -0
  696. data/vendor/local/share/man/man1/pcretest.1 +1156 -0
  697. data/vendor/local/share/man/man3/pcre.3 +230 -0
  698. data/vendor/local/share/man/man3/pcre16.3 +371 -0
  699. data/vendor/local/share/man/man3/pcre16_assign_jit_stack.3 +59 -0
  700. data/vendor/local/share/man/man3/pcre16_compile.3 +96 -0
  701. data/vendor/local/share/man/man3/pcre16_compile2.3 +101 -0
  702. data/vendor/local/share/man/man3/pcre16_config.3 +79 -0
  703. data/vendor/local/share/man/man3/pcre16_copy_named_substring.3 +51 -0
  704. data/vendor/local/share/man/man3/pcre16_copy_substring.3 +47 -0
  705. data/vendor/local/share/man/man3/pcre16_dfa_exec.3 +118 -0
  706. data/vendor/local/share/man/man3/pcre16_exec.3 +99 -0
  707. data/vendor/local/share/man/man3/pcre16_free_study.3 +31 -0
  708. data/vendor/local/share/man/man3/pcre16_free_substring.3 +31 -0
  709. data/vendor/local/share/man/man3/pcre16_free_substring_list.3 +31 -0
  710. data/vendor/local/share/man/man3/pcre16_fullinfo.3 +103 -0
  711. data/vendor/local/share/man/man3/pcre16_get_named_substring.3 +54 -0
  712. data/vendor/local/share/man/man3/pcre16_get_stringnumber.3 +43 -0
  713. data/vendor/local/share/man/man3/pcre16_get_stringtable_entries.3 +46 -0
  714. data/vendor/local/share/man/man3/pcre16_get_substring.3 +50 -0
  715. data/vendor/local/share/man/man3/pcre16_get_substring_list.3 +47 -0
  716. data/vendor/local/share/man/man3/pcre16_jit_exec.3 +96 -0
  717. data/vendor/local/share/man/man3/pcre16_jit_stack_alloc.3 +43 -0
  718. data/vendor/local/share/man/man3/pcre16_jit_stack_free.3 +35 -0
  719. data/vendor/local/share/man/man3/pcre16_maketables.3 +33 -0
  720. data/vendor/local/share/man/man3/pcre16_pattern_to_host_byte_order.3 +44 -0
  721. data/vendor/local/share/man/man3/pcre16_refcount.3 +36 -0
  722. data/vendor/local/share/man/man3/pcre16_study.3 +54 -0
  723. data/vendor/local/share/man/man3/pcre16_utf16_to_host_byte_order.3 +45 -0
  724. data/vendor/local/share/man/man3/pcre16_version.3 +31 -0
  725. data/vendor/local/share/man/man3/pcre32.3 +369 -0
  726. data/vendor/local/share/man/man3/pcre32_assign_jit_stack.3 +59 -0
  727. data/vendor/local/share/man/man3/pcre32_compile.3 +96 -0
  728. data/vendor/local/share/man/man3/pcre32_compile2.3 +101 -0
  729. data/vendor/local/share/man/man3/pcre32_config.3 +79 -0
  730. data/vendor/local/share/man/man3/pcre32_copy_named_substring.3 +51 -0
  731. data/vendor/local/share/man/man3/pcre32_copy_substring.3 +47 -0
  732. data/vendor/local/share/man/man3/pcre32_dfa_exec.3 +118 -0
  733. data/vendor/local/share/man/man3/pcre32_exec.3 +99 -0
  734. data/vendor/local/share/man/man3/pcre32_free_study.3 +31 -0
  735. data/vendor/local/share/man/man3/pcre32_free_substring.3 +31 -0
  736. data/vendor/local/share/man/man3/pcre32_free_substring_list.3 +31 -0
  737. data/vendor/local/share/man/man3/pcre32_fullinfo.3 +103 -0
  738. data/vendor/local/share/man/man3/pcre32_get_named_substring.3 +54 -0
  739. data/vendor/local/share/man/man3/pcre32_get_stringnumber.3 +43 -0
  740. data/vendor/local/share/man/man3/pcre32_get_stringtable_entries.3 +46 -0
  741. data/vendor/local/share/man/man3/pcre32_get_substring.3 +50 -0
  742. data/vendor/local/share/man/man3/pcre32_get_substring_list.3 +47 -0
  743. data/vendor/local/share/man/man3/pcre32_jit_exec.3 +96 -0
  744. data/vendor/local/share/man/man3/pcre32_jit_stack_alloc.3 +43 -0
  745. data/vendor/local/share/man/man3/pcre32_jit_stack_free.3 +35 -0
  746. data/vendor/local/share/man/man3/pcre32_maketables.3 +33 -0
  747. data/vendor/local/share/man/man3/pcre32_pattern_to_host_byte_order.3 +44 -0
  748. data/vendor/local/share/man/man3/pcre32_refcount.3 +36 -0
  749. data/vendor/local/share/man/man3/pcre32_study.3 +54 -0
  750. data/vendor/local/share/man/man3/pcre32_utf32_to_host_byte_order.3 +45 -0
  751. data/vendor/local/share/man/man3/pcre32_version.3 +31 -0
  752. data/vendor/local/share/man/man3/pcre_assign_jit_stack.3 +59 -0
  753. data/vendor/local/share/man/man3/pcre_compile.3 +96 -0
  754. data/vendor/local/share/man/man3/pcre_compile2.3 +101 -0
  755. data/vendor/local/share/man/man3/pcre_config.3 +79 -0
  756. data/vendor/local/share/man/man3/pcre_copy_named_substring.3 +51 -0
  757. data/vendor/local/share/man/man3/pcre_copy_substring.3 +47 -0
  758. data/vendor/local/share/man/man3/pcre_dfa_exec.3 +118 -0
  759. data/vendor/local/share/man/man3/pcre_exec.3 +99 -0
  760. data/vendor/local/share/man/man3/pcre_free_study.3 +31 -0
  761. data/vendor/local/share/man/man3/pcre_free_substring.3 +31 -0
  762. data/vendor/local/share/man/man3/pcre_free_substring_list.3 +31 -0
  763. data/vendor/local/share/man/man3/pcre_fullinfo.3 +103 -0
  764. data/vendor/local/share/man/man3/pcre_get_named_substring.3 +54 -0
  765. data/vendor/local/share/man/man3/pcre_get_stringnumber.3 +43 -0
  766. data/vendor/local/share/man/man3/pcre_get_stringtable_entries.3 +46 -0
  767. data/vendor/local/share/man/man3/pcre_get_substring.3 +50 -0
  768. data/vendor/local/share/man/man3/pcre_get_substring_list.3 +47 -0
  769. data/vendor/local/share/man/man3/pcre_jit_exec.3 +96 -0
  770. data/vendor/local/share/man/man3/pcre_jit_stack_alloc.3 +43 -0
  771. data/vendor/local/share/man/man3/pcre_jit_stack_free.3 +35 -0
  772. data/vendor/local/share/man/man3/pcre_maketables.3 +33 -0
  773. data/vendor/local/share/man/man3/pcre_pattern_to_host_byte_order.3 +44 -0
  774. data/vendor/local/share/man/man3/pcre_refcount.3 +36 -0
  775. data/vendor/local/share/man/man3/pcre_study.3 +54 -0
  776. data/vendor/local/share/man/man3/pcre_utf16_to_host_byte_order.3 +45 -0
  777. data/vendor/local/share/man/man3/pcre_utf32_to_host_byte_order.3 +45 -0
  778. data/vendor/local/share/man/man3/pcre_version.3 +31 -0
  779. data/vendor/local/share/man/man3/pcreapi.3 +2918 -0
  780. data/vendor/local/share/man/man3/pcrebuild.3 +550 -0
  781. data/vendor/local/share/man/man3/pcrecallout.3 +255 -0
  782. data/vendor/local/share/man/man3/pcrecompat.3 +200 -0
  783. data/vendor/local/share/man/man3/pcrecpp.3 +348 -0
  784. data/vendor/local/share/man/man3/pcredemo.3 +424 -0
  785. data/vendor/local/share/man/man3/pcrejit.3 +431 -0
  786. data/vendor/local/share/man/man3/pcrelimits.3 +71 -0
  787. data/vendor/local/share/man/man3/pcrematching.3 +214 -0
  788. data/vendor/local/share/man/man3/pcrepartial.3 +476 -0
  789. data/vendor/local/share/man/man3/pcrepattern.3 +3301 -0
  790. data/vendor/local/share/man/man3/pcreperform.3 +177 -0
  791. data/vendor/local/share/man/man3/pcreposix.3 +267 -0
  792. data/vendor/local/share/man/man3/pcreprecompile.3 +155 -0
  793. data/vendor/local/share/man/man3/pcresample.3 +99 -0
  794. data/vendor/local/share/man/man3/pcrestack.3 +215 -0
  795. data/vendor/local/share/man/man3/pcresyntax.3 +540 -0
  796. data/vendor/local/share/man/man3/pcreunicode.3 +249 -0
  797. metadata +255 -59
@@ -0,0 +1,3273 @@
+ <html>
+ <head>
+ <title>pcrepattern specification</title>
+ </head>
+ <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+ <h1>pcrepattern man page</h1>
+ <p>
+ Return to the <a href="index.html">PCRE index page</a>.
+ </p>
+ <p>
+ This page is part of the PCRE HTML documentation. It was generated automatically
+ from the original man page. If there is any nonsense in it, please consult the
+ man page, in case the conversion went wrong.
+ <br>
+ <ul>
+ <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>
+ <li><a name="TOC2" href="#SEC2">SPECIAL START-OF-PATTERN ITEMS</a>
+ <li><a name="TOC3" href="#SEC3">EBCDIC CHARACTER CODES</a>
+ <li><a name="TOC4" href="#SEC4">CHARACTERS AND METACHARACTERS</a>
+ <li><a name="TOC5" href="#SEC5">BACKSLASH</a>
+ <li><a name="TOC6" href="#SEC6">CIRCUMFLEX AND DOLLAR</a>
+ <li><a name="TOC7" href="#SEC7">FULL STOP (PERIOD, DOT) AND \N</a>
+ <li><a name="TOC8" href="#SEC8">MATCHING A SINGLE DATA UNIT</a>
+ <li><a name="TOC9" href="#SEC9">SQUARE BRACKETS AND CHARACTER CLASSES</a>
+ <li><a name="TOC10" href="#SEC10">POSIX CHARACTER CLASSES</a>
+ <li><a name="TOC11" href="#SEC11">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a>
+ <li><a name="TOC12" href="#SEC12">VERTICAL BAR</a>
+ <li><a name="TOC13" href="#SEC13">INTERNAL OPTION SETTING</a>
+ <li><a name="TOC14" href="#SEC14">SUBPATTERNS</a>
+ <li><a name="TOC15" href="#SEC15">DUPLICATE SUBPATTERN NUMBERS</a>
+ <li><a name="TOC16" href="#SEC16">NAMED SUBPATTERNS</a>
+ <li><a name="TOC17" href="#SEC17">REPETITION</a>
+ <li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
+ <li><a name="TOC19" href="#SEC19">BACK REFERENCES</a>
+ <li><a name="TOC20" href="#SEC20">ASSERTIONS</a>
+ <li><a name="TOC21" href="#SEC21">CONDITIONAL SUBPATTERNS</a>
+ <li><a name="TOC22" href="#SEC22">COMMENTS</a>
+ <li><a name="TOC23" href="#SEC23">RECURSIVE PATTERNS</a>
+ <li><a name="TOC24" href="#SEC24">SUBPATTERNS AS SUBROUTINES</a>
+ <li><a name="TOC25" href="#SEC25">ONIGURUMA SUBROUTINE SYNTAX</a>
+ <li><a name="TOC26" href="#SEC26">CALLOUTS</a>
+ <li><a name="TOC27" href="#SEC27">BACKTRACKING CONTROL</a>
43
+ <li><a name="TOC28" href="#SEC28">SEE ALSO</a>
44
+ <li><a name="TOC29" href="#SEC29">AUTHOR</a>
45
+ <li><a name="TOC30" href="#SEC30">REVISION</a>
46
+ </ul>
47
+ <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
48
+ <P>
49
+ The syntax and semantics of the regular expressions that are supported by PCRE
50
+ are described in detail below. There is a quick-reference syntax summary in the
51
+ <a href="pcresyntax.html"><b>pcresyntax</b></a>
52
+ page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE
53
+ also supports some alternative regular expression syntax (which does not
54
+ conflict with the Perl syntax) in order to provide some compatibility with
55
+ regular expressions in Python, .NET, and Oniguruma.
56
+ </P>
57
+ <P>
58
+ Perl's regular expressions are described in its own documentation, and
59
+ regular expressions in general are covered in a number of books, some of which
60
+ have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
61
+ published by O'Reilly, covers regular expressions in great detail. This
62
+ description of PCRE's regular expressions is intended as reference material.
63
+ </P>
64
+ <P>
65
+ This document discusses the patterns that are supported by PCRE when one of its
66
+ main matching functions, <b>pcre_exec()</b> (8-bit) or <b>pcre[16|32]_exec()</b>
67
+ (16- or 32-bit), is used. PCRE also has alternative matching functions,
68
+ <b>pcre_dfa_exec()</b> and <b>pcre[16|32]_dfa_exec()</b>, which match using a
69
+ different algorithm that is not Perl-compatible. Some of the features discussed
70
+ below are not available when DFA matching is used. The advantages and
71
+ disadvantages of the alternative functions, and how they differ from the normal
72
+ functions, are discussed in the
73
+ <a href="pcrematching.html"><b>pcrematching</b></a>
74
+ page.
75
+ </P>
76
+ <br><a name="SEC2" href="#TOC1">SPECIAL START-OF-PATTERN ITEMS</a><br>
77
+ <P>
78
+ A number of options that can be passed to <b>pcre_compile()</b> can also be set
79
+ by special items at the start of a pattern. These are not Perl-compatible, but
80
+ are provided to make these options accessible to pattern writers who are not
81
+ able to change the program that processes the pattern. Any number of these
82
+ items may appear, but they must all be together right at the start of the
83
+ pattern string, and the letters must be in upper case.
84
+ </P>
85
+ <br><b>
86
+ UTF support
87
+ </b><br>
88
+ <P>
89
+ The original operation of PCRE was on strings of one-byte characters. However,
90
+ there is now also support for UTF-8 strings in the original library, an
91
+ extra library that supports 16-bit and UTF-16 character strings, and a
92
+ third library that supports 32-bit and UTF-32 character strings. To use these
93
+ features, PCRE must be built to include appropriate support. When using UTF
94
+ strings you must either call the compiling function with the PCRE_UTF8,
95
+ PCRE_UTF16, or PCRE_UTF32 option, or the pattern must start with one of
96
+ these special sequences:
97
+ <pre>
98
+ (*UTF8)
99
+ (*UTF16)
100
+ (*UTF32)
101
+ (*UTF)
102
+ </pre>
103
+ (*UTF) is a generic sequence that can be used with any of the libraries.
104
+ Starting a pattern with such a sequence is equivalent to setting the relevant
105
+ option. How setting a UTF mode affects pattern matching is mentioned in several
106
+ places below. There is also a summary of features in the
107
+ <a href="pcreunicode.html"><b>pcreunicode</b></a>
108
+ page.
109
+ </P>
110
+ <P>
111
+ Some applications that allow their users to supply patterns may wish to
112
+ restrict them to non-UTF data for security reasons. If the PCRE_NEVER_UTF
113
+ option is set at compile time, (*UTF) etc. are not allowed, and their
114
+ appearance causes an error.
115
+ </P>
116
+ <br><b>
117
+ Unicode property support
118
+ </b><br>
119
+ <P>
120
+ Another special sequence that may appear at the start of a pattern is (*UCP).
121
+ This has the same effect as setting the PCRE_UCP option: it causes sequences
122
+ such as \d and \w to use Unicode properties to determine character types,
123
+ instead of recognizing only characters with codes less than 128 via a lookup
124
+ table.
125
+ </P>
126
+ <br><b>
127
+ Disabling auto-possessification
128
+ </b><br>
129
+ <P>
130
+ If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting
131
+ the PCRE_NO_AUTO_POSSESS option at compile time. This stops PCRE from making
132
+ quantifiers possessive when what follows cannot match the repeated item. For
133
+ example, by default a+b is treated as a++b. For more details, see the
134
+ <a href="pcreapi.html"><b>pcreapi</b></a>
135
+ documentation.
136
+ </P>
137
+ <br><b>
138
+ Disabling start-up optimizations
139
+ </b><br>
140
+ <P>
141
+ If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
142
+ PCRE_NO_START_OPTIMIZE option either at compile or matching time. This disables
143
+ several optimizations for quickly reaching "no match" results. For more
144
+ details, see the
145
+ <a href="pcreapi.html"><b>pcreapi</b></a>
146
+ documentation.
147
+ <a name="newlines"></a></P>
148
+ <br><b>
149
+ Newline conventions
150
+ </b><br>
151
+ <P>
152
+ PCRE supports five different conventions for indicating line breaks in
153
+ strings: a single CR (carriage return) character, a single LF (linefeed)
154
+ character, the two-character sequence CRLF, any of the three preceding, or any
155
+ Unicode newline sequence. The
156
+ <a href="pcreapi.html"><b>pcreapi</b></a>
157
+ page has
158
+ <a href="pcreapi.html#newlines">further discussion</a>
159
+ about newlines, and shows how to set the newline convention in the
160
+ <i>options</i> arguments for the compiling and matching functions.
161
+ </P>
162
+ <P>
163
+ It is also possible to specify a newline convention by starting a pattern
164
+ string with one of the following five sequences:
165
+ <pre>
166
+ (*CR) carriage return
167
+ (*LF) linefeed
168
+ (*CRLF) carriage return, followed by linefeed
169
+ (*ANYCRLF) any of the three above
170
+ (*ANY) all Unicode newline sequences
171
+ </pre>
172
+ These override the default and the options given to the compiling function. For
173
+ example, on a Unix system where LF is the default newline sequence, the pattern
174
+ <pre>
175
+ (*CR)a.b
176
+ </pre>
177
+ changes the convention to CR. That pattern matches "a\nb" because LF is no
178
+ longer a newline. If more than one of these settings is present, the last one
179
+ is used.
180
+ </P>
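The following sketch models how the chosen newline convention decides whether a newline starts at a given position. That decision in turn determines whether "." can match there when PCRE_DOTALL is not set. This is not PCRE's code. The enum and function names are hypothetical, and the sketch assumes 8-bit, non-UTF subject data.

```c
#include <stddef.h>

/* Hypothetical names for the five newline conventions listed above. */
typedef enum { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY } NewlineConvention;

/* Length of the newline sequence starting at s[pos] under `conv`, or 0 if
   none starts there. Assumes 8-bit, non-UTF data, so the Unicode line and
   paragraph separators (U+2028, U+2029) cannot occur. */
static size_t newline_length(NewlineConvention conv, const char *s,
                             size_t len, size_t pos) {
    if (pos >= len) return 0;
    unsigned char c = (unsigned char)s[pos];
    int crlf = c == 0x0d && pos + 1 < len && (unsigned char)s[pos + 1] == 0x0a;
    switch (conv) {
    case NL_CR:      return c == 0x0d;
    case NL_LF:      return c == 0x0a;
    case NL_CRLF:    return crlf ? 2 : 0;
    case NL_ANYCRLF: return crlf ? 2 : (c == 0x0d || c == 0x0a);
    case NL_ANY:     /* CRLF, LF, VT, FF, CR, NEL */
        return crlf ? 2 : (c == 0x0a || c == 0x0b || c == 0x0c ||
                           c == 0x0d || c == 0x85);
    }
    return 0;
}

/* Without PCRE_DOTALL, "." matches one character that does not start a newline. */
static int dot_matches(NewlineConvention conv, const char *s, size_t len,
                       size_t pos) {
    return pos < len && newline_length(conv, s, len, pos) == 0;
}
```

For example, `dot_matches(NL_CR, "a\nb", 3, 1)` is true and `dot_matches(NL_LF, "a\nb", 3, 1)` is false. This mirrors why (*CR)a.b matches "a\nb".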
181
+ <P>
182
+ The newline convention affects where the circumflex and dollar assertions are
183
+ true. It also affects the interpretation of the dot metacharacter when
184
+ PCRE_DOTALL is not set, and the behaviour of \N. However, it does not affect
185
+ what the \R escape sequence matches. By default, this is any Unicode newline
186
+ sequence, for Perl compatibility. However, this can be changed; see the
187
+ description of \R in the section entitled
188
+ <a href="#newlineseq">"Newline sequences"</a>
189
+ below. A change of \R setting can be combined with a change of newline
190
+ convention.
191
+ </P>
192
+ <br><b>
193
+ Setting match and recursion limits
194
+ </b><br>
195
+ <P>
196
+ The caller of <b>pcre_exec()</b> can set a limit on the number of times the
197
+ internal <b>match()</b> function is called and on the maximum depth of
198
+ recursive calls. These facilities are provided to catch runaway matches that
199
+ are provoked by patterns with huge matching trees (a typical example is a
200
+ pattern with nested unlimited repeats) and to avoid running out of system stack
201
+ by too much recursion. When one of these limits is reached, <b>pcre_exec()</b>
202
+ gives an error return. The limits can also be set by items at the start of the
203
+ pattern of the form
204
+ <pre>
205
+ (*LIMIT_MATCH=d)
206
+ (*LIMIT_RECURSION=d)
207
+ </pre>
208
+ where d is any number of decimal digits. However, the value of the setting must
209
+ be less than the value set (or defaulted) by the caller of <b>pcre_exec()</b>
210
+ for it to have any effect. In other words, the pattern writer can lower the
211
+ limits set by the programmer, but not raise them. If there is more than one
212
+ setting of one of these limits, the lower value is used.
213
+ </P>
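The rule above can be expressed as "take the minimum". A pattern can lower the caller's limit but never raise it, and when the same limit is set more than once, the lowest value wins. The following is a minimal sketch of that arithmetic. The function and parameter names are hypothetical, and the sketch assumes the values from the pattern have already been parsed.

```c
#include <stddef.h>

/* Effective limit when a pattern contains zero or more (*LIMIT_MATCH=d)
   (or (*LIMIT_RECURSION=d)) items: the lowest of the caller's limit and
   every value given in the pattern. Names are hypothetical. */
static unsigned long effective_limit(unsigned long caller_limit,
                                     const unsigned long *pattern_values,
                                     size_t count) {
    unsigned long limit = caller_limit;
    for (size_t i = 0; i < count; i++) {
        if (pattern_values[i] < limit) limit = pattern_values[i];
    }
    return limit;
}
```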
214
+ <br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
215
+ <P>
216
+ PCRE can be compiled to run in an environment that uses EBCDIC as its character
217
+ code rather than ASCII or Unicode (typically a mainframe system). In the
218
+ sections below, character code values are ASCII or Unicode; in an EBCDIC
219
+ environment these characters may have different code values, and there are no
220
+ code points greater than 255.
221
+ </P>
222
+ <br><a name="SEC4" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
223
+ <P>
224
+ A regular expression is a pattern that is matched against a subject string from
225
+ left to right. Most characters stand for themselves in a pattern, and match the
226
+ corresponding characters in the subject. As a trivial example, the pattern
227
+ <pre>
228
+ The quick brown fox
229
+ </pre>
230
+ matches a portion of a subject string that is identical to itself. When
231
+ caseless matching is specified (the PCRE_CASELESS option), letters are matched
232
+ independently of case. In a UTF mode, PCRE always understands the concept of
233
+ case for characters whose values are less than 128, so caseless matching is
234
+ always possible. For characters with higher values, the concept of case is
235
+ supported if PCRE is compiled with Unicode property support, but not otherwise.
236
+ If you want to use caseless matching for characters 128 and above, you must
237
+ ensure that PCRE is compiled with Unicode property support as well as with
238
+ UTF support.
239
+ </P>
240
+ <P>
241
+ The power of regular expressions comes from the ability to include alternatives
242
+ and repetitions in the pattern. These are encoded in the pattern by the use of
243
+ <i>metacharacters</i>, which do not stand for themselves but instead are
244
+ interpreted in some special way.
245
+ </P>
246
+ <P>
247
+ There are two different sets of metacharacters: those that are recognized
248
+ anywhere in the pattern except within square brackets, and those that are
249
+ recognized within square brackets. Outside square brackets, the metacharacters
250
+ are as follows:
251
+ <pre>
252
+ \ general escape character with several uses
253
+ ^ assert start of string (or line, in multiline mode)
254
+ $ assert end of string (or line, in multiline mode)
255
+ . match any character except newline (by default)
256
+ [ start character class definition
257
+ | start of alternative branch
258
+ ( start subpattern
259
+ ) end subpattern
260
+ ? extends the meaning of (
261
+ also 0 or 1 quantifier
262
+ also quantifier minimizer
263
+ * 0 or more quantifier
264
+ + 1 or more quantifier
265
+ also "possessive quantifier"
266
+ { start min/max quantifier
267
+ </pre>
268
+ Part of a pattern that is in square brackets is called a "character class". In
269
+ a character class the only metacharacters are:
270
+ <pre>
271
+ \ general escape character
272
+ ^ negate the class, but only if the first character
273
+ - indicates character range
274
+ [ POSIX character class (only if followed by POSIX syntax)
275
+ ] terminates the character class
276
+ </pre>
277
+ The following sections describe the use of each of the metacharacters.
278
+ </P>
279
+ <br><a name="SEC5" href="#TOC1">BACKSLASH</a><br>
280
+ <P>
281
+ The backslash character has several uses. Firstly, if it is followed by a
282
+ character that is not a number or a letter, it takes away any special meaning
283
+ that character may have. This use of backslash as an escape character applies
284
+ both inside and outside character classes.
285
+ </P>
286
+ <P>
287
+ For example, if you want to match a * character, you write \* in the pattern.
288
+ This escaping action applies whether or not the following character would
289
+ otherwise be interpreted as a metacharacter, so it is always safe to precede a
290
+ non-alphanumeric with backslash to specify that it stands for itself. In
291
+ particular, if you want to match a backslash, you write \\.
292
+ </P>
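Because a backslash before any non-alphanumeric character is always safe, a program can quote arbitrary text for use as a literal pattern. The following is a minimal sketch of this technique. The function name is hypothetical. Bytes of 0x80 and above are copied unchanged, on the assumption that the text may be UTF-8, where only ASCII letters and digits are special after a backslash.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies `in` to `out`, putting a backslash before every ASCII
   non-alphanumeric character. Bytes >= 0x80 are copied unchanged so
   UTF-8 sequences stay intact. Returns the output length, or -1 if
   `out` (of size outsz, including the terminating NUL) is too small. */
static long escape_for_pcre(const char *in, char *out, size_t outsz) {
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        int needs_escape = *p < 0x80 && !isalnum(*p);
        size_t need = needs_escape ? 2 : 1;
        if (n + need + 1 > outsz) return -1;
        if (needs_escape) out[n++] = '\\';
        out[n++] = (char)*p;
    }
    if (n + 1 > outsz) return -1;
    out[n] = '\0';
    return (long)n;
}
```

The \Q...\E form described next is an alternative. Per-character escaping has the advantage that the text may itself contain "\E".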
293
+ <P>
294
+ In a UTF mode, only ASCII numbers and letters have any special meaning after a
295
+ backslash. All other characters (in particular, those whose codepoints are
296
+ greater than 127) are treated as literals.
297
+ </P>
298
+ <P>
299
+ If a pattern is compiled with the PCRE_EXTENDED option, most white space in the
300
+ pattern (other than in a character class), and characters between a # outside a
301
+ character class and the next newline, inclusive, are ignored. An escaping
302
+ backslash can be used to include a white space or # character as part of the
303
+ pattern.
304
+ </P>
305
+ <P>
306
+ If you want to remove the special meaning from a sequence of characters, you
307
+ can do so by putting them between \Q and \E. This is different from Perl in
308
+ that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in
309
+ Perl, $ and @ cause variable interpolation. Note the following examples:
310
+ <pre>
311
+ Pattern PCRE matches Perl matches
312
+
313
+ \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
314
+ \Qabc\$xyz\E abc\$xyz abc\$xyz
315
+ \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
316
+ </pre>
317
+ The \Q...\E sequence is recognized both inside and outside character classes.
318
+ An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
319
+ by \E later in the pattern, the literal interpretation continues to the end of
320
+ the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
321
+ a character class, this causes an error, because the character class is not
322
+ terminated.
323
+ <a name="digitsafterbackslash"></a></P>
324
+ <br><b>
325
+ Non-printing characters
326
+ </b><br>
327
+ <P>
328
+ A second use of backslash provides a way of encoding non-printing characters
329
+ in patterns in a visible manner. There is no restriction on the appearance of
330
+ non-printing characters, apart from the binary zero that terminates a pattern,
331
+ but when a pattern is being prepared by text editing, it is often easier to use
332
+ one of the following escape sequences than the binary character it represents.
333
+ In an ASCII or Unicode environment, these escapes are as follows:
334
+ <pre>
335
+ \a alarm, that is, the BEL character (hex 07)
336
+ \cx "control-x", where x is any ASCII character
337
+ \e escape (hex 1B)
338
+ \f form feed (hex 0C)
339
+ \n linefeed (hex 0A)
340
+ \r carriage return (hex 0D)
341
+ \t tab (hex 09)
342
+ \0dd character with octal code 0dd
343
+ \ddd character with octal code ddd, or back reference
344
+ \o{ddd..} character with octal code ddd..
345
+ \xhh character with hex code hh
346
+ \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
347
+ \uhhhh character with hex code hhhh (JavaScript mode only)
348
+ </pre>
349
+ The precise effect of \cx on ASCII characters is as follows: if x is a lower
350
+ case letter, it is converted to upper case. Then bit 6 of the character (hex
351
+ 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
352
+ but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
353
+ data item (byte or 16-bit value) following \c has a value greater than 127, a
354
+ compile-time error occurs. This locks out non-ASCII characters in all modes.
355
+ </P>
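The \cx rule for ASCII characters reduces to two steps: fold lower case to upper case, then flip bit 6. The following sketch is illustrative and is not PCRE's source. The function name is hypothetical, and -1 stands in for the compile-time error.

```c
/* Value produced by \cx for an ASCII character x in an ASCII/Unicode
   build, or -1 where PCRE would report a compile-time error (x > 127). */
static int control_escape_value(unsigned int x) {
    if (x > 127) return -1;
    if (x >= 'a' && x <= 'z') x -= 'a' - 'A';   /* lower case -> upper case */
    return (int)(x ^ 0x40);                      /* invert bit 6 (hex 40) */
}
```

This reproduces the examples above: `control_escape_value('A')` is 0x01, `'{'` gives 0x3B, and `';'` gives 0x7B.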
356
+ <P>
357
+ When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
358
+ generate the appropriate EBCDIC code values. The \c escape is processed
359
+ as specified for Perl in the <b>perlebcdic</b> document. The only characters
360
+ that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
361
+ other character provokes a compile-time error. The sequence \c@ encodes
362
+ character code 0; the letters (in either case) encode characters 1-26 (hex 01
363
+ to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
364
+ \c? becomes either 255 (hex FF) or 95 (hex 5F).
365
+ </P>
366
+ <P>
367
+ Thus, apart from \c?, these escapes generate the same character code values as
368
+ they do in an ASCII environment, though the meanings of the values mostly
369
+ differ. For example, \cG always generates code value 7, which is BEL in ASCII
370
+ but DEL in EBCDIC.
371
+ </P>
372
+ <P>
373
+ The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
374
+ because 127 is not a control character in EBCDIC, Perl makes it generate the
375
+ APC character. Unfortunately, there are several variants of EBCDIC. In most of
376
+ them the APC character has the value 255 (hex FF), but in the one Perl calls
377
+ POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
378
+ values, PCRE makes \c? generate 95; otherwise it generates 255.
379
+ </P>
380
+ <P>
381
+ After \0 up to two further octal digits are read. If there are fewer than two
382
+ digits, just those that are present are used. Thus the sequence \0\x\015
383
+ specifies two binary zeros followed by a CR character (code value 13). Make
384
+ sure you supply two digits after the initial zero if the pattern character that
385
+ follows is itself an octal digit.
386
+ </P>
387
+ <P>
388
+ The escape \o must be followed by a sequence of octal digits, enclosed in
389
+ braces. An error occurs if this is not the case. This escape is a recent
390
+ addition to Perl; it provides a way of specifying character code points as octal
391
+ numbers greater than 0777, and it also allows octal numbers and back references
392
+ to be unambiguously specified.
393
+ </P>
394
+ <P>
395
+ For greater clarity and unambiguity, it is best to avoid following \ by a
396
+ digit greater than zero. Instead, use \o{} or \x{} to specify character
397
+ numbers, and \g{} to specify back references. The following paragraphs
398
+ describe the old, ambiguous syntax.
399
+ </P>
400
+ <P>
401
+ The handling of a backslash followed by a digit other than 0 is complicated,
402
+ and Perl has changed in recent releases, causing PCRE also to change. Outside a
403
+ character class, PCRE reads the digit and any following digits as a decimal
404
+ number. If the number is less than 8, or if there have been at least that many
405
+ previous capturing left parentheses in the expression, the entire sequence is
406
+ taken as a <i>back reference</i>. A description of how this works is given
407
+ <a href="#backreferences">later,</a>
408
+ following the discussion of
409
+ <a href="#subpattern">parenthesized subpatterns.</a>
410
+ </P>
411
+ <P>
412
+ Inside a character class, or if the decimal number following \ is greater than
413
+ 7 and there have not been that many capturing subpatterns, PCRE handles \8 and
414
+ \9 as the literal characters "8" and "9", and otherwise re-reads up to three
415
+ octal digits following the backslash, using them to generate a data character.
416
+ Any subsequent digits stand for themselves. For example:
417
+ <pre>
418
+ \040 is another way of writing an ASCII space
419
+ \40 is the same, provided there are fewer than 40 previous capturing subpatterns
420
+ \7 is always a back reference
421
+ \11 might be a back reference, or another way of writing a tab
422
+ \011 is always a tab
423
+ \0113 is a tab followed by the character "3"
424
+ \113 might be a back reference, otherwise the character with octal code 113
425
+ \377 might be a back reference, otherwise the value 255 (decimal)
426
+ \81 is either a back reference, or the two characters "8" and "1"
427
+ </pre>
428
+ Note that octal values of 100 or greater that are specified using this syntax
429
+ must not be introduced by a leading zero, because no more than three octal
430
+ digits are ever read.
431
+ </P>
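The decision procedure above can be summarized as a small classifier. It takes the digits that follow the backslash and the number of capturing left parentheses seen so far. The following is a simplified sketch under stated assumptions. It covers only the case outside a character class, ignores \g, \o, and \x, and uses hypothetical type and function names. It is not PCRE's implementation.

```c
#include <ctype.h>

/* Hypothetical result type for "\" followed by one or more digits. */
typedef enum { BACKREF, OCTAL_CHAR, LITERAL_DIGIT } EscapeKind;

typedef struct {
    EscapeKind kind;
    unsigned value;   /* group number, character code, or the digit character */
    int consumed;     /* digits used; any remaining digits stand for themselves */
} EscapeResult;

/* s points at the first digit after the backslash (outside a class). */
static EscapeResult classify_digit_escape(const char *s, int captures_so_far) {
    EscapeResult r = {OCTAL_CHAR, 0, 0};
    int n;
    if (s[0] == '0') {                       /* \0 plus up to two octal digits */
        for (n = 1; n < 3 && s[n] >= '0' && s[n] <= '7'; n++)
            r.value = r.value * 8 + (unsigned)(s[n] - '0');
        r.consumed = n;
        return r;
    }
    unsigned dec = 0;                        /* read all digits as decimal */
    int ndec = 0;
    while (isdigit((unsigned char)s[ndec]))
        dec = dec * 10 + (unsigned)(s[ndec++] - '0');
    if (dec < 8 || dec <= (unsigned)captures_so_far) {
        r.kind = BACKREF;
        r.value = dec;
        r.consumed = ndec;
        return r;
    }
    if (s[0] == '8' || s[0] == '9') {        /* \8 and \9 become literal digits */
        r.kind = LITERAL_DIGIT;
        r.value = (unsigned char)s[0];
        r.consumed = 1;
        return r;
    }
    for (n = 0; n < 3 && s[n] >= '0' && s[n] <= '7'; n++)  /* re-read as octal */
        r.value = r.value * 8 + (unsigned)(s[n] - '0');
    r.consumed = n;
    return r;
}
```

With no earlier capturing groups, "11" yields a tab (octal 11) and "81" yields the literal "8" followed by "1". With eleven groups, "11" becomes a back reference.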
432
+ <P>
433
+ By default, after \x that is not followed by {, from zero to two hexadecimal
434
+ digits are read (letters can be in upper or lower case). Any number of
435
+ hexadecimal digits may appear between \x{ and }. If a character other than
436
+ a hexadecimal digit appears between \x{ and }, or if there is no terminating
437
+ }, an error occurs.
438
+ </P>
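The following is a minimal parser for the text that follows \x in the default (non-JavaScript) mode. Without braces it reads zero to two hex digits. With braces it requires hex digits and a closing brace. Several points are assumptions of this sketch, not documented PCRE behaviour: the function name is hypothetical, "\x{}" with no digits is treated as an error, and values above 0xFFFFFFFF are rejected. The per-mode limits under "Constraints on character values" are not applied.

```c
#include <ctype.h>

static unsigned hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return (unsigned)(c - '0');
    return (unsigned)(tolower((unsigned char)c) - 'a' + 10);
}

/* s points just after "\x". On success stores the value in *value and
   returns the number of characters consumed; returns -1 on error. */
static int parse_hex_escape(const char *s, unsigned long *value) {
    unsigned long long v = 0;
    int i = 0;
    if (s[0] == '{') {
        for (i = 1; isxdigit((unsigned char)s[i]); i++) {
            v = v * 16 + hex_digit_value(s[i]);
            if (v > 0xFFFFFFFFull) return -1;   /* assumed overflow guard */
        }
        if (i == 1 || s[i] != '}') return -1;   /* no digits, bad char, or no '}' */
        *value = (unsigned long)v;
        return i + 1;
    }
    while (i < 2 && isxdigit((unsigned char)s[i])) {  /* zero to two digits */
        v = v * 16 + hex_digit_value(s[i]);
        i++;
    }
    *value = (unsigned long)v;
    return i;
}
```

As the text notes, "dc" and "{dc}" both produce 0xdc. Any hex digit after the first two in the unbraced form is left to stand for itself.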
439
+ <P>
440
+ If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x is
441
+ as just described only when it is followed by two hexadecimal digits.
442
+ Otherwise, it matches a literal "x" character. In JavaScript mode, support for
443
+ code points greater than 255 is provided by \u, which must be followed by
444
+ four hexadecimal digits; otherwise it matches a literal "u" character.
445
+ </P>
446
+ <P>
447
+ Characters whose value is less than 256 can be defined by either of the two
448
+ syntaxes for \x (or by \u in JavaScript mode). There is no difference in the
449
+ way they are handled. For example, \xdc is exactly the same as \x{dc} (or
450
+ \u00dc in JavaScript mode).
451
+ </P>
452
+ <br><b>
453
+ Constraints on character values
454
+ </b><br>
455
+ <P>
456
+ Characters that are specified using octal or hexadecimal numbers are
457
+ limited to certain values, as follows:
458
+ <pre>
459
+ 8-bit non-UTF mode less than 0x100
460
+ 8-bit UTF-8 mode less than 0x10ffff and a valid codepoint
461
+ 16-bit non-UTF mode less than 0x10000
462
+ 16-bit UTF-16 mode less than 0x10ffff and a valid codepoint
463
+ 32-bit non-UTF mode less than 0x100000000
464
+ 32-bit UTF-32 mode less than 0x10ffff and a valid codepoint
465
+ </pre>
466
+ Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
467
+ "surrogate" codepoints), and 0xffef.
468
+ </P>
469
+ <br><b>
470
+ Escape sequences in character classes
471
+ </b><br>
472
+ <P>
473
+ All the sequences that define a single character value can be used both inside
474
+ and outside character classes. In addition, inside a character class, \b is
475
+ interpreted as the backspace character (hex 08).
476
+ </P>
477
+ <P>
478
+ \N is not allowed in a character class. \B, \R, and \X are not special
479
+ inside a character class. Like other unrecognized escape sequences, they are
480
+ treated as the literal characters "B", "R", and "X" by default, but cause an
481
+ error if the PCRE_EXTRA option is set. Outside a character class, these
482
+ sequences have different meanings.
483
+ </P>
484
+ <br><b>
485
+ Unsupported escape sequences
486
+ </b><br>
487
+ <P>
488
+ In Perl, the sequences \l, \L, \u, and \U are recognized by its string
489
+ handler and used to modify the case of following characters. By default, PCRE
490
+ does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT
491
+ option is set, \U matches a "U" character, and \u can be used to define a
492
+ character by code point, as described in the previous section.
493
+ </P>
494
+ <br><b>
495
+ Absolute and relative back references
496
+ </b><br>
497
+ <P>
498
+ The sequence \g followed by an unsigned or a negative number, optionally
499
+ enclosed in braces, is an absolute or relative back reference. A named back
500
+ reference can be coded as \g{name}. Back references are discussed
501
+ <a href="#backreferences">later,</a>
502
+ following the discussion of
503
+ <a href="#subpattern">parenthesized subpatterns.</a>
504
+ </P>
505
+ <br><b>
506
+ Absolute and relative subroutine calls
507
+ </b><br>
508
+ <P>
509
+ For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
510
+ a number enclosed either in angle brackets or single quotes, is an alternative
511
+ syntax for referencing a subpattern as a "subroutine". Details are discussed
512
+ <a href="#onigurumasubroutines">later.</a>
513
+ Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
514
+ synonymous. The former is a back reference; the latter is a
515
+ <a href="#subpatternsassubroutines">subroutine</a>
516
+ call.
517
+ <a name="genericchartypes"></a></P>
518
+ <br><b>
519
+ Generic character types
520
+ </b><br>
521
+ <P>
522
+ Another use of backslash is for specifying generic character types:
523
+ <pre>
524
+ \d any decimal digit
525
+ \D any character that is not a decimal digit
526
+ \h any horizontal white space character
527
+ \H any character that is not a horizontal white space character
528
+ \s any white space character
529
+ \S any character that is not a white space character
530
+ \v any vertical white space character
531
+ \V any character that is not a vertical white space character
532
+ \w any "word" character
533
+ \W any "non-word" character
534
+ </pre>
535
+ There is also the single sequence \N, which matches a non-newline character.
536
+ This is the same as
537
+ <a href="#fullstopdot">the "." metacharacter</a>
538
+ when PCRE_DOTALL is not set. Perl also uses \N to match characters by name;
539
+ PCRE does not support this.
540
+ </P>
541
+ <P>
542
+ Each pair of lower and upper case escape sequences partitions the complete set
543
+ of characters into two disjoint sets. Any given character matches one, and only
544
+ one, of each pair. The sequences can appear both inside and outside character
545
+ classes. They each match one character of the appropriate type. If the current
546
+ matching point is at the end of the subject string, all of them fail, because
547
+ there is no character to match.
548
+ </P>
549
+ <P>
550
+ For compatibility with Perl, \s did not originally match the VT character (code
551
+ 11), which made it different from the POSIX "space" class. However, Perl
552
+ added VT at release 5.18, and PCRE followed suit at release 8.34. The default
553
+ \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space
554
+ (32), which are defined as white space in the "C" locale. This list may vary if
555
+ locale-specific matching is taking place. For example, in some locales the
556
+ "non-breaking space" character (\xA0) is recognized as white space, and in
557
+ others the VT character is not.
558
+ </P>
559
+ <P>
560
+ A "word" character is an underscore or any character that is a letter or digit.
561
+ By default, the definition of letters and digits is controlled by PCRE's
562
+ low-valued character tables, and may vary if locale-specific matching is taking
563
+ place (see
564
+ <a href="pcreapi.html#localesupport">"Locale support"</a>
565
+ in the
566
+ <a href="pcreapi.html"><b>pcreapi</b></a>
567
+ page). For example, in a French locale such as "fr_FR" in Unix-like systems,
568
+ or "french" in Windows, some character codes greater than 127 are used for
569
+ accented letters, and these are then matched by \w. The use of locales with
570
+ Unicode is discouraged.
571
+ </P>
572
+ <P>
573
+ By default, characters whose code points are greater than 127 never match \d,
574
+ \s, or \w, and always match \D, \S, and \W, although this may vary for
575
+ characters in the range 128-255 when locale-specific matching is happening.
576
+ These escape sequences retain their original meanings from before Unicode
577
+ support was available, mainly for efficiency reasons. If PCRE is compiled with
578
+ Unicode property support, and the PCRE_UCP option is set, the behaviour is
579
+ changed so that Unicode properties are used to determine character types, as
580
+ follows:
581
+ <pre>
582
+ \d any character that matches \p{Nd} (decimal digit)
583
+ \s any character that matches \p{Z} or \h or \v
584
+ \w any character that matches \p{L} or \p{N}, plus underscore
585
+ </pre>
586
+ The upper case escapes match the inverse sets of characters. Note that \d
587
+ matches only decimal digits, whereas \w matches any Unicode digit, as well as
588
+ any Unicode letter, and underscore. Note also that PCRE_UCP affects \b, and
589
+ \B because they are defined in terms of \w and \W. Matching these sequences
590
+ is noticeably slower when PCRE_UCP is set.
591
+ </P>
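The default meanings of \d, \s, and \w described above (no PCRE_UCP, "C" locale) can be written as simple predicates, with \D, \S, and \W as their negations. The following sketch models only that default case. Locale tables and PCRE_UCP are not modelled, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default \d: ASCII digits only; code points above 127 never match. */
static bool default_digit(uint32_t c) { return c >= '0' && c <= '9'; }

/* Default \s: HT, LF, VT (added in PCRE 8.34), FF, CR, and space. */
static bool default_space(uint32_t c) {
    return (c >= 0x09 && c <= 0x0d) || c == 0x20;
}

/* Default \w: underscore, ASCII letters, and ASCII digits. */
static bool default_word(uint32_t c) {
    return default_digit(c) || c == '_' ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```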
592
+ <P>
593
+ The sequences \h, \H, \v, and \V are features that were added to Perl at
594
+ release 5.10. In contrast to the other sequences, which match only ASCII
595
+ characters by default, these always match certain high-valued code points,
596
+ whether or not PCRE_UCP is set. The horizontal space characters are:
597
+ <pre>
598
+ U+0009 Horizontal tab (HT)
599
+ U+0020 Space
600
+ U+00A0 Non-break space
601
+ U+1680 Ogham space mark
602
+ U+180E Mongolian vowel separator
603
+ U+2000 En quad
604
+ U+2001 Em quad
605
+ U+2002 En space
606
+ U+2003 Em space
607
+ U+2004 Three-per-em space
608
+ U+2005 Four-per-em space
609
+ U+2006 Six-per-em space
610
+ U+2007 Figure space
611
+ U+2008 Punctuation space
612
+ U+2009 Thin space
613
+ U+200A Hair space
614
+ U+202F Narrow no-break space
615
+ U+205F Medium mathematical space
616
+ U+3000 Ideographic space
617
+ </pre>
618
+ The vertical space characters are:
619
+ <pre>
620
+ U+000A Linefeed (LF)
621
+ U+000B Vertical tab (VT)
622
+ U+000C Form feed (FF)
623
+ U+000D Carriage return (CR)
624
+ U+0085 Next line (NEL)
625
+ U+2028 Line separator
626
+ U+2029 Paragraph separator
627
+ </pre>
628
+ In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are
629
+ relevant.
630
+ <a name="newlineseq"></a></P>
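The two tables above translate directly into membership tests on code points. The following sketch is for illustration only, and the function names are hypothetical. In 8-bit non-UTF-8 mode a caller would only ever pass values below 256.

```c
#include <stdbool.h>
#include <stdint.h>

/* Code points matched by \h, per the horizontal space table. */
static bool is_hspace(uint32_t c) {
    switch (c) {
    case 0x0009: case 0x0020: case 0x00A0: case 0x1680: case 0x180E:
    case 0x202F: case 0x205F: case 0x3000:
        return true;
    default:
        return c >= 0x2000 && c <= 0x200A;   /* en quad through hair space */
    }
}

/* Code points matched by \v, per the vertical space table. */
static bool is_vspace(uint32_t c) {
    return (c >= 0x000A && c <= 0x000D) || c == 0x0085 ||
           c == 0x2028 || c == 0x2029;
}
```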
631
+ <br><b>
632
+ Newline sequences
633
+ </b><br>
634
+ <P>
635
+ Outside a character class, by default, the escape sequence \R matches any
636
+ Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent to the
637
+ following:
638
+ <pre>
639
+ (?&#62;\r\n|\n|\x0b|\f|\r|\x85)
640
+ </pre>
641
+ This is an example of an "atomic group", details of which are given
642
+ <a href="#atomicgroup">below.</a>
643
+ This particular group matches either the two-character sequence CR followed by
644
+ LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
645
+ U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
646
+ line, U+0085). The two-character sequence is treated as a single unit that
647
+ cannot be split.
648
+ </P>
649
+ <P>
650
+ In other modes, two additional characters whose codepoints are greater than 255
651
+ are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
652
+ Unicode character property support is not needed for these characters to be
653
+ recognized.
654
+ </P>
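A minimal Python sketch of how \R decides what to consume at a given offset, assuming the default newline set described above. `match_R`, `byte_mode` (8-bit non-UTF-8 mode) and `anycrlf` (standing in for the PCRE_BSR_ANYCRLF setting described next) are hypothetical names, not part of the PCRE API.

```python
# Single characters accepted by \R in its default (Unicode) setting.
R_SINGLES = "\n\x0b\x0c\r\x85\u2028\u2029"


def match_R(subject: str, pos: int, byte_mode: bool = False,
            anycrlf: bool = False) -> int:
    """Return how many characters \\R consumes at pos, or 0 for no match.

    CRLF is checked first and consumed as one unit; like the atomic group
    above, there is no fallback to matching the CR on its own.
    byte_mode=True drops LS and PS (code points above 255);
    anycrlf=True restricts the match to CR, LF, or CRLF.
    """
    if subject.startswith("\r\n", pos):
        return 2
    if pos >= len(subject):
        return 0
    ch = subject[pos]
    if anycrlf:
        return 1 if ch in "\r\n" else 0
    if ch in R_SINGLES and (not byte_mode or ord(ch) < 256):
        return 1
    return 0
```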
<P>
It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF
either at compile time or when the pattern is matched. (BSR is an abbreviation
for "backslash R".) This can be made the default when PCRE is built; if this is
the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option.
It is also possible to specify these settings by starting a pattern string with
one of the following sequences:
<pre>
  (*BSR_ANYCRLF)   CR, LF, or CRLF only
  (*BSR_UNICODE)   any Unicode newline sequence
</pre>
These override the default and the options given to the compiling function, but
they can themselves be overridden by options given to a matching function. Note
that these special settings, which are not Perl-compatible, are recognized only
at the very start of a pattern, and that they must be in upper case. If more
than one of them is present, the last one is used. They can be combined with a
change of newline convention; for example, a pattern can start with:
<pre>
  (*ANY)(*BSR_ANYCRLF)
</pre>
They can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF) or
(*UCP) special sequences. Inside a character class, \R is treated as an
unrecognized escape sequence, and so matches the letter "R" by default, but
causes an error if PCRE_EXTRA is set.
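The start-of-pattern rule can be sketched as a scan over leading (*...) items in which the last BSR setting wins. This hypothetical Python helper recognizes only the items named in this section (PCRE itself accepts others), and it does not model options given to a matching function, which can override the result.

```python
import re

# Only the leading items named in this section; real PCRE knows more.
_LEADING = re.compile(r"\(\*(BSR_ANYCRLF|BSR_UNICODE|ANY|UTF8|UTF16|UTF32|UTF|UCP)\)")


def leading_bsr(pattern: str, default: str = "UNICODE"):
    """Return (bsr_setting, rest_of_pattern) after the leading (*...) items.

    Items count only at the very start and only in upper case; when several
    BSR items are present, the last one is used.
    """
    bsr, pos = default, 0
    while True:
        m = _LEADING.match(pattern, pos)
        if m is None:
            break
        if m.group(1).startswith("BSR_"):
            bsr = m.group(1)[len("BSR_"):]
        pos = m.end()
    return bsr, pattern[pos:]
```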
<a name="uniextseq"></a></P>
<br><b>
Unicode character properties
</b><br>
<P>
When PCRE is built with Unicode character property support, three additional
escape sequences that match characters with specific properties are available.
When in 8-bit non-UTF-8 mode, these sequences are of course limited to testing
characters whose codepoints are less than 256, but they do work in this mode.
The extra escape sequences are:
<pre>
  \p{<i>xx</i>}   a character with the <i>xx</i> property
  \P{<i>xx</i>}   a character without the <i>xx</i> property
  \X       a Unicode extended grapheme cluster
</pre>
The property names represented by <i>xx</i> above are limited to the Unicode
script names, the general category properties, "Any", which matches any
character (including newline), and some special PCRE properties (described
in the
<a href="#extraprops">next section).</a>
Other Perl properties such as "InMusicalSymbols" are not currently supported by
PCRE. Note that \P{Any} matches no characters, so it always causes a match
failure.
</P>
<P>
Sets of Unicode characters are defined as belonging to certain scripts. A
character from one of these sets can be matched using a script name. For
example:
<pre>
  \p{Greek}
  \P{Han}
</pre>
Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:
</P>
<P>
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
</P>
<P>
Each character has exactly one Unicode general category property, specified by
a two-letter abbreviation. For compatibility with Perl, negation can be
specified by including a circumflex between the opening brace and the property
name. For example, \p{^Lu} is the same as \P{Lu}.
</P>
<P>
If only one letter is specified with \p or \P, it includes all the general
category properties that start with that letter. In this case, in the absence
of negation, the curly brackets in the escape sequence are optional; these two
examples have the same effect:
<pre>
  \p{L}
  \pL
</pre>
The following general category property codes are supported:
<pre>
  C     Other
  Cc    Control
  Cf    Format
  Cn    Unassigned
  Co    Private use
  Cs    Surrogate

  L     Letter
  Ll    Lower case letter
  Lm    Modifier letter
  Lo    Other letter
  Lt    Title case letter
  Lu    Upper case letter

  M     Mark
  Mc    Spacing mark
  Me    Enclosing mark
  Mn    Non-spacing mark

  N     Number
  Nd    Decimal number
  Nl    Letter number
  No    Other number

  P     Punctuation
  Pc    Connector punctuation
  Pd    Dash punctuation
  Pe    Close punctuation
  Pf    Final punctuation
  Pi    Initial punctuation
  Po    Other punctuation
  Ps    Open punctuation

  S     Symbol
  Sc    Currency symbol
  Sk    Modifier symbol
  Sm    Mathematical symbol
  So    Other symbol

  Z     Separator
  Zl    Line separator
  Zp    Paragraph separator
  Zs    Space separator
</pre>
The special property L& is also supported: it matches a character that has
the Lu, Ll, or Lt property, in other words, a letter that is not classified as
a modifier or "other".
</P>
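To show how the one-letter, two-letter, negated and L& forms relate, here is a Python sketch built on the standard `unicodedata` module. It is an illustration only: Python's Unicode tables may be a different version from those compiled into a given PCRE build, and `has_property` is a hypothetical name. As described below, case folding plays no part: "a" never has the Lu property.

```python
import unicodedata


def has_property(ch: str, prop: str) -> bool:
    """Model \\p{prop} for general categories using Python's Unicode tables.

    prop may be a two-letter category ("Lu"), a one-letter group ("L"),
    the special "L&", or any of these prefixed with "^" for negation.
    """
    negate = prop.startswith("^")
    if negate:
        prop = prop[1:]
    cat = unicodedata.category(ch)  # e.g. "Lu"; unassigned code points give "Cn"
    if prop == "L&":
        result = cat in ("Lu", "Ll", "Lt")
    elif len(prop) == 1:
        result = cat[0] == prop
    else:
        result = cat == prop
    return result != negate
```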
<P>
The Cs (Surrogate) property applies only to characters in the range U+D800 to
U+DFFF. Such characters are not valid in Unicode strings and so cannot be
tested by PCRE, unless UTF validity checking has been turned off (see the
discussion of PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK and PCRE_NO_UTF32_CHECK
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page). Perl does not support the Cs property.
</P>
<P>
The long synonyms for property names that Perl supports (such as \p{Letter})
are not supported by PCRE, nor is it permitted to prefix any of these
properties with "Is".
</P>
<P>
No character that is in the Unicode table has the Cn (unassigned) property.
Instead, this property is assumed for any code point that is not in the
Unicode table.
</P>
<P>
Specifying caseless matching does not affect these escape sequences. For
example, \p{Lu} always matches only upper case letters. This is different from
the behaviour of current versions of Perl.
</P>
<P>
Matching characters by Unicode property is not fast, because PCRE has to do a
multistage table lookup in order to find a character's property. That is why
the traditional escape sequences such as \d and \w do not use Unicode
properties in PCRE by default, though you can make them do so by setting the
PCRE_UCP option or by starting the pattern with (*UCP).
</P>
<br><b>
Extended grapheme clusters
</b><br>
<P>
The \X escape matches any number of Unicode characters that form an "extended
grapheme cluster", and treats the sequence as an atomic group
<a href="#atomicgroup">(see below).</a>
Up to and including release 8.31, PCRE matched an earlier, simpler definition
that was equivalent to
<pre>
  (?&#62;\PM\pM*)
</pre>
That is, it matched a character without the "mark" property, followed by zero
or more characters with the "mark" property. Characters with the "mark"
property are typically non-spacing accents that affect the preceding character.
</P>
<P>
This simple definition was extended in Unicode to include more complicated
kinds of composite character by giving each character a grapheme breaking
property, and creating rules that use these properties to define the boundaries
of extended grapheme clusters. In releases of PCRE later than 8.31, \X matches
one of these clusters.
</P>
<P>
\X always matches at least one character. Then it decides whether to add
additional characters according to the following rules for ending a cluster:
</P>
<P>
1. End at the end of the subject string.
</P>
<P>
2. Do not end between CR and LF; otherwise end after any control character.
</P>
<P>
3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters
are of five types: L, V, T, LV, and LVT. An L character may be followed by an
L, V, LV, or LVT character; an LV or V character may be followed by a V or T
character; an LVT or T character may be followed only by a T character.
</P>
<P>
4. Do not end before extending characters or spacing marks. Characters with
the "mark" property always have the "extend" grapheme breaking property.
</P>
<P>
5. Do not end after prepend characters.
</P>
<P>
6. Otherwise, end the cluster.
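The rules above can be approximated in a few lines of Python. This is a simplified sketch, not PCRE's algorithm: "control" is taken to mean the Cc, Zl and Zp categories, rule 4 is approximated by the Mn, Mc and Me (mark) categories, and rule 5 is not modelled because `unicodedata` does not expose grapheme break properties. The Hangul ranges are the Unicode Hangul_Syllable_Type ranges; all function names are hypothetical.

```python
import unicodedata


def hangul_type(ch: str):
    """Hangul syllable type of ch ("L", "V", "T", "LV", "LVT") or None."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no final consonant (LV).
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


def is_control(ch: str) -> bool:
    # Simplification: treat Cc, Zl and Zp as "control".
    return unicodedata.category(ch) in ("Cc", "Zl", "Zp")


def keep_together(prev: str, cur: str) -> bool:
    """True if the cluster must not end between prev and cur."""
    if prev == "\r" and cur == "\n":
        return True                    # rule 2: never split CR LF
    if is_control(prev):
        return False                   # rule 2: end after other controls
    hp, hc = hangul_type(prev), hangul_type(cur)
    if ((hp == "L" and hc in ("L", "V", "LV", "LVT"))
            or (hp in ("LV", "V") and hc in ("V", "T"))
            or (hp in ("LVT", "T") and hc == "T")):
        return True                    # rule 3: Hangul syllable sequences
    if unicodedata.category(cur) in ("Mn", "Mc", "Me"):
        return True                    # rule 4, approximated by marks
    return False                       # rule 6 (rule 5 not modelled)


def clusters(s: str) -> list:
    """Split s into clusters; rule 1 ends the last one at the end of s."""
    out = []
    for i, ch in enumerate(s):
        if out and keep_together(s[i - 1], ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out
```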
<a name="extraprops"></a></P>
<br><b>
PCRE's additional properties
</b><br>
<P>
As well as the standard Unicode properties described above, PCRE supports four
more that make it possible to convert traditional escape sequences such as \w
and \s to use Unicode properties. PCRE uses these non-standard, non-Perl
properties internally when PCRE_UCP is set. However, they may also be used
explicitly. These properties are:
<pre>
  Xan   Any alphanumeric character
  Xps   Any POSIX space character
  Xsp   Any Perl space character
  Xwd   Any Perl "word" character
</pre>
Xan matches characters that have either the L (letter) or the N (number)
property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
carriage return, and any other character that has the Z (separator) property.
Xsp is the same as Xps; it used to exclude vertical tab, for Perl
compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd
matches the same characters as Xan, plus underscore.
</P>
<P>
There is another non-standard property, Xuc, which matches any character that
can be represented by a Universal Character Name in C++ and other programming
languages. These are the characters $, @, ` (grave accent), and all characters
with Unicode code points greater than or equal to U+00A0, except for the
surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are
excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH
where H is a hexadecimal digit. Note that the Xuc property does not match these
sequences but the characters that they represent.)
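A Python sketch of the five properties exactly as the two paragraphs above define them, using `unicodedata` general categories. The function names mirror the property names but are otherwise hypothetical, and Python's Unicode data may differ in version from PCRE's.

```python
import unicodedata


def xan(ch: str) -> bool:
    # Letter (L) or number (N) general category.
    return unicodedata.category(ch)[0] in "LN"


def xps(ch: str) -> bool:
    # Tab, LF, VT, FF, CR, or any character with the Z (separator) property.
    return ch in "\t\n\x0b\x0c\r" or unicodedata.category(ch)[0] == "Z"


xsp = xps  # identical since PCRE 8.34


def xwd(ch: str) -> bool:
    # Xan plus underscore.
    return xan(ch) or ch == "_"


def xuc(ch: str) -> bool:
    # $, @, ` or any code point >= U+00A0 other than the surrogates.
    cp = ord(ch)
    return ch in "$@`" or (cp >= 0xA0 and not 0xD800 <= cp <= 0xDFFF)
```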
<a name="resetmatchstart"></a></P>
<br><b>
Resetting the match start
</b><br>
<P>
The escape sequence \K causes any previously matched characters not to be
included in the final matched sequence. For example, the pattern:
<pre>
  foo\Kbar
</pre>
matches "foobar", but reports that it has matched "bar". This feature is
similar to a lookbehind assertion
<a href="#lookbehind">(described below).</a>
However, in this case, the part of the subject before the real match does not
have to be of fixed length, as it must be for a lookbehind assertion. The use
of \K does not interfere with the setting of
<a href="#subpattern">captured substrings.</a>
For example, when the pattern
<pre>
  (foo)\Kbar
</pre>
matches "foobar", the first substring is still set to "foo".
</P>
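Python's `re` module has no \K, but an analogy shows why \K is more flexible than lookbehind. In this sketch (not PCRE behaviour), a lookbehind reproduces the span reported for foo\Kbar only because "foo" has a fixed length; for a variable-length prefix, Python rejects the lookbehind, so the prefix is captured instead and the start of the kept part is reported. The function names are hypothetical.

```python
import re


def fixed_prefix_span(subject: str):
    # Same reported span as foo\Kbar, because "foo" has a fixed length.
    m = re.search(r"(?<=foo)bar", subject)
    return m.span() if m else None


def variable_prefix_span(subject: str):
    # fo+\Kbar has no lookbehind equivalent in Python, so capture the part
    # that \K would keep and report where it starts and ends.
    m = re.search(r"(fo+)(bar)", subject)
    return (m.start(2), m.end(2), m.group(1)) if m else None


def compiles(pattern: str) -> bool:
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```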
<P>
Perl documents that the use of \K within assertions is "not well defined". In
PCRE, \K is acted upon when it occurs inside positive assertions, but is
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
matches, the reported start of the match can be greater than the end of the
match.
<a name="smallassertions"></a></P>
<br><b>
Simple assertions
</b><br>
<P>
The final use of backslash is for certain simple assertions. An assertion
specifies a condition that has to be met at a particular point in a match,
without consuming any characters from the subject string. The use of
subpatterns for more complicated assertions is described
<a href="#bigassertions">below.</a>
The backslashed assertions are:
<pre>
  \b     matches at a word boundary
  \B     matches when not at a word boundary
  \A     matches at the start of the subject
  \Z     matches at the end of the subject
         also matches before a newline at the end of the subject
  \z     matches only at the end of the subject
  \G     matches at the first matching position in the subject
</pre>
Inside a character class, \b has a different meaning; it matches the backspace
character. If any other of these assertions appears in a character class, by
default it matches the corresponding literal character (for example, \B
matches the letter B). However, if the PCRE_EXTRA option is set, an "invalid
escape sequence" error is generated instead.
</P>
<P>
A word boundary is a position in the subject string where the current character
and the previous character do not both match \w or \W (i.e. one matches
\w and the other matches \W), or the start or end of the string if the
first or last character matches \w, respectively. In a UTF mode, the meanings
of \w and \W can be changed by setting the PCRE_UCP option. When this is
done, it also affects \b and \B. Neither PCRE nor Perl has a separate "start
of word" or "end of word" metasequence. However, whatever follows \b normally
determines which it is. For example, the fragment \ba matches "a" at the start
of a word.
</P>
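The word-boundary rule reduces to comparing whether the characters on either side of a position are word characters, treating positions outside the string as non-word. A minimal Python sketch, assuming the default ASCII meaning of \w (letters, digits, underscore) with PCRE_UCP unset; the function names are hypothetical.

```python
def is_word(ch: str) -> bool:
    # Default (non-UCP) \w: ASCII letters, digits and underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def at_word_boundary(s: str, i: int) -> bool:
    """True if \\b would match at offset i (0 <= i <= len(s))."""
    before = i > 0 and is_word(s[i - 1])
    after = i < len(s) and is_word(s[i])
    return before != after
```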
<P>
The \A, \Z, and \z assertions differ from the traditional circumflex and
dollar (described in the next section) in that they only ever match at the very
start and end of the subject string, whatever options are set. Thus, they are
independent of multiline mode. These three assertions are not affected by the
PCRE_NOTBOL or PCRE_NOTEOL options, which affect only the behaviour of the
circumflex and dollar metacharacters. However, if the <i>startoffset</i>
argument of <b>pcre_exec()</b> is non-zero, indicating that matching is to start
at a point other than the beginning of the subject, \A can never match. The
difference between \Z and \z is that \Z matches before a newline at the end
of the string as well as at the very end, whereas \z matches only at the end.
</P>
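A minimal sketch of the three tests, assuming LF is the newline character (with CRLF as the newline, the "final newline" would be two characters long). The function names are hypothetical.

```python
def at_A(pos: int) -> bool:
    # Start of subject only. With a non-zero startoffset the match position
    # never reaches 0, so \A cannot match.
    return pos == 0


def at_z(subject: str, pos: int) -> bool:
    # Very end of the subject only.
    return pos == len(subject)


def at_Z(subject: str, pos: int) -> bool:
    # Very end, or just before a newline that ends the subject.
    return at_z(subject, pos) or (
        pos == len(subject) - 1 and subject.endswith("\n"))
```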
<P>
The \G assertion is true only when the current matching position is at the
start point of the match, as specified by the <i>startoffset</i> argument of
<b>pcre_exec()</b>. It differs from \A when the value of <i>startoffset</i> is
non-zero. By calling <b>pcre_exec()</b> multiple times with appropriate
arguments, you can mimic Perl's /g option, and it is in this kind of
implementation where \G can be useful.
</P>
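Python's `re` has no \G, but `Pattern.match(string, pos)` succeeds only when the match starts exactly at `pos`, which is a rough analogue of calling pcre_exec() repeatedly with a \G-anchored pattern and an advancing startoffset. The tokenizer below is a hypothetical example of that style of loop, not PCRE code.

```python
import re

# Each alternative consumes at least one character, so the loop always advances.
TOKEN = re.compile(r"\d+|[+*/-]|\s+")


def tokenize(subject: str) -> list:
    pos, tokens = 0, []
    while pos < len(subject):
        m = TOKEN.match(subject, pos)  # must match exactly at pos, like \G
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if not m.group().isspace():
            tokens.append(m.group())
        pos = m.end()                  # next "startoffset"
    return tokens
```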
<P>
Note, however, that PCRE's interpretation of \G, as the start of the current
match, is subtly different from Perl's, which defines it as the end of the
previous match. In Perl, these can be different when the previously matched
string was empty. Because PCRE does just one match at a time, it cannot
reproduce this behaviour.
</P>
<P>
If all the alternatives of a pattern begin with \G, the expression is anchored
to the starting match position, and the "anchored" flag is set in the compiled
regular expression.
</P>
<br><a name="SEC6" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
<P>
The circumflex and dollar metacharacters are zero-width assertions. That is,
they test for a particular condition being true without consuming any
characters from the subject string.
</P>
<P>
Outside a character class, in the default matching mode, the circumflex
character is an assertion that is true only if the current matching point is at
the start of the subject string. If the <i>startoffset</i> argument of
<b>pcre_exec()</b> is non-zero, circumflex can never match if the PCRE_MULTILINE
option is unset. Inside a character class, circumflex has an entirely different
meaning
<a href="#characterclass">(see below).</a>
</P>
<P>
Circumflex need not be the first character of the pattern if a number of
alternatives are involved, but it should be the first thing in each alternative
in which it appears if the pattern is ever to match that branch. If all
possible alternatives start with a circumflex, that is, if the pattern is
constrained to match only at the start of the subject, it is said to be an
"anchored" pattern. (There are also other constructs that can cause a pattern
to be anchored.)
</P>
<P>
The dollar character is an assertion that is true only if the current matching
point is at the end of the subject string, or immediately before a newline at
the end of the string (by default). Note, however, that it does not actually
match the newline. Dollar need not be the last character of the pattern if a
number of alternatives are involved, but it should be the last item in any
branch in which it appears. Dollar has no special meaning in a character class.
</P>
<P>
The meaning of dollar can be changed so that it matches only at the very end of
the string, by setting the PCRE_DOLLAR_ENDONLY option at compile time. This
does not affect the \Z assertion.
</P>
<P>
The meanings of the circumflex and dollar characters are changed if the
PCRE_MULTILINE option is set. When this is the case, a circumflex matches
immediately after internal newlines as well as at the start of the subject
string. It does not match after a newline that ends the string. A dollar
matches before any newlines in the string, as well as at the very end, when
PCRE_MULTILINE is set. When newline is specified as the two-character
sequence CRLF, isolated CR and LF characters do not indicate newlines.
</P>
<P>
For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
\n represents a newline) in multiline mode, but not otherwise. Consequently,
patterns that are anchored in single line mode because all branches start with
^ are not anchored in multiline mode, and a match for circumflex is possible
when the <i>startoffset</i> argument of <b>pcre_exec()</b> is non-zero. The
PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
</P>
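The offsets at which ^ and $ can match under these options can be listed with a small Python sketch. It assumes LF is the only newline character and models only PCRE_MULTILINE and PCRE_DOLLAR_ENDONLY; the function names are hypothetical.

```python
def caret_positions(s: str, multiline: bool = False) -> list:
    """Offsets where ^ can match (LF newline assumed)."""
    if not multiline:
        return [0]
    # After an internal newline, but not after a newline that ends the subject.
    return [0] + [i + 1 for i, ch in enumerate(s)
                  if ch == "\n" and i + 1 < len(s)]


def dollar_positions(s: str, multiline: bool = False,
                     dollar_endonly: bool = False) -> list:
    """Offsets where $ can match (LF newline assumed)."""
    end = len(s)
    if multiline:  # PCRE_DOLLAR_ENDONLY is ignored in multiline mode
        return sorted({i for i, ch in enumerate(s) if ch == "\n"} | {end})
    if not dollar_endonly and s.endswith("\n"):
        return [end - 1, end]
    return [end]
```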
<P>
Note that the sequences \A, \Z, and \z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
\A it is always anchored, whether or not PCRE_MULTILINE is set.
<a name="fullstopdot"></a></P>
<br><a name="SEC7" href="#TOC1">FULL STOP (PERIOD, DOT) AND \N</a><br>
<P>
Outside a character class, a dot in the pattern matches any one character in
the subject string except (by default) a character that signifies the end of a
line.
</P>
<P>
When a line ending is defined as a single character, dot never matches that
character; when the two-character sequence CRLF is used, dot does not match CR
if it is immediately followed by LF, but otherwise it matches all characters
(including isolated CRs and LFs). When any Unicode line endings are being
recognized, dot does not match CR or LF or any of the other line ending
characters.
</P>
<P>
The behaviour of dot with regard to newlines can be changed. If the PCRE_DOTALL
option is set, a dot matches any one character, without exception. If the
two-character sequence CRLF is present in the subject string, it takes two dots
to match it.
</P>
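A Python sketch of whether a dot matches the character at a given offset under a few newline conventions. The labels "LF", "CR", "CRLF" and "ANY" are illustrative strings, not PCRE option names, and "ANY" is assumed to use the same line-ending characters as \R.

```python
def dot_matches(s: str, i: int, newline: str = "LF", dotall: bool = False) -> bool:
    """Whether '.' matches s[i] under the given newline convention."""
    if i >= len(s):
        return False
    if dotall:
        return True                    # PCRE_DOTALL: no exceptions
    ch = s[i]
    if newline == "LF":
        return ch != "\n"
    if newline == "CR":
        return ch != "\r"
    if newline == "CRLF":
        # Only a CR immediately followed by LF is excluded.
        return not (ch == "\r" and s[i + 1:i + 2] == "\n")
    if newline == "ANY":
        return ch not in "\n\x0b\x0c\r\x85\u2028\u2029"
    raise ValueError(f"unknown newline convention {newline!r}")
```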
<P>
The handling of dot is entirely independent of the handling of circumflex and
dollar, the only relationship being that they both involve newlines. Dot has no
special meaning in a character class.
</P>
<P>
The escape sequence \N behaves like a dot, except that it is not affected by
the PCRE_DOTALL option. In other words, it matches any character except one
that signifies the end of a line. Perl also uses \N to match characters by
name; PCRE does not support this.
</P>
<br><a name="SEC8" href="#TOC1">MATCHING A SINGLE DATA UNIT</a><br>
<P>
Outside a character class, the escape sequence \C matches any one data unit,
whether or not a UTF mode is set. In the 8-bit library, one data unit is one
byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is
a 32-bit unit. Unlike a dot, \C always matches line-ending characters. The
feature is provided in Perl in order to match individual bytes in UTF-8 mode,
but it is unclear how it can usefully be used. Because \C breaks up characters
into individual data units, matching one unit with \C in a UTF mode means that
the rest of the string may start with a malformed UTF character. This has
undefined results, because PCRE assumes that it is dealing with valid UTF
strings (and by default it checks this at the start of processing unless the
PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or PCRE_NO_UTF32_CHECK option is used).
</P>
<P>
PCRE does not allow \C to appear in lookbehind assertions
<a href="#lookbehind">(described below)</a>
in a UTF mode, because this would make it impossible to calculate the length of
the lookbehind.
</P>
<P>
In general, the \C escape sequence is best avoided. However, one
way of using it that avoids the problem of malformed UTF characters is to use a
lookahead to check the length of the next character, as in this pattern, which
could be used with a UTF-8 string (ignore white space and line breaks):
<pre>
  (?| (?=[\x00-\x7f])(\C) |
      (?=[\x80-\x{7ff}])(\C)(\C) |
      (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
      (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
</pre>
A group that starts with (?| resets the capturing parentheses numbers in each
alternative (see
<a href="#dupsubpatternnumber">"Duplicate Subpattern Numbers"</a>
below). The assertions at the start of each branch check the next UTF-8
character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
character's individual bytes are then captured by the appropriate number of
groups.
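The branch selection in the pattern above follows the UTF-8 length boundaries. This Python sketch (hypothetical function names) picks the branch from the code point, cross-checks it against Python's own UTF-8 encoder, and returns the byte values that the capture groups would hold.

```python
def utf8_units(cp: int) -> int:
    """Number of (\\C) groups used by the branch that accepts code point cp."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    if cp <= 0x1FFFFF:
        return 4
    raise ValueError("code point outside the ranges tested by the pattern")


def captured_bytes(ch: str) -> list:
    """The byte values the capture groups would hold for one character."""
    data = ch.encode("utf-8")
    if len(data) != utf8_units(ord(ch)):
        raise AssertionError("branch selection disagrees with UTF-8 length")
    return list(data)
```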
<a name="characterclass"></a></P>
<br><a name="SEC9" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
<P>
An opening square bracket introduces a character class, terminated by a closing
square bracket. A closing square bracket on its own is not special by default.
However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square
bracket causes a compile-time error. If a closing square bracket is required as
a member of the class, it should be the first data character in the class
(after an initial circumflex, if present) or escaped with a backslash.
</P>
<P>
A character class matches a single character in the subject. In a UTF mode, the
character may be more than one data unit long. A matched character must be in
the set of characters defined by the class, unless the first character in the
class definition is a circumflex, in which case the subject character must not
be in the set defined by the class. If a circumflex is actually required as a
member of the class, ensure it is not the first character, or escape it with a
backslash.
</P>
<P>
For example, the character class [aeiou] matches any lower case vowel, while
[^aeiou] matches any character that is not a lower case vowel. Note that a
circumflex is just a convenient notation for specifying the characters that
are in the class by enumerating those that are not. A class that starts with a
circumflex is not an assertion; it still consumes a character from the subject
string, and therefore it fails if the current pointer is at the end of the
string.
</P>
<P>
In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255 (0xffff)
can be included in a class as a literal string of data units, or by using the
\x{ escaping mechanism.
</P>
<P>
When caseless matching is set, any letters in a class represent both their
upper case and lower case versions, so for example, a caseless [aeiou] matches
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
caseful version would. In a UTF mode, PCRE always understands the concept of
case for characters whose values are less than 128, so caseless matching is
always possible. For characters with higher values, the concept of case is
supported if PCRE is compiled with Unicode property support, but not otherwise.
If you want to use caseless matching in a UTF mode for characters 128 and
above, you must ensure that PCRE is compiled with Unicode property support as
well as with UTF support.
</P>
<P>
Characters that might indicate line breaks are never treated in any special way
when matching character classes, whatever line-ending sequence is in use, and
whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class
such as [^a] always matches one of these characters.
</P>
<P>
The minus (hyphen) character can be used to specify a range of characters in a
character class. For example, [d-m] matches any letter between d and m,
inclusive. If a minus character is required in a class, it must be escaped with
a backslash or appear in a position where it cannot be interpreted as
indicating a range, typically as the first or last character in the class, or
immediately after a range. For example, [b-d-z] matches letters in the range b
to d, a hyphen character, or z.
</P>
<P>
It is not possible to have the literal character "]" as the end character of a
range. A pattern such as [W-]46] is interpreted as a class of two characters
("W" and "-") followed by a literal string "46]", so it would match "W46]" or
"-46]". However, if the "]" is escaped with a backslash it is interpreted as
the end of the range, so [W-\]46] is interpreted as a class containing a range
followed by two other characters. The octal or hexadecimal representation of
"]" can also be used to end a range.
</P>
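Python's `re` module happens to treat the range examples in this section the same way, so they can be cross-checked directly. This confirms only these particular cases; Python's pattern syntax differs from PCRE's in other respects, and `RANGE_CASES` and `check` are hypothetical names.

```python
import re

# (pattern, subject, expected full match) taken from the examples above.
RANGE_CASES = [
    (r"[d-m]", "h", True),
    (r"[b-d-z]", "c", True),      # inside the range b-d
    (r"[b-d-z]", "-", True),      # a hyphen right after a range is literal
    (r"[b-d-z]", "e", False),
    (r"[W-]46]", "W46]", True),   # class [W-] followed by the literal "46]"
    (r"[W-]46]", "-46]", True),
    (r"[W-\]46]", "\\", True),    # escaped ]: range W..] includes backslash
    (r"[W-\]46]", "6", True),
]


def check(cases) -> list:
    """Return the cases where re.fullmatch disagrees with the expectation."""
    return [(p, s) for p, s, want in cases
            if (re.fullmatch(p, s) is not None) != want]
```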
<P>
An error is generated if a POSIX character class (see below) or an escape
sequence other than one that defines a single character appears at a point
where a range ending character is expected. For example, [z-\xff] is valid,
but [A-\d] and [A-[:digit:]] are not.
</P>
<P>
Ranges operate in the collating sequence of character values. They can also be
used for characters specified numerically, for example [\000-\037]. Ranges
can include any characters that are valid for the current mode.
</P>
<P>
If a range that includes letters is used when caseless matching is set, it
matches the letters in either case. For example, [W-c] is equivalent to
[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
tables for a French locale are in use, [\xc8-\xcb] matches accented E
characters in both cases. In UTF modes, PCRE supports the concept of case for
characters with values of 128 and above only when it is compiled with Unicode
property support.
</P>
<P>
The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
\V, \w, and \W may appear in a character class, and add the characters that
they match to the class. For example, [\dABCDEF] matches any hexadecimal
digit written with upper case letters. In UTF modes, the PCRE_UCP option
affects the meanings of \d, \s, \w and their upper case partners, just as it
does when they appear outside a character class, as described in the section
entitled
<a href="#genericchartypes">"Generic character types"</a>
above. The escape sequence \b has a different meaning inside a character
class; it matches the backspace character. The sequences \B, \N, \R, and \X
are not special inside a character class. Like any other unrecognized escape
sequences, they are treated as the literal characters "B", "N", "R", and "X" by
default, but cause an error if the PCRE_EXTRA option is set.
</P>
<P>
A circumflex can conveniently be used with the upper case character types to
specify a more restricted set of characters than the matching lower case type.
For example, the class [^\W_] matches any letter or digit, but not underscore,
whereas [\w] includes underscore. A positive character class should be read as
"something OR something OR ..." and a negative class as "NOT something AND NOT
something AND NOT ...".
</P>
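The caseless, negated and type-escape examples above can be cross-checked the same way with Python's `re`, using `re.ASCII` so that \w and \d keep their ASCII meaning, as in PCRE without PCRE_UCP. Agreement on these cases does not imply that the two engines agree in general; `CLASS_CASES` and `mismatches` are hypothetical names.

```python
import re

# (pattern, subject, flags, expected full match)
CLASS_CASES = [
    (r"[aeiou]", "A", re.IGNORECASE, True),
    (r"[^aeiou]", "A", re.IGNORECASE, False),  # caseless negation excludes "A"
    (r"[^aeiou]", "A", 0, True),               # the caseful version matches
    (r"[^a]", "\n", 0, True),                  # line breaks are not special
    (r"[\dABCDEF]", "E", 0, True),
    (r"[^\W_]", "7", re.ASCII, True),          # letter or digit ...
    (r"[^\W_]", "_", re.ASCII, False),         # ... but not underscore
    (r"[\w]", "_", re.ASCII, True),
    (r"x[^a]", "x", 0, False),                 # a negated class still consumes
]


def mismatches(cases) -> list:
    """Return the cases where re.fullmatch disagrees with the expectation."""
    return [(p, s) for p, s, flags, want in cases
            if (re.fullmatch(p, s, flags) is not None) != want]
```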
<P>
The only metacharacters that are recognized in character classes are backslash,
hyphen (only where it can be interpreted as specifying a range), circumflex
(only at the start), opening square bracket (only when it can be interpreted as
introducing a POSIX class name, or for a special compatibility feature - see
the next two sections), and the terminating closing square bracket. However,
escaping other non-alphanumeric characters does no harm.
</P>
<br><a name="SEC10" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
<P>
Perl supports the POSIX notation for character classes. This uses names
enclosed by [: and :] within the enclosing square brackets. PCRE also supports
this notation. For example,
<pre>
  [01[:alpha:]%]
</pre>
matches "0", "1", any alphabetic character, or "%". The supported class names
are:
<pre>
  alnum    letters and digits
  alpha    letters
  ascii    character codes 0 - 127
  blank    space or tab only
  cntrl    control characters
  digit    decimal digits (same as \d)
  graph    printing characters, excluding space
  lower    lower case letters
  print    printing characters, including space
  punct    printing characters, excluding letters and digits and space
  space    white space (the same as \s from PCRE 8.34)
  upper    upper case letters
  word     "word" characters (same as \w)
  xdigit   hexadecimal digits
</pre>
The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
and space (32). If locale-specific matching is taking place, the list of space
characters may be different; there may be fewer or more of them. "Space" used
to be different from \s, which did not include VT, for Perl compatibility.
However, Perl changed at release 5.18, and PCRE followed at release 8.34.
"Space" and \s now match the same set of characters.
</P>
1397
+ <P>
1398
+ The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
1399
+ 5.8. Another Perl extension is negation, which is indicated by a ^ character
1400
+ after the colon. For example,
1401
+ <pre>
1402
+ [12[:^digit:]]
1403
+ </pre>
1404
+ matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
1405
+ syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
1406
+ supported, and an error is given if they are encountered.
1407
+ </P>
1408
+ <P>
1409
+ By default, characters with values greater than 127 do not match any of the
1410
+ POSIX character classes. However, if the PCRE_UCP option is passed to
1411
+ <b>pcre_compile()</b>, some of the classes are changed so that Unicode character
1412
+ properties are used. This is achieved by replacing certain POSIX classes by
1413
+ other sequences, as follows:
1414
+ <pre>
1415
+ [:alnum:] becomes \p{Xan}
1416
+ [:alpha:] becomes \p{L}
1417
+ [:blank:] becomes \h
1418
+ [:digit:] becomes \p{Nd}
1419
+ [:lower:] becomes \p{Ll}
1420
+ [:space:] becomes \p{Xps}
1421
+ [:upper:] becomes \p{Lu}
1422
+ [:word:] becomes \p{Xwd}
1423
+ </pre>
1424
+ Negated versions, such as [:^alpha:], use \P instead of \p. Three other POSIX
1425
+ classes are handled specially in UCP mode:
1426
+ </P>
1427
+ <P>
1428
+ [:graph:]
1429
+ This matches characters that have glyphs that mark the page when printed. In
1430
+ Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf
1431
+ properties, except for:
1432
+ <pre>
1433
+ U+061C Arabic Letter Mark
1434
+ U+180E Mongolian Vowel Separator
1435
+ U+2066 - U+2069 Various "isolate"s
1436
+
1437
+ </PRE>
1438
+ </P>
1439
+ <P>
1440
+ [:print:]
1441
+ This matches the same characters as [:graph:] plus space characters that are
1442
+ not controls, that is, characters with the Zs property.
1443
+ </P>
1444
+ <P>
1445
+ [:punct:]
1446
+ This matches all characters that have the Unicode P (punctuation) property,
1447
+ plus those characters whose code points are less than 128 that have the S
1448
+ (Symbol) property.
1449
+ </P>
1450
+ <P>
1451
+ The other POSIX classes are unchanged, and match only characters with code
1452
+ points less than 128.
1453
+ </P>
1454
+ <br><a name="SEC11" href="#TOC1">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a><br>
1455
+ <P>
1456
+ In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly
1457
+ syntax [[:&#60;:]] and [[:&#62;:]] is used for matching "start of word" and "end of
1458
+ word". PCRE treats these items as follows:
1459
+ <pre>
1460
+ [[:&#60;:]] is converted to \b(?=\w)
1461
+ [[:&#62;:]] is converted to \b(?&#60;=\w)
1462
+ </pre>
1463
+ Only these exact character sequences are recognized. A sequence such as
1464
+ [a[:&#60;:]b] provokes an error for an unrecognized POSIX class name. This support is
1465
+ not compatible with Perl. It is provided to help migrations from other
1466
+ environments, and is best not used in any new patterns. Note that \b matches
1467
+ at the start and the end of a word (see
1468
+ <a href="#smallassertions">"Simple assertions"</a>
1469
+ above), and in a Perl-style pattern the preceding or following character
1470
+ normally shows which is wanted, without the need for the assertions that are
1471
+ used above in order to give exactly the POSIX behaviour.
1472
+ </P>
1473
+ <br><a name="SEC12" href="#TOC1">VERTICAL BAR</a><br>
1474
+ <P>
1475
+ Vertical bar characters are used to separate alternative patterns. For example,
1476
+ the pattern
1477
+ <pre>
1478
+ gilbert|sullivan
1479
+ </pre>
1480
+ matches either "gilbert" or "sullivan". Any number of alternatives may appear,
1481
+ and an empty alternative is permitted (matching the empty string). The matching
1482
+ process tries each alternative in turn, from left to right, and the first one
1483
+ that succeeds is used. If the alternatives are within a subpattern
1484
+ <a href="#subpattern">(defined below),</a>
1485
+ "succeeds" means matching the rest of the main pattern as well as the
1486
+ alternative in the subpattern.
1487
+ </P>
1488
+ <br><a name="SEC13" href="#TOC1">INTERNAL OPTION SETTING</a><br>
1489
+ <P>
1490
+ The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
1491
+ PCRE_EXTENDED options (which are Perl-compatible) can be changed from within
1492
+ the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
1493
+ The option letters are
1494
+ <pre>
1495
+ i for PCRE_CASELESS
1496
+ m for PCRE_MULTILINE
1497
+ s for PCRE_DOTALL
1498
+ x for PCRE_EXTENDED
1499
+ </pre>
1500
+ For example, (?im) sets caseless, multiline matching. It is also possible to
1501
+ unset these options by preceding the letter with a hyphen, and a combined
1502
+ setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and
1503
+ PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also
1504
+ permitted. If a letter appears both before and after the hyphen, the option is
1505
+ unset.
1506
+ </P>
1507
+ <P>
1508
+ The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be
1509
+ changed in the same way as the Perl-compatible options by using the characters
1510
+ J, U and X respectively.
1511
+ </P>
1512
+ <P>
1513
+ When one of these option changes occurs at top level (that is, not inside
1514
+ subpattern parentheses), the change applies to the remainder of the pattern
1515
+ that follows. If the change is placed right at the start of a pattern, PCRE
1516
+ extracts it into the global options (and it will therefore show up in data
1517
+ extracted by the <b>pcre_fullinfo()</b> function).
1518
+ </P>
1519
+ <P>
1520
+ An option change within a subpattern (see below for a description of
1521
+ subpatterns) affects only that part of the subpattern that follows it, so
1522
+ <pre>
1523
+ (a(?i)b)c
1524
+ </pre>
1525
+ matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used).
1526
+ By this means, options can be made to have different settings in different
1527
+ parts of the pattern. Any changes made in one alternative do carry on
1528
+ into subsequent branches within the same subpattern. For example,
1529
+ <pre>
1530
+ (a(?i)b|c)
1531
+ </pre>
1532
+ matches "ab", "aB", "c", and "C", even though when matching "C" the first
1533
+ branch is abandoned before the option setting. This is because the effects of
1534
+ option settings happen at compile time. There would be some very weird
1535
+ behaviour otherwise.
1536
+ </P>
1537
+ <P>
1538
+ <b>Note:</b> There are other PCRE-specific options that can be set by the
1539
+ application when the compiling or matching functions are called. In some cases
1540
+ the pattern can contain special leading sequences such as (*CRLF) to override
1541
+ what the application has set or what has been defaulted. Details are given in
1542
+ the section entitled
1543
+ <a href="#newlineseq">"Newline sequences"</a>
1544
+ above. There are also the (*UTF8), (*UTF16), (*UTF32), and (*UCP) leading
1545
+ sequences that can be used to set UTF and Unicode property modes; they are
1546
+ equivalent to setting the PCRE_UTF8, PCRE_UTF16, PCRE_UTF32 and the PCRE_UCP
1547
+ options, respectively. The (*UTF) sequence is a generic version that can be
1548
+ used with any of the libraries. However, the application can set the
1549
+ PCRE_NEVER_UTF option, which locks out the use of the (*UTF) sequences.
1550
+ <a name="subpattern"></a></P>
1551
+ <br><a name="SEC14" href="#TOC1">SUBPATTERNS</a><br>
1552
+ <P>
1553
+ Subpatterns are delimited by parentheses (round brackets), which can be nested.
1554
+ Turning part of a pattern into a subpattern does two things:
1555
+ <br>
1556
+ <br>
1557
+ 1. It localizes a set of alternatives. For example, the pattern
1558
+ <pre>
1559
+ cat(aract|erpillar|)
1560
+ </pre>
1561
+ matches "cataract", "caterpillar", or "cat". Without the parentheses, it would
1562
+ match "cataract", "erpillar" or an empty string.
1563
+ <br>
1564
+ <br>
1565
+ 2. It sets up the subpattern as a capturing subpattern. This means that, when
1566
+ the whole pattern matches, that portion of the subject string that matched the
1567
+ subpattern is passed back to the caller via the <i>ovector</i> argument of the
1568
+ matching function. (This applies only to the traditional matching functions;
1569
+ the DFA matching functions do not support capturing.)
1570
+ </P>
1571
+ <P>
1572
+ Opening parentheses are counted from left to right (starting from 1) to obtain
1573
+ numbers for the capturing subpatterns. For example, if the string "the red
1574
+ king" is matched against the pattern
1575
+ <pre>
1576
+ the ((red|white) (king|queen))
1577
+ </pre>
1578
+ the captured substrings are "red king", "red", and "king", and are numbered 1,
1579
+ 2, and 3, respectively.
1580
+ </P>
1581
+ <P>
1582
+ The fact that plain parentheses fulfil two functions is not always helpful.
1583
+ There are often times when a grouping subpattern is required without a
1584
+ capturing requirement. If an opening parenthesis is followed by a question mark
1585
+ and a colon, the subpattern does not do any capturing, and is not counted when
1586
+ computing the number of any subsequent capturing subpatterns. For example, if
1587
+ the string "the white queen" is matched against the pattern
1588
+ <pre>
1589
+ the ((?:red|white) (king|queen))
1590
+ </pre>
1591
+ the captured substrings are "white queen" and "queen", and are numbered 1 and
1592
+ 2. The maximum number of capturing subpatterns is 65535.
1593
+ </P>
1594
+ <P>
1595
+ As a convenient shorthand, if any option settings are required at the start of
1596
+ a non-capturing subpattern, the option letters may appear between the "?" and
1597
+ the ":". Thus the two patterns
1598
+ <pre>
1599
+ (?i:saturday|sunday)
1600
+ (?:(?i)saturday|sunday)
1601
+ </pre>
1602
+ match exactly the same set of strings. Because alternative branches are tried
1603
+ from left to right, and options are not reset until the end of the subpattern
1604
+ is reached, an option setting in one branch does affect subsequent branches, so
1605
+ the above patterns match "SUNDAY" as well as "Saturday".
1606
+ <a name="dupsubpatternnumber"></a></P>
1607
+ <br><a name="SEC15" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
1608
+ <P>
1609
+ Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
1610
+ the same numbers for its capturing parentheses. Such a subpattern starts with
1611
+ (?| and is itself a non-capturing subpattern. For example, consider this
1612
+ pattern:
1613
+ <pre>
1614
+ (?|(Sat)ur|(Sun))day
1615
+ </pre>
1616
+ Because the two alternatives are inside a (?| group, both sets of capturing
1617
+ parentheses are numbered one. Thus, when the pattern matches, you can look
1618
+ at captured substring number one, whichever alternative matched. This construct
1619
+ is useful when you want to capture part, but not all, of one of a number of
1620
+ alternatives. Inside a (?| group, parentheses are numbered as usual, but the
1621
+ number is reset at the start of each branch. The numbers of any capturing
1622
+ parentheses that follow the subpattern start after the highest number used in
1623
+ any branch. The following example is taken from the Perl documentation. The
1624
+ numbers underneath show in which buffer the captured content will be stored.
1625
+ <pre>
1626
+   # before  ---------------branch-reset----------- after
+   / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+   # 1            2         2  3        2     3     4
1629
+ </pre>
1630
+ A back reference to a numbered subpattern uses the most recent value that is
1631
+ set for that number by any subpattern. The following pattern matches "abcabc"
1632
+ or "defdef":
1633
+ <pre>
1634
+ /(?|(abc)|(def))\1/
1635
+ </pre>
1636
+ In contrast, a subroutine call to a numbered subpattern always refers to the
1637
+ first one in the pattern with the given number. The following pattern matches
1638
+ "abcabc" or "defabc":
1639
+ <pre>
1640
+ /(?|(abc)|(def))(?1)/
1641
+ </pre>
1642
+ If a
1643
+ <a href="#conditions">condition test</a>
1644
+ for a subpattern's having matched refers to a non-unique number, the test is
1645
+ true if any of the subpatterns of that number have matched.
1646
+ </P>
1647
+ <P>
1648
+ An alternative approach to using this "branch reset" feature is to use
1649
+ duplicate named subpatterns, as described in the next section.
1650
+ </P>
1651
+ <br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
1652
+ <P>
1653
+ Identifying capturing parentheses by number is simple, but it can be very hard
1654
+ to keep track of the numbers in complicated regular expressions. Furthermore,
1655
+ if an expression is modified, the numbers may change. To help with this
1656
+ difficulty, PCRE supports the naming of subpatterns. This feature was not
1657
+ added to Perl until release 5.10. Python had the feature earlier, and PCRE
1658
+ introduced it at release 4.0, using the Python syntax. PCRE now supports both
1659
+ the Perl and the Python syntax. Perl allows identically numbered subpatterns to
1660
+ have different names, but PCRE does not.
1661
+ </P>
1662
+ <P>
1663
+ In PCRE, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
1664
+ (?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
1665
+ parentheses from other parts of the pattern, such as
1666
+ <a href="#backreferences">back references,</a>
1667
+ <a href="#recursion">recursion,</a>
1668
+ and
1669
+ <a href="#conditions">conditions,</a>
1670
+ can be made by name as well as by number.
1671
+ </P>
1672
+ <P>
1673
+ Names consist of up to 32 alphanumeric characters and underscores, but must
1674
+ start with a non-digit. Named capturing parentheses are still allocated numbers
1675
+ as well as names, exactly as if the names were not present. The PCRE API
1676
+ provides function calls for extracting the name-to-number translation table
1677
+ from a compiled pattern. There is also a convenience function for extracting a
1678
+ captured substring by name.
1679
+ </P>
1680
+ <P>
1681
+ By default, a name must be unique within a pattern, but it is possible to relax
1682
+ this constraint by setting the PCRE_DUPNAMES option at compile time. (Duplicate
1683
+ names are also always permitted for subpatterns with the same number, set up as
1684
+ described in the previous section.) Duplicate names can be useful for patterns
1685
+ where only one instance of the named parentheses can match. Suppose you want to
1686
+ match the name of a weekday, either as a 3-letter abbreviation or as the full
1687
+ name, and in both cases you want to extract the abbreviation. This pattern
1688
+ (ignoring the line breaks) does the job:
1689
+ <pre>
1690
+ (?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
1691
+ (?&#60;DN&#62;Tue)(?:sday)?|
1692
+ (?&#60;DN&#62;Wed)(?:nesday)?|
1693
+ (?&#60;DN&#62;Thu)(?:rsday)?|
1694
+ (?&#60;DN&#62;Sat)(?:urday)?
1695
+ </pre>
1696
+ There are five capturing substrings, but only one is ever set after a match.
1697
+ (An alternative way of solving this problem is to use a "branch reset"
1698
+ subpattern, as described in the previous section.)
1699
+ </P>
1700
+ <P>
1701
+ The convenience function for extracting the data by name returns the substring
1702
+ for the first (and in this example, the only) subpattern of that name that
1703
+ matched. This saves searching to find which numbered subpattern it was.
1704
+ </P>
1705
+ <P>
1706
+ If you make a back reference to a non-unique named subpattern from elsewhere in
1707
+ the pattern, the subpatterns to which the name refers are checked in the order
1708
+ in which they appear in the overall pattern. The first one that is set is used
1709
+ for the reference. For example, this pattern matches both "foofoo" and
1710
+ "barbar" but not "foobar" or "barfoo":
1711
+ <pre>
1712
+ (?:(?&#60;n&#62;foo)|(?&#60;n&#62;bar))\k&#60;n&#62;
1713
+
1714
+ </PRE>
1715
+ </P>
1716
+ <P>
1717
+ If you make a subroutine call to a non-unique named subpattern, the one that
1718
+ corresponds to the first occurrence of the name is used. In the absence of
1719
+ duplicate numbers (see the previous section) this is the one with the lowest
1720
+ number.
1721
+ </P>
1722
+ <P>
1723
+ If you use a named reference in a condition
1724
+ test (see the
1725
+ <a href="#conditions">section about conditions</a>
1726
+ below), either to check whether a subpattern has matched, or to check for
1727
+ recursion, all subpatterns with the same name are tested. If the condition is
1728
+ true for any one of them, the overall condition is true. This is the same
1729
+ behaviour as testing by number. For further details of the interfaces for
1730
+ handling named subpatterns, see the
1731
+ <a href="pcreapi.html"><b>pcreapi</b></a>
1732
+ documentation.
1733
+ </P>
1734
+ <P>
1735
+ <b>Warning:</b> You cannot use different names to distinguish between two
1736
+ subpatterns with the same number because PCRE uses only the numbers when
1737
+ matching. For this reason, an error is given at compile time if different names
1738
+ are given to subpatterns with the same number. However, you can always give the
1739
+ same name to subpatterns with the same number, even when PCRE_DUPNAMES is not
1740
+ set.
1741
+ </P>
1742
+ <br><a name="SEC17" href="#TOC1">REPETITION</a><br>
1743
+ <P>
1744
+ Repetition is specified by quantifiers, which can follow any of the following
1745
+ items:
1746
+ <pre>
1747
+ a literal data character
1748
+ the dot metacharacter
1749
+ the \C escape sequence
1750
+ the \X escape sequence
1751
+ the \R escape sequence
1752
+ an escape such as \d or \pL that matches a single character
1753
+ a character class
1754
+ a back reference (see next section)
1755
+ a parenthesized subpattern (including assertions)
1756
+ a subroutine call to a subpattern (recursive or otherwise)
1757
+ </pre>
1758
+ The general repetition quantifier specifies a minimum and maximum number of
1759
+ permitted matches, by giving the two numbers in curly brackets (braces),
1760
+ separated by a comma. The numbers must be less than 65536, and the first must
1761
+ be less than or equal to the second. For example:
1762
+ <pre>
1763
+ z{2,4}
1764
+ </pre>
1765
+ matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special
1766
+ character. If the second number is omitted, but the comma is present, there is
1767
+ no upper limit; if the second number and the comma are both omitted, the
1768
+ quantifier specifies an exact number of required matches. Thus
1769
+ <pre>
1770
+ [aeiou]{3,}
1771
+ </pre>
1772
+ matches at least 3 successive vowels, but may match many more, while
1773
+ <pre>
1774
+ \d{8}
1775
+ </pre>
1776
+ matches exactly 8 digits. An opening curly bracket that appears in a position
1777
+ where a quantifier is not allowed, or one that does not match the syntax of a
1778
+ quantifier, is taken as a literal character. For example, {,6} is not a
1779
+ quantifier, but a literal string of four characters.
1780
+ </P>
1781
+ <P>
1782
+ In UTF modes, quantifiers apply to characters rather than to individual data
1783
+ units. Thus, for example, \x{100}{2} matches two characters, each of
1784
+ which is represented by a two-byte sequence in a UTF-8 string. Similarly,
1785
+ \X{3} matches three Unicode extended grapheme clusters, each of which may be
1786
+ several data units long (and they may be of different lengths).
1787
+ </P>
1788
+ <P>
1789
+ The quantifier {0} is permitted, causing the expression to behave as if the
1790
+ previous item and the quantifier were not present. This may be useful for
1791
+ subpatterns that are referenced as
1792
+ <a href="#subpatternsassubroutines">subroutines</a>
1793
+ from elsewhere in the pattern (but see also the section entitled
1794
+ <a href="#subdefine">"Defining subpatterns for use by reference only"</a>
1795
+ below). Items other than subpatterns that have a {0} quantifier are omitted
1796
+ from the compiled pattern.
1797
+ </P>
1798
+ <P>
1799
+ For convenience, the three most common quantifiers have single-character
1800
+ abbreviations:
1801
+ <pre>
1802
+ * is equivalent to {0,}
1803
+ + is equivalent to {1,}
1804
+ ? is equivalent to {0,1}
1805
+ </pre>
1806
+ It is possible to construct infinite loops by following a subpattern that can
1807
+ match no characters with a quantifier that has no upper limit, for example:
1808
+ <pre>
1809
+ (a?)*
1810
+ </pre>
1811
+ Earlier versions of Perl and PCRE used to give an error at compile time for
1812
+ such patterns. However, because there are cases where this can be useful, such
1813
+ patterns are now accepted, but if any repetition of the subpattern does in fact
1814
+ match no characters, the loop is forcibly broken.
1815
+ </P>
1816
+ <P>
1817
+ By default, the quantifiers are "greedy", that is, they match as much as
1818
+ possible (up to the maximum number of permitted times), without causing the
1819
+ rest of the pattern to fail. The classic example of where this gives problems
1820
+ is in trying to match comments in C programs. These appear between /* and */
1821
+ and within the comment, individual * and / characters may appear. An attempt to
1822
+ match C comments by applying the pattern
1823
+ <pre>
1824
+ /\*.*\*/
1825
+ </pre>
1826
+ to the string
1827
+ <pre>
1828
+ /* first comment */ not comment /* second comment */
1829
+ </pre>
1830
+ fails, because it matches the entire string owing to the greediness of the .*
1831
+ item.
1832
+ </P>
1833
+ <P>
1834
+ However, if a quantifier is followed by a question mark, it ceases to be
1835
+ greedy, and instead matches the minimum number of times possible, so the
1836
+ pattern
1837
+ <pre>
1838
+ /\*.*?\*/
1839
+ </pre>
1840
+ does the right thing with the C comments. The meaning of the various
1841
+ quantifiers is not otherwise changed, just the preferred number of matches.
1842
+ Do not confuse this use of question mark with its use as a quantifier in its
1843
+ own right. Because it has two uses, it can sometimes appear doubled, as in
1844
+ <pre>
1845
+ \d??\d
1846
+ </pre>
1847
+ which matches one digit by preference, but can match two if that is the only
1848
+ way the rest of the pattern matches.
1849
+ </P>
1850
+ <P>
1851
+ If the PCRE_UNGREEDY option is set (an option that is not available in Perl),
1852
+ the quantifiers are not greedy by default, but individual ones can be made
1853
+ greedy by following them with a question mark. In other words, it inverts the
1854
+ default behaviour.
1855
+ </P>
1856
+ <P>
1857
+ When a parenthesized subpattern is quantified with a minimum repeat count that
1858
+ is greater than 1 or with a limited maximum, more memory is required for the
1859
+ compiled pattern, in proportion to the size of the minimum or maximum.
1860
+ </P>
1861
+ <P>
1862
+ If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
1863
+ to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is
1864
+ implicitly anchored, because whatever follows will be tried against every
1865
+ character position in the subject string, so there is no point in retrying the
1866
+ overall match at any position after the first. PCRE normally treats such a
1867
+ pattern as though it were preceded by \A.
1868
+ </P>
1869
+ <P>
1870
+ In cases where it is known that the subject string contains no newlines, it is
1871
+ worth setting PCRE_DOTALL in order to obtain this optimization, or
1872
+ alternatively using ^ to indicate anchoring explicitly.
1873
+ </P>
1874
+ <P>
1875
+ However, there are some cases where the optimization cannot be used. When .*
1876
+ is inside capturing parentheses that are the subject of a back reference
1877
+ elsewhere in the pattern, a match at the start may fail where a later one
1878
+ succeeds. Consider, for example:
1879
+ <pre>
1880
+ (.*)abc\1
1881
+ </pre>
1882
+ If the subject is "xyz123abc123" the match point is the fourth character. For
1883
+ this reason, such a pattern is not implicitly anchored.
1884
+ </P>
1885
+ <P>
1886
+ Another case where implicit anchoring is not applied is when the leading .* is
1887
+ inside an atomic group. Once again, a match at the start may fail where a later
1888
+ one succeeds. Consider this pattern:
1889
+ <pre>
1890
+ (?&#62;.*?a)b
1891
+ </pre>
1892
+ It matches "ab" in the subject "aab". The use of the backtracking control verbs
1893
+ (*PRUNE) and (*SKIP) also disables this optimization.
1894
+ </P>
1895
+ <P>
1896
+ When a capturing subpattern is repeated, the value captured is the substring
1897
+ that matched the final iteration. For example, after
1898
+ <pre>
1899
+ (tweedle[dume]{3}\s*)+
1900
+ </pre>
1901
+ has matched "tweedledum tweedledee" the value of the captured substring is
1902
+ "tweedledee". However, if there are nested capturing subpatterns, the
1903
+ corresponding captured values may have been set in previous iterations. For
1904
+ example, after
1905
+ <pre>
1906
+ /(a|(b))+/
1907
+ </pre>
1908
+ matches "aba" the value of the second captured substring is "b".
1909
+ <a name="atomicgroup"></a></P>
1910
+ <br><a name="SEC18" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
1911
+ <P>
1912
+ With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
1913
+ repetition, failure of what follows normally causes the repeated item to be
1914
+ re-evaluated to see if a different number of repeats allows the rest of the
1915
+ pattern to match. Sometimes it is useful to prevent this, either to change the
1916
+ nature of the match, or to cause it to fail earlier than it otherwise might, when
1917
+ the author of the pattern knows there is no point in carrying on.
1918
+ </P>
1919
+ <P>
1920
+ Consider, for example, the pattern \d+foo when applied to the subject line
1921
+ <pre>
1922
+ 123456bar
1923
+ </pre>
1924
+ After matching all 6 digits and then failing to match "foo", the normal
1925
+ action of the matcher is to try again with only 5 digits matching the \d+
1926
+ item, and then with 4, and so on, before ultimately failing. "Atomic grouping"
1927
+ (a term taken from Jeffrey Friedl's book) provides the means for specifying
1928
+ that once a subpattern has matched, it is not to be re-evaluated in this way.
1929
+ </P>
1930
+ <P>
1931
+ If we use atomic grouping for the previous example, the matcher gives up
1932
+ immediately on failing to match "foo" the first time. The notation is a kind of
1933
+ special parenthesis, starting with (?&#62; as in this example:
1934
+ <pre>
1935
+ (?&#62;\d+)foo
1936
+ </pre>
1937
+ This kind of parenthesis "locks up" the part of the pattern it contains once
1938
+ it has matched, and a failure further into the pattern is prevented from
1939
+ backtracking into it. Backtracking past it to previous items, however, works as
1940
+ normal.
1941
+ </P>
1942
+ <P>
1943
+ An alternative description is that a subpattern of this type matches the string
1944
+ of characters that an identical standalone pattern would match, if anchored at
1945
+ the current point in the subject string.
1946
+ </P>
1947
+ <P>
1948
+ Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as
1949
+ the above example can be thought of as a maximizing repeat that must swallow
1950
+ everything it can. So, while both \d+ and \d+? are prepared to adjust the
1951
+ number of digits they match in order to make the rest of the pattern match,
1952
+ (?&#62;\d+) can only match an entire sequence of digits.
1953
+ </P>
1954
+ <P>
1955
+ Atomic groups in general can of course contain arbitrarily complicated
1956
+ subpatterns, and can be nested. However, when the subpattern for an atomic
1957
+ group is just a single repeated item, as in the example above, a simpler
1958
+ notation, called a "possessive quantifier", can be used. This consists of an
1959
+ additional + character following a quantifier. Using this notation, the
1960
+ previous example can be rewritten as
1961
+ <pre>
1962
+ \d++foo
1963
+ </pre>
1964
+ Note that a possessive quantifier can be used with an entire group, for
1965
+ example:
1966
+ <pre>
1967
+ (abc|xyz){2,3}+
1968
+ </pre>
1969
+ Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY
1970
+ option is ignored. They are a convenient notation for the simpler forms of
1971
+ atomic group. However, there is no difference in the meaning of a possessive
1972
+ quantifier and the equivalent atomic group, though there may be a performance
1973
+ difference; possessive quantifiers should be slightly faster.
1974
+ </P>
1975
+ <P>
1976
+ The possessive quantifier syntax is an extension to the Perl 5.8 syntax.
1977
+ Jeffrey Friedl originated the idea (and the name) in the first edition of his
1978
+ book. Mike McCloskey liked it, so he implemented it when he built Sun's Java
1979
+ package, and PCRE copied it from there. It ultimately found its way into Perl
1980
+ at release 5.10.
1981
+ </P>
1982
+ <P>
1983
+ PCRE has an optimization that automatically "possessifies" certain simple
1984
+ pattern constructs. For example, the sequence A+B is treated as A++B because
1985
+ there is no point in backtracking into a sequence of A's when B must follow.
1986
+ </P>
1987
+ <P>
1988
+ When a pattern contains an unlimited repeat inside a subpattern that can itself
1989
+ be repeated an unlimited number of times, the use of an atomic group is the
1990
+ only way to avoid some failing matches taking a very long time indeed. The
1991
+ pattern
1992
+ <pre>
1993
+ (\D+|&#60;\d+&#62;)*[!?]
1994
+ </pre>
1995
+ matches an unlimited number of substrings that either consist of non-digits, or
1996
+ digits enclosed in &#60;&#62;, followed by either ! or ?. When it matches, it runs
1997
+ quickly. However, if it is applied to
1998
+ <pre>
1999
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2000
+ </pre>
2001
+ it takes a long time before reporting failure. This is because the string can
2002
+ be divided between the internal \D+ repeat and the external * repeat in a
2003
+ large number of ways, and all have to be tried. (The example uses [!?] rather
2004
+ than a single character at the end, because both PCRE and Perl have an
2005
+ optimization that allows for fast failure when a single character is used. They
2006
+ remember the last single character that is required for a match, and fail early
2007
+ if it is not present in the string.) If the pattern is changed so that it uses
2008
+ an atomic group, like this:
2009
+ <pre>
2010
+ ((?&#62;\D+)|&#60;\d+&#62;)*[!?]
2011
+ </pre>
2012
+ sequences of non-digits cannot be broken, and failure happens quickly.
2013
+ <a name="backreferences"></a></P>
2014
+ <br><a name="SEC19" href="#TOC1">BACK REFERENCES</a><br>
2015
+ <P>
2016
+ Outside a character class, a backslash followed by a digit greater than 0 (and
2017
+ possibly further digits) is a back reference to a capturing subpattern earlier
2018
+ (that is, to its left) in the pattern, provided there have been that many
2019
+ previous capturing left parentheses.
2020
+ </P>
2021
+ <P>
2022
+ However, if the decimal number following the backslash is less than 10, it is
2023
+ always taken as a back reference, and causes an error only if there are not
2024
+ that many capturing left parentheses in the entire pattern. In other words, the
2025
+ parentheses that are referenced need not be to the left of the reference for
2026
+ numbers less than 10. A "forward back reference" of this type can make sense
2027
+ when a repetition is involved and the subpattern to the right has participated
2028
+ in an earlier iteration.
2029
+ </P>
2030
+ <P>
2031
+ It is not possible to have a numerical "forward back reference" to a subpattern
2032
+ whose number is 10 or more using this syntax because a sequence such as \50 is
2033
+ interpreted as a character defined in octal. See the subsection entitled
2034
+ "Non-printing characters"
2035
+ <a href="#digitsafterbackslash">above</a>
2036
+ for further details of the handling of digits following a backslash. There is
2037
+ no such problem when named parentheses are used. A back reference to any
2038
+ subpattern is possible using named parentheses (see below).
2039
+ </P>
2040
+ <P>
2041
+ Another way of avoiding the ambiguity inherent in the use of digits following a
2042
+ backslash is to use the \g escape sequence. This escape must be followed by an
2043
+ unsigned number or a negative number, optionally enclosed in braces. These
2044
+ examples are all identical:
2045
+ <pre>
2046
+ (ring), \1
2047
+ (ring), \g1
2048
+ (ring), \g{1}
2049
+ </pre>
2050
+ An unsigned number specifies an absolute reference without the ambiguity that
2051
+ is present in the older syntax. It is also useful when literal digits follow
2052
+ the reference. A negative number is a relative reference. Consider this
2053
+ example:
2054
+ <pre>
2055
+ (abc(def)ghi)\g{-1}
2056
+ </pre>
2057
+ The sequence \g{-1} is a reference to the most recently started capturing
2058
+ subpattern before \g, that is, it is equivalent to \2 in this example.
2059
+ Similarly, \g{-2} would be equivalent to \1. The use of relative references
2060
+ can be helpful in long patterns, and also in patterns that are created by
2061
+ joining together fragments that contain references within themselves.
2062
+ </P>
2063
+ <P>
2064
+ A back reference matches whatever actually matched the capturing subpattern in
2065
+ the current subject string, rather than anything matching the subpattern
2066
+ itself (see
2067
+ <a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
2068
+ below for a way of doing that). So the pattern
2069
+ <pre>
2070
+ (sens|respons)e and \1ibility
2071
+ </pre>
2072
+ matches "sense and sensibility" and "response and responsibility", but not
2073
+ "sense and responsibility". If caseful matching is in force at the time of the
2074
+ back reference, the case of letters is relevant. For example,
2075
+ <pre>
2076
+ ((?i)rah)\s+\1
2077
+ </pre>
2078
+ matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
2079
+ capturing subpattern is matched caselessly.
2080
+ </P>
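+ <P>
+ Both examples can be tried with Python's <b>re</b> module, used here as a
+ stand-in for PCRE. Python expects a caseless setting that covers only part of a
+ pattern to be written as (?i:...), so the second pattern below is an assumed
+ translation of ((?i)rah)\s+\1 rather than the exact PCRE syntax.
+ </P>

```python
import re

# A back reference repeats the text that group 1 actually matched.
book = re.compile(r"(sens|respons)e and \1ibility")
assert book.fullmatch("sense and sensibility")
assert book.fullmatch("response and responsibility")
assert book.fullmatch("sense and responsibility") is None

# The caseless setting is scoped to the group; \1 lies outside that scope,
# so it is matched casefully against whatever group 1 captured.
rah = re.compile(r"((?i:rah))\s+\1")
assert rah.fullmatch("rah rah")
assert rah.fullmatch("RAH RAH")
assert rah.fullmatch("RAH rah") is None
```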
2081
+ <P>
2082
+ There are several different ways of writing back references to named
2083
+ subpatterns. The .NET syntax \k{name} and the Perl syntax \k&#60;name&#62; or
2084
+ \k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
2085
+ back reference syntax, in which \g can be used for both numeric and named
2086
+ references, is also supported. We could rewrite the above example in any of
2087
+ the following ways:
2088
+ <pre>
2089
+ (?&#60;p1&#62;(?i)rah)\s+\k&#60;p1&#62;
2090
+ (?'p1'(?i)rah)\s+\k{p1}
2091
+ (?P&#60;p1&#62;(?i)rah)\s+(?P=p1)
2092
+ (?&#60;p1&#62;(?i)rah)\s+\g{p1}
2093
+ </pre>
2094
+ A subpattern that is referenced by name may appear in the pattern before or
2095
+ after the reference.
2096
+ </P>
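+ <P>
+ Of the named-reference forms listed above, Python's <b>re</b> module (used again
+ as a stand-in for PCRE) accepts only the Python syntax. This sketch therefore shows
+ just that variant.
+ </P>

```python
import re

# Python syntax: define with (?P<name>...), refer back with (?P=name).
named = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

m = named.fullmatch("RAH RAH")
assert m is not None and m.group("p1") == "RAH"
assert named.fullmatch("RAH rah") is None
```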
2097
+ <P>
2098
+ There may be more than one back reference to the same subpattern. If a
2099
+ subpattern has not actually been used in a particular match, any back
2100
+ references to it always fail by default. For example, the pattern
2101
+ <pre>
2102
+ (a|(bc))\2
2103
+ </pre>
2104
+ always fails if it starts to match "a" rather than "bc". However, if the
2105
+ PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an
2106
+ unset value matches an empty string.
2107
+ </P>
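+ <P>
+ Python's <b>re</b> module, used here as a stand-in for PCRE, shows the default
+ behaviour: a reference to an unset group fails. Python has no equivalent of
+ PCRE_JAVASCRIPT_COMPAT. The second pattern in this sketch is therefore an assumed
+ workaround that uses a conditional subpattern (described below) to let the unset
+ reference match the empty string explicitly.
+ </P>

```python
import re

pat = re.compile(r"(a|(bc))\2")
assert pat.fullmatch("bcbc")
# Group 2 is unset when the first alternative matched, so \2 fails.
assert pat.fullmatch("a") is None
assert pat.fullmatch("aa") is None

# Only require \2 when group 2 is set; otherwise match nothing.
js_like = re.compile(r"(a|(bc))(?(2)\2)")
assert js_like.fullmatch("a")
assert js_like.fullmatch("bcbc")
```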
2108
+ <P>
2109
+ Because there may be many capturing parentheses in a pattern, all digits
2110
+ following a backslash are taken as part of a potential back reference number.
2111
+ If the pattern continues with a digit character, some delimiter must be used to
2112
+ terminate the back reference. If the PCRE_EXTENDED option is set, this can be
2113
+ white space. Otherwise, the \g{ syntax or an empty comment (see
2114
+ <a href="#comments">"Comments"</a>
2115
+ below) can be used.
2116
+ </P>
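+ <P>
+ Here is a sketch of the same problem in Python's <b>re</b> module, used as a stand-in
+ for PCRE. Python reads \10 as a reference to group 10 and rejects the pattern
+ when there is no such group. PCRE's own treatment of such digits is described in
+ "Non-printing characters". In Python, an empty comment ends the reference number,
+ and so does white space under re.VERBOSE, Python's counterpart of PCRE_EXTENDED.
+ </P>

```python
import re

# \10 is taken as a reference to group 10, which does not exist here.
try:
    re.compile(r"(a)\10")
    ambiguous_ok = True
except re.error:
    ambiguous_ok = False
assert not ambiguous_ok

# An empty comment ends the reference number, leaving "0" as a literal.
with_comment = re.compile(r"(a)\1(?#)0")
assert with_comment.fullmatch("aa0")

# With re.VERBOSE, white space ends the number instead.
with_space = re.compile(r"(a)\1 0", re.VERBOSE)
assert with_space.fullmatch("aa0")
```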
2117
+ <br><b>
2118
+ Recursive back references
2119
+ </b><br>
2120
+ <P>
2121
+ A back reference that occurs inside the parentheses to which it refers fails
2122
+ when the subpattern is first used, so, for example, (a\1) never matches.
2123
+ However, such references can be useful inside repeated subpatterns. For
2124
+ example, the pattern
2125
+ <pre>
2126
+ (a|b\1)+
2127
+ </pre>
2128
+ matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
2129
+ the subpattern, the back reference matches the character string corresponding
2130
+ to the previous iteration. In order for this to work, the pattern must be such
2131
+ that the first iteration does not need to match the back reference. This can be
2132
+ done using alternation, as in the example above, or by a quantifier with a
2133
+ minimum of zero.
2134
+ </P>
2135
+ <P>
2136
+ Back references of this type cause the group that they reference to be treated
2137
+ as an
2138
+ <a href="#atomicgroup">atomic group.</a>
2139
+ Once the whole group has been matched, a subsequent matching failure cannot
2140
+ cause backtracking into the middle of the group.
2141
+ <a name="bigassertions"></a></P>
2142
+ <br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
2143
+ <P>
2144
+ An assertion is a test on the characters following or preceding the current
2145
+ matching point that does not actually consume any characters. The simple
2146
+ assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described
2147
+ <a href="#smallassertions">above.</a>
2148
+ </P>
2149
+ <P>
2150
+ More complicated assertions are coded as subpatterns. There are two kinds:
2151
+ those that look ahead of the current position in the subject string, and those
2152
+ that look behind it. An assertion subpattern is matched in the normal way,
2153
+ except that it does not cause the current matching position to be changed.
2154
+ </P>
2155
+ <P>
2156
+ Assertion subpatterns are not capturing subpatterns. If such an assertion
2157
+ contains capturing subpatterns within it, these are counted for the purposes of
2158
+ numbering the capturing subpatterns in the whole pattern. However, substring
2159
+ capturing is carried out only for positive assertions. (Perl sometimes, but not
2160
+ always, does do capturing in negative assertions.)
2161
+ </P>
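+ <P>
+ The following sketch uses Python's <b>re</b> module as a stand-in for PCRE, and
+ Python behaves the same way in these two cases. The first case is a group captured
+ inside a positive lookahead. The second is a group inside a negative lookahead,
+ which is counted but never set.
+ </P>

```python
import re

# A capturing group inside a positive lookahead keeps its value,
# even though the assertion itself consumes nothing.
m = re.match(r"(?=(\w+))", "abc")
assert m.group(1) == "abc" and m.end() == 0

# A group inside a negative assertion still counts for numbering (it is
# group 1), but it is never set, so the next group is number 2.
m = re.match(r"(?!(\d))(\w)", "x")
assert m.group(1) is None and m.group(2) == "x"
```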
2162
+ <P>
2163
+ For compatibility with Perl, assertion subpatterns may be repeated; though
2164
+ it makes no sense to assert the same thing several times, the side effect of
2165
+ capturing parentheses may occasionally be useful. In practice, there are only three
2166
+ cases:
2167
+ <br>
2168
+ <br>
2169
+ (1) If the quantifier is {0}, the assertion is never obeyed during matching.
2170
+ However, it may contain internal capturing parenthesized groups that are called
2171
+ from elsewhere via the
2172
+ <a href="#subpatternsassubroutines">subroutine mechanism.</a>
2173
+ <br>
2174
+ <br>
2175
+ (2) If the quantifier is {0,n}, where n is greater than zero, it is treated as if it
2176
+ were {0,1}. At run time, the rest of the pattern match is tried with and
2177
+ without the assertion, the order depending on the greediness of the quantifier.
2178
+ <br>
2179
+ <br>
2180
+ (3) If the minimum repetition is greater than zero, the quantifier is ignored.
2181
+ The assertion is obeyed just once when encountered during matching.
2182
+ </P>
2183
+ <br><b>
2184
+ Lookahead assertions
2185
+ </b><br>
2186
+ <P>
2187
+ Lookahead assertions start with (?= for positive assertions and (?! for
2188
+ negative assertions. For example,
2189
+ <pre>
2190
+ \w+(?=;)
2191
+ </pre>
2192
+ matches a word followed by a semicolon, but does not include the semicolon in
2193
+ the match, and
2194
+ <pre>
2195
+ foo(?!bar)
2196
+ </pre>
2197
+ matches any occurrence of "foo" that is not followed by "bar". Note that the
2198
+ apparently similar pattern
2199
+ <pre>
2200
+ (?!foo)bar
2201
+ </pre>
2202
+ does not find an occurrence of "bar" that is preceded by something other than
2203
+ "foo"; it finds any occurrence of "bar" whatsoever, because the assertion
2204
+ (?!foo) is always true when the next three characters are "bar". A
2205
+ lookbehind assertion is needed to achieve the other effect.
2206
+ </P>
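+ <P>
+ The three lookahead examples can be checked with Python's <b>re</b> module, used
+ here as a stand-in for PCRE. It supports the same (?= and (?! syntax.
+ </P>

```python
import re

# \w+(?=;) : the word before ";" is matched, the ";" is not consumed.
assert re.search(r"\w+(?=;)", "alpha; beta").group() == "alpha"

# foo(?!bar) : skips the "foo" in "foobar" and finds the one in "foobaz".
assert re.search(r"foo(?!bar)", "foobar foobaz").start() == 7

# (?!foo)bar : still finds the "bar" that follows "foo", because at that
# position the next three characters are "bar", not "foo".
assert re.search(r"(?!foo)bar", "foobar").start() == 3
```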
2207
+ <P>
2208
+ If you want to force a matching failure at some point in a pattern, the most
2209
+ convenient way to do it is with (?!) because an empty string always matches, so
2210
+ an assertion that requires there not to be an empty string must always fail.
2211
+ The backtracking control verb (*FAIL) or (*F) is a synonym for (?!).
2212
+ <a name="lookbehind"></a></P>
2213
+ <br><b>
2214
+ Lookbehind assertions
2215
+ </b><br>
2216
+ <P>
2217
+ Lookbehind assertions start with (?&#60;= for positive assertions and (?&#60;! for
2218
+ negative assertions. For example,
2219
+ <pre>
2220
+ (?&#60;!foo)bar
2221
+ </pre>
2222
+ does find an occurrence of "bar" that is not preceded by "foo". The contents of
2223
+ a lookbehind assertion are restricted such that all the strings it matches must
2224
+ have a fixed length. However, if there are several top-level alternatives, they
2225
+ do not all have to have the same fixed length. Thus
2226
+ <pre>
2227
+ (?&#60;=bullock|donkey)
2228
+ </pre>
2229
+ is permitted, but
2230
+ <pre>
2231
+ (?&#60;!dogs?|cats?)
2232
+ </pre>
2233
+ causes an error at compile time. Branches that match different length strings
2234
+ are permitted only at the top level of a lookbehind assertion. This is an
2235
+ extension compared with Perl, which requires all branches to match the same
2236
+ length of string. An assertion such as
2237
+ <pre>
2238
+ (?&#60;=ab(c|de))
2239
+ </pre>
2240
+ is not permitted, because its single top-level branch can match two different
2241
+ lengths, but it is acceptable to PCRE if rewritten to use two top-level
2242
+ branches:
2243
+ <pre>
2244
+ (?&#60;=abc|abde)
2245
+ </pre>
2246
+ In some cases, the escape sequence \K
2247
+ <a href="#resetmatchstart">(see above)</a>
2248
+ can be used instead of a lookbehind assertion to get round the fixed-length
2249
+ restriction.
2250
+ </P>
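+ <P>
+ Python's <b>re</b> module, used here as a stand-in for PCRE, is stricter than
+ PCRE. It requires the whole lookbehind to have one fixed length, so it rejects
+ (?&#60;=abc|abde). The sketch below shows this, and an assumed workaround that
+ writes each branch as its own fixed-length lookbehind inside an alternation.
+ </P>

```python
import re

# (?<!foo)bar : only the second "bar" is not preceded by "foo".
hits = [m.start() for m in re.finditer(r"(?<!foo)bar", "foobar xbar")]
assert hits == [8]

# Branches of different lengths in one lookbehind are rejected by Python.
try:
    re.compile(r"(?<=abc|abde)f")
    accepted = True
except re.error:
    accepted = False
assert not accepted

# Separate fixed-length lookbehinds in an alternation give the same effect.
split = re.compile(r"(?:(?<=abc)|(?<=abde))f")
assert split.search("abcf") and split.search("abdef")
assert split.search("abxf") is None
```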
2251
+ <P>
2252
+ The implementation of lookbehind assertions is, for each alternative, to
2253
+ temporarily move the current position back by the fixed length and then try to
2254
+ match. If there are insufficient characters before the current position, the
2255
+ assertion fails.
2256
+ </P>
2257
+ <P>
2258
+ In a UTF mode, PCRE does not allow the \C escape (which matches a single data
2259
+ unit even in a UTF mode) to appear in lookbehind assertions, because it makes
2260
+ it impossible to calculate the length of the lookbehind. The \X and \R
2261
+ escapes, which can match different numbers of data units, are also not
2262
+ permitted.
2263
+ </P>
2264
+ <P>
2265
+ <a href="#subpatternsassubroutines">"Subroutine"</a>
2266
+ calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
2267
+ as the subpattern matches a fixed-length string.
2268
+ <a href="#recursion">Recursion,</a>
2269
+ however, is not supported.
2270
+ </P>
2271
+ <P>
2272
+ Possessive quantifiers can be used in conjunction with lookbehind assertions to
2273
+ specify efficient matching of fixed-length strings at the end of subject
2274
+ strings. Consider a simple pattern such as
2275
+ <pre>
2276
+ abcd$
2277
+ </pre>
2278
+ when applied to a long string that does not match. Because matching proceeds
2279
+ from left to right, PCRE will look for each "a" in the subject and then see if
2280
+ what follows matches the rest of the pattern. If the pattern is specified as
2281
+ <pre>
2282
+ ^.*abcd$
2283
+ </pre>
2284
+ the initial .* matches the entire string at first, but when this fails (because
2285
+ there is no following "a"), it backtracks to match all but the last character,
2286
+ then all but the last two characters, and so on. Once again the search for "a"
2287
+ covers the entire string, from right to left, so we are no better off. However,
2288
+ if the pattern is written as
2289
+ <pre>
2290
+ ^.*+(?&#60;=abcd)
2291
+ </pre>
2292
+ there can be no backtracking for the .*+ item; it can match only the entire
2293
+ string. The subsequent lookbehind assertion does a single test on the last four
2294
+ characters. If it fails, the match fails immediately. For long strings, this
2295
+ approach makes a significant difference to the processing time.
2296
+ </P>
2297
+ <br><b>
2298
+ Using multiple assertions
2299
+ </b><br>
2300
+ <P>
2301
+ Several assertions (of any sort) may occur in succession. For example,
2302
+ <pre>
2303
+ (?&#60;=\d{3})(?&#60;!999)foo
2304
+ </pre>
2305
+ matches "foo" preceded by three digits that are not "999". Notice that each of
2306
+ the assertions is applied independently at the same point in the subject
2307
+ string. First there is a check that the previous three characters are all
2308
+ digits, and then there is a check that the same three characters are not "999".
2309
+ This pattern does <i>not</i> match "foo" preceded by six characters, the first
2310
+ three of which are digits and the last three of which are not "999". For example, it
2311
+ doesn't match "123abcfoo". A pattern to do that is
2312
+ <pre>
2313
+ (?&#60;=\d{3}...)(?&#60;!999)foo
2314
+ </pre>
2315
+ This time the first assertion looks at the preceding six characters, checking
2316
+ that the first three are digits, and then the second assertion checks that the
2317
+ preceding three characters are not "999".
2318
+ </P>
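+ <P>
+ The two patterns can be compared with Python's <b>re</b> module, used here as a
+ stand-in for PCRE. Both lookbehinds have a fixed length, so Python accepts them.
+ </P>

```python
import re

# Three digits immediately before "foo", and those three are not "999".
narrow = re.compile(r"(?<=\d{3})(?<!999)foo")
assert narrow.search("123foo")
assert narrow.search("999foo") is None
assert narrow.search("123abcfoo") is None  # the three characters before are "abc"

# Six characters before "foo": three digits, then three that are not "999".
wide = re.compile(r"(?<=\d{3}...)(?<!999)foo")
assert wide.search("123abcfoo")
assert wide.search("123999foo") is None
```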
2319
+ <P>
2320
+ Assertions can be nested in any combination. For example,
2321
+ <pre>
2322
+ (?&#60;=(?&#60;!foo)bar)baz
2323
+ </pre>
2324
+ matches an occurrence of "baz" that is preceded by "bar" which in turn is not
2325
+ preceded by "foo", while
2326
+ <pre>
2327
+ (?&#60;=\d{3}(?!999)...)foo
2328
+ </pre>
2329
+ is another pattern that matches "foo" preceded by three digits and any three
2330
+ characters that are not "999".
2331
+ <a name="conditions"></a></P>
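+ <P>
+ Python's <b>re</b> module, used here as a stand-in for PCRE, also accepts both
+ nested forms, because each lookbehind still has a fixed overall length.
+ </P>

```python
import re

# "baz" preceded by "bar", which is not itself preceded by "foo".
baz = re.compile(r"(?<=(?<!foo)bar)baz")
assert baz.search("xbarbaz")
assert baz.search("foobarbaz") is None

# "foo" preceded by three digits and then three characters that are not "999".
foo = re.compile(r"(?<=\d{3}(?!999)...)foo")
assert foo.search("123abcfoo")
assert foo.search("123999foo") is None
```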
2332
+ <br><a name="SEC21" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
2333
+ <P>
2334
+ It is possible to cause the matching process to obey a subpattern
2335
+ conditionally or to choose between two alternative subpatterns, depending on
2336
+ the result of an assertion, or whether a specific capturing subpattern has
2337
+ already been matched. The two possible forms of conditional subpattern are:
2338
+ <pre>
2339
+ (?(condition)yes-pattern)
2340
+ (?(condition)yes-pattern|no-pattern)
2341
+ </pre>
2342
+ If the condition is satisfied, the yes-pattern is used; otherwise the
2343
+ no-pattern (if present) is used. If there are more than two alternatives in the
2344
+ subpattern, a compile-time error occurs. Each of the two alternatives may
2345
+ itself contain nested subpatterns of any form, including conditional
2346
+ subpatterns; the restriction to two alternatives applies only at the level of
2347
+ the condition. This pattern fragment is an example where the alternatives are
2348
+ complex:
2349
+ <pre>
2350
+ (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
2351
+
2352
+ </pre>
2353
+ </P>
2354
+ <P>
2355
+ There are four kinds of condition: references to subpatterns, references to
2356
+ recursion, a pseudo-condition called DEFINE, and assertions.
2357
+ </P>
2358
+ <br><b>
2359
+ Checking for a used subpattern by number
2360
+ </b><br>
2361
+ <P>
2362
+ If the text between the parentheses consists of a sequence of digits, the
2363
+ condition is true if a capturing subpattern of that number has previously
2364
+ matched. If there is more than one capturing subpattern with the same number
2365
+ (see the earlier
2366
+ <a href="#recursion">section about duplicate subpattern numbers),</a>
2367
+ the condition is true if any of them have matched. An alternative notation is
2368
+ to precede the digits with a plus or minus sign. In this case, the subpattern
2369
+ number is relative rather than absolute. The most recently opened parentheses
2370
+ can be referenced by (?(-1), the next most recent by (?(-2), and so on. Inside
2371
+ loops it can also make sense to refer to subsequent groups. The next
2372
+ parentheses to be opened can be referenced as (?(+1), and so on. (The value
2373
+ zero in any of these forms is not used; it provokes a compile-time error.)
2374
+ </P>
2375
+ <P>
2376
+ Consider the following pattern, which contains non-significant white space to
2377
+ make it more readable (assume the PCRE_EXTENDED option) and to divide it into
2378
+ three parts for ease of discussion:
2379
+ <pre>
2380
+ ( \( )? [^()]+ (?(1) \) )
2381
+ </pre>
2382
+ The first part matches an optional opening parenthesis, and if that
2383
+ character is present, sets it as the first captured substring. The second part
2384
+ matches one or more characters that are not parentheses. The third part is a
2385
+ conditional subpattern that tests whether or not the first set of parentheses
2386
+ matched. If they did, that is, if the subject started with an opening parenthesis,
2387
+ the condition is true, and so the yes-pattern is executed and a closing
2388
+ parenthesis is required. Otherwise, since no-pattern is not present, the
2389
+ subpattern matches nothing. In other words, this pattern matches a sequence of
2390
+ non-parentheses, optionally enclosed in parentheses.
2391
+ </P>
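+ <P>
+ Python's <b>re</b> module supports numbered conditions and treats re.VERBOSE like
+ PCRE_EXTENDED, so the example can be checked there as a stand-in for PCRE.
+ Python does not accept the relative (?(-1) form.
+ </P>

```python
import re

# Optional "(" as group 1; if it matched, a ")" is required at the end.
pat = re.compile(r"( \( )? [^()]+ (?(1) \) )", re.VERBOSE)

assert pat.fullmatch("abc")
assert pat.fullmatch("(abc)")
assert pat.fullmatch("(abc") is None
assert pat.fullmatch("abc)") is None
```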
2392
+ <P>
2393
+ If you were embedding this pattern in a larger one, you could use a relative
2394
+ reference:
2395
+ <pre>
2396
+ ...other stuff... ( \( )? [^()]+ (?(-1) \) ) ...
2397
+ </pre>
2398
+ This makes the fragment independent of the parentheses in the larger pattern.
2399
+ </P>
2400
+ <br><b>
2401
+ Checking for a used subpattern by name
2402
+ </b><br>
2403
+ <P>
2404
+ Perl uses the syntax (?(&#60;name&#62;)...) or (?('name')...) to test for a used
2405
+ subpattern by name. For compatibility with earlier versions of PCRE, which had
2406
+ this facility before Perl, the syntax (?(name)...) is also recognized.
2407
+ </P>
2408
+ <P>
2409
+ Rewriting the above example to use a named subpattern gives this:
2410
+ <pre>
2411
+ (?&#60;OPEN&#62; \( )? [^()]+ (?(&#60;OPEN&#62;) \) )
2412
+ </pre>
2413
+ If the name used in a condition of this kind is a duplicate, the test is
2414
+ applied to all subpatterns of the same name, and is true if any one of them has
2415
+ matched.
2416
+ </P>
2417
+ <br><b>
2418
+ Checking for pattern recursion
2419
+ </b><br>
2420
+ <P>
2421
+ If the condition is the string (R), and there is no subpattern with the name R,
2422
+ the condition is true if a recursive call to the whole pattern or any
2423
+ subpattern has been made. If digits or a name preceded by ampersand follow the
2424
+ letter R, for example:
2425
+ <pre>
2426
+ (?(R3)...) or (?(R&name)...)
2427
+ </pre>
2428
+ the condition is true if the most recent recursion is into a subpattern whose
2429
+ number or name is given. This condition does not check the entire recursion
2430
+ stack. If the name used in a condition of this kind is a duplicate, the test is
2431
+ applied to all subpatterns of the same name, and is true if any one of them is
2432
+ the most recent recursion.
2433
+ </P>
2434
+ <P>
2435
+ At "top level", all these recursion test conditions are false.
2436
+ <a href="#recursion">The syntax for recursive patterns</a>
2437
+ is described below.
2438
+ <a name="subdefine"></a></P>
2439
+ <br><b>
2440
+ Defining subpatterns for use by reference only
2441
+ </b><br>
2442
+ <P>
2443
+ If the condition is the string (DEFINE), and there is no subpattern with the
2444
+ name DEFINE, the condition is always false. In this case, there may be only one
2445
+ alternative in the subpattern. It is always skipped if control reaches this
2446
+ point in the pattern; the idea of DEFINE is that it can be used to define
2447
+ subroutines that can be referenced from elsewhere. (The use of
2448
+ <a href="#subpatternsassubroutines">subroutines</a>
2449
+ is described below.) For example, a pattern to match an IPv4 address such as
2450
+ "192.168.23.245" could be written like this (ignore white space and line
2451
+ breaks):
2452
+ <pre>
2453
+ (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
2454
+ \b (?&byte) (\.(?&byte)){3} \b
2455
+ </pre>
2456
+ The first part of the pattern is a DEFINE group inside which another group
2457
+ named "byte" is defined. This matches an individual component of an IPv4
2458
+ address (a number less than 256). When matching takes place, this part of the
2459
+ pattern is skipped because DEFINE acts like a false condition. The rest of the
2460
+ pattern uses references to the named group to match the four dot-separated
2461
+ components of an IPv4 address, insisting on a word boundary at each end.
2462
+ </P>
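+ <P>
+ Python's <b>re</b> module has no (DEFINE) condition or subroutine calls. The
+ sketch below is a different technique: it builds the pattern by pasting the
+ text of the "byte" group, written as a non-capturing group, into each place
+ where (?&amp;byte) is used. Unlike a subroutine call, this repeats the pattern
+ text rather than referring to it. The names BYTE and IPV4 are illustrative.
+ </P>

```python
import re

# The "byte" subpattern from the text, as a non-capturing group.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# The text is substituted wherever the PCRE pattern calls (?&byte).
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

assert IPV4.fullmatch("192.168.23.245")
assert IPV4.fullmatch("256.1.1.1") is None
```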
2463
+ <br><b>
2464
+ Assertion conditions
2465
+ </b><br>
2466
+ <P>
2467
+ If the condition is not in any of the above formats, it must be an assertion.
2468
+ This may be a positive or negative lookahead or lookbehind assertion. Consider
2469
+ this pattern, again containing non-significant white space, and with the two
2470
+ alternatives on the second line:
2471
+ <pre>
2472
+ (?(?=[^a-z]*[a-z])
2473
+ \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
2474
+ </pre>
2475
+ The condition is a positive lookahead assertion that matches an optional
2476
+ sequence of non-letters followed by a letter. In other words, it tests for the
2477
+ presence of at least one letter in the subject. If a letter is found, the
2478
+ subject is matched against the first alternative; otherwise it is matched
2479
+ against the second. This pattern matches strings in one of the two forms
2480
+ dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
2481
+ <a name="comments"></a></P>
2482
+ <br><a name="SEC22" href="#TOC1">COMMENTS</a><br>
2483
+ <P>
2484
+ There are two ways of including comments in patterns that are processed by
2485
+ PCRE. In both cases, the start of the comment must not be in a character class,
2486
+ nor in the middle of any other sequence of related characters such as (?: or a
2487
+ subpattern name or number. The characters that make up a comment play no part
2488
+ in the pattern matching.
2489
+ </P>
2490
+ <P>
2491
+ The sequence (?# marks the start of a comment that continues up to the next
2492
+ closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED
2493
+ option is set, an unescaped # character also introduces a comment, which in
2494
+ this case continues to immediately after the next newline character or
2495
+ character sequence in the pattern. Which characters are interpreted as newlines
2496
+ is controlled by the options passed to a compiling function or by a special
2497
+ sequence at the start of the pattern, as described in the section entitled
2498
+ <a href="#newlines">"Newline conventions"</a>
2499
+ above. Note that the end of this type of comment is a literal newline sequence
2500
+ in the pattern; escape sequences that happen to represent a newline do not
2501
+ count. For example, consider this pattern when PCRE_EXTENDED is set, and the
2502
+ default newline convention is in force:
2503
+ <pre>
2504
+ abc #comment \n still comment
2505
+ </pre>
2506
+ On encountering the # character, <b>pcre_compile()</b> skips along, looking for
2507
+ a newline in the pattern. The sequence \n is still literal at this stage, so
2508
+ it does not terminate the comment. Only an actual character with the code value
2509
+ 0x0a (the default newline) does so.
2510
+ <a name="recursion"></a></P>
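+ <P>
+ Both kinds of comment can be tried with Python's <b>re</b> module, used here as a
+ stand-in for PCRE. In Python, re.VERBOSE plays the part of PCRE_EXTENDED. As in the
+ example above, an escape sequence such as \n does not end a # comment.
+ </P>

```python
import re

# (?#...) is an inline comment and takes no part in matching.
assert re.fullmatch(r"ab(?# a comment )c", "abc")

# With re.VERBOSE, "#" starts a comment that runs to the next real newline.
# In a raw string, "\n" is a two-character escape, not a newline, so the
# rest of the line is still comment and the pattern is just "abc".
verbose = re.compile(r"abc #comment \n still comment", re.VERBOSE)
assert verbose.fullmatch("abc")

# An actual newline character does end the comment.
ended = re.compile("abc #comment\nxyz", re.VERBOSE)
assert ended.fullmatch("abcxyz")
```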
2511
+ <br><a name="SEC23" href="#TOC1">RECURSIVE PATTERNS</a><br>
2512
+ <P>
2513
+ Consider the problem of matching a string in parentheses, allowing for
2514
+ unlimited nested parentheses. Without the use of recursion, the best that can
2515
+ be done is to use a pattern that matches up to some fixed depth of nesting. It
2516
+ is not possible to handle an arbitrary nesting depth.
2517
+ </P>
2518
+ <P>
2519
+ For some time, Perl has provided a facility that allows regular expressions to
2520
+ recurse (amongst other things). It does this by interpolating Perl code in the
2521
+ expression at run time, and the code can refer to the expression itself. A Perl
2522
+ pattern using code interpolation to solve the parentheses problem can be
2523
+ created like this:
2524
+ <pre>
2525
+ $re = qr{\( (?: (?&#62;[^()]+) | (?p{$re}) )* \)}x;
2526
+ </pre>
2527
+ The (?p{...}) item interpolates Perl code at run time, and in this case refers
2528
+ recursively to the pattern in which it appears.
2529
+ </P>
2530
+ <P>
2531
+ Obviously, PCRE cannot support the interpolation of Perl code. Instead, it
2532
+ supports special syntax for recursion of the entire pattern, and also for
2533
+ individual subpattern recursion. After its introduction in PCRE and Python,
2534
+ this kind of recursion was subsequently introduced into Perl at release 5.10.
2535
+ </P>
2536
+ <P>
2537
+ A special item that consists of (? followed by a number greater than zero and a
2538
+ closing parenthesis is a recursive subroutine call of the subpattern of the
2539
+ given number, provided that it occurs inside that subpattern. (If not, it is a
2540
+ <a href="#subpatternsassubroutines">non-recursive subroutine</a>
2541
+ call, which is described in the next section.) The special item (?R) or (?0) is
2542
+ a recursive call of the entire regular expression.
2543
+ </P>
2544
+ <P>
2545
+ This PCRE pattern solves the nested parentheses problem (assume the
2546
+ PCRE_EXTENDED option is set so that white space is ignored):
2547
+ <pre>
2548
+ \( ( [^()]++ | (?R) )* \)
2549
+ </pre>
2550
+ First it matches an opening parenthesis. Then it matches any number of
2551
+ substrings which can either be a sequence of non-parentheses, or a recursive
2552
+ match of the pattern itself (that is, a correctly parenthesized substring).
2553
+ Finally there is a closing parenthesis. Note the use of a possessive quantifier
2554
+ to avoid backtracking into sequences of non-parentheses.
2555
+ </P>
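+ <P>
+ Python's <b>re</b> module has no (?R) item, so the sketch below uses a
+ different technique. It is a hand-written recursive function that accepts the
+ same strings as the pattern above when the match is anchored at both ends. The
+ inner loop swallows non-parentheses without ever giving any back, which plays
+ the part of [^()]++. The recursive call plays the part of (?R). The names
+ match_balanced and is_balanced are hypothetical.
+ </P>

```python
from typing import Optional


def match_balanced(s: str, i: int = 0) -> Optional[int]:
    r"""Return the index just past a parenthesized group starting at s[i],
    or None (a hand-written analogue of \( ( [^()]++ | (?R) )* \) at i)."""
    if i >= len(s) or s[i] != "(":
        return None
    i += 1  # the opening parenthesis
    while i < len(s):
        if s[i] == ")":
            return i + 1  # the closing parenthesis ends this level
        if s[i] == "(":
            i = match_balanced(s, i)  # the (?R) alternative: recurse
            if i is None:
                return None
        else:
            # [^()]++ : consume every non-parenthesis, never give any back
            while i < len(s) and s[i] not in "()":
                i += 1
    return None  # ran out of characters before the closing parenthesis


def is_balanced(s: str) -> bool:
    return match_balanced(s) == len(s)
```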
2556
+ <P>
2557
+ If this were part of a larger pattern, you would not want to recurse the entire
2558
+ pattern, so instead you could use this:
2559
+ <pre>
2560
+ ( \( ( [^()]++ | (?1) )* \) )
2561
+ </pre>
2562
+ We have put the pattern into parentheses, and caused the recursion to refer to
2563
+ them instead of the whole pattern.
2564
+ </P>
2565
+ <P>
2566
+ In a larger pattern, keeping track of parenthesis numbers can be tricky. This
2567
+ is made easier by the use of relative references. Instead of (?1) in the
2568
+ pattern above you can write (?-2) to refer to the second most recently opened
2569
+ parentheses preceding the recursion. In other words, a negative number counts
2570
+ capturing parentheses leftwards from the point at which it is encountered.
2571
+ </P>
2572
+ <P>
2573
+ It is also possible to refer to subsequently opened parentheses, by writing
2574
+ references such as (?+2). However, these cannot be recursive because the
2575
+ reference is not inside the parentheses that are referenced. They are always
2576
+ <a href="#subpatternsassubroutines">non-recursive subroutine</a>
2577
+ calls, as described in the next section.
2578
+ </P>
2579
+ <P>
2580
+ An alternative approach is to use named parentheses instead. The Perl syntax
2581
+ for this is (?&name); PCRE's earlier syntax (?P&#62;name) is also supported. We
2582
+ could rewrite the above example as follows:
2583
+ <pre>
2584
+ (?&#60;pn&#62; \( ( [^()]++ | (?&pn) )* \) )
2585
+ </pre>
2586
+ If there is more than one subpattern with the same name, the earliest one is
2587
+ used.
2588
+ </P>
2589
+ <P>
2590
+ This particular example pattern that we have been looking at contains nested
2591
+ unlimited repeats, and so the use of a possessive quantifier for matching
2592
+ strings of non-parentheses is important when applying the pattern to strings
2593
+ that do not match. For example, when this pattern is applied to
2594
+ <pre>
2595
+ (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
2596
+ </pre>
2597
+ it yields "no match" quickly. However, if a possessive quantifier is not used,
2598
+ the match runs for a very long time indeed because there are so many different
2599
+ ways the + and * repeats can carve up the subject, and all have to be tested
2600
+ before failure can be reported.
2601
+ </P>
2602
+ <P>
2603
+ At the end of a match, the values of capturing parentheses are those from
2604
+ the outermost level. If you want to obtain intermediate values, a callout
2605
+ function can be used (see below and the
2606
+ <a href="pcrecallout.html"><b>pcrecallout</b></a>
2607
+ documentation). If the pattern above is matched against
2608
+ <pre>
2609
+ (ab(cd)ef)
2610
+ </pre>
2611
+ the value for the inner capturing parentheses (numbered 2) is "ef", which is
2612
+ the last value taken on at the top level. If a capturing subpattern is not
2613
+ matched at the top level, its final captured value is unset, even if it was
2614
+ (temporarily) set at a deeper level during the matching process.
2615
+ </P>
2616
+ <P>
2617
+ If there are more than 15 capturing parentheses in a pattern, PCRE has to
2618
+ obtain extra memory to store data during a recursion, which it does by using
2619
+ <b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no memory can
2620
+ be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
2621
+ </P>
2622
+ <P>
2623
+ Do not confuse the (?R) item with the condition (R), which tests for recursion.
2624
+ Consider this pattern, which matches text in angle brackets, allowing for
2625
+ arbitrary nesting. Only digits are allowed in nested brackets (that is, when
2626
+ recursing), whereas any characters are permitted at the outer level.
2627
+ <pre>
2628
+ &#60; (?: (?(R) \d++ | [^&#60;&#62;]*+) | (?R)) * &#62;
2629
+ </pre>
2630
+ In this pattern, (?(R) is the start of a conditional subpattern, with two
2631
+ different alternatives for the recursive and non-recursive cases. The (?R) item
2632
+ is the actual recursive call.
2633
+ <a name="recursiondifference"></a></P>
2634
+ <br><b>
2635
+ Differences in recursion processing between PCRE and Perl
2636
+ </b><br>
2637
+ <P>
2638
+ Recursion processing in PCRE differs from Perl in two important ways. In PCRE
2639
+ (like Python, but unlike Perl), a recursive subpattern call is always treated
2640
+ as an atomic group. That is, once it has matched some of the subject string, it
2641
+ is never re-entered, even if it contains untried alternatives and there is a
2642
+ subsequent matching failure. This can be illustrated by the following pattern,
2643
+ which purports to match a palindromic string that contains an odd number of
2644
+ characters (for example, "a", "aba", "abcba", "abcdcba"):
2645
+ <pre>
2646
+ ^(.|(.)(?1)\2)$
2647
+ </pre>
2648
+ The idea is that it either matches a single character, or two identical
2649
+ characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE
2650
+ it does not if the subject is longer than three characters. Consider the
2651
+ subject string "abcba":
2652
+ </P>
2653
+ <P>
2654
+ At the top level, the first character is matched, but as it is not at the end
2655
+ of the string, the first alternative fails; the second alternative is taken
2656
+ and the recursion kicks in. The recursive call to subpattern 1 successfully
2657
+ matches the next character ("b"). (Note that the beginning and end of line
2658
+ tests are not part of the recursion).
2659
+ </P>
2660
+ <P>
2661
+ Back at the top level, the next character ("c") is compared with what
2662
+ subpattern 2 matched, which was "a". This fails. Because the recursion is
2663
+ treated as an atomic group, there are now no backtracking points, and so the
2664
+ entire match fails. (Perl is able, at this point, to re-enter the recursion and
2665
+ try the second alternative.) However, if the pattern is written with the
2666
+ alternatives in the other order, things are different:
2667
+ <pre>
2668
+ ^((.)(?1)\2|.)$
2669
+ </pre>
2670
+ This time, the recursing alternative is tried first, and continues to recurse
2671
+ until it runs out of characters, at which point the recursion fails. But this
2672
+ time we do have another alternative to try at the higher level. That is the big
2673
+ difference: in the previous case the remaining alternative is at a deeper
2674
+ recursion level, which PCRE cannot use.
2675
+ </P>
2676
+ <P>
2677
+ To change the pattern so that it matches all palindromic strings, not just
2678
+ those with an odd number of characters, it is tempting to change the pattern to
2679
+ this:
2680
+ <pre>
2681
+ ^((.)(?1)\2|.?)$
2682
+ </pre>
2683
+ Again, this works in Perl, but not in PCRE, and for the same reason. When a
2684
+ deeper recursion has matched a single character, it cannot be entered again in
2685
+ order to match an empty string. The solution is to separate the two cases, and
2686
+ write out the odd and even cases as alternatives at the higher level:
2687
+ <pre>
2688
+ ^(?:((.)(?1)\2|)|((.)(?3)\4|.))$
2689
+ </pre>
2690
+ If you want to match typical palindromic phrases, the pattern has to ignore all
2691
+ non-word characters, which can be done like this:
2692
+ <pre>
2693
+ ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
2694
+ </pre>
2695
+ If run with the PCRE_CASELESS option, this pattern matches phrases such as "A
2696
+ man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note
2697
+ the use of the possessive quantifier *+ to avoid backtracking into sequences of
2698
+ non-word characters. Without this, PCRE takes a great deal longer (ten times or
2699
+ more) to match typical phrases, and Perl takes so long that you think it has
2700
+ gone into a loop.
2701
+ </P>
2702
+ <P>
2703
+ <b>WARNING</b>: The palindrome-matching patterns above work only if the subject
2704
+ string does not start with a palindrome that is shorter than the entire string.
2705
+ For example, although "abcba" is correctly matched, if the subject is "ababa",
2706
+ PCRE finds the palindrome "aba" at the start, then fails at top level because
2707
+ the end of the string does not follow. Once again, it cannot jump back into the
2708
+ recursion to try other alternatives, so the entire match fails.
2709
+ </P>
2710
+ <P>
2711
+ The second way in which PCRE and Perl differ in their recursion processing is
2712
+ in the handling of captured values. In Perl, when a subpattern is called
2713
+ recursively or as a subpattern (see the next section), it has no access to any
2714
+ values that were captured outside the recursion, whereas in PCRE these values
2715
+ can be referenced. Consider this pattern:
2716
+ <pre>
2717
+ ^(.)(\1|a(?2))
2718
+ </pre>
2719
+ In PCRE, this pattern matches "bab". The first capturing parentheses match "b",
2720
+ then in the second group, when the back reference \1 fails to match "b", the
2721
+ second alternative matches "a" and then recurses. In the recursion, \1 does
2722
+ now match "b" and so the whole match succeeds. In Perl, the pattern fails to
2723
+ match because inside the recursive call \1 cannot access the externally set
2724
+ value.
2725
+ <a name="subpatternsassubroutines"></a></P>
2726
+ <br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
2727
+ <P>
2728
+ If the syntax for a recursive subpattern call (either by number or by
2729
+ name) is used outside the parentheses to which it refers, it operates like a
2730
+ subroutine in a programming language. The called subpattern may be defined
2731
+ before or after the reference. A numbered reference can be absolute or
2732
+ relative, as in these examples:
2733
+ <pre>
2734
+ (...(absolute)...)...(?2)...
2735
+ (...(relative)...)...(?-1)...
2736
+ (...(?+1)...(relative)...
2737
+ </pre>
2738
+ An earlier example pointed out that the pattern
2739
+ <pre>
2740
+ (sens|respons)e and \1ibility
2741
+ </pre>
2742
+ matches "sense and sensibility" and "response and responsibility", but not
2743
+ "sense and responsibility". If instead the pattern
2744
+ <pre>
2745
+ (sens|respons)e and (?1)ibility
2746
+ </pre>
2747
+ is used, it does match "sense and responsibility" as well as the other two
2748
+ strings. Another example is given in the discussion of DEFINE above.
2749
+ </P>
2750
+ <P>
2751
+ All subroutine calls, whether recursive or not, are always treated as atomic
2752
+ groups. That is, once a subroutine has matched some of the subject string, it
2753
+ is never re-entered, even if it contains untried alternatives and there is a
2754
+ subsequent matching failure. Any capturing parentheses that are set during the
2755
+ subroutine call revert to their previous values afterwards.
2756
+ </P>
2757
+ <P>
2758
+ Processing options such as case-independence are fixed when a subpattern is
2759
+ defined, so if it is used as a subroutine, such options cannot be changed for
2760
+ different calls. For example, consider this pattern:
2761
+ <pre>
2762
+ (abc)(?i:(?-1))
2763
+ </pre>
2764
+ It matches "abcabc". It does not match "abcABC" because the change of
2765
+ processing option does not affect the called subpattern.
2766
+ <a name="onigurumasubroutines"></a></P>
2767
+ <br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
2768
+ <P>
2769
+ For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
2770
+ a number enclosed either in angle brackets or single quotes, is an alternative
2771
+ syntax for referencing a subpattern as a subroutine, possibly recursively. Here
2772
+ are two of the examples used above, rewritten using this syntax:
2773
+ <pre>
2774
+ (?&#60;pn&#62; \( ( (?&#62;[^()]+) | \g&#60;pn&#62; )* \) )
2775
+ (sens|respons)e and \g'1'ibility
2776
+ </pre>
2777
+ PCRE supports an extension to Oniguruma: if a number is preceded by a
2778
+ plus or a minus sign it is taken as a relative reference. For example:
2779
+ <pre>
2780
+ (abc)(?i:\g&#60;-1&#62;)
2781
+ </pre>
2782
+ Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
2783
+ synonymous. The former is a back reference; the latter is a subroutine call.
2784
+ </P>
2785
+ <br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
2786
+ <P>
2787
+ Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
2788
+ code to be obeyed in the middle of matching a regular expression. This makes it
2789
+ possible, amongst other things, to extract different substrings that match the
2790
+ same pair of parentheses when there is a repetition.
2791
+ </P>
2792
+ <P>
2793
+ PCRE provides a similar feature, but of course it cannot obey arbitrary Perl
2794
+ code. The feature is called "callout". The caller of PCRE provides an external
2795
+ function by putting its entry point in the global variable <i>pcre_callout</i>
2796
+ (8-bit library) or <i>pcre[16|32]_callout</i> (16-bit or 32-bit library).
2797
+ By default, this variable contains NULL, which disables all calling out.
2798
+ </P>
2799
+ <P>
2800
+ Within a regular expression, (?C) indicates the points at which the external
2801
+ function is to be called. If you want to identify different callout points, you
2802
+ can put a number less than 256 after the letter C. The default value is zero.
2803
+ For example, this pattern has two callout points:
2804
+ <pre>
2805
+ (?C1)abc(?C2)def
2806
+ </pre>
2807
+ If the PCRE_AUTO_CALLOUT flag is passed to a compiling function, callouts are
2808
+ automatically installed before each item in the pattern. They are all numbered
2809
+ 255. If there is a conditional group in the pattern whose condition is an
2810
+ assertion, an additional callout is inserted just before the condition. An
2811
+ explicit callout may also be set at this position, as in this example:
2812
+ <pre>
2813
+ (?(?C9)(?=a)abc|def)
2814
+ </pre>
2815
+ Note that this applies only to assertion conditions, not to other types of
2816
+ condition.
2817
+ </P>
2818
+ <P>
2819
+ During matching, when PCRE reaches a callout point, the external function is
2820
+ called. It is provided with the number of the callout, the position in the
2821
+ pattern, and, optionally, one item of data originally supplied by the caller of
2822
+ the matching function. The callout function may cause matching to proceed, to
2823
+ backtrack, or to fail altogether.
2824
+ </P>
2825
+ <P>
2826
+ By default, PCRE implements a number of optimizations at compile time and
2827
+ matching time, and one side-effect is that sometimes callouts are skipped. If
2828
+ you need all possible callouts to happen, you need to set options that disable
2829
+ the relevant optimizations. More details, and a complete description of the
2830
+ interface to the callout function, are given in the
2831
+ <a href="pcrecallout.html"><b>pcrecallout</b></a>
2832
+ documentation.
2833
+ <a name="backtrackcontrol"></a></P>
2834
+ <br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
2835
+ <P>
2836
+ Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
2837
+ are still described in the Perl documentation as "experimental and subject to
2838
+ change or removal in a future version of Perl". It goes on to say: "Their usage
2839
+ in production code should be noted to avoid problems during upgrades." The same
2840
+ remarks apply to the PCRE features described in this section.
2841
+ </P>
2842
+ <P>
2843
+ The new verbs make use of what was previously invalid syntax: an opening
2844
+ parenthesis followed by an asterisk. They are generally of the form
2845
+ (*VERB) or (*VERB:NAME). Some may take either form, possibly behaving
2846
+ differently depending on whether or not a name is present. A name is any
2847
+ sequence of characters that does not include a closing parenthesis. The maximum
2848
+ length of a name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit
2849
+ libraries. If the name is empty, that is, if the closing parenthesis
2850
+ immediately follows the colon, the effect is as if the colon were not there.
2851
+ Any number of these verbs may occur in a pattern.
2852
+ </P>
2853
+ <P>
2854
+ Since these verbs are specifically related to backtracking, most of them can be
2855
+ used only when the pattern is to be matched using one of the traditional
2856
+ matching functions, because these use a backtracking algorithm. With the
2857
+ exception of (*FAIL), which behaves like a failing negative assertion, the
2858
+ backtracking control verbs cause an error if encountered by a DFA matching
2859
+ function.
2860
+ </P>
2861
+ <P>
2862
+ The behaviour of these verbs in
2863
+ <a href="#btrepeat">repeated groups,</a>
2864
+ <a href="#btassert">assertions,</a>
2865
+ and in
2866
+ <a href="#btsub">subpatterns called as subroutines</a>
2867
+ (whether or not recursively) is documented below.
2868
+ <a name="nooptimize"></a></P>
2869
+ <br><b>
2870
+ Optimizations that affect backtracking verbs
2871
+ </b><br>
2872
+ <P>
2873
+ PCRE contains some optimizations that are used to speed up matching by running
2874
+ some checks at the start of each match attempt. For example, it may know the
2875
+ minimum length of matching subject, or that a particular character must be
2876
+ present. When one of these optimizations bypasses the running of a match, any
2877
+ included backtracking verbs will not, of course, be processed. You can suppress
2878
+ the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
2879
+ when calling <b>pcre_compile()</b> or <b>pcre_exec()</b>, or by starting the
2880
+ pattern with (*NO_START_OPT). There is more discussion of this option in the
2881
+ section entitled
2882
+ <a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>
2883
+ in the
2884
+ <a href="pcreapi.html"><b>pcreapi</b></a>
2885
+ documentation.
2886
+ </P>
2887
+ <P>
2888
+ Experiments with Perl suggest that it too has similar optimizations, sometimes
2889
+ leading to anomalous results.
2890
+ </P>
2891
+ <br><b>
2892
+ Verbs that act immediately
2893
+ </b><br>
2894
+ <P>
2895
+ The following verbs act as soon as they are encountered. They may not be
2896
+ followed by a name.
2897
+ <pre>
2898
+ (*ACCEPT)
2899
+ </pre>
2900
+ This verb causes the match to end successfully, skipping the remainder of the
2901
+ pattern. However, when it is inside a subpattern that is called as a
2902
+ subroutine, only that subpattern is ended successfully. Matching then continues
2903
+ at the outer level. If (*ACCEPT) is triggered in a positive assertion, the
2904
+ assertion succeeds; in a negative assertion, the assertion fails.
2905
+ </P>
2906
+ <P>
2907
+ If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For
2908
+ example:
2909
+ <pre>
2910
+ A((?:A|B(*ACCEPT)|C)D)
2911
+ </pre>
2912
+ This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
2913
+ the outer parentheses.
2914
+ <pre>
2915
+ (*FAIL) or (*F)
2916
+ </pre>
2917
+ This verb causes a matching failure, forcing backtracking to occur. It is
2918
+ equivalent to (?!) but easier to read. The Perl documentation notes that it is
2919
+ probably useful only when combined with (?{}) or (??{}). Those are, of course,
2920
+ Perl features that are not present in PCRE. The nearest equivalent is the
2921
+ callout feature, as for example in this pattern:
2922
+ <pre>
2923
+ a+(?C)(*FAIL)
2924
+ </pre>
2925
+ A match with the string "aaaa" always fails, but the callout is taken before
2926
+ each backtrack happens (in this example, 10 times).
2927
+ </P>
2928
+ <br><b>
2929
+ Recording which path was taken
2930
+ </b><br>
2931
+ <P>
2932
+ There is one verb whose main purpose is to track how a match was arrived at,
2933
+ though it also has a secondary use in conjunction with advancing the match
2934
+ starting point (see (*SKIP) below).
2935
+ <pre>
2936
+ (*MARK:NAME) or (*:NAME)
2937
+ </pre>
2938
+ A name is always required with this verb. There may be as many instances of
2939
+ (*MARK) as you like in a pattern, and their names do not have to be unique.
2940
+ </P>
2941
+ <P>
2942
+ When a match succeeds, the name of the last-encountered (*MARK:NAME),
2943
+ (*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the
2944
+ caller as described in the section entitled
2945
+ <a href="pcreapi.html#extradata">"Extra data for <b>pcre_exec()</b>"</a>
2946
+ in the
2947
+ <a href="pcreapi.html"><b>pcreapi</b></a>
2948
+ documentation. Here is an example of <b>pcretest</b> output, where the /K
2949
+ modifier requests the retrieval and outputting of (*MARK) data:
2950
+ <pre>
2951
+ re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/K
2952
+ data&#62; XY
2953
+ 0: XY
2954
+ MK: A
2955
+ XZ
2956
+ 0: XZ
2957
+ MK: B
2958
+ </pre>
2959
+ The (*MARK) name is tagged with "MK:" in this output, and in this example it
2960
+ indicates which of the two alternatives matched. This is a more efficient way
2961
+ of obtaining this information than putting each alternative in its own
2962
+ capturing parentheses.
2963
+ </P>
2964
+ <P>
2965
+ If a verb with a name is encountered in a positive assertion that is true, the
2966
+ name is recorded and passed back if it is the last-encountered. This does not
2967
+ happen for negative assertions or failing positive assertions.
2968
+ </P>
2969
+ <P>
2970
+ After a partial match or a failed match, the last encountered name in the
2971
+ entire match process is returned. For example:
2972
+ <pre>
2973
+ re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/K
2974
+ data&#62; XP
2975
+ No match, mark = B
2976
+ </pre>
2977
+ Note that in this unanchored example the mark is retained from the match
2978
+ attempt that started at the letter "X" in the subject. Subsequent match
2979
+ attempts starting at "P" and then with an empty string do not get as far as the
2980
+ (*MARK) item, but nevertheless do not reset it.
2981
+ </P>
2982
+ <P>
2983
+ If you are interested in (*MARK) values after failed matches, you should
2984
+ probably set the PCRE_NO_START_OPTIMIZE option
2985
+ <a href="#nooptimize">(see above)</a>
2986
+ to ensure that the match is always attempted.
2987
+ </P>
2988
+ <br><b>
2989
+ Verbs that act after backtracking
2990
+ </b><br>
2991
+ <P>
2992
+ The following verbs do nothing when they are encountered. Matching continues
2993
+ with what follows, but if there is no subsequent match, causing a backtrack to
2994
+ the verb, a failure is forced. That is, backtracking cannot pass to the left of
2995
+ the verb. However, when one of these verbs appears inside an atomic group or an
2996
+ assertion that is true, its effect is confined to that group, because once the
2997
+ group has been matched, there is never any backtracking into it. In this
2998
+ situation, backtracking can "jump back" to the left of the entire atomic group
2999
+ or assertion. (Remember also, as stated above, that this localization also
3000
+ applies in subroutine calls.)
3001
+ </P>
3002
+ <P>
3003
+ These verbs differ in exactly what kind of failure occurs when backtracking
3004
+ reaches them. The behaviour described below is what happens when the verb is
3005
+ not in a subroutine or an assertion. Subsequent sections cover these special
3006
+ cases.
3007
+ <pre>
3008
+ (*COMMIT)
3009
+ </pre>
3010
+ This verb, which may not be followed by a name, causes the whole match to fail
3011
+ outright if there is a later matching failure that causes backtracking to reach
3012
+ it. Even if the pattern is unanchored, no further attempts to find a match by
3013
+ advancing the starting point take place. If (*COMMIT) is the only backtracking
3014
+ verb that is encountered, once it has been passed, <b>pcre_exec()</b> is
3015
+ committed to finding a match at the current starting point, or not at all. For
3016
+ example:
3017
+ <pre>
3018
+ a+(*COMMIT)b
3019
+ </pre>
3020
+ This matches "xxaab" but not "aacaab". It can be thought of as a kind of
3021
+ dynamic anchor, or "I've started, so I must finish." The name of the most
3022
+ recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
3023
+ match failure.
3024
+ </P>
3025
+ <P>
3026
+ If there is more than one backtracking verb in a pattern, a different one that
3027
+ follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
3028
+ match does not always guarantee that a match must be at this starting point.
3029
+ </P>
3030
+ <P>
3031
+ Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
3032
+ unless PCRE's start-of-match optimizations are turned off, as shown in this
3033
+ output from <b>pcretest</b>:
3034
+ <pre>
3035
+ re&#62; /(*COMMIT)abc/
3036
+ data&#62; xyzabc
3037
+ 0: abc
3038
+ data&#62; xyzabc\Y
3039
+ No match
3040
+ </pre>
3041
+ For this pattern, PCRE knows that any match must start with "a", so the
3042
+ optimization skips along the subject to "a" before applying the pattern to the
3043
+ first set of data. The match attempt then succeeds. In the second set of data,
3044
+ the escape sequence \Y is interpreted by the <b>pcretest</b> program. It causes
3045
+ the PCRE_NO_START_OPTIMIZE option to be set when <b>pcre_exec()</b> is called.
3046
+ This disables the optimization that skips along to the first character. The
3047
+ pattern is now applied starting at "x", and so the (*COMMIT) causes the match
3048
+ to fail without trying any other starting points.
3049
+ <pre>
3050
+ (*PRUNE) or (*PRUNE:NAME)
3051
+ </pre>
3052
+ This verb causes the match to fail at the current starting position in the
3053
+ subject if there is a later matching failure that causes backtracking to reach
3054
+ it. If the pattern is unanchored, the normal "bumpalong" advance to the next
3055
+ starting character then happens. Backtracking can occur as usual to the left of
3056
+ (*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but
3057
+ if there is no match to the right, backtracking cannot cross (*PRUNE). In
3058
+ simple cases, the use of (*PRUNE) is just an alternative to an atomic group or
3059
+ possessive quantifier, but there are some uses of (*PRUNE) that cannot be
3060
+ expressed in any other way. In an anchored pattern (*PRUNE) has the same effect
3061
+ as (*COMMIT).
3062
+ </P>
3063
+ <P>
3064
+ The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
3065
+ It is like (*MARK:NAME) in that the name is remembered for passing back to the
3066
+ caller. However, (*SKIP:NAME) searches only for names set with (*MARK).
3067
+ <pre>
3068
+ (*SKIP)
3069
+ </pre>
3070
+ This verb, when given without a name, is like (*PRUNE), except that if the
3071
+ pattern is unanchored, the "bumpalong" advance is not to the next character,
3072
+ but to the position in the subject where (*SKIP) was encountered. (*SKIP)
3073
+ signifies that whatever text was matched leading up to it cannot be part of a
3074
+ successful match. Consider:
3075
+ <pre>
3076
+ a+(*SKIP)b
3077
+ </pre>
3078
+ If the subject is "aaaac...", after the first match attempt fails (starting at
3079
+ the first character in the string), the starting point skips on to start the
3080
+ next attempt at "c". Note that a possessive quantifier does not have the same
3081
+ effect as this example; although it would suppress backtracking during the
3082
+ first match attempt, the second attempt would start at the second character
3083
+ instead of skipping on to "c".
3084
+ <pre>
3085
+ (*SKIP:NAME)
3086
+ </pre>
3087
+ When (*SKIP) has an associated name, its behaviour is modified. When it is
3088
+ triggered, the previous path through the pattern is searched for the most
3089
+ recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
3090
+ is to the subject position that corresponds to that (*MARK) instead of to where
3091
+ (*SKIP) was encountered. If no (*MARK) with a matching name is found, the
3092
+ (*SKIP) is ignored.
3093
+ </P>
3094
+ <P>
3095
+ Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
3096
+ names that are set by (*PRUNE:NAME) or (*THEN:NAME).
3097
+ <pre>
3098
+ (*THEN) or (*THEN:NAME)
3099
+ </pre>
3100
+ This verb causes a skip to the next innermost alternative when backtracking
3101
+ reaches it. That is, it cancels any further backtracking within the current
3102
+ alternative. Its name comes from the observation that it can be used for a
3103
+ pattern-based if-then-else block:
3104
+ <pre>
3105
+ ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
3106
+ </pre>
3107
+ If the COND1 pattern matches, FOO is tried (and possibly further items after
3108
+ the end of the group if FOO succeeds); on failure, the matcher skips to the
3109
+ second alternative and tries COND2, without backtracking into COND1. If that
3110
+ succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
3111
+ more alternatives, so there is a backtrack to whatever came before the entire
3112
+ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
3113
+ </P>
3114
+ <P>
3115
+ The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
3116
+ It is like (*MARK:NAME) in that the name is remembered for passing back to the
3117
+ caller. However, (*SKIP:NAME) searches only for names set with (*MARK).
3118
+ </P>
3119
+ <P>
3120
+ A subpattern that does not contain a | character is just a part of the
3121
+ enclosing alternative; it is not a nested alternation with only one
3122
+ alternative. The effect of (*THEN) extends beyond such a subpattern to the
3123
+ enclosing alternative. Consider this pattern, where A, B, etc. are complex
3124
+ pattern fragments that do not contain any | characters at this level:
3125
+ <pre>
3126
+ A (B(*THEN)C) | D
3127
+ </pre>
3128
+ If A and B are matched, but there is a failure in C, matching does not
3129
+ backtrack into A; instead it moves to the next alternative, that is, D.
3130
+ However, if the subpattern containing (*THEN) is given an alternative, it
3131
+ behaves differently:
3132
+ <pre>
3133
+ A (B(*THEN)C | (*FAIL)) | D
3134
+ </pre>
3135
+ The effect of (*THEN) is now confined to the inner subpattern. After a failure
3136
+ in C, matching moves to (*FAIL), which causes the whole subpattern to fail
3137
+ because there are no more alternatives to try. In this case, matching does now
3138
+ backtrack into A.
3139
+ </P>
3140
+ <P>
3141
+ Note that a conditional subpattern is not considered as having two
3142
+ alternatives, because only one is ever used. In other words, the | character in
3143
+ a conditional subpattern has a different meaning. Ignoring white space,
3144
+ consider:
3145
+ <pre>
3146
+ ^.*? (?(?=a) a | b(*THEN)c )
3147
+ </pre>
3148
+ If the subject is "ba", this pattern does not match. Because .*? is ungreedy,
3149
+ it initially matches zero characters. The condition (?=a) then fails, the
3150
+ character "b" is matched, but "c" is not. At this point, matching does not
3151
+ backtrack to .*? as might perhaps be expected from the presence of the |
3152
+ character. The conditional subpattern is part of the single alternative that
3153
+ comprises the whole pattern, and so the match fails. (If there were a backtrack
3154
+ into .*?, allowing it to match "b", the match would succeed.)
3155
+ </P>
3156
+ <P>
3157
+ The verbs just described provide four different "strengths" of control when
3158
+ subsequent matching fails. (*THEN) is the weakest, carrying on the match at the
3159
+ next alternative. (*PRUNE) comes next, failing the match at the current
3160
+ starting position, but allowing an advance to the next character (for an
3161
+ unanchored pattern). (*SKIP) is similar, except that the advance may be more
3162
+ than one character. (*COMMIT) is the strongest, causing the entire match to
3163
+ fail.
3164
+ </P>
3165
+ <br><b>
3166
+ More than one backtracking verb
3167
+ </b><br>
3168
+ <P>
3169
+ If more than one backtracking verb is present in a pattern, the one that is
3170
+ backtracked onto first acts. For example, consider this pattern, where A, B,
3171
+ etc. are complex pattern fragments:
3172
+ <pre>
3173
+ (A(*COMMIT)B(*THEN)C|ABD)
3174
+ </pre>
3175
+ If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to
3176
+ fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes
3177
+ the next alternative (ABD) to be tried. This behaviour is consistent, but is
3178
+ not always the same as Perl's. It means that if two or more backtracking verbs
3179
+ appear in succession, all but the last of them have no effect. Consider this
3180
+ example:
3181
+ <pre>
3182
+ ...(*COMMIT)(*PRUNE)...
3183
+ </pre>
3184
+ If there is a matching failure to the right, backtracking onto (*PRUNE) causes
3185
+ it to be triggered, and its action is taken. There can never be a backtrack
3186
+ onto (*COMMIT).
3187
+ <a name="btrepeat"></a></P>
3188
+ <br><b>
3189
+ Backtracking verbs in repeated groups
3190
+ </b><br>
3191
+ <P>
3192
+ PCRE differs from Perl in its handling of backtracking verbs in repeated
3193
+ groups. For example, consider:
3194
+ <pre>
3195
+ /(a(*COMMIT)b)+ac/
3196
+ </pre>
3197
+ If the subject is "abac", Perl matches, but PCRE fails because the (*COMMIT) in
3198
+ the second repeat of the group acts.
3199
+ <a name="btassert"></a></P>
3200
+ <br><b>
3201
+ Backtracking verbs in assertions
3202
+ </b><br>
3203
+ <P>
3204
+ (*FAIL) in an assertion has its normal effect: it forces an immediate backtrack.
3205
+ </P>
3206
+ <P>
3207
+ (*ACCEPT) in a positive assertion causes the assertion to succeed without any
3208
+ further processing. In a negative assertion, (*ACCEPT) causes the assertion to
3209
+ fail without any further processing.
3210
+ </P>
3211
+ <P>
3212
+ The other backtracking verbs are not treated specially if they appear in a
3213
+ positive assertion. In particular, (*THEN) skips to the next alternative in the
3214
+ innermost enclosing group that has alternations, whether or not this is within
3215
+ the assertion.
3216
+ </P>
3217
+ <P>
3218
+ Negative assertions are, however, different, in order to ensure that changing a
3219
+ positive assertion into a negative assertion changes its result. Backtracking
3220
+ into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true,
3221
+ without considering any further alternative branches in the assertion.
3222
+ Backtracking into (*THEN) causes it to skip to the next enclosing alternative
3223
+ within the assertion (the normal behaviour), but if the assertion does not have
3224
+ such an alternative, (*THEN) behaves like (*PRUNE).
3225
+ <a name="btsub"></a></P>
3226
+ <br><b>
3227
+ Backtracking verbs in subroutines
3228
+ </b><br>
3229
+ <P>
3230
+ These behaviours occur whether or not the subpattern is called recursively.
3231
+ Perl's treatment of subroutines is different in some cases.
3232
+ </P>
3233
+ <P>
3234
+ (*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
3235
+ an immediate backtrack.
3236
+ </P>
3237
+ <P>
3238
+ (*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
3239
+ succeed without any further processing. Matching then continues after the
3240
+ subroutine call.
3241
+ </P>
3242
+ <P>
3243
+ (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
3244
+ the subroutine match to fail.
3245
+ </P>
3246
+ <P>
3247
+ (*THEN) skips to the next alternative in the innermost enclosing group within
3248
+ the subpattern that has alternatives. If there is no such group within the
3249
+ subpattern, (*THEN) causes the subroutine match to fail.
3250
+ </P>
3251
+ <br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
3252
+ <P>
3253
+ <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
3254
+ <b>pcresyntax</b>(3), <b>pcre</b>(3), <b>pcre16(3)</b>, <b>pcre32(3)</b>.
3255
+ </P>
3256
+ <br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
3257
+ <P>
3258
+ Philip Hazel
3259
+ <br>
3260
+ University Computing Service
3261
+ <br>
3262
+ Cambridge CB2 3QH, England.
3263
+ <br>
3264
+ </P>
3265
+ <br><a name="SEC30" href="#TOC1">REVISION</a><br>
3266
+ <P>
3267
+ Last updated: 14 June 2015
3268
+ <br>
3269
+ Copyright &copy; 1997-2015 University of Cambridge.
3270
+ <br>
3271
+ <p>
3272
+ Return to the <a href="index.html">PCRE index page</a>.
3273
+ </p>