rabbit-slide-kou-pgconf-asia-2017 2017.12.5.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: de0d4303fea6331a03b9f01805dff512058c84c2
4
+ data.tar.gz: 93bd721275f432b353a8889560bf3479e0040661
5
+ SHA512:
6
+ metadata.gz: 8818106c63119ef4092dae199fdbc24bdcc853d9d9f9980590cfbdf446c1816f68d5f850b5f9ac5dcc1594d56731a4da1d1ed5f3a5f0d10746b00823d4dda3d4
7
+ data.tar.gz: 2a80eb664311c78e8a16d0823f5424e946cb3ae35056e49d34586efaa182432a3eaee6ddd9f0625358513f741989c17685477fca771593fe169a6547932b27c0
data/.rabbit ADDED
@@ -0,0 +1 @@
1
+ pgroonga-2.rab
data/README.rd ADDED
@@ -0,0 +1,45 @@
1
+ = PGroonga 2 – Make PostgreSQL rich full text search system backend!
2
+
3
+ PGroonga 2.0 has been released after two years of development since PGroonga 1.0.0. PGroonga 1.0.0 provided only fast full text search with support for all languages. That matters because PostgreSQL lacks this feature. PGroonga 2.0 provides more useful features for implementing a rich full text search system with PostgreSQL. This session shows how to implement a rich full text search system with PostgreSQL!
4
+
5
+ This talk describes PGroonga, which resolves these problems.
6
+
7
+ == License
8
+
9
+ === Slide
10
+
11
+ CC BY-SA 4.0
12
+
13
+ Use the following for author attribution:
14
+
15
+ * Kouhei Sutou
16
+
17
+ === Images
18
+
19
+ ==== Groonga and PGroonga logos
20
+
21
+ CC BY 3.0
22
+
23
+ Author: The Groonga Project
24
+
25
+ They are used in the page header and on some pages of the slides.
26
+
27
+ == For author
28
+
29
+ === Show
30
+
31
+ rake
32
+
33
+ === Publish
34
+
35
+ rake publish
36
+
37
+ == For viewers
38
+
39
+ === Install
40
+
41
+ gem install rabbit-slide-kou-pgconf-asia-2017
42
+
43
+ === Show
44
+
45
+ rabbit rabbit-slide-kou-pgconf-asia-2017.gem
data/Rakefile ADDED
@@ -0,0 +1,17 @@
1
+ require "rabbit/task/slide"
2
+
3
+ # Edit ./config.yaml to customize meta data
4
+
5
+ spec = nil
6
+ Rabbit::Task::Slide.new do |task|
7
+ spec = task.spec
8
+ # spec.files += Dir.glob("doc/**/*.*")
9
+ # spec.files -= Dir.glob("private/**/*.*")
10
+ spec.add_runtime_dependency("rabbit-theme-groonga")
11
+ end
12
+
13
+ desc "Tag #{spec.version}"
14
+ task :tag do
15
+ sh("git", "tag", "-a", spec.version.to_s, "-m", "Publish #{spec.version}")
16
+ sh("git", "push", "--tags")
17
+ end
data/config.yaml ADDED
@@ -0,0 +1,27 @@
1
+ ---
2
+ id: pgconf-asia-2017
3
+ base_name: pgroonga-2
4
+ tags:
5
+ - rabbit
6
+ - postgresql
7
+ - pgroonga
8
+ - pgconfasia
9
+ presentation_date: '2017-12-05'
10
+ presentation_start_time: '2017-12-05T16:20:00+09:00'
11
+ presentation_end_time: '2017-12-05T17:00:00+09:00'
12
+ version: 2017.12.5.0
13
+ licenses:
14
+ - CC-BY-SA-4.0
15
+ - CC-BY-3.0
16
+ slideshare_id: pgconfasia2017
17
+ speaker_deck_id:
18
+ ustream_id:
19
+ vimeo_id:
20
+ youtube_id:
21
+ author:
22
+ markup_language: :rd
23
+ name: Kouhei Sutou
24
+ email: kou@clear-code.com
25
+ rubygems_user: kou
26
+ slideshare_user: kou
27
+ speaker_deck_user:
Binary file
Binary file
Binary file
data/pgroonga-2.rab ADDED
@@ -0,0 +1,1048 @@
1
+ = PGroonga 2
2
+
3
+ : subtitle
4
+ Make PostgreSQL rich full text search system backend!
5
+ : author
6
+ Kouhei Sutou
7
+ : institution
8
+ ClearCode Inc.
9
+ : content-source
10
+ PGConf.ASIA 2017
11
+ : date
12
+ 2017-12-05
13
+ : start-time
14
+ 2017-12-05T16:20:00+09:00
15
+ : end-time
16
+ 2017-12-05T17:00:00+09:00
17
+ : theme
18
+ .
19
+
20
+ = Targets\n(('note:対象者'))
21
+
22
+ * Want to implement full text search with PostgreSQL\n
23
+ (('note:PostgreSQLで全文検索したい'))
24
+ * Not good at full text search\n
25
+ (('note:全文検索はよく知らない'))
26
+ * PGroonga 1.0.0 users\n
27
+ (('note:PGroonga 1.0.0は使ったことがある'))
28
+
29
+ = Abbreviations\n(('note:略語'))
30
+
31
+ * PG: PostgreSQL\n
32
+ (('note:ポスグレ: PostgreSQL'))
33
+ * FTS: Full text search\n
34
+ (('note:FTS: 全文検索'))
35
+
36
+ = FTS system: Targets\n(('note:全文検索システム:対象'))
37
+
38
+ (('tag:center'))
39
+ (('tag:large'))
40
+ (('tag:margin-bottom * 2'))
41
+ Many texts\n
42
+ (('note:大量のテキスト'))
43
+
44
+ * e.g.: Text data in office docs in file servers\n
45
+ (('note:例:ファイルサーバー内のオフィス文書内のテキスト'))
46
+ * e.g.: Item descriptions, chat logs, Wiki data, ...\n
47
+ (('note:例:商品説明やチャットログ、Wikiのデータなど'))
48
+
49
+ = FTS system: Goal\n(('note:全文検索システム:目的'))
50
+
51
+ Provide\n
52
+ needed info\n
53
+ when you need\n
54
+ (('note:必要な情報を必要なときに提供すること'))
55
+
56
+ = Provide needed info\n(('note:必要な情報を提供'))
57
+
58
+ * 😞 Not found\n
59
+ (('note:探している情報が見つからない'))
60
+ * 😃 Found\n
61
+ (('note:探している情報が見つかる'))
62
+ * 😆 Found ((*unconsciously needed*)) info too!\n
63
+ (('note:意識していなかったけど実は欲しかった情報も見つかる!'))
64
+
65
+ = When you need\n(('note:必要なときに活用'))
66
+
67
+ * 😞 Takes a long time to find\n
68
+ (('note:なかなか見つからない'))
69
+ * 😃 Find in no time\n
70
+ (('note:すぐに見つかる'))
71
+ * 😆 Already found\n
72
+ (('note:すでに見つかっていた'))
73
+ * e.g.: Recommendation\n
74
+ (('note:例:レコメンデーション'))
75
+
76
+ = How to impl.: Options\n(('note:実装方法:選択肢'))
77
+
78
+ * Use FTS server\n
79
+ (('note:全文検索サーバーを使う'))
80
+ * Use PostgreSQL\n
81
+ (('note:PostgreSQLを使う'))
82
+
83
+ = FTS server: Pros\n(('note:全文検索サーバー案:メリット'))
84
+
85
+ * Provides all basic features\n
86
+ (('note:必要な機能が揃っている'))
87
+ * Provides advanced features\n
88
+ (('note:+αの機能もある'))
89
+ * Fast\n
90
+ (('note:速い'))
91
+
92
+ = FTS server: Cons1\n(('note:全文検索サーバー案:デメリット1'))
93
+
94
+ * Large implementation cost\n
95
+ (('note:実装コスト大'))
96
+ * Learn how to use from scratch\n
97
+ (('note:使い方を1から学ぶ必要がある'))
98
+ * How to implement data sync?\n
99
+ (('note:マスターデータの同期はどうする?'))
100
+
101
+ = FTS server: Cons2\n(('note:全文検索サーバー案:デメリット2'))
102
+
103
+ * Large maintenance cost\n
104
+ (('note:メンテナンスコスト大'))
105
+ * Learn how to operate from scratch\n
106
+ (('note:運用方法を1から学ぶ必要がある'))
107
+
108
+ = PostgreSQL: Pros1\n(('note:PostgreSQL案:メリット1'))
109
+
110
+ * Less implementation cost\n
111
+ (('note:実装コスト小'))
112
+ * Fewer things to learn\n
113
+ (('note:新しく覚えることが少ない'))
114
+ * Can manage data at the same place\n
115
+ (('note:データの一元管理'))
116
+
117
+ = PostgreSQL: Pros2\n(('note:PostgreSQL案:メリット2'))
118
+
119
+ * Less operation cost\n
120
+ (('note:メンテナンスコスト小'))
121
+ * The current operation knowledge is reusable\n
122
+ (('note:既存の運用ノウハウを使える'))
123
+
124
+ = PostgreSQL: Cons\n(('note:PostgreSQL案:デメリット'))
125
+
126
+ * Built-in features aren't enough\n
127
+ (('note:組込機能では機能不足'))
128
+ * SQL limits efficiency\n
129
+ (('note:SQLの表現力不足'))
130
+ * e.g.: SQL may need multiple queries for a task that an FTS server can do in one query\n
131
+ (('note:例:全文検索サーバーなら1クエリーで実現できる処理にSQLだと複数クエリー必要なことがある'))
132
+
133
+ = The 3rd option\n(('note:第3の選択肢'))
134
+
135
+ * Use FTS engine via PostgreSQL (SQL)\n
136
+ (('note:PostgreSQL経由(SQL)で全文検索エンジンを使う'))
137
+
138
+ = Pros\n(('note:メリット'))
139
+
140
+ * Fast and rich features\n
141
+ (('note:高速で豊富な機能'))
142
+ * Less implementation cost\n
143
+ (('note:実装コスト小'))
144
+ * Less operation cost\n
145
+ (('note:メンテナンスコスト小'))
146
+
147
+ = Cons\n(('note:デメリット'))
148
+
149
+ * Need PostgreSQL extension\n
150
+ (('note:PostgreSQLに拡張機能が必要'))
151
+ * Not available on DBaaS\n
152
+ (('note:DBaaSで使えない'))
153
+
154
+ = Option: No FTS knowledge\n(('note:オススメの選択肢:全文検索の知識ナシ'))
155
+
156
+ * Need only simple features\n
157
+ (('note:まだ単純な機能で十分'))
158
+ * Small data: LIKE with PostgreSQL\n
159
+ (('note:データ少:PostgreSQLでLIKE'))
160
+ * Need up-to-date FTS features\n
161
+ (('note:いまどきの全文検索機能が必要'))
162
+ * FTS engine via PostgreSQL\n
163
+ (('note:PostgreSQL経由で全文検索エンジン'))
164
+
165
+ = Option: With FTS knowledge\n(('note:オススメの選択肢:全文検索の知識アリ'))
166
+
167
+ * Need tuned FTS feature\n
168
+ (('note:カリカリにチューニングしたい'))
169
+ * PostgreSQL + FTS server\n
170
+ (('note:PostgreSQL+全文検索サーバー'))
171
+ * Others\n
172
+ (('note:それ以外'))
173
+ * FTS engine via PostgreSQL\n
174
+ (('note:PostgreSQL経由で全文検索エンジン'))
175
+
176
+ = Described option\n(('note:説明する選択肢'))
177
+
178
+ FTS engine via\n
179
+ PostgreSQL\n
180
+ (('note:PostgreSQL経由で全文検索エンジン'))
181
+
182
+ = FTS engine: Groonga\n(('note:全文検索エンジン:Groonga(ぐるんが)'))
183
+
184
+ * Embeddable FTS engine\n
185
+ (('note:組込可能な全文検索エンジン'))
186
+ * PGroonga: Groonga in PostgreSQL\n
187
+ (('note:PGroonga:PostgreSQLに組込'))
188
+ * Usable as FTS server\n
189
+ (('note:全文検索サーバーとして単独でも使用可能'))
190
+ * PostgreSQL + FTS server architecture is also available\n
191
+ (('note:PostgreSQL+全文検索サーバー構成もできる'))
192
+
193
+ = Groonga's strength: data update\n(('note:Groongaの得意な事:データの追加・更新'))
194
+
195
+ * Make fresh data searchable!\n
196
+ (('note:新鮮な情報をすぐ検索可能!'))
197
+ * No need for batch updates\n
198
+ (('note:バッチで更新しなくてもよい'))
199
+ * Can use as chat backend\n
200
+ (('note:チャットくらいの頻度でもOK'))\n
201
+ e.g.: Zulip uses PGroonga\n
202
+ (('note:例:ZulipはPGroongaを採用'))
203
+
204
+ = Groonga's strength: data update\n(('note:Groongaの得意な事:データの追加・更新'))
205
+
206
+ * Keep search performance while updating!\n
207
+ (('note:更新中も検索性能が落ちない!'))
208
+ * Updatable when there are many search users\n
209
+ (('note:利用ユーザーが多い時でも更新可能'))
210
+
211
+ = PGroonga\n(('note:PGroonga(ぴーじーるんが)'))
212
+
213
+ * PostgreSQL index\n
214
+ (('note:PostgreSQLのインデックス'))
215
+ * Alternative to GIN, RUM, ...\n
216
+ (('note:GIN・RUMなどと同じレイヤー'))
217
+ * Usage\n
218
+ (('note:使用方法'))
219
+ * (({CREATE INDEX ...}))\n
220
+ (({USING PGroonga ...}))
221
+
222
+ = PostgreSQL and FTS\n(('note:PostgreSQLと全文検索'))
223
+
224
+ * LIKE: Built-in(('note:(組込機能)'))
225
+ * textsearch: Built-in(('note:(組込機能)'))
226
+ * pg_trgm: Contrib(('note:(標準添付)'))
227
+ * Bundled in the archive\n
228
+ (('note:アーカイブには含まれている'))
229
+ * Need to install separately\n
230
+ (('note:別途インストールすれば使える'))
231
+
232
+ = LIKE and performance\n(('note:LIKEと速度'))
233
+
234
+ * Small data\n
235
+ (('note:少ないデータ'))
236
+ * Enough performance\n
237
+ (('note:十分実用的'))
238
+ * Not small data\n
239
+ (('note:少なくないデータ'))
240
+ * Need to tune\n
241
+ (('note:性能問題アリ'))
242
+
243
+ = LIKE and FTS system\n(('note:LIKEと全文検索システム'))
244
+
245
+ 👍Enough performance\n
246
+ in most cases\n
247
+ (('note:速度が実用的なことも多い'))
248
+
249
+ * Data are small in many cases\n
250
+ (('note:少ないデータなら'))
251
+
252
+ = LIKE and FTS system\n(('note:LIKEと全文検索システム'))
253
+
254
+ 👎Unable to sort\n
255
+ (('note:それっぽい順のソート不可'))
256
+
257
+ * Sort is important in FTS\n
258
+ (('note:全文検索ではソート順が重要'))
259
+ * Users check only\n
260
+ the first N entries\n
261
+ (('note:ユーザーは先頭N件しか見ない'))
262
+
263
+ = textsearch
264
+
265
+ * 👍Fast search by index\n
266
+ (('note:インデックスを作るので速い'))
267
+ * Need module for each lang\n
268
+ (('note:言語毎にモジュールが必要'))
269
+ * 👍Modules for English, French, ... are built-in\n
270
+ (('note:英語やフランス語などは組込'))
271
+ * 👎Modules for languages in Asia aren't maintained\n
272
+ (('note:アジア圏の言語用のモジュールはメンテされていない'))
273
+
274
+ = pg_trgm
275
+
276
+ * 👍Fast search by index\n
277
+ (('note:インデックスを作るので速い'))
278
+ * 👎Asian languages aren't well supported\n
279
+ (('note:アジア圏の言語のサポートは十分ではない'))
280
+ * 👎Unable to sort\n
281
+ (('note:それっぽい順のソート不可'))
282
+
283
+ = RUM
284
+
285
+ * RUM = GIN + position\n
286
+ (('note:RUMは位置情報付きのGIN'))
287
+ * (('tag:xx-small'))
288
+ ((<"https://github.com/postgrespro/rum"|URL:https://github.com/postgrespro/rum>))
289
+ * pg_trgm/pg_bigm are slow when there are many matches\n
290
+ (('note:pg_trgmとpg_bigmはマッチ数が多いと遅くなる'))
291
+ * RUM may solve it\n
292
+ (('note:GINの代わりにRUMを使うことで解決できるかも!'))
293
+
294
+ = PGroonga
295
+
296
+ * 👍Fast search by index\n
297
+ (('note:インデックスを作るので速い'))
298
+ * 👍Sortable\n
299
+ (('note:それっぽい順のソート可'))
300
+ * 👍Support all languages\n
301
+ (('note:全言語対応'))
302
+ * 👎Need to install separately\n
303
+ (('note:別途インストールする必要アリ'))
304
+
305
+ = FTS system with PostgreSQL\n(('note:PostgreSQLで全文検索システム'))
306
+
307
+ * PGroonga is the best!💯\n
308
+ (('note:PGroongaがベスト!'))
309
+ * PGroonga
310
+ * Fast(('note:(高速)'))
311
+ * Support all langs(('note:(全言語対応)'))
312
+ * Sortable(('note:(それっぽい順でソート可)'))
313
+
314
+ = FTS system: Basic features\n(('note:全文検索システム:基本機能'))
315
+
316
+ * Fast FTS + sort\n
317
+ (('note:高速全文検索+ソート'))
318
+ * Show texts around keyword\n
319
+ (('note:キーワード周辺テキスト表示'))
320
+ * Highlight keyword\n
321
+ (('note:検索キーワードハイライト'))
322
+
323
+ = FTS system: Adv. features\n(('note:全文検索システム:高度な機能'))
324
+
325
+ * Auto complete\n
326
+ (('note:オートコンプリート'))
327
+ * Similar search\n
328
+ (('note:類似文書検索'))
329
+ * Synonym expansion\n
330
+ (('note:同義語展開'))
331
+
332
+ = PGroonga 1.0.0
333
+
334
+ (('tag:center'))
335
+ Only ↓ are supported\n
336
+ (('note:以下の機能のみ対応'))
337
+
338
+ * Fast FTS + sort\n
339
+ (('note:高速全文検索+ソート'))
340
+ * Show texts around keyword\n
341
+ (('note:キーワード周辺テキスト表示'))
342
+
343
+ = PGroonga 2
344
+
345
+ All features are supported!\n
346
+ (('note:全機能対応!'))
347
+
348
+ = PGroonga 1.0.0 → 2
349
+
350
+ * 😆 Many new features\n
351
+ (('note:たくさんの新機能'))
352
+ * 😆 Improved performance\n
353
+ (('note:性能改善'))
354
+ * 😞 API has changed\n
355
+ (('note:APIが変わった'))
356
+
357
+ = API change\n(('note:API変更'))
358
+
359
+ (('tag:center'))
360
+ Operator is changed\n
361
+ (('note:演算子変更'))
362
+
363
+ @@ → &@~
364
+ %% → &@
365
+ ...
366
+
367
+ = API change\n(('note:API変更'))
368
+
369
+ (('tag:center'))
370
+ (({pgroonga})) schema is deprecated\n
371
+ (('note:pgroongaスキーマを非推奨に'))
372
+
373
+ pgroonga.score → pgroonga_score
374
+ pgroonga.flush → pgroonga_flush
375
+ ...
376
+
377
+ = App for PGroonga 1.0.0\n(('note:PGroonga 1.0.0用アプリ'))
378
+
379
+ * Broken with PGroonga 2?\n
380
+ (('note:PGroonga 2では動かない?'))
381
+ * No! It works without any changes!\n
382
+ (('note:何も変更しなくても動くよ!'))
383
+
384
+ (('tag:center'))
385
+ Great! But why?\n
386
+ (('note:いいじゃん!でもなんで動くの?'))\n
387
+ ↓\n
388
+ "Painless upgrade" technique
389
+
390
+ = Painless upgrade
391
+
392
+ * PGroonga 2 provides\n
393
+ both 1 API and 2 API\n
394
+ (('note:PGroonga 2は1用のAPIも2用のAPIも両方提供'))
395
+ * Can use PGroonga 2 with 1 API\n
396
+ (('note:PGroonga 1のAPIでPGroonga 2を使える'))
397
+
398
+ = Painless upgrade
399
+
400
+ * The last PGroonga 1.X\n
401
+ provides both 1 API and\n
402
+ partially 2 API\n
403
+ (('note:PGroonga 1系の最終版は1用のAPIも2用のAPIの一部も提供'))
404
+ * Can use PGroonga 1 with 2 API\n
405
+ (('note:PGroonga 2のAPIでPGroonga 1を使える'))
406
+
407
+ = Painless upgrade
408
+
409
+ * PGroonga 2 keeps 1 API\n
410
+ (('note:PGroonga 2の間は1のAPIを維持'))
411
+ * PGroonga 3 will drop 1 API\n
412
+ (('note:PGroonga 3で1のAPIを削除予定'))
413
+ * Just need to upgrade API before 3\n
414
+ (('note:PGroonga 3までにAPIをアップグレードすればよい'))
415
+
416
+ = Painless upgrade
417
+
418
+ * If an app for PGroonga 1.0.0 doesn't work with PGroonga 2\n
419
+ (('note:PGroonga 1.0.0用のアプリがPGroonga 2で動かない'))
420
+ * It's a bug. Please report it!\n
421
+ (('note:バグなので報告してね!'))
422
+
423
+ = FTS system: Basic features\n(('note:全文検索システム:基本機能'))
424
+
425
+ * Fast FTS + sort\n
426
+ (('note:高速全文検索+ソート'))
427
+ * Show texts around keyword\n
428
+ (('note:キーワード周辺テキスト表示'))
429
+ * Highlight keyword\n
430
+ (('note:検索キーワードハイライト'))
431
+
432
+ = Fast FTS + sort\n(('note:高速全文検索+ソート'))
433
+
434
+ # image
435
+ # src = images/php-document-search-search.png
436
+ # relative_height = 100
437
+
438
+ = Table definition
439
+
440
+ # coderay sql
441
+
442
+ CREATE TABLE entries (
443
+ -- Need primary key
444
+ -- It's needed for sort
445
+ id integer PRIMARY KEY,
446
+ title text,
447
+ content text
448
+ );
449
+
450
+ = Index definition
451
+
452
+ # coderay sql
453
+
454
+ -- For FTS.
455
+ -- The default is good enough!
456
+ CREATE INDEX entries_full_text_search
457
+ ON entries
458
+ -- "USING PGroonga" is important!
459
+ -- Primary key is for sort!
460
+ USING PGroonga (id, title, content);
461
+
462
+ = Insert data
463
+
464
+ # coderay sql
465
+
466
+ -- Normal INSERT.
467
+ INSERT INTO entries
468
+ VALUES (1,
469
+ 'Fast FTS with Groonga!',
470
+ 'Fast FTS is needed!');
471
+
472
+ = FTS
473
+
474
+ # coderay sql
475
+
476
+ SELECT title FROM entries
477
+ WHERE
478
+ -- &@~ is for FTS
479
+ -- AND search with "search" and "fast"
480
+ title &@~ 'search fast' OR
481
+ content &@~ 'search fast';
482
+
483
+ = FTS: LIKE
484
+
485
+ # coderay sql
486
+
487
+ SELECT title FROM entries
488
+ WHERE
489
+ -- Index search for LIKE is supported
490
+ -- = Improve app perf without any changes
491
+ -- NOTE: &@~ is faster than LIKE
492
+ title LIKE '%search%' OR
493
+ content LIKE '%search%';
494
+
495
+ = Sort
496
+
497
+ # coderay sql
498
+
499
+ SELECT
500
+ title,
501
+ -- pgroonga_score(TABLE_NAME) returns
502
+ -- precision as number
503
+ pgroonga_score(entries) AS score
504
+ FROM entries
505
+ WHERE -- ...
506
+ -- Sort by precision
507
+ ORDER BY score DESC LIMIT 10;
508
+
509
+ = Highlight keyword\n(('note:キーワードハイライト'))
510
+
511
+ # image
512
+ # src = images/php-document-search-search.png
513
+ # relative_height = 100
514
+
515
+ = Highlight for HTML
516
+
517
+ # coderay sql
518
+
519
+ SELECT
520
+ pgroonga_highlight_html(
521
+ title,
522
+ -- Extract keywords from query
523
+ pgroonga_query_extract_keywords('search fast'))
524
+ FROM entries
525
+ WHERE title &@~ 'search fast' OR
526
+ content &@~ 'search fast';
527
+
528
+ = Highlight for HTML: Example
529
+
530
+ # coderay html
531
+
532
+ Fast search with <Groonga>!
533
+
534
+ <span class="keyword">Fast</span>
535
+ ↑↓ Keywords are marked up with "class"
536
+ <span class="keyword">search</span>!
537
+ with &lt;Groonga&gt;! ← Escape tag
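The output shape above can be imitated in plain Ruby. This is only a sketch of the idea (escape everything, then wrap each keyword in a `span`), not how PGroonga implements `pgroonga_highlight_html`. The helper name `highlight_html` and the whitespace split standing in for `pgroonga_query_extract_keywords` are assumptions made for illustration.

```ruby
require "cgi"

# Hypothetical helper: mimics only the output shape shown on the slide.
# It is not how PGroonga implements pgroonga_highlight_html.
def highlight_html(text, keywords)
  return CGI.escapeHTML(text) if keywords.empty?

  # Longest keyword first, case-insensitive. The capture group makes
  # String#split keep the matched keywords in its result.
  alternatives = keywords.sort_by { |keyword| -keyword.length }
  pattern = Regexp.new("(#{Regexp.union(alternatives).source})",
                       Regexp::IGNORECASE)

  parts = text.split(pattern).each_with_index.map do |part, index|
    escaped = CGI.escapeHTML(part)
    # Odd indexes hold the captured keywords.
    index.odd? ? %(<span class="keyword">#{escaped}</span>) : escaped
  end
  parts.join
end

# Simplified stand-in for pgroonga_query_extract_keywords:
# a plain whitespace split that ignores query syntax.
keywords = "search fast".split
puts highlight_html("Fast search with <Groonga>!", keywords)
```

Escaping each part separately keeps the inserted `span` tags intact while tags in the source text, such as `<Groonga>`, are still escaped.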
538
+
539
+ = Texts around keyword\n(('note:キーワード周辺テキスト'))
540
+
541
+ # image
542
+ # src = images/php-document-search-search.png
543
+ # relative_height = 100
544
+
545
+ = Texts around keyword for HTML
546
+
547
+ # coderay sql
548
+
549
+ SELECT
550
+ pgroonga_snippet_html(
551
+ content,
552
+ -- Extract keywords from query
553
+ pgroonga_query_extract_keywords('search fast'))
554
+ FROM entries
555
+ WHERE title &@~ 'search fast' OR
556
+ content &@~ 'search fast';
557
+
558
+ = Example
559
+
560
+ # coderay html
561
+
562
+ ...fast search with <Groonga>!...
563
+
564
+ ARRAY[
565
+ ↓ First
566
+ '<span class="keyword">fast</span>
567
+ ↑↓ Keywords are marked up with "class"
568
+ <span class="keyword">search</span>!
569
+ with &lt;Groonga&gt;!', ← Escape tag
570
+ '...' ← Second
571
+ ]
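As a rough illustration of the array result above, the following Ruby sketch collects one HTML-escaped snippet per keyword occurrence. The helper `snippets_html` and its `width` parameter are hypothetical. The real `pgroonga_snippet_html` chooses snippet boundaries and the surrounding "..." itself, so treat this as a model of the idea only.

```ruby
require "cgi"

# Hypothetical helper, for illustration only. Returns an array of
# snippets, one per keyword occurrence, with the keyword marked up.
# Assumes ASCII-like text where downcase keeps the string length.
def snippets_html(text, keyword, width: 20)
  return [] if keyword.empty?

  results = []
  lowered = text.downcase
  needle = keyword.downcase
  offset = 0
  while (index = lowered.index(needle, offset))
    from = [index - width, 0].max
    to = [index + keyword.length + width, text.length].min
    before = CGI.escapeHTML(text[from...index])
    hit = CGI.escapeHTML(text[index, keyword.length])
    after = CGI.escapeHTML(text[(index + keyword.length)...to])
    results << %(#{before}<span class="keyword">#{hit}</span>#{after})
    # Continue after this snippet so snippets do not overlap.
    offset = to
  end
  results
end

puts snippets_html("Groonga provides fast search with <Groonga>!",
                   "search", width: 100)
```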
572
+
573
+ = FTS system: Adv. features\n(('note:全文検索システム:高度な機能'))
574
+
575
+ * Auto complete\n
576
+ (('note:オートコンプリート'))
577
+ * Similar search\n
578
+ (('note:類似文書検索'))
579
+ * Synonym expansion\n
580
+ (('note:同義語展開'))
581
+
582
+ = Auto complete\n(('note:オートコンプリート'))
583
+
584
+ # image
585
+ # src = images/php-document-search.png
586
+ # relative_height = 100
587
+
588
+ = Auto complete: Preparation\n(('note:オートコンプリート:準備'))
589
+
590
+ * Master table\n
591
+ (('note:マスターテーブル'))
592
+ * Candidate\n
593
+ (('note:候補:(例:牛乳)'))
594
+ * Readings in Katakana\n
595
+ (Only for Japanese)\n
596
+ (('note:ヨミ(日本語の場合。カタカナ。複数登録可。)'))
597
+ * (('note:例:ギュウニュウ・ミルク'))
598
+
599
+ = Auto complete: Implementation\n(('note:オートコンプリート:実装方法'))
600
+
601
+ * OR search with ...
602
+ * Prefix search against readings\n
603
+ (Only for Japanese)\n
604
+ (('note:ヨミを前方一致検索(日本語の場合。)'))
605
+ * Loose FTS against candidate\n
606
+ (('note:候補をゆるく全文検索'))
607
+ * Sort by candidate\n
608
+ (('note:候補でソート'))
609
+
610
+ (('tag:xx-small'))
611
+ ((<"https://pgroonga.github.io/how-to/auto-complete.html"|URL:https://pgroonga.github.io/how-to/auto-complete.html>))
612
+
613
+ = Table definition
614
+
615
+ # coderay sql
616
+
617
+ CREATE TABLE terms (
618
+ term text, -- Candidate
619
+ readings text[] -- Readings
620
+ );
621
+
622
+ = Data example
623
+
624
+ # coderay sql
625
+
626
+ INSERT INTO terms VALUES (
627
+ 'milk', -- Candidate
628
+ ARRAY[
629
+ -- Reading in Katakana
630
+ 'ギュウニュウ', -- "milk" in Japanese
631
+ -- Multiple readings
632
+ 'ミルク' -- "milk" in Katakana
633
+ ]
634
+ );
635
+
636
+ = Data management\n(('note:データ管理'))
637
+
638
+ * Easy to maintain because it's a normal table\n
639
+ (('note:普通のテーブルなので管理が楽'))
640
+ * Easy to insert/delete/update\n
641
+ (('note:追加・削除・更新が楽'))
642
+ * Normal backup and replication\n
643
+ (('note:ダンプ・リストアもレプリケーションもいつも通り'))
644
+
645
+ = Index for prefix search\n(('note:前方一致用インデックス'))
646
+
647
+ # coderay sql
648
+
649
+ CREATE INDEX prefix_search ON terms
650
+ USING PGroonga
651
+ -- ...text_array_term_search...
652
+ (readings
653
+ pgroonga_text_array_term_search_ops_v2);
654
+
655
+ = Index for loose FTS\n(('note:緩い全文検索用インデックス'))
656
+
657
+ # coderay sql
658
+
659
+ CREATE INDEX loose_search ON terms
660
+ USING PGroonga (term)
661
+ -- Tokenizer for loose full text search
662
+ WITH (tokenizer='TokenBigramSplitSymbolAlphaDigit');
663
+
664
+ = How to search\n(('note:検索方法'))
665
+
666
+ # coderay sql
667
+
668
+ SELECT term FROM terms
669
+ -- Prefix search against readings
670
+ WHERE readings &^~ '${INPUT}' OR
671
+ -- Loose full text search
672
+ term &@ '${INPUT}'
673
+ ORDER BY term LIMIT 10; -- Sort
674
+
675
+ = Search example: Candidate\n(('note:検索例:候補'))
676
+
677
+ # coderay sql
678
+
679
+ -- User inputs "il"
680
+ SELECT term FROM terms
681
+ -- Prefix search against readings
682
+ WHERE readings &^~ 'il' OR
683
+ -- Loose full text search (Hit)
684
+ term &@ 'il'
685
+ ORDER BY term LIMIT 10; -- Sort
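Why can a fragment such as "il" hit the candidate "milk"? A minimal Ruby sketch, assuming the tokenizer behaves like a plain character bigram splitter for alphabetic text, is shown here. The helpers `bigrams` and `loose_match?` are hypothetical and ignore normalization details that Groonga handles.

```ruby
# Split text into overlapping two-character tokens (bigrams).
def bigrams(text)
  text.downcase.chars.each_cons(2).map(&:join)
end

# Simplified match: the query's bigrams must appear as one
# contiguous run in the candidate's bigrams.
def loose_match?(candidate, input)
  query = bigrams(input)
  return false if query.empty?

  bigrams(candidate).each_cons(query.size).include?(query)
end

p bigrams("milk")
p loose_match?("milk", "il")
p loose_match?("milk", "li")
```

Because every pair of adjacent characters is indexed, any fragment of two or more characters can match in the middle of a word, which is what makes the search "loose".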
686
+
687
+ = Search example: Katakana\n(('note:検索例:カタカナ'))
688
+
689
+ # coderay sql
690
+
691
+ -- User inputs "ギュウ"
692
+ SELECT term FROM terms
693
+ -- Prefix search against readings (Hit)
694
+ WHERE readings &^~ 'ギュウ' OR
695
+ -- Loose full text search
696
+ term &@ 'ギュウ'
697
+ ORDER BY term LIMIT 10; -- Sort
698
+
699
+ = Search example: Hiragana\n(('note:検索例:ひらがな'))
700
+
701
+ # coderay sql
702
+
703
+ -- User inputs "ぎゅう"
704
+ SELECT term FROM terms
705
+ -- Prefix search against readings (Hit)
706
+ WHERE readings &^~ 'ぎゅう' OR
707
+ -- Loose full text search
708
+ term &@ 'ぎゅう'
709
+ ORDER BY term LIMIT 10; -- Sort
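The Katakana and Hiragana examples hit the same reading because the input is effectively normalized before the prefix comparison. A minimal Ruby sketch of that idea follows. It covers only Hiragana to Katakana conversion (Romaji input is left out), and the helper names `to_katakana` and `reading_prefix_hit?` are hypothetical, not part of PGroonga.

```ruby
# Convert Hiragana to Katakana so both forms hit Katakana readings.
def to_katakana(text)
  text.tr("ぁ-ん", "ァ-ン")
end

# Hypothetical stand-in for the prefix search against readings.
def reading_prefix_hit?(readings, input)
  return false if input.empty?

  normalized = to_katakana(input)
  readings.any? { |reading| reading.start_with?(normalized) }
end

readings = ["ギュウニュウ", "ミルク"]
p reading_prefix_hit?(readings, "ギュウ")
p reading_prefix_hit?(readings, "ぎゅう")
p reading_prefix_hit?(readings, "にゅう")
```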
710
+
711
+ = Search example: Romaji\n(('note:検索例:ローマ字'))
712
+
713
+ # coderay sql
714
+
715
+ -- User inputs "gyu"
716
+ SELECT term FROM terms
717
+ -- Prefix search against readings (Hit)
718
+ WHERE readings &^~ 'gyu' OR
719
+ -- Loose full text search
720
+ term &@ 'gyu'
721
+ ORDER BY term LIMIT 10; -- Sort
722
+
723
+ = Synonym expansion\n(('note:同義語展開'))
724
+
725
+ * Synonym\n
726
+ (('note:同義語'))
727
+ * Same meaning but different notation\n
728
+ (('note:同じ意味だが表記が異なる語'))
729
+ * e.g.: "PostgreSQL" and "PG"\n
730
+ (('note:例:「PostgreSQL」と「ポスグレ」'))
731
+
732
+ = Synonym expansion\n(('note:同義語展開'))
733
+
734
+ * Users don't want to care\n
735
+ (('note:ユーザーは気にしたくない'))
736
+ * Synonym expansion\n
737
+ (('note:同義語展開'))
738
+ * OR search with all synonyms\n
739
+ (('note:同義語すべてでOR検索'))
740
+
741
+ = Implementation\n(('note:実装方法'))
742
+
743
+ * Create synonym table\n
744
+ (('note:同義語管理テーブルを作成'))
745
+ * Expand synonyms in query\n
746
+ (('note:クエリー内の同義語を展開'))
747
+ * Search by expanded query\n
748
+ (('note:展開後のクエリーで検索'))
749
+
750
+ (('tag:xx-small'))
751
+ ((<"https://pgroonga.github.io/reference/functions/pgroonga-query-expand.html"|URL:https://pgroonga.github.io/reference/functions/pgroonga-query-expand.html>))
752
+
753
+ = Table definition
754
+
755
+ # coderay sql
756
+ CREATE TABLE synonyms (
757
+ -- Term to be expanded
758
+ term text,
759
+ -- Synonym list.
760
+ -- Including the "term" itself.
761
+ -- If you don't include the "term",
762
+ -- the "term" itself becomes unsearchable.
763
+ terms text[]
764
+ );
765
+
766
+ = Data example
767
+
768
+ # coderay sql
769
+ INSERT INTO synonyms
770
+ VALUES ('PostgreSQL', -- Expand "PostgreSQL"
771
+ ARRAY['PostgreSQL', 'PG']),
772
+ ('PG', -- Expand "PG"
773
+ ARRAY['PG', 'PostgreSQL']);
774
+
775
+ = Data management\n(('note:データ管理'))
776
+
777
+ * Easy to maintain because it's a normal table\n
778
+ (('note:普通のテーブルなので管理が楽'))
779
+ * Easy to insert/delete/update\n
780
+ (('note:追加・削除・更新が楽'))
781
+ * Normal backup and replication\n
782
+ (('note:ダンプ・リストアもレプリケーションもいつも通り'))
783
+
784
+ = Index definition
785
+
786
+ # coderay sql
787
+ CREATE INDEX synonym_search ON synonyms
788
+ USING PGroonga
789
+ -- ...text_term_search...
790
+ -- For equal search
791
+ (term pgroonga_text_term_search_ops_v2);
792
+
793
+ = Confirm\n(('note:確認方法'))
794
+
795
+ # coderay sql
796
+
797
+ SELECT pgroonga_query_expand(
798
+ 'synonyms', -- Table name
799
+ 'term', -- Column name to be expanded
800
+ 'terms', -- Column name for synonyms
801
+ 'PostgreSQL' -- Query
802
+ );
803
+ -- '((PostgreSQL) OR (PG))'
804
+
805
+ = Search\n(('note:検索方法'))
806
+
807
+ # coderay sql
808
+ SELECT title FROM entries
809
+ WHERE
810
+ -- title &@~ 'DB ((PostgreSQL) OR (PG))'
811
+ title &@~
812
+ pgroonga_query_expand('synonyms',
813
+ 'term',
814
+ 'terms',
815
+ 'DB PostgreSQL');
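The expansion step can be pictured with a small Ruby sketch. It assumes a whitespace-separated query and an in-memory hash in place of the synonyms table. The helper `expand_query` is hypothetical, and the real `pgroonga_query_expand` also understands query syntax that this sketch ignores.

```ruby
# In-memory stand-in for the synonyms table (term => terms).
SYNONYMS = {
  "PostgreSQL" => ["PostgreSQL", "PG"],
  "PG" => ["PG", "PostgreSQL"],
}.freeze

# Replace each word that has synonyms with an OR group.
# Words without an entry are kept as they are.
def expand_query(query, synonyms)
  expanded = query.split.map do |word|
    terms = synonyms[word]
    if terms
      "(" + terms.map { |term| "(#{term})" }.join(" OR ") + ")"
    else
      word
    end
  end
  expanded.join(" ")
end

puts expand_query("PostgreSQL", SYNONYMS)
puts expand_query("DB PostgreSQL", SYNONYMS)
```

The sketch also shows why each synonym list includes the term itself: the word is replaced by the group, so a list without the term would drop it from the search.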
816
+
817
+ = Similar search\n(('note:類似文書検索'))
818
+
819
+ * Query is document itself\n
820
+ (('note:検索クエリーは文書そのもの'))
821
+ * Not keyword\n
822
+ (('note:キーワードではない'))
823
+ * Use case\n
824
+ (('note:利用例'))
825
+ * Show related entries\n
826
+ (('note:関連エントリーの提示に使える'))
827
+
828
+ = Implementation\n(('note:実現方法'))
829
+
830
+ * Create dedicated index\n
831
+ (('note:類似検索用のインデックスを作る'))
832
+ * Use tokenizer for target language\n
833
+ (('note:対象の言語に合わせた処理で精度向上'))
834
+ * e.g.: MeCab based tokenizer for Japanese\n
835
+ (('note:例:日本語ならMeCabベースのトークナイザーを活用'))
836
+ * Use dedicated operator\n
837
+ (('note:類似検索用の演算子を使う'))
838
+
839
+ = Index definition
840
+
841
+ # coderay sql
842
+
843
+ CREATE INDEX entries_similar_search
844
+ ON entries
845
+ -- Target: Both title and content
846
+ -- Reason: Title is important
847
+ USING PGroonga (id, (title || ' ' || content))
848
+ -- TokenMecab is good for Japanese
849
+ WITH (tokenizer='TokenMecab');
850
+
851
+ = Search
852
+
853
+ # coderay sql
854
+
855
+ SELECT title,
856
+ pgroonga_score(entries) AS score
857
+ FROM entries
858
+ WHERE
859
+ -- &@* is operator for similar search.
860
+ -- Search with existing document.
861
+ (title || ' ' || content) &@*
862
+ '...fast search with Groonga!...'
863
+ ORDER BY score DESC LIMIT 3;
864
+
865
+ = Result example\n(('note:結果例'))
866
+
867
+ Query:
868
+ ...search with Groonga!...
869
+
870
+ Hit example:
871
+ ...search with PGroonga!...
872
+
873
+ = Wrap up: Basic features\n(('note:全文検索システム:基本機能'))
874
+
875
+ * Fast FTS + sort\n
876
+ (('note:高速全文検索+ソート'))
877
+ * Show texts around keyword\n
878
+ (('note:キーワード周辺テキスト表示'))
879
+ * Highlight keyword\n
880
+ (('note:検索キーワードハイライト'))
881
+
882
+ = Wrap up: Adv. features\n(('note:全文検索システム:高度な機能'))
883
+
884
+ * Auto complete\n
885
+ (('note:オートコンプリート'))
886
+ * Similar search\n
887
+ (('note:類似文書検索'))
888
+ * Synonym expansion\n
889
+ (('note:同義語展開'))
890
+
891
+ = FTS system: Next step\n(('note:全文検索システム:次の一歩'))
892
+
893
+ * Support structured data\n
894
+ (('note:構造化データ対応'))
895
+ * Office document, HTML, ...\n
896
+ (('note:オフィス文書・HTMLなど'))
897
+ * Needed features\n
898
+ (('note:対応に必要な処理'))
899
+ * Text/metadata extraction\n
900
+ (('note:テキスト・メタデータ抽出'))
901
+ * Create screenshot\n
902
+ (('note:スクリーンショット作成'))
903
+
904
+ = Extraction tool\n(('note:抽出ツール'))
905
+
906
+ * Apache Tika
907
+ * Apache Lucene's subproject
908
+ * Many supported formats\n
909
+ (('note:対応フォーマットが多い'))
910
+ * ChupaText
911
+ * Groonga's subproject
912
+ * Screenshot support\n
913
+ (('note:スクリーンショット作成対応'))
914
+
915
+ = ChupaText
916
+
917
+ * Supported formats(('note:(対応フォーマット)'))
918
+ * Word/Excel/PowerPoint
919
+ * ODT/ODS/ODP(('note:(OpenDocument)'))
920
+ * PDF/HTML/XML/CSV/...
921
+ * Interface(('note:(インターフェイス)'))
922
+ * HTTP and command line\n
923
+ (('note:HTTPとコマンドライン'))
924
+
925
+ = Install\n(('note:インストール'))
926
+
927
+ * Use Docker or Vagrant\n
928
+ (('note:DockerかVagrantを使うのが楽'))
929
+ * (('tag:xx-small'))
930
+ ((<"https://github.com/ranguba/chupa-text-docker"|URL:https://github.com/ranguba/chupa-text-docker>))
931
+ * (('tag:xx-small'))
932
+ ((<"https://github.com/ranguba/chupa-text-vagrant"|URL:https://github.com/ranguba/chupa-text-vagrant>))
933
+
934
+ = ChupaText:Docker
935
+
936
+ # coderay console
937
+ % GITHUB=https://github.com
938
+ % git clone \
939
+ ${GITHUB}/ranguba/chupa-text-docker.git
940
+ % cd chupa-text-docker
941
+ % docker-compose up --build
942
+
943
+ = Usage\n(('note:使い方'))
944
+
945
+ # coderay console
946
+ % curl \
947
+ --form data=@XXX.pdf \
948
+ http://localhost:20080/extraction.json
949
+
950
+ = Result example\n(('note:結果例'))
951
+
952
+ # coderay json
953
+
954
+ {
955
+ "mime-type": "application/pdf", # MIME type for the original data
956
+ "size": 147159, # Metadata
957
+ ...,
958
+ "texts": [ # Extracted texts
959
+ {
960
+ "mime-type": "text/plain", # MIME type for the extracted data
961
+ ...,
962
+ "creator": "Adobe Illustrator CS3", # Metadata
963
+ "body": "This is sample PDF. ...", # Extracted text
964
+ "screenshot": {
965
+ "mime-type": "image/png", # MIME type for screenshot
966
+ "data": "iVBORw...", # Base64-ed image data
967
+ "encoding": "base64"
968
+ }
969
+ }
970
+ ]
971
+ }
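A client can consume a response of this shape with Ruby's standard `json` library. The sketch below uses a made-up response that mirrors only the fields shown above; the helper `extract_entries` and the fake screenshot payload are assumptions for illustration. A real response has more metadata fields.

```ruby
require "json"

# Made-up response that mirrors the fields shown on the slide.
response_body = JSON.generate({
  "mime-type" => "application/pdf",
  "size" => 147159,
  "texts" => [
    {
      "mime-type" => "text/plain",
      "creator" => "Adobe Illustrator CS3",
      "body" => "This is sample PDF.",
      "screenshot" => {
        "mime-type" => "image/png",
        # Fake payload: Base64 of a short string, not a real PNG.
        "data" => ["fake image bytes"].pack("m0"),
        "encoding" => "base64",
      },
    },
  ],
})

# Hypothetical helper: pick the text, one metadata field, and the
# decoded screenshot bytes out of each extracted entry.
def extract_entries(response_body)
  result = JSON.parse(response_body)
  result.fetch("texts").map do |text|
    screenshot = text["screenshot"]
    image = nil
    if screenshot && screenshot["encoding"] == "base64"
      image = screenshot["data"].unpack1("m") # Base64 decode
    end
    { body: text["body"], creator: text["creator"], screenshot: image }
  end
end

entries = extract_entries(response_body)
puts entries.first[:body]
```

The decoded bytes could then be written to a file or stored next to the extracted text, depending on how the search result page shows screenshots.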
972
+
973
+ = Web UI
974
+
975
+ # image
976
+ # src = images/chupa-text-web-ui-form.png
977
+ # relative_height = 100
978
+
979
+ = Web UI: Extraction example\n(('note:Web UI:抽出例'))
980
+
981
+ # image
982
+ # src = images/chupa-text-web-ui-extract-metadata.png
983
+ # relative_height = 100
984
+
985
+ = Web UI: Extraction example\n(('note:Web UI:抽出例'))
986
+
987
+ # image
988
+ # src = images/chupa-text-web-ui-extract-text-and-screenshot.png
989
+ # relative_height = 100
990
+
991
+ = ChupaText:Vagrant
992
+
993
+ # coderay console
994
+ % GITHUB=https://github.com
995
+ % git clone \
996
+ ${GITHUB}/ranguba/chupa-text-vagrant.git
997
+ % cd chupa-text-vagrant
998
+ % vagrant up
999
+
1000
+ (('tag:center'))
1001
+ Usage is the same as Docker's\n
1002
+ (('note:使い方はDocker版と同じ'))
1003
+
1004
+ = Use cases(('note:(活用例)'))
1005
+
1006
+ * Extracted text
1007
+ * Insert into PGroonga
1008
+ * Extracted metadata
1009
+ * Insert into PGroonga
1010
+ * Use for condition(('note:(絞り込みに活用)'))
1011
+ * Created screenshot
1012
+ * Show in search result(('note:(検索結果で表示)'))
1013
+
1014
+ = Wrap up\n(('note:まとめ'))
1015
+
1016
+ * FTS engine via PostgreSQL\n
1017
+ (('note:PostgreSQL経由で全文検索エンジン'))
1018
+ * Provide decision info\n
1019
+ (('note:採用の判断材料を提供'))
1020
+
1021
+ = Wrap up\n(('note:まとめ'))
1022
+
1023
+ * Show how to impl. FTS system\n
1024
+ (('note:全文検索システム実装例を紹介'))
1025
+ * PGroonga
1026
+ * PGroonga 1.0.0 and 2\n
1027
+ (('note:PGroonga 1.0.0と2'))
1028
+ * Painless upgrade
1029
+
1030
+ = Wrap up\n(('note:まとめ'))
1031
+
1032
+ * Show how to support structured data\n
1033
+ (('note:構造化データの対応方法を紹介'))
1034
+ * ChupaText
1035
+
1036
+ = Support service\n(('note:サポートサービス紹介'))
1037
+
1038
+ * Install support(('note:(導入支援)'))\n
1039
+ (('note:設計支援・性能検証・移行支援・…'))
1040
+ * Development support(('note:(開発支援)'))\n
1041
+ (('note:サンプルコード提供・問い合わせ対応・…'))
1042
+ * Operation support(('note:(運用支援)'))\n
1043
+ (('note:障害対応・チューニング支援・…'))
1044
+
1045
+ Contact(('note:(問い合わせ先)'))
1046
+
1047
+ (('tag:x-small'))
1048
+ ((<"https://www.clear-code.com/contact/?type=groonga"|URL:https://www.clear-code.com/contact/?type=groonga>))
data/theme.rb ADDED
@@ -0,0 +1,3 @@
1
+ @groonga_product = "pgroonga"
2
+
3
+ include_theme("groonga")
metadata ADDED
@@ -0,0 +1,88 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rabbit-slide-kou-pgconf-asia-2017
3
+ version: !ruby/object:Gem::Version
4
+ version: 2017.12.5.0
5
+ platform: ruby
6
+ authors:
7
+ - Kouhei Sutou
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-12-05 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rabbit
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: 2.0.2
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: 2.0.2
27
+ - !ruby/object:Gem::Dependency
28
+ name: rabbit-theme-groonga
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ description: |-
42
+ PGroonga 2.0 has been released after two years of development since PGroonga 1.0.0. PGroonga 1.0.0 provided only fast full text search with support for all languages. That matters because PostgreSQL lacks this feature. PGroonga 2.0 provides more useful features for implementing a rich full text search system with PostgreSQL. This session shows how to implement a rich full text search system with PostgreSQL!
43
+
44
+ This talk describes PGroonga, which resolves these problems.
45
+ email:
46
+ - kou@clear-code.com
47
+ executables: []
48
+ extensions: []
49
+ extra_rdoc_files: []
50
+ files:
51
+ - ".rabbit"
52
+ - README.rd
53
+ - Rakefile
54
+ - config.yaml
55
+ - images/chupa-text-web-ui-extract-metadata.png
56
+ - images/chupa-text-web-ui-extract-text-and-screenshot.png
57
+ - images/chupa-text-web-ui-form.png
58
+ - images/php-document-search-search.png
59
+ - images/php-document-search.png
60
+ - pdf/pgconf-asia-2017-pgroonga-2.pdf
61
+ - pgroonga-2.rab
62
+ - theme.rb
63
+ homepage: http://slide.rabbit-shocker.org/authors/kou/pgconf-asia-2017/
64
+ licenses:
65
+ - CC-BY-SA-4.0
66
+ - CC-BY-3.0
67
+ metadata: {}
68
+ post_install_message:
69
+ rdoc_options: []
70
+ require_paths:
71
+ - lib
72
+ required_ruby_version: !ruby/object:Gem::Requirement
73
+ requirements:
74
+ - - ">="
75
+ - !ruby/object:Gem::Version
76
+ version: '0'
77
+ required_rubygems_version: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - ">="
80
+ - !ruby/object:Gem::Version
81
+ version: '0'
82
+ requirements: []
83
+ rubyforge_project:
84
+ rubygems_version: 2.5.2.1
85
+ signing_key:
86
+ specification_version: 4
87
+ summary: PGroonga 2 – Make PostgreSQL rich full text search system backend!
88
+ test_files: []