rabbit-slide-kenhys-debconf18-rethinking-of-debian-watch 2018.08.03.1

@@ -0,0 +1,537 @@
1
+ = Rethinking of the debian/watch
2
+
3
+ : subtitle
4
+ With thought experiments about uscan
5
+ : author
6
+ Kentaro Hayashi
7
+ : content-source
8
+ DebConf18 in Taiwan
9
+ : date
10
+ 2018-08-03
11
+ : institution
12
+ ClearCode Inc.
13
+ : allotted-time
14
+ 20m
15
+ : theme
16
+ .
17
+
18
+ = Digest of this talk
19
+
20
+ * The current d/watch file is sometimes complicated
21
+ * Updating to a new format (v5) can solve it
22
+
23
+ = Agenda
24
+
25
+ * Who am I?
26
+ * Why did I start playing with debian/watch?
27
+ * Introduction to debian/watch
28
+ * Current debian/watch statistics
29
+ * Thought experiments about debian/watch
30
+ * Conclusion
31
+
32
+ = Agenda
33
+
34
+ * ((*Who am I?*))
35
+ * Why did I start playing with debian/watch?
36
+ * Introduction to debian/watch
37
+ * Current debian/watch statistics
38
+ * Thought experiments about debian/watch
39
+ * Conclusion
40
+
41
+ = Who am I?
42
+
43
+ # image
44
+ # relative-width = 10
45
+ # src = images/profile.png
46
+
47
+ * Kentaro Hayashi <kenhys@gmail.com>
48
+ * Twitter/GitHub (@kenhys) / Debian contributor (@kenhys-guest)
49
+ * ((*Trackpoint fan*)) - soft dome user
50
+ * Working for ClearCode Inc.
51
+
52
+ = Ad: ClearCode Inc.
53
+
54
+ # image
55
+ # relative-width = 50
56
+ # src = images/logo-combination-standard.svg
57
+
58
+ * ((<URL:https://www.clear-code.com/>))
59
+ * Free software is important in ClearCode Inc.
60
+ * We develop/support software with our free software development experience.
61
+ * We feed back our business experience to free software.
62
+
63
+ = As a contributor
64
+
65
+ * Maintainer of some packages
66
+ * groonga (Upstream releases monthly updates)
67
+ * fcitx-imlist
68
+ * libhinawa
69
+ * ((<URL:https://qa.debian.org/developer.php?email=hayashi@clear-code.com>))
70
+
71
+ = Agenda
72
+
73
+ * Who am I?
74
+ * ((*Why did I start playing with debian/watch?*))
75
+ * Introduction to debian/watch
76
+ * Current debian/watch statistics
77
+ * Thought experiments about debian/watch
78
+ * Conclusion
79
+
80
+ = Why play with d/watch?
81
+
82
+ * #899119: Need redirector for osdn.net
83
+ * ((<URL:https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=899119>))
84
+
85
+ = d/watch for fonts-sawarabi-mincho
86
+
87
+ version=4
88
+ opts="uversionmangle=s/-beta/~beta/;s/-rc/~rc/;s/-preview/~preview/, \
89
+ pagemangle=s%<osdn:file url="([^<]*)</osdn:file>%<a href="$1">$1</a>%g, \
90
+ downloadurlmangle=s%projects/sawarabi-fonts/downloads%frs/redir\.php?m=iij&f=sawarabi-fonts%g;s/xz\//xz/" \
91
+ https://osdn.net/projects/sawarabi-fonts/releases/rss \
92
+ https://osdn.net/projects/sawarabi-fonts/downloads/.*/sawarabi-mincho@ANY_VERSION@@ARCHIVE_EXT@/ debian uupdate
93
+
94
+ * Need to parse RSS!
95
+
96
+ = d/watch for fonts-sawarabi-mincho
97
+
98
+ * Combination with:
99
+
100
+ * pagemangle
101
+ * downloadurlmangle
102
+ * uversionmangle
103
+
104
+ = pagemangle?
105
+
106
+ * pagemangle=s%<osdn:file url="([^<]*)</osdn:file>%<a href="$1">$1</a>%g,
107
+ * Convert a page content
108
+ * <osdn:file url="([^<]*)</osdn:file> (('&#x27a1;')) <a href="$1">$1</a>
109
+
110
+ = downloadurlmangle?
111
+
112
+ * downloadurlmangle=s%projects/sawarabi-fonts/downloads%frs/redir\.php?m=iij&f=sawarabi-fonts%g;s/xz\//xz/
113
+ * Convert a download url
114
+ * projects/sawarabi-fonts/downloads (('&#x27a1;')) frs/redir\.php?m=iij&f=sawarabi-fonts
115
+ * xz/ (('&#x27a1;')) xz
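The two downloadurlmangle substitutions can be tried outside uscan. The following is a minimal Python sketch, not uscan's own code; the release number and file name in the sample URL are made up for illustration.

```python
import re


def mangle_download_url(url: str) -> str:
    """Apply the two downloadurlmangle substitutions from the slide, in order."""
    # s%projects/sawarabi-fonts/downloads%frs/redir\.php?m=iij&f=sawarabi-fonts%g
    url = re.sub(
        r"projects/sawarabi-fonts/downloads",
        "frs/redir.php?m=iij&f=sawarabi-fonts",
        url,
    )
    # s/xz\//xz/ has no /g flag, so only the first match is replaced.
    url = re.sub(r"xz/", "xz", url, count=1)
    return url


# Hypothetical download URL (release number and date are assumptions).
before = (
    "https://osdn.net/projects/sawarabi-fonts/downloads/"
    "12345/sawarabi-mincho-20180815.tar.xz/"
)
print(mangle_download_url(before))
```

The first rule rewrites the project path to the redirector path, and the second drops the trailing slash after the archive name.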
116
+
117
+ = uversionmangle?
118
+
119
+ * uversionmangle=s/-beta/~beta/;s/-rc/~rc/;s/-preview/~preview/
120
+ * Convert a specific suffix
121
+ * -beta (('&#x27a1;')) ~beta
122
+ * -rc (('&#x27a1;')) ~rc
123
+ * -preview (('&#x27a1;')) ~preview
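To see what these three rules do to a version string, here is a small Python sketch (an illustration only, not how uscan applies them internally). The sample versions are hypothetical. The point of the "~" is that Debian version comparison sorts it before everything else, so 1.0~beta2 is older than 1.0.

```python
import re

# The three uversionmangle rules from the slide, as (pattern, replacement) pairs.
RULES = [("-beta", "~beta"), ("-rc", "~rc"), ("-preview", "~preview")]


def mangle_upstream_version(version: str) -> str:
    """Apply each rule once, in the order given in d/watch."""
    for pattern, replacement in RULES:
        # Perl's s/// without /g replaces only the first match, hence count=1.
        version = re.sub(pattern, replacement, version, count=1)
    return version


# Hypothetical upstream versions.
for upstream in ("1.0-beta2", "2.3-rc1", "1.0"):
    print(upstream, "->", mangle_upstream_version(upstream))
```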
124
+
125
+ = #899119
126
+
127
+ # blockquote
128
+ # title = #899119#5
129
+ Hideki Yamane:\n
130
+ "They sometimes changes download way to reduce download access
131
+ by preventing bot, so debian/watch file is complicated and it
132
+ annoyed us. Implementing redirector in qa.debian.org would improve
133
+ this situation."
134
+
135
+ = Motivation
136
+
137
+ * It seems that the d/watch file is sometimes ((*too complicated*))
138
+ * I'll look into d/watch a bit
139
+
140
+ = Agenda
141
+
142
+ * Who am I?
143
+ * Why did I start playing with debian/watch?
144
+ * ((*Introduction to debian/watch*))
145
+ * Current debian/watch statistics
146
+ * Thought experiments about debian/watch
147
+ * Conclusion
148
+
149
+ = Introduction to debian/watch
150
+
151
+ * Used to check for newer versions of upstream software
152
+ * https://wiki.debian.org/debian/watch is a good starting point
153
+
154
+ = The typical examples
155
+
156
+ * There are 8 examples
157
+ * Bitbucket, GitHub, GitLab (Salsa), Google Code, Launchpad, PyPI, and SourceForge
158
+
159
+ = Common mistakes to avoid
160
+
161
+ * There are 8 common mistakes in d/watch
162
+ * (('note: see: https://wiki.debian.org/debian/watch'))
163
+
164
+ = Common mistakes(1)
165
+
166
+ * Not escaping dots, which match any character
167
+ * The solution is:
168
+
169
+ * Use ((*\.*)) instead of ((*.*)) in the regex
170
+
171
+ = Common mistakes(2)
172
+
173
+ * A file extension regex that is not flexible enough
174
+ * The solution is:
175
+ * Use ((*\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))*))
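Mistakes (1) and (2) are easy to reproduce with any regex engine. Below is a minimal Python sketch; the project name "fooproj" and the file names are hypothetical. It compares a pattern with escaped dots and the flexible extension group against a loose pattern with bare dots.

```python
import re

# Extension group recommended on the slide for mistake (2).
EXT = r"\.(?:zip|tgz|tbz|txz|(?:tar\.(?:gz|bz2|xz)))"

# Dots escaped, flexible extension, version group starting with a digit.
GOOD = re.compile(r"fooproj-(\d\S+)" + EXT + r"$")
# Mistake (1): the dots are left unescaped, so "." matches any character.
LOOSE = re.compile(r"fooproj-(\d\S+).tar.gz$")


def upstream_version(filename: str):
    """Return the version captured by GOOD, or None if the name does not match."""
    match = GOOD.search(filename)
    return match.group(1) if match else None


print(upstream_version("fooproj-1.2.3.tar.xz"))
print(upstream_version("fooproj-1.2.3.zip"))
# The loose pattern accepts a name that is clearly not a tarball.
print(bool(LOOSE.search("fooproj-1.2.3_tarXgz")))
```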
176
+
177
+ = Common mistakes(3)
178
+
179
+ * Not anchoring the version group at the right place
180
+ * The solution is:
181
+ * Include something before (\d\S+) like fooproj-((*(\d\S+)*))\.tar\.gz
182
+
183
+ = Common mistakes(4)
184
+
185
+ * Not starting the version part of the regex with a digit
186
+ * The solution is:
187
+ * Use ((*\d*)) instead of ((*.*))
188
+
189
+ = Common mistakes(5)
190
+
191
+ * Not being flexible enough in the path to the file
192
+ * The solution is:
193
+ * Use http://example.com/someproject/((*.**))/program-(\d\S+)\.tar\.gz instead of http://example.com/someproject/((*path/to/program/downloads*))/program-(\d\S+)\.tar\.gz
194
+
195
+ = Common mistakes(6)
196
+
197
+ * Not mangling upstream versions that are alphas, betas or release candidates to make them sort before the final release
198
+
199
+ * The solution is:
200
+ * Use ((*uversionmangle*)) like opts=uversionmangle=s/(\d)[_\.\-\+]?((RC|rc|pre|dev|beta|alpha)\d*)$/$1~$2/
201
+
202
+ = Common mistakes(7)
203
+
204
+ * Not mangling Debian versions to remove the +dfsg.1 or +dfsg1 suffix
205
+ * The solution is:
206
+ * Use ((*dversionmangle*)) like opts=dversionmangle=s/\+(debian|dfsg|ds|deb)(\.?\d+)?$//
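The mangle rules from mistakes (6) and (7) can be checked in isolation. This is a Python transcription of the two Perl substitutions, given as a sketch; the version strings are hypothetical.

```python
import re


def uversionmangle(version: str) -> str:
    # s/(\d)[_\.\-\+]?((RC|rc|pre|dev|beta|alpha)\d*)$/$1~$2/
    return re.sub(
        r"(\d)[_.\-+]?((RC|rc|pre|dev|beta|alpha)\d*)$",
        r"\1~\2",
        version,
        count=1,
    )


def dversionmangle(version: str) -> str:
    # s/\+(debian|dfsg|ds|deb)(\.?\d+)?$//
    return re.sub(r"\+(debian|dfsg|ds|deb)(\.?\d+)?$", "", version, count=1)


# Upstream pre-releases get a "~" so that they sort before the final release.
print(uversionmangle("1.0-rc1"), uversionmangle("2.0beta2"))
# Debian-only repack suffixes are removed before comparing with upstream.
print(dversionmangle("1.2.3+dfsg1"), dversionmangle("1.2.3+dfsg.1"))
```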
207
+
208
+ = Common mistakes(8)
209
+
210
+ * Not enabling cryptographic signature verification when your upstream signs their releases with OpenPGP
211
+ * The solution is:
212
+ * Enable signature verification, e.g. with ((*pgpsigurlmangle*))
213
+
214
+ = Impression about d/watch
215
+
216
+ * Once d/watch is set up, it works fine
217
+ * But there are some pitfalls in d/watch
218
+
219
+ = Motivation again
220
+
221
+ * d/watch is useful
222
+ * But too complicated
223
+ * It should be simpler! (somehow)
224
+
225
+ = Agenda
226
+
227
+ * Who am I?
228
+ * Why did I start playing with debian/watch?
229
+ * Introduction to debian/watch
230
+ * ((*Current debian/watch statistics*))
231
+ * Thought experiments about debian/watch
232
+ * Conclusion
233
+
234
+ = Why do we use statistics?
235
+
236
+ * Without data, we can't judge whether an idea is good or not
237
+ * Let's discuss based on ((*facts (data)*))
238
+
239
+ = Collect d/watch data
240
+
241
+ * We have no data to judge with yet
242
+ * But we can use the API!
243
+ * ((<URL:https://sources.debian.org/doc/api/>))
244
+
245
+ = sources.d.o API documentation
246
+
247
+ # image
248
+ # relative-height = 100
249
+ # src = images/sources-d-o-api-documentation.png
250
+
251
+ = Collect package list
252
+
253
+ * Access package list API
254
+ * ((<URL:https://sources.debian.org/api/list>))
255
+ * You can use this API to collect the ((*source*)) package list
256
+
257
+ = e.g. source package list
258
+
259
+ # image
260
+ # relative-height = 100
261
+ # src = images/sources-d-o-api-list-zoom.png
262
+
263
+ = Collect package info
264
+
265
+ * Access package info API
266
+ * Get suite information about a package
267
+ * e.g. ((<URL:https://sources.debian.org/api/src/groonga/>))
268
+ * You can use this API to collect packages of a specific release (e.g. sid only)
269
+
270
+ = e.g. Groonga package info
271
+
272
+ # image
273
+ # relative-height = 100
274
+ # src = images/sources-d-o-api-src-groonga-zoom.png
275
+
276
+ = Collect raw url
277
+
278
+ * Access file info API
279
+ * Get the path to the raw URL
280
+ * e.g. ((<URL:https://sources.debian.org/api/src/groonga/latest/debian/watch/>))
281
+ (('&#x27a1;')) https://sources.debian.org/api/src/groonga/((*8.0.5-1*))/debian/watch/
282
+
283
+ = e.g. Groonga d/watch raw url
284
+
285
+ # image
286
+ # relative-height = 90
287
+ # src = images/sources-d-o-api-src-groonga-latest-debian-watch-zoom.png
288
+
289
+ = Collect d/watch
290
+
291
+ * Access file content
292
+ * Get the raw content of d/watch
293
+ * e.g. ((<URL:https://sources.debian.org/data/main/g/groonga/8.0.5-1/debian/watch>))
294
+
295
+ = e.g. Groonga d/watch
296
+
297
+ # image
298
+ # relative-width = 100
299
+ # src = images/sources-d-o-api-groonga-debian-watch-zoom.png
300
+
301
+ = We are ready to collect data
302
+
303
+ * Collect source package list in unstable (API)
304
+ * Collect each d/watch if available (API)
305
+ * Analyze and visualize the data (Task)
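The API walk on the previous slides can be sketched in a few lines of Python. This is not the crawler used for the talk, only an illustration. The endpoint paths come from the slides; the "raw_url" field name and the fact that it is a path relative to the site root are assumptions about the file info response.

```python
import json
import urllib.request

BASE = "https://sources.debian.org"


def package_list_url() -> str:
    return f"{BASE}/api/list"


def package_info_url(package: str) -> str:
    return f"{BASE}/api/src/{package}/"


def watch_info_url(package: str, version: str = "latest") -> str:
    return f"{BASE}/api/src/{package}/{version}/debian/watch/"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fetch_watch(package: str):
    """Return the raw d/watch text, or None if no raw URL is reported."""
    info = fetch_json(watch_info_url(package))
    raw_url = info.get("raw_url")  # assumed field name and relative path
    if raw_url is None:
        return None
    with urllib.request.urlopen(BASE + raw_url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_watch("groonga") would follow the same path as the groonga examples above. A real crawler would also need error handling (a missing d/watch may surface as an HTTP error) and some rate limiting.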
306
+
307
+ = How to collect it?
308
+
309
+ * Use debsources-watch-crawler
310
+ * ((<URL:https://github.com/kenhys/debsources-watch-crawler.git>))
311
+ * Crawls d/watch files and stores them in a database (using Groonga)
312
+
313
+ = Parsing opts in d/watch
314
+
315
+ * Use Parse::Debian::Watch
316
+ * ((<URL:https://github.com/kenhys/perl-Parse-Debian-Watch.git>))
317
+ * Parser code extracted from scripts/uscan.pl
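For readers who only want the idea of the counting step, here is a deliberately simplified Python sketch. It is not Parse::Debian::Watch and not uscan's parser: it assumes options are separated by commas and that no mangle rule contains a comma itself.

```python
from collections import Counter


def option_names(opts: str) -> list:
    """Return the option names from an opts value such as 'a=s/x/y/,b'.

    Simplified assumption: splitting on commas is enough. A mangle rule
    that contains a comma would be split incorrectly.
    """
    names = []
    for item in opts.strip().strip('"').split(","):
        item = item.strip()
        if item:
            names.append(item.split("=", 1)[0].strip())
    return names


def count_options(all_opts) -> Counter:
    """Count how often each option name appears across many d/watch files."""
    counter = Counter()
    for opts in all_opts:
        counter.update(option_names(opts))
    return counter


# Two hypothetical opts values.
samples = [
    "uversionmangle=s/-beta/~beta/, pgpmode=none, repack",
    r"filenamemangle=s/.+\/v?(\d\S*)\.tar\.gz/fcitx-imlist-$1\.tar\.gz/",
]
print(count_options(samples).most_common())
```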
318
+
319
+ = Analyzing system components
320
+
321
+ # image
322
+ # relative-height = 100
323
+ # src = images/system-components.png
324
+
325
+ = NOTE
326
+
327
+ * The statistics are based on a snapshot taken in 2018/7
328
+ * 39,074 source packages exist in Debian
329
+ * 27,660 unstable source packages
330
+
331
+ = Some questions about d/watch
332
+
333
+ * Is the watch file used?
334
+ * Which version is used in packages?
335
+ * What are the popular hosting sites?
336
+
337
+ = Is the watch file used?
338
+
339
+ # image
340
+ # relative-height = 100
341
+ # src = images/group-by-watch-file.png
342
+
343
+
344
+ = What version are you using?
345
+
346
+ # image
347
+ # relative-height = 100
348
+ # src = images/group-by-watch-version.png
349
+
350
+ = Top 5 hosting sites cover 58%
351
+
352
+ # image
353
+ # relative-height = 100
354
+ # src = images/group-by-top5all-hosting.png
355
+
356
+ = Popular hosting?
357
+
358
+ # image
359
+ # relative-height = 100
360
+ # src = images/group-by-top5-hosting.png
361
+
362
+ = These graphs show
363
+
364
+ * 84% of source packages already have d/watch.
365
+ * There seems to be room for optimizing for the top 5 hosting sites
366
+
367
+ = What option is frequently used?
368
+
369
+ * Option is ...
370
+ * Not used
371
+ * Rarely used
372
+ * Sometimes used
373
+ * Often used
374
+
375
+ = Unused options
376
+
377
+ * bare: 0
378
+ * nopasv: 0
379
+ * hrefdecode: 0
380
+ * pretty: 0
381
+ * unzipopt: 0
382
+
383
+ = Rarely used
384
+
385
+ * user-agent: 3
386
+ * gitmode: 4
387
+ * dirversionmangle: 5
388
+ * date: 9
389
+ * oversionmangle: 10
390
+
391
+ = Rarely used (2)
392
+
393
+ * component: 13
394
+ * decompress: 18
395
+ * versionmangle: 11
396
+ * passive: 30
397
+ * pagemangle: 31
398
+
399
+ = Sometimes used
400
+
401
+ * pasv: 120
402
+ * pgpmode: 175
403
+ * downloadurlmangle: 247
404
+ * mode: 249
405
+ * repack: 491
406
+ * compression: 489
407
+
408
+ = Often used
409
+
410
+ * repacksuffix: 1039
411
+ * pgpsigurlmangle: 1510
412
+ * uversionmangle: 3695
413
+ * dversionmangle: 3921
414
+ * filenamemangle: 4134
415
+
416
+ = Which options are frequently used?
417
+
418
+ # image
419
+ # relative-height = 100
420
+ # src = images/opts_frequency.png
421
+
422
+ = Thought experiments about d/watch
423
+
424
+ * The facts
425
+ * The top 5 upstream hosting sites account for 58%
426
+ * Usage of opts options is very limited
427
+ * The hypothesis
428
+ * We can simplify d/watch by dropping support for infrequently used options
429
+
430
+ = Required information?
431
+
432
+ * Some information to be parsed
433
+ * Hosting
434
+ * Owner
435
+ * Project
436
+
437
+ = The new syntax idea
438
+
439
+ * Some information to be parsed
440
+ * Hosting (('&#x27a1;')) type=...
441
+ * Owner (('&#x27a1;')) owner=...
442
+ * Project (('&#x27a1;')) project=...
443
+
444
+ = e.g. Diff between the old and new rules
445
+
446
+ -version=4
447
+ +version=5
448
+
449
+ -opts=filenamemangle=s/.+\/v?(\d\S*)\.tar\.gz/fcitx-imlist-$1\.tar\.gz/
450
+ - https://github.com/kenhys/fcitx-imlist/tags .*/v?(\d\S*)\.tar\.gz
451
+ +type=github.com,owner=kenhys,project=fcitx-imlist
452
+
453
+ = e.g. The new rule
454
+
455
+ version=5
456
+ type=github.com,owner=kenhys,project=fcitx-imlist
457
+
458
+ * e.g. ((<URL:https://github.com/kenhys/fcitx-imlist>))
459
+
460
+ = Pros
461
+
462
+ * For maintainers
463
+ * Easy to maintain
464
+ * It keeps working even if the download URL changes (as long as the domain does not)
465
+ * It avoids the common mistakes listed on wiki.d.o
466
+
467
+ = Cons
468
+
469
+ * For uscan developers
470
+ * uscan needs to be modified for each hosting site
471
+ * If upstream uses a minor hosting site, the package can't migrate to the new rule until uscan supports that site
472
+ * It may lack functionality compared to the existing rules
473
+ * Both the traditional and the new style need to be maintained
474
+
475
+ = Experiments
476
+
477
+ * We don't know whether the new rule is practical enough
478
+ * Let's experiment!
479
+
480
+ = Steps to verify
481
+
482
+ * 1. Modify uscan to support the new rule
483
+ * 2. Download the source package
484
+ * 3. Revert the package to its previous release so that uscan finds a newer version
485
+ * 4. Run uscan with the current rule and with the modified rule
486
+ * 5. Compare the ((*dehs*)) results
487
+
488
+ = Dehs?
489
+
490
+ * Debian External Health Status
491
+ * ((<URL:https://wiki.debian.org/DHES>))
492
+ * Machine readable output of uscan
493
+ * It's easy to detect regression
494
+ * If there is no regression, the new rule has enough functionality!
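Comparing two dehs outputs can be automated. The Python sketch below assumes the XML element names emitted by uscan's --dehs output (package, upstream-version, upstream-url, status); the version numbers and URL in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fields compared between the two runs (element names are an assumption
# about uscan's dehs output).
FIELDS = ("package", "upstream-version", "upstream-url", "status")

# Hypothetical dehs output for illustration only.
SAMPLE = """<dehs>
<package>fcitx-imlist</package>
<upstream-version>0.5.1</upstream-version>
<upstream-url>https://github.com/kenhys/fcitx-imlist/archive/v0.5.1.tar.gz</upstream-url>
<status>newer package available</status>
</dehs>"""


def dehs_fields(xml_text: str) -> dict:
    """Extract the compared fields from one dehs document."""
    root = ET.fromstring(xml_text)
    return {name: (root.findtext(name) or "").strip() for name in FIELDS}


def regressions(current_xml: str, modified_xml: str) -> dict:
    """Map each differing field to its (current, modified) values."""
    current = dehs_fields(current_xml)
    modified = dehs_fields(modified_xml)
    return {
        name: (current[name], modified[name])
        for name in FIELDS
        if current[name] != modified[name]
    }


# An empty result means the modified rule behaves like the current one.
print(regressions(SAMPLE, SAMPLE))
```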
495
+
496
+ = Test case
497
+
498
+ * New rule for GitHub
499
+ * The typical use case
500
+ * (('del:New rule for OSDN'))
501
+ * The minor use case
502
+ * It needs more work (in the modified version, the dehs output is currently broken)
503
+
504
+ = The new rule for GitHub
505
+
506
+ version=5
507
+ type=github.com,owner=kenhys,project=fcitx-imlist
508
+
509
+ = How to modify uscan
510
+
511
+ * Add a patch to scripts/uscan.pl
512
+ * Bump version to 5
513
+ * Add a regular expression to parse the new rule
514
+ * Assign mangle rules to $options to emulate the existing rule
515
+ * Repeat the steps above to support more patterns
516
+ * ((<URL:https://salsa.debian.org/kenhys-guest/devscripts/tree/add-type-rule>))
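The idea behind the patch can be illustrated without Perl. The Python sketch below is not the uscan patch itself: it parses the new rule and expands it into the version=4 values shown on the diff slide. The function names and the returned dictionary layout are hypothetical.

```python
def parse_rule(line: str) -> dict:
    """Split 'type=...,owner=...,project=...' into a dict."""
    return dict(item.split("=", 1) for item in line.strip().split(","))


def expand_github_rule(rule: dict) -> dict:
    """Build the version=4 equivalent from the diff slide (GitHub only)."""
    if rule.get("type") != "github.com":
        raise ValueError("only type=github.com is sketched here")
    owner, project = rule["owner"], rule["project"]
    return {
        # Emulates opts=filenamemangle=... from the old rule.
        "filenamemangle": rf"s/.+\/v?(\d\S*)\.tar\.gz/{project}-$1\.tar\.gz/",
        # Page to scan and the pattern for matching tarball links.
        "url": f"https://github.com/{owner}/{project}/tags",
        "pattern": r".*/v?(\d\S*)\.tar\.gz",
    }


rule = parse_rule("type=github.com,owner=kenhys,project=fcitx-imlist")
for key, value in expand_github_rule(rule).items():
    print(f"{key}: {value}")
```

Supporting another hosting site would mean adding one more expansion function like this, which is the per-site cost on the uscan side.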
517
+
518
+ = Is the new d/watch rule good enough?
519
+
520
+ * DEMO
521
+ * The new rule for fcitx-imlist (GitHub)
522
+
523
+ = Conclusion
524
+
525
+ * Some d/watch files are a bit redundant
526
+ * d/watch can be simplified by a new d/watch rule
527
+ * But it is not fully verified yet. It needs more testing!
528
+ * Feedback is welcome!
529
+
530
+ = Q. What about fakeupstream.cgi?
531
+
532
+ * fakeupstream.cgi only returns a list of releases, so it does not help simplify the rule
533
+
534
+ = Q. What about redirector?
535
+
536
+ * Yes, you are right. But it needs to be supported on both the server side and the uscan side
537
+ * The new rule only needs to be implemented in uscan