codnar 0.1.64

Files changed (80)
  1. data/ChangeLog +165 -0
  2. data/LICENSE +19 -0
  3. data/README.rdoc +32 -0
  4. data/Rakefile +66 -0
  5. data/bin/codnar-split +5 -0
  6. data/bin/codnar-weave +5 -0
  7. data/codnar.html +10945 -0
  8. data/doc/logo.png +0 -0
  9. data/doc/root.html +22 -0
  10. data/doc/story.markdown +180 -0
  11. data/doc/system.markdown +671 -0
  12. data/lib/codnar.rb +41 -0
  13. data/lib/codnar/application.rb +92 -0
  14. data/lib/codnar/cache.rb +61 -0
  15. data/lib/codnar/data/contents.js +113 -0
  16. data/lib/codnar/data/control_chunks.js +44 -0
  17. data/lib/codnar/data/style.css +95 -0
  18. data/lib/codnar/data/sunlight/README.txt +4 -0
  19. data/lib/codnar/data/sunlight/css-min.js +1 -0
  20. data/lib/codnar/data/sunlight/default.css +236 -0
  21. data/lib/codnar/data/sunlight/javascript-min.js +1 -0
  22. data/lib/codnar/data/sunlight/min.js +1 -0
  23. data/lib/codnar/data/sunlight/ruby-min.js +1 -0
  24. data/lib/codnar/data/yui/README.txt +3 -0
  25. data/lib/codnar/data/yui/base.css +132 -0
  26. data/lib/codnar/data/yui/reset.css +142 -0
  27. data/lib/codnar/formatter.rb +180 -0
  28. data/lib/codnar/grouper.rb +28 -0
  29. data/lib/codnar/gvim.rb +132 -0
  30. data/lib/codnar/hash_extensions.rb +41 -0
  31. data/lib/codnar/markdown.rb +47 -0
  32. data/lib/codnar/merger.rb +138 -0
  33. data/lib/codnar/rake.rb +41 -0
  34. data/lib/codnar/rake/split_task.rb +71 -0
  35. data/lib/codnar/rake/weave_task.rb +59 -0
  36. data/lib/codnar/rdoc.rb +9 -0
  37. data/lib/codnar/reader.rb +121 -0
  38. data/lib/codnar/scanner.rb +216 -0
  39. data/lib/codnar/split.rb +58 -0
  40. data/lib/codnar/split_configurations.rb +367 -0
  41. data/lib/codnar/splitter.rb +32 -0
  42. data/lib/codnar/string_extensions.rb +25 -0
  43. data/lib/codnar/sunlight.rb +17 -0
  44. data/lib/codnar/version.rb +8 -0
  45. data/lib/codnar/weave.rb +58 -0
  46. data/lib/codnar/weave_configurations.rb +48 -0
  47. data/lib/codnar/weaver.rb +105 -0
  48. data/lib/codnar/writer.rb +38 -0
  49. data/test/cache_computations.rb +41 -0
  50. data/test/deep_merge.rb +29 -0
  51. data/test/embed_images.rb +12 -0
  52. data/test/expand_markdown.rb +27 -0
  53. data/test/expand_rdoc.rb +20 -0
  54. data/test/format_code_gvim_configurations.rb +55 -0
  55. data/test/format_code_sunlight_configurations.rb +37 -0
  56. data/test/format_comment_configurations.rb +86 -0
  57. data/test/format_lines.rb +72 -0
  58. data/test/group_lines.rb +31 -0
  59. data/test/gvim_highlight_syntax.rb +49 -0
  60. data/test/identify_chunks.rb +32 -0
  61. data/test/lib/test_with_configurations.rb +15 -0
  62. data/test/merge_lines.rb +133 -0
  63. data/test/rake_tasks.rb +38 -0
  64. data/test/read_chunks.rb +110 -0
  65. data/test/run_application.rb +56 -0
  66. data/test/run_split.rb +38 -0
  67. data/test/run_weave.rb +75 -0
  68. data/test/scan_lines.rb +78 -0
  69. data/test/split_chunk_configurations.rb +55 -0
  70. data/test/split_code.rb +109 -0
  71. data/test/split_code_configurations.rb +73 -0
  72. data/test/split_combined_configurations.rb +114 -0
  73. data/test/split_complex_comment_configurations.rb +73 -0
  74. data/test/split_documentation.rb +92 -0
  75. data/test/split_documentation_configurations.rb +97 -0
  76. data/test/split_simple_comment_configurations.rb +50 -0
  77. data/test/sunlight_highlight_syntax.rb +25 -0
  78. data/test/weave_configurations.rb +144 -0
  79. data/test/write_chunks.rb +28 -0
  80. metadata +363 -0
Binary file
@@ -0,0 +1,22 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2
+ <html xmlns="http://www.w3.org/1999/xhtml">
3
+ <head>
4
+ <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
5
+ <title>Code narrator - an inverse literate programming tool.</title>
6
+ <style type="text/css">
7
+ <embed src="codnar/data/yui/reset.css" type="x-codnar/file"/>
8
+ <embed src="codnar/data/yui/base.css" type="x-codnar/file"/>
9
+ <embed src="codnar/data/style.css" type="x-codnar/file"/>
10
+ </style>
11
+ </head>
12
+ <body>
13
+ <div id="contents"></div>
14
+ <embed src="README.rdoc" type="x-codnar/include"/>
15
+ <embed src="doc/story.markdown" type="x-codnar/include"/>
16
+ <embed src="doc/system.markdown" type="x-codnar/include"/>
17
+ <script type="text/javascript">
18
+ <embed src="codnar/data/contents.js" type="x-codnar/file"/>
19
+ <embed src="codnar/data/control_chunks.js" type="x-codnar/file"/>
20
+ </script>
21
+ </body>
22
+ </html>
@@ -0,0 +1,180 @@
1
+ ## The Story ##
2
+
3
+ This is the story of the Code Narrator (Codnar) tool. It serves a dual purpose.
4
+ It describes the Codnar tool itself, but it also serves as an example of why it
5
+ exists in the first place. To explain this more fully, we'll have to make a
6
+ little detour into the issue of system documentation.
7
+
8
+ ### The Documentation Problem ###
9
+
10
+ Documentation for any system can be grouped into two kinds. The first kind is the
11
+ reference manual. If you know of a small piece of the system, this kind of
12
+ documentation will give you the details about it. A good reference will help
13
+ you find this piece even if you only have a rough idea of what it is named. A
14
+ really good reference will also link it to related pieces. A great reference
15
+ will even give you examples of how to use the related pieces in a realistic
16
+ context.
17
+
18
+ Reference manuals are invaluable, and there are plenty of tools to help you
19
+ create them. The common approach is the use of structured comments (e.g.,
20
+ [JavaDoc](http://en.wikipedia.org/wiki/Javadoc),
21
+ [Doxygen](http://en.wikipedia.org/wiki/Doxygen), and a [host of similar
22
+ tools](http://en.wikipedia.org/wiki/Comparison_of_documentation_generators)).
23
+ However, reference manuals by themselves are insufficient.
24
+
25
+ A reference manual only works if you have some idea about how the system works
26
+ as a whole. For that, you need some sort of overview. Here there is much less
27
+ to help you produce good documentation. The common practice is to sprinkle
28
+ small tutorials inside your reference documentation (the [MSDN
29
+ library](http://msdn.microsoft.com/en-us/library) is a good example). This
30
+ doesn't really solve the problem: how do you sufficiently explain a complex new
31
+ system, so that references and small tutorials become useful?
32
+
33
+ One possible solution to this problem, [literate
34
+ programming](http://en.wikipedia.org/wiki/Literate_programming), was proposed
35
+ by Knuth. In a nutshell, the idea was that the source code for the system
36
+ fulfilled a dual role. You could compile it into the executable code, as
37
+ expected. But you could also generate documentation from it.
38
+
39
+ So far this sounds a lot like structured comments, and indeed structured
40
+ comments were inspired by literate programming. The key difference between the
41
+ two approaches is that in literate programming, the generated documentation was
42
+ not a reference manual. It was a linear narrative describing the system - a
43
+ story which walked you through the system along a specific path chosen for
44
+ optimal presentation.
45
+
46
+ To achieve this, the sources contained the linear documentation, with embedded
47
+ code "chunks". The order of the chunks in the sources was determined by the
48
+ narrative, not the programming language requirements. Extracting and
49
+ re-ordering these chunks was part of the build process, so the regular compiler
50
+ could process them as usual.
51
+
52
+ This was the great strength, but also the great weakness, of literate
53
+ programming. For example, it is next to impossible to create IDEs and similar
54
+ tools for literate programming source code. The code chunks are split any which
55
+ way and spread around the source files in any order; the same source file may
56
+ contain chunks in several languages; etc. Automatically figuring out, say, the
57
+ list of members of some class would be a daunting task.
58
+
59
+ In contrast, structured comments stay out of the way of the IDE and similar
60
+ tools. The source code is still structured exactly the way the compiler wants,
61
+ which allows for easy, localized processing. The trade-off, of course, is that
62
+ structured comments produce a reference manual, not a narrative.
63
+
64
+ Today, structured comments have taken over the coding world, and literate
65
+ programming has all but been forgotten. The problem it tried to solve, however,
66
+ is still very much with us. How do we explain a new complex system?
67
+
68
+ ### A Different Approach ###
69
+
70
+ Codnar is an example of a different approach for solving this problem, "inverse
71
+ literate programming" (similar to, for example,
72
+ [antiweb](http://packages.python.org/antiweb/)). This approach is a combination
73
+ of structured comments and literate programming. Note that this approach is
74
+ similar to, but different in key aspects from, [reverse literate
75
+ programming](http://ssw.jku.at/Research/Projects/RevLitProg/).
76
+
77
+ In inverse literate programming, the source files are organized just
78
+ the way the compiler, IDE, and similar tools expect them to be. Structured
79
+ comments are used to document the pieces of code, and a reference manual can be
80
+ generated from the sources as usual.
81
+
82
+ In addition, the code is split into (possibly nested) named "chunks". This is
83
+ done using specially formatted comments. It turns out this functionality is
84
+ already supported by most coding editors and IDEs, in the form of "folds" or
85
+ "regions". These allow the developer to collapse or expand such chunks at will.
86
+
87
+ At this point, inverse literate programming kicks in. The developer writes
88
+ additional documentation source files, next to the usual code source files.
89
+ These documentation source files contain a narrative that describes the system,
90
+ much in the same way that a literate programming documentation would have done,
91
+ with two important differences.
92
+
93
+ The first difference is that the documentation source files refer to and embed
94
+ the code chunks (using their names), as opposed to a literate programming
95
+ system, where the documentation source files actually contain the code chunks.
96
+
97
+ The second difference is that the documentation source files do not need to
98
+ repeat the information that is already covered in the structured comments. When
99
+ a code chunk is embedded into the documentation, it includes these comments, so
100
+ all the documentation source files need to contain is the narrative "glue" for
101
+ placing these pieces into a comprehensible context for the reader.
102
+
103
+ In this way, inverse literate programming allows generating a linear narrative
104
+ describing the system, without abandoning the existing code processing tools.
105
+ It also makes it easy to retrofit such documentation to an existing code base;
106
+ all that's needed is to mark the already-documented code chunks (or even just
107
+ treat each source code file as a single chunk), and provide the narrative glue
108
+ around them.
109
+
110
+ ### Maintaining the Documentation ###
111
+
112
+ Structured comments have the advantage that they are easy to maintain. Every
113
+ time you change a piece of code, change its comment to match. Similarly,
114
+ literate programming forced one to maintain the documentation as well, since
115
+ the same source file was used for code and documentation. Inverse literate
116
+ programming does not share this advantage. The linear documentation is in a
117
+ separate file, so it isn't immediately visible to the developer who is making
118
+ the changes. Also, it is easy to just forget to include some chunks of code in
119
+ the documentation.
120
+
121
+ These issues are very similar to the issues of unit testing. Unit tests live in
122
+ a separate file from the code they test, and it is easy to forget to test some
123
+ chunks of code. One way to ensure all code is tested is to use a code coverage
124
+ tool. Similarly, inverse literate programming tools should complain about code
125
+ chunks that are left out of the final narrative.
126
+
127
+ A different approach,
128
+ [TDD](http://en.wikipedia.org/wiki/Test-driven_development), ensures that the
129
+ tests are up-to-date and complete by writing the tests before the code. The
130
+ same approach can be used for documentation.
131
+ [DDD](http://thinkingphp.org/spliceit/docs/0.1_alpha/pages/ddd_info.html) means
132
+ that you first document what you are about to do, and only then follow up with
133
+ the actual coding. Inverse literate programming and TDD are an excellent
134
+ practical way to achieve that.
135
+
136
+ The unit tests are code like any other code. As such, they should be documented
137
+ using structured comments. Certain unit test tools like
138
+ [RSpec](http://rspec.info/), [Cucumber](http://cukes.info/) and other
139
+ [BDD](http://en.wikipedia.org/wiki/Behavior_Driven_Development) tools blur the
140
+ line between the tests-as-code and the tests-as-documentation anyway, so the
141
+ amount of unit test structured documentation should be small.
142
+
143
+ Therefore, if you are writing the tests first, you have done the heavy lifting
144
+ of documenting what the new code will do. All that is left is providing a bit
145
+ of surrounding context and embedding it all in the correct location in the
146
+ narrative. Then, when you write the new code itself, it should be easy to
147
+ connect it to the narrative at the appropriate point.
148
+
149
+ In the case of Code Narrator itself, the number of (raw) lines in the code
150
+ library itself is ~2100 lines, the number of test code lines is ~2200 lines,
151
+ and the number of narrative documentation lines is only ~900 lines. Given that
152
+ narrative documentation is easier to write than system (or test) code, this
153
+ indicates maintaining a narrative is not an unreasonable burden for a
154
+ well-tested project.
155
+
156
+ ### Code Narrator ###
157
+
158
+ Codnar is an inverse literate programming tool. It allows you to tell a story
159
+ about your system, which will explain it to others: developers, maintainers,
160
+ and/or users. It builds on the structured comments you would write anyway to
161
+ generate a reference manual for the system, requires minimal or no changes to
162
+ your source code files, and works perfectly well inside your favorite IDE or
163
+ editor. If you follow TDD or BDD, Codnar will make it easier for you to
164
+ complement it with DDD.
165
+
166
+ Codnar is available under the MIT license:
167
+
168
+ [[LICENSE|named_chunk_with_containers]]
169
+
170
+ And the current Codnar version is:
171
+
172
+ [[lib/codnar/version.rb|named_chunk_with_containers]]
173
+
174
+ The rest of this document goes into the details of Codnar's implementation. The
175
+ core of the system is the following simple data flow: A set of source files is
176
+ split into chunks; the chunks are woven into a single HTML. This simple flow
177
+ can be enhanced by pre-processing the sources, or post-processing the HTML. In
178
+ a realistic project, all this would be managed by some build tool; either using
179
+ the command-line (for arbitrary build tools) or using the provided Ruby classes
180
+ for Rake integration.
@@ -0,0 +1,671 @@
1
+ ## Splitting files into chunks ##
2
+
3
+ Codnar makes the reasonable assumption that each source file can be effectively
4
+ processed as a sequence of lines. This works well in practice for all "text"
5
+ source files. It fails miserably for "binary" source files, but such files
6
+ don't work that well in most generic source management tools (such as version
7
+ management systems).
8
+
9
+ A second, less obvious assumption is that it is possible to classify the source
10
+ file lines into "kinds" using a simple state machine. The classified lines are
11
+ then grouped into nested chunks based on the two special line kinds
12
+ `begin_chunk` and `end_chunk`. The other line kinds are used to control how the
13
+ lines are formatted into HTML.
14
+
15
+ The collected chunks, with the formatted HTML for each one, are then stored in
16
+ a chunks file to be used later for weaving the overall HTML narrative.
17
+
18
+ ### Scanning Lines ###
19
+
20
+ Scanning a file into classified lines is done by the `Scanner` class.
21
+ Here is a simple test that demonstrates using the scanner:
22
+
23
+ [[test/scan_lines.rb|named_chunk_with_containers]]
24
+
25
+ And here is the implementation:
26
+
27
+ [[lib/codnar/scanner.rb|named_chunk_with_containers]]
28
+
29
+ As we can see, the implementation is split into two main parts. First, all
30
+ shorthands in the syntax definition are expanded (possibly generating errors).
31
+ Then, the expanded syntax is applied to a file, to generate a sequence of
32
+ classified lines.
33
+
34
+ #### Scanner Syntax Shorthands ####
35
+
36
+ The syntax is expected to be written by hand in a YAML file. We therefore
37
+ provide some convenient shorthands (listed above) to make YAML syntax files
38
+ more readable. These shorthands must be expanded to their full form before we
39
+ can apply the syntax to a file. There are two sets of shorthands we need to
40
+ expand:
41
+
42
+ * [[Scanner pattern shorthands|named_chunk_with_containers]]
43
+
44
+ * [[Scanner state shorthands|named_chunk_with_containers]]
45
+
46
+ The above code modifies the syntax object in place. This is safe because we are
47
+ working on a `deep_clone` of the original syntax:
48
+
49
+ [[lib/codnar/hash_extensions.rb|named_chunk_with_containers]]
50
+
51
+ #### Classifying Source Lines ####
52
+
53
+ Scanning a file into classified lines is a simple matter of applying the current
54
+ state transitions to each line:
55
+
56
+ [[Scanner file processing|named_chunk_with_containers]]
57
+
58
+ If a line matches a state transition, it is classified accordingly. Otherwise,
59
+ it is reported as an error:
60
+
61
+ [[Scanner line processing|named_chunk_with_containers]]
62
+
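To make the idea concrete, here is a minimal sketch of classifying lines with a state machine. It is not the `Scanner` implementation: the `SYNTAX` table layout, the state names, and the `classify_lines` helper are all assumptions made for illustration, and the sketch raises on an unmatched line instead of reporting the error.

```ruby
# Hypothetical syntax table (not Codnar's format): each state maps to an
# ordered list of [pattern, kind, next_state] transitions.
SYNTAX = {
  "code" => [
    [/^=begin\b/, "comment", "doc"],  # enter a block comment
    [/^\s*#/,     "comment", "code"], # one-line comment
    [/^/,         "code",    "code"], # anything else is code
  ],
  "doc" => [
    [/^=end\b/, "comment", "code"],   # leave the block comment
    [/^/,       "comment", "doc"],    # everything inside is a comment
  ],
}.freeze

# Classify each line by the first matching transition of the current state.
def classify_lines(lines, syntax, start_state)
  state = start_state
  lines.each_with_index.map do |line, index|
    transition = syntax.fetch(state).find { |pattern, _kind, _next| pattern.match?(line) }
    raise ArgumentError, "line #{index + 1}: no transition in state #{state}" unless transition
    _pattern, kind, state = transition
    { "kind" => kind, "line" => line, "number" => index + 1 }
  end
end
```

Because the transitions are tried in order, a catch-all pattern at the end of each state guarantees that every line is classified.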
63
+ ### Merging scanned lines to chunks ###
64
+
65
+ Once we have the array of scanned classified lines, we need to merge them into
66
+ nested chunks. Here is a simple test that demonstrates using the merger:
67
+
68
+ [[test/merge_lines.rb|named_chunk_with_containers]]
69
+
70
+ And here is the implementation:
71
+
72
+ [[lib/codnar/merger.rb|named_chunk_with_containers]]
73
+
74
+ #### Merging nested chunk lines ####
75
+
76
+ To merge the nested chunk lines, we maintain a stack of the current chunks.
77
+ Each `begin_chunk` line pushes another chunk on the stack, and each `end_chunk`
78
+ line pops it. If any chunks are not properly terminated, they will remain in
79
+ the stack when all the lines are processed.
80
+
81
+ [[Merging nested chunk lines|named_chunk_with_containers]]
82
+
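The stack discipline described above can be sketched as follows. This is an illustration only, not the `Merger` code: the `merge_chunks` name, the `"nested_chunk"` placeholder kind, and the shape of the returned data are assumptions.

```ruby
# Merge classified lines into nested chunks using a stack.
# Returns [chunks, unterminated_names].
def merge_chunks(lines)
  root = { "name" => "file", "lines" => [] }
  stack = [root]
  chunks = [root]
  lines.each do |line|
    case line["kind"]
    when "begin_chunk"
      chunk = { "name" => line["name"], "lines" => [] }
      # Leave a placeholder in the container so the nesting is recoverable.
      stack.last["lines"] << { "kind" => "nested_chunk", "name" => chunk["name"] }
      stack.push(chunk)
      chunks << chunk
    when "end_chunk"
      raise ArgumentError, "end_chunk without begin_chunk" if stack.size == 1
      stack.pop
    else
      stack.last["lines"] << line
    end
  end
  # Whatever is still above the root was never terminated.
  [chunks, stack.drop(1).map { |chunk| chunk["name"] }]
end
```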
83
+ #### Unindenting merged chunk lines ####
84
+
85
+ Nested chunks are typically indented relative to their container chunks.
86
+ However, in the generated documentation, these chunks are displayed on their
87
+ own, and preserving this relative indentation would reduce their readability.
88
+ We therefore unindent all chunks as much as possible as the final step.
89
+
90
+ [[Unindenting chunk lines|named_chunk_with_containers]]
91
+
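A minimal sketch of the unindenting step, assuming lines are plain strings indented with spaces and without trailing newlines (the `unindent` helper is hypothetical, not Codnar's method):

```ruby
# Remove the indentation shared by all non-blank lines.
def unindent(lines)
  indents = lines.reject { |line| line.strip.empty? }
                 .map { |line| line[/\A */].size }
  shared = indents.min || 0
  lines.map { |line| line.strip.empty? ? "" : line[shared..-1] }
end
```

Blank lines are ignored when computing the shared indentation, so an empty line inside a nested chunk does not prevent the rest of the chunk from being shifted left.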
92
+ ### Generating chunk HTML ###
93
+
94
+ Now that we have each chunk's lines, we need to convert them to HTML.
95
+
96
+ #### Grouping lines of the same kind ####
97
+
98
+ Instead of formatting each line on its own, we batch the operations to work on
99
+ all lines of the same kind at once. Here is a simple test that demonstrates
100
+ using the grouper:
101
+
102
+ [[test/group_lines.rb|named_chunk_with_containers]]
103
+
104
+ And here is the implementation:
105
+
106
+ [[lib/codnar/grouper.rb|named_chunk_with_containers]]
107
+
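For comparison, grouping consecutive lines of the same kind can be sketched with Ruby's `Enumerable#chunk_while`. The `group_by_kind` helper and the hash layout are assumptions for illustration; the `Grouper` class may be structured differently.

```ruby
# Collapse runs of consecutive same-kind lines into one group each.
def group_by_kind(lines)
  lines.chunk_while { |a, b| a["kind"] == b["kind"] }.map do |run|
    { "kind" => run.first["kind"], "lines" => run.map { |l| l["line"] } }
  end
end
```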
108
+ #### Formatting lines as HTML ####
109
+
110
+ Formatting is based on a configuration that specifies, for (a group of) lines
111
+ of each kind, how to convert it to HTML. Here is a simple test that
112
+ demonstrates using the formatter:
113
+
114
+ [[test/format_lines.rb|named_chunk_with_containers]]
115
+
116
+ And here is the implementation:
117
+
118
+ [[lib/codnar/formatter.rb|named_chunk_with_containers]]
119
+
120
+ #### Basic formatters ####
121
+
122
+ The implementation contains some basic formatting functions. These are
123
+ sufficient for generic source code processing.
124
+
125
+ [[Basic formatters|named_chunk_with_containers]]
126
+
127
+ #### Markup formats ####
128
+
129
+ The `markup_lines_to_html` formatter above relies on the existence of a class
130
+ for converting comments from the specific markup format to HTML. Currently, two
131
+ such formats are supported:
132
+
133
+ * RDoc, the default markup format used in Ruby comments. Here is a simple test
134
+ that demonstrates using RDoc:
135
+
136
+ [[test/expand_rdoc.rb|named_chunk_with_containers]]
137
+
138
+ And here is the implementation:
139
+
140
+ [[lib/codnar/rdoc.rb|named_chunk_with_containers]]
141
+
142
+ * Markdown, a generic markup syntax used across many systems and languages.
143
+ Here is a simple test that demonstrates using Markdown:
144
+
145
+ [[test/expand_markdown.rb|named_chunk_with_containers]]
146
+
147
+ And here is the implementation:
148
+
149
+ [[lib/codnar/markdown.rb|named_chunk_with_containers]]
150
+
151
+ In both cases, the HTML generated by the markup format conversion is a bit
152
+ messy. We therefore clean it up:
153
+
154
+ [[Clean html|named_chunk_with_containers]]
155
+
156
+ #### Syntax highlighting using GVIM ####
157
+
158
+ If you have `gvim` installed, it is possible to use it to generate syntax
159
+ highlighting. This is a *slow* operation, as `gvim` was never meant to be used
160
+ as a command-line tool. However, what it lacks in speed it compensates for in
161
+ scope; almost any language you can think of has a `gvim` syntax highlighting
162
+ definition. Here is a simple test that demonstrates using `gvim` for syntax
163
+ highlighting:
164
+
165
+ [[test/gvim_highlight_syntax.rb|named_chunk_with_containers]]
166
+
167
+ And here is the implementation:
168
+
169
+ [[lib/codnar/gvim.rb|named_chunk_with_containers]]
170
+
171
+ Since GVim is so slow, we are using caching to minimize the time it takes to
172
+ recompute the same code's highlighted HTML. This is pretty useful in practice -
173
+ making changes in one chunk in a file will not require recomputing the
174
+ highlighting for any of the unchanged chunks in the same file. Here is a simple
175
+ test of using the caching functionality:
176
+
177
+ [[test/cache_computations.rb|named_chunk_with_containers]]
178
+
179
+ And here is the implementation:
180
+
181
+ [[lib/codnar/cache.rb|named_chunk_with_containers]]
182
+
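The caching idea can be sketched as a small disk cache keyed by a digest of the input. This is a sketch under stated assumptions, not the `Cache` class above: the `DiskCache` name, the one-file-per-digest layout, and the choice of SHA-256 are all illustrative.

```ruby
require "digest/sha2"
require "fileutils"

# Hypothetical disk cache: stores each computed result in a file named
# after the SHA-256 digest of its input.
class DiskCache
  def initialize(directory, &compute)
    @directory = directory
    @compute = compute
    FileUtils.mkdir_p(directory)
  end

  # Return the cached result for the input, computing it only on a miss.
  def [](input)
    path = File.join(@directory, Digest::SHA256.hexdigest(input))
    return File.read(path) if File.exist?(path)
    result = @compute.call(input)
    File.write(path, result)
    result
  end
end
```

Keying on the content rather than on the file name is what lets unchanged chunks reuse their highlighted HTML when another chunk in the same file changes.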
183
+ #### Syntax highlighting using Sunlight ####
184
+
185
+ [Sunlight](http://sunlightjs.com/) offers a different approach for syntax
186
+ highlighting. Instead of pre-processing the code to generate highlighted HTML
187
+ while splitting, it provides Javascript files that examine the textual code in
188
+ the DOM and convert it to highlighted HTML in the browser. This takes virtually
189
+ no time when splitting the code, but requires recomputing highlighting for all
190
+ the code chunks every time the HTML file is loaded. This can be pretty slow,
191
+ especially if using a browser with a slow Javascript engine, like IE. However,
192
+ given how slow GVIM is, this is a reasonable trade-off, at least for small
193
+ projects. Since Sunlight is a new project, it doesn't offer the extensive
194
+ coverage of different programming languages supported by GVIM.
195
+
196
+ Here is a simple test that demonstrates using Sunlight for syntax highlighting:
197
+
198
+ [[test/sunlight_highlight_syntax.rb|named_chunk_with_containers]]
199
+
200
+ And here is the implementation:
201
+
202
+ [[lib/codnar/sunlight.rb|named_chunk_with_containers]]
203
+
204
+ ### Putting it all together ###
205
+
206
+ Now that we have all the separate pieces of functionality for splitting source
207
+ files into HTML chunks, we need to combine them to a single convenient service.
208
+
209
+ #### Splitting code files ####
210
+
211
+ Here is a simple test that demonstrates using the splitter for source code
212
+ files:
213
+
214
+ [[test/split_code.rb|named_chunk_with_containers]]
215
+
216
+ And here is the implementation:
217
+
218
+ [[lib/codnar/splitter.rb|named_chunk_with_containers]]
219
+
220
+ #### Splitting documentation files ####
221
+
222
+ The narrative documentation is expected to reside in one or more files, which
223
+ are also "split" to a single chunk each. Having both documentation and code
224
+ exist as chunks allows for uniform treatment of both when weaving, as well as
225
+ allowing for pre-processing the documentation files, if necessary. For example,
226
+ Codnar currently supports the same two markup formats for documentation that
227
+ are also supported for code comments. Here is a simple test that demonstrates
228
+ "splitting" documentation (using the same implementation as above):
229
+
230
+ [[test/split_documentation.rb|named_chunk_with_containers]]
231
+
232
+ ### Built-in configurations ###
233
+
234
+ The splitting mechanism defined above is pretty generic. To apply it to a
235
+ specific system requires providing the appropriate configuration. The system
236
+ provides a few specific built-in configurations which may be useful "out of the
237
+ box".
238
+
239
+ If one is willing to give up altogether on syntax highlighting and comment
240
+ formatting, the system would be applicable as-is to any programming language.
241
+ Properly highlighting almost any known programming language syntax would be a
242
+ simple matter of passing the correct syntax parameter to GVIM.
243
+
244
+ Properly formatting comments in additional mark-up formats would be trickier.
245
+ First, a proper pattern needs to be established for extracting the comments
246
+ (`/*`, `//`, `--`, etc.). Then, the results need to be converted to HTML. One
247
+ way would be to pass them through GVim syntax highlighting with an appropriate
248
+ format (e.g., `syntax=doxygen`). Another would be to invoke some Ruby library;
249
+ finally, one could invoke some external tool to do the job. The latter two
250
+ options would require providing additional glue Ruby code, similar to the GVim
251
+ class above.
252
+
253
+ At any rate, here are the built-in configurations:
254
+
255
+ [[lib/codnar/split_configurations.rb|named_chunk_with_containers]]
256
+
257
+ #### Combining configurations ####
258
+
259
+ Different source files require different overall configurations but reuse
260
+ common building blocks. To support this, we allow configurations to be combined
261
+ using a "deep merge". This allows complex nested structures to be merged. There
262
+ is even a way for arrays to append elements before/after the array they are
263
+ merged with. Here is a simple test that demonstrates deep-merging complex
264
+ structures:
265
+
266
+ [[test/deep_merge.rb|named_chunk_with_containers]]
267
+
268
+ Here is the implementation:
269
+
270
+ [[Deep merge|named_chunk_with_containers]]
271
+
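As a rough illustration of the technique, a recursive deep merge over nested hashes can be written in a few lines. This sketch is not the code above: the standalone `deep_merge` function is an assumption, and it deliberately omits the array before/after splicing mentioned earlier, replacing arrays wholesale instead.

```ruby
# Recursively merge nested hashes; for any other value the override wins.
def deep_merge(base, override)
  if base.is_a?(Hash) && override.is_a?(Hash)
    base.merge(override) { |_key, ours, theirs| deep_merge(ours, theirs) }
  else
    override
  end
end
```

`Hash#merge` returns a new hash, so neither argument is modified.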
272
+ And here is a test module that automates the process of merging configurations
273
+ and invoking the Splitter:
274
+
275
+ [[test/lib/test_with_configurations.rb|named_chunk_with_containers]]
276
+
277
+ #### Documentation "splitting" ####
278
+
279
+ These are pretty simple configurations, applicable to files containing a piece
280
+ of the narrative in some supported format. These configurations typically do
281
+ not need to be combined with other configurations. Here is a simple test
282
+ that demonstrates "splitting" documentation:
283
+
284
+ [[test/split_documentation_configurations.rb|named_chunk_with_containers]]
285
+
286
+ And here are the actual configurations:
287
+
288
+ [[Documentation "splitting" configurations|named_chunk_with_containers]]
289
+
290
+ #### Source code lines classification ####
291
+
292
+ Splitting source code files is a more complex affair, which does typically
293
+ require combining several configurations. The basic configuration marks all
294
+ lines as belonging to some code syntax, as a single chunk:
295
+
296
+ [[Source code lines classification configurations|named_chunk_with_containers]]
297
+
298
+ Sometimes, code in one syntax contains nested "islands" of code in another
299
+ syntax. Here is a simple configuration to support that, which can be combined
300
+ with the above basic configuration:
301
+
302
+ [[Nested foreign syntax code islands configurations|named_chunk_with_containers]]
303
+
304
+ Here is a simple test demonstrating using source code lines classifications:
305
+
306
+ [[test/split_code_configurations.rb|named_chunk_with_containers]]
307
+
308
+ #### Simple comment classification ####
309
+
310
+ Many languages use a simple comment syntax, where some prefix indicates a
311
+ comment that spans until the end of the line (e.g., shell `#` comments or C++
312
+ `//` comments).
313
+
314
+ [[Simple comment classification configurations|named_chunk_with_containers]]
315
+
316
+ Here is a simple test demonstrating using simple comment classifications:
317
+
318
+ [[test/split_simple_comment_configurations.rb|named_chunk_with_containers]]
319
+
320
+ #### Complex comment classification ####
321
+
322
+ Other languages use a complex multi-line comment syntax, where some prefix
323
+ indicates the beginning of the comment, some suffix indicates the end, and by
324
+ convention some prefix is expected for the inner comment lines (e.g., C's
325
+ "`/*`", "` *`", "`*/`" comments or HTML's "`<!--`", "` -`", "`-->`" comments).
326
+
327
+ [[Complex comment classification configurations|named_chunk_with_containers]]
328
+
329
+ Here is a simple test demonstrating using complex comment classifications:
330
+
331
+ [[test/split_complex_comment_configurations.rb|named_chunk_with_containers]]
332
+
333
+ #### Comment formatting ####
334
+
335
+ In many cases, the text inside comments is written using some markup format
336
+ (e.g., RDoc for Ruby or JavaDoc for Java). Currently, two such formats are
337
+ supported, as well as simply wrapping the comment in an HTML pre element:
338
+
339
+ [[Comment formatting configurations|named_chunk_with_containers]]
340
+
341
+ Here is a simple test demonstrating formatting comment contents:
342
+
343
+ [[test/format_comment_configurations.rb|named_chunk_with_containers]]
344
+
345
+ #### Syntax highlighting using GVim ####
346
+
347
+ Supporting a specific programming language (other than dealing with comments)
348
+ is very easy using GVim for syntax highlighting, as demonstrated here:
349
+
350
+ [[GVim syntax highlighting formatting configurations|named_chunk_with_containers]]
351
+
352
+ Here is a simple test demonstrating highlighting code syntax using `gvim`:
353
+
354
+ [[test/format_code_gvim_configurations.rb|named_chunk_with_containers]]
355
+
356
+ #### Syntax highlighting using Sunlight ####
357
+
358
+ For small projects in languages supported by Sunlight, you may choose to use
359
+ it instead of GVIM:
360
+
361
+ [[Sunlight syntax highlighting formatting configurations|named_chunk_with_containers]]
362
+
363
+ Here is a simple test demonstrating highlighting code syntax using Sunlight:
364
+
365
+ [[test/format_code_sunlight_configurations.rb|named_chunk_with_containers]]
366
+
367
+ #### Chunk splitting ####
368
+
369
+ There are many ways to denote code "regions" (which become Codnar chunks). The
370
+ following covers GVim's default scheme; others are easily added. It is safest
371
+ to merge this configuration as the last of all the combined configurations, to
372
+ ensure its patterns end up before any others.
373
+
374
+ [[Chunk splitting configurations|named_chunk_with_containers]]
375
+
376
+ Here is a simple test demonstrating splitting code chunks:
377
+
378
+ [[test/split_chunk_configurations.rb|named_chunk_with_containers]]
379
+
380
+ ### Putting it all together ###
381
+
382
+ Here is a test demonstrating putting several of the above configurations
383
+ together in a meaningful way:
384
+
385
+ [[test/split_combined_configurations.rb|named_chunk_with_containers]]
386
+
387
+ ## Storing chunks on the disk ##
388
+
389
+ ### Writing chunks to disk ###
390
+
391
+ In any realistic system, the number of source files and chunks will be such
392
+ that it makes sense to store the chunks on the disk for further processing.
393
+ This allows incorporating the split operation as part of a build tool chain,
394
+ and only re-splitting modified files. Here is a simple test demonstrating
395
+ writing chunks to the disk:
396
+
397
+ [[test/write_chunks.rb|named_chunk_with_containers]]
398
+
399
+ And here is the implementation:
400
+
401
+ [[lib/codnar/writer.rb|named_chunk_with_containers]]
402
+
403
+ ### Reading chunks to memory ###
404
+
405
+ Having written the chunks to the disk requires us, at some following point in
406
+ time, to read them back into memort. This is the first time we will have a view
407
+ of the whole documented system, which allows us to detect several classes of
408
+ consistency errors: Some chunks may be left out of the final narrative
409
+ (consider this the equivalent of test code coverage); we may be referring to
410
+ missing (or misspelled) chunk names; and, finally, we need to deal with
411
+ duplicate chunks.
412
+
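As a rough illustration of the first two checks, here is a minimal sketch of a store that records which chunks were requested. The `ChunkStore` class, its method names and its error messages are assumptions made for this example; they are not Codnar's reader. Duplicate handling is deliberately left out.

```ruby
# Hypothetical in-memory store (not Codnar's actual reader): it remembers
# every chunk that was requested, so unused chunks can be reported later.
class ChunkStore
  attr_reader :errors

  def initialize
    @chunks = {}  # chunk name => chunk content
    @used = {}    # chunk name => true once it has been requested
    @errors = []
  end

  def add(name, content)
    @chunks[name] = content
  end

  # Fetch a chunk, recording the access; a missing name is reported
  # as an error and replaced by empty content.
  def [](name)
    @used[name] = true
    @chunks.fetch(name) do
      @errors << "Missing chunk: #{name}"
      ""
    end
  end

  # Names of chunks that were stored but never requested.
  def unused
    @chunks.keys.reject { |name| @used[name] }
  end
end

store = ChunkStore.new
store.add("introduction", "<p>Hello</p>")
store.add("appendix", "<p>Never embedded</p>")
store["introduction"]
store["intorduction"]  # a misspelled reference
p store.unused
p store.errors
```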
413
+ In literate programming, it is trivial to write a chunk once and use it in
414
+ several places in the compiled source code. The classical example is C/C++
415
+ function signatures that need to appear in both the `.h` and `.c`/`.cpp` files.
416
+ However, in some cases this practice makes sense for other pieces of code, and
417
+ since the ultimate source code contains only one copy of the chunk, this does
418
+ not suffer from the typical copy-and-paste issues.
419
+
420
+ In inverse literate programming, if the same code appears twice (as a result of
421
+ copy-and-paste), then it does suffer from the typical copy-and-paste issues.
422
+ The most serious of these is, of course, when only one copy is changed.
423
+ The way that Codnar helps alleviate this problem is that if the same chunk
424
+ appears more than once in the source code, its content is expected to be
425
+ exactly the same in both cases (up to indentation). This should not be viewed
426
+ as an endorsement of copy-and-paste programming; using duplicate chunks should be
427
+ a last resort measure to combat restrictions in the programming language and
428
+ compilation tool chain.
429
+
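The "same up to indentation" rule can be pictured with a small sketch. It assumes that "up to indentation" means ignoring the common leading margin of each copy; the helper names `unindent` and `same_chunk_content?` are hypothetical and are not taken from Codnar.

```ruby
# Hypothetical helper: strip the common leading margin from a chunk's lines,
# so that two copies nested at different depths compare as equal.
def unindent(lines)
  indents = lines.reject { |line| line.strip.empty? }
                 .map { |line| line[/\A */].length }
  margin = indents.min || 0
  lines.map { |line| line.strip.empty? ? "" : line[margin..-1] }
end

# Two copies of a chunk are acceptable duplicates only if they are
# identical once the common margin is removed.
def same_chunk_content?(first, second)
  unindent(first) == unindent(second)
end

top_level = ["def size", "  @items.length", "end"]
nested    = ["    def size", "      @items.length", "    end"]
edited    = ["    def size", "      @items.count", "    end"]

puts same_chunk_content?(top_level, nested)  # prints "true"
puts same_chunk_content?(top_level, edited)  # prints "false"
```

Comparing after removing only the shared margin keeps the relative indentation inside the chunk significant, which matters for indentation-sensitive languages.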
430
+ #### Chunk identifiers ####
431
+
432
+ The above definition raises the obvious question: what does "the same chunk"
433
+ mean? As far as Codnar is concerned, a chunk is uniquely identified by its
434
+ name, which is specified on the `begin_chunk` line. The unique identifier is
435
+ not the literal name but a transformation of it. This allows us to ignore
436
+ capitalization, white space, and any punctuation that may appear in the name.
437
+ It also allows us to use the resulting ID as an HTML anchor name, without
438
+ worrying about HTML's restrictions on such names.
439
+
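To make the idea concrete, here is a minimal sketch of one possible transformation. It is an assumption for illustration only: the `chunk_id` helper and its exact rule (lowercase, with every run of non-alphanumeric characters collapsed into a dash) may differ from what Codnar actually does.

```ruby
# Hypothetical helper (not necessarily Codnar's own rule): ignore case and
# collapse every run of non-alphanumeric characters into a single dash.
def chunk_id(name)
  name.downcase.gsub(/[^a-z0-9]+/, "-")
end

# Two spellings of the same name that differ in case, spacing and punctuation.
names = ["Processing the file template", "processing  the FILE, template"]
puts names.map { |name| chunk_id(name) }.uniq.inspect
# prints ["processing-the-file-template"]
```

Because the result contains only lowercase letters, digits and dashes, it can be used directly as an anchor name.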
440
+ Here is a simple test demonstrating converting names to identifiers:
441
+
442
+ [[test/identify_chunks.rb|named_chunk_with_containers]]
443
+
444
+ And here is the implementation:
445
+
446
+ [[lib/codnar/string_extensions.rb|named_chunk_with_containers]]
447
+
448
+ #### In-memory chunks storage ####
449
+
450
+ Detecting unused and/or duplicate chunks requires us to have in-memory chunk
451
+ storage that tracks all chunk access. Here is a simple test demonstrating
452
+ reading chunks into the storage and handling the various error conditions
453
+ listed above:
454
+
455
+ [[test/read_chunks.rb|named_chunk_with_containers]]
456
+
457
+ And here is the implementation:
458
+
459
+ [[lib/codnar/reader.rb|named_chunk_with_containers]]
460
+
461
+ ## Weaving chunks into HTML ##
462
+
463
+ Assembling the final HTML requires combining both the narrative documentation
464
+ and source code chunks. This is done top-down starting at a "root"
465
+ documentation chunk and recursively embedding nested documentation and code
466
+ chunks into it.
467
+
468
+ ### Weaving chunks together ###
469
+
470
+ When embedding a documentation chunk inside another documentation chunk, things
471
+ are pretty easy - we just need to insert the embedded chunk HTML into the
472
+ containing chunk. When embedding a source code chunk into the documentation,
473
+ however, we may want to wrap it in some boilerplate HTML, providing a header,
474
+ footer, borders, links, etc. Therefore, the HTML syntax we use to embed a chunk
475
+ into the documentation is `<embed src="..." type="x-codnar/template-name"/>`.
476
+ The templates are normal ERB templates, except for the magical `file` and
477
+ `image` templates, described below.
478
+
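The following sketch shows the general shape of such an expansion: find each `embed` element, weave the chunk it names, and render the result through an ERB template. Everything here is an assumption made for illustration, including the `TEMPLATES` table, the `EMBED` pattern, the `weave` function, and the use of the raw `src` value as the chunk key; it is not Codnar's weaver, and it does not handle the special `file` and `image` templates.

```ruby
require "erb"

# Hypothetical templates; real ones would come from the weaving configuration.
TEMPLATES = {
  "plain_chunk" => "<%= chunk_html %>",
  "boxed_chunk" => "<div class=\"chunk\"><h6><%= chunk_name %></h6><%= chunk_html %></div>",
}

# Matches <embed src="NAME" type="x-codnar/TEMPLATE"/>, attributes in that order.
EMBED = /<embed\s+src="([^"]*)"\s+type="x-codnar\/([^"]*)"\s*\/>/

# Recursively replace each embed element with the woven chunk it refers to.
# There is no protection here against a chunk that embeds itself.
def weave(html, chunks)
  html.gsub(EMBED) do
    chunk_name, template = $1, $2
    chunk_html = weave(chunks.fetch(chunk_name), chunks)
    ERB.new(TEMPLATES.fetch(template)).result(binding)
  end
end

chunks = {
  "root" => "<p>Intro</p><embed src=\"code\" type=\"x-codnar/boxed_chunk\"/>",
  "code" => "<pre>puts 1</pre>",
}
puts weave(chunks["root"], chunks)
```

Passing `binding` to ERB lets the template refer to the local variables `chunk_name` and `chunk_html`, which is what allows each template to add its own header, footer or border around the chunk.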
479
+ At any rate, here is a simple test demonstrating applying different templates
480
+ to the embedded code chunks:
481
+
482
+ [[test/weave_configurations.rb|named_chunk_with_containers]]
483
+
484
+ Here is the implementation:
485
+
486
+ [[lib/codnar/weaver.rb|named_chunk_with_containers]]
487
+
488
+ And here are the pre-defined weaving template configurations:
489
+
490
+ [[lib/codnar/weave_configurations.rb|named_chunk_with_containers]]
491
+
492
+ #### Embedding files ####
493
+
494
+ The template named `file` is special in two ways. First, the `src` is given
495
+ special treatment. If it begins with a "`.`", it is assumed to be a normal path
496
+ name relative to the current working directory; otherwise, it is assumed to be
497
+ a name of a file packaged inside some gem and is searched for in Ruby's
498
+ `$LOAD_PATH`. This allows gems (such as Codnar itself) to provide such files to
499
+ be used in the woven documentation.
500
+
501
+ Second, the content of the file is simply embedded into the generated
502
+ documentation. This allows the documentation to be a stand-alone file,
503
+ including all the CSS and Javascript required for proper display.
504
+
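The lookup rule for `src` can be sketched as follows. The `locate_file` helper and its error message are hypothetical; only the rule itself (a leading "`.`" means a path relative to the working directory, anything else is searched for in `$LOAD_PATH`) comes from the description above.

```ruby
# Hypothetical helper illustrating the lookup rule: a src starting with "."
# is used as-is, relative to the current working directory; anything else
# is searched for in the given load path ($LOAD_PATH by default).
def locate_file(src, load_path = $LOAD_PATH)
  return src if src.start_with?(".")
  load_path.each do |directory|
    candidate = File.join(directory, src)
    return candidate if File.exist?(candidate)
  end
  raise "Missing file: #{src} in any of: #{load_path.join(' ')}"
end

# The file content is then embedded verbatim into the generated HTML.
def embed_file(src, load_path = $LOAD_PATH)
  File.read(locate_file(src, load_path))
end

puts locate_file("./doc/root.html")  # prints "./doc/root.html"
```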
505
+ [[Processing the file template|named_chunk_with_containers]]
506
+
507
+ See the `doc/root.html` file for plenty of examples of using this
508
+ functionality.
509
+
510
+ #### Embedding images ####
511
+
512
+ The `image` template is a specialization of the `file` template for dealing
513
+ with embedded images. The specified image file is embedded into the generated
514
+ HTML as an `img` tag, using a [data
515
+ URL](http://en.wikipedia.org/wiki/Data_URI_scheme). This is very useful for
516
+ small images, but is problematic when their size increases beyond
517
+ browser-specific limits.
518
+
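As a rough sketch of what a data URL looks like, the image bytes are Base64-encoded and placed directly in the `src` attribute. The `image_tag` helper below is hypothetical and is not Codnar's implementation; the MIME type is passed in explicitly rather than guessed from the file name.

```ruby
# Hypothetical helper: wrap raw image bytes in an img tag that uses a data URL.
# pack("m0") produces Base64 without any line breaks.
def image_tag(bytes, mime_type)
  "<img src=\"data:#{mime_type};base64,#{[bytes].pack('m0')}\"/>"
end

# For a real file one would pass File.binread(path); a short string
# stands in for the image bytes here.
puts image_tag("abc", "image/png")
# prints <img src="data:image/png;base64,YWJj"/>
```

Base64 grows the data by roughly a third, which is part of why this approach suits only small images.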
519
+ Here is a simple test demonstrating processing embedded image files:
520
+
521
+ [[test/embed_images.rb|named_chunk_with_containers]]
522
+
523
+ Here is the implementation:
524
+
525
+ [[Processing Base64 embedded data images|named_chunk_with_containers]]
526
+
527
+ And here is a sample embedded image:
528
+
529
+ [[doc/logo.png|image]]
530
+
531
+ ## Invoking the functionality ##
532
+
533
+ There are two ways to invoke Codnar's functionality - from the command line,
534
+ and (for Ruby projects) as integrated Rake tasks.
535
+
536
+ ### Command Line Applications ###
537
+
538
+ Executable scripts (tests, command-line applications) start with a `require
539
+ 'codnar'` line to access the full Codnar code. This also serves as a
540
+ convenient list of all of Codnar's parts and dependencies:
541
+
542
+ [[lib/codnar.rb|named_chunk_with_containers]]
543
+
544
+ The base command line Application class handles execution from the command
545
+ line, with the usual standard options, as well as some Codnar-specific ones:
546
+ the ability to specify configuration files and/or built-in configurations, and
547
+ the ability to include additional extension code triggered from these
548
+ configurations. Together, these allow configuring and extending Codnar's
549
+ behavior to cover the specific system's needs.
550
+
551
+ Here is a simple test demonstrating the standard Codnar application behavior:
552
+
553
+ [[test/run_application.rb|named_chunk_with_containers]]
554
+
555
+ And here is the implementation:
556
+
557
+ [[lib/codnar/application.rb|named_chunk_with_containers]]
558
+
559
+ #### Application for splitting files ####
560
+
561
+ Here is a simple test demonstrating invoking the command-line application for
562
+ splitting files:
563
+
564
+ [[test/run_split.rb|named_chunk_with_containers]]
565
+
566
+ Here is the implementation:
567
+
568
+ [[lib/codnar/split.rb|named_chunk_with_containers]]
569
+
570
+ And here is the actual command-line application script:
571
+
572
+ [[bin/codnar-split|named_chunk_with_containers]]
573
+
574
+ #### Application for weaving chunks ####
575
+
576
+ Here is a simple test demonstrating invoking the command-line application for
577
+ weaving chunks to HTML:
578
+
579
+ [[test/run_weave.rb|named_chunk_with_containers]]
580
+
581
+ Here is the implementation:
582
+
583
+ [[lib/codnar/weave.rb|named_chunk_with_containers]]
584
+
585
+ And here is the actual command-line application script:
586
+
587
+ [[bin/codnar-weave|named_chunk_with_containers]]
588
+
589
+ ### Rake Integration ###
590
+
591
+ For Ruby projects (or any other project using Rake), it is also possible to
592
+ invoke Codnar using Rake tasks. Here is a simple test demonstrating using the
593
+ Rake tasks:
594
+
595
+ [[test/rake_tasks.rb|named_chunk_with_containers]]
596
+
597
+ To use these tasks in a Rakefile, one needs to `require 'codnar/rake'`. The
598
+ code implements a singleton that holds the global state shared between tasks:
599
+
600
+ [[lib/codnar/rake.rb|named_chunk_with_containers]]
601
+
602
+ #### Task for splitting files ####
603
+
604
+ To split one or more files into chunks, create a new SplitTask. Multiple such
605
+ tasks may be created; this is required if different files need to be split
606
+ using different configurations.
607
+
608
+ [[lib/codnar/rake/split_task.rb|named_chunk_with_containers]]
609
+
610
+ #### Task for weaving chunks ####
611
+
612
+ To weave the chunks together, create a single WeaveTask.
613
+
614
+ [[lib/codnar/rake/weave_task.rb|named_chunk_with_containers]]
615
+
616
+ ## Building the Codnar gem ##
617
+
618
+ The following Rakefile is in charge of building the gem, with the help of some
619
+ tools described below.
620
+
621
+ [[Rakefile|named_chunk_with_containers]]
622
+
623
+ The generated HTML requires some tweaking to yield aesthetic, readable results.
624
+ This tweaking consists of using Javascript to control chunk visibility,
625
+ generating a table of contents, and using CSS to make the HTML look better.
626
+
627
+ Here are the modified configurations for generating the correct HTML:
628
+
629
+ [[Codnar configurations|named_chunk_with_containers]]
630
+
631
+ ### Javascript chunk visibility control ###
632
+
633
+ The following code injects visibility controls ("+"/"-" toggles) next to each
634
+ embedded code chunk. It also hides all the chunks by default; this increases
635
+ the readability of the overall narrative, turning it into a high-level summary.
636
+ Expanding the embedded code chunks allows the reader to delve into the details.
637
+
638
+ [[lib/codnar/data/control_chunks.js|named_chunk_with_containers]]
639
+
640
+ ### Javascript table of contents ###
641
+
642
+ The following code is not very efficient or elegant but it does a basic job of
643
+ injecting a table of contents into the generated HTML.
644
+
645
+ [[lib/codnar/data/contents.js|named_chunk_with_containers]]
646
+
647
+ ### CSS style ###
648
+
649
+ To avoid dealing with the different default styles used by different browsers,
650
+ we employ the YUI CSS [reset](http://developer.yahoo.com/yui/reset/) and
651
+ [base](http://developer.yahoo.com/yui/base/) files. Resetting and restoring the
652
+ default CSS styles is inelegant, but it is the only current way to get a
653
+ consistent presentation of HTML. Once this is out of the way, we apply styles
654
+ specific to our HTML. Some of these override the default styles established by
655
+ the base CSS file above. We do this instead of directly tweaking the base CSS
656
+ file, to allow easy upgrade to new versions if/when YUI release any.
657
+
658
+ [[lib/codnar/data/style.css|named_chunk_with_containers]]
659
+
660
+ ### Using Sunlight ###
661
+
662
+ When using Sunlight for syntax highlighting, we also need to include some CSS
663
+ and Javascript files to convert the classified `pre` elements into properly
664
+ marked-up HTML. We also need to invoke this Javascript code (a one-line
665
+ operation). Here is what such code might look like inside a Javascript block
666
+ of the generated HTML:
667
+
668
+ &lt;embed src="codnar/data/sunlight/min.js" type="x-codnar/file"/&gt;
669
+ &lt;embed src="codnar/data/sunlight/ruby-min.js" type="x-codnar/file"/&gt;
670
+ Sunlight.globalOptions.lineNumbers = false;
671
+ Sunlight.highlightAll();