codnar 0.1.64

Files changed (80)
  1. data/ChangeLog +165 -0
  2. data/LICENSE +19 -0
  3. data/README.rdoc +32 -0
  4. data/Rakefile +66 -0
  5. data/bin/codnar-split +5 -0
  6. data/bin/codnar-weave +5 -0
  7. data/codnar.html +10945 -0
  8. data/doc/logo.png +0 -0
  9. data/doc/root.html +22 -0
  10. data/doc/story.markdown +180 -0
  11. data/doc/system.markdown +671 -0
  12. data/lib/codnar.rb +41 -0
  13. data/lib/codnar/application.rb +92 -0
  14. data/lib/codnar/cache.rb +61 -0
  15. data/lib/codnar/data/contents.js +113 -0
  16. data/lib/codnar/data/control_chunks.js +44 -0
  17. data/lib/codnar/data/style.css +95 -0
  18. data/lib/codnar/data/sunlight/README.txt +4 -0
  19. data/lib/codnar/data/sunlight/css-min.js +1 -0
  20. data/lib/codnar/data/sunlight/default.css +236 -0
  21. data/lib/codnar/data/sunlight/javascript-min.js +1 -0
  22. data/lib/codnar/data/sunlight/min.js +1 -0
  23. data/lib/codnar/data/sunlight/ruby-min.js +1 -0
  24. data/lib/codnar/data/yui/README.txt +3 -0
  25. data/lib/codnar/data/yui/base.css +132 -0
  26. data/lib/codnar/data/yui/reset.css +142 -0
  27. data/lib/codnar/formatter.rb +180 -0
  28. data/lib/codnar/grouper.rb +28 -0
  29. data/lib/codnar/gvim.rb +132 -0
  30. data/lib/codnar/hash_extensions.rb +41 -0
  31. data/lib/codnar/markdown.rb +47 -0
  32. data/lib/codnar/merger.rb +138 -0
  33. data/lib/codnar/rake.rb +41 -0
  34. data/lib/codnar/rake/split_task.rb +71 -0
  35. data/lib/codnar/rake/weave_task.rb +59 -0
  36. data/lib/codnar/rdoc.rb +9 -0
  37. data/lib/codnar/reader.rb +121 -0
  38. data/lib/codnar/scanner.rb +216 -0
  39. data/lib/codnar/split.rb +58 -0
  40. data/lib/codnar/split_configurations.rb +367 -0
  41. data/lib/codnar/splitter.rb +32 -0
  42. data/lib/codnar/string_extensions.rb +25 -0
  43. data/lib/codnar/sunlight.rb +17 -0
  44. data/lib/codnar/version.rb +8 -0
  45. data/lib/codnar/weave.rb +58 -0
  46. data/lib/codnar/weave_configurations.rb +48 -0
  47. data/lib/codnar/weaver.rb +105 -0
  48. data/lib/codnar/writer.rb +38 -0
  49. data/test/cache_computations.rb +41 -0
  50. data/test/deep_merge.rb +29 -0
  51. data/test/embed_images.rb +12 -0
  52. data/test/expand_markdown.rb +27 -0
  53. data/test/expand_rdoc.rb +20 -0
  54. data/test/format_code_gvim_configurations.rb +55 -0
  55. data/test/format_code_sunlight_configurations.rb +37 -0
  56. data/test/format_comment_configurations.rb +86 -0
  57. data/test/format_lines.rb +72 -0
  58. data/test/group_lines.rb +31 -0
  59. data/test/gvim_highlight_syntax.rb +49 -0
  60. data/test/identify_chunks.rb +32 -0
  61. data/test/lib/test_with_configurations.rb +15 -0
  62. data/test/merge_lines.rb +133 -0
  63. data/test/rake_tasks.rb +38 -0
  64. data/test/read_chunks.rb +110 -0
  65. data/test/run_application.rb +56 -0
  66. data/test/run_split.rb +38 -0
  67. data/test/run_weave.rb +75 -0
  68. data/test/scan_lines.rb +78 -0
  69. data/test/split_chunk_configurations.rb +55 -0
  70. data/test/split_code.rb +109 -0
  71. data/test/split_code_configurations.rb +73 -0
  72. data/test/split_combined_configurations.rb +114 -0
  73. data/test/split_complex_comment_configurations.rb +73 -0
  74. data/test/split_documentation.rb +92 -0
  75. data/test/split_documentation_configurations.rb +97 -0
  76. data/test/split_simple_comment_configurations.rb +50 -0
  77. data/test/sunlight_highlight_syntax.rb +25 -0
  78. data/test/weave_configurations.rb +144 -0
  79. data/test/write_chunks.rb +28 -0
  80. metadata +363 -0
Binary file
@@ -0,0 +1,22 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2
+ <html xmlns="http://www.w3.org/1999/xhtml">
3
+ <head>
4
+ <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
5
+ <title>Code narrator - an inverse literate programming tool.</title>
6
+ <style type="text/css">
7
+ <embed src="codnar/data/yui/reset.css" type="x-codnar/file"/>
8
+ <embed src="codnar/data/yui/base.css" type="x-codnar/file"/>
9
+ <embed src="codnar/data/style.css" type="x-codnar/file"/>
10
+ </style>
11
+ </head>
12
+ <body>
13
+ <div id="contents"></div>
14
+ <embed src="README.rdoc" type="x-codnar/include"/>
15
+ <embed src="doc/story.markdown" type="x-codnar/include"/>
16
+ <embed src="doc/system.markdown" type="x-codnar/include"/>
17
+ <script type="text/javascript">
18
+ <embed src="codnar/data/contents.js" type="x-codnar/file"/>
19
+ <embed src="codnar/data/control_chunks.js" type="x-codnar/file"/>
20
+ </script>
21
+ </body>
22
+ </html>
@@ -0,0 +1,180 @@
1
+ ## The Story ##
2
+
3
+ This is the story of the Code Narrator (Codnar) tool. It serves a dual purpose.
4
+ It describes the Codnar tool itself, but it also serves as an example of why it
5
+ exists in the first place. To explain this more fully, we'll have to make a
6
+ little detour into the issue of system documentation.
7
+
8
+ ### The Documentation Problem ###
9
+
10
+ Documentation for any system can be grouped into two kinds. The first kind is the
11
+ reference manual. If you know of a small piece of the system, this kind of
12
+ documentation will give you the details about it. A good reference will help
13
+ you find this piece even if you only have a rough idea of what it is named. A
14
+ really good reference will also link it to related pieces. A great reference
15
+ will even give you examples of how to use the related pieces in a realistic
16
+ context.
17
+
18
+ Reference manuals are invaluable, and there are plenty of tools to help you
19
+ create them. The common approach is the use of structured comments (e.g.,
20
+ [JavaDoc](http://en.wikipedia.org/wiki/Javadoc),
21
+ [Doxygen](http://en.wikipedia.org/wiki/Doxygen), and a [host of similar
22
+ tools](http://en.wikipedia.org/wiki/Comparison_of_documentation_generators)).
23
+ However, reference manuals by themselves are insufficient.
24
+
25
+ A reference manual only works if you have some idea about how the system works
26
+ as a whole. For that, you need some sort of overview. Here there is much less
27
+ to help you produce good documentation. The common practice is to sprinkle
28
+ small tutorials inside your reference documentation (the [MSDN
29
+ library](http://msdn.microsoft.com/en-us/library) is a good example). This
30
+ doesn't really solve the problem: how do you sufficiently explain a complex new
31
+ system, so that references and small tutorials become useful?
32
+
33
+ One possible solution to this problem, [literate
34
+ programming](http://en.wikipedia.org/wiki/Literate_programming), was proposed
35
+ by Knuth. In a nutshell, the idea was that the source code for the system
36
+ fulfilled a dual role. You could compile it into the executable code, as
37
+ expected. But you could also generate documentation from it.
38
+
39
+ So far this sounds a lot like structured comments, and indeed structured
40
+ comments were inspired by literate programming. The key difference between the
41
+ two approaches is that in literate programming, the generated documentation was
42
+ not a reference manual. It was a linear narrative describing the system - a
43
+ story which walked you through the system along a specific path chosen for
44
+ optimal presentation.
45
+
46
+ To achieve this, the sources contained the linear documentation, with embedded
47
+ code "chunks". The order of the chunks in the sources was determined by the
48
+ narrative, not the programming language requirements. Extracting and
49
+ re-ordering these chunks was part of the build process, so the regular compiler
50
+ could process them as usual.
51
+
52
+ This was the great strength, but also the great weakness, of literate
53
+ programming. For example, it is next to impossible to create IDEs and similar
54
+ tools for literate programming source code. The code chunks are split any which
55
+ way and spread around the source files in any order; the same source file may
56
+ contain chunks in several languages; etc. Automatically figuring out, say, the
57
+ list of members of some class would be a daunting task.
58
+
59
+ In contrast, structured comments stay out of the way of the IDE and similar
60
+ tools. The source code is still structured exactly the way the compiler wants,
61
+ which allows for easy, localized processing. The trade-off, of course, is that
62
+ structured comments produce a reference manual, not a narrative.
63
+
64
+ Today, structured comments have taken over the coding world, and literate
65
+ programming has all but been forgotten. The problem it tried to solve, however,
66
+ is still very much with us. How do we explain a new complex system?
67
+
68
+ ### A Different Approach ###
69
+
70
+ Codnar is an example of a different approach for solving this problem, "inverse
71
+ literate programming" (similar to, for example,
72
+ [antiweb](http://packages.python.org/antiweb/)). This approach is a combination
73
+ of structured comments and literate programming. Note that this approach is
74
+ similar to, but different in key aspects from, [reverse literate
75
+ programming](http://ssw.jku.at/Research/Projects/RevLitProg/).
76
+
77
+ In inverse literate programming, the source files are organized just
78
+ the way the compiler, IDE, and similar tools expect them to be. Structured
79
+ comments are used to document the pieces of code, and a reference manual can be
80
+ generated from the sources as usual.
81
+
82
+ In addition, the code is split into (possibly nested) named "chunks". This is
83
+ done using specially formatted comments. It turns out this functionality is
84
+ already supported by most coding editors and IDEs, in the form of "folds" or
85
+ "regions". These allow the developer to collapse or expand such chunks at will.
86
+
87
+ At this point, inverse literate programming kicks in. The developer writes
88
+ additional documentation source files, next to the usual code source files.
89
+ These documentation source files contain a narrative that describes the system,
90
+ much in the same way that a literate programming documentation would have done,
91
+ with two important differences.
92
+
93
+ The first difference is that the documentation source files refer to and embed
94
+ the code chunks (using their names), as opposed to a literate programming
95
+ system, where the documentation source files actually contain the code chunks.
96
+
97
+ The second difference is that the documentation source files do not need to
98
+ repeat the information that is already covered in the structured comments. When
99
+ a code chunk is embedded into the documentation, it includes these comments, so
100
+ all the documentation source files need to contain is the narrative "glue" for
101
+ placing these pieces into a comprehensible context for the reader.
102
+
103
+ In this way, inverse literate programming allows generating a linear narrative
104
+ describing the system, without abandoning the existing code processing tools.
105
+ It also makes it easy to retrofit such documentation to an existing code base;
106
+ all that's needed is to mark the already-documented code chunks (or even just
107
+ treat each source code file as a single chunk), and provide the narrative glue
108
+ around them.
109
+
110
+ ### Maintaining the Documentation ###
111
+
112
+ Structured comments have the advantage that they are easy to maintain. Every
113
+ time you change a piece of code, change its comment to match. Similarly,
114
+ literate programming forced one to maintain the documentation as well, since
115
+ the same source file was used for code and documentation. Inverse literate
116
+ programming does not share this advantage. The linear documentation is in a
117
+ separate file, so it isn't immediately visible to the developer who is making
118
+ the changes. Also, it is easy to just forget to include some chunks of code in
119
+ the documentation.
120
+
121
+ These issues are very similar to the issues of unit testing. Unit tests live in
122
+ a separate file from the code they test, and it is easy to forget to test some
123
+ chunks of code. One way to ensure all code is tested is to use a code coverage
124
+ tool. Similarly, inverse literate programming tools should complain about code
125
+ chunks that are left out of the final narrative.
126
+
127
+ A different approach,
128
+ [TDD](http://en.wikipedia.org/wiki/Test-driven_development), ensures that the
129
+ tests are up-to-date and complete by writing the tests before the code. The
130
+ same approach can be used for documentation.
131
+ [DDD](http://thinkingphp.org/spliceit/docs/0.1_alpha/pages/ddd_info.html) means
132
+ that you first document what you are about to do, and only then follow up with
133
+ the actual coding. Inverse literate programming and TDD are an excellent
134
+ practical way to achieve that.
135
+
136
+ The unit tests are code like any other code. As such, they should be documented
137
+ using structured comments. Certain unit test tools like
138
+ [RSpec](http://rspec.info/), [Cucumber](http://cukes.info/) and other
139
+ [BDD](http://en.wikipedia.org/wiki/Behavior_Driven_Development) tools blur the
140
+ line between the tests-as-code and the tests-as-documentation anyway, so the
141
+ amount of unit test structured documentation should be small.
142
+
143
+ Therefore, if you are writing the tests first, you have done the heavy lifting
144
+ of documenting what the new code will do. All that is left is providing a bit
145
+ of surrounding context and embedding it all in the correct location in the
146
+ narrative. Then, when you write the new code itself, it should be easy to
147
+ connect it to the narrative at the appropriate point.
148
+
149
+ In the case of Code Narrator itself, the number of (raw) lines in the code
150
+ library itself is ~2100 lines, the number of test code lines is ~2200 lines,
151
+ and the number of narrative documentation lines is only ~900 lines. Given
152
+ narrative documentation is easier to write than system (or test) code, this
153
+ indicates maintaining a narrative is not an unreasonable burden for a
154
+ well-tested project.
155
+
156
+ ### Code Narrator ###
157
+
158
+ Codnar is an inverse literate programming tool. It allows you to tell a story
159
+ about your system, which will explain it to others: developers, maintainers,
160
+ and/or users. It builds on the structured comments you would write anyway to
161
+ generate a reference manual for the system, requires minimal or no changes to
162
+ your source code files, and works perfectly well inside your favorite IDE or
163
+ editor. If you follow TDD or BDD, Codnar will make it easier for you to
164
+ complement it with DDD.
165
+
166
+ Codnar is available under the MIT license:
167
+
168
+ [[LICENSE|named_chunk_with_containers]]
169
+
170
+ And the current Codnar version is:
171
+
172
+ [[lib/codnar/version.rb|named_chunk_with_containers]]
173
+
174
+ The rest of this document goes into the details of Codnar's implementation. The
175
+ core of the system is the following simple data flow: A set of source files is
176
+ split into chunks; the chunks are woven into a single HTML. This simple flow
177
+ can be enhanced by pre-processing the sources, or post-processing the HTML. In
178
+ a realistic project, all this would be managed by some build tool; either using
179
+ the command-line (for arbitrary build tools) or using the provided Ruby classes
180
+ for Rake integration.
@@ -0,0 +1,671 @@
1
+ ## Splitting files into chunks ##
2
+
3
+ Codnar makes the reasonable assumption that each source file can be effectively
4
+ processed as a sequence of lines. This works well in practice for all "text"
5
+ source files. It fails miserably for "binary" source files, but such files
6
+ don't work that well in most generic source management tools (such as version
7
+ management systems).
8
+
9
+ A second, less obvious assumption is that it is possible to classify the source
10
+ file lines into "kinds" using a simple state machine. The classified lines are
11
+ then grouped into nested chunks based on the two special line kinds
12
+ `begin_chunk` and `end_chunk`. The other line kinds are used to control how the
13
+ lines are formatted into HTML.
14
+
15
+ The collected chunks, with the formatted HTML for each one, are then stored in
16
+ a chunks file to be used later for weaving the overall HTML narrative.
17
+
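The line classification described above can be sketched as a small state machine. This is an illustration only: the `TRANSITIONS` table, the `classify` method, and the two states are assumptions made for the example, not Codnar's `Scanner` API or its YAML syntax format.

```ruby
# Hypothetical sketch, not Codnar's Scanner. Each state lists transitions as
# [pattern, kind of the matching line, next state]; the first match wins.
TRANSITIONS = {
  code: [
    [/\A=begin\b/, :comment, :doc],  # start of a Ruby block comment
    [/\A\s*#/,     :comment, :code], # single-line comment
    [//,           :code,    :code], # anything else is code
  ],
  doc: [
    [/\A=end\b/, :comment, :code],   # end of the block comment
    [//,         :comment, :doc],    # still inside the block comment
  ],
}.freeze

# Classify each line into a kind by walking the state machine.
def classify(lines)
  state = :code
  lines.each_with_index.map do |line, index|
    transition = TRANSITIONS.fetch(state).find { |pattern, _kind, _next| pattern.match?(line) }
    # A line that matches no transition is reported as an error and the state
    # is left unchanged. The catch-all patterns above make this unreachable
    # here; it is kept to mirror the error reporting described in the text.
    _pattern, kind, next_state = transition || [nil, :error, state]
    state = next_state
    { kind: kind, number: index + 1, text: line }
  end
end
```

Calling `classify` on an array of source lines returns one hash per line; lines of kind `begin_chunk` and `end_chunk` would be produced the same way, by adding patterns for whatever region markers the sources use.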
18
+ ### Scanning Lines ###
19
+
20
+ Scanning a file into classified lines is done by the `Scanner` class.
21
+ Here is a simple test that demonstrates using the scanner:
22
+
23
+ [[test/scan_lines.rb|named_chunk_with_containers]]
24
+
25
+ And here is the implementation:
26
+
27
+ [[lib/codnar/scanner.rb|named_chunk_with_containers]]
28
+
29
+ As we can see, the implementation is split into two main parts. First, all
30
+ shorthands in the syntax definition are expanded (possibly generating errors).
31
+ Then, the expanded syntax is applied to a file, to generate a sequence of
32
+ classified lines.
33
+
34
+ #### Scanner Syntax Shorthands ####
35
+
36
+ The syntax is expected to be written by hand in a YAML file. We therefore
37
+ provide some convenient shorthands (listed above) to make YAML syntax files
38
+ more readable. These shorthands must be expanded to their full form before we
39
+ can apply the syntax to a file. There are two sets of shorthands we need to
40
+ expand:
41
+
42
+ * [[Scanner pattern shorthands|named_chunk_with_containers]]
43
+
44
+ * [[Scanner state shorthands|named_chunk_with_containers]]
45
+
46
+ The above code modifies the syntax object in place. This is safe because we are
47
+ working on a `deep_clone` of the original syntax:
48
+
49
+ [[lib/codnar/hash_extensions.rb|named_chunk_with_containers]]
50
+
51
+ #### Classifying Source Lines ####
52
+
53
+ Scanning a file into classified lines is a simple matter of applying the current
54
+ state transitions to each line:
55
+
56
+ [[Scanner file processing|named_chunk_with_containers]]
57
+
58
+ If a line matches a state transition, it is classified accordingly. Otherwise,
59
+ it is reported as an error:
60
+
61
+ [[Scanner line processing|named_chunk_with_containers]]
62
+
63
+ ### Merging scanned lines to chunks ###
64
+
65
+ Once we have the array of scanned classified lines, we need to merge them into
66
+ nested chunks. Here is a simple test that demonstrates using the merger:
67
+
68
+ [[test/merge_lines.rb|named_chunk_with_containers]]
69
+
70
+ And here is the implementation:
71
+
72
+ [[lib/codnar/merger.rb|named_chunk_with_containers]]
73
+
74
+ #### Merging nested chunk lines ####
75
+
76
+ To merge the nested chunk lines, we maintain a stack of the current chunks.
77
+ Each `begin_chunk` line pushes another chunk on the stack, and each `end_chunk`
78
+ line pops it. If any chunks are not properly terminated, they will remain in
79
+ the stack when all the lines are processed.
80
+
81
+ [[Merging nested chunk lines|named_chunk_with_containers]]
82
+
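The stack discipline described above can be sketched as follows. This is a minimal illustration under assumed data shapes (hashes with `:kind`, `:name`, `:number` keys and a hypothetical `merge_chunks` method); the embedded `Merger` chunk is the authoritative implementation.

```ruby
# Hypothetical sketch of merging classified lines into nested chunks.
# Returns the root chunk and a list of error messages.
def merge_chunks(lines, root_name)
  root = { name: root_name, lines: [], children: [] }
  stack = [root]
  errors = []

  lines.each do |line|
    case line[:kind]
    when :begin_chunk
      # Open a new chunk nested inside the chunk currently on top of the stack.
      chunk = { name: line[:name], lines: [], children: [] }
      stack.last[:children] << chunk
      stack.push(chunk)
    when :end_chunk
      if stack.size > 1
        stack.pop
      else
        errors << "end_chunk without begin_chunk at line #{line[:number]}"
      end
    else
      stack.last[:lines] << line
    end
  end

  # Anything still on the stack (besides the root) was never terminated.
  stack.drop(1).each { |chunk| errors << "chunk #{chunk[:name]} is not terminated" }
  [root, errors]
end
```

Keeping the root chunk permanently at the bottom of the stack means a stray `end_chunk` can be reported without ever emptying the stack.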
83
+ #### Unindenting merged chunk lines ####
84
+
85
+ Nested chunks are typically indented relative to their container chunks.
86
+ However, in the generated documentation, these chunks are displayed on their
87
+ own, and preserving this relative indentation would reduce their readability.
88
+ We therefore unindent all chunks as much as possible as the final step.
89
+
90
+ [[Unindenting chunk lines|named_chunk_with_containers]]
91
+
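As a rough illustration of the unindenting step, the sketch below strips the largest indentation shared by all non-blank lines. The `unindent` method is hypothetical, and it assumes indentation uses spaces only; how Codnar itself treats tabs or blank lines is not shown here.

```ruby
# Hypothetical sketch: remove the common leading indentation from a chunk.
def unindent(lines)
  # Blank lines carry no indentation information, so they are ignored when
  # computing the common prefix width.
  indents = lines.reject { |line| line.strip.empty? }
                 .map { |line| line[/\A */].size }
  common = indents.min || 0
  lines.map { |line| line.strip.empty? ? "" : line[common..-1] }
end
```

Relative indentation inside the chunk is preserved; only the offset inherited from the containing chunk is dropped.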
92
+ ### Generating chunk HTML ###
93
+
94
+ Now that we have each chunk's lines, we need to convert them to HTML.
95
+
96
+ #### Grouping lines of the same kind ####
97
+
98
+ Instead of formatting each line on its own, we batch the operations to work on
99
+ all lines of the same kind at once. Here is a simple test that demonstrates
100
+ using the grouper:
101
+
102
+ [[test/group_lines.rb|named_chunk_with_containers]]
103
+
104
+ And here is the implementation:
105
+
106
+ [[lib/codnar/grouper.rb|named_chunk_with_containers]]
107
+
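The batching idea can be sketched with Ruby's `Enumerable#chunk_while`. The `group_lines` method and the `:kind`/`:text` keys are assumptions for the example, not the interface of the `Grouper` class embedded above.

```ruby
# Hypothetical sketch: collect consecutive lines of the same kind into groups,
# so each group can be formatted in one operation.
def group_lines(lines)
  lines.chunk_while { |previous, current| previous[:kind] == current[:kind] }
       .map { |run| { kind: run.first[:kind], texts: run.map { |line| line[:text] } } }
end
```

Only adjacent lines are merged, so a comment that follows some code starts a new group even if an earlier comment group already exists.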
108
+ #### Formatting lines as HTML ####
109
+
110
+ Formatting is based on a configuration that specifies, for (a group of) lines
111
+ of each kind, how to convert it to HTML. Here is a simple test that
112
+ demonstrates using the formatter:
113
+
114
+ [[test/format_lines.rb|named_chunk_with_containers]]
115
+
116
+ And here is the implementation:
117
+
118
+ [[lib/codnar/formatter.rb|named_chunk_with_containers]]
119
+
120
+ #### Basic formatters ####
121
+
122
+ The implementation contains some basic formatting functions. These are
123
+ sufficient for generic source code processing.
124
+
125
+ [[Basic formatters|named_chunk_with_containers]]
126
+
127
+ #### Markup formats ####
128
+
129
+ The `markup_lines_to_html` formatter above relies on the existence of a class
130
+ for converting comments from the specific markup format to HTML. Currently, two
131
+ such formats are supported:
132
+
133
+ * RDoc, the default markup format used in Ruby comments. Here is a simple test
134
+ that demonstrates using RDoc:
135
+
136
+ [[test/expand_rdoc.rb|named_chunk_with_containers]]
137
+
138
+ And here is the implementation:
139
+
140
+ [[lib/codnar/rdoc.rb|named_chunk_with_containers]]
141
+
142
+ * Markdown, a generic markup syntax used across many systems and languages.
143
+ Here is a simple test that demonstrates using Markdown:
144
+
145
+ [[test/expand_markdown.rb|named_chunk_with_containers]]
146
+
147
+ And here is the implementation:
148
+
149
+ [[lib/codnar/markdown.rb|named_chunk_with_containers]]
150
+
151
+ In both cases, the HTML generated by the markup format conversion is a bit
152
+ messy. We therefore clean it up:
153
+
154
+ [[Clean html|named_chunk_with_containers]]
155
+
156
+ #### Syntax highlighting using GVIM ####
157
+
158
+ If you have `gvim` installed, it is possible to use it to generate syntax
159
+ highlighting. This is a *slow* operation, as `gvim` was never meant to be used
160
+ as a command-line tool. However, what it lacks in speed it compensates for in
161
+ scope; almost any language you can think of has a `gvim` syntax highlighting
162
+ definition. Here is a simple test that demonstrates using `gvim` for syntax
163
+ highlighting:
164
+
165
+ [[test/gvim_highlight_syntax.rb|named_chunk_with_containers]]
166
+
167
+ And here is the implementation:
168
+
169
+ [[lib/codnar/gvim.rb|named_chunk_with_containers]]
170
+
171
+ Since GVim is so slow, we are using caching to minimize the time it takes to
172
+ recompute the same code's highlighted HTML. This is pretty useful in practice -
173
+ making changes in one chunk in a file will not require recomputing the
174
+ highlighting for any of the unchanged chunks in the same file. Here is a simple
175
+ test of using the caching functionality:
176
+
177
+ [[test/cache_computations.rb|named_chunk_with_containers]]
178
+
179
+ And here is the implementation:
180
+
181
+ [[lib/codnar/cache.rb|named_chunk_with_containers]]
182
+
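One way to picture this kind of caching is a directory of files keyed by a digest of the input. The sketch below is an assumption-laden illustration (the `DiskCache` class, its constructor, and the choice of SHA-256 are all hypothetical) and does not reproduce the `Cache` class embedded above.

```ruby
require "digest/sha2"
require "fileutils"
require "tmpdir"

# Hypothetical sketch: cache the result of a slow computation on disk, keyed
# by a digest of its input, so unchanged inputs are never recomputed.
class DiskCache
  def initialize(directory, &computation)
    @directory = directory
    @computation = computation
    FileUtils.mkdir_p(directory)
  end

  # Return the cached result for the input, computing and storing it on a miss.
  def [](input)
    path = File.join(@directory, Digest::SHA256.hexdigest(input))
    return File.read(path) if File.exist?(path)

    result = @computation.call(input)
    File.write(path, result)
    result
  end
end

# Usage: the block stands in for an expensive call such as invoking gvim.
Dir.mktmpdir do |directory|
  cache = DiskCache.new(directory) { |code| "<pre>#{code}</pre>" }
  cache["puts 1"] # computed and written to disk
  cache["puts 1"] # read back from disk
end
```

Because the key depends only on the chunk's text, editing one chunk leaves the cached entries of its unchanged neighbours valid.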
183
+ #### Syntax highlighting using Sunlight ####
184
+
185
+ [Sunlight](http://sunlightjs.com/) offers a different approach for syntax
186
+ highlighting. Instead of pre-processing the code to generate highlighted HTML
187
+ while splitting, it provides Javascript files that examine the textual code in
188
+ the DOM and convert it to highlighted HTML in the browser. This takes virtually
189
+ no time when splitting the code, but requires recomputing highlighting for all
190
+ the code chunks every time the HTML file is loaded. This can be pretty slow,
191
+ especially if using a browser with a slow Javascript engine, like IE. However,
192
+ given how slow GVIM is, this is a reasonable trade-off, at least for small
193
+ projects. Since Sunlight is a new project, it doesn't offer the extensive
194
+ coverage of different programming languages supported by GVIM.
195
+
196
+ Here is a simple test that demonstrates using Sunlight for syntax highlighting:
197
+
198
+ [[test/sunlight_highlight_syntax.rb|named_chunk_with_containers]]
199
+
200
+ And here is the implementation:
201
+
202
+ [[lib/codnar/sunlight.rb|named_chunk_with_containers]]
203
+
204
+ ### Putting it all together ###
205
+
206
+ Now that we have all the separate pieces of functionality for splitting source
207
+ files into HTML chunks, we need to combine them to a single convenient service.
208
+
209
+ #### Splitting code files ####
210
+
211
+ Here is a simple test that demonstrates using the splitter for source code
212
+ files:
213
+
214
+ [[test/split_code.rb|named_chunk_with_containers]]
215
+
216
+ And here is the implementation:
217
+
218
+ [[lib/codnar/splitter.rb|named_chunk_with_containers]]
219
+
220
+ #### Splitting documentation files ####
221
+
222
+ The narrative documentation is expected to reside in one or more files, which
223
+ are also "split" to a single chunk each. Having both documentation and code
224
+ exist as chunks allows for uniform treatment of both when weaving, as well as
225
+ allowing for pre-processing the documentation files, if necessary. For example,
226
+ Codnar currently supports for documentation the same two markup formats that
227
+ are also supported for code comments. Here is a simple test that demonstrates
228
+ "splitting" documentation (using the same implementation as above):
229
+
230
+ [[test/split_documentation.rb|named_chunk_with_containers]]
231
+
232
+ ### Built-in configurations ###
233
+
234
+ The splitting mechanism defined above is pretty generic. To apply it to a
235
+ specific system requires providing the appropriate configuration. The system
236
+ provides a few specific built-in configurations which may be useful "out of the
237
+ box".
238
+
239
+ If one is willing to give up altogether on syntax highlighting and comment
240
+ formatting, the system would be applicable as-is to any programming language.
241
+ Properly highlighting almost any known programming language syntax would be a
242
+ simple matter of passing the correct syntax parameter to GVIM.
243
+
244
+ Properly formatting comments in additional mark-up formats would be trickier.
245
+ First, a proper pattern needs to be established for extracting the comments
246
+ (`/*`, `//`, `--`, etc.). Then, the results need to be converted to HTML. One
247
+ way would be to pass them through GVim syntax highlighting with an appropriate
248
+ format (e.g., `syntax=doxygen`). Another would be to invoke some Ruby library;
249
+ finally, one could invoke some external tool to do the job. The latter two
250
+ options would require providing additional glue Ruby code, similar to the GVim
251
+ class above.
252
+
253
+ At any rate, here are the built-in configurations:
254
+
255
+ [[lib/codnar/split_configurations.rb|named_chunk_with_containers]]
256
+
257
+ #### Combining configurations ####
258
+
259
+ Different source files require different overall configurations but reuse
260
+ common building blocks. To support it, we allow configurations to be combined
261
+ using a "deep merge". This allows complex nested structures to be merged. There
262
+ is even a way for arrays to append elements before/after the array they are
263
+ merged with. Here is a simple test that demonstrates deep-merging complex
264
+ structures:
265
+
266
+ [[test/deep_merge.rb|named_chunk_with_containers]]
267
+
268
+ Here is the implementation:
269
+
270
+ [[Deep merge|named_chunk_with_containers]]
271
+
272
+ And here is a test module that automates the process of merging configurations
273
+ and invoking the Splitter:
274
+
275
+ [[test/lib/test_with_configurations.rb|named_chunk_with_containers]]
276
+
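For readers unfamiliar with deep merging, here is a minimal sketch of the core idea using `Hash#merge` with a block. The standalone `deep_merge` method is hypothetical, and it deliberately omits the array before/after appending mentioned above, whose exact rules are defined by the embedded `Deep merge` chunk.

```ruby
# Hypothetical sketch: merge nested hashes recursively. Where both sides hold
# a hash under the same key, the hashes are merged; otherwise the overriding
# value wins. Arrays are simply replaced in this simplified version.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end
```

`Hash#merge` returns a new hash, so neither argument is modified, which makes it safe to combine shared building-block configurations repeatedly.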
277
+ #### Documentation "splitting" ####
278
+
279
+ These are pretty simple configurations, applicable to files containing a piece
280
+ of the narrative in some supported format. These configurations typically do
281
+ not need to be combined with other configurations. Here is a simple test
282
+ that demonstrates "splitting" documentation:
283
+
284
+ [[test/split_documentation_configurations.rb|named_chunk_with_containers]]
285
+
286
+ And here are the actual configurations:
287
+
288
+ [[Documentation "splitting" configurations|named_chunk_with_containers]]
289
+
290
+ #### Source code lines classification ####
291
+
292
+ Splitting source code files is a more complex affair, which does typically
293
+ require combining several configurations. The basic configuration marks all
294
+ lines as belonging to some code syntax, as a single chunk:
295
+
296
+ [[Source code lines classification configurations|named_chunk_with_containers]]
297
+
298
+ Sometimes, code in one syntax contains nested "islands" of code in another
299
+ syntax. Here is a simple configuration to support that, which can be combined
300
+ with the above basic configuration:
301
+
302
+ [[Nested foreign syntax code islands configurations|named_chunk_with_containers]]
303
+
304
+ Here is a simple test demonstrating using source code lines classifications:
305
+
306
+ [[test/split_code_configurations.rb|named_chunk_with_containers]]
307
+
308
+ #### Simple comment classification ####
309
+
310
+ Many languages use a simple comment syntax, where some prefix indicates a
311
+ comment that spans until the end of the line (e.g., shell `#` comments or C++
312
+ `//` comments).
313
+
314
+ [[Simple comment classification configurations|named_chunk_with_containers]]
315
+
316
+ Here is a simple test demonstrating using simple comment classifications:
317
+
318
+ [[test/split_simple_comment_configurations.rb|named_chunk_with_containers]]
319
+
320
+ #### Complex comment classification ####
321
+
322
+ Other languages use a complex multi-line comment syntax, where some prefix
323
+ indicates the beginning of the comment, some suffix indicates the end, and by
324
+ convention some prefix is expected for the inner comment lines (e.g., C's
325
+ "`/*`", "` *`", "`*/`" comments or HTML's "`<!--`", "` -`", "`-->`" comments).
326
+
327
+ [[Complex comment classification configurations|named_chunk_with_containers]]
328
+
329
+ Here is a simple test demonstrating using complex comment classifications:
330
+
331
+ [[test/split_complex_comment_configurations.rb|named_chunk_with_containers]]
332
+
333
+ #### Comment formatting ####
334
+
335
+ In many cases, the text inside comments is written using some markup format
336
+ (e.g., RDoc for Ruby or JavaDoc for Java). Currently, two such formats are
337
+ supported, as well as simply wrapping the comment in an HTML pre element:
338
+
339
+ [[Comment formatting configurations|named_chunk_with_containers]]
340
+
341
+ Here is a simple test demonstrating formatting comment contents:
342
+
343
+ [[test/format_comment_configurations.rb|named_chunk_with_containers]]
344
+
345
+ #### Syntax highlighting using GVim ####
346
+
347
+ Supporting a specific programming language (other than dealing with comments)
348
+ is very easy using GVim for syntax highlighting, as demonstrated here:
349
+
350
+ [[GVim syntax highlighting formatting configurations|named_chunk_with_containers]]
351
+
352
+ Here is a simple test demonstrating highlighting code syntax using `gvim`:
353
+
354
+ [[test/format_code_gvim_configurations.rb|named_chunk_with_containers]]
355
+
356
+ #### Syntax highlighting using Sunlight ####
357
+
358
+ For small projects in languages supported by Sunlight, you may choose to use
359
+ it instead of GVIM:
360
+
361
+ [[Sunlight syntax highlighting formatting configurations|named_chunk_with_containers]]
362
+
363
+ Here is a simple test demonstrating highlighting code syntax using Sunlight:
364
+
365
+ [[test/format_code_sunlight_configurations.rb|named_chunk_with_containers]]
366
+
367
+ #### Chunk splitting ####
368
+
369
+ There are many ways to denote code "regions" (which become Codnar chunks). The
370
+ following covers GVim's default scheme; others are easily added. It is safest
371
+ to merge this configuration as the last of all the combined configurations, to
372
+ ensure its patterns end up before any others.
373
+
374
+ [[Chunk splitting configurations|named_chunk_with_containers]]
375
+
376
+ Here is a simple test demonstrating splitting code chunks:
377
+
378
+ [[test/split_chunk_configurations.rb|named_chunk_with_containers]]
379
+
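To make the idea of region markers concrete, here is a minimal sketch that assumes the regions are delimited by Vim's default fold markers (`{{{` and `}}}`). The constants and the `split_chunks` method are hypothetical, and unlike Codnar this sketch simply assigns each line to the innermost open chunk.

```ruby
# Hypothetical patterns for Vim's default fold markers (an assumption; the
# real Codnar configuration may use different patterns).
BEGIN_CHUNK = /\{\{\{\s*(.+?)\s*\z/
END_CHUNK = /\}\}\}/

# Collect the lines of each named region, keyed by the region's name.
def split_chunks(lines)
  chunks = Hash.new { |hash, name| hash[name] = [] }
  open_chunks = []
  lines.each do |line|
    if (match = BEGIN_CHUNK.match(line))
      open_chunks.push(match[1])
    elsif line =~ END_CHUNK
      open_chunks.pop
    elsif open_chunks.any?
      chunks[open_chunks.last] << line
    end
  end
  chunks
end
```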
380
+ ### Putting it all together ###
381
+
382
+ Here is a test demonstrating putting several of the above configurations
383
+ together in a meaningful way:
384
+
385
+ [[test/split_combined_configurations.rb|named_chunk_with_containers]]
386
+
387
+ ## Storing chunks on the disk ##
388
+
389
+ ### Writing chunks to disk ###
390
+
391
+ In any realistic system, the number of source files and chunks will be such
392
+ that it makes sense to store the chunks on the disk for further processing.
393
+ This allows incorporating the split operation as part of a build tool chain,
394
+ and only re-splitting modified files. Here is a simple test demonstrating
395
+ writing chunks to the disk:
396
+
397
+ [[test/write_chunks.rb|named_chunk_with_containers]]
398
+
399
+ And here is the implementation:
400
+
401
+ [[lib/codnar/writer.rb|named_chunk_with_containers]]
402
+
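The build tool chain point can be sketched as follows. This is a minimal illustration, assuming chunks are plain arrays of hashes serialized as YAML; the method names `write_chunks` and `stale?` are made up for this example and are not Codnar's writer API.

```ruby
require "yaml"

# Hypothetical helper: store the chunks of one source file on the disk.
def write_chunks(chunks_path, chunks)
  File.write(chunks_path, chunks.to_yaml)
end

# Hypothetical helper: a source file needs re-splitting only if its chunks
# file is missing or older than the source itself.
def stale?(source_path, chunks_path)
  !File.exist?(chunks_path) || File.mtime(chunks_path) < File.mtime(source_path)
end
```

A build tool such as Rake performs the same timestamp comparison automatically when the chunks file is declared as depending on the source file.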
403
+ ### Reading chunks to memory ###
404
+
405
+ Having written the chunks to the disk requires us, at some later point in
406
+ time, to read them back into memory. This is the first time we will have a view
407
+ of the whole documented system, which allows us to detect several classes of
408
+ consistency errors: Some chunks may be left out of the final narrative
409
+ (consider this the equivalent of test code coverage); we may be referring to
410
+ missing (or misspelled) chunk names; and, finally, we need to deal with
411
+ duplicate chunks.
412
+
413
+ In literate programming, it is trivial to write a chunk once and use it in
414
+ several places in the compiled source code. The classical example is C/C++
415
+ function signatures that need to appear in both the `.h` and `.c`/`.cpp` files.
416
+ However, in some cases this practice makes sense for other pieces of code, and
417
+ since the ultimate source code contains only one copy of the chunk, this does
418
+ not suffer from the typical copy-and-paste issues.
419
+
420
+ In inverse literate programming, if the same code appears twice (as a result of
421
+ copy-and-paste), then it does suffer from the typical copy-and-paste issues.
422
+ The most serious of these arises, of course, when only one copy is changed.
423
+ The way that Codnar helps alleviate this problem is that if the same chunk
424
+ appears more than once in the source code, its content is expected to be
425
+ exactly the same in both cases (up to indentation). This should not be viewed
426
+ as endorsement of copy-and-paste programming; using duplicate chunks should be
427
+ a last resort measure to combat restrictions in the programming language and
428
+ compilation tool chain.
429
+
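The "exactly the same up to indentation" rule can be sketched as below. This is only an illustration of the comparison, assuming chunks are arrays of lines; the method names are hypothetical and do not come from Codnar.

```ruby
# Hypothetical helper: remove the common leading white space from all
# non-blank lines, so only relative indentation remains.
def unindent(lines)
  indents = lines.reject { |line| line.strip.empty? }
                 .map { |line| line[/\A[ \t]*/].size }
  shift = indents.min || 0
  lines.map { |line| line.strip.empty? ? "" : line[shift..-1] }
end

# Two copies of a chunk are consistent if they are identical once the common
# indentation is removed.
def same_chunk_content?(first, second)
  unindent(first) == unindent(second)
end
```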
430
+ #### Chunk identifiers ####
431
+
432
+ The above definition raises the obvious question: what does "the same chunk"
433
+ mean? As far as Codnar is concerned, a chunk is uniquely identified by its
434
+ name, which is specified on the `begin_chunk` line. The unique identifier is
435
+ not the literal name but a transformation of it. This allows us to ignore
436
+ capitalization, white space, and any punctuation that may appear in the name.
437
+ It also allows us to use the resulting ID as an HTML anchor name, without
438
+ worrying about HTML's restrictions on such names.
439
+
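A transformation with these properties might look like the following sketch. The method name `chunk_id` and the exact rules (lower-casing, collapsing runs of non-alphanumeric characters into a single dash, trimming dashes at the ends) are assumptions; the real implementation may treat edge cases differently.

```ruby
# Hypothetical helper: derive an identifier that ignores capitalization,
# white space, and punctuation, and is safe to use as an HTML anchor name.
def chunk_id(name)
  name.strip
      .downcase
      .gsub(/[^a-z0-9]+/, "-")
      .gsub(/\A-+|-+\z/, "")
end
```

Under these rules, names that differ only in spacing, case, or punctuation map to the same identifier and are therefore treated as the same chunk.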
440
+ Here is a simple test demonstrating converting names to identifiers:
441
+
442
+ [[test/identify_chunks.rb|named_chunk_with_containers]]
443
+
444
+ And here is the implementation:
445
+
446
+ [[lib/codnar/string_extensions.rb|named_chunk_with_containers]]
447
+
448
+ #### In-memory chunks storage ####
449
+
450
+ Detecting unused and/or duplicate chunks requires us to have in-memory chunk
451
+ storage that tracks all chunk access. Here is a simple test demonstrating
452
+ reading chunks into the storage and handling the various error conditions
453
+ listed above:
454
+
455
+ [[test/read_chunks.rb|named_chunk_with_containers]]
456
+
457
+ And here is the implementation:
458
+
459
+ [[lib/codnar/reader.rb|named_chunk_with_containers]]
460
+
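The essence of such access-tracking storage can be sketched in a few lines. The `ChunkStore` class below is hypothetical; in particular, raising exceptions is a simplification chosen for this example, and a real tool would more likely collect and report the errors.

```ruby
# Hypothetical in-memory chunk storage that records which chunks were used.
class ChunkStore
  def initialize
    @chunks = {}
    @used = {}
  end

  # Adding the same chunk twice is fine only if the content is identical.
  def add(id, content)
    existing = @chunks[id]
    if existing && existing != content
      raise ArgumentError, "Conflicting duplicate chunk: #{id}"
    end
    @chunks[id] = content
  end

  # Fetching a chunk marks it as used; unknown names are an error.
  def fetch(id)
    raise KeyError, "Missing chunk: #{id}" unless @chunks.key?(id)
    @used[id] = true
    @chunks[id]
  end

  # Chunks that were never fetched were left out of the narrative.
  def unused
    @chunks.keys.reject { |id| @used[id] }
  end
end
```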
461
+ ## Weaving chunks into HTML ##
462
+
463
+ Assembling the final HTML requires combining both the narrative documentation
464
+ and source code chunks. This is done top-down starting at a "root"
465
+ documentation chunk and recursively embedding nested documentation and code
466
+ chunks into it.
467
+
468
+ ### Weaving chunks together ###
469
+
470
+ When embedding a documentation chunk inside another documentation chunk, things
471
+ are pretty easy - we just need to insert the embedded chunk HTML into the
472
+ containing chunk. When embedding a source code chunk into the documentation,
473
+ however, we may want to wrap it in some boilerplate HTML, providing a header,
474
+ footer, borders, links, etc. Therefore, the HTML syntax we use to embed a chunk
475
+ into the documentation is `<embed src="..." type="x-codnar/template-name"/>`.
476
+ The templates are normal ERB templates, except for the magical `file` and
477
+ `image` templates, described below.
478
+
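The recursive embedding described above can be sketched as follows. This is a simplified illustration, not Codnar's weaver: the `EMBED` pattern, the `weave` method, and the `chunk` variable exposed to the template are assumptions, chunks are looked up directly by their `src`, and cyclic references are not detected.

```ruby
require "erb"

# Hypothetical pattern matching the embed syntax, capturing src and template.
EMBED = %r{<embed\s+src="([^"]+)"\s+type="x-codnar/([^"]+)"\s*/>}

# Replace every embed tag in the chunk by the woven HTML of the embedded
# chunk, wrapped in the named ERB template.
def weave(id, chunks, templates)
  chunks.fetch(id).gsub(EMBED) do
    src = Regexp.last_match(1)
    template = Regexp.last_match(2)
    # The template sees the embedded chunk through the local variable `chunk`.
    chunk = { "name" => src, "html" => weave(src, chunks, templates) }
    ERB.new(templates.fetch(template)).result(binding)
  end
end
```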
479
+ At any rate, here is a simple test demonstrating applying different templates
480
+ to the embedded code chunks:
481
+
482
+ [[test/weave_configurations.rb|named_chunk_with_containers]]
483
+
484
+ Here is the implementation:
485
+
486
+ [[lib/codnar/weaver.rb|named_chunk_with_containers]]
487
+
488
+ And here are the pre-defined weaving template configurations:
489
+
490
+ [[lib/codnar/weave_configurations.rb|named_chunk_with_containers]]
491
+
492
+ #### Embedding files ####
493
+
494
+ The template named `file` is special in two ways. First, the `src` is given
495
+ special treatment. If it begins with a "`.`", it is assumed to be a normal path
496
+ name relative to the current working directory; otherwise, it is assumed to be
497
+ a name of a file packaged inside some gem and is searched for in Ruby's
498
+ `$LOAD_PATH`. This allows gems (such as Codnar itself) to provide such files to
499
+ be used in the woven documentation.
500
+
501
+ Second, the content of the file is simply embedded into the generated
502
+ documentation. This allows the documentation to be a stand-alone file,
503
+ including all the CSS and Javascript required for proper display.
504
+
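The `src` lookup rule can be illustrated with a short sketch. The method name `resolve_embedded_file` is hypothetical, and raising an exception for a missing file is an assumption made for this example.

```ruby
# Hypothetical helper: a src starting with "." is a normal relative path;
# anything else is searched for in the load path.
def resolve_embedded_file(src, load_path = $LOAD_PATH)
  return src if src.start_with?(".")

  directory = load_path.find { |dir| File.exist?(File.join(dir, src)) }
  raise ArgumentError, "File not found in load path: #{src}" unless directory

  File.join(directory, src)
end
```

The content of the resolved file would then be read and inserted verbatim into the generated HTML.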
505
+ [[Processing the file template|named_chunk_with_containers]]
506
+
507
+ See the `doc/root.html` file for plenty of examples of using this
508
+ functionality.
509
+
510
+ #### Embedding images ####
511
+
512
+ The `image` template is a specialization of the `file` template for dealing
513
+ with embedded images. The specified image file is embedded into the generated
514
+ HTML as an `img` tag, using a [data
515
+ URL](http://en.wikipedia.org/wiki/Data_URI_scheme). This is very useful for
516
+ small images, but is problematic when their size increases beyond
517
+ browser-specific limits.
518
+
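Building such a data URL takes only a few lines, as in this minimal sketch. The method name `image_tag` and the default MIME type are assumptions; Codnar's own code may choose the MIME type and the tag attributes differently.

```ruby
# Hypothetical helper: embed an image file into HTML as a Base64 data URL.
def image_tag(path, mime_type = "image/png")
  # pack("m0") produces Base64 without line breaks, as data URLs require.
  data = [File.binread(path)].pack("m0")
  "<img src=\"data:#{mime_type};base64,#{data}\"/>"
end
```

Since Base64 inflates the data by about a third, this approach suits small images such as logos and icons best.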
519
+ Here is a simple test demonstrating processing embedded image files:
520
+
521
+ [[test/embed_images.rb|named_chunk_with_containers]]
522
+
523
+ Here is the implementation:
524
+
525
+ [[Processing Base64 embedded data images|named_chunk_with_containers]]
526
+
527
+ And here is a sample embedded image:
528
+
529
+ [[doc/logo.png|image]]
530
+
531
+ ## Invoking the functionality ##
532
+
533
+ There are two ways to invoke Codnar's functionality - from the command line,
534
+ and (for Ruby projects) as integrated Rake tasks.
535
+
536
+ ### Command Line Applications ###
537
+
538
+ Executable scripts (tests, command-line applications) start with a `require
539
+ 'codnar'` line to access the full Codnar code. This also serves as a
540
+ convenient list of all of Codnar's parts and dependencies:
541
+
542
+ [[lib/codnar.rb|named_chunk_with_containers]]
543
+
544
+ The base command line Application class handles execution from the command
545
+ line, with the usual standard options, as well as some Codnar-specific ones:
546
+ the ability to specify configuration files and/or built-in configurations, and
547
+ the ability to include additional extension code triggered from these
548
+ configurations. Together, these allow configuring and extending Codnar's
549
+ behavior to cover the specific system's needs.
550
+
551
+ Here is a simple test demonstrating the standard Codnar application behavior:
552
+
553
+ [[test/run_application.rb|named_chunk_with_containers]]
554
+
555
+ And here is the implementation:
556
+
557
+ [[lib/codnar/application.rb|named_chunk_with_containers]]
558
+
559
+ #### Application for splitting files ####
560
+
561
+ Here is a simple test demonstrating invoking the command-line application for
562
+ splitting files:
563
+
564
+ [[test/run_split.rb|named_chunk_with_containers]]
565
+
566
+ Here is the implementation:
567
+
568
+ [[lib/codnar/split.rb|named_chunk_with_containers]]
569
+
570
+ And here is the actual command-line application script:
571
+
572
+ [[bin/codnar-split|named_chunk_with_containers]]
573
+
574
+ #### Application for weaving chunks ####
575
+
576
+ Here is a simple test demonstrating invoking the command-line application for
577
+ weaving chunks to HTML:
578
+
579
+ [[test/run_weave.rb|named_chunk_with_containers]]
580
+
581
+ Here is the implementation:
582
+
583
+ [[lib/codnar/weave.rb|named_chunk_with_containers]]
584
+
585
+ And here is the actual command-line application script:
586
+
587
+ [[bin/codnar-weave|named_chunk_with_containers]]
588
+
589
+ ### Rake Integration ###
590
+
591
+ For Ruby projects (or any other project using Rake), it is also possible to
592
+ invoke Codnar using Rake tasks. Here is a simple test demonstrating using the
593
+ Rake tasks:
594
+
595
+ [[test/rake_tasks.rb|named_chunk_with_containers]]
596
+
597
+ To use these tasks in a Rakefile, one needs to `require 'codnar/rake'`. The
598
+ code implements a singleton that holds the global state shared between tasks:
599
+
600
+ [[lib/codnar/rake.rb|named_chunk_with_containers]]
601
+
602
+ #### Task for splitting files ####
603
+
604
+ To split one or more files to chunks, create a new SplitTask. Multiple such
605
+ tasks may be created; this is required if different files need to be split
606
+ using different configurations.
607
+
608
+ [[lib/codnar/rake/split_task.rb|named_chunk_with_containers]]
609
+
610
+ #### Task for weaving chunks ####
611
+
612
+ To weave the chunks together, create a single WeaveTask.
613
+
614
+ [[lib/codnar/rake/weave_task.rb|named_chunk_with_containers]]
615
+
616
+ ## Building the Codnar gem ##
617
+
618
+ The following Rakefile is in charge of building the gem, with the help of some
619
+ tools described below.
620
+
621
+ [[Rakefile|named_chunk_with_containers]]
622
+
623
+ The generated HTML requires some tweaking to yield aesthetic, readable results.
624
+ This tweaking consists of using Javascript to control chunk visibility,
625
+ generating a table of contents, and using CSS to make the HTML look better.
626
+
627
+ Here are the modified configurations for generating the correct HTML:
628
+
629
+ [[Codnar configurations|named_chunk_with_containers]]
630
+
631
+ ### Javascript chunk visibility control ###
632
+
633
+ The following code injects visibility controls ("+"/"-" toggles) next to each
634
+ embedded code chunk. It also hides all the chunks by default; this increases
635
+ the readability of the overall narrative, turning it into a high-level summary.
636
+ Expanding the embedded code chunks allows the reader to delve into the details.
637
+
638
+ [[lib/codnar/data/control_chunks.js|named_chunk_with_containers]]
639
+
640
+ ### Javascript table of contents ###
641
+
642
+ The following code is not very efficient or elegant but it does a basic job of
643
+ injecting a table of contents into the generated HTML.
644
+
645
+ [[lib/codnar/data/contents.js|named_chunk_with_containers]]
646
+
647
+ ### CSS style ###
648
+
649
+ To avoid dealing with the different default styles used by different browsers,
650
+ we employ the YUI CSS [reset](http://developer.yahoo.com/yui/reset/) and
651
+ [base](http://developer.yahoo.com/yui/base/) files. Resetting and restoring the
652
+ default CSS styles is inelegant, but it is the only current way to get a
653
+ consistent presentation of HTML. Once this is out of the way, we apply styles
654
+ specific to our HTML. Some of these override the default styles established by
655
+ the base CSS file above. We do this instead of directly tweaking the base CSS
656
+ file, to allow easy upgrade to new versions if/when YUI release any.
657
+
658
+ [[lib/codnar/data/style.css|named_chunk_with_containers]]
659
+
660
+ ### Using Sunlight ###
661
+
662
+ When using Sunlight for syntax highlighting, we also need to include some CSS
663
+ and Javascript files to convert the classified `pre` elements into properly
664
+ marked-up HTML. We also need to invoke this Javascript code (a one-line
665
+ operation). Here is what such code might look like inside a Javascript block
666
+ of the generated HTML:
667
+
668
+ &lt;embed src="codnar/data/sunlight/min.js" type="x-codnar/file"/&gt;
669
+ &lt;embed src="codnar/data/sunlight/ruby-min.js" type="x-codnar/file"/&gt;
670
+ Sunlight.globalOptions.lineNumbers = false;
671
+ Sunlight.highlightAll();