RubyGems - codnar - Versions diffs - 0.1.68 → 0.1.73 - Mend

codnar 0.1.68 → 0.1.73

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

data/ChangeLog +17 -0
data/codnar.html +1368 -527
data/doc/system.markdown +56 -15
data/lib/codnar.rb +6 -0
data/lib/codnar/configuration/code.rb +87 -0
data/lib/codnar/configuration/comments.rb +234 -0
data/lib/codnar/configuration/documentation.rb +65 -0
data/lib/codnar/configuration/highlighting.rb +107 -0
data/lib/codnar/data/contents.js +2 -1
data/lib/codnar/haddock.rb +72 -0
data/lib/codnar/rake/split_task.rb +6 -0
data/lib/codnar/scanner.rb +34 -11
data/lib/codnar/split_configurations.rb +4 -382
data/lib/codnar/string_extensions.rb +4 -3
data/lib/codnar/version.rb +1 -1
data/test/expand_haddock.rb +23 -0
data/test/expand_rdoc.rb +2 -2
data/test/format_comment_configurations.rb +1 -1
data/test/run_weave.rb +2 -3
data/test/split_code.rb +1 -1
data/test/split_combined_configurations.rb +1 -1
data/test/{split_complex_comment_configurations.rb → split_delimited_comment_configurations.rb} +11 -11
data/test/split_denoted_comment_configurations.rb +62 -0
data/test/split_documentation.rb +1 -1
data/test/split_documentation_configurations.rb +1 -1
metadata +16 -7

data/doc/system.markdown CHANGED

@@ -155,7 +155,16 @@ such formats are supported:
   [[lib/codnar/markdown.rb|named_chunk_with_containers]]
-In both cases, the HTML generated by the markup format conversion is a bit
+* Haddock, a specific markup syntax used in comments to document Haskell code.
+  Here is a simple test that demonstrates using Haddock:
+  [[test/expand_haddock.rb|named_chunk_with_containers]]
+  And here is the implementation:
+  [[lib/codnar/haddock.rb|named_chunk_with_containers]]
+In all cases, the HTML generated by the markup format conversion is a bit
 messy. We therefore clean it up:
 [[Clean html|named_chunk_with_containers]]
@@ -329,13 +338,17 @@ that demonstrates "splitting" documentation:
 And here are the actual configurations:
-[[Documentation "splitting" configurations|named_chunk_with_containers]]
+[[lib/codnar/configuration/documentation.rb|named_chunk_with_containers]]
 #### Source code lines classification ####
 Splitting source code files is a more complex affair, which does typically
-require combining several configurations. The basic configuration marks all
-lines as belonging to some code syntax, as a single chunk:
+require combining several configurations.
+[[lib/codnar/configuration/code.rb|named_chunk_with_containers]]
+The basic configuration marks all lines as belonging to some code syntax, as a
+single chunk:
 [[Source code lines classification configurations|named_chunk_with_containers]]
@@ -349,7 +362,15 @@ Here is a simple test demonstrating using source code lines classifications:
 [[test/split_code_configurations.rb|named_chunk_with_containers]]
-#### Simple comment classification ####
+#### Classifying comment lines ####
+Classifying comment lines is the most complex part of splitting source code
+files, requiring the use of one or more configurations specific to the language
+used.
+[[lib/codnar/configuration/comments.rb|named_chunk_with_containers]]
+##### Simple comment classification #####
 Many languages use a simple comment syntax, where some prefix indicates a
 comment that spans until the end of the line (e.g., shell `#` comments or C++
@@ -361,18 +382,30 @@ Here is a simple test demonstrating using simple comment classifications:
 [[test/split_simple_comment_configurations.rb|named_chunk_with_containers]]
-#### Complex comment classification ####
+##### Denoted comment classification #####
-Other languages use a complex multi-line comment syntax, where some prefix
+Some simple comments require special treatment when they are denoted by a
+leading prefix. For example, Haskell simple comments start with `--`, but
+Haddock (documentation) comments start with `-- |`, `-- ^`, etc.
+[[Denoted comment classification configurations|named_chunk_with_containers]]
+Here is a simple test demonstrating using denoted comment classifications:
+[[test/split_denoted_comment_configurations.rb|named_chunk_with_containers]]
+##### Delimited comment classification #####
+Other languages use a delimited multi-line comment syntax, where some prefix
 indicates the beginning of the comment, some suffix indicates the end, and by
 convention some prefix is expected for the inner comment lines (e.g., C's
 "`/*`", "` *`", "`*/`" comments or HTML's "`<!--`", "` -`", "`-->`" comments).
-[[Complex comment classification configurations|named_chunk_with_containers]]
+[[Delimited comment classification configurations|named_chunk_with_containers]]
-Here is a simple test demonstrating using complex comment classifications:
+Here is a simple test demonstrating using delimited comment classifications:
-[[test/split_complex_comment_configurations.rb|named_chunk_with_containers]]
+[[test/split_delimited_comment_configurations.rb|named_chunk_with_containers]]
 #### Comment formatting ####
@@ -386,10 +419,18 @@ Here is a simple test demonstrating formatting comment contents:
 [[test/format_comment_configurations.rb|named_chunk_with_containers]]
-#### Syntax highlighting using GVim ####
+#### Syntax highlighting ####
-Supporting a specific programming language (other than dealing with comments)
-is very easy using GVim for syntax highlighting, as demonstrated here:
+Highlighting the syntax of the source code embedded in the documentation
+improves readability. Codnar provides several ways to achieve this.
+[[lib/codnar/configuration/highlighting.rb|named_chunk_with_containers]]
+##### Syntax highlighting using GVim #####
+Supporting almost any known programming language (other than dealing with
+comments) is very easy using GVim for syntax highlighting, as demonstrated
+here:
 [[GVim syntax highlighting formatting configurations|named_chunk_with_containers]]
@@ -399,7 +440,7 @@ classes. Here is the default CSS stylesheet used by GVim:
 [[lib/codnar/data/gvim.css|named_chunk_with_containers]]
-#### Syntax highlighting using CodeRay ####
+##### Syntax highlighting using CodeRay #####
 For supported programming languages, you may choose to use CodeRay instead of GVim.
@@ -411,7 +452,7 @@ classes. Here is the default CSS stylesheet used by CodeRay:
 [[lib/codnar/data/coderay.css|named_chunk_with_containers]]
-#### Syntax highlighting using Sunlight ####
+##### Syntax highlighting using Sunlight #####
 For small projects in supported languages, you may choose to use Sunlight
 instead of GVim.
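
The simple comment classification described above boils down to one regexp per prefix. The following is a minimal stand-alone sketch, not Codnar code: the helper names `simple_comment_regexp` and `comment_payload` are hypothetical, and it assumes the regexp strings in the configurations are compiled with `Regexp.new`. It rebuilds the pattern the same way `Comments.simple_comments` does and shows which lines it accepts.

```ruby
# Hypothetical helper (not part of Codnar): builds the same regexp string that
# Comments.simple_comments uses for a given prefix. Group 1 is the indentation,
# group 2 is the stripped payload; "(?!!)" rejects a prefix followed by "!".
def simple_comment_regexp(prefix)
  Regexp.new("^(\\s*)#{prefix}(?!!)\\s?(.*)$")
end

# Hypothetical helper: returns the comment payload, or nil for non-comments.
def comment_payload(prefix, line)
  match = simple_comment_regexp(prefix).match(line)
  match && match[2]
end

p comment_payload("#", "  # Compute the total") # => "Compute the total"
p comment_payload("#", "#!/bin/sh")             # => nil (shebang is protected)
p comment_payload("//", "// C++ style")         # => "C++ style"
p comment_payload("#", "x = 1 # trailing")      # => nil (whole-line comments only)
```

Note that the prefix is interpolated into the regexp verbatim, which is why prefixes containing regexp metacharacters (such as the C `/*`) must be escaped by the caller.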

data/lib/codnar.rb CHANGED

@@ -8,6 +8,7 @@ require "fileutils"
 require "irb"
 require "open3"
 require "rdiscount"
+require "rdoc"
 require "rdoc/markup/to_html"
 require "tempfile"
 require "yaml"
@@ -21,6 +22,7 @@ require "olag/string_unindent"
 require "codnar/version"
 require "codnar/coderay"
+require "codnar/haddock"
 require "codnar/hash_extensions"
 require "codnar/markdown"
 require "codnar/rdoc"
@@ -36,6 +38,10 @@ require "codnar/merger"
 require "codnar/split"
 require "codnar/reader"
 require "codnar/scanner"
+require "codnar/configuration/code"
+require "codnar/configuration/comments"
+require "codnar/configuration/documentation"
+require "codnar/configuration/highlighting"
 require "codnar/split_configurations"
 require "codnar/splitter"
 require "codnar/sunlight"

data/lib/codnar/configuration/code.rb ADDED

@@ -0,0 +1,87 @@
+module Codnar
+  module Configuration
+    # Configurations for splitting source code.
+    module Code
+      # {{{ Source code lines classification configurations
+      # Classify all lines as source code of some syntax (kind). This doesn't
+      # distinguish between comment and code lines; to do that, you need to
+      # combine this with comment classification configuration(s). Also, it just
+      # formats the lines in an HTML +pre+ element, without any syntax
+      # highlighting; to do that, you need to combine this with syntax
+      # highlighting formatting configuration(s).
+      CLASSIFY_SOURCE_CODE = lambda do |syntax|
+        return {
+          "formatters" => {
+            "#{syntax}_code" => "Formatter.lines_to_pre_html(lines, :class => :code)",
+          },
+          "syntax" => {
+            "patterns" => {
+              "#{syntax}_code" => { "regexp" => "^(\\s*)(.*)$" },
+            },
+            "states" => {
+              "start" => {
+                "transitions" => [
+                  { "pattern" => "#{syntax}_code" },
+                ],
+              },
+            },
+          },
+        }
+      end
+      # }}}
+      # {{{ Nested foreign syntax code islands configurations
+      # Allow for comments containing "((( <syntax>" and "))) <syntax>" to
+      # designate nested islands of foreign syntax inside the normal code. The
+      # designator comment lines are always treated as part of the surrounding
+      # code, not as part of the nested foreign syntax code. There is no further
+      # classification of the nested foreign syntax code. Therefore, the nested
+      # code is not examined for begin/end chunk markers. Likewise, the nested
+      # code may not contain deeper nested code using a third syntax.
+      CLASSIFY_NESTED_CODE = lambda do |outer_syntax, inner_syntax|
+        {
+          "syntax" => {
+            "patterns" => {
+              "start_#{inner_syntax}_in_#{outer_syntax}" =>
+                { "regexp" => "^(\\s*)(.*\\(\\(\\(\\s*#{inner_syntax}.*)$" },
+              "end_#{inner_syntax}_in_#{outer_syntax}" =>
+                { "regexp" => "^(\\s*)(.*\\)\\)\\)\\s*#{inner_syntax}.*)$" },
+              "#{inner_syntax}_in_#{outer_syntax}" =>
+                { "regexp" => "^(\\s*)(.*)$" },
+            },
+            "states" => {
+              "start" => {
+                "transitions" => [
+                  { "pattern" => "start_#{inner_syntax}_in_#{outer_syntax}",
+                    "kind" => "#{outer_syntax}_code",
+                    "next_state" => "#{inner_syntax}_in_#{outer_syntax}" },
+                  [],
+                ],
+              },
+              "#{inner_syntax}_in_#{outer_syntax}" => {
+                "transitions" => [
+                  { "pattern" => "end_#{inner_syntax}_in_#{outer_syntax}",
+                    "kind" => "#{outer_syntax}_code",
+                    "next_state" => "start" },
+                  { "pattern" => "#{inner_syntax}_in_#{outer_syntax}",
+                    "kind" => "#{inner_syntax}_code" },
+                ],
+              },
+            },
+          },
+        }
+      end
+      # }}}
+    end
+  end
+end
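
To see how the two states of `CLASSIFY_NESTED_CODE` cooperate, here is a rough stand-alone sketch, not Codnar's `Scanner`. The method name `classify_nested` is hypothetical, and it assumes that lines not matched in the `start` state fall back to the outer syntax's own classification (for example, from `CLASSIFY_SOURCE_CODE`). It reuses the same regexps the lambda builds.

```ruby
# Hypothetical sketch (not Codnar's Scanner): labels each line with a kind,
# mimicking the "start" and island states built by CLASSIFY_NESTED_CODE.
def classify_nested(lines, outer, inner)
  start_island = Regexp.new("^(\\s*)(.*\\(\\(\\(\\s*#{inner}.*)$")
  end_island = Regexp.new("^(\\s*)(.*\\)\\)\\)\\s*#{inner}.*)$")
  state = :start
  lines.map do |line|
    if state == :start
      # Designator lines belong to the outer code but switch the state.
      # Assumption: other start-state lines are plain outer code.
      state = :island if start_island.match?(line)
      "#{outer}_code"
    elsif end_island.match?(line)
      # The closing designator is also outer code; return to "start".
      state = :start
      "#{outer}_code"
    else
      "#{inner}_code"
    end
  end
end

source = [
  "puts 'before'",
  "# ((( html",
  "<p>Hello</p>",
  "# ))) html",
  "puts 'after'",
]
p classify_nested(source, "ruby", "html")
# => ["ruby_code", "ruby_code", "html_code", "ruby_code", "ruby_code"]
```

As the comment in `code.rb` notes, nothing inside the island is examined further, so a nested `((( css` line inside the HTML island would simply be labeled `html_code`.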

data/lib/codnar/configuration/comments.rb ADDED

@@ -0,0 +1,234 @@
+module Codnar
+  module Configuration
+    # Configurations for splitting source code with comments.
+    module Comments
+      # {{{ Simple comment classification configurations
+      # Classify simple comment lines. It accepts a restricted format: each
+      # comment is expected to start with some exact prefix (e.g. "#" for shell
+      # style comments or "//" for C++ style comments). The following space, if
+      # any, is stripped from the payload. As a convenience, a prefix immediately
+      # followed by "!" is not taken to start a comment. This protects both the
+      # 1st line of shell scripts ("#!") and any other line you wish to avoid
+      # being treated as a comment.
+      #
+      # This configuration is typically complemented by an additional one
+      # specifying how to format the (stripped!) comments; by default they are
+      # just displayed as-is using an HTML +pre+ element, which isn't very
+      # useful.
+      CLASSIFY_SIMPLE_COMMENTS = lambda do |prefix|
+        return Comments.simple_comments(prefix)
+      end
+      # Classify simple shell ("#") comment lines.
+      CLASSIFY_SHELL_COMMENTS = lambda do
+        return Comments.simple_comments("#")
+      end
+      # Classify simple C++ ("//") comment lines.
+      CLASSIFY_CPP_COMMENTS = lambda do
+        return Comments.simple_comments("//")
+      end
+      # Configuration for classifying lines to comments and code based on a
+      # simple prefix (e.g. "#" for shell style comments or "//" for C++ style
+      # comments).
+      def self.simple_comments(prefix)
+        return {
+          "syntax" => {
+            "patterns" => {
+              "comment_#{prefix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*)$" },
+            },
+            "states" => {
+              "start" => {
+                "transitions" => [
+                  { "pattern" => "comment_#{prefix}", "kind" => "comment" },
+                  []
+                ],
+              },
+            },
+          },
+        }
+      end
+      # }}}
+      # {{{ Denoted comment classification configurations
+      # Classify denoted comment lines. Denoted comments are similar to simple
+      # comments, except that the 1st simple comment line must start with a
+      # specific prefix (e.g., in Haskell, comment lines start with '--' but
+      # Haddock comments start with '-- |', '-- ^', etc.). The comment continues
+      # in additional simple comment lines.
+      #
+      # This configuration is typically complemented by an additional one
+      # specifying how to format the (stripped!) comments; by default they are
+      # just displayed as-is using an HTML +pre+ element, which isn't very
+      # useful.
+      CLASSIFY_DENOTED_COMMENTS = lambda do |start_prefix, continue_prefix|
+        return Comments.denoted_comments(start_prefix, continue_prefix)
+      end
+      # Classify denoted haddock ("--") comment lines. Note that non-haddock
+      # comment lines are not captured; they would be treated as code and handled
+      # by syntax highlighting, if any.
+      CLASSIFY_HADDOCK_COMMENTS = lambda do
+        return Comments.denoted_comments("-- [|^$]", "--")
+      end
+      # Configuration for classifying lines to comments and code based on a start
+      # comment prefix and continuation comment prefix (e.g., "-- |" and "--" for
+      # haddock).
+      def self.denoted_comments(start_prefix, continue_prefix)
+        # Ruby coverage somehow barfs if we inline this. Go figure.
+        start_transition = {
+          "pattern" => "comment_start_#{start_prefix}",
+          "next_state" => "comment_continue_#{continue_prefix}",
+          "kind" => "comment"
+        }
+        return {
+          "syntax" => {
+            "patterns" => {
+              "comment_start_#{start_prefix}" => { "regexp" => "^(\\s*)#{start_prefix}\\s?(.*)$" },
+              "comment_continue_#{continue_prefix}" => { "regexp" => "^(\\s*)#{continue_prefix}\\s?(.*)$" },
+            },
+            "states" => {
+              "start" => {
+                "transitions" => [ start_transition, [] ],
+              },
+              "comment_continue_#{continue_prefix}" => {
+                "transitions" => [ {
+                    "pattern" => "comment_continue_#{continue_prefix}",
+                    "kind" => "comment" },
+                  { "next_state" => "start" }
+                ],
+              },
+            },
+          },
+        }
+      end
+      # }}}
+      # {{{ Delimited comment classification configurations
+      # Classify delimited comment lines. It accepts a restricted format: each
+      # comment is expected to start with some exact prefix (e.g. "/*" for C
+      # style comments or "<!--" for HTML style comments). The following space,
+      # if any, is stripped from the payload. Following lines are also considered
+      # comments; a leading inner line prefix (e.g., " *" for C style comments or
+      # " -" for HTML style comments) with an optional following space are
+      # stripped from the payload. Finally, a line containing some exact suffix
+      # (e.g. "*/" for C style comments, or "-->" for HTML style comments) ends
+      # the comment. A one-line comment format is also supported, containing the
+      # prefix, the payload, and the suffix. As a convenience, a prefix
+      # immediately followed by "!" is not taken to start a comment. This allows
+      # protecting any comment block you do not want classified as a comment.
+      #
+      # This configuration is typically complemented by an additional one
+      # specifying how to format the (stripped!) comments; by default they are
+      # just displayed as-is using an HTML +pre+ element, which isn't very
+      # useful.
+      CLASSIFY_DELIMITED_COMMENTS = lambda do |prefix, inner, suffix|
+        return Comments.delimited_comments(prefix, inner, suffix)
+      end
+      # Classify delimited C ("/*", " *", " */") style comments.
+      CLASSIFY_C_COMMENTS = lambda do
+        # Since the prefix/inner/suffix passed to the configuration are regexps,
+        # we need to escape special characters such as "*".
+        return Comments.delimited_comments("/\\*", " \\*", " \\*/")
+      end
+      # Classify delimited HTML ("<!--", " -", "-->") style comments.
+      CLASSIFY_HTML_COMMENTS = lambda do
+        return Comments.delimited_comments("<!--", " -", "-->")
+      end
+      # Configuration for classifying lines to comments and code based on a
+      # delimited start prefix, inner line prefix and final suffix (e.g., "/*",
+      # " *", " */" for C-style comments or "<!--", " -", "-->" for HTML style
+      # comments).
+      def self.delimited_comments(prefix, inner, suffix)
+        return {
+          "syntax" => {
+            "patterns" => {
+              "comment_prefix_#{prefix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*)$" },
+              "comment_inner_#{inner}" => { "regexp" => "^(\\s*)#{inner}\\s?(.*)$" },
+              "comment_suffix_#{suffix}" => { "regexp" => "^(\\s*)#{suffix}\\s*$" },
+              "comment_line_#{prefix}_#{suffix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*?)\\s*#{suffix}\\s*$" },
+            },
+            "states" => {
+              "start" => {
+                "transitions" => [
+                  { "pattern" => "comment_line_#{prefix}_#{suffix}",
+                    "kind" => "comment" },
+                  { "pattern" => "comment_prefix_#{prefix}",
+                    "kind" => "comment",
+                    "next_state" => "comment_#{prefix}" },
+                  [],
+                ],
+              },
+              "comment_#{prefix}" => {
+                "transitions" => [
+                  { "pattern" => "comment_suffix_#{suffix}",
+                    "kind" => "comment",
+                    "next_state" => "start" },
+                  { "pattern" => "comment_inner_#{inner}",
+                    "kind" => "comment" },
+                ],
+              },
+            },
+          },
+        }
+      end
+      # }}}
+      # {{{ Comment formatting configurations
+      # Format comments as HTML +pre+ elements. It is used to complement a
+      # configuration that classifies some lines as +comment+.
+      FORMAT_PRE_COMMENTS = {
+        "formatters" => {
+          "comment" => "Formatter.lines_to_pre_html(lines, :class => :comment)",
+        },
+      }
+      # Format comments that use the RDoc notation. It is used to complement a
+      # configuration that classifies some lines as +comment+.
+      FORMAT_RDOC_COMMENTS = {
+        "formatters" => {
+          "comment" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')",
+          "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
+        },
+      }
+      # Format comments that use the Markdown notation. It is used to complement a
+      # configuration that classifies some lines as +comment+.
+      FORMAT_MARKDOWN_COMMENTS = {
+        "formatters" => {
+          "comment" => "Formatter.markup_lines_to_html(lines, Markdown, 'markdown')",
+          "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
+        },
+      }
+      # Format comments that use the Haddock notation. It is used to complement a
+      # configuration that classifies some lines as +comment+.
+      FORMAT_HADDOCK_COMMENTS = {
+        "formatters" => {
+          "comment" => "Formatter.markup_lines_to_html(lines, Haddock, 'haddock')",
+          "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
+        },
+      }
+      # }}}
+    end
+  end
+end
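
The denoted comment configuration is a small two-state machine. The sketch below is a stand-alone approximation, not Codnar's `Scanner`. The constant and method names are hypothetical, and it assumes that the pattern-less `{ "next_state" => "start" }` transition returns to `start` and re-examines the same line. It uses the Haddock prefixes from `CLASSIFY_HADDOCK_COMMENTS`.

```ruby
# Hypothetical constants: the same regexp strings that denoted_comments builds
# for the Haddock start prefix "-- [|^$]" and continuation prefix "--".
HADDOCK_START = Regexp.new("^(\\s*)-- [|^$]\\s?(.*)$")
HADDOCK_CONTINUE = Regexp.new("^(\\s*)--\\s?(.*)$")

# Hypothetical sketch: returns [kind, text] pairs, where comment text is the
# stripped payload and any other line is reported as code.
def classify_haddock(lines)
  state = :start
  lines.map do |line|
    if state == :continue
      match = HADDOCK_CONTINUE.match(line)
      next [:comment, match[2]] if match
      # Assumed pattern-less transition: go back to "start" and re-examine.
      state = :start
    end
    match = HADDOCK_START.match(line)
    if match
      state = :continue
      [:comment, match[2]]
    else
      [:code, line]
    end
  end
end

haskell = [
  "-- | Adds one.",
  "-- Works on any Num.",
  "inc x = x + 1",
  "-- plain comment",
]
classify_haddock(haskell).each { |kind, text| puts "#{kind}: #{text}" }
# comment: Adds one.
# comment: Works on any Num.
# code: inc x = x + 1
# code: -- plain comment
```

The last line illustrates the note on `CLASSIFY_HADDOCK_COMMENTS`: a plain `--` comment that does not follow a Haddock comment is left as code, to be handled by syntax highlighting.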