RubyGems - giblish - Versions diffs - 0.3.1 → 0.4.0 - Mend

giblish 0.3.1 → 0.4.0

Files changed (16)

checksums.yaml +4 -4
data/.ruby-version +1 -1
data/README.adoc +32 -11
data/Rakefile +3 -1
data/giblish.gemspec +1 -1
data/lib/giblish-search.rb +444 -0
data/lib/giblish/buildgraph.rb +1 -1
data/lib/giblish/buildindex.rb +58 -31
data/lib/giblish/cmdline.rb +27 -0
data/lib/giblish/core.rb +66 -7
data/lib/giblish/docid.rb +3 -27
data/lib/giblish/docinfo.rb +32 -17
data/lib/giblish/indexheadings.rb +163 -0
data/lib/giblish/utils.rb +59 -3
data/lib/giblish/version.rb +1 -1
metadata +7 -6

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 963e7aeee8afe72afe253d10a5ef3f9fb8538af3668838d18ac6539d388f64a1
-  data.tar.gz: 32916267003809333978cb261e459795b14d36e29e4d2f199ab21b9a02ba791a
+  metadata.gz: a6a7fc123856a6321bbad8b7f8f527490a09faa9df0ebeeb1a984bbda4b22d78
+  data.tar.gz: 2788cc7789ed4046307d9d4a7ad75be7f7b0f12c30a1b559500b18166602e966
 SHA512:
-  metadata.gz: d0c13cd9f84c9bba7a241eb156b843145026b9e199c13255a518e487fc62c1968417c7482445457750c1c825b4b671ee80076459442f24d561a1eb670f46843c
-  data.tar.gz: '084d83eb182a093f5b1dff2bbfd0074b6d3d9f4657b7661c7395dd4fdf1e7755d88d3342f3d1a40fc529e798543390d9e7ffefcc702a51cbbe5ecb00c7e6106c'
+  metadata.gz: 2ea03abe46dd0e23c06266e8b1a9bc444811931825a2155e9545ae911d9527fd7ad993d18287fd2ad5b7ce6a0e51ce40421d2c01454ec357161300969d7bdf57
+  data.tar.gz: 308d7306d39641442e17fa7b5a1687e8f7ae473e438c6eabf3ffbb3650521b091e15a8b0ed36700cb41cad917fbcd9c3f95646b38d2d4199c112eaa75b00ca7b
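
The digests above cover the `metadata.gz` and `data.tar.gz` members of the packaged gem. As a minimal sketch of how such a digest could be checked locally, assuming the member has already been extracted to disk, the helper below compares a file's SHA256 hex digest against an expected value. The helper name `sha256_matches?` and the throwaway demo file are illustrative assumptions, not part of giblish or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals
# the expected value (compared case-insensitively).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest.casecmp?(expected_hex)
end

# Demo on a throwaway file rather than a real gem member.
DEMO_MATCH = Tempfile.create("digest-demo") do |f|
  f.write("hello")
  f.flush
  sha256_matches?(f.path, Digest::SHA256.hexdigest("hello"))
end

DEMO_MISMATCH = Tempfile.create("digest-demo") do |f|
  f.write("hello")
  f.flush
  sha256_matches?(f.path, "0" * 64)
end
```

For a real check, the expected value would be the `+` line for the member in question, and `path` would point at the extracted `data.tar.gz` or `metadata.gz`.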

data/.ruby-version CHANGED

@@ -1 +1 @@
-2.5.3
+2.6.1

data/README.adoc CHANGED

@@ -7,17 +7,31 @@ image::https://travis-ci.com/rillbert/giblish.svg?branch=master[build status]
 giblish is used to convert a source directory tree containing AsciiDoc files to
 a destination directory tree containing the corresponding html or pdf files
-and add a handy index page for the converted files.
-If the source directory tree is part of a git repository, giblish can generate
-separate html/pdf trees for branches and/or tags that match a user specified
-regexp (see examples below).
+and add some handy tools for easier navigation of the resulting files.
+The tools include:
+ * An index page listing all rendered documents with clickable links
+ * Document ids - Note: the implementation of this is giblish-specific and thus
+   you need to render your adoc files using giblish to make this work as intended.
+   You can use document ids to:
+ ** Reference one doc in the source tree from another doc without depending on file
+    names or relative paths. The referenced doc can thus be moved within the source
+    tree or change its file name and the reference will still be valid.
+ ** Validate doc id references during document rendering and thus be alerted to
+    any invalid doc id references.
+ ** Let giblish generate a clickable graph of all document references (requires
+    graphviz and the 'dot' tool).
+ * A (stripped-down but nonetheless useful) text-search of your documents (requires
+   that you view your docs via a web-server).
+ * If the source directory tree is part of a git repository, giblish can generate
+   separate html/pdf trees for branches and/or tags that match a user specified
+   regexp (see examples below).
 == Dependencies and credits
-Giblish is basically a wrapper (with some extra candy) around the awesome
-*asciidoctor* and *asciidoctor-pdf* projects. Thank you @mojavelinux and others for
-making these brilliant tools available!!
+Giblish uses the awesome *asciidoctor* and *asciidoctor-pdf* projects under the hood.
+Thank you @mojavelinux and others for making these brilliant tools available!!
 == Installation
@@ -119,13 +133,14 @@ A summary page containing links to all branches will be generated directly in
 the `my_dst_root` dir.
 ====
-.Advanced usage; Publish a static html site from a git repo
+.Advanced usage; Publish a static html site from a git repo with search capabilities
 ====
 giblish can be used to inject a tree of html docs suitable for serving via a web
 server (e.g. Apache). Below is an example of how to create such a tree. If you
 combine this with a server side git hook that invokes this script after push,
 you will have a way to auto-publish your latest documents and/or documents at
-specific git tags. In principle a poor-mans document managing system.
+specific git tags. A document management system including nice index pages and
+text search capabilities.
 Assumptions:
@@ -142,7 +157,7 @@ Assumptions:
  * You want to publish the documentation as it looked for your release tags
    myprod-v1.0-final, myprod-v2.0-final, ...
- giblish -t "-final$" -r ~/gh/myrepo/common/resources -s mylayout -w /var/www/html ~/gh/myrepo/common/Documents /var/www/html/proddocs
+ giblish -m -t "-final$" -r ~/gh/myrepo/common/resources -s mylayout -w /var/www/html ~/gh/myrepo/common/Documents /var/www/html/proddocs
 The above will create a tree of html docs under `/var/www/html/proddocs`. Each
 tag will get its own subdir (e.g. `/var/www/html/proddocs/myprod_v1.0_final`).
@@ -152,4 +167,10 @@ subdir and also to the .../proddocs dir.
 The `-w` switch above will strip the `/var/www/html` from the css link so that
 the paths to the css will be correct in the context of the serving of the
 pages via the web server.
+The `-m` switch above will build a database (JSON file) with enough information
+to enable a cgi-script to provide a text-search capability for your users. The
+cgi-script must be located at http://your_web_site.com/cgi-bin/giblish-search.cgi
+and this gem provides a default implementation that you can copy from the .../lib
+folder to the correct destination.
 ====
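
The README example passes `-t "-final$"` so that only tags matching that regexp are published. The sketch below only illustrates which tag names such a pattern selects; how giblish applies the pattern internally is not shown in this diff, and the tag list is made up for the demonstration.

```ruby
# The pattern from the README example: tag names that end in "-final".
TAG_PATTERN = Regexp.new("-final$")

# Hypothetical tag names, for illustration only.
EXAMPLE_TAGS = %w[
  myprod-v1.0-final
  myprod-v2.0-final
  myprod-v2.1-rc1
  final-notes
].freeze

# Array#grep keeps the elements that match the pattern.
SELECTED_TAGS = EXAMPLE_TAGS.grep(TAG_PATTERN)
```

Here `SELECTED_TAGS` holds the two `myprod-...-final` names; `final-notes` is skipped because the pattern requires a leading hyphen and anchors at the end of the name.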

data/Rakefile CHANGED

@@ -10,7 +10,9 @@ end
 Rake::TestTask.new(:current) do |t|
   t.libs << "test"
   t.libs << "lib"
-  t.test_files = FileList["test/**/depgraph_test.rb"]
+  t.test_files = FileList["test/**/docid_test.rb"]
+#  t.test_files = FileList["test/**/index_heading_test.rb"]
+#  t.test_files = FileList["test/**/depgraph_test.rb"]
 end
 Rake::TestTask.new(:sandbox) do |t|

data/giblish.gemspec CHANGED

@@ -35,7 +35,7 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency "rake", "~> 10.0"
   # Usage: spec.add_runtime_dependency "[gem name]", [[version]]
-  spec.add_runtime_dependency "asciidoctor", "~>1.5", ">= 1.5.7.1"
+  spec.add_runtime_dependency "asciidoctor", "~>1.5", ">= 1.5.8"
   spec.add_runtime_dependency "asciidoctor-diagram", ["~> 1.5"]
   spec.add_runtime_dependency "asciidoctor-pdf", [">= 1.5.0.alpha.16"]
   spec.add_runtime_dependency "asciidoctor-rouge", ["~> 0.3"]
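
The changed line raises the lower bound of the asciidoctor dependency from 1.5.7.1 to 1.5.8 while keeping the `~> 1.5` constraint. A small sketch using RubyGems' own `Gem::Requirement` shows which versions the new pair of constraints accepts; the helper name `allowed?` is an illustrative assumption.

```ruby
require "rubygems"

# The requirement pair from the new gemspec line.
ASCIIDOCTOR_REQ = Gem::Requirement.new("~> 1.5", ">= 1.5.8")

# Hypothetical helper: does the given version string satisfy both constraints?
def allowed?(version)
  ASCIIDOCTOR_REQ.satisfied_by?(Gem::Version.new(version))
end
```

With these constraints, `allowed?("1.5.8")` and `allowed?("1.6.2")` are true, while `allowed?("1.5.7.1")` (below the new lower bound) and `allowed?("2.0.0")` (excluded by `~> 1.5`) are false.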

data/lib/giblish-search.rb ADDED

@@ -0,0 +1,444 @@
+#!/usr/bin/env ruby
+require "pathname"
+require "json"
+require "asciidoctor"
+require "open3"
+require "cgi"
+require "uri/generic"
+class GrepDocTree
+  Line_info = Struct.new(:line, :line_no) {
+    def initialize(line,line_no)
+      self.line = line
+      self.line_no = Integer(line_no)
+    end
+  }
+  # grep_opts:
+  # :search_top
+  # :search_phrase
+  # :ignorecase
+  # :useregexp
+  def initialize(grep_opts)
+    @grep_opts = "-nHr --include '*.adoc' "
+    @grep_opts += "-i " if grep_opts.has_key? :ignorecase
+    @grep_opts += "-F " unless grep_opts.has_key? :useregexp
+    @search_root = grep_opts[:search_top]
+    @input = grep_opts[:search_phrase]
+    @output = ""
+    @error = ""
+    @status = 0
+    @match_index = {}
+  end
+  def grep
+    # This console code sequence will only show the matching word in bold ms=01:mc=:sl=:cx=:fn=:ln=:bn=:se=
+    grep_env="GREP_COLORS=\"ms=01:mc=:sl=:cx=:fn=:ln=:bn=:se=\""
+    @grep_opts += " --color=always"
+    @output, @error, @status = Open3.capture3("#{grep_env} grep #{@grep_opts} \"#{@input}\" #{@search_root}")
+    begin
+      @output.force_encoding(Encoding::UTF_8)
+      @output.gsub!(/\x1b\[01m\x1b\[K/,"##")
+      @output.gsub!(/\x1b\[m\x1b\[K/,"##")
+    rescue StandardError => e
+      print e.message
+      print e.backtrace.inspect
+      exit 0
+    end
+    grep2hash @search_root
+  end
+  # returns an indexed output where each match from the search is associated with the
+  # corresponding src file's closest heading.
+  # the format of the output:
+  # {html_filename#heading : [line_1, line_2, ...], ...}
+  #
+  # The heading_db has the following JSON format
+  # {
+  #   file_infos : [{
+  #     filepath : filepath_1,
+  #     title : Title,
+  #     sections : [{
+  #       id : section_id_1,
+  #       title : section_title_1,
+  #       line_no : line_no
+  #     },
+  #     {
+  #       id : section_id_1,
+  #       title : section_title_1,
+  #       line_no : line_no
+  #     },
+  #     ...
+  #     ]
+  #   },
+  #   {
+  #     filepath : filepath_1,
+  #     ...
+  #   }]
+  # }
+  def match_with_headings heading_db
+    matches = []
+    # for each file with at least one match
+    @match_index.each do |file_path,match_infos|
+      # assume that max one file with the specified path
+      # exists
+      files = heading_db["file_infos"].select do |fi|
+        fi["filepath"] == file_path.to_s
+      end
+      next if files.empty?
+      file_anchors = construct_user_info files.first, match_infos
+      matches << file_anchors
+      end
+    matches
+  end
+  # Produce a hash with all info needed for the user to navigate to the
+  # matching html section for all matches to the file in the supplied file
+  # info hash.
+  #
+  # format of the resulting hash:
+  # {
+  #   filepath : Filepath,
+  #   title : Title,
+  #   matches : {
+  #       section_id :
+  #       {
+  #         section_title : Section Title,
+  #         location : Location,
+  #         lines : [line_1, line_2, ...]
+  #       }
+  #     }
+  #   ]
+  # }
+  #
+  def construct_user_info file_info, match_infos
+    matches = {}
+    file_anchors = {
+        "filepath" => file_info["filepath"],
+        "title" => file_info["title"],
+        "matches" => matches
+    }
+    match_infos.each do |match_info|
+      match_line_nr = match_info.line_no
+      # find section with closest lower line_no to line_info
+      best_so_far = 0
+      chosen_section_info = {}
+      file_info["sections"].each do |section_info|
+        l = Integer(section_info["line_no"])
+        if l <= match_line_nr && l > best_so_far
+          chosen_section_info = section_info
+        end
+      end
+      matches[chosen_section_info["id"]] =
+          {
+              "section_title" => chosen_section_info["title"],
+              "location" => "#{Pathname.new(file_info["filepath"]).sub_ext(".html").to_s}##{chosen_section_info["id"]}",
+              "lines" => []
+          } unless matches.key?(chosen_section_info["id"])
+      matches[chosen_section_info["id"]]["lines"] << match_info.line
+    end
+    file_anchors
+  end
+  def formatted_output
+    # assume we have an updated index
+    adoc_str = ""
+    @match_index.each do |k,v|
+      adoc_str += "#{k}::\n"
+      v.each { |line_info|
+        adoc_str += "#{line_info.line_no} : #{line_info.line}\n"
+      }
+    end
+    adoc_str
+  end
+  private
+  # converts the 'raw' matches from grep into a hash.
+  # i.e. from:
+  # <filename>:<line_no>:<line>
+  # <filename>:<line_no>:<line>
+  # ...
+  #
+  # to
+  # {file_path : [line_info1, line_info2, ...], ...}
+  def grep2hash(base_dir)
+    @match_index = {}
+    @output.split("\n").each do |line|
+      tokens = line.split(":",3)
+      # remove all lines starting with :<attrib>:
+      tokens[2].gsub!(/^:[[:graph:]]+:.*$/,"")
+      next if tokens[2].empty?
+      # remove everything above the repo root from the filepath
+      file_path = Pathname.new(tokens[0]).relative_path_from Pathname.new(base_dir)
+      @match_index[file_path] = [] unless @match_index.key? file_path
+      @match_index[file_path] << Line_info.new(tokens[2], tokens[1])
+    end
+  end
+end
+class SearchDocTree
+  def initialize(input_data)
+    @input_data = input_data
+  end
+  def search
+    # read the heading_db from file
+    jsonpath = @input_data[:search_top].join("heading_index.json")
+    src_index = {}
+    json = File.read(jsonpath.to_s)
+    src_index = JSON.parse(json)
+    # search the doc tree for regex
+    gt = GrepDocTree.new @input_data
+    gt.grep
+    matches = gt.match_with_headings src_index
+    format_search_adoc matches, get_uri_top
+  end
+  private
+  def get_uri_top
+    if @input_data[:gitbranch]
+      return @input_data[:referer][0,@input_data[:referer].rindex('/')]
+    end
+    return @input_data[:referer].chomp('/')
+  end
+  def wash_line line
+    # remove any '::'
+    result = line.gsub(/::*/,"")
+    # remove =,| at the start of a line
+    result.gsub!(/^[=|]+/,"")
+    result
+  end
+  # index is an array of file_info, see construct_user_info
+  # for format per file
+  # == Title (filename)
+  #
+  # <<location,section_title>>::
+  # line_1
+  # line_2
+  # ...
+  def format_search_adoc index,uri_top
+    str = ""
+    index.each do |file_info|
+      filename = Pathname.new(file_info["filepath"]).basename
+      str << "== #{file_info["title"]}\n\n"
+      file_info["matches"].each do |section_id, info |
+        str << "#{uri_top}/#{info["location"]}[#{info["section_title"]}]::\n\n"
+        # str << "<<#{info["location"]},#{info["section_title"]}>>::\n\n"
+        str << "[subs=\"quotes\"]\n"
+        str << "----\n"
+        info["lines"].each do | line |
+          str << "-- #{wash_line(line)}\n"
+        end.join("\n\n")
+        str << "----\n"
+      end
+      str << "\n"
+    end
+    <<~ADOC
+    = Search Result
+    #{str}
+    ADOC
+  end
+end
+def init_web_server web_root
+  require 'webrick'
+  root = File.expand_path web_root
+  puts "Trying to start a WEBrick instance at port 8000 serving files from #{web_root}..."
+  server = WEBrick::HTTPServer.new(
+      :Port => 8000,
+      :DocumentRoot => root,
+      :Logger => WEBrick::Log.new("webrick.log",WEBrick::Log::DEBUG)
+  )
+  puts "WEBrick instance now listening to localhost:8000"
+  trap 'INT' do server.shutdown end
+  server.start
+end
+def hello_world
+  require "pp"
+  # init a new cgi 'connection'
+  cgi = CGI.new
+  print cgi.header
+  print "<br>"
+  print "Useful cgi parameters and variables."
+  print "<br>"
+  print cgi.public_methods(false).sort
+  print "<br>"
+  print "<br>"
+  print "referer: #{cgi.referer}<br>"
+  print "path: #{URI(cgi.referer).path}<br>"
+  print "host: #{cgi.host}<br>"
+  print "client_sent_topdir: #{cgi["topdir"]}<br>"
+  print "<br>"
+  print "client_sent_reldir: #{cgi["reltop"]}<br>"
+  print "<br>"
+  print "ENV: "
+  pp ENV
+  print "<br>"
+end
+def cgi_main cgi
+  # retrieve the form data supplied by user
+  input_data = {
+      search_phrase: cgi["searchphrase"],
+      ignorecase: cgi.has_key?("ignorecase"),
+      useregexp: cgi.has_key?("useregexp"),
+      doc_root_abs: Pathname.new(cgi["topdir"]),
+      referer_rel_top: Pathname.new("/#{cgi["reltop"]}"),
+      referer: cgi.referer,
+      uri_path: URI(cgi.referer).path,
+      client_css: cgi["css"],
+      search_top: nil,
+      styles_top: nil
+  }
+  # fixup paths depending on git branch or not
+  #
+  # search_assets is an absolute path
+  # styles_top is a relative path
+  #
+  # if the source was rendered from a git branch, the paths
+  # search_assets = <index_dir>/../search_assets/<branch_name>/
+  # styles_dir = ../web_assets/css
+  #
+  # and if not, the path is
+  # search_assets = <index_dir>/search_assets
+  # styles_dir = ./web_assets/css
+  #
+  # The styles dir shall be a relative path
+  if input_data[:doc_root_abs].join("./search_assets").exist?
+    # this is not from a git branch
+    input_data[:search_top] = input_data[:doc_root_abs].join("./search_assets")
+    # input_data[:styles_top] = Pathname.new(input_data[:uri_path]).join("./web_assets/css")
+    input_data[:styles_top] = Pathname.new(input_data[:referer_rel_top]).join("web_assets/css")
+    input_data[:gitbranch] = false
+  elsif input_data[:doc_root_abs].join("../search_assets").exist?
+    # this is from a git branch
+    input_data[:search_top] = input_data[:doc_root_abs].join("../search_assets").join(input_data[:doc_root_abs].basename)
+    input_data[:styles_top] = Pathname.new(input_data[:referer_rel_top]).join("../web_assets/css")
+    input_data[:gitbranch] = true
+  else
+    raise ScriptError, "Could not find search_assets dir!"
+  end
+  # use a relative stylesheet (same as the index page was rendered with)
+  adoc_options =  {
+      "data-uri" => 1,
+      "linkcss" => 1,
+      "stylesdir" => input_data[:styles_top].to_s,
+      "stylesheet" => input_data[:client_css],
+      "copycss!" => 1
+  }
+  # search the docs and render html
+  sdt = SearchDocTree.new(input_data)
+  docstr = sdt.search
+  # send the result back to the client
+  print Asciidoctor.convert docstr, header_footer: true, attributes: adoc_options
+end
+# assume that the file tree looks like this when running
+# on a git branch:
+#
+# dst_root_dir
+# |- branch_1_top_dir
+# |     |- index.html
+# |     |- file_1.html
+# |     |- dir_1
+# |     |   |- file2.html
+# |- branch_2_top_dir
+# |- branch_x_...
+# |- web_assets
+# |- search_assets
+# |     |- branch_1_top_dir
+# |           |- heading_index.json
+# |           |- file1.adoc
+# |           |- dir_1
+# |           |   |- file2.html
+# |           |- ...
+# |     |- branch_2_top_dir
+# |           | ...
+# assume that the file tree looks like this when not
+# rendering a git branch:
+#
+# dst_root_dir
+# |- index.html
+# |- file_1.html
+# |- dir_1
+# |   |- file2.html
+# |...
+# |- web_assets (only if a custom stylesheet is used...)
+# |- search_assets
+# |     |- heading_index.json
+# |     |- file1.adoc
+# |     |- dir_1
+# |     |   |- file2.html
+# |     |- ...
+# Usage:
+#   to start a local web server for development work
+# giblish-search.rb <web_root>
+#
+#   to run as a cgi script via a previously setup web server:
+# giblish-search.rb
+#
+if __FILE__ == $PROGRAM_NAME
+  STDOUT.sync = true
+  if ARGV.length == 0
+    # 'Normal' cgi usage, as called from a web server
+    # init a new cgi 'connection' and print headers
+    cgi = CGI.new
+    print cgi.header
+    begin
+      # hello_world
+      cgi_main cgi
+    rescue Exception => e
+      print e.message
+      exit 1
+    end
+    exit 0
+  end
+  if ARGV.length == 1
+    # Run a simple web server to test this locally..
+    # and then create the html docs using:
+    # giblish -c -m -w <web_root> -r <resource_dir> -s <style_name> -g <git_branch> <src_root> <web_root>
+    init_web_server ARGV[0]
+    exit 0
+  end
+end
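
The added script turns `grep -nH` output into per-file match lists (`grep2hash`) and then attaches each match to the nearest preceding heading (`construct_user_info`). In the loop as shown, `best_so_far` is set to 0 and never reassigned, so the last section at or before the match wins; that equals the closest section only when sections are listed in ascending line order. The following is a stand-alone sketch of the same two steps that does not depend on section order. The method names and sample data are illustrative assumptions, not part of the gem, and the parser assumes file paths contain no `:`.

```ruby
# Parse "<file>:<line_no>:<text>" lines into
# { file => [[line_no, text], ...] }. Lines with empty text are skipped.
def parse_grep_lines(output)
  output.each_line(chomp: true).each_with_object({}) do |line, index|
    file, line_no, text = line.split(":", 3)
    next if text.nil? || text.empty?
    (index[file] ||= []) << [Integer(line_no), text]
  end
end

# Return the section with the largest line_no that is still
# <= match_line_no, or nil when the match precedes every section.
def closest_section(sections, match_line_no)
  sections
    .select { |s| Integer(s["line_no"]) <= match_line_no }
    .max_by { |s| Integer(s["line_no"]) }
end

# Sample data in the shape of the heading_db "sections" array,
# deliberately out of order.
SAMPLE_SECTIONS = [
  { "id" => "_usage", "title" => "Usage", "line_no" => 20 },
  { "id" => "_intro", "title" => "Intro", "line_no" => 3 },
  { "id" => "_faq",   "title" => "FAQ",   "line_no" => 41 }
].freeze

SAMPLE_GREP_OUTPUT = "docs/a.adoc:25:run giblish -m\ndocs/a.adoc:2:= Title\n"
```

A caller would need to handle the `nil` case (a match above the first heading) explicitly, for example by linking to the top of the document.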