RubyGems - mailparser - Versions diffs - 0.4.22a → 0.5.0.beta1 - Mend

mailparser 0.4.22a → 0.5.0.beta1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

data/README.txt +2 -6
data/lib/mailparser/conv_charset.rb +40 -19
data/lib/mailparser/error.rb +1 -0
data/lib/mailparser/loose.rb +22 -20
data/lib/mailparser/rfc2045/parser.rb +140 -140
data/lib/mailparser/rfc2045/scanner.rb +15 -14
data/lib/mailparser/rfc2045.rb +1 -0
data/lib/mailparser/rfc2047.rb +26 -37
data/lib/mailparser/rfc2183/parser.rb +2 -1
data/lib/mailparser/rfc2183/scanner.rb +1 -0
data/lib/mailparser/rfc2183.rb +1 -0
data/lib/mailparser/rfc2231.rb +6 -5
data/lib/mailparser/rfc2822/parser.rb +584 -544
data/lib/mailparser/rfc2822/scanner.rb +21 -21
data/lib/mailparser/rfc2822.rb +1 -0
data/lib/mailparser.rb +83 -209
data/test/test_loose.rb +17 -8
data/test/test_mailparser.rb +88 -183
data/test/test_rfc2045.rb +1 -1
data/test/test_rfc2047.rb +35 -13
data/test/test_rfc2183.rb +1 -1
data/test/test_rfc2822.rb +6 -2
metadata +22 -9
data/HISTORY +0 -141
data/lib/mailparser/obsolete.rb +0 -403
data/test/test_obsolete.rb +0 -615

data/HISTORY DELETED

@@ -1,141 +0,0 @@
-== 0.4.22 2010-06-11 ==
-* Fixed a bug that removed the trailing newline from attachments
-== 0.4.21 2010-06-03 ==
-* Added the :use_file option
-* Reduced memory usage
-* Charset conversion is now performed only when the Content-Type is text
-== 0.4.20 2010-05-04 ==
-* Fixed a bug where MailParser::Header#each could crash when a header contained 8-bit characters
-== 0.4.19 2009-05-12 ==
- * Fixed an infinite loop when a Received header contained \v
-   cf. http://redmine.ruby-lang.org/issues/show/1196
-== 0.4.18 2009-01-27 ==
- * Fixed slow parsing of strings containing a very large number of tokens
-== 0.4.17 2008-11-26 ==
- * Fixed a bug where 23:59:60 was the only leap-second time considered
-== 0.4.16 2008-07-28 ==
- * Added support for uuencode encoding
-== 0.4.15 2008-06-25 ==
- * Fixed a crash on addresses of the form "group:, hoge@example.com;"
-== 0.4.14 2008-04-03 ==
- * Fixed a crash on a Received header with no value
- * Header#[] no longer returns nil values
- * When Content-Type had no subtype, the subtype was set to plain even if the type was not text; it now returns ""
- * A Keywords header with no value returned [nil]; it now returns []
-== 0.4.13 2008-03-09 ==
- * Fixed a Quoted-Printable decoding bug where lines were joined when several line breaks followed an =
-== 0.4.12 2008-01-24 ==
- * A malformed Message-Id returned an ID containing whitespace
-== 0.4.11 2008-01-15 ==
- * Made date validation stricter
-== 0.4.10 2007-11-06 ==
- * Fixed a crash when Content-Transfer-Encoding was empty
-== 0.4.9 2007-10-04 ==
- * Added :charset_converter
-== 0.4.8 2007-09-17 ==
- * Fixed a crash on a Message-Id without < >
- * Added Message#body_preconv
-== 0.4.7 2007-08-29 ==
- * The line break immediately before a boundary string is now ignored, as required by RFC2046
- * The method required of the object passed as the first argument to Message.new changed from each_line to gets
- * Message#charset now returns nil when there is no Content-Type header or no charset parameter
- * Bundled the racc-compiled .rb files
- * MailSuite::Message now also accepts a String
-== 0.4.6 2007-08-09 ==
- * With :keep_raw, header continuation lines were duplicated
-== 0.4.5 2007-08-07 ==
- * Fixed a crash with :decode_mime_filename when neither Content-Type nor Content-Disposition was present
- * An empty Content-Type or Content-Disposition caused an error
- * Parts consisting only of header lines, without the separating blank line, were not handled correctly
- * Made Message#raw more efficient
- * Obsolete: attachment filenames in RFC2231 format were sometimes not retrieved correctly
-== 0.4.4 2007-08-06 ==
- * Added the :keep_raw option.
- * Added the Message#raw method.
-== 0.4.3 2007-06-01 ==
- * Fixed an error when there was no blank line separating the header from the body
- * An invalid RFC2231 parameter raised ParseError even with strict=false
- * Fixed a bug where In-Reply-To and References headers were not parsed correctly
-== 0.4.2 2007-03-20 ==
- * Fixed a bug where, after a nested attachment, the next attachment could not be extracted.
-== 0.4.1a 2007-03-06 ==
- * Changed the test code so that the tests pass on 64-bit systems and outside the JST time zone.
-== 0.4.1 2007-03-03 ==
- * Fixed errors in the documentation.
- * Added a raw method to the parse result objects.
- * :extract_message_type now takes precedence over :text_body_only and :skip_body.
- * With :output_charset, the charset of RFC2232-format filenames was not converted.
- * Fixed a crash when a Content-Type or Content-Disposition parameter was encoded in an unknown charset.
-== 0.4 2007-01-16 ==
- * Rewritten from scratch. Not compatible with 0.3.
-== 0.3.9 2006-07-10 ==
- * Fixed a bug where, with text_body_only set to true and no Content-Type header, the message body was treated as absent.
-== 0.3.8 2006-03-30 ==
- * Added support for RFC 2231.
-== 0.3.7 2006-03-11 ==
- * Fixed an infinite loop when a From, To, or Cc line ended with "\".
-== 0.3.6 2005-10-14 ==
- * Fixed a failure to extract the mail address when the phrase contained "<", ">", "(", or ")".
- * Fixed very slow parsing of lines containing many "(", ")", "\(", or "\)".
- * Fixed a bug that removed "(...)" inside a quoted-string.
-== 0.3.5 2005-06-08 ==
- * Fixed attachments not being recognized when the body was HTML.
-== 0.3.4 2005-05-02 ==
- * Fixed a crash when the date in the Date header was outside the UNIX time range.
-== 0.3.3 2005-03-31 ==
- * No charset conversion is performed when output_charset is set to nil.
-== 0.3.2 2005-03-01 ==
- * Fixed an infinite loop when a From, To, or Cc header contained an odd number of '"' characters.
-== 0.3.1 2005-02-21 ==
- * Added extract_message_type=().
-== 0.3 2005-01-28 ==
- * The first text/* part is no longer used as :body.
- * Changed :header to a Hash.
- * Added :rawheader.
- * Switched from Uconv to NKF.
-== 0.2.1 2005-01-28 ==
- * Fixed attachments with Content-Type: multipart/* not being processed.
- * Fixed :body being "" instead of nil when the Content-Type was message/*.
- * Fixed a crash when the first line began with whitespace.
-== 0.2 2005-01-06 ==
- * UTF-8 support
- * Added output_charset=() and text_body_only=()
- * Tests now use Test::Unit
-== 0.1 2004/11/02 ==
- * Initial public release

data/lib/mailparser/obsolete.rb DELETED

@@ -1,403 +0,0 @@
-# Copyright (C) 2003-2010 TOMITA Masahiro
-# mailto:tommy@tmtm.org
-require "nkf"
-require "date"
-module MailParser
-  @@output_charset = "euc-jp"
-  @@text_body_only = false
-  @@extract_message_type = true
-  ConvertMethods = {
-    "JE" => :jistoeuc,
-    "SE" => :sjistoeuc,
-    "UE" => :utf8toeuc,
-    "EU" => :euctoutf8,
-    "SU" => :sjistoutf8,
-    "JU" => :jistoutf8,
-  }
-  Charsets = {
-    "iso-2022-jp" => "J",
-    "euc-jp"      => "E",
-    "shift_jis"   => "S",
-    "sjis"        => "S",
-    "x-sjis"      => "S",
-    "utf-8"       => "U",
-    "us-ascii"    => "N",
-  }
-  module_function
-  def euctoutf8(s)
-    NKF.nkf("-m0Ewx", s)
-  end
-  def sjistoutf8(s)
-    NKF.nkf("-m0Swx", s)
-  end
-  def jistoutf8(s)
-    NKF.nkf("-m0Jwx", s)
-  end
-  def sjistoeuc(s)
-    NKF.nkf("-m0Sex", s)
-  end
-  def jistoeuc(s)
-    NKF.nkf("-m0Jex", s)
-  end
-  def utf8toeuc(s)
-    NKF.nkf("-m0Wex", s)
-  end
-  def output_charset=(c)
-    @@output_charset = c
-  end
-  def text_body_only=(f)
-    @@text_body_only = f
-  end
-  def extract_message_type=(f)
-    @@extract_message_type = f
-  end
-  def b64_hdecode(str)
-    str.unpack("m")[0]
-  end
-  def b64_decode(str)
-    str.unpack("m")[0]
-  end
-  def qp_hdecode(str)
-    str.gsub("_", " ").gsub(/=([0-9A-F][0-9A-F])/no) do $1.hex.chr end
-  end
-  def qp_decode(str)
-    str.gsub(/[ \t]+$/no, "").gsub(/=\r?\n/no, "").
-      gsub(/=([0-9A-F][0-9A-F])/no) do $1.hex.chr end
-  end
-  def mdecode_token(s)
-    if s !~ /\A=\?([a-z0-9_-]+)\?(Q|B)\?([^?]+)\?=\Z/nio then
-      s
-    else
-      charset, encoding, text = $1, $2, $3
-      fc = MailParser::Charsets[charset.downcase]
-      if fc == nil then return s end
-      if encoding.downcase == 'q' then
-        s2 = qp_hdecode(text)
-      else
-        s2 = b64_hdecode(text)
-      end
-      tc = @@output_charset && MailParser::Charsets[@@output_charset.downcase]
-      if fc == "N" or tc.nil? or fc == tc then return s2 end
-      MailParser.send(MailParser::ConvertMethods[fc+tc], s2)
-    end
-  end
-  def mime_header_decode(str)
-    return str.gsub(/\s+/no, " ").gsub(/\?=\s+=\?/no, "?==?").gsub(/=\?[a-z0-9_-]+\?(Q|B)\?[^?]+\?=/nio){mdecode_token $&}
-  end
-  def trunc_comment(v)
-    ret = ""
-    after = v
-    while not after.empty? and after =~ /^(\\.|\"(\\.|[^\\\"])*\"|[^\\\(])*/no do
-      ret << $&
-      after = $'
-      if after =~ /^\(/no then
-        a = trunc_comment_sub(after[1..-1])
-        if a == nil then
-          return ret+after
-        end
-        after = a
-      end
-      if after == "\\" then
-        break
-      end
-    end
-    ret+after
-  end
-  def trunc_comment_sub(orig)
-    after = orig
-    loop do
-      if after =~ /^(\\.|[^\\\(\)])*/no then
-        after = $'
-      end
-      if after =~ /^\)/no then
-        return after[1..-1]
-      end
-      if after =~ /^\(/no then
-        after = trunc_comment_sub(after[1..-1])
-        if after == nil then
-          return nil
-        end
-        next
-      end
-      return nil
-    end
-  end
-  def split_address(v)
-    a = []
-    r = ""
-    while not v.empty? do
-      if v =~ /^(\s+|[0-9A-Za-z\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+|\"(\\.|[^\\\"])*\")/ then
-        r << $&
-        v = $'
-      elsif v[0] == ?, then
-        a << r.strip
-        r = ""
-        v.slice!(0,1)
-      else
-        r << v.slice!(0,1)
-      end
-    end
-    a << r.strip
-    return a
-  end
-  def get_mail_address(v)
-    v = trunc_comment(v)
-    a = split_address(v)
-    return a.map{|i| i.strip =~ /<([^<>]*)>$/ ? $1 : i.strip}
-  end
-  def get_date(s)
-    if s =~ /^[A-Z][A-Z][A-Z]\s*,\s*/i then
-      s = $'
-    end
-    d = ::DateTime._strptime(s, "%d %b %Y %X")
-    return unless d
-    Time.mktime(d[:year], d[:mon], d[:mday], d[:hour], d[:min], d[:sec]) rescue nil
-  end
-  def parse_content_type(str)
-    hash = {}
-    hash[:parameter] = {}
-    if str.strip =~ /^([a-z0-9_-]+)(?:\/([a-z0-9_-]+))?\s*/nio then
-      hash[:type] = $1.downcase
-      hash[:subtype] = $2.downcase if $2
-      params = $'	#'
-      pending = {}
-      while true do
-        if params =~ /\A\s*;\s*([a-z0-9_-]+)(?:\*(\d+))?\s*=\s*(?:\"((?:\\\"|[^\"])*)\"|([^\s\(\)\<\>\@\,\;\:\\\"\/\[\]\?\=]*))\s*/nio then
-          pn, ord, pv = $1, $2, $3||$4
-          params = $'
-          if ord then
-            pending[pn] = [] unless pending.key? pn
-            pending[pn] << [ord.to_i, pv]
-          else
-            hash[:parameter][pn.downcase] = pv
-          end
-        elsif params =~ /\A\s*;\s*([a-z0-9_-]+)\*\s*=\s*([a-z0-9_-]+)?\'(?:[a-z0-9_-]+)?\'(?:\"((?:\\\"|[^\"])*)\"|([^\s\(\)\<\>\@\,\;\:\\\"\/\[\]\?\=]*))\s*/nio then
-          pn, charset, pv = $1, $2, $3||$4
-          params = $'
-          pending[pn] = [[0, pv, charset, true]]
-        elsif params =~ /\A\s*;\s*([a-z0-9_-]+)\*0\*\s*=\s*([a-z0-9_-]+)?\'(?:[a-z0-9_-]+)?\'(?:\"((?:\\\"|[^\"])*)\"|([^\s\(\)\<\>\@\,\;\:\\\"\/\[\]\?\=]*))\s*/nio then
-          pn, charset, pv = $1, $2, $3||$4
-          params = $'
-          pending[pn] = [[0, pv, charset, true]]
-        elsif params =~ /\A\s*;\s*([a-z0-9_-]+)\*(\d+)\*\s*=\s*(?:\"((?:\\\"|[^\"])*)\"|([^\s\(\)\<\>\@\,\;\:\\\"\/\[\]\?\=]*))\s*/nio then
-          pn, ord, pv = $1, $2, $3||$4
-          params = $'
-          pending[pn] = [] unless pending.key? pn
-          pending[pn] << [ord.to_i, pv, nil, true]
-        else
-          break
-        end
-      end
-      pending.each do |pn, pv|
-        pv = pv.sort{|a,b| a[0]<=>b[0]}
-        charset = pv[0][2]
-        v = pv.map{|a|a[3] ? a[1].gsub(/%([0-9A-F][0-9A-F])/nio){$1.hex.chr} : a[1]}.join
-        fc = MailParser::Charsets[charset.downcase] if charset
-        tc = @@output_charset && MailParser::Charsets[@@output_charset.downcase]
-        if fc and fc != "N" and fc != tc then
-          v = MailParser.send(MailParser::ConvertMethods[fc+tc], v)
-        end
-        hash[:parameter][pn.downcase] = v
-      end
-    end
-    return hash
-  end
-  def parse_content_disposition(str)
-    return parse_content_type(str)
-  end
-  def parse_message(msg)
-    class << msg
-      def _each_with_multiple_delimiter(delim=[])
-        @found_boundary = false
-        loop do
-          @l = gets
-          if @l == nil then
-            return
-          end
-          ll = @l.chomp
-          if delim.include? ll then
-            @found_boundary = true
-            return
-          end
-          yield @l
-        end
-      end
-      def last_line()
-        @l && @l.chomp
-      end
-      attr_reader :found_boundary
-    end
-    m = parse_message2(msg)
-    class << m
-      def to_s()
-        return <<EOS
-From: #{self[:from].join(",")}
-To: #{self[:to].join(",")}
-Subject:#{self[:subject]}
-Date: #{self[:date]}
-#{self[:body]}
-#{if self[:parts] then self[:parts].map{|p| "[#{p[:type]}/#{p[:subtype]}]<#{p[:filename]}>"}.join("\n") end}
-EOS
-      end
-    end
-    return m
-  end
-  def parse_message2(msg, boundary=[])
-    ret = parse_header(msg, boundary)
-    return ret if msg.found_boundary
-    if ret[:type] == "message" and @@extract_message_type then
-      m = parse_message2(msg, boundary)
-      ret[:message] = m
-    elsif ret[:multipart] and ret[:boundary] then
-      parts = []
-      b = ret[:boundary]
-      bd = boundary + ["--"+b+"--", "--"+b]
-      msg._each_with_multiple_delimiter(bd) do end	# skip preamble
-      while msg.last_line == bd[-1] do
-        m = parse_message2(msg, bd)
-        parts << m
-      end
-      if msg.last_line == bd[-2] then
-        msg._each_with_multiple_delimiter(boundary) do end
-      end
-      ret[:parts] = parts
-    else
-      if not @@text_body_only or ret[:type] == "text" or ret[:type].nil? then
-        body = ""
-        msg._each_with_multiple_delimiter(boundary) do |l|
-          body << l
-        end
-        ret[:body] = decode_body(body, ret[:encoding], ret[:charset])
-      else
-        msg._each_with_multiple_delimiter(boundary) do end
-      end
-    end
-    return ret
-  end
-  def parse_header(msg, boundary=[])
-    ret = {}
-    raw = ""
-    header = []
-    msg._each_with_multiple_delimiter(boundary) do |l|
-      l.chomp!
-      break if l.empty?
-      raw << l+"\n"
-      if l =~ /^\s/no and not header.empty? then
-        header[-1] << l
-      elsif not l.include? ":"
-        next			# skip garbage
-      else
-        header << l
-      end
-    end
-    from = []
-    to = []
-    cc = []
-    date = nil
-    subject = ""
-    encoding = ct = charset = multipart = body = filename = bd = nil
-    h = {}
-    header.each do |str|
-      hn, hb = str.split(/:\s*/no, 2)
-      hn.downcase!
-      h[hn] = [] unless h.key? hn
-      h[hn] << mime_header_decode(hb)
-      case hn.downcase
-      when "from"
-        from.concat get_mail_address(hb)
-      when "to"
-        to.concat get_mail_address(hb)
-      when "cc"
-        cc.concat get_mail_address(hb)
-      when "date"
-        date = get_date(hb)
-      when "subject"
-        subject.concat hb
-      when "content-type"
-        ct = parse_content_type(hb)
-        if ct[:type] == "text" then
-          charset = ct[:parameter]["charset"]
-        elsif ct[:type] == "multipart" then
-          multipart = true
-          bd = ct[:parameter]["boundary"]
-        end
-        filename = mime_header_decode(ct[:parameter]["name"]) if ct[:parameter]["name"]
-      when "content-disposition"
-        cd = parse_content_disposition(hb)
-        filename = mime_header_decode(cd[:parameter]["filename"]) if cd[:parameter]["filename"]
-      when "content-transfer-encoding"
-        encoding = hb.strip.downcase
-      end
-    end
-    ret[:from] = from
-    ret[:to] = to
-    ret[:cc] = cc
-    ret[:date] = date
-    ret[:subject] = mime_header_decode subject
-    if ct then
-      ret[:type] = ct[:type].downcase if ct[:type]
-      ret[:subtype] = ct[:subtype].downcase if ct[:subtype]
-      ret[:charset] = charset.downcase if charset
-    end
-    ret[:encoding] = encoding if encoding
-    ret[:multipart] = multipart
-    ret[:boundary] = bd
-    ret[:filename] = filename if filename
-    ret[:header] = h
-    ret[:rawheader] = raw
-    return ret
-  end
-  def decode_body(body, encoding, charset)
-    case encoding
-    when "base64"
-      body = b64_decode body
-    when "quoted-printable"
-      body = qp_decode body
-    end
-    if charset == nil then return body end
-    fc = MailParser::Charsets[charset.downcase]
-    if fc == nil then return body end
-    tc = @@output_charset && MailParser::Charsets[@@output_charset.downcase]
-    if fc == "N" or tc.nil? or fc == tc then return body end
-    MailParser.send(MailParser::ConvertMethods[fc+tc], body)
-  end
-end
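The deleted `mdecode_token` and `mime_header_decode` decode RFC 2047 encoded-words (`=?charset?Q|B?text?=`) and then convert the result through a fixed NKF table that only knows a handful of Japanese charsets. As a rough modern equivalent, here is a sketch that assumes Ruby 2.4+ and uses the built-in `String#unpack1` and `String#encode` transcoding instead of NKF. The method name `mime_header_decode_sketch` is hypothetical and is not taken from mailparser 0.5.

```ruby
# Hypothetical sketch of RFC 2047 encoded-word decoding (Ruby 2.4+).
ENCODED_WORD = /=\?([A-Za-z0-9_-]+)\?([QqBb])\?([^?]*)\?=/

def mime_header_decode_sketch(str, output = Encoding::UTF_8)
  # Whitespace between two adjacent encoded-words is dropped, as in the original.
  str = str.gsub(/\?=\s+=\?/, "?==?")
  str.gsub(ENCODED_WORD) do
    m = Regexp.last_match
    charset, enc, text = m[1], m[2], m[3]
    bytes = if enc.casecmp?("q")
              text.tr("_", " ").unpack1("M") # "Q": quoted-printable, "_" = space
            else
              text.unpack1("m")              # "B": base64
            end
    begin
      bytes.force_encoding(charset).encode(output)
    rescue ArgumentError, EncodingError
      m[0] # unknown charset or bad bytes: keep the encoded-word unchanged
    end
  end
end
```

For example, `mime_header_decode_sketch("=?UTF-8?B?44GC?= =?UTF-8?B?44GE?=")` returns `"あい"`. Any charset Ruby knows is accepted, not only the ones in `Charsets`.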
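The deleted `trunc_comment` / `trunc_comment_sub` pair removes parenthesised comments from structured headers while leaving quoted-strings and backslash escapes intact, recursing on nested parentheses. The same idea fits a single pass with a depth counter. The sketch below is hypothetical (`strip_comments_sketch` is not a mailparser API), and unlike the original it drops everything after an unclosed "(" instead of keeping it.

```ruby
# Hypothetical sketch: drop RFC 2822 comments in one pass with a depth counter.
def strip_comments_sketch(value)
  out = +""
  depth = 0          # current comment nesting level
  in_quote = false   # inside a quoted-string (only tracked outside comments)
  chars = value.chars
  i = 0
  while i < chars.size
    c = chars[i]
    if c == "\\" && i + 1 < chars.size
      out << c << chars[i + 1] if depth.zero? # quoted-pair: keep unless in a comment
      i += 2
      next
    end
    if in_quote
      out << c
      in_quote = false if c == '"'
    elsif c == '"' && depth.zero?
      in_quote = true
      out << c
    elsif c == "("
      depth += 1
    elsif c == ")" && depth.positive?
      depth -= 1
    elsif depth.zero?
      out << c
    end
    i += 1
  end
  out
end
```

A counter avoids the recursion and the repeated regex matching on the remaining string, which is where the original spent its time on lines with many parentheses.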
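`split_address` and `get_mail_address` split an address list on commas that are not inside quoted-strings, then keep only the text inside a trailing `<...>` when one is present. A compact sketch of that logic follows, assuming comments were already removed (the original called `trunc_comment` first). `mail_addresses` is a hypothetical name, and it drops empty entries, which the original kept.

```ruby
# Hypothetical sketch of split_address + get_mail_address (comments already removed).
def mail_addresses(value)
  entries = [+""]
  # Tokens: a quoted-string with escapes, a comma, a run of other text, or a stray quote.
  value.scan(/"(?:\\.|[^\\"])*"|,|[^",]+|"/) do |token|
    if token == ","
      entries << +""                  # comma outside a quoted-string starts a new entry
    else
      entries.last << token
    end
  end
  entries.map(&:strip).reject(&:empty?).map do |entry|
    entry[/<([^<>]*)>\z/, 1] || entry # prefer the angle-addr when present
  end
end
```

With this, `mail_addresses('"Doe, John" <john@example.com>, jane@example.com')` yields `["john@example.com", "jane@example.com"]`, because the comma inside the quoted display name is not treated as a separator.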
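`get_date` strips an optional weekday, parses the rest with `DateTime._strptime(s, "%d %b %Y %X")`, and returns `nil` on failure; only the date and time fields are passed to `Time.mktime`, so the zone offset is not applied. Ruby's standard `time` library provides `Time.rfc2822`, which also honours the offset. The wrapper name below is hypothetical.

```ruby
require "time"

# Hypothetical wrapper: parse an RFC 2822 Date header, returning nil on bad input
# (as the deleted get_date did) instead of raising.
def parse_date_header(value)
  Time.rfc2822(value.strip)
rescue ArgumentError
  nil
end
```

For instance, `parse_date_header("Fri, 11 Jun 2010 12:34:56 +0900")` denotes 03:34:56 UTC, whereas the deleted code would have built 12:34:56 in the local zone.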
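`parse_content_type` also reassembles RFC 2231 parameter continuations (`name*0*=charset'lang'...; name*1*=...`), percent-decoding the starred pieces and converting the charset declared on the first piece. The sketch below shows only that reassembly step. It assumes the parameters were already split into `[key, value]` pairs (a hypothetical input shape, since the deleted code tokenised them with regexes), and it uses `String#encode` instead of NKF; `rfc2231_join` is a hypothetical name.

```ruby
# Hypothetical sketch: join RFC 2231 continuations from [key, value] pairs.
def rfc2231_join(pairs, output = Encoding::UTF_8)
  result = {}
  pieces = Hash.new { |h, k| h[k] = [] }
  pairs.each do |key, value|
    # Accepted key shapes: "name", "name*", "name*0", "name*0*", ...
    m = key.match(/\A([^*]+)(?:\*(\d+))?(\*)?\z/) or next
    name = m[1].downcase
    if m[2].nil? && m[3].nil?
      result[name] = value                            # plain parameter
    else
      pieces[name] << [m[2].to_i, value, !m[3].nil?]  # "name*" counts as piece 0
    end
  end
  pieces.each do |name, parts|
    charset = nil
    raw = parts.sort_by(&:first).each_with_index.map do |(_, value, encoded), i|
      next value.b unless encoded
      if i.zero? && (cl = value.match(/\A([^']*)'[^']*'(.*)\z/m))
        charset = cl[1] unless cl[1].empty?           # charset'language'value
        value = cl[2]
      end
      value.b.gsub(/%([0-9A-Fa-f]{2})/) { Regexp.last_match(1).hex.chr }
    end.join
    result[name] =
      begin
        charset ? raw.force_encoding(charset).encode(output) : raw.force_encoding(output)
      rescue ArgumentError, EncodingError
        raw                                           # leave undecodable bytes as-is
      end
  end
  result
end
```

Pieces are sorted by their index before joining, so out-of-order continuations (which the original also sorted) still produce the right value.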
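`parse_header` reads lines up to the first blank line, appends lines that start with whitespace to the previous field (header unfolding), skips lines without a colon as garbage, and groups the values by lower-cased field name. A hypothetical standalone version of that pass, working on a string rather than the `gets`-based reader:

```ruby
# Hypothetical sketch of the header-block pass in the deleted parse_header.
def parse_header_block(text)
  fields = []
  text.each_line do |line|
    line = line.chomp
    break if line.empty?                  # the blank line ends the header
    if line.match?(/\A[ \t]/) && !fields.empty?
      fields.last << line                 # unfold: continuation of previous field
    elsif line.include?(":")
      fields << line
    end                                   # anything else is skipped as garbage
  end
  fields.each_with_object({}) do |field, headers|
    name, value = field.split(/:\s*/, 2)
    (headers[name.downcase] ||= []) << value
  end
end
```

Values are kept in arrays because fields such as Received may legitimately repeat, matching the `h[hn] << ...` accumulation in the original.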
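`parse_message2` handles multipart bodies by reading lines until it meets `--boundary` or `--boundary--`, after first skipping the preamble. The same rule can be applied to an in-memory string, also treating the line break right before each delimiter as part of the delimiter (RFC 2046), which the 0.4.7 changelog entry mentions. This is a hypothetical helper that ignores nested multiparts and any transport padding after the boundary.

```ruby
# Hypothetical sketch: split a multipart body on its boundary (RFC 2046).
def split_multipart(body, boundary)
  delim = "--#{boundary}"
  parts = []
  current = nil                 # nil while still reading the preamble
  body.each_line do |line|
    case line.chomp
    when "#{delim}--"           # close-delimiter: the rest is epilogue
      break
    when delim                  # delimiter: start a new part
      parts << current if current
      current = +""
    else
      current << line if current
    end
  end
  parts << current if current
  # The line break before each delimiter belongs to the delimiter, not the part.
  parts.map { |part| part.sub(/\r?\n\z/, "") }
end
```

Each returned part still contains its own header block, so it could be fed to a header parser and, for nested multiparts, back into this function with the inner boundary.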
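`decode_body` undoes the Content-Transfer-Encoding (base64 or quoted-printable) and then converts Japanese charsets through `ConvertMethods`. On Ruby 2.4+ the standard library already covers both steps: `unpack1("m")` and `unpack1("M")` decode base64 and quoted-printable, and `String#encode` transcodes. Below is a hypothetical `decode_body_sketch` along those lines; it falls back to the raw decoded bytes when the charset is unknown, much as the original returned the body unconverted.

```ruby
# Hypothetical sketch of decode_body using only core String methods (Ruby 2.4+).
def decode_body_sketch(body, encoding, charset, output = Encoding::UTF_8)
  bytes = case encoding&.strip&.downcase
          when "base64"           then body.unpack1("m")
          when "quoted-printable" then body.unpack1("M")
          else body.b             # 7bit/8bit/binary or missing: bytes as-is
          end
  return bytes if charset.nil?
  begin
    bytes.dup.force_encoding(charset).encode(output)
  rescue ArgumentError, EncodingError
    bytes                         # unknown charset or undecodable bytes
  end
end
```

Note that `unpack1("M")` also removes soft line breaks written as `=\r\n`, one of the cases the hand-written `qp_decode` regexes had to handle explicitly.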