RubyGems - tasci_merger - Versions diffs - 0.0.0 - Mend

tasci_merger 0.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 97e75686a322ea2d9d5550c697424ae7dd74ddbf
+  data.tar.gz: f36602af907f096088e45bc49acc92af4a160bd2
+SHA512:
+  metadata.gz: 23b86daf7c2711403d6a9966c2a3a93514cbda5fa5723f7df96e715973e4dc790ed26e9f091897c475cacfd5ad269200e61edefe4b3bc7bb7bf4c3de0a5ef709
+  data.tar.gz: c737f3f05aaccc74d094908f0d82f67395a5b071b86983d1b2ed01d32b43298e236b4fa800181f3236066bf75ef07dab814c0fe4bcbc4a823c44f90c6cb48a77
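
The digests above can be checked locally once the `.gem` archive has been unpacked into `metadata.gz` and `data.tar.gz`. The following is a minimal sketch, assuming those files are available on disk; `digest_matches?` is a hypothetical helper name and is not part of the gem.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA512 hex digest with the value
# listed in checksums.yaml. Returns true when they are identical.
def digest_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `digest_matches?('data.tar.gz', 'c737f3f0...')` would be called with the full SHA512 string from the hunk above.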

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+The MIT License (MIT)
+Copyright (c) 2015 Piotr Mankowski
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,63 @@
+# Instructions for Tasci File Merger
+## Installation
+1. Install Ruby.
+  - [http://rubyinstaller.org/](http://rubyinstaller.org/)
+  - Download Ruby 2.1.5 (x64) installer.
+  - Install to desired *RUBY_DIRECTORY*
+  - Choose option to add ruby to *PATH*
+  - Choose option to associate *.rb files with Ruby
+2. Verify ruby installation.
+  - Go into command line.
+  - Enter `ruby -v`
+  - Output should look like `ruby 2.1.5p273...`
+  - Enter `gem -v`
+  - Output should look like `2.2.2`
+3. Fix potential RubyGems certificate bug, as documented on [this page](https://gist.github.com/luislavena/f064211759ee0f806c88)
+  - Try running `gem install activesupport --no-ri --no-rdoc`
+  - Click `Allow access` if prompted by Windows Firewall message, choosing option for *Private networks*
+  - If installation fails with an `SSL_error`, follow these steps:
+    - Save certificate file from [this website](https://raw.githubusercontent.com/rubygems/rubygems/master/lib/rubygems/ssl_certs/AddTrustExternalCARoot-2048.pem) to the Downloads directory.
+    - **Make sure file is saved with the *.pem extension**
+    - Find rubygems folder location by typing `gem which rubygems` into the console. You should get output like `C:/Ruby21/lib/ruby/2.1.0/rubygems.rb`
+    - Locate the directory and open it in an explorer window. For the above path, the directory would be `C:\Ruby21\lib\ruby\2.1.0\rubygems`
+    - Open the `ssl_certs` directory, and copy the previously-downloaded `*.pem` file into this directory.
+    - Close and re-open a console window.
+4. Install required gems.
+  - Run:
+    ```
+      gem install activesupport --no-ri --no-rdoc
+      gem install 'tzinfo-data' --no-ri --no-rdoc
+    ```
+5. Download TASCI merger package zipfile from [GitHub](https://github.com/pmanko/tasci_merger) using the *Download ZIP* button.
+6. Unpack to *package_directory*.
+7. Run **IRB** in *package_directory*.
+  - Open console
+  - Run `cd unpacked_package_directory` to navigate to package directory
+  - Run `irb` to open interactive ruby console
+8. Load package.
+  ```ruby
+    load('./tasci_merger.rb')
+  ```
+9. Generate master file list.
+  ```ruby
+    tasci_merger = ETL::TasciMerger.new
+    tasci_merger.create_master_list("TASCI_FILE_DIRECTORY", "OUTPUT_DIRECTORY")
+  ```
+10. Merge TASCI files.
+  ```ruby
+    tasci_merger.merge_files(['SUBJECT_CODE'], "MASTER_FILE_PATH", "OUTPUT_DIRECTORY", "TASCI_FILE_DIRECTORY")
+  ```

data/bin/merge_tasci ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env ruby
+require 'tasci_merger'
+# Usage: merge_tasci SUBJECT_CODE TASCI_DIR OUTPUT_DIR
+tm = TasciMerger.new(ARGV[0], ARGV[1], ARGV[2])
+tm.create_master_list
+tm.merge_files

data/lib/labtime.rb ADDED

@@ -0,0 +1,111 @@
+require('active_support/values/time_zone')
+require('active_support/time_with_zone')
+require 'active_support/core_ext/time/zones'
+require 'active_support/core_ext/time'
+require 'active_support/core_ext/numeric/time'
+class Labtime
+  include Comparable
+  attr_accessor :year, :hour, :min, :sec, :time_zone
+  DEFAULT_TIME_ZONE = ActiveSupport::TimeZone.new("Eastern Time (US & Canada)")
+  def self.parse(realtime)
+    # Return nil if nil parameter
+    return nil if realtime.nil?
+    # Make sure realtime is an ActiveSupport::TimeWithZone object
+    raise ArgumentError, "realtime class #{realtime.class} is not ActiveSupport::TimeWithZone" unless realtime.is_a?(ActiveSupport::TimeWithZone)
+    # year is easy
+    year = realtime.year
+    # Reference for labtime is start of year
+    Time.zone = realtime.time_zone
+    reference_time = Time.zone.local(year)
+    # find difference between reference and realtime
+    second_difference = realtime.to_i - reference_time.to_i
+    # convert second difference to labtime
+    hour = second_difference / 3600
+    min = (second_difference - (hour * 3600)) / 60
+    sec = (second_difference - (hour * 3600) - (min * 60))
+    self.new(year, hour, min, sec, realtime.time_zone)
+  end
+  def self.from_decimal(decimal_labtime, year, time_zone = DEFAULT_TIME_ZONE)
+    raise ArgumentError, "No year supplied!" if year.blank?
+    hour = decimal_labtime.to_i
+    remainder = decimal_labtime - hour.to_f
+    min_labtime = 60.0 * remainder
+    min = min_labtime.to_i
+    remainder = min_labtime - min.to_f
+    sec = (60 * remainder).round.to_i
+    self.new(year, hour, min, sec, time_zone)
+  end
+  def self.from_seconds(sec_time, year, time_zone = DEFAULT_TIME_ZONE)
+    hour = (sec_time / 3600.0).to_i
+    sec_time = sec_time - (hour * 3600)
+    min = (sec_time / 60.0).to_i
+    sec_time = sec_time - (min * 60)
+    sec = sec_time
+    self.new(year, hour, min, sec, time_zone)
+  end
+  def self.from_s(str, time_params = {}, time_zone = DEFAULT_TIME_ZONE)
+    time_captures = /(\d+)\:(\d{1,2})(\:(\d{1,2}))?(\s(\d\d\d\d))?\z/.match(str).captures
+    time_params[:hour] ||= time_captures[0]
+    time_params[:min] ||= time_captures[1]
+    time_params[:sec] ||= time_captures[3]
+    time_params[:year] ||= time_captures[5]
+    self.new(time_params[:year], time_params[:hour], time_params[:min], time_params[:sec], time_zone)
+  end
+  def initialize(year, hour, min, sec, time_zone = nil)
+    @year = year.to_i
+    @hour = hour.to_i
+    @min = min.to_i
+    @sec = sec.to_i
+    @time_zone = time_zone || DEFAULT_TIME_ZONE
+  end
+  def to_time
+    reference_time = time_zone.local(year)
+    reference_time + time_in_seconds
+  end
+  def <=>(other)
+    to_time <=> other.to_time
+  end
+  def to_s
+    "#{year} #{hour}:#{min}:#{sec} #{time_zone.to_s}"
+  end
+  def to_short_s
+    "#{hour}:#{min}:#{sec}"
+  end
+  def time_in_seconds
+    hour * 3600 + min * 60 + sec
+  end
+  def add_seconds(sec)
+    self.class.from_seconds(self.time_in_seconds + sec, self.year, self.time_zone)
+  end
+  def to_decimal
+    hour.to_f + (min.to_f/60.0) + (sec.to_f/3600.0)
+  end
+  private
+end
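
The arithmetic behind `Labtime.parse`, `from_seconds`, and `to_decimal` can be shown without ActiveSupport. This is a simplified sketch rather than the gem's code: `labtime_parts` and `labtime_decimal` are hypothetical names, and the input is assumed to be the number of seconds elapsed since the start of the year in the subject's time zone.

```ruby
# Split elapsed seconds since the start of the year into labtime
# hour/min/sec. The hour is not wrapped at 24; it keeps counting
# through the whole year.
def labtime_parts(seconds_since_year_start)
  hour, rest = seconds_since_year_start.divmod(3600)
  min, sec = rest.divmod(60)
  [hour, min, sec]
end

# Decimal labtime in hours, as produced by Labtime#to_decimal.
def labtime_decimal(hour, min, sec)
  hour + min / 60.0 + sec / 3600.0
end

# January 2nd at 01:30:00 is 25.5 hours into the year.
seconds = 24 * 3600 + 90 * 60
p labtime_parts(seconds)                    # => [25, 30, 0]
p labtime_decimal(*labtime_parts(seconds))  # => 25.5
```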

data/lib/man_merger.rb ADDED

@@ -0,0 +1,344 @@
+require 'csv'
+## CHANGELOG
+# 2173 Master file: change sp16,17,18 to *_rev
+module ETL
+  class ManMerger
+    LIST_DIR  = "/usr/local/htdocs/access/lib/data/etl/klerman_merge_man_files/file_lists/"
+    T_DRIVE_DIRS = ["/home/pwm4/Windows/tdrive/IPM/Modafinil_FD_42.85h/", "/home/pwm4/Windows/tdrive/IPM/NSBRI_65d_Entrainment/"]
+    #T_DRIVE_DIR = "/home/pwm4/Windows/tdrive/IPM/Modafinil_FD_42.85h/"
+    EPOCH_LENGTH = 30
+    def merge_files
+      subject_list = load_subject_list
+      subject_list.each do |subject_code, file_list|
+        merged_file = CSV.open("/usr/local/htdocs/access/lib/data/etl/klerman_merge_man_files/merged_files/#{subject_code}_merged.csv", "wb")
+        merged_file << %w(SUBJECT_CODE LABTIME SLEEP_STAGE SLEEP_PERIOD SEM_FLAG)
+        MY_LOG.info "---- #{subject_code}"
+        previous_first_labtime = nil
+        previous_last_labtime = nil
+        subject_year = get_subject_year(file_list)
+        file_list.each do |file_hash|
+          matched_files = Dir.glob("#{T_DRIVE_DIRS[0]}#{subject_code}/PSG/SCORED/**/#{file_hash[:pattern]}.man", File::FNM_CASEFOLD)
+          matched_files = Dir.glob("#{T_DRIVE_DIRS[1]}#{subject_code}/Sleep/#{file_hash[:pattern]}.man", File::FNM_CASEFOLD) if matched_files.length != 1
+          ## Validate File List
+          if matched_files.length != 1
+            raise StandardError, "None or more than one matched file. #{file_hash[:pattern]} #{matched_files} #{matched_files.length} #{subject_code}"
+          else
+            man_file_path = matched_files[0]
+          end
+          man_file = File.open(man_file_path)
+          LOADER_LOGGER.info "--- Loading #{man_file_path}"
+          file_info = {}
+          ## Ignore Corrupted Files
+          #next if tasci_file_path == "/home/pwm4/Windows/tdrive/IPM/AFOSR9_Slp_Restrict//24B7GXT3/PSG/TASCI_SEM/24b7gxt3_082907_wp19ap1_PID_24B7GXT3_082907_WP19AP1_RID_0_SEM.TASCI"
+          # Date from file name
+          matched_date = /_(\d\d)(\d\d)(\d\d)_/.match(man_file_path)
+          file_info[:fn_date] = (matched_date ? Time.zone.local((matched_date[3].to_i > 30 ? matched_date[3].to_i + 1900 : matched_date[3].to_i + 2000), matched_date[1].to_i, matched_date[2].to_i) : nil)
+          # read file
+          lines = man_file.readlines("\r")
+          # delete possible empty last line
+          lines.pop if lines.last.blank?
+          # get file first and last times
+          matched_time = /(\d\d):(\d\d):(\d\d):(\d\d\d)/.match(lines.first)
+          file_info[:first_time] = {hour: matched_time[1].to_i, min: matched_time[2].to_i, sec: matched_time[3].to_i}
+          matched_time = /(\d\d):(\d\d):(\d\d):(\d\d\d)/.match(lines.last)
+          file_info[:last_time] = {hour: matched_time[1].to_i, min: matched_time[2].to_i, sec: matched_time[3].to_i}
+          # validate first/last times
+          if file_hash[:start_time] != file_info[:first_time]
+            MY_LOG.error "---- FIRST TIME MISMATCH ---\n#{man_file_path}\n#{file_hash[:start_time]} #{file_info[:first_time]}\n\n"
+          end
+          if file_hash[:last_line_time] != file_info[:last_time]
+            MY_LOG.error "---- LAST TIME MISMATCH ----\n#{man_file_path}\n#{file_hash[:last_line_time]} #{file_info[:last_time]}\n\n"
+          end
+          if file_hash[:last_line_number] != lines.length
+            MY_LOG.error "---- LINE COUNT MISMATCH ----\n#{man_file_path}\n#{file_hash[:last_line_number]} #{lines.length}\n\n"
+          end
+          ##
+          # VALIDATION
+          file_hash[:start_labtime] = Labtime.from_decimal(file_hash[:start_labtime], subject_year)
+          file_hash[:last_line_labtime] = Labtime.from_decimal(file_hash[:last_line_labtime], subject_year)
+          start_realtime = file_hash[:start_labtime].to_time
+          last_line_realtime = file_hash[:last_line_labtime].to_time
+          first_realtime = file_hash[:start_labtime].time_zone.local(start_realtime.year, start_realtime.month, start_realtime.day, file_info[:first_time][:hour], file_info[:first_time][:min], file_info[:first_time][:sec])
+          last_realtime = file_hash[:last_line_labtime].time_zone.local(last_line_realtime.year, last_line_realtime.month, last_line_realtime.day, file_info[:last_time][:hour], file_info[:last_time][:min], file_info[:last_time][:sec])
+          file_info[:first_labtime] = Labtime.parse(first_realtime)
+          file_info[:last_labtime] = Labtime.parse(last_realtime)
+          predicted_last_labtime = Labtime.parse(file_info[:first_labtime].to_time + ((lines.length - 1) * 30).seconds)
+          sep = false
+          if (file_hash[:start_labtime].time_in_seconds - file_info[:first_labtime].time_in_seconds).abs > 2
+            MY_LOG.error "---- FIRST LABTIME MISMATCH ----\n#{man_file_path}\n#{file_hash[:start_labtime].time_in_seconds - file_info[:first_labtime].time_in_seconds} | #{file_hash[:start_labtime].to_time}\n#{file_hash[:start_labtime]} | #{file_info[:first_labtime]}\n"
+            sep = true
+          end
+          # These checks fail if DST TRANSITION HAPPENS
+          if last_line_realtime.dst? == start_realtime.dst?
+            if (file_hash[:last_line_labtime].time_in_seconds - file_info[:last_labtime].time_in_seconds).abs > 2
+              MY_LOG.error "---- LAST LABTIME MISMATCH  ----\n#{man_file_path}\n#{file_hash[:last_line_labtime].time_in_seconds - file_info[:last_labtime].time_in_seconds} | #{file_hash[:last_line_labtime].to_time}\n#{file_hash[:last_line_labtime]} | #{file_info[:last_labtime]}\n"
+              sep = true
+            end
+            if (file_info[:last_labtime].time_in_seconds - predicted_last_labtime.time_in_seconds).abs > 0
+              MY_LOG.error "---- PRED LABTIME MISMATCH  ----\n#{man_file_path}\n#{(file_info[:last_labtime].time_in_seconds - predicted_last_labtime.time_in_seconds)} | #{predicted_last_labtime.to_time}\nl: #{file_info[:last_labtime]} | #{predicted_last_labtime}\n"
+              sep = true
+            end
+          end
+          if (file_hash[:last_line_labtime].time_in_seconds - predicted_last_labtime.time_in_seconds).abs > 2
+            MY_LOG.error "---- !PRED LABTIME MISMATCH ----\n#{man_file_path}\n#{(file_hash[:last_line_labtime].time_in_seconds - predicted_last_labtime.time_in_seconds)} | #{predicted_last_labtime.to_time}\nl: #{file_hash[:last_line_labtime]} | #{predicted_last_labtime}\n"
+            sep = true
+          end
+          unless previous_first_labtime.nil? or previous_last_labtime.nil?
+            MY_LOG.error "Start time is before previous end labtime for #{man_file_path}" if file_info[:first_labtime] < previous_last_labtime
+          end
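+          raise StandardError, "First labtime seconds do not match first realtime seconds: #{man_file_path}" if file_info[:first_labtime].sec != first_realtime.sec
+          raise StandardError, "Last labtime seconds do not match last realtime seconds: #{man_file_path}" if file_info[:last_labtime].sec != last_realtime.sec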
+          MY_LOG.info "-----------------------------------\n\n" if sep
+          last_labtime = nil
+          ibob_flag = 0
+          lines.each_with_index do |line, line_number|
+            #merged_file << %w(SUBJECT_CODE LABTIME SLEEP_STAGE SLEEP_PERIOD SEM_FLAG)
+=begin
+sleep man file:
+  0      undef/unscored
+  1      stage 1
+  2      stage 2
+  3      stage 3
+  4      stage 4
+  5      wake
+  6      REM
+  7      MVT
+  8      LOff and LOn
+wake man file:
+  0      undef/unscored
+  1      stage 1
+  2      stage 2
+  3      stage 3
+  4      stage 4
+  5      wake
+  6      REM
+  7      MVT
+  8      SEM
+=end
+            line_labtime = file_info[:first_labtime].add_seconds(EPOCH_LENGTH * line_number)
+            line_code = /(\d)\s\d\d:\d\d:\d\d:\d\d\d/.match(line)[1].to_i
+            # Sleep Period Coding:
+            # 1      Sleep Onset (Lights Off) (IN BED)
+            # 2      Sleep Offset (Lights On) (OUT OF BED)
+            if file_hash[:type] == :sleep and line_code == 8
+              if ibob_flag == 0
+                sleep_period = 1
+                ibob_flag = 1
+              else
+                sleep_period = 2
+                ibob_flag = 0
+              end
+            else
+              sleep_period = nil
+            end
+            # Sleep Stage Coding:
+            # 1      stage 1
+            # 2      stage 2
+            # 3      stage 3
+            # 4      stage 4
+            # 6      MT
+            # 7      Undef
+            # 5      REM
+            # 9      Wake
+            if line_code >= 1 and line_code <= 4
+              line_event = line_code
+            elsif line_code == 0
+              line_event = 7
+            elsif line_code == 5 or line_code == 8
+              line_event = 9
+            elsif line_code == 6
+              line_event = 5
+            elsif line_code == 7
+              line_event = 6
+            else
+              raise StandardError, "Cannot map the following event: #{line_code}"
+            end
+            # SEM Event Coding:
+            # 1      Slow Eye Movement
+            # 0      No Slow Eye Movement
+            if file_hash[:type] == :wake and line_code == 8
+              sem_event = 1
+            else
+              sem_event = 0
+            end
+            last_labtime = line_labtime
+            output_line = [subject_code.upcase, line_labtime.to_decimal, line_event, sleep_period, sem_event]
+            merged_file << output_line
+          end
+          previous_first_labtime = file_info[:first_labtime]
+          previous_last_labtime = last_labtime
+        end
+        merged_file.close
+        MY_LOG.info "---- end #{subject_code}\n\n"
+      end
+    end
+    def load_subject_list
+      subject_info = {}
+      Dir.foreach(LIST_DIR) do |file|
+        next if file == '.' or file == '..'
+        #MY_LOG.info "#{file}"
+        csv_file = CSV.open("#{LIST_DIR}#{file}", {headers: true})
+        # Match and Validate File Name
+        matched_sc = /(.*)SLEEP\.csv/i.match(File.basename(csv_file.path))
+        if matched_sc
+          subject_code = matched_sc[1].upcase
+        else
+          next
+        end
+        subject_info[subject_code] = []
+        csv_file.each do |row|
+          file_info = {}
+          pattern = /(.*)\.man/i.match(row[0])
+          matched_time = /(\d\d):(\d\d):(\d\d):(\d\d\d)/.match(row[1])
+          if matched_time
+            file_info[:start_time] = {hour: matched_time[1].to_i, min: matched_time[2].to_i, sec: matched_time[3].to_i}
+          else
+            MY_LOG.error "No Valid Start Time Found: #{row}"
+            next
+          end
+          matched_time = /(\d\d):(\d\d):(\d\d):(\d\d\d)/.match(row[4])
+          if matched_time
+            file_info[:last_line_time] = {hour: matched_time[1].to_i, min: matched_time[2].to_i, sec: matched_time[3].to_i}
+          else
+            MY_LOG.error "No Valid End Time Found: #{row}"
+            next
+          end
+          file_info[:start_labtime] = row[2].to_f
+          file_info[:last_line_number] = row[3].to_i
+          file_info[:last_line_labtime] = row[5].to_f
+          if pattern
+            file_info[:pattern] = pattern[1]
+            subject_info[subject_code] << file_info
+            # Determine if sleep or wake file
+            raise StandardError, "CAN'T DETERMINE SP/WP (none match): #{pattern[1]}" unless (/_sp?\d/i.match(pattern[1]) or /_wp?\d/i.match(pattern[1]))
+            raise StandardError, "CAN'T DETERMINE SP/WP (both match): #{pattern[1]}" if (/_sp?\d/i.match(pattern[1]) and /_wp?\d/i.match(pattern[1]))
+            if /_sp?\d/i.match(pattern[1])
+              file_info[:type] = :sleep
+            elsif /_wp?\d/i.match(pattern[1])
+              file_info[:type] = :wake
+            else
+              raise StandardError, "Didn't match any SP/WP..."
+            end
+          else
+            MY_LOG.info "No Valid File Name Found: #{row}"
+            next
+          end
+        end
+        #MY_LOG.info subject_info[subject_code]
+      end
+      #MY_LOG.info subject_info.inspect
+      subject_info
+    end
+    def get_subject_year(file_list)
+      years = file_list.map do |h|
+        matched_date = /_(\d\d)(\d\d)(\d\d)_/.match(h[:pattern])
+        matched_date ? matched_date[3] : nil
+      end
+      years.delete_if {|x| x.nil? }
+      years = years.uniq
+      raise StandardError, "More than one unique year found in files: #{years}" if years.length > 1
+      year = years.first.to_i
+      year > 30 ? year + 1900 : year + 2000
+    end
+  end
+end
+=begin
+path: /home/pwm4/Windows/tdrive/IPM/Modafinil_FD_42.85h/
+path: /usr/local/htdocs/access/lib/data/etl/klerman_merge_man_files/file_list
+file list:
+subject_code ,start time, labtime, last line,last line time,labtime,,,,check,gap
+sleep man file:
+  0      undef/unscored
+  1      stage 1
+  2      stage 2
+  3      stage 3
+  4      stage 4
+  5      wake
+  6      REM
+  7      MVT
+  8      LOff and LOn
+wake man file:
+  0      undef/unscored
+  1      stage 1
+  2      stage 2
+  3      stage 3
+  4      stage 4
+  5      wake
+  6      REM
+  7      MVT
+  8      SEM
+sleep stage 8 should be coded as Wake with a SEM
+5 is Wake
+1-4 is Sleep stage 1-4
+7 is REM
+8 is Wake with SEM plus LOff and Lon
+mapping:
+  1      stage 1
+  2      stage 2
+  3      stage 3
+  4      stage 4
+  6      MT
+  7      Undef
+  5      REM
+  9      Wake
+=end
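
The stage recoding performed by the `if`/`elsif` chain in `ManMerger#merge_files` can be written more compactly. The sketch below transcribes that chain into a `case` expression; `map_stage` is a hypothetical name and does not appear in the gem.

```ruby
# Map a .man file stage code (0-8) to the merged-file sleep-stage code.
# Transcribed from the if/elsif chain in ManMerger#merge_files.
def map_stage(line_code)
  case line_code
  when 1..4 then line_code # stages 1-4 are kept as-is
  when 0    then 7         # undef/unscored -> Undef
  when 5, 8 then 9         # wake, and LOff/LOn or SEM markers -> Wake
  when 6    then 5         # REM
  when 7    then 6         # MVT -> MT
  else
    raise StandardError, "Cannot map the following event: #{line_code}"
  end
end
```

Code 8 is recoded as Wake in both file types; whether it also sets the sleep-period or SEM column depends on the file type, which the original method handles separately.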

data/lib/tasci_merger.rb ADDED

@@ -0,0 +1,276 @@
+require 'csv'
+require 'man_merger'
+require 'labtime'
+Time.zone = 'Eastern Time (US & Canada)'
+class TasciMerger
+  def initialize(subject_code, tasci_directory, output_directory)
+    @subject_code = subject_code.chomp
+    @tasci_directory = tasci_directory.chomp
+    @output_directory = output_directory.chomp
+    @master_file_path = nil
+  end
+  def create_master_list
+    master_file_name = File.join(@output_directory, "tasci_master_#{Time.zone.now.strftime("%Y%m%d")}.csv")
+    master_file = CSV.open(master_file_name, "wb")
+    master_file << %w(file_name file_labtime file_full_time total_epochs start_labtime end_labtime)
+    master_file_contents = []
+    puts @tasci_directory
+    puts File.exist?(@tasci_directory)
+    Dir.foreach(@tasci_directory) do |file|
+      next if file == '.' or file == '..'
+      puts file
+      tasci_file = File.open(File.join(@tasci_directory, file))
+      file_info = {}
+      ## HEADER INFO
+      # Header Line
+      tasci_file.readline
+      # File Name
+      read_line = tasci_file.readline
+      matched_name =  /\W*File name \|\W*(.*\.vpd)/i.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_name
+      file_info[:source_file_name] = matched_name[1]
+      # Record Date
+      read_line = tasci_file.readline
+      matched_date = /RecordDate\W*\|\W*(..)\/(..)\/(....)\W*\|.*/.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_date
+      #MY_LOG.info "matched_date: #{matched_date[3]} #{matched_date[1]} #{matched_date[2]}"
+      file_info[:record_date] = (matched_date ? Time.zone.local(matched_date[3].to_i, matched_date[2].to_i, matched_date[1].to_i) : nil)
+      # Record Time
+      read_line = tasci_file.readline
+      matched_time = /RecordTime\W*\|\W*(..):(..):(..)\W*\|\W*Patient ID\W*\|\W*.*\W*\|/.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_time
+      file_info[:record_full_time] = ((matched_time and matched_date) ? Time.zone.local(matched_date[3].to_i, matched_date[2].to_i, matched_date[1].to_i, matched_time[1].to_i, matched_time[2].to_i, matched_time[3].to_i) : nil)
+      file_info[:record_labtime] = Labtime.parse(file_info[:record_full_time])
+      6.times do
+        tasci_file.readline
+      end
+      # Epochs and duration
+      read_line = tasci_file.readline
+      matched_line = /\W*# Epochs\W*\|\W*(\d+)\W*\|\W*Duration\(S\)\W*\|\W*(\d+)\|/.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_line
+      file_info[:epochs] = matched_line[1].to_i - 1
+      file_info[:epoch_duration] = matched_line[2].to_i
+      5.times do
+        tasci_file.readline
+      end
+      first_labtime = nil
+      last_labtime = nil
+      until tasci_file.eof?
+        line = tasci_file.readline
+        matched_line = /(\d+)\|\W*(\d+)\|\W*(\d+)\|\W*(\d+)\|\W*(\d\d):(\d\d):(\d\d)\|\W*(.+)\|\W*(.+)\|/.match(line)
+        fields = matched_line.to_a
+        fields.delete_at(0)
+        raise StandardError, "fields should have 9 fields: #{fields.length} #{fields} #{line}" unless fields.length == 9
+        # Calculating labtime is tricky - file may span two days
+        calculated_line_time = file_info[:record_full_time] + fields[1].to_i.hours + fields[2].to_i.minutes + fields[3].to_i.seconds
+        if calculated_line_time.hour == fields[4].to_i and calculated_line_time.min == fields[5].to_i and calculated_line_time.sec == fields[6].to_i
+          line_time = calculated_line_time
+          line_labtime = Labtime.parse(line_time)
+        elsif file_info[:record_full_time].dst? != calculated_line_time.dst?
+          if (calculated_line_time.hour - fields[4].to_i).abs == 1 and calculated_line_time.min == fields[5].to_i and calculated_line_time.sec == fields[6].to_i
+            line_time = calculated_line_time
+            line_labtime = Labtime.parse(line_time)
+          else
+            raise StandardError, "Times DO NOT MATCH IN TASCI FILE #{File.basename(file)}!!! #{calculated_line_time.to_s} #{fields[4]} #{fields[5]} #{fields[6]}"
+          end
+        else
+          raise StandardError, "Times DO NOT MATCH IN TASCI FILE #{File.basename(file)}!!! #{calculated_line_time.to_s} #{fields[4]} #{fields[5]} #{fields[6]}"
+        end
+        first_labtime = line_labtime if first_labtime.nil?
+        last_labtime = line_labtime
+        #MY_LOG.info fields
+      end
+      master_file_contents << [file, file_info[:record_labtime].to_short_s, file_info[:record_full_time], file_info[:epochs], first_labtime.to_decimal, last_labtime.to_decimal]
+    end
+    master_file_contents.sort! {|x, y| x[4] <=> y[4] }
+    master_file_contents.each { |row| master_file << row }
+    # Close the file so all rows are flushed to disk before merge_files reads it
+    master_file.close
+    puts "Created master file: #{master_file_name}"
+    @master_file_path = master_file_name
+    master_file_name
+  end
+  def merge_files
+    raise StandardError, "No master file path set! You must run create_master_list before running this function." unless @master_file_path
+    merged_file = CSV.open(File.join(@output_directory, "#{@subject_code}_merged_#{Time.zone.now.strftime("%Y%m%d")}.csv"), "wb")
+    merged_file << %w(SUBJECT_CODE FILE_NAME_SLEEP_WAKE_EPISODE LABTIME SLEEP_STAGE LIGHTS_OFF_ON_FLAG SEM_FLAG)
+    previous_first_labtime = nil
+    previous_last_labtime = nil
+    CSV.foreach(@master_file_path, headers: true) do |row|
+      puts row
+      tasci_file_path = File.join(@tasci_directory, row[0])
+      tasci_file = File.open(tasci_file_path)
+      file_info = {}
+      ## HEADER INFO
+      # Header Line
+      tasci_file.readline
+      # File Name
+      read_line = tasci_file.readline
+      matched_name =  /\W*File name \|\W*(.*\.vpd)/i.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_name
+      file_info[:source_file_name] = matched_name[1]
+      # Record Date
+      read_line = tasci_file.readline
+      matched_date = /RecordDate\W*\|\W*(..)\/(..)\/(....)\W*\|.*/.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_date
+      #MY_LOG.info "matched_date: #{matched_date[3]} #{matched_date[1]} #{matched_date[2]}"
+      file_info[:record_date] = (matched_date ? Time.zone.local(matched_date[3].to_i, matched_date[2].to_i, matched_date[1].to_i) : nil)
+      # Record Time
+      read_line = tasci_file.readline
+      matched_time = /RecordTime\W*\|\W*(..):(..):(..)\W*\|\W*Patient ID\W*\|\W*.*_.*_(\w*)\W*\|/.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_time
+      file_info[:record_full_time] = ((matched_time and matched_date) ? Time.zone.local(matched_date[3].to_i, matched_date[2].to_i, matched_date[1].to_i, matched_time[1].to_i, matched_time[2].to_i, matched_time[3].to_i) : nil)
+      file_info[:record_labtime] = Labtime.parse(file_info[:record_full_time])
+      file_info[:sleep_wake_episode] = matched_time[4]
+      6.times do
+        tasci_file.readline
+      end
+      # Epochs and duration
+      read_line = tasci_file.readline
+      matched_line = /\W*# Epochs\W*\|\W*(\d+)\W*\|\W*Duration\(S\)\W*\|\W*(\d+)\|/.match(read_line)
+      puts "ERROR: #{read_line}" unless matched_line
+      file_info[:epochs] = matched_line[1].to_i
+      file_info[:epoch_duration] = matched_line[2].to_i
+      5.times do
+        tasci_file.readline
+      end
+      first_labtime = nil
+      last_labtime = nil
+      until tasci_file.eof?
+        line = tasci_file.readline
+        matched_line = /(\d+)\|\W*(\d+)\|\W*(\d+)\|\W*(\d+)\|\W*(\d\d):(\d\d):(\d\d)\|\W*(.+)\|\W*(.+)\|/.match(line)
+        fields = matched_line.to_a
+        fields.delete_at(0)
+        raise StandardError, "fields should have 9 fields: #{fields.length} #{fields} #{line}" unless fields.length == 9
+        # Calculating labtime is tricky - file may span two days
+        calculated_line_time = file_info[:record_full_time] + fields[1].to_i.hours + fields[2].to_i.minutes + fields[3].to_i.seconds
+        if calculated_line_time.hour == fields[4].to_i and calculated_line_time.min == fields[5].to_i and calculated_line_time.sec == fields[6].to_i
+          line_time = calculated_line_time
+          line_labtime = Labtime.parse(line_time)
+        elsif file_info[:record_full_time].dst? != calculated_line_time.dst?
+          if (calculated_line_time.hour - fields[4].to_i).abs == 1 and calculated_line_time.min == fields[5].to_i and calculated_line_time.sec == fields[6].to_i
+            line_time = calculated_line_time
+            line_labtime = Labtime.parse(line_time)
+          else
+            raise StandardError, "Times DO NOT MATCH IN TASCI FILE #{File.basename(tasci_file_path)}!!! #{calculated_line_time.to_s} #{fields[4]} #{fields[5]} #{fields[6]}"
+          end
+        else
+          raise StandardError, "Times DO NOT MATCH IN TASCI FILE #{File.basename(tasci_file_path)}!!! #{calculated_line_time.to_s} #{fields[4]} #{fields[5]} #{fields[6]}"
+        end
+        # Sleep Period Coding:
+        # 1      Sleep Onset (Lights Off)
+        # 2      Sleep Offset (Lights On)
+        if /Lights Off/i.match(fields[7]) # Sleep Onset
+          sleep_period = 1
+        elsif /Lights On/i.match(fields[7]) # Sleep Offset
+          sleep_period = 2
+        else
+          sleep_period = nil
+        end
+        # Sleep Stage Coding:
+        # 1      stage 1
+        # 2      stage 2
+        # 3      stage 3
+        # 4      stage 4
+        # 6      MT
+        # 7      Undef
+        # 5      REM
+        # 9      Wake
+        line_event = nil
+        if fields[8] == "Wake"
+          line_event = 9
+        elsif fields[8] == "Undefined"
+          line_event = 7
+        elsif fields[8] == "N1"
+          line_event = 1
+        elsif fields[8] == "N2"
+          line_event = 2
+        elsif fields[8] == "N3"
+          line_event = 3
+        elsif fields[8] == "4"
+          line_event = 4
+        elsif fields[8] == "REM"
+          line_event = 5
+        elsif fields[8] == "MVT"
+          line_event = 6
+        else
+          raise StandardError, "Cannot map the following event: #{fields[8]}"
+        end
+        # SEM Event Coding:
+        # 1      Slow Eye Movement
+        # 0      No Slow Eye Movement
+        sem_event = (fields[7] =~ /SEM/ ? 1 : 0)
+        # Previous Effort:
+        #line_time = Time.zone.local(file_info[:record_full_time].year, file_info[:record_full_time].month, file_info[:record_full_time].day, fields[4].to_i, fields[5].to_i, fields[6].to_i)
+        #line_labtime = Labtime.parse(line_time)
+        first_labtime = line_labtime if first_labtime.nil?
+        last_labtime = line_labtime
+        output_line = [@subject_code.upcase, file_info[:sleep_wake_episode], line_labtime.to_decimal, line_event, sleep_period, sem_event]
+        merged_file << output_line
+        #MY_LOG.info fields
+      end
+      unless previous_first_labtime.nil? or previous_last_labtime.nil?
+        puts "Start time is before previous end labtime: #{previous_last_labtime.to_short_s} #{first_labtime.to_short_s}" if first_labtime < previous_last_labtime
+      end
+      previous_first_labtime = first_labtime
+      previous_last_labtime = last_labtime
+    end
+    merged_file.close
+  end
+end
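
The per-epoch coding in `TasciMerger#merge_files` reduces to three independent lookups on the last two fields of each line. The following sketch isolates that logic under the assumption that the event text and stage label have already been extracted; `code_epoch` is a hypothetical name, and the label strings are the ones compared against in the original method.

```ruby
# Hypothetical helper mirroring the per-line coding in
# TasciMerger#merge_files. Returns [sleep_stage, sleep_period, sem_flag].
def code_epoch(event_text, stage_label)
  # Sleep period: 1 = lights off (sleep onset), 2 = lights on (sleep offset)
  sleep_period = if event_text =~ /Lights Off/i
                   1
                 elsif event_text =~ /Lights On/i
                   2
                 end

  stages = { 'Wake' => 9, 'Undefined' => 7, 'N1' => 1, 'N2' => 2,
             'N3' => 3, '4' => 4, 'REM' => 5, 'MVT' => 6 }
  stage = stages.fetch(stage_label) do
    raise StandardError, "Cannot map the following event: #{stage_label}"
  end

  # SEM flag: 1 when the event text mentions a slow eye movement
  sem_flag = event_text =~ /SEM/ ? 1 : 0

  [stage, sleep_period, sem_flag]
end
```

Keeping the stage table in a hash makes it easier to see that stage 4 is matched by the bare label `"4"`, unlike the `N1` to `N3` labels.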

data/tasci_merger.gemspec ADDED

@@ -0,0 +1,21 @@
+Gem::Specification.new do |s|
+  s.name        = 'tasci_merger'
+  s.version     = '0.0.0'
+  s.date        = '2015-02-06'
+  s.summary     = "Merger utility for TASCI scored sleep files."
+  s.description = "Merger utility for TASCI scored sleep files, built for the Division of Sleep and Circadian Disorders at BWH."
+  s.authors     = ["Piotr Mankowski"]
+  s.email       = 'pmankowski@partners.org'
+  s.files       = %w(LICENSE README.md tasci_merger.gemspec lib/tasci_merger.rb lib/man_merger.rb lib/labtime.rb)
+  s.require_path = 'lib'
+  s.homepage    =
+      'https://github.com/pmanko/tasci_merger'
+  s.license       = 'MIT'
+  s.executables << 'merge_tasci'
+  s.required_ruby_version = '>= 2.1.0'
+  s.add_dependency "activesupport", '~> 4.2', '>= 4.2.0'
+end
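
The `activesupport` constraint in the gemspec combines a pessimistic operator with a lower bound. The sketch below uses RubyGems' own `Gem::Requirement` to show which versions it admits; `activesupport_requirement` is a hypothetical wrapper added only for illustration.

```ruby
require 'rubygems'

# Hypothetical wrapper around the constraint declared in the gemspec.
def activesupport_requirement
  Gem::Requirement.new('~> 4.2', '>= 4.2.0')
end

%w[4.1.16 4.2.0 4.2.11 5.0.0].each do |version|
  ok = activesupport_requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok}"
end
```

Because `~> 4.2` already means "at least 4.2 and below 5.0", the extra `>= 4.2.0` clause does not change the set of accepted versions.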

metadata ADDED

@@ -0,0 +1,72 @@
+--- !ruby/object:Gem::Specification
+name: tasci_merger
+version: !ruby/object:Gem::Version
+  version: 0.0.0
+platform: ruby
+authors:
+- Piotr Mankowski
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-02-06 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: activesupport
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.2'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 4.2.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.2'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 4.2.0
+description: Merger utility for TASCI scored sleep files, built for the Division of
+  Sleep and Circadian Disorders at BWH.
+email: pmankowski@partners.org
+executables:
+- merge_tasci
+extensions: []
+extra_rdoc_files: []
+files:
+- LICENSE
+- README.md
+- bin/merge_tasci
+- lib/labtime.rb
+- lib/man_merger.rb
+- lib/tasci_merger.rb
+- tasci_merger.gemspec
+homepage: https://github.com/pmanko/tasci_merger
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 2.1.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.2.2
+signing_key:
+specification_version: 4
+summary: Merger utility for TASCI scored sleep files.
+test_files: []