whatsapp-chat-parser 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +16 -0
- data/LICENSE +21 -0
- data/README.md +146 -0
- data/lib/whatsapp-chat-parser/encoding.rb +55 -0
- data/lib/whatsapp-chat-parser/file_processor.rb +56 -0
- data/lib/whatsapp-chat-parser/models/message.rb +38 -0
- data/lib/whatsapp-chat-parser/platforms/android/pattern.rb +47 -0
- data/lib/whatsapp-chat-parser/platforms/android.rb +86 -0
- data/lib/whatsapp-chat-parser/platforms/ios/pattern.rb +64 -0
- data/lib/whatsapp-chat-parser/platforms/ios.rb +87 -0
- data/lib/whatsapp-chat-parser/platforms/pattern_helpers.rb +23 -0
- data/lib/whatsapp-chat-parser/platforms.rb +38 -0
- data/lib/whatsapp-chat-parser/version.rb +5 -0
- data/lib/whatsapp-chat-parser.rb +24 -0
- metadata +137 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: c63551e684919c385110d12bcbbe8e15bae0eef6e4ffe31a0617970f45f24bd6
+  data.tar.gz: a510d3a34c51154faae21a8f69d7629e6f0f0ca103a4cfddfbb7571013cc8d3c
+SHA512:
+  metadata.gz: c35b864edcaef50faed6aff4c32d80c8d12e7a624819165766a3e935052b7148aff429dfb674c714dae32b938b2f0e3a654e0fbf8628f2147ead5c46e46dfaa6
+  data.tar.gz: 16c197ee4c6ffc1f1e5ecfc94aadabde22a2b5329ab0b7059ffc2897b18afeb3f6b879a799649b98ac162506162d9a965a0ecfcd6e9423c0f11192a2f9880d48
data/CHANGELOG.md
ADDED
@@ -0,0 +1,16 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.1.0] - 2026-02-18
+
+### Added
+- Initial release of `whatsapp-chat-parser`.
+- Support for parsing exported WhatsApp chat `.txt` files from both **Android and iOS** platforms.
+- Capability to parse both file streams and raw message strings.
+- Automatic encoding normalization for cross-platform file compatibility.
+- Support for multi-line messages.
+- High-precision timestamp parsing (including second-level precision for iOS).
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025-2026 Emmanuel Akachukwu
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,146 @@
+# WhatsApp Chat Parser
+
+A Ruby library that parses exported WhatsApp chat `.txt` files and converts them into structured, machine-readable data. Designed for downstream processing such as analytics, ETL pipelines, storage, and transformation - not for rendering UI or interacting with the WhatsApp API.
+
+## Features
+
+- **Platform support**: Handles both Android and iOS WhatsApp chat exports
+- **Structured output**: Normalized message records suitable for JSON, databases, or further transformation
+- **Robust parsing**: Detects platform-specific formats, normalizes timestamps, and groups multi-line messages
+- **Deterministic**: No dependencies, explicit platform handling, predictable output structure
+- **Fail-safe**: Skips or handles malformed lines when possible instead of aborting
+
+## Installation
+
+Add to your Gemfile:
+
+```ruby
+gem 'whatsapp-chat-parser'
+```
+
+Then run:
+
+```bash
+bundle install
+```
+
+Or install directly:
+
+```bash
+gem install whatsapp-chat-parser
+```
+
+## Usage
+
+**Parse a single message string** (returns a `Message` or `nil` if malformed):
+
+```ruby
+require 'whatsapp-chat-parser'
+
+line = '12/15/25, 10:30 AM - John Doe: Hello World'
+msg = WhatsappChatParser.parse_line(line)
+puts "#{msg.timestamp} | #{msg.author}: #{msg.body}" if msg
+```
+
+**Parse a file by path or IO** (returns an enumerable of `Message`; malformed lines are skipped):
+
+```ruby
+messages = WhatsappChatParser.parse_file('path/to/chat.txt')
+messages.each { |msg| puts "#{msg.timestamp} | #{msg.author}: #{msg.body}" }
+```
+
+```ruby
+File.open('path/to/chat.txt') do |f|
+  WhatsappChatParser.parse_file(f).each { |msg| puts "#{msg.timestamp} | #{msg.author}: #{msg.body}" }
+end
+```
+
+Each message has `timestamp`, `author`, `body`, `platform` and `type`. The result is suitable for JSON, databases, or pipelines.
+
+For a more comprehensive example, see [samples/example.rb](samples/example.rb).
+
+## Output format
+
+Each parsed record includes:
+
+| Field       | Description                                        |
+|-------------|----------------------------------------------------|
+| `timestamp` | Normalized date/time (consistent across platforms) |
+| `author`    | Sender name or identifier (when present)           |
+| `body`      | Full message content (multi-line messages grouped) |
+| `platform`  | Platform the chat was exported from (Android/iOS)  |
+| `type`      | e.g. user message, system message                  |
+
+## Design principles
+
+- **Deterministic parsing** - Same input yields same output
+- **No dependencies** - Self-contained Ruby
+- **Explicit platform handling** - Android vs iOS format differences are handled explicitly
+- **Predictable structure** - Stable, documented output schema
+
+## Use cases
+
+- Chat analytics and reporting
+- Data migration or archival
+- ETL pipelines into databases or spreadsheets
+- Automated processing of exported WhatsApp conversations
+
+## Non-goals
+
+This library does **not**:
+
+- Interact with WhatsApp APIs
+- Require network access
+- Perform message interpretation, sentiment analysis, or NLP
+- Handle encrypted or proprietary WhatsApp data formats
+
+Input must be unmodified exports from WhatsApp’s “Export Chat” feature.
+
+## How to export WhatsApp chats
+
+To use this library you need a plain-text export of a WhatsApp conversation. Use WhatsApp’s built-in **Export Chat** and choose **Without media** so you get a single `.txt` file.
+
+- [Android](https://faq.whatsapp.com/1180414079177245?cms_platform=android)
+- [iOS](https://faq.whatsapp.com/1180414079177245/?cms_platform=iphone)
+
+Use the exported `.txt` file as-is; do not edit the format. This library supports both Android and iOS export formats.
+
+## Development
+
+### Setup
+
+Clone the repository and install dependencies:
+
+```bash
+git clone https://github.com/emmaakachukwu/whatsapp-chat-parser-rb
+cd whatsapp-chat-parser-rb
+bundle install
+```
+
+### Running Tests
+
+We use RSpec for testing. Ensure all tests pass before submitting changes:
+
+```bash
+bundle exec rspec
+```
+
+### Linting
+
+We use RuboCop to maintain code quality:
+
+```bash
+bundle exec rubocop
+```
+
+## Contributing
+
+Contributions are welcome. Please open an issue or pull request on the project repository.
+
+1. **Fork** the repository and create a feature branch.
+2. Ensure your code follows the **Development** steps above (tests and linting pass).
+3. **Submit a Pull Request** with a detailed description of your work.
+
+## License
+
+This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
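The README says each record is suitable for JSON and pipelines but does not show the conversion. The following is a minimal sketch of one way to do it, not part of the gem: `MessageRecord` and `message_to_h` are hypothetical names, and the struct only stands in for the gem's message objects so the example runs without the gem installed. The field names follow the README's output table.

```ruby
require 'json'

# Hypothetical stand-in for the gem's message objects so the sketch runs
# without the gem installed; field names follow the README's output table.
MessageRecord = Struct.new(:timestamp, :author, :body, :platform, :type, keyword_init: true)

# Converts one record into a plain Hash that JSON.generate can serialize.
def message_to_h(msg)
  {
    timestamp: msg.timestamp,
    author: msg.author,
    body: msg.body,
    platform: msg.platform,
    type: msg.type
  }
end

records = [
  MessageRecord.new(timestamp: '2025-12-15 10:30:00', author: 'John Doe',
                    body: 'Hello World', platform: :android, type: :user),
  MessageRecord.new(timestamp: '2025-12-15 10:31:00', author: nil,
                    body: 'John Doe added Jane', platform: :android, type: :system)
]

# One JSON document per line (JSON Lines), a common input format for ETL tools.
json_lines = records.map { |msg| JSON.generate(message_to_h(msg)) }
puts json_lines
```

With the real gem, the same helper should work on any object that exposes these five readers; symbols such as `:android` are serialized as JSON strings.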
data/lib/whatsapp-chat-parser/encoding.rb
ADDED
@@ -0,0 +1,55 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  # Handles encoding detection and normalization for chat exports.
+  module Encoding
+    UTF8_BOM = "\xEF\xBB\xBF".b.freeze
+    UTF16LE_BOM = "\xFF\xFE".b.freeze
+    UTF16BE_BOM = "\xFE\xFF".b.freeze
+    FALLBACK_ENCODING = 'UTF-8'
+
+    class << self
+      # Normalizes a string to UTF-8, handling BOM and common encodings.
+      # @param line [String] The raw input string.
+      # @return [String] The normalized UTF-8 string.
+      def normalize_to_utf8(line)
+        enc = encoding_for(line)
+        str = line.dup.force_encoding(enc)
+        str = strip_bom(str, enc)
+        unless enc == ::Encoding::UTF_8
+          str = str.encode(
+            ::Encoding::UTF_8, invalid: :replace, undef: :replace
+          )
+        end
+
+        str
+      end
+
+      private
+
+      def encoding_for(line)
+        raw = line.dup.force_encoding(::Encoding::BINARY)
+        if raw.start_with?(UTF8_BOM)
+          ::Encoding::UTF_8
+        elsif raw.start_with?(UTF16LE_BOM)
+          ::Encoding::UTF_16LE
+        elsif raw.start_with?(UTF16BE_BOM)
+          ::Encoding::UTF_16BE
+        else
+          ::Encoding.find(FALLBACK_ENCODING)
+        end
+      end
+
+      def strip_bom(str, encoding)
+        case encoding
+        when ::Encoding::UTF_8
+          str.start_with?("\uFEFF") ? str.delete_prefix("\uFEFF") : str
+        when ::Encoding::UTF_16LE, ::Encoding::UTF_16BE
+          str.bytesize >= 2 ? str.byteslice(2..).force_encoding(encoding) : str
+        else
+          str
+        end
+      end
+    end
+  end
+end
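The module above picks an encoding from the leading byte-order mark and then transcodes to UTF-8. The sketch below isolates that idea for the UTF-16LE case only, as a standalone illustration rather than the gem's code; `utf16le_to_utf8` is a hypothetical helper name.

```ruby
# BOM that marks little-endian UTF-16 input, compared at the byte level.
UTF16LE_BOM = "\xFF\xFE".b.freeze

# Hypothetical helper: returns UTF-8 text for input that may start with a
# UTF-16LE BOM; anything else is assumed to be UTF-8 already.
def utf16le_to_utf8(raw)
  bytes = raw.b # binary copy, so the BOM check ignores the declared encoding
  return bytes.force_encoding(Encoding::UTF_8) unless bytes.start_with?(UTF16LE_BOM)

  bytes.byteslice(2..)                      # drop the two BOM bytes
       .force_encoding(Encoding::UTF_16LE)  # relabel the remaining bytes
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

sample = "\uFEFFhi".encode(Encoding::UTF_16LE)
puts utf16le_to_utf8(sample)
puts utf16le_to_utf8('plain')
```

One design point worth noting: a BOM appears only at the start of a file, so detection done per line, as in the module above, can only recognize UTF-16 on the first line of an export.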
data/lib/whatsapp-chat-parser/file_processor.rb
ADDED
@@ -0,0 +1,56 @@
+# frozen_string_literal: true
+
+require 'stringio'
+
+module WhatsappChatParser
+  # Handles reading and processing chat export files.
+  module FileProcessor
+    class << self
+      # Iterates through the source and yields parsed messages.
+      # @param source [String, IO] The file path or IO object.
+      # @yield [message] Yields each parsed message.
+      # @yieldparam message [WhatsappChatParser::Models::Message]
+      # @return [Enumerator] if no block is given.
+      def parse(source, &block)
+        return enum_for(__method__, source) unless block_given?
+
+        file = source.respond_to?(:read) ? source : File.open(source) # accept any IO-like object
+        parse_io(file, &block)
+      end
+
+      private
+
+      # Processes an IO object.
+      # @param io [IO] The input source.
+      # @yield [message]
+      def parse_io(io, &block)
+        return enum_for(__method__, io) unless block_given?
+
+        accumulate_messages(io, &block)
+      end
+
+      def accumulate_messages(io, &block)
+        message = +'' # unfrozen buffer; a bare '' is frozen under frozen_string_literal
+
+        io.each_line do |line|
+          if message_starts_here?(line)
+            yield_message(message, &block) unless message.empty?
+            message = line
+          else
+            message << line
+          end
+        end
+
+        yield_message(message, &block) unless message.empty?
+      end
+
+      def yield_message(message)
+        Platforms.parse(message)&.then { |parsed| yield(parsed) } # skip text that does not parse
+      end
+
+      def message_starts_here?(line)
+        !Platforms.parse(line).nil?
+      end
+    end
+  end
+end
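The processor above groups physical lines into logical messages: a line that looks like a message start opens a new buffer, and any other line is appended to the current one. The sketch below shows that accumulation on its own. It is an illustration under assumptions: `MESSAGE_START` is a simplified, anchored Android-style pattern, not the gem's regex, and `group_messages` is a hypothetical name.

```ruby
require 'stringio'

# Assumption: a simplified Android-style line start, anchored with \A so a
# date that appears mid-line does not open a new message.
MESSAGE_START = %r{\A\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}\p{Space}*[AP]M - }

# Groups raw lines into one string per logical message.
def group_messages(io)
  groups = []
  io.each_line do |line|
    if line.match?(MESSAGE_START)
      groups << line.dup   # a new message starts here
    elsif groups.any?
      groups.last << line  # continuation of the previous message
    end                    # text before the first message is dropped
  end
  groups
end

export = StringIO.new(<<~CHAT)
  12/15/25, 10:30 AM - John Doe: Hello
  second line
  12/15/25, 10:31 AM - Jane: Hi
CHAT

grouped = group_messages(export)
puts grouped.size
puts grouped.first
```

Returning an array keeps the sketch short; the gem's approach of yielding each message as soon as the next one starts is the better fit for large exports because it never holds the whole chat in memory.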
data/lib/whatsapp-chat-parser/models/message.rb
ADDED
@@ -0,0 +1,38 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  module Models
+    # Represents a single WhatsApp message.
+    class Message
+      # @return [String] The date and time the message was sent (standardized SQL format).
+      attr_accessor :timestamp
+      # @return [String, nil] The name or phone number of the message author, or nil for system messages.
+      attr_accessor :author
+      # @return [String] The content of the message.
+      attr_accessor :body
+      # @return [Symbol] The type of message (:user or :system).
+      attr_accessor :type
+      # @return [Symbol] The platform the message was exported from (:android or :ios).
+      attr_accessor :platform
+
+      # Initializes a new Message object.
+      # @param timestamp [String] The standardized timestamp.
+      # @param author [String, nil] The author of the message.
+      # @param body [String] The message body.
+      # @param platform [Symbol] The export platform.
+      def initialize(timestamp:, author:, body:, platform:)
+        @timestamp = timestamp
+        @author = author
+        @body = body
+        @platform = platform
+        @type = message_type
+      end
+
+      private
+
+      def message_type
+        @author.nil? ? :system : :user
+      end
+    end
+  end
+end
data/lib/whatsapp-chat-parser/platforms/android/pattern.rb
ADDED
@@ -0,0 +1,47 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  module Platforms
+    module Android
+      # Regex patterns and builders for Android WhatsApp exports.
+      module Pattern
+        # rubocop:disable Layout/HashAlignment
+        PATTERNS = {
+          month: /(\d{1,2})/,
+          day: /(\d{1,2})/,
+          year: /(\d{2})/,
+          hour: /(\d{1,2})/,
+          minute: /(\d{2})/,
+          meridiem: /\p{Space}*([AP]M)/,
+          author: /(?:([^:]+): )?/,
+          body: /(.*)/
+        }.freeze
+        # rubocop:enable Layout/HashAlignment
+
+        class << self
+          # Returns the compiled regex for Android chat exports.
+          # @return [Regexp]
+          def regex
+            Regexp.new(
+              "#{date_pattern}, #{time_pattern} " \
+              "- #{PatternHelpers.source(PATTERNS, :author)}#{PatternHelpers.source(PATTERNS, :body)}",
+              Regexp::MULTILINE
+            )
+          end
+
+          private
+
+          def date_pattern
+            PATTERNS.fetch_values(:month, :day, :year)
+                    .map(&:source)
+                    .join('/')
+          end
+
+          def time_pattern
+            PatternHelpers.format_sources(PATTERNS, %i[hour minute meridiem], '%s:%s%s')
+          end
+        end
+      end
+    end
+  end
+end
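The pattern module above builds one regex by joining the `source` of small fragments, and the parser later reads captures by position (the key's index in `PATTERNS` plus one). As a hedged alternative sketch, not the gem's approach, named capture groups survive the same composition and remove the dependence on key order. `FRAGMENTS` and `DATE_REGEX` are hypothetical names.

```ruby
# Hypothetical fragments mirroring the Android date fields, with named groups.
FRAGMENTS = {
  month: /(?<month>\d{1,2})/,
  day: /(?<day>\d{1,2})/,
  year: /(?<year>\d{2})/
}.freeze

# Join the fragment sources with '/' and anchor the result at the start of
# the string so a date inside a message body cannot match.
date_source = FRAGMENTS.fetch_values(:month, :day, :year).map(&:source).join('/')
DATE_REGEX = Regexp.new("\\A#{date_source}")

match = DATE_REGEX.match('12/15/25, 10:30 AM - John Doe: Hello World')
puts match[:month], match[:day], match[:year]
```

With named groups, reordering or inserting a fragment does not shift the indices of the others, which is the failure mode positional extraction has to guard against.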
data/lib/whatsapp-chat-parser/platforms/android.rb
ADDED
@@ -0,0 +1,86 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  module Platforms
+    # Parser for Android WhatsApp chat exports.
+    module Android
+      class << self
+        # Parses a line from an Android export.
+        # @param line [String] The exported line.
+        # @return [Models::Message, nil]
+        def parse(line)
+          match = line.match(Pattern.regex)
+          return unless match
+
+          timestamp = extract_timestamp(match)
+          author = extract(match, :author)
+          body = extract(match, :body)
+
+          Models::Message.new(timestamp: timestamp, author: author, body: body, platform: :android)
+        end
+
+        # Checks if a line matches the Android format.
+        # @param line [String]
+        # @return [Boolean]
+        def matches?(line)
+          Pattern.regex.match?(line)
+        end
+
+        private
+
+        def extract(match, key)
+          index = Pattern::PATTERNS.keys.index(key)
+          match[index + 1]
+        end
+
+        def extract_timestamp(match)
+          date_components = extract_date_components(match)
+          time_components = extract_time_components(match)
+
+          format_sql_timestamp(date_components, time_components)
+        end
+
+        def extract_date_components(match)
+          month = extract(match, :month).to_i # to_i: format('%02d', '08') rejects the octal-looking string
+          day = extract(match, :day).to_i # same reason as month
+          year = extract(match, :year).to_i + 2000
+
+          { month: month, day: day, year: year }
+        end
+
+        def extract_time_components(match)
+          hour = extract(match, :hour).to_i
+          minute = extract(match, :minute).to_i
+          meridiem = extract(match, :meridiem)
+          hour = convert_to_24_hour(hour, meridiem)
+
+          { hour: hour, minute: minute }
+        end
+
+        def convert_to_24_hour(hour, meridiem)
+          meridiem = meridiem.upcase
+          if meridiem == 'PM' && hour < 12
+            hour + 12
+          elsif meridiem == 'AM' && hour == 12
+            0
+          else
+            hour
+          end
+        end
+
+        def format_sql_timestamp(date, time)
+          # rubocop:disable Layout/HashAlignment
+          format(
+            '%<year>04d-%<month>02d-%<day>02d %<hour>02d:%<minute>02d:00',
+            year: date[:year],
+            month: date[:month],
+            day: date[:day],
+            hour: time[:hour],
+            minute: time[:minute]
+          )
+          # rubocop:enable Layout/HashAlignment
+        end
+      end
+    end
+  end
+end
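The Android parser above converts a 12-hour clock reading to 24-hour form and then formats an SQL-style timestamp. The standalone sketch below restates those two steps so the edge cases (12 AM, 12 PM, a missing AM/PM marker) are easy to check. `to_24_hour` and `sql_timestamp` are hypothetical names, and captures are converted with `to_i` first because `Kernel#format` parses a String passed to `%d` the way `Integer()` does, which rejects zero-padded values such as `'08'`.

```ruby
# Converts a 12-hour clock reading to 24-hour form. A nil marker means the
# export already used a 24-hour clock, so the hour is returned unchanged.
def to_24_hour(hour, meridiem)
  case meridiem&.upcase
  when 'PM' then hour < 12 ? hour + 12 : hour
  when 'AM' then hour == 12 ? 0 : hour
  else hour
  end
end

# Formats integer components as "YYYY-MM-DD HH:MM:SS".
def sql_timestamp(year:, month:, day:, hour:, minute:, second: 0)
  format('%04d-%02d-%02d %02d:%02d:%02d', year, month, day, hour, minute, second)
end

# Regex captures are strings; convert them before formatting.
month = '08'.to_i
day = '09'.to_i
puts sql_timestamp(year: 2025, month: month, day: day, hour: to_24_hour(12, 'AM'), minute: 5)
```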
data/lib/whatsapp-chat-parser/platforms/ios/pattern.rb
ADDED
@@ -0,0 +1,64 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  module Platforms
+    module Ios
+      # Regex patterns and builders for iOS WhatsApp exports.
+      module Pattern
+        # rubocop:disable Layout/HashAlignment
+        PATTERNS = {
+          day: /(\d{1,2})/,
+          month: /(\d{1,2})/,
+          year: /(\d{4})/,
+          hour: /(\d{1,2})/,
+          minute: /(\d{2})/,
+          second: /(?::(\d{2}))?/,
+          meridiem: /\p{Space}*([AP]M)?/,
+          author: /(?:([^:]+?)\p{Space}*:\p{Space}*)?/,
+          body: /(.*)/
+        }.freeze
+        # rubocop:enable Layout/HashAlignment
+
+        class << self
+          # Returns the compiled regex for iOS chat exports.
+          # @return [Regexp]
+          def regex
+            Regexp.new(
+              "#{square_bracket_open_pattern}" \
+              "#{date_pattern},#{space_pattern}" \
+              "#{time_pattern}" \
+              "#{square_bracket_close_pattern}" \
+              "#{space_pattern}#{/[-~]?/.source}#{space_pattern}" \
+              "#{PatternHelpers.source(PATTERNS, :author)}#{PatternHelpers.source(PATTERNS, :body)}",
+              Regexp::MULTILINE
+            )
+          end
+
+          private
+
+          def date_pattern
+            PatternHelpers.join_sources(PATTERNS, %i[day month year], '/')
+          end
+
+          def time_pattern
+            PatternHelpers.format_sources(
+              PATTERNS, %i[hour minute second meridiem], '%s:%s%s%s'
+            )
+          end
+
+          def space_pattern
+            /\p{Space}*/.source
+          end
+
+          def square_bracket_open_pattern
+            /\[?/.source
+          end
+
+          def square_bracket_close_pattern
+            /\]?/.source
+          end
+        end
+      end
+    end
+  end
+end
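The iOS pattern above makes the brackets, the seconds, and the AM/PM marker optional, which means the corresponding captures can be `nil`. The sketch below is a simplified, anchored restatement of that timestamp prefix using named groups. It is an illustration, not the gem's regex; `IOS_PREFIX` is a hypothetical name.

```ruby
# Simplified iOS-style timestamp prefix with optional parts (extended mode,
# so whitespace and comments inside the literal are ignored).
IOS_PREFIX = %r{
  \A\[?
  (?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4}),\p{Space}*
  (?<hour>\d{1,2}):(?<minute>\d{2})
  (?::(?<second>\d{2}))?         # seconds are optional
  \p{Space}*(?<meridiem>[AP]M)?  # so is the AM-or-PM marker
  \]?
}x

with_seconds = IOS_PREFIX.match('[15/12/2025, 10:30:45] John Doe: Hello')
without_seconds = IOS_PREFIX.match('15/12/2025, 22:05 - John Doe: Hello')

puts with_seconds[:second]
# A missing optional group is nil; to_i turns it into 0 before formatting.
puts without_seconds[:second].to_i
```

Any code that formats these captures numerically has to handle the `nil` case, for example with `to_i` as shown.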
data/lib/whatsapp-chat-parser/platforms/ios.rb
ADDED
@@ -0,0 +1,87 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  module Platforms
+    # Parser for iOS WhatsApp chat exports.
+    module Ios
+      class << self
+        # Parses a line from an iOS export.
+        # @param line [String] The exported line.
+        # @return [Models::Message, nil]
+        def parse(line)
+          match = line.match(Pattern.regex)
+          return unless match
+
+          timestamp = extract_timestamp(match)
+          author = extract(match, :author)
+          body = extract(match, :body)
+
+          Models::Message.new(timestamp: timestamp, author: author, body: body, platform: :ios)
+        end
+
+        # Checks if a line matches the iOS format.
+        # @param line [String]
+        # @return [Boolean]
+        def matches?(line)
+          Pattern.regex.match?(line)
+        end
+
+        private
+
+        def extract(match, key)
+          index = Pattern::PATTERNS.keys.index(key)
+          match[index + 1]
+        end
+
+        def extract_timestamp(match)
+          date_components = extract_date_components(match)
+          time_components = extract_time_components(match)
+
+          format_sql_timestamp(date_components, time_components)
+        end
+
+        def extract_date_components(match)
+          month = extract(match, :month).to_i # to_i: format('%02d', '08') rejects the octal-looking string
+          day = extract(match, :day).to_i # same reason as month
+          year = extract(match, :year)
+
+          { month: month, day: day, year: year }
+        end
+
+        def extract_time_components(match)
+          hour = extract(match, :hour).to_i
+          minute = extract(match, :minute).to_i
+          second = extract(match, :second).to_i # nil (export without seconds) becomes 0
+          meridiem = extract(match, :meridiem)
+          hour = convert_to_24_hour(hour, meridiem)
+
+          { hour: hour, minute: minute, second: second }
+        end
+
+        def convert_to_24_hour(hour, meridiem)
+          if meridiem == 'PM' && hour < 12
+            hour + 12
+          elsif meridiem == 'AM' && hour == 12
+            0
+          else
+            hour
+          end
+        end
+
+        def format_sql_timestamp(date, time)
+          # rubocop:disable Layout/HashAlignment
+          format(
+            '%<year>04d-%<month>02d-%<day>02d %<hour>02d:%<minute>02d:%<second>02d',
+            year: date[:year],
+            month: date[:month],
+            day: date[:day],
+            hour: time[:hour],
+            minute: time[:minute],
+            second: time[:second]
+          )
+          # rubocop:enable Layout/HashAlignment
+        end
+      end
+    end
+  end
+end
data/lib/whatsapp-chat-parser/platforms/pattern_helpers.rb
ADDED
@@ -0,0 +1,23 @@
+# frozen_string_literal: true
+
+module WhatsappChatParser
+  module Platforms
+    # Shared utilities for building regex patterns.
+    module PatternHelpers
+      class << self
+        def join_sources(patterns, keys, separator)
+          patterns.fetch_values(*keys).map(&:source).join(separator)
+        end
+
+        def source(patterns, key)
+          patterns[key].source
+        end
+
+        def format_sources(patterns, keys, format_string)
+          values = patterns.fetch_values(*keys).map(&:source)
+          format_string % values
+        end
+      end
+    end
+  end
+end
data/lib/whatsapp-chat-parser/platforms.rb
ADDED
@@ -0,0 +1,38 @@
+# frozen_string_literal: true
+
+require_relative 'encoding'
+require_relative 'platforms/android'
+require_relative 'platforms/ios'
+require_relative 'platforms/android/pattern'
+require_relative 'platforms/ios/pattern'
+require_relative 'platforms/pattern_helpers'
+
+module WhatsappChatParser
+  # Registry and dispatcher for platform-specific chat parsers.
+  module Platforms
+    PLATFORMS = [Android, Ios].freeze
+
+    class << self
+      # Attempts to parse a message line by identifying its platform.
+      # @param line [String] The raw message line.
+      # @return [WhatsappChatParser::Models::Message, nil] The parsed message or nil.
+      def parse(line)
+        sanitized = sanitize(line)
+        platform = platform_for(sanitized)
+        return nil unless platform
+
+        platform.parse(sanitized)
+      end
+
+      private
+
+      def platform_for(line)
+        PLATFORMS.find { |platform| platform.matches?(line) }
+      end
+
+      def sanitize(line)
+        Encoding.normalize_to_utf8(line).strip.scrub(' ').squeeze(' ')
+      end
+    end
+  end
+end
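The dispatcher above sanitizes a line and then hands it to the first platform whose pattern matches. The sketch below shows the same first-match dispatch in isolation. It is an illustration under assumptions: the two regexes in `MATCHERS` are simplified, anchored stand-ins for the platform patterns, and `sanitize` and `detect_platform` are hypothetical top-level helpers, not the gem's methods.

```ruby
# Hypothetical, simplified stand-ins for the two platform matchers. Order
# matters: the first matching entry wins.
MATCHERS = {
  android: %r{\A\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}\p{Space}*[AP]M - },
  ios: %r{\A\[?\d{1,2}/\d{1,2}/\d{4},\p{Space}*\d{1,2}:\d{2}}
}.freeze

# Replaces invalid bytes, trims the ends, and collapses runs of spaces.
def sanitize(line)
  line.scrub(' ').strip.squeeze(' ')
end

# Returns :android, :ios, or nil when no matcher recognizes the line.
def detect_platform(line)
  clean = sanitize(line)
  MATCHERS.find { |_name, regex| regex.match?(clean) }&.first
end

puts detect_platform('12/15/25, 10:30 AM - John Doe: Hello').inspect
puts detect_platform('[15/12/2025, 10:30:45] John Doe: Hello').inspect
puts detect_platform('just a continuation line').inspect
```

Anchoring each matcher with `\A` is a deliberate choice in this sketch: it keeps a date quoted inside a message body from being mistaken for the start of a new message.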
data/lib/whatsapp-chat-parser.rb
ADDED
@@ -0,0 +1,24 @@
+# frozen_string_literal: true
+
+require_relative 'whatsapp-chat-parser/platforms'
+require_relative 'whatsapp-chat-parser/models/message'
+require_relative 'whatsapp-chat-parser/file_processor'
+
+# Main entry point for the WhatsApp Chat Parser library.
+module WhatsappChatParser
+  class << self
+    # Parses a single message line of a WhatsApp chat export.
+    # @param line [String] The line to parse.
+    # @return [WhatsappChatParser::Models::Message, nil] The parsed message or nil if message is malformed.
+    def parse_line(line)
+      Platforms.parse(line)
+    end
+
+    # Parses a WhatsApp chat export .txt file path or IO.
+    # @param source [String, IO] The path to the file or an IO object.
+    # @return [Enumerator<WhatsappChatParser::Models::Message>] Enumerator of parsed messages.
+    def parse_file(source)
+      FileProcessor.parse(source)
+    end
+  end
+end
metadata
ADDED
@@ -0,0 +1,137 @@
+--- !ruby/object:Gem::Specification
+name: whatsapp-chat-parser
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Emmanuel Akachukwu
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2026-02-18 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.13'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.13'
+- !ruby/object:Gem::Dependency
+  name: rubocop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.84'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.84'
+- !ruby/object:Gem::Dependency
+  name: rubocop-performance
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.26'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.26'
+- !ruby/object:Gem::Dependency
+  name: rubocop-rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.9'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.9'
+- !ruby/object:Gem::Dependency
+  name: simplecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.22'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.22'
+description: |
+  WhatsappChatParser parses exported WhatsApp chat .txt files (Android and iOS)
+  and converts them into structured, machine-readable message objects.
+  It supports both file inputs and raw message strings.
+email:
+- emmanuelakachukwu1@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE
+- README.md
+- lib/whatsapp-chat-parser.rb
+- lib/whatsapp-chat-parser/encoding.rb
+- lib/whatsapp-chat-parser/file_processor.rb
+- lib/whatsapp-chat-parser/models/message.rb
+- lib/whatsapp-chat-parser/platforms.rb
+- lib/whatsapp-chat-parser/platforms/android.rb
+- lib/whatsapp-chat-parser/platforms/android/pattern.rb
+- lib/whatsapp-chat-parser/platforms/ios.rb
+- lib/whatsapp-chat-parser/platforms/ios/pattern.rb
+- lib/whatsapp-chat-parser/platforms/pattern_helpers.rb
+- lib/whatsapp-chat-parser/version.rb
+homepage: https://github.com/emmaakachukwu/whatsapp-chat-parser-rb
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/emmaakachukwu/whatsapp-chat-parser-rb
+  bug_tracker_uri: https://github.com/emmaakachukwu/whatsapp-chat-parser-rb/issues
+  changelog_uri: https://github.com/emmaakachukwu/whatsapp-chat-parser-rb/blob/v0.1.0/CHANGELOG.md
+  documentation_uri: https://www.rubydoc.info/gems/whatsapp-chat-parser/0.1.0
+  source_code_uri: https://github.com/emmaakachukwu/whatsapp-chat-parser-rb/tree/v0.1.0
+  keywords: whatsapp chat parser whatsapp-chat-parser text export android ios
+  rubygems_mfa_required: 'true'
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.5.22
+signing_key:
+specification_version: 4
+summary: A Ruby library for parsing exported WhatsApp chat .txt files or message strings.
+test_files: []