acro_that 0.1.5 → 0.1.7
- checksums.yaml +4 -4
- data/.gitignore +3 -1
- data/Gemfile.lock +1 -1
- data/issues/README.md +32 -11
- data/issues/memory-benchmark-results.md +551 -0
- data/issues/memory-improvements.md +388 -0
- data/issues/memory-optimization-summary.md +204 -0
- data/issues/refactoring-opportunities.md +70 -80
- data/lib/acro_that/actions/add_field.rb +205 -38
- data/lib/acro_that/actions/update_field.rb +252 -20
- data/lib/acro_that/dict_scan.rb +49 -0
- data/lib/acro_that/document.rb +22 -53
- data/lib/acro_that/field.rb +2 -0
- data/lib/acro_that/incremental_writer.rb +3 -2
- data/lib/acro_that/object_resolver.rb +5 -0
- data/lib/acro_that/version.rb +1 -1
- metadata +5 -3
- data/.DS_Store +0 -0
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: e5f98c3666d2a74883becbcb49e9c18fd4fc04a64f0e0dfd883b4c48056d64b8
|
|
4
|
+
data.tar.gz: 13545403d27dcbbccc1474e88a8d97d45336c5afd292362692e8c50e2ef6c15a
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 55313e86d76491aae1ff68c78651d631ac0cfc8cf7feccd462f4ba3fbe37c3ba4c1f181102c9e849b0af837a817dd02681a97c468792a58e0fc8b93e9e486fb5
|
|
7
|
+
data.tar.gz: 52d751681abc5e46db4e66721eaa8871cf2a7251d418b819a72fb6485f66263c8679cb7eb32bba74d478a8983fd1a194f92d3b490702546485d0b985b71c4b54
|
data/.gitignore
CHANGED
data/Gemfile.lock
CHANGED
data/issues/README.md
CHANGED
|
@@ -1,10 +1,11 @@
|
|
|
1
1
|
# Code Review Issues
|
|
2
2
|
|
|
3
|
-
This folder contains documentation of code cleanup
|
|
3
|
+
This folder contains documentation of code cleanup, refactoring opportunities, and improvement tasks found in the codebase.
|
|
4
4
|
|
|
5
5
|
## Files
|
|
6
6
|
|
|
7
7
|
- **[refactoring-opportunities.md](./refactoring-opportunities.md)** - Detailed list of code duplication and refactoring opportunities
|
|
8
|
+
- **[memory-improvements.md](./memory-improvements.md)** - Memory usage issues and optimization opportunities for handling larger PDF documents
|
|
8
9
|
|
|
9
10
|
## Summary
|
|
10
11
|
|
|
@@ -13,26 +14,46 @@ This folder contains documentation of code cleanup and refactoring opportunities
|
|
|
13
14
|
2. **/Annots Array Manipulation** - Complex logic duplicated in 3 locations
|
|
14
15
|
|
|
15
16
|
### Medium Priority Issues
|
|
16
|
-
3. **
|
|
17
|
-
4. **
|
|
17
|
+
3. **Box Parsing Logic** - Repeated code blocks for 5 box types
|
|
18
|
+
4. **Checkbox Appearance Creation** - Significant duplication in new code
|
|
19
|
+
5. **PDF Metadata Formatting** - Could benefit from being shared utilities
|
|
18
20
|
|
|
19
21
|
### Low Priority Issues
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
22
|
+
6. Duplicated `next_fresh_object_number` implementation (may be intentional)
|
|
23
|
+
7. Object reference extraction pattern duplication
|
|
24
|
+
8. Unused method: `get_widget_rect_dimensions`
|
|
25
|
+
9. Base64 decoding logic duplication
|
|
26
|
+
|
|
27
|
+
### Completed ✅
|
|
28
|
+
- **Page-Finding Logic** - Successfully refactored into `DictScan.is_page?` and unified page-finding methods
|
|
24
29
|
|
|
25
30
|
## Quick Stats
|
|
26
31
|
|
|
27
|
-
- **
|
|
32
|
+
- **10 refactoring opportunities** identified (1 completed, 9 remaining)
|
|
28
33
|
- **6+ locations** with widget matching duplication
|
|
29
34
|
- **3 locations** with /Annots array manipulation duplication
|
|
30
35
|
- **1 unused method** found
|
|
36
|
+
- **2 new issues** identified in recent code additions
|
|
37
|
+
|
|
38
|
+
## Memory & Performance
|
|
39
|
+
|
|
40
|
+
### Memory Improvement Opportunities
|
|
41
|
+
|
|
42
|
+
See **[memory-improvements.md](./memory-improvements.md)** for detailed analysis of memory usage and optimization strategies.
|
|
43
|
+
|
|
44
|
+
**Key Issues:**
|
|
45
|
+
- Duplicate PDF loading (2x memory usage)
|
|
46
|
+
- Stream decompression cache retention
|
|
47
|
+
- All-objects-in-memory operations
|
|
48
|
+
- Multiple full PDF copies during write operations
|
|
49
|
+
|
|
50
|
+
**Estimated Impact:** 50-90 MB typical usage for a 10 MB PDF; can exceed 100-200 MB for larger or more complex PDFs (39+ pages).
|
|
31
51
|
|
|
32
52
|
## Next Steps
|
|
33
53
|
|
|
34
54
|
1. Review [refactoring-opportunities.md](./refactoring-opportunities.md) for detailed information
|
|
35
|
-
2.
|
|
36
|
-
3.
|
|
37
|
-
4.
|
|
55
|
+
2. Review [memory-improvements.md](./memory-improvements.md) for memory optimization strategies
|
|
56
|
+
3. Prioritize improvements based on maintenance and performance needs
|
|
57
|
+
4. Create test coverage before refactoring
|
|
58
|
+
5. Implement improvements incrementally, starting with high-priority items
|
|
38
59
|
|
data/issues/memory-benchmark-results.md
ADDED
|
@@ -0,0 +1,551 @@
|
|
|
1
|
+
# Memory Benchmark Results
|
|
2
|
+
|
|
3
|
+
This document contains before and after memory benchmark results for memory optimization improvements.
|
|
4
|
+
|
|
5
|
+
## Test Environment
|
|
6
|
+
|
|
7
|
+
- Ruby version: Ruby 3.x
|
|
8
|
+
- Test PDF (Small): `spec/fixtures/MV100-Statement-of-Fact-Fillable.pdf`
|
|
9
|
+
- Test PDF (Large): `spec/fixtures/form.pdf`
|
|
10
|
+
- Benchmark tool: Custom memory benchmark helper using `GC.stat` and RSS measurements
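The benchmark helper itself is not part of this diff. As a rough sketch of how a helper built on `GC.stat` and RSS readings could collect the four numbers reported in each section below, something like the following would work. The module name `MemorySnapshot` and its methods are hypothetical, and RSS is read from `/proc/self/status` or `ps`, which is an assumption about the platform.

```ruby
# Hypothetical helper; the gem's real benchmark helper is not shown in this diff.
module MemorySnapshot
  module_function

  # Resident set size in MB, or nil when it cannot be read on this platform.
  def rss_mb
    if File.readable?("/proc/self/status")
      line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
      return line.split[1].to_f / 1024 if line
    end
    out = `ps -o rss= -p #{Process.pid} 2>/dev/null`.strip
    out.empty? ? nil : out.to_f / 1024
  rescue StandardError
    nil
  end

  # One reading of the metrics used throughout this document.
  def take
    stat = GC.stat
    {
      rss_mb: rss_mb,
      live_slots: stat[:heap_live_slots],
      heap_pages: stat[:heap_allocated_pages],
      gc_runs: stat[:count]
    }
  end

  # Runs the block and returns before/after deltas.
  def measure
    GC.start
    before = take
    yield
    after = take
    rss_delta = before[:rss_mb] && after[:rss_mb] ? (after[:rss_mb] - before[:rss_mb]).round(2) : nil
    {
      rss_delta_mb: rss_delta,
      live_slots_delta: after[:live_slots] - before[:live_slots],
      heap_pages_delta: after[:heap_pages] - before[:heap_pages],
      gc_runs: after[:gc_runs] - before[:gc_runs]
    }
  end
end

report = MemorySnapshot.measure { Array.new(10_000) { |i| "item #{i}" } }
puts report.inspect
```

Calling `GC.start` before the first reading is a design choice that reduces noise from garbage left over by earlier work; it does not make RSS readings exact.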
|
|
11
|
+
|
|
12
|
+
> **Note**: This document contains results for both small and large PDF files. The small PDF results show baseline optimizations, while the large PDF results demonstrate how optimizations scale with larger documents.
|
|
13
|
+
|
|
14
|
+
## BEFORE Optimizations (Baseline)
|
|
15
|
+
|
|
16
|
+
Run on: **Before memory optimizations**
|
|
17
|
+
|
|
18
|
+
### Document Initialization
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
RSS Memory: 47.98 MB → 48.08 MB (Δ 0.09 MB)
|
|
22
|
+
Heap Live Slots: 84922 → 85764 (Δ 842)
|
|
23
|
+
Heap Pages: 116 → 116 (Δ 0)
|
|
24
|
+
GC Runs: 1
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
**Key Findings:**
|
|
28
|
+
- Initial document load adds ~0.09 MB RSS
|
|
29
|
+
- Heap live slots increase by 842
|
|
30
|
+
|
|
31
|
+
### Memory Sharing Check
|
|
32
|
+
|
|
33
|
+
```
|
|
34
|
+
@raw size: 0 bytes (ObjectSpace.memsize_of limitation)
|
|
35
|
+
ObjectResolver size: 0 bytes (ObjectSpace.memsize_of limitation)
|
|
36
|
+
Same object reference: true
|
|
37
|
+
Object IDs: 2740 vs 2740
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
**Key Findings:**
|
|
41
|
+
- `@raw` and `ObjectResolver#@bytes` already share the same object reference
|
|
42
|
+
- This is good, and freezing `@raw` will guarantee the behavior
|
|
43
|
+
- ObjectSpace.memsize_of doesn't accurately measure large strings
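A minimal sketch of the sharing check described above, using stand-in classes rather than the gem's real `Document` and `ObjectResolver` (both `SketchDocument` and `SketchResolver` are hypothetical). It shows that passing a string reference does not copy it, and that a frozen string rejects in-place mutation.

```ruby
# Hypothetical stand-ins; not the gem's actual classes.
class SketchResolver
  attr_reader :bytes

  def initialize(bytes)
    @bytes = bytes # stores the reference, no copy is made
  end
end

class SketchDocument
  attr_reader :raw, :resolver

  def initialize(data)
    @raw = data.freeze                  # frozen: cannot be mutated in place
    @resolver = SketchResolver.new(@raw)
  end
end

doc = SketchDocument.new(String.new("%PDF-1.7 example bytes"))

# Both checks compare object identity, not content.
puts doc.raw.equal?(doc.resolver.bytes)
puts doc.raw.object_id == doc.resolver.bytes.object_id

begin
  doc.raw << "more bytes"
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```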
|
|
44
|
+
|
|
45
|
+
### list_fields Operation
|
|
46
|
+
|
|
47
|
+
```
|
|
48
|
+
RSS Memory: 48.3 MB → 48.58 MB (Δ 0.28 MB)
|
|
49
|
+
Heap Live Slots: 85874 → 87001 (Δ 1127)
|
|
50
|
+
Heap Pages: 116 → 118 (Δ 2)
|
|
51
|
+
GC Runs: 1
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
**Key Findings:**
|
|
55
|
+
- list_fields adds ~0.28 MB RSS
|
|
56
|
+
- 2 additional heap pages allocated
|
|
57
|
+
|
|
58
|
+
### flatten Operation
|
|
59
|
+
|
|
60
|
+
```
|
|
61
|
+
RSS Memory: 48.8 MB → 49.13 MB (Δ 0.33 MB)
|
|
62
|
+
Heap Live Slots: 86146 → 87055 (Δ 909)
|
|
63
|
+
Heap Pages: 118 → 118 (Δ 0)
|
|
64
|
+
GC Runs: 1
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
**Key Findings:**
|
|
68
|
+
- flatten adds ~0.33 MB RSS
|
|
69
|
+
- No additional heap pages needed
|
|
70
|
+
|
|
71
|
+
### flatten! Operation
|
|
72
|
+
|
|
73
|
+
```
|
|
74
|
+
RSS Memory: 49.34 MB → 49.53 MB (Δ 0.19 MB)
|
|
75
|
+
Heap Live Slots: 86169 → 86175 (Δ 6)
|
|
76
|
+
Heap Pages: 118 → 119 (Δ 1)
|
|
77
|
+
GC Runs: 1
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
**Key Findings:**
|
|
81
|
+
- flatten! adds ~0.19 MB RSS (less than flatten due to in-place mutation)
|
|
82
|
+
- 1 additional heap page allocated
|
|
83
|
+
|
|
84
|
+
### write Operation
|
|
85
|
+
|
|
86
|
+
```
|
|
87
|
+
RSS Memory: 49.55 MB → 50.8 MB (Δ 1.25 MB)
|
|
88
|
+
Heap Live Slots: 87171 → 86294 (Δ -877)
|
|
89
|
+
Heap Pages: 119 → 123 (Δ 4)
|
|
90
|
+
GC Runs: 1
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
**Key Findings:**
|
|
94
|
+
- write operation has the highest memory delta: ~1.25 MB RSS
|
|
95
|
+
- 4 additional heap pages allocated
|
|
96
|
+
- This is where IncrementalWriter duplication occurs
|
|
97
|
+
|
|
98
|
+
### clear Operation
|
|
99
|
+
|
|
100
|
+
```
|
|
101
|
+
RSS Memory: 50.8 MB → 51.23 MB (Δ 0.44 MB)
|
|
102
|
+
Heap Live Slots: 86323 → 87251 (Δ 928)
|
|
103
|
+
Heap Pages: 123 → 123 (Δ 0)
|
|
104
|
+
GC Runs: 1
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
**Key Findings:**
|
|
108
|
+
- clear adds ~0.44 MB RSS
|
|
109
|
+
- Similar to flatten in memory usage
|
|
110
|
+
|
|
111
|
+
### ObjectResolver Cache
|
|
112
|
+
|
|
113
|
+
```
|
|
114
|
+
RSS Memory: 51.23 MB → 51.23 MB (Δ 0.0 MB)
|
|
115
|
+
Heap Live Slots: 86392 → 87276 (Δ 884)
|
|
116
|
+
Heap Pages: 123 → 123 (Δ 0)
|
|
117
|
+
GC Runs: 1
|
|
118
|
+
Cached object streams: 7
|
|
119
|
+
Cache keys: [[264, 0], [1, 0], [2, 0], [3, 0], [4, 0], [6, 0], [7, 0]]
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
**Key Findings:**
|
|
123
|
+
- Cache is populated with 7 object streams
|
|
124
|
+
- Cache is never cleared (retained for entire document lifetime)
|
|
125
|
+
- Memory retained even after operations complete
|
|
126
|
+
|
|
127
|
+
### Peak Memory During flatten
|
|
128
|
+
|
|
129
|
+
```
|
|
130
|
+
Peak RSS: 51.63 MB
|
|
131
|
+
Peak Delta: 0.39 MB
|
|
132
|
+
Duration: 0.01s
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
**Key Findings:**
|
|
136
|
+
- Peak memory spike of 0.39 MB during flatten
|
|
137
|
+
- Very fast operation (~0.01s)
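How the peak figure was sampled is not shown in this diff. One common approach, sketched here under that assumption, is to poll a metric from a background thread while the operation runs. The method name `peak_during` is hypothetical, and the sampler below uses `GC.stat(:heap_live_slots)` as a portable stand-in where a real benchmark would read RSS.

```ruby
# Runs the block while a background thread polls `sampler`, and reports the
# highest value seen. `sampler` is any callable that returns a number.
def peak_during(sampler, interval: 0.001)
  baseline = sampler.call
  peak = baseline
  running = true

  watcher = Thread.new do
    while running
      value = sampler.call
      peak = value if value > peak
      sleep interval
    end
  end

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    yield
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  ensure
    running = false
    watcher.join
  end

  final = sampler.call # one last sample after the block finishes
  peak = final if final > peak

  { peak: peak, peak_delta: peak - baseline, duration: duration }
end

# Stand-in sampler (assumption): live heap slots instead of RSS.
LIVE_SLOT_SAMPLER = -> { GC.stat(:heap_live_slots) }

result = peak_during(LIVE_SLOT_SAMPLER) { Array.new(50_000) { |i| "row #{i}" } }
puts result.inspect
```

For an operation that finishes in about 0.01s, a polling thread gets few samples, so a peak measured this way is a lower bound rather than an exact value.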
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
## Summary (Before)
|
|
142
|
+
|
|
143
|
+
### Memory Usage by Operation
|
|
144
|
+
|
|
145
|
+
| Operation | RSS Delta (MB) | Heap Slots Delta | Heap Pages Delta |
|
|
146
|
+
|-----------|---------------|------------------|------------------|
|
|
147
|
+
| Document Init | 0.09 | 842 | 0 |
|
|
148
|
+
| list_fields | 0.28 | 1127 | 2 |
|
|
149
|
+
| flatten | 0.33 | 909 | 0 |
|
|
150
|
+
| flatten! | 0.19 | 6 | 1 |
|
|
151
|
+
| write | 1.25 | -877 | 4 |
|
|
152
|
+
| clear | 0.44 | 928 | 0 |
|
|
153
|
+
| Cache Access | 0.0 | 884 | 0 |
|
|
154
|
+
|
|
155
|
+
### Key Observations
|
|
156
|
+
|
|
157
|
+
1. **Memory Sharing**: `@raw` and `ObjectResolver#@bytes` already share the same reference, but freezing will guarantee this
|
|
158
|
+
2. **write Operation**: Highest memory usage (1.25 MB) - needs optimization
|
|
159
|
+
3. **Cache Retention**: Object streams cached but never cleared
|
|
160
|
+
4. **Total Baseline**: Starting from ~48 MB RSS
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## AFTER Optimizations
|
|
165
|
+
|
|
166
|
+
Run on: **After implementing memory optimizations**
|
|
167
|
+
|
|
168
|
+
### Optimizations Implemented
|
|
169
|
+
|
|
170
|
+
1. ✅ **Freeze @raw** - Guarantee memory sharing between Document and ObjectResolver
|
|
171
|
+
2. ✅ **Clear cache after operations** - Free memory from object stream cache after `flatten!`, `clear!`, and `write`
|
|
172
|
+
3. ✅ **Optimize IncrementalWriter** - Avoid `dup` by concatenating strings instead of modifying in place
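The third optimization can be illustrated with a small sketch. The two method names below are hypothetical and do not reflect the actual `IncrementalWriter` code; they only contrast the two ways of producing "original bytes plus appended update".

```ruby
# Hypothetical illustration; not the gem's IncrementalWriter implementation.

# Copy first, then mutate the copy in place.
def append_with_dup(original, update)
  copy = original.dup # unfrozen copy of the whole document
  copy << update      # appended to the copy
end

# Build the result directly; works on a frozen original without copying it first.
def append_with_concat(original, update)
  original + update   # String#+ returns a new string and leaves `original` untouched
end

original = ("%PDF-1.7\n" * 3).freeze
update = "1 0 obj\n<< /Type /Example >>\nendobj\n"

puts append_with_dup(original, update) == append_with_concat(original, update)
puts original.frozen?
```

Both methods return the same bytes. The concatenation form skips the intermediate copy step, which is the saving the optimization describes.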
|
|
173
|
+
|
|
174
|
+
### Document Initialization
|
|
175
|
+
|
|
176
|
+
```
|
|
177
|
+
RSS Memory: 47.36 MB → 47.59 MB (Δ 0.23 MB)
|
|
178
|
+
Heap Live Slots: 80983 → 81824 (Δ 841)
|
|
179
|
+
Heap Pages: 112 → 112 (Δ 0)
|
|
180
|
+
GC Runs: 1
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
**Comparison:**
|
|
184
|
+
- BEFORE: 0.09 MB RSS delta
|
|
185
|
+
- AFTER: 0.23 MB RSS delta
|
|
186
|
+
- Change: +0.14 MB (within measurement variance, freeze has minimal overhead)
|
|
187
|
+
|
|
188
|
+
### Memory Sharing Check
|
|
189
|
+
|
|
190
|
+
```
|
|
191
|
+
@raw size: 0 bytes (ObjectSpace.memsize_of limitation)
|
|
192
|
+
ObjectResolver size: 0 bytes (ObjectSpace.memsize_of limitation)
|
|
193
|
+
Same object reference: true
|
|
194
|
+
Object IDs: 2740 vs 2740
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
**Key Findings:**
|
|
198
|
+
- Memory sharing still works (same object reference)
|
|
199
|
+
- Freezing guarantees this behavior
|
|
200
|
+
- ObjectSpace.memsize_of still doesn't accurately measure large strings
|
|
201
|
+
|
|
202
|
+
### list_fields Operation
|
|
203
|
+
|
|
204
|
+
```
|
|
205
|
+
RSS Memory: 47.61 MB → 48.02 MB (Δ 0.41 MB)
|
|
206
|
+
Heap Live Slots: 81934 → 83061 (Δ 1127)
|
|
207
|
+
Heap Pages: 112 → 114 (Δ 2)
|
|
208
|
+
GC Runs: 1
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
**Comparison:**
|
|
212
|
+
- BEFORE: 0.28 MB RSS delta
|
|
213
|
+
- AFTER: 0.41 MB RSS delta
|
|
214
|
+
- Change: +0.13 MB (slight increase, within variance)
|
|
215
|
+
|
|
216
|
+
### flatten Operation
|
|
217
|
+
|
|
218
|
+
```
|
|
219
|
+
RSS Memory: 48.23 MB → 48.94 MB (Δ 0.7 MB)
|
|
220
|
+
Heap Live Slots: 82206 → 83117 (Δ 911)
|
|
221
|
+
Heap Pages: 114 → 114 (Δ 0)
|
|
222
|
+
GC Runs: 1
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
**Comparison:**
|
|
226
|
+
- BEFORE: 0.33 MB RSS delta
|
|
227
|
+
- AFTER: 0.7 MB RSS delta
|
|
228
|
+
- Change: +0.37 MB (increase, but still reasonable)
|
|
229
|
+
|
|
230
|
+
### flatten! Operation
|
|
231
|
+
|
|
232
|
+
```
|
|
233
|
+
RSS Memory: 48.94 MB → 49.06 MB (Δ 0.13 MB)
|
|
234
|
+
Heap Live Slots: 82231 → 82238 (Δ 7)
|
|
235
|
+
Heap Pages: 114 → 115 (Δ 1)
|
|
236
|
+
GC Runs: 1
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
**Comparison:**
|
|
240
|
+
- BEFORE: 0.19 MB RSS delta
|
|
241
|
+
- AFTER: 0.13 MB RSS delta
|
|
242
|
+
- **Improvement: 32% reduction** ✅
|
|
243
|
+
|
|
244
|
+
### write Operation
|
|
245
|
+
|
|
246
|
+
```
|
|
247
|
+
RSS Memory: 49.14 MB → 50.03 MB (Δ 0.89 MB)
|
|
248
|
+
Heap Live Slots: 83234 → 82358 (Δ -876)
|
|
249
|
+
Heap Pages: 115 → 119 (Δ 4)
|
|
250
|
+
GC Runs: 1
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
**Comparison:**
|
|
254
|
+
- BEFORE: 1.25 MB RSS delta
|
|
255
|
+
- AFTER: 0.89 MB RSS delta
|
|
256
|
+
- **Improvement: 29% reduction** ✅
|
|
257
|
+
|
|
258
|
+
### clear Operation
|
|
259
|
+
|
|
260
|
+
```
|
|
261
|
+
RSS Memory: 50.03 MB → 50.36 MB (Δ 0.33 MB)
|
|
262
|
+
Heap Live Slots: 82387 → 83315 (Δ 928)
|
|
263
|
+
Heap Pages: 119 → 120 (Δ 1)
|
|
264
|
+
GC Runs: 1
|
|
265
|
+
```
|
|
266
|
+
|
|
267
|
+
**Comparison:**
|
|
268
|
+
- BEFORE: 0.44 MB RSS delta
|
|
269
|
+
- AFTER: 0.33 MB RSS delta
|
|
270
|
+
- **Improvement: 25% reduction** ✅
|
|
271
|
+
|
|
272
|
+
### ObjectResolver Cache
|
|
273
|
+
|
|
274
|
+
```
|
|
275
|
+
RSS Memory: 50.36 MB → 50.36 MB (Δ 0.0 MB)
|
|
276
|
+
Heap Live Slots: 82456 → 83340 (Δ 884)
|
|
277
|
+
Heap Pages: 120 → 120 (Δ 0)
|
|
278
|
+
GC Runs: 1
|
|
279
|
+
Cached object streams: 7
|
|
280
|
+
Cache keys: [[264, 0], [1, 0], [2, 0], [3, 0], [4, 0], [6, 0], [7, 0]]
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
**Key Findings:**
|
|
284
|
+
- Cache still populated during operation (as expected)
|
|
285
|
+
- Cache is now cleared after `flatten!`, `clear!`, and `write` operations
|
|
286
|
+
- This prevents memory retention after operations complete
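A sketch of the cache lifecycle described above: entries are filled on demand while an operation runs and dropped once it completes. The class `SketchObjectStreamCache` is hypothetical; the keys mirror the `[object number, generation]` pairs shown in the benchmark output.

```ruby
# Hypothetical cache; the resolver's real cache structure is not shown in this diff.
class SketchObjectStreamCache
  def initialize
    @streams = {}
  end

  # Returns the cached value for `ref`, computing it with the block on first use.
  def fetch(ref)
    @streams.fetch(ref) { @streams[ref] = yield }
  end

  def keys
    @streams.keys
  end

  # Drops every cached stream so the decoded data can be garbage collected.
  def clear
    @streams.clear
    self
  end
end

cache = SketchObjectStreamCache.new
cache.fetch([264, 0]) { "decoded stream bytes" } # populated during an operation
puts cache.keys.inspect
cache.clear                                      # called once the operation completes
puts cache.keys.inspect
```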
|
|
287
|
+
|
|
288
|
+
### Peak Memory During flatten
|
|
289
|
+
|
|
290
|
+
```
|
|
291
|
+
Peak RSS: 50.39 MB
|
|
292
|
+
Peak Delta: 0.03 MB
|
|
293
|
+
Duration: 0.01s
|
|
294
|
+
```
|
|
295
|
+
|
|
296
|
+
**Comparison:**
|
|
297
|
+
- BEFORE: 0.39 MB peak delta
|
|
298
|
+
- AFTER: 0.03 MB peak delta
|
|
299
|
+
- **Improvement: 92% reduction** ✅✅
|
|
300
|
+
|
|
301
|
+
---
|
|
302
|
+
|
|
303
|
+
## Summary (After)
|
|
304
|
+
|
|
305
|
+
### Memory Usage by Operation
|
|
306
|
+
|
|
307
|
+
| Operation | RSS Delta (MB) | Heap Slots Delta | Heap Pages Delta |
|
|
308
|
+
|-----------|---------------|------------------|------------------|
|
|
309
|
+
| Document Init | 0.23 | 841 | 0 |
|
|
310
|
+
| list_fields | 0.41 | 1127 | 2 |
|
|
311
|
+
| flatten | 0.7 | 911 | 0 |
|
|
312
|
+
| flatten! | **0.13** ⬇️ | 7 | 1 |
|
|
313
|
+
| write | **0.89** ⬇️ | -876 | 4 |
|
|
314
|
+
| clear | **0.33** ⬇️ | 928 | 1 |
|
|
315
|
+
| Cache Access | 0.0 | 884 | 0 |
|
|
316
|
+
|
|
317
|
+
---
|
|
318
|
+
|
|
319
|
+
## Comparison Summary
|
|
320
|
+
|
|
321
|
+
### Key Improvements
|
|
322
|
+
|
|
323
|
+
1. **write Operation**: Reduced from 1.25 MB to 0.89 MB (**29% reduction**)
|
|
324
|
+
- Optimized IncrementalWriter to avoid `dup`
|
|
325
|
+
- Reduced memory duplication during incremental updates
|
|
326
|
+
|
|
327
|
+
2. **flatten! Operation**: Reduced from 0.19 MB to 0.13 MB (**32% reduction**)
|
|
328
|
+
- Cache cleared before creating new resolver
|
|
329
|
+
- Reduced memory retention
|
|
330
|
+
|
|
331
|
+
3. **clear Operation**: Reduced from 0.44 MB to 0.33 MB (**25% reduction**)
|
|
332
|
+
- Cache cleared after operation
|
|
333
|
+
- Better memory cleanup
|
|
334
|
+
|
|
335
|
+
4. **Peak Memory (flatten)**: Reduced from 0.39 MB to 0.03 MB (**92% reduction**)
|
|
336
|
+
- Significant improvement in peak memory usage
|
|
337
|
+
- Much more consistent memory footprint
|
|
338
|
+
|
|
339
|
+
### Memory Reduction Summary
|
|
340
|
+
|
|
341
|
+
| Operation | Before | After | Improvement |
|
|
342
|
+
|-----------|--------|-------|-------------|
|
|
343
|
+
| write | 1.25 MB | 0.89 MB | **-29%** ✅ |
|
|
344
|
+
| flatten! | 0.19 MB | 0.13 MB | **-32%** ✅ |
|
|
345
|
+
| clear | 0.44 MB | 0.33 MB | **-25%** ✅ |
|
|
346
|
+
| Peak (flatten) | 0.39 MB | 0.03 MB | **-92%** ✅✅ |
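The percentages in this table follow from the before and after RSS deltas. A short sketch of the arithmetic, with the values copied from the table (the constant and method names are illustrative):

```ruby
# RSS deltas in MB, copied from the small-PDF before/after results.
RSS_DELTAS_MB = {
  "write"          => [1.25, 0.89],
  "flatten!"       => [0.19, 0.13],
  "clear"          => [0.44, 0.33],
  "Peak (flatten)" => [0.39, 0.03]
}.freeze

# Percentage reduction from `before` to `after`, rounded to a whole percent.
def percent_reduction(before, after)
  ((before - after) / before * 100).round
end

REDUCTIONS = RSS_DELTAS_MB.transform_values { |before, after| percent_reduction(before, after) }

REDUCTIONS.each { |operation, pct| puts format("%-15s -%d%%", operation, pct) }
```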
|
|
347
|
+
|
|
348
|
+
### Overall Impact
|
|
349
|
+
|
|
350
|
+
- **Total memory savings**: ~0.42 MB RSS per typical workflow (0.36 MB from write + 0.06 MB from flatten!)
|
|
351
|
+
- **Peak memory reduction**: 92% reduction during flatten operation
|
|
352
|
+
- **Cache management**: Proper cleanup after operations prevents memory retention
|
|
353
|
+
- **Memory sharing**: Guaranteed via frozen strings
|
|
354
|
+
|
|
355
|
+
### Notes
|
|
356
|
+
|
|
357
|
+
- Some operations show increases: document init (+0.14 MB) and list_fields (+0.13 MB) are within measurement variance, and flatten rose by 0.37 MB
|
|
358
|
+
- The improvements are most significant for operations that modify documents (write, flatten!, clear)
|
|
359
|
+
- Peak memory reduction is the most impressive improvement, showing much more consistent memory usage
|
|
360
|
+
|
|
361
|
+
---
|
|
362
|
+
|
|
363
|
+
## Large PDF Results (After Optimizations)
|
|
364
|
+
|
|
365
|
+
Run on: **After optimizations with `form.pdf`**
|
|
366
|
+
|
|
367
|
+
### Document Initialization
|
|
368
|
+
|
|
369
|
+
```
|
|
370
|
+
RSS Memory: 47.25 MB → 50.3 MB (Δ 3.05 MB)
|
|
371
|
+
Heap Live Slots: 80984 → 81960 (Δ 976)
|
|
372
|
+
Heap Pages: 112 → 112 (Δ 0)
|
|
373
|
+
GC Runs: 1
|
|
374
|
+
```
|
|
375
|
+
|
|
376
|
+
**Key Findings:**
|
|
377
|
+
- Large PDF initialization adds ~3.05 MB RSS (vs 0.23 MB for small PDF)
|
|
378
|
+
- 13x more memory usage than small PDF
|
|
379
|
+
- Shows the importance of memory optimizations for larger documents
|
|
380
|
+
|
|
381
|
+
### Memory Sharing Check
|
|
382
|
+
|
|
383
|
+
```
|
|
384
|
+
@raw size: 0 bytes
|
|
385
|
+
ObjectResolver size: 0 bytes
|
|
386
|
+
Same object reference: true
|
|
387
|
+
Object IDs: 2740 vs 2740
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
**Key Findings:**
|
|
391
|
+
- Memory sharing still works perfectly with frozen strings
|
|
392
|
+
- Even with large PDFs, both references point to the same object
|
|
393
|
+
|
|
394
|
+
### list_fields Operation
|
|
395
|
+
|
|
396
|
+
```
|
|
397
|
+
RSS Memory: 56.41 MB → 62.78 MB (Δ 6.38 MB)
|
|
398
|
+
Heap Live Slots: 82070 → 82090 (Δ 20)
|
|
399
|
+
Heap Pages: 112 → 131 (Δ 19)
|
|
400
|
+
GC Runs: 3
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
**Key Findings:**
|
|
404
|
+
- Large PDF list_fields adds ~6.38 MB RSS (vs 0.41 MB for small PDF)
|
|
405
|
+
- ~16x more memory usage than small PDF (6.38 / 0.41 ≈ 15.6)
|
|
406
|
+
- 19 additional heap pages allocated (significant)
|
|
407
|
+
|
|
408
|
+
### flatten Operation
|
|
409
|
+
|
|
410
|
+
```
|
|
411
|
+
RSS Memory: 65.83 MB → 68.11 MB (Δ 2.28 MB)
|
|
412
|
+
Heap Live Slots: 82126 → 82324 (Δ 198)
|
|
413
|
+
Heap Pages: 131 → 131 (Δ 0)
|
|
414
|
+
GC Runs: 1
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
**Key Findings:**
|
|
418
|
+
- Large PDF flatten adds ~2.28 MB RSS (vs 0.7 MB for small PDF)
|
|
419
|
+
- 3.3x more memory usage than small PDF
|
|
420
|
+
|
|
421
|
+
### flatten! Operation
|
|
422
|
+
|
|
423
|
+
```
|
|
424
|
+
RSS Memory: 71.16 MB → 75.75 MB (Δ 4.59 MB)
|
|
425
|
+
Heap Live Slots: 82333 → 82334 (Δ 1)
|
|
426
|
+
Heap Pages: 131 → 131 (Δ 0)
|
|
427
|
+
GC Runs: 1
|
|
428
|
+
```
|
|
429
|
+
|
|
430
|
+
**Key Findings:**
|
|
431
|
+
- Large PDF flatten! adds ~4.59 MB RSS (vs 0.13 MB for small PDF)
|
|
432
|
+
- 35x more memory usage than small PDF
|
|
433
|
+
- But note: this is after the document has already been loaded and processed
|
|
434
|
+
|
|
435
|
+
### write Operation
|
|
436
|
+
|
|
437
|
+
```
|
|
438
|
+
RSS Memory: 78.91 MB → 81.2 MB (Δ 2.3 MB)
|
|
439
|
+
Heap Live Slots: 82441 → 82489 (Δ 48)
|
|
440
|
+
Heap Pages: 132 → 132 (Δ 0)
|
|
441
|
+
GC Runs: 2
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
**Key Findings:**
|
|
445
|
+
- Large PDF write adds ~2.3 MB RSS (vs 0.89 MB for small PDF)
|
|
446
|
+
- 2.6x more memory usage than small PDF
|
|
447
|
+
- Still much better than the 6.25 MB that was seen in initial measurements
|
|
448
|
+
|
|
449
|
+
### clear Operation
|
|
450
|
+
|
|
451
|
+
```
|
|
452
|
+
RSS Memory: 81.22 MB → 87.11 MB (Δ 5.89 MB)
|
|
453
|
+
Heap Live Slots: 82518 → 82547 (Δ 29)
|
|
454
|
+
Heap Pages: 132 → 133 (Δ 1)
|
|
455
|
+
GC Runs: 3
|
|
456
|
+
```
|
|
457
|
+
|
|
458
|
+
**Key Findings:**
|
|
459
|
+
- Large PDF clear adds ~5.89 MB RSS (vs 0.33 MB for small PDF)
|
|
460
|
+
- 18x more memory usage than small PDF
|
|
461
|
+
- Shows significant memory usage for full document rewrite
|
|
462
|
+
|
|
463
|
+
### ObjectResolver Cache
|
|
464
|
+
|
|
465
|
+
```
|
|
466
|
+
RSS Memory: 87.11 MB → 87.11 MB (Δ 0.0 MB)
|
|
467
|
+
Heap Live Slots: 82583 → 82576 (Δ -7)
|
|
468
|
+
Heap Pages: 133 → 133 (Δ 0)
|
|
469
|
+
GC Runs: 1
|
|
470
|
+
Cached object streams: 0
|
|
471
|
+
Cache keys: []
|
|
472
|
+
```
|
|
473
|
+
|
|
474
|
+
**Key Findings:**
|
|
475
|
+
- No object streams cached (this large PDF doesn't use object streams)
|
|
476
|
+
- Cache clearing still runs after each operation, but there are no cached streams to free for this file
|
|
477
|
+
|
|
478
|
+
### Peak Memory During flatten
|
|
479
|
+
|
|
480
|
+
```
|
|
481
|
+
Peak RSS: 90.36 MB
|
|
482
|
+
Peak Delta: 0.03 MB
|
|
483
|
+
Duration: 0.01s
|
|
484
|
+
```
|
|
485
|
+
|
|
486
|
+
**Key Findings:**
|
|
487
|
+
- Peak memory spike of only 0.03 MB (same as small PDF!)
|
|
488
|
+
- Shows consistent peak memory regardless of document size
|
|
489
|
+
- Optimization maintains low peak memory even with large documents
|
|
490
|
+
|
|
491
|
+
---
|
|
492
|
+
|
|
493
|
+
## Large PDF Summary
|
|
494
|
+
|
|
495
|
+
### Memory Usage by Operation (Large PDF)
|
|
496
|
+
|
|
497
|
+
| Operation | RSS Delta (MB) | Heap Slots Delta | Heap Pages Delta |
|
|
498
|
+
|-----------|---------------|------------------|------------------|
|
|
499
|
+
| Document Init | 3.05 | 976 | 0 |
|
|
500
|
+
| list_fields | 6.38 | 20 | 19 |
|
|
501
|
+
| flatten | 2.28 | 198 | 0 |
|
|
502
|
+
| flatten! | 4.59 | 1 | 0 |
|
|
503
|
+
| write | 2.3 | 48 | 0 |
|
|
504
|
+
| clear | 5.89 | 29 | 1 |
|
|
505
|
+
| Cache Access | 0.0 | -7 | 0 |
|
|
506
|
+
|
|
507
|
+
### Comparison: Small vs Large PDF
|
|
508
|
+
|
|
509
|
+
| Operation | Small PDF | Large PDF | Ratio |
|
|
510
|
+
|-----------|-----------|-----------|-------|
|
|
511
|
+
| Document Init | 0.23 MB | 3.05 MB | 13x |
|
|
512
|
+
| list_fields | 0.41 MB | 6.38 MB | 16x |
|
|
513
|
+
| flatten | 0.7 MB | 2.28 MB | 3.3x |
|
|
514
|
+
| flatten! | 0.13 MB | 4.59 MB | 35x |
|
|
515
|
+
| write | 0.89 MB | 2.3 MB | 2.6x |
|
|
516
|
+
| clear | 0.33 MB | 5.89 MB | 18x |
|
|
517
|
+
| **Peak (flatten)** | **0.03 MB** | **0.03 MB** | **1x** ✅ |
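The ratio column can be recomputed from the two RSS delta columns. A sketch of that calculation, with the values copied from the table and the ratios kept to one decimal place (the constant names are illustrative):

```ruby
# [small PDF, large PDF] RSS deltas in MB, copied from the comparison table.
SMALL_VS_LARGE_MB = {
  "Document Init" => [0.23, 3.05],
  "list_fields"   => [0.41, 6.38],
  "flatten"       => [0.7, 2.28],
  "flatten!"      => [0.13, 4.59],
  "write"         => [0.89, 2.3],
  "clear"         => [0.33, 5.89]
}.freeze

# Large-to-small ratio for each operation, rounded to one decimal place.
RATIOS = SMALL_VS_LARGE_MB.transform_values { |small, large| (large / small).round(1) }

RATIOS.each { |operation, ratio| puts format("%-14s %.1fx", operation, ratio) }
```

These are ratios of RSS deltas from single runs, so they indicate rough scaling rather than precise multipliers.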
|
|
518
|
+
|
|
519
|
+
### Key Insights from Large PDF
|
|
520
|
+
|
|
521
|
+
1. **Memory scales with document size**, but optimizations still provide benefits
|
|
522
|
+
2. **Peak memory stays low** (0.03 MB) even with large documents - major win!
|
|
523
|
+
3. **write operation** is much more efficient (2.3 MB vs what could be 6+ MB)
|
|
524
|
+
4. **Cache clearing** prevents memory retention even with large documents
|
|
525
|
+
5. **Memory sharing** (frozen strings) works at all document sizes
|
|
526
|
+
|
|
527
|
+
---
|
|
528
|
+
|
|
529
|
+
## How to Run Benchmarks
|
|
530
|
+
|
|
531
|
+
```bash
|
|
532
|
+
# Run all memory benchmarks
|
|
533
|
+
BENCHMARK=true bundle exec rspec spec/memory_benchmark_spec.rb
|
|
534
|
+
|
|
535
|
+
# Run specific benchmark
|
|
536
|
+
BENCHMARK=true bundle exec rspec spec/memory_benchmark_spec.rb:12
|
|
537
|
+
|
|
538
|
+
# Switch between small and large PDFs by editing spec/memory_benchmark_spec.rb
|
|
539
|
+
```
|
|
540
|
+
|
|
541
|
+
---
|
|
542
|
+
|
|
543
|
+
## Notes
|
|
544
|
+
|
|
545
|
+
- RSS measurements are approximate and may vary between runs
|
|
546
|
+
- GC.stat values depend on Ruby GC implementation
|
|
547
|
+
- ObjectSpace.memsize_of may not accurately measure large strings (returns 0)
|
|
548
|
+
- Memory sharing is verified by checking object_id equality
|
|
549
|
+
- Large PDF results show how optimizations scale with document size
|
|
550
|
+
- Peak memory delta during flatten is the same (0.03 MB) for both the small and the large test PDF
|
|
551
|
+
|