@nomad-e/bluma-cli 0.1.17 → 0.1.19

Files changed (41)
  1. package/README.md +62 -12
  2. package/dist/config/native_tools.json +7 -0
  3. package/dist/config/skills/git-commit/LICENSE.txt +18 -0
  4. package/dist/config/skills/git-commit/SKILL.md +258 -0
  5. package/dist/config/skills/git-commit/references/REFERENCE.md +249 -0
  6. package/dist/config/skills/git-commit/scripts/validate_commit_msg.py +163 -0
  7. package/dist/config/skills/git-pr/LICENSE.txt +18 -0
  8. package/dist/config/skills/git-pr/SKILL.md +293 -0
  9. package/dist/config/skills/git-pr/references/REFERENCE.md +256 -0
  10. package/dist/config/skills/git-pr/scripts/validate_commits.py +112 -0
  11. package/dist/config/skills/pdf/LICENSE.txt +26 -0
  12. package/dist/config/skills/pdf/SKILL.md +327 -0
  13. package/dist/config/skills/pdf/references/FORMS.md +69 -0
  14. package/dist/config/skills/pdf/references/REFERENCE.md +52 -0
  15. package/dist/config/skills/pdf/scripts/create_report.py +59 -0
  16. package/dist/config/skills/pdf/scripts/merge_pdfs.py +39 -0
  17. package/dist/config/skills/skill-creator/LICENSE.txt +26 -0
  18. package/dist/config/skills/skill-creator/SKILL.md +229 -0
  19. package/dist/config/skills/xlsx/LICENSE.txt +18 -0
  20. package/dist/config/skills/xlsx/SKILL.md +298 -0
  21. package/dist/config/skills/xlsx/references/REFERENCE.md +337 -0
  22. package/dist/config/skills/xlsx/scripts/office/__init__.py +2 -0
  23. package/dist/config/skills/xlsx/scripts/office/__pycache__/__init__.cpython-312.pyc +0 -0
  24. package/dist/config/skills/xlsx/scripts/office/__pycache__/pack.cpython-312.pyc +0 -0
  25. package/dist/config/skills/xlsx/scripts/office/__pycache__/soffice.cpython-312.pyc +0 -0
  26. package/dist/config/skills/xlsx/scripts/office/__pycache__/unpack.cpython-312.pyc +0 -0
  27. package/dist/config/skills/xlsx/scripts/office/__pycache__/validate.cpython-312.pyc +0 -0
  28. package/dist/config/skills/xlsx/scripts/office/pack.py +58 -0
  29. package/dist/config/skills/xlsx/scripts/office/soffice.py +180 -0
  30. package/dist/config/skills/xlsx/scripts/office/unpack.py +63 -0
  31. package/dist/config/skills/xlsx/scripts/office/validate.py +122 -0
  32. package/dist/config/skills/xlsx/scripts/recalc.py +143 -0
  33. package/dist/main.js +275 -89
  34. package/package.json +1 -1
  35. package/dist/config/example.bluma-mcp.json.txt +0 -14
  36. package/dist/config/models_config.json +0 -78
  37. package/dist/skills/git-conventional/LICENSE.txt +0 -3
  38. package/dist/skills/git-conventional/SKILL.md +0 -83
  39. package/dist/skills/skill-creator/SKILL.md +0 -495
  40. package/dist/skills/testing/LICENSE.txt +0 -3
  41. package/dist/skills/testing/SKILL.md +0 -114
@@ -0,0 +1,256 @@
1
+ # Git PR — Advanced Patterns Reference
2
+
3
+ ## Monorepo Pull Requests
4
+
5
+ When working in a monorepo, PRs should be scoped to a single package or
6
+ service when possible.
7
+
8
+ ### Title Convention
9
+
10
+ ```
11
+ feat(packages/auth): add PKCE support for OAuth2 flow
12
+ ```
13
+
14
+ ### Description Structure
15
+
16
+ Add a **Scope** section after Description:
17
+
18
+ ```markdown
19
+ ## Scope
20
+
21
+ This PR affects only the `packages/auth` module. No changes to other
22
+ packages.
23
+
24
+ ### Packages Modified
25
+ - `packages/auth` — Core changes
26
+ - `packages/shared-types` — Updated `AuthConfig` interface
27
+ ```
28
+
29
+ ### Cross-Package Dependencies
30
+
31
+ If the PR touches multiple packages, list the dependency graph:
32
+
33
+ ````markdown
34
+ ## Cross-Package Impact
35
+
36
+ ```
37
+ packages/auth (changed)
38
+ └── packages/shared-types (interface update)
39
+ └── packages/api (consumer — no code changes, recompile needed)
40
+ ```
41
+ ````
42
+
43
+ ## Release Pull Requests
44
+
45
+ Release PRs aggregate multiple feature/fix PRs into a versioned release.
46
+
47
+ ### Title Format
48
+
49
+ ```
50
+ chore(release): v2.4.0
51
+ ```
52
+
53
+ ### Body Template Addition
54
+
55
+ Add a **Changelog** section:
56
+
57
+ ```markdown
58
+ ## Changelog
59
+
60
+ ### Features
61
+ - feat(auth): add JWT refresh token rotation (#123)
62
+ - feat(pdf): add CSV-to-PDF report generation (#125)
63
+
64
+ ### Bug Fixes
65
+ - fix(agent): handle empty tool responses gracefully (#124)
66
+
67
+ ### Breaking Changes
68
+ - BREAKING: `AuthConfig.tokenExpiry` renamed to `AuthConfig.accessTokenTTL` (#123)
69
+
70
+ ### Migration Guide
71
+ 1. Update all references to `tokenExpiry` → `accessTokenTTL`
72
+ 2. Set `refreshTokenTTL` (new required field, default: 7d)
73
+ ```
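The Changelog section above groups entries by their Conventional Commits type. A minimal Python sketch of that grouping follows, assuming the commit subjects are already available as strings. `group_changelog`, `HEADINGS`, and the use of a `!` marker to detect breaking changes are illustrative assumptions, not part of BluMa.

```python
import re
from collections import defaultdict

# Hypothetical mapping from commit type to changelog heading.
HEADINGS = {"feat": "Features", "fix": "Bug Fixes"}

# type, optional (scope), optional "!" marker, then ": "
SUBJECT = re.compile(r"^(?P<type>[a-z]+)(\([^)]+\))?(?P<breaking>!)?:\s")


def group_changelog(subjects):
    """Group commit subjects under changelog headings by their type prefix."""
    sections = defaultdict(list)
    for subject in subjects:
        match = SUBJECT.match(subject)
        if not match:
            continue  # skip subjects that are not Conventional Commits
        if match.group("breaking"):
            sections["Breaking Changes"].append(subject)
        heading = HEADINGS.get(match.group("type"))
        if heading:
            sections[heading].append(subject)
    return dict(sections)


changelog = group_changelog([
    "feat(auth): add JWT refresh token rotation (#123)",
    "fix(agent): handle empty tool responses gracefully (#124)",
    "feat(pdf): add CSV-to-PDF report generation (#125)",
])
```

Types without a heading in `HEADINGS` (for example `chore`) are dropped here; a real release note might list them under a catch-all section instead.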
74
+
75
+ ## Hotfix Pull Requests
76
+
77
+ Hotfixes bypass the normal development flow and go directly to production.
78
+
79
+ ### Branch Naming
80
+
81
+ ```
82
+ hotfix/<issue-id>-<short-description>
83
+ ```
84
+
85
+ Example: `hotfix/456-fix-auth-crash`
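The branch naming rule above can be checked mechanically. The sketch below is one possible reading of `hotfix/<issue-id>-<short-description>`: it assumes a numeric issue id and a lowercase kebab-case description, neither of which the reference states explicitly. `parse_hotfix_branch` is a hypothetical helper.

```python
import re

# Assumption: numeric issue id, lowercase kebab-case description.
HOTFIX_BRANCH = re.compile(
    r"^hotfix/(?P<issue>\d+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_hotfix_branch(name):
    """Return (issue_id, description) for a hotfix branch, or None if it does not match."""
    match = HOTFIX_BRANCH.match(name)
    if match is None:
        return None
    return int(match["issue"]), match["slug"]
```

For the example branch, `parse_hotfix_branch("hotfix/456-fix-auth-crash")` yields the issue id `456` and the description `fix-auth-crash`.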
86
+
87
+ ### Title Format
88
+
89
+ ```
90
+ fix(auth): prevent null pointer on expired session [HOTFIX]
91
+ ```
92
+
93
+ ### Additional Body Sections
94
+
95
+ ```markdown
96
+ ## Hotfix Justification
97
+
98
+ **Severity**: Critical / High / Medium
99
+ **Impact**: {describe user impact}
100
+ **Root Cause**: {brief root cause analysis}
101
+ **Temporary or Permanent**: {is this a permanent fix or a stopgap?}
102
+
103
+ ## Rollback Plan
104
+
105
+ 1. Revert commit {hash}
106
+ 2. Redeploy previous version {tag}
107
+ 3. Verify via {health check endpoint}
108
+ ```
109
+
110
+ ## PR Labels
111
+
112
+ Suggest appropriate labels based on the change type:
113
+
114
+ | Change Type | Suggested Labels |
115
+ |-------------|-----------------|
116
+ | Bug fix | `bug`, `priority:high` (if critical) |
117
+ | New feature | `enhancement`, `feature` |
118
+ | Breaking change | `breaking-change`, `major` |
119
+ | Documentation | `documentation` |
120
+ | Refactoring | `refactoring`, `tech-debt` |
121
+ | Performance | `performance`, `optimization` |
122
+ | Security | `security`, `priority:critical` |
123
+ | Dependencies | `dependencies`, `chore` |
124
+
125
+ When using `gh pr create`, add labels:
126
+
127
+ ```bash
128
+ gh pr create \
129
+ --title "feat(auth): add MFA support" \
130
+ --label "enhancement" \
131
+ --label "feature" \
132
+ --body "..."
133
+ ```
134
+
135
+ ## Reviewer Assignment
136
+
137
+ Suggest reviewers based on files changed:
138
+
139
+ ```bash
140
+ # Find who has most commits in the changed files
141
+ git log --format='%an' -- <changed-files> | sort | uniq -c | sort -rn | head -5
142
+ ```
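The same ranking can be done in Python once the author names are captured. This sketch assumes `git_log_output` holds the text printed by `git log --format='%an' -- <changed-files>`, one author name per line; `rank_authors` is a hypothetical helper that mirrors the `sort | uniq -c | sort -rn | head -5` pipeline.

```python
from collections import Counter


def rank_authors(git_log_output, limit=5):
    """Count commits per author and return the top `limit` as (name, count) pairs."""
    names = [line.strip() for line in git_log_output.splitlines() if line.strip()]
    return Counter(names).most_common(limit)
```

The result is a list of suggestions only; someone with many commits to a file may no longer be the right reviewer for it.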
143
+
144
+ Add reviewers when creating the PR:
145
+
146
+ ```bash
147
+ gh pr create \
148
+ --title "..." \
149
+ --reviewer "username1,username2" \
150
+ --body "..."
151
+ ```
152
+
153
+ ## Draft Pull Requests
154
+
155
+ For work-in-progress changes, create draft PRs:
156
+
157
+ ```bash
158
+ gh pr create \
159
+ --title "feat(agent): implement streaming responses" \
160
+ --draft \
161
+ --body "..."
162
+ ```
163
+
164
+ Add a **Status** section to draft PR bodies:
165
+
166
+ ```markdown
167
+ ## Status
168
+
169
+ 🚧 **Work in Progress** — Do not merge
170
+
171
+ ### Completed
172
+ - [x] Core streaming implementation
173
+ - [x] Token-by-token output
174
+
175
+ ### Remaining
176
+ - [ ] Error handling for stream interruption
177
+ - [ ] Unit tests
178
+ - [ ] Documentation update
179
+ ```
180
+
181
+ ## Stacked Pull Requests
182
+
183
+ For large features, break into smaller, reviewable PRs:
184
+
185
+ ### Naming Convention
186
+
187
+ ```
188
+ feat/auth-1-data-models
189
+ feat/auth-2-api-endpoints
190
+ feat/auth-3-frontend-ui
191
+ ```
192
+
193
+ ### Body Addition
194
+
195
+ ```markdown
196
+ ## PR Stack
197
+
198
+ This is PR **2 of 3** in the auth feature stack.
199
+
200
+ | # | PR | Status |
201
+ |---|---|--------|
202
+ | 1 | #100 — Data models | ✅ Merged |
203
+ | 2 | #101 — API endpoints (this PR) | 🔄 In Review |
204
+ | 3 | #102 — Frontend UI | ⏳ Blocked on #101 |
205
+
206
+ **Base branch**: `feat/auth-1-data-models` (not `main`)
207
+ ```
208
+
209
+ ## Commit Squashing Strategy
210
+
211
+ Before creating a PR, consider squashing noisy commits:
212
+
213
+ ### When to Squash
214
+ - Multiple "fix typo" or "WIP" commits
215
+ - Commits that only fix linting or formatting
216
+ - Commits that undo then redo changes
217
+
218
+ ### When NOT to Squash
219
+ - Each commit represents a logical, reviewable unit
220
+ - Commit history tells a useful story of the implementation
221
+ - Different commits affect different modules
222
+
223
+ ### Interactive Rebase (manual, not for BluMa to run automatically)
224
+
225
+ ```bash
226
+ git rebase -i main
227
+ # Mark commits as 'squash' or 'fixup'
228
+ ```
229
+
230
+ Note: BluMa should NOT automatically rebase or squash. Instead, suggest
231
+ it to the user when appropriate.
232
+
233
+ ## Multi-Issue PRs
234
+
235
+ When a PR addresses multiple issues:
236
+
237
+ ```markdown
238
+ ## Related Issues
239
+
240
+ Closes #123
241
+ Closes #125
242
+ Refs #130 (partial — remaining work tracked separately)
243
+ ```
244
+
245
+ ## PR Size Guidelines
246
+
247
+ | Size | Files Changed | Lines Changed | Review Time |
248
+ |------|--------------|---------------|-------------|
249
+ | XS | 1-2 | < 50 | 10 min |
250
+ | S | 3-5 | 50-200 | 30 min |
251
+ | M | 6-10 | 200-500 | 1 hour |
252
+ | L | 11-20 | 500-1000 | 2+ hours |
253
+ | XL | 20+ | 1000+ | Split recommended |
254
+
255
+ If a PR exceeds **L** size, BluMa should suggest splitting it into
256
+ stacked PRs (see above).
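The size table can be turned into a small classifier. The sketch below is one interpretation, not BluMa's implementation: the table's ranges overlap at their edges (for example 200 lines appears in both S and M), so it assumes upper bounds are inclusive and takes the larger of the file-count and line-count buckets. All names are hypothetical.

```python
SIZES = ["XS", "S", "M", "L", "XL"]

# Upper bounds taken from the table; treating them as inclusive is an assumption.
FILE_LIMITS = [2, 5, 10, 20]
LINE_LIMITS = [49, 200, 500, 1000]


def _bucket(value, limits):
    """Return the index of the first limit that value does not exceed."""
    for index, limit in enumerate(limits):
        if value <= limit:
            return index
    return len(limits)


def classify_pr_size(files_changed, lines_changed):
    """Classify a PR by the larger of its file-count and line-count buckets."""
    index = max(_bucket(files_changed, FILE_LIMITS),
                _bucket(lines_changed, LINE_LIMITS))
    return SIZES[index]


def should_suggest_split(files_changed, lines_changed):
    """True when the PR exceeds L size, that is, when it lands in XL."""
    return classify_pr_size(files_changed, lines_changed) == "XL"
```

Taking the larger bucket means a PR touching 4 files but 600 lines is treated as L rather than S, which errs on the side of longer review time.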
@@ -0,0 +1,112 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Validate that commits in a given range follow Conventional Commits format
4
+ and include the BluMa watermark trailer.
5
+
6
+ Usage:
7
+     python validate_commits.py [base_branch]
8
+
9
+ Arguments:
10
+     base_branch    Target branch to compare against (default: main)
11
+
12
+ Exit codes:
13
+     0    All commits are valid
14
+     1    One or more commits have issues
15
+ """
16
+
17
+ import subprocess
18
+ import sys
19
+ import re
20
+
21
+ CONVENTIONAL_PATTERN = re.compile(
22
+     r'^(feat|fix|refactor|docs|test|perf|style|chore|security|revert)'
23
+     r'(\([a-zA-Z0-9_/.-]+\))?'
24
+     r'!?:\s.{1,72}$'
25
+ )
26
+
27
+ WATERMARK = "Generated-by: BluMa"
28
+
29
+
30
+ def get_commits(base: str) -> list[dict]:
31
+     result = subprocess.run(
32
+         ["git", "log", f"{base}..HEAD", "--format=%H|||%s|||%b%n---END---"],
33
+         capture_output=True, text=True
34
+     )
35
+     if result.returncode != 0:
36
+         print(f"Error running git log: {result.stderr.strip()}")
37
+         sys.exit(1)
38
+
39
+     raw = result.stdout.strip()
40
+     if not raw:
41
+         return []
42
+
43
+     commits = []
44
+     for block in raw.split("---END---"):
45
+         block = block.strip()
46
+         if not block:
47
+             continue
48
+         parts = block.split("|||", 2)
49
+         if len(parts) >= 2:
50
+             commits.append({
51
+                 "hash": parts[0].strip()[:8],
52
+                 "subject": parts[1].strip(),
53
+                 "body": parts[2].strip() if len(parts) > 2 else ""
54
+             })
55
+     return commits
56
+
57
+
58
+ def validate(commits: list[dict]) -> list[str]:
59
+     issues = []
60
+
61
+     for c in commits:
62
+         h = c["hash"]
63
+         subj = c["subject"]
64
+
65
+         if not CONVENTIONAL_PATTERN.match(subj):
66
+             issues.append(
67
+                 f" [{h}] Subject does not follow Conventional Commits: \"{subj}\""
68
+             )
69
+
70
+         if len(subj) > 72:
71
+             issues.append(
72
+                 f" [{h}] Subject exceeds 72 chars ({len(subj)}): \"{subj}\""
73
+             )
74
+
75
+         if WATERMARK not in c["body"] and WATERMARK not in subj:
76
+             issues.append(
77
+                 f" [{h}] Missing BluMa watermark trailer"
78
+             )
79
+
80
+     return issues
81
+
82
+
83
+ def main():
84
+     base = sys.argv[1] if len(sys.argv) > 1 else "main"
85
+
86
+     commits = get_commits(base)
87
+     if not commits:
88
+         print(f"No commits found between {base} and HEAD.")
89
+         sys.exit(0)
90
+
91
+     print(f"Validating {len(commits)} commit(s) against {base}...\n")
92
+
93
+     for c in commits:
94
+         status = "OK" if CONVENTIONAL_PATTERN.match(c["subject"]) else "WARN"
95
+         print(f" [{c['hash']}] {status}: {c['subject']}")
96
+
97
+     issues = validate(commits)
98
+
99
+     print()
100
+     if issues:
101
+         print(f"Found {len(issues)} issue(s):\n")
102
+         for issue in issues:
103
+             print(issue)
104
+         print("\nPlease fix the above before creating a PR.")
105
+         sys.exit(1)
106
+     else:
107
+         print("All commits are valid. Ready for PR.")
108
+         sys.exit(0)
109
+
110
+
111
+ if __name__ == "__main__":
112
+     main()
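To see which subjects the `CONVENTIONAL_PATTERN` in this script accepts, the sketch below copies the pattern verbatim and runs it against a few samples. The first four samples are PR titles from the reference file in this diff or close variants; the last three are made-up negatives.

```python
import re

# Copied from validate_commits.py.
CONVENTIONAL_PATTERN = re.compile(
    r'^(feat|fix|refactor|docs|test|perf|style|chore|security|revert)'
    r'(\([a-zA-Z0-9_/.-]+\))?'
    r'!?:\s.{1,72}$'
)

samples = [
    "feat(packages/auth): add PKCE support for OAuth2 flow",
    "chore(release): v2.4.0",
    "fix(auth): prevent null pointer on expired session [HOTFIX]",
    "feat!: drop legacy config",
    "Update stuff",              # no type prefix
    "build: bump deps",          # "build" is not in the allowed type list
    "feat: " + "x" * 73,         # description longer than 72 characters
]

# Map each sample subject to whether the pattern accepts it.
results = {subject: bool(CONVENTIONAL_PATTERN.match(subject)) for subject in samples}
```

The pattern bounds only the description at 72 characters, so a subject with a type and scope prefix can match and still exceed 72 characters in total. That is why `validate` also checks `len(subj) > 72` separately.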
@@ -0,0 +1,26 @@
1
+ © 2026 NomadEngenuity LDA. All rights reserved.
2
+
3
+ LICENSE: Use of these materials (including all code, prompts, assets, files,
4
+ and other components of this Skill) is governed by your agreement with
5
+ NomadEngenuity LDA regarding use of NomadEngenuity LDA's services. If no
6
+ separate agreement exists, use is governed by NomadEngenuity LDA's applicable
7
+ Terms of Service.
8
+
9
+ ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the
10
+ contrary, users may not:
11
+
12
+ - Extract these materials from the Services or retain copies of these
13
+ materials outside the Services
14
+ - Reproduce or copy these materials, except for temporary copies created
15
+ automatically during authorized use of the Services
16
+ - Create derivative works based on these materials
17
+ - Distribute, sublicense, or transfer these materials to any third party
18
+ - Make, offer to sell, sell, or import any inventions embodied in these
19
+ materials
20
+ - Reverse engineer, decompile, or disassemble these materials
21
+
22
+ The receipt, viewing, or possession of these materials does not convey or
23
+ imply any license or right beyond those expressly granted above.
24
+
25
+ NomadEngenuity LDA retains all right, title, and interest in these materials,
26
+ including all copyrights, patents, and other intellectual property rights.
@@ -0,0 +1,327 @@
1
+ ---
2
+ name: pdf
3
+ description: >
4
+ Use this skill for any task involving PDF files. Triggers include: creating
5
+ new PDFs from scratch, reading or extracting text and tables from PDFs,
6
+ merging or splitting PDFs, rotating pages, adding watermarks, encrypting or
7
+ decrypting PDFs, filling PDF forms, extracting images, and OCR on scanned
8
+ PDFs to make them searchable. Use this skill whenever the user mentions
9
+ .pdf, "PDF document", "create a report as PDF", "extract from PDF",
10
+ "merge PDFs", "split PDF", "protect PDF", "fill a form", or any
11
+ PDF-related task — even if not explicitly stated as such.
12
+
13
+ license: Proprietary. LICENSE.txt has complete terms
14
+ ---
15
+
16
+ # PDF Processing Guide
17
+
18
+ ## Overview
19
+
20
+ This guide covers essential PDF processing operations using Python libraries
21
+ and command-line tools. For advanced features, JavaScript libraries, and
22
+ detailed examples, see references/REFERENCE.md. If you need to fill out a
23
+ PDF form, read references/FORMS.md and follow its instructions.
24
+
25
+ ## Quick Start
26
+
27
+ ```python
28
+ from pypdf import PdfReader, PdfWriter
29
+
30
+ # Read a PDF
31
+ reader = PdfReader("document.pdf")
32
+ print(f"Pages: {len(reader.pages)}")
33
+
34
+ # Extract text
35
+ text = ""
36
+ for page in reader.pages:
37
+     text += page.extract_text()
38
+ ```
39
+
40
+ ## Python Libraries
41
+
42
+ ### pypdf — Basic Operations
43
+
44
+ #### Merge PDFs
45
+ ```python
46
+ from pypdf import PdfWriter, PdfReader
47
+
48
+ writer = PdfWriter()
49
+ for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
50
+     reader = PdfReader(pdf_file)
51
+     for page in reader.pages:
52
+         writer.add_page(page)
53
+
54
+ with open("merged.pdf", "wb") as output:
55
+     writer.write(output)
56
+ ```
57
+
58
+ #### Split PDF
59
+ ```python
60
+ reader = PdfReader("input.pdf")
61
+ for i, page in enumerate(reader.pages):
62
+     writer = PdfWriter()
63
+     writer.add_page(page)
64
+     with open(f"page_{i+1}.pdf", "wb") as output:
65
+         writer.write(output)
66
+ ```
67
+
68
+ #### Extract Metadata
69
+ ```python
70
+ reader = PdfReader("document.pdf")
71
+ meta = reader.metadata
72
+ print(f"Title: {meta.title}")
73
+ print(f"Author: {meta.author}")
74
+ print(f"Subject: {meta.subject}")
75
+ print(f"Creator: {meta.creator}")
76
+ ```
77
+
78
+ #### Rotate Pages
79
+ ```python
80
+ reader = PdfReader("input.pdf")
81
+ writer = PdfWriter()
82
+
83
+ page = reader.pages[0]
84
+ page.rotate(90) # Rotate 90 degrees clockwise
85
+ writer.add_page(page)
86
+
87
+ with open("rotated.pdf", "wb") as output:
88
+     writer.write(output)
89
+ ```
90
+
91
+ ### pdfplumber — Text and Table Extraction
92
+
93
+ #### Extract Text with Layout
94
+ ```python
95
+ import pdfplumber
96
+
97
+ with pdfplumber.open("document.pdf") as pdf:
98
+     for page in pdf.pages:
99
+         text = page.extract_text()
100
+         print(text)
101
+ ```
102
+
103
+ #### Extract Tables
104
+ ```python
105
+ with pdfplumber.open("document.pdf") as pdf:
106
+     for i, page in enumerate(pdf.pages):
107
+         tables = page.extract_tables()
108
+         for j, table in enumerate(tables):
109
+             print(f"Table {j+1} on page {i+1}:")
110
+             for row in table:
111
+                 print(row)
112
+ ```
113
+
114
+ #### Advanced Table Extraction
115
+ ```python
116
+ import pandas as pd
117
+
118
+ with pdfplumber.open("document.pdf") as pdf:
119
+     all_tables = []
120
+     for page in pdf.pages:
121
+         tables = page.extract_tables()
122
+         for table in tables:
123
+             if table:
124
+                 df = pd.DataFrame(table[1:], columns=table[0])
125
+                 all_tables.append(df)
126
+
127
+ if all_tables:
128
+     combined_df = pd.concat(all_tables, ignore_index=True)
129
+     combined_df.to_excel("extracted_tables.xlsx", index=False)
130
+ ```
131
+
132
+ ### reportlab — Create PDFs
133
+
134
+ #### Basic PDF Creation
135
+ ```python
136
+ from reportlab.lib.pagesizes import A4
137
+ from reportlab.pdfgen import canvas
138
+
139
+ c = canvas.Canvas("hello.pdf", pagesize=A4)
140
+ width, height = A4
141
+
142
+ # Add text
143
+ c.drawString(100, height - 100, "Hello World!")
144
+ c.drawString(100, height - 120, "This is a PDF created with reportlab")
145
+
146
+ # Add a line
147
+ c.line(100, height - 140, 400, height - 140)
148
+
149
+ c.save()
150
+ ```
151
+
152
+ #### Create PDF with Multiple Pages
153
+ ```python
154
+ from reportlab.lib.pagesizes import A4
155
+ from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
156
+ from reportlab.lib.styles import getSampleStyleSheet
157
+
158
+ doc = SimpleDocTemplate("report.pdf", pagesize=A4)
159
+ styles = getSampleStyleSheet()
160
+ story = []
161
+
162
+ # Add content
163
+ title = Paragraph("Report Title", styles['Title'])
164
+ story.append(title)
165
+ story.append(Spacer(1, 12))
166
+
167
+ body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
168
+ story.append(body)
169
+ story.append(PageBreak())
170
+
171
+ # Page 2
172
+ story.append(Paragraph("Page 2", styles['Heading1']))
173
+ story.append(Paragraph("Content for page 2", styles['Normal']))
174
+
175
+ # Build PDF
176
+ doc.build(story)
177
+ ```
178
+
179
+ #### Subscripts and Superscripts
180
+
181
+ **IMPORTANT**: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉,
182
+ ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs,
183
+ causing them to render as solid black boxes.
184
+
185
+ Instead, use ReportLab's markup tags in Paragraph objects:
186
+
187
+ ```python
188
+ from reportlab.platypus import Paragraph
189
+ from reportlab.lib.styles import getSampleStyleSheet
190
+
191
+ styles = getSampleStyleSheet()
192
+
193
+ # Subscripts: use <sub> tag
194
+ chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])
195
+
196
+ # Superscripts: use <super> tag
197
+ squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])
198
+ ```
199
+
200
+ For canvas-drawn text (not Paragraph objects), manually adjust font size and
201
+ position rather than using Unicode subscripts/superscripts.
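When input text already contains Unicode subscript or superscript digits, it can be rewritten into the `<sub>` and `<super>` tags before it reaches a Paragraph. The following is a minimal sketch under that assumption; `to_reportlab_markup` is a hypothetical helper, not part of ReportLab or this skill, and it handles digits only.

```python
import re

# Unicode subscript/superscript digits mapped to their plain ASCII digits.
SUBSCRIPTS = dict(zip("₀₁₂₃₄₅₆₇₈₉", "0123456789"))
SUPERSCRIPTS = dict(zip("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789"))


def _wrap_runs(text, table, tag):
    """Replace each run of characters found in `table` with <tag>digits</tag>."""
    pattern = re.compile("[" + "".join(table) + "]+")

    def replace(match):
        digits = "".join(table[ch] for ch in match.group())
        return f"<{tag}>{digits}</{tag}>"

    return pattern.sub(replace, text)


def to_reportlab_markup(text):
    """Convert Unicode sub/superscript digits to Paragraph markup tags."""
    text = _wrap_runs(text, SUBSCRIPTS, "sub")
    return _wrap_runs(text, SUPERSCRIPTS, "super")
```

The converted string, for example `to_reportlab_markup("H₂O")`, is intended to be passed to `Paragraph(...)` with a style such as `styles['Normal']`.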
202
+
203
+ ## Command-Line Tools
204
+
205
+ ### pdftotext (poppler-utils)
206
+ ```bash
207
+ # Extract text
208
+ pdftotext input.pdf output.txt
209
+
210
+ # Extract text preserving layout
211
+ pdftotext -layout input.pdf output.txt
212
+
213
+ # Extract specific pages
214
+ pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
215
+ ```
216
+
217
+ ### qpdf
218
+ ```bash
219
+ # Merge PDFs
220
+ qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
221
+
222
+ # Split pages
223
+ qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
224
+ qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
225
+
226
+ # Rotate pages
227
+ qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
228
+
229
+ # Remove password
230
+ qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
231
+ ```
232
+
233
+ ### pdftk (if available)
234
+ ```bash
235
+ # Merge
236
+ pdftk file1.pdf file2.pdf cat output merged.pdf
237
+
238
+ # Split
239
+ pdftk input.pdf burst
240
+
241
+ # Rotate
242
+ pdftk input.pdf rotate 1east output rotated.pdf
243
+ ```
244
+
245
+ ## Common Tasks
246
+
247
+ ### Extract Text from Scanned PDFs
248
+ ```python
249
+ # Requires: pip install pytesseract pdf2image
250
+ import pytesseract
251
+ from pdf2image import convert_from_path
252
+
253
+ # Convert PDF to images
254
+ images = convert_from_path('scanned.pdf')
255
+
256
+ # OCR each page
257
+ text = ""
258
+ for i, image in enumerate(images):
259
+     text += f"Page {i+1}:\n"
260
+     text += pytesseract.image_to_string(image, lang="por+eng")
261
+     text += "\n\n"
262
+
263
+ print(text)
264
+ ```
265
+
266
+ ### Add Watermark
267
+ ```python
268
+ from pypdf import PdfReader, PdfWriter
269
+
270
+ # Load watermark
271
+ watermark = PdfReader("watermark.pdf").pages[0]
272
+
273
+ # Apply to all pages
274
+ reader = PdfReader("document.pdf")
275
+ writer = PdfWriter()
276
+
277
+ for page in reader.pages:
278
+     page.merge_page(watermark)
279
+     writer.add_page(page)
280
+
281
+ with open("watermarked.pdf", "wb") as output:
282
+     writer.write(output)
283
+ ```
284
+
285
+ ### Extract Images
286
+ ```bash
287
+ # Using pdfimages (poppler-utils)
288
+ pdfimages -j input.pdf output_prefix
289
+ # Extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.
290
+ ```
291
+
292
+ ### Password Protection
293
+ ```python
294
+ from pypdf import PdfReader, PdfWriter
295
+
296
+ reader = PdfReader("input.pdf")
297
+ writer = PdfWriter()
298
+
299
+ for page in reader.pages:
300
+     writer.add_page(page)
301
+
302
+ # Add password
303
+ writer.encrypt("userpassword", "ownerpassword")
304
+
305
+ with open("encrypted.pdf", "wb") as output:
306
+     writer.write(output)
307
+ ```
308
+
309
+ ## Quick Reference
310
+
311
+ | Task | Best Tool | Command/Code |
312
+ |-----------------------|--------------|-------------------------------------|
313
+ | Merge PDFs | pypdf | `writer.add_page(page)` |
314
+ | Split PDFs | pypdf | One page per file |
315
+ | Extract text | pdfplumber | `page.extract_text()` |
316
+ | Extract tables | pdfplumber | `page.extract_tables()` |
317
+ | Create PDFs | reportlab | Canvas or Platypus |
318
+ | Command line merge | qpdf | `qpdf --empty --pages ...` |
319
+ | OCR scanned PDFs | pytesseract | Convert to image first |
320
+ | Fill PDF forms | pypdf / pdf-lib (see FORMS.md) | See references/FORMS.md |
321
+
322
+ ## Next Steps
323
+
324
+ - For advanced pypdfium2 usage, see references/REFERENCE.md
325
+ - For JavaScript libraries (pdf-lib), see references/REFERENCE.md
326
+ - If you need to fill out a PDF form, follow the instructions in references/FORMS.md
327
+ - For troubleshooting guides, see references/REFERENCE.md