resume-parser-ats 1.1.1 → 1.2.0
- package/README.md +62 -67
- package/package.json +5 -2
- package/resume-parser-ats/SKILL.md +203 -0
- package/resume-parser-ats/references/algorithm.md +126 -0
package/README.md
CHANGED
@@ -1,4 +1,4 @@
-# 📄 Resume Parser
+# 📄 Resume Parser — Agent Skill
 
 <p align="center">
 <strong>Deep resume parsing • ATS compatibility scoring • Actionable improvement insights</strong>
@@ -10,19 +10,30 @@
 
 ---
 
-
+An **agent skill** that deeply parses resumes using the **OpenResume 4-step algorithm**, extracts structured information (Name, Email, Phone, Education, Work Experience, Skills, Projects), evaluates ATS (Applicant Tracking System) compatibility, and provides prioritized, actionable suggestions to improve your resume.
 
 ## ✨ Features
 
 - **🔍 Deep Parsing** — Extracts 10+ fields from raw text or PDF using a feature-scoring engine
 - **📊 ATS Scoring** — Grades your resume A+ through F with detailed per-field confidence ratings
 - **💡 Smart Suggestions** — Prioritized, categorized fixes (critical → low) with before/after examples
+- **🤖 Agent Skill** — Install via `npx skills add` and use directly in your agent
 - **🛠️ CLI & MCP Server** — Use interactively from the command line or as an MCP tool
 - **⚙️ Configurable Strictness** — Lenient, moderate, or strict ATS evaluation modes
 - **🔒 Zero Dependencies on Proprietary APIs** — Runs entirely locally with no external calls
 
 ## 📦 Installation
 
+### As an Agent Skill (recommended)
+
+```bash
+npx skills add dhanushk-offl/resume-parser-skill
+```
+
+After installing, the skill is automatically available to your agent. When the agent encounters a resume-related task, it will load the skill and use it.
+
+### Manual Setup
+
 ```bash
 # Clone the repo
 git clone https://github.com/dhanushk-offl/resume-parser-skill.git
@@ -37,26 +48,34 @@ npm run build
 
 ## 🚀 Usage
 
-### As
+### As an Agent Skill
+
+Once installed via `npx skills add`, the agent will automatically use this skill when you:
+
+- Ask to parse, review, or analyze a resume
+- Ask "is my resume ATS-friendly?"
+- Ask for resume improvement suggestions
+- Upload or reference a resume PDF
+
+The agent can invoke three tools:
+
+| Tool | Description |
+|------|-------------|
+| `parse_resume` | Parse a resume PDF or raw text → structured data |
+| `analyze_resume` | Parse + compute ATS compatibility score with per-field confidence |
+| `suggest_improvements` | Parse + analyze + generate prioritized improvement suggestions |
+
+### From the CLI (after manual setup)
 
 ```bash
 # Parse a resume and output structured data
-
+node resume-parser-ats/scripts/parse.mjs resume.pdf
 
 # Parse + analyze ATS compatibility
-
+node resume-parser-ats/scripts/analyze.mjs resume.pdf
 
 # Full pipeline: parse + analyze + actionable suggestions
-
-
-# Parse from raw text
-npx resume-parser-ats parse "John Doe\njohn@email.com\nSoftware Engineer"
-
-# Adjust ATS strictness
-npx resume-parser-ats analyze resume.pdf --strictness strict
-
-# Focus on specific areas
-npx resume-parser-ats insights resume.pdf --focus ats,formatting --json
+node resume-parser-ats/scripts/insights.mjs resume.pdf --strictness strict --focus ats,formatting
 ```
 
 ### As a Library
@@ -145,7 +164,7 @@ Each attribute (Name, Email, Phone, etc.) has **feature sets** — matching func
 > *Before applying to jobs, run your resume through the parser to see what an ATS actually extracts.*
 
 ```bash
-
+node resume-parser-ats/scripts/insights.mjs my-resume.pdf --strictness strict
 ```
 
 Identify critical issues like a missing email, unparseable name, or sections an ATS can't detect — and fix them *before* you apply.
@@ -179,10 +198,6 @@ import { fullPipeline } from "resume-parser-ats";
 
 const result = fullPipeline({ rawText: resumeText, strictness: "strict" });
 
-// result.parsed — structured data
-// result.analyzed — ATS score + field analysis
-// result.suggestions — prioritized actions
-
 // Feed to an LLM for natural-language coaching
 const prompt = `You are a resume coach. Here is the analysis:
 ${JSON.stringify(result.analyzed.data)}
@@ -206,44 +221,36 @@ Suggest improvements in a friendly, encouraging tone.`;
 - Flag common issues (missing dates, non-standard section headers)
 - Provide standardized improvement templates
 
-### 6. 🔄 Resume Migration Tool
-
-> *Convert resumes from one format to structured JSON for database ingestion.*
-
-```typescript
-import { parseResume } from "resume-parser-ats";
-
-const result = parseResume({ filePath: "legacy-resume.pdf" });
-// result.data is a clean, typed JSON object ready for your database
-```
-
 ## 🏗️ Architecture
 
 ```
-resume-parser/
-├──
-├──
-├──
-├──
-├──
-
-│
+resume-parser-skill/
+├── resume-parser-ats/ # Agent skill directory
+│ ├── SKILL.md # Skill manifest & instructions
+│ ├── scripts/ # Executable scripts for agent use
+│ │ ├── parse.mjs # Parse a resume → JSON
+│ │ ├── analyze.mjs # Parse + ATS scoring → JSON
+│ │ └── insights.mjs # Full pipeline → JSON
+│ └── references/ # Detailed docs loaded on-demand
+│ └── algorithm.md # Full algorithm specification
+├── src/ # TypeScript source
+│ ├── index.ts # Main entry point + fullPipeline()
 │ ├── tools/
-│ │ ├── parse-resume.ts
-│ │ ├── analyze-resume.ts
-│ │ └── suggest-improvements.ts
+│ │ ├── parse-resume.ts
+│ │ ├── analyze-resume.ts
+│ │ └── suggest-improvements.ts
 │ └── prompts/
-│ ├── parser-prompt.ts
-│ └── insights-prompt.ts
-├── mcp-server/
-│ └── server.ts # MCP server implementation
+│ ├── parser-prompt.ts
+│ └── insights-prompt.ts
 ├── bin/
-│ └── cli.js
-
-
-
-
-
+│ └── cli.js # CLI entry point
+├── mcp-server/
+│ └── server.ts # MCP server implementation
+├── test/
+│ └── evals/ # Evaluation test suites
+├── AGENTS.md # Agent configuration
+├── package.json
+└── README.md
 ```
 
 ## 🧪 Testing
@@ -251,13 +258,10 @@ resume-parser/
 ```bash
 # Run all tests
 npm test
-
-# Run evaluation suites
-node --test test/evals/parse-resume.test.js
-node --test test/evals/analyze-resume.test.js
-node --test test/evals/suggest-improvements.test.js
 ```
 
+86 tests covering parsing, analysis, and suggestion generation across all strictness levels.
+
 ## 🤝 Contributing
 
 1. Fork the repository
@@ -268,19 +272,10 @@ node --test test/evals/suggest-improvements.test.js
 
 ## ☁️ CI/CD
 
-This project uses GitHub Actions for continuous integration and npm publishing:
-
 | Workflow | Trigger | What it does |
 |----------|---------|-------------|
 | **Build & Test** | Push/PR to `master` | Lint, build, and test across Node 18/20/22 |
-| **Publish to npm** | Tag push `v*`
-
-To publish a new version:
-
-```bash
-npm version patch # or minor, major
-git push --follow-tags
-```
+| **Publish to npm** | Tag push `v*` | Builds and publishes to npmjs with provenance |
 
 ## 📄 License
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "resume-parser-ats",
-"version": "1.
+"version": "1.2.0",
 "description": "An agent skill that deeply parses resumes, extracts structured data, and provides actionable insights to improve ATS compatibility and readability.",
 "main": "dist/src/index.js",
 "types": "dist/src/index.d.ts",
@@ -18,6 +18,7 @@
 "files": [
 "dist/",
 "bin/",
+"resume-parser-ats/",
 "README.md",
 "LICENSE"
 ],
@@ -27,7 +28,9 @@
 "ATS",
 "agent-skill",
 "resume-parser",
-"career"
+"career",
+"skills",
+"npx-skills"
 ],
 "author": "dhanush",
 "license": "MIT",
package/resume-parser-ats/SKILL.md
@@ -0,0 +1,203 @@
+---
+name: resume-parser-ats
+description: >
+Deeply parses resume PDFs using the OpenResume 4-step algorithm, extracts structured information
+(Name, Email, Phone, Education, Work Experience, Skills, etc.), evaluates ATS compatibility,
+and provides actionable improvement suggestions. Use when a user asks to parse, review, or
+analyze a resume, check ATS-friendliness, or get resume improvement suggestions.
+---
+
+# Resume Parser — ATS Intelligence
+
+You are a resume parsing and ATS analysis specialist. When activated, deeply parse resumes and provide structured, actionable insights.
+
+## When to Activate
+
+- User asks to parse, review, or analyze a resume
+- User asks "is my resume ATS-friendly?"
+- User asks for resume improvement suggestions
+- User uploads or references a resume PDF
+- User wants to compare what an ATS sees vs. their intended content
+
+## Tools Available
+
+For programmatic use, install the npm package:
+
+```bash
+npm install resume-parser-ats
+```
+
+### `parse_resume` — Extract structured data from a resume
+
+```bash
+npx resume-parser-ats parse <file.pdf>
+```
+
+```javascript
+import { parseResume } from "resume-parser-ats";
+const result = parseResume({ filePath: "/path/to/resume.pdf" });
+// or: parseResume({ rawText: "John Doe\njohn@email.com..." })
+```
+
+**Input**: `{ filePath?: string, rawText?: string }`
+**Output**: Structured data with profile, education, experience, skills, projects.
+
+---
+
+### `analyze_resume` — Parse + ATS compatibility scoring
+
+```bash
+npx resume-parser-ats analyze <file.pdf> --strictness strict
+```
+
+```javascript
+import { analyzeResume } from "resume-parser-ats";
+const result = analyzeResume({ filePath: "/path/to/resume.pdf", strictness: "moderate" });
+```
+
+**Input**: `{ filePath?, rawText?, strictness?: "lenient"|"moderate"|"strict" }`
+**Output**: ATS score (0-100), letter grade (A+ to F), per-field confidence, section detection, format issues.
+
+---
+
+### `suggest_improvements` — Parse + analyze + prioritized suggestions
+
+```bash
+npx resume-parser-ats insights <file.pdf> --strictness strict --focus ats,formatting
+```
+
+```javascript
+import { suggestImprovements } from "resume-parser-ats";
+const result = suggestImprovements({ filePath: "/path/to/resume.pdf", focusAreas: ["ats", "content"] });
+```
+
+**Input**: `{ filePath?, rawText?, strictness?, focusAreas?: string[] }`
+**Output**: Overall score, grade, quick wins, prioritized suggestions (critical → low), section analysis.
+
+---
+
+## Manual Parsing Algorithm
+
+Use this algorithm when the npm package is unavailable or for understanding the parsing logic:
+
+### Step 1: Read Text Items from PDF
+
+Extract all text items from the PDF. Each item includes:
+- `text` — text content
+- `x1`, `x2` — left/right X positions (origin at bottom-left)
+- `y` — Y position from bottom
+- `bold` — whether text is bold
+- `newLine` — whether this item starts a new line
+
+### Step 2: Group Text Items into Lines
+
+1. **Merge adjacent items**: When `Distance = RightTextItem.X₁ - LeftTextItem.X₂` is less than average character width, merge them
+2. **Average character width**: Total character widths / total character count (exclude bold and newline elements)
+3. **Group by Y-coordinate**: Same Y = same line
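Step 2 can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the package's implementation: the `TextItem` interface and the function names `averageCharWidth` and `groupIntoLines` are hypothetical, "exclude bold and newline elements" is read as skipping items whose `bold` or `newLine` flag is set, and lines are formed by exact Y equality before merging within each line.

```typescript
// Hypothetical item shape; field names follow the Step 1 list.
interface TextItem {
  text: string;
  x1: number;
  x2: number;
  y: number;
  bold: boolean;
  newLine: boolean;
}

// Average width per character, skipping bold and newLine items (assumed reading).
function averageCharWidth(items: TextItem[]): number {
  let width = 0;
  let chars = 0;
  for (const item of items) {
    if (item.bold || item.newLine) continue;
    width += item.x2 - item.x1;
    chars += item.text.length;
  }
  return chars === 0 ? 0 : width / chars;
}

function groupIntoLines(items: TextItem[]): TextItem[][] {
  const avg = averageCharWidth(items);
  // Top of the page first (Y grows upward), then left to right.
  const sorted = items
    .map((item) => ({ ...item }))
    .sort((a, b) => b.y - a.y || a.x1 - b.x1);
  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const line = lines.length > 0 ? lines[lines.length - 1] : undefined;
    if (line === undefined || line[0].y !== item.y) {
      lines.push([item]); // a different Y starts a new line
      continue;
    }
    const last = line[line.length - 1];
    if (item.x1 - last.x2 < avg) {
      // Gap is narrower than one average character: merge into the left item.
      last.text += item.text;
      last.x2 = item.x2;
      last.bold = last.bold && item.bold;
    } else {
      line.push(item);
    }
  }
  return lines;
}
```

Real PDF output rarely has perfectly equal Y values, so a tolerance on the Y comparison may be needed in practice; the source only states "Same Y = same line".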
+
+### Step 3: Group Lines into Sections
+
+**Section title detection** (must satisfy ALL 3):
+1. It is the only text item in the line
+2. It is bolded
+3. Its letters are all UPPERCASE
+
+**Fallback**: Keyword match against known headers:
+PROFILE, SUMMARY, OBJECTIVE, ABOUT, EDUCATION, ACADEMIC, DEGREES, EXPERIENCE, WORK EXPERIENCE, EMPLOYMENT, PROFESSIONAL EXPERIENCE, SKILLS, TECHNICAL SKILLS, COMPETENCIES, PROJECTS, PORTFOLIO, CERTIFICATIONS, LICENSES, HONORS, AWARDS, VOLUNTEER, COMMUNITY, LEADERSHIP, PUBLICATIONS, RESEARCH, INTERESTS, ACTIVITIES, HOBBIES
+
+Lines before any section title go into PROFILE.
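The Step 3 rules can be sketched as follows. The names `LineItem`, `isSectionTitle`, and `groupIntoSections` are hypothetical, and the fallback is assumed to be a case-insensitive exact match on a single-item line; the source does not say how strict the keyword match is.

```typescript
// Hypothetical minimal line model: a line is a list of text items.
interface LineItem {
  text: string;
  bold: boolean;
}
type Line = LineItem[];

// Known headers, copied from the list in Step 3.
const KNOWN_TITLES: string[] = [
  "PROFILE", "SUMMARY", "OBJECTIVE", "ABOUT", "EDUCATION", "ACADEMIC",
  "DEGREES", "EXPERIENCE", "WORK EXPERIENCE", "EMPLOYMENT",
  "PROFESSIONAL EXPERIENCE", "SKILLS", "TECHNICAL SKILLS", "COMPETENCIES",
  "PROJECTS", "PORTFOLIO", "CERTIFICATIONS", "LICENSES", "HONORS", "AWARDS",
  "VOLUNTEER", "COMMUNITY", "LEADERSHIP", "PUBLICATIONS", "RESEARCH",
  "INTERESTS", "ACTIVITIES", "HOBBIES",
];

function isSectionTitle(line: Line): boolean {
  if (line.length !== 1) return false; // rule 1: only item in the line
  const text = line[0].text.trim();
  const allUpper = /[A-Za-z]/.test(text) && text === text.toUpperCase();
  if (line[0].bold && allUpper) return true; // rules 2 and 3
  // Fallback (assumed): exact, case-insensitive keyword match.
  return KNOWN_TITLES.indexOf(text.toUpperCase()) !== -1;
}

function groupIntoSections(lines: Line[]): Record<string, Line[]> {
  const sections: Record<string, Line[]> = { PROFILE: [] };
  let current = "PROFILE"; // lines before any title land here
  for (const line of lines) {
    if (isSectionTitle(line)) {
      current = line[0].text.trim().toUpperCase();
      if (sections[current] === undefined) sections[current] = [];
    } else {
      sections[current].push(line);
    }
  }
  return sections;
}
```

One consequence of the primary heuristic is worth noting: a bold, all-caps name alone on its line would be treated as a section title, which is one reason the name is scored separately in Step 4.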
+
+### Step 4: Extract Attributes via Feature Scoring
+
+Each attribute has feature sets (matching function + score). The text item with the **highest total score** wins.
+
+| Attribute | Core Feature | Regex |
+|-----------|-------------|-------|
+| Name | Only letters/spaces/periods | `/^[a-zA-Z\s\.]+$/` |
+| Email | Email format | `/\S+@\S+\.\S+/` |
+| Phone | Phone format | `/\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/` |
+| Location | City, ST format | `/[A-Z][a-zA-Z\s]+, [A-Z]{2}/` |
+| URL | URL format | `/\S+\.[a-z]+\/\S+/` |
+
+**Name scoring example**: Only letters (+3), bolded (+2), uppercase (+2), has @ (-4), has digit (-4), has comma (-4), has slash (-4)
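The name scoring example maps directly onto a table of (matcher, points) pairs. The sketch below is illustrative only: `Candidate`, `NAME_FEATURES`, `scoreItem`, and `pickBest` are hypothetical names, "uppercase" is read as "every letter is uppercase", and ties go to the earlier item, which the source does not specify.

```typescript
// Hypothetical candidate shape: the text plus its bold flag.
interface Candidate {
  text: string;
  bold: boolean;
}
type Feature = [(item: Candidate) => boolean, number];

// Feature set for Name, with scores taken from the example above.
const NAME_FEATURES: Feature[] = [
  [(i) => /^[a-zA-Z\s\.]+$/.test(i.text), 3],
  [(i) => i.bold, 2],
  [(i) => /[a-zA-Z]/.test(i.text) && i.text === i.text.toUpperCase(), 2],
  [(i) => i.text.indexOf("@") !== -1, -4],
  [(i) => /\d/.test(i.text), -4],
  [(i) => i.text.indexOf(",") !== -1, -4],
  [(i) => i.text.indexOf("/") !== -1, -4],
];

// Sum the points of every feature that matches the item.
function scoreItem(item: Candidate, features: Feature[]): number {
  let total = 0;
  for (const [matches, points] of features) {
    if (matches(item)) total += points;
  }
  return total;
}

// Return the highest-scoring item; the earliest item wins a tie (assumption).
function pickBest(items: Candidate[], features: Feature[]): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const item of items) {
    const score = scoreItem(item, features);
    if (score > bestScore) {
      best = item;
      bestScore = score;
    }
  }
  return best;
}
```

The negative features matter as much as the positive ones: they keep an email, phone number, address, or URL from outscoring the actual name when those items also happen to be bold.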
+
+**Subsection detection** (for Education, Work Experience):
+- Primary: vertical line gap > typical line gap × 1.4
+- Fallback: text item is bolded
+
+See [references/algorithm.md](references/algorithm.md) for the full specification.
+
+---
+
+## ATS Compatibility Scoring
+
+| Dimension | Weight |
+|-----------|--------|
+| Name extraction | 20 pts |
+| Email extraction | 20 pts |
+| Phone extraction | 10 pts |
+| Section detection | 15 pts |
+| Education parsing | 10 pts |
+| Experience parsing | 15 pts |
+| Skills parsing | 10 pts |
+
+**Grading**: A+ (90-100), A (85-89), B+ (80-84), B (75-79), B- (70-74), C+ (65-69), C (60-64), D (50-59), F (0-49)
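The grading bands amount to a lookup over lower bounds. A minimal sketch, with `GRADE_BANDS` and `toGrade` as hypothetical names; the bands are stated for whole numbers, so treating a fractional score such as 89.5 as an A is an assumption of this sketch.

```typescript
// Lower bound of each band, highest first, taken from the Grading line.
const GRADE_BANDS: [number, string][] = [
  [90, "A+"], [85, "A"], [80, "B+"], [75, "B"], [70, "B-"],
  [65, "C+"], [60, "C"], [50, "D"], [0, "F"],
];

function toGrade(score: number): string {
  // Clamp to the documented 0-100 range before the lookup.
  const clamped = Math.max(0, Math.min(100, score));
  for (const [min, grade] of GRADE_BANDS) {
    if (clamped >= min) return grade;
  }
  return "F";
}
```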
+
+## Issue Severity Levels
+
+- **CRITICAL**: Name or email cannot be parsed → ATS will likely discard
+- **HIGH**: Key sections missing, dates unparseable, phone not found
+- **MEDIUM**: Skills not extracted cleanly, formatting merge issues
+- **LOW**: Minor inconsistencies, optional fields missing
+
+## Output Format
+
+Always provide results in this structured format:
+
+```
+## 📊 Resume Parsing Report
+
+### ATS Compatibility Score: XX/100 (Grade: X)
+
+### ✅ Successfully Parsed Fields
+| Field | Parsed Value | Confidence |
+|-------|-------------|------------|
+| Name | John Doe | High |
+
+### ⚠️ Issues Found
+| # | Severity | Field | Issue | Suggestion |
+|---|----------|-------|-------|------------|
+| 1 | CRITICAL | Email | ... | ... |
+
+### 📝 Priority Fixes
+1. **[Fix Title]**: Description of what to change and why
+- Before: `current state`
+- After: `suggested state`
+
+### 📋 Section-by-Section Analysis
+#### Profile
+- Analysis notes...
+```
+
+## Important Rules
+
+1. **Always run all 4 parsing steps** — do not skip steps
+2. **Always provide the ATS compatibility score** — this is the primary metric
+3. **Every suggestion must be actionable** — not "improve formatting" but "Move the date to the same line as the company name"
+4. **Prioritize Name and Email extraction** — if they fail, flag as CRITICAL
+5. **Explain WHY** each suggestion matters in ATS terms
+6. **Compare parsed output vs. likely intended content** — surface discrepancies
+7. **Never modify the original file** — this is a read-only analysis tool
+8. **If a PDF cannot be parsed**, fall back to raw text and note the limitation
+9. **Flag when text items break unexpectedly** (e.g., phone numbers split across items)
+
+## Programmatic Access
+
+For batch processing or integration, install the npm package:
+
+```bash
+npm install resume-parser-ats
+npx resume-parser-ats parse resume.pdf
+npx resume-parser-ats analyze resume.pdf --strictness strict
+npx resume-parser-ats insights resume.pdf --focus ats,formatting --json
+```
package/resume-parser-ats/references/algorithm.md
@@ -0,0 +1,126 @@
+# OpenResume 4-Step Parsing Algorithm
+
+This document provides the full technical reference for the resume parsing algorithm.
+
+## Step 1: Read Text Items from PDF
+
+Extract all text items from the PDF using `pdfjs-dist`. Each text item includes:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `text` | string | The text content |
+| `x1` | number | Left X position |
+| `x2` | number | Right X position |
+| `y` | number | Y position (from page bottom) |
+| `bold` | boolean | Whether the text is bold |
+| `newLine` | boolean | Whether this item starts a new line |
+
+X,Y coordinates are relative to the bottom-left corner (origin 0,0).
+
+## Step 2: Group Text Items into Lines
+
+1. **Merge adjacent items** when `Distance = RightTextItem.X₁ - LeftTextItem.X₂` is less than average character width
+2. Average character width = total character widths / total character count (exclude bold and newline elements)
+3. **Group by Y-coordinate** to form lines (same Y = same line)
+
+This reconstructs the line-by-line reading order that may be lost in PDF extraction.
+
+## Step 3: Group Lines into Sections
+
+### Section Title Detection (primary heuristic — must satisfy ALL 3):
+
+1. It is the only text item in the line
+2. It is bolded
+3. Its letters are all UPPERCASE
+
+### Fallback Heuristic: Keyword matching
+
+Known section titles: PROFILE, SUMMARY, OBJECTIVE, ABOUT, EDUCATION, ACADEMIC, DEGREES, EXPERIENCE, WORK EXPERIENCE, EMPLOYMENT, PROFESSIONAL EXPERIENCE, SKILLS, TECHNICAL SKILLS, COMPETENCIES, PROJECTS, PORTFOLIO, CERTIFICATIONS, LICENSES, HONORS, AWARDS, VOLUNTEER, COMMUNITY, LEADERSHIP, PUBLICATIONS, RESEARCH, INTERESTS, ACTIVITIES, HOBBIES
+
+- Group all lines under their closest preceding section title
+- Lines before any section title go into the PROFILE section
+
+## Step 4: Extract Resume Attributes using Feature Scoring
+
+Each attribute has **feature sets** (matching function + score). Run every text item through all feature sets for an attribute. The text item with the **highest total feature score** is extracted as that attribute.
+
+### Subsection Detection (for Education, Work Experience, etc.)
+
+- **Primary**: vertical line gap > typical line gap × 1.4
+- **Fallback**: text item is bolded
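The subsection rule can be sketched as below. Two points are assumptions, since the text does not define them: the "typical line gap" is taken to be the most frequent (rounded) gap between consecutive lines, and the bold fallback is applied only when no gap exceeds the threshold. `PositionedLine`, `typicalGap`, and `splitSubsections` are hypothetical names.

```typescript
// Hypothetical line model: lines are ordered top to bottom, y measured from the page bottom.
interface PositionedLine {
  text: string;
  y: number;
  bold: boolean;
}

// Assumed definition of "typical": the most frequent rounded gap.
function typicalGap(gaps: number[]): number {
  const counts: Record<string, number> = {};
  let best = 0;
  let bestCount = 0;
  for (const gap of gaps) {
    const rounded = Math.round(gap);
    const key = String(rounded);
    counts[key] = (counts[key] || 0) + 1;
    if (counts[key] > bestCount) {
      bestCount = counts[key];
      best = rounded;
    }
  }
  return best;
}

function splitSubsections(lines: PositionedLine[]): PositionedLine[][] {
  if (lines.length === 0) return [];
  const gaps: number[] = [];
  for (let i = 1; i < lines.length; i++) {
    gaps.push(lines[i - 1].y - lines[i].y);
  }
  const threshold = typicalGap(gaps) * 1.4;
  const hasGapBreak = gaps.some((gap) => gap > threshold);
  const groups: PositionedLine[][] = [[lines[0]]];
  for (let i = 1; i < lines.length; i++) {
    // Primary rule when any large gap exists; otherwise fall back to bold.
    const startsNew = hasGapBreak ? gaps[i - 1] > threshold : lines[i].bold;
    if (startsNew) {
      groups.push([lines[i]]);
    } else {
      groups[groups.length - 1].push(lines[i]);
    }
  }
  return groups;
}
```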
+
+### Feature Scoring Tables
+
+#### Name
+
+| Feature | Score |
+|---------|-------|
+| Contains only letters, spaces or periods | +3 |
+| Is bolded | +2 |
+| Contains all uppercase letters | +2 |
+| Contains @ (may be email) | -4 |
+| Contains number (may be phone) | -4 |
+| Contains , (may be address) | -4 |
+| Contains / (may be URL) | -4 |
+
+#### Email
+
+| Feature | Score |
+|---------|-------|
+| Matches email regex `\S+@\S+\.\S+` | +5 |
+| Contains @ | +2 |
+
+#### Phone
+
+| Feature | Score |
+|---------|-------|
+| Matches phone regex `\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}` | +5 |
+
+#### Location
+
+| Feature | Score |
+|---------|-------|
+| Matches city,state regex `[A-Z][a-zA-Z\s]+, [A-Z]{2}` | +5 |
+
+#### URL
+
+| Feature | Score |
+|---------|-------|
+| Matches URL regex `\S+\.[a-z]+\/\S+` | +5 |
+
+#### School
+
+| Feature | Score |
+|---------|-------|
+| Contains school keyword (College, University, School, Institute, Academy) | +4 |
+
+#### Degree
+
+| Feature | Score |
+|---------|-------|
+| Contains degree keyword (Associate, Bachelor, Master, Doctorate, B.S., B.A., M.S., M.A., Ph.D.) | +4 |
+
+#### GPA
+
+| Feature | Score |
+|---------|-------|
+| Matches GPA regex `[0-4]\.\d{1,2}` | +5 |
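For the attributes whose table is dominated by a single regex feature, extraction reduces to finding the first text item that matches. The sketch below takes that shortcut as a stated simplification rather than running the full scoring loop; the regexes are copied from the tables, while `PATTERNS` and `extractFields` are hypothetical names.

```typescript
// Regexes copied from the Email, Phone, Location, URL, and GPA tables.
const PATTERNS: Record<string, RegExp> = {
  email: /\S+@\S+\.\S+/,
  phone: /\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}/,
  location: /[A-Z][a-zA-Z\s]+, [A-Z]{2}/,
  url: /\S+\.[a-z]+\/\S+/,
  gpa: /[0-4]\.\d{1,2}/,
};

// Simplification: take the first item that matches each pattern.
function extractFields(items: string[]): Record<string, string> {
  const found: Record<string, string> = {};
  for (const name of Object.keys(PATTERNS)) {
    for (const text of items) {
      const match = PATTERNS[name].exec(text);
      if (match !== null) {
        found[name] = match[0];
        break;
      }
    }
  }
  return found;
}
```

These patterns are permissive by design. The GPA regex, for instance, would also match a fragment of a version string such as `1.2.0`, so restricting each pattern to items from the relevant section (GPA only within Education) is a sensible precaution.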
+
+## ATS Compatibility Scoring Framework
+
+| Dimension | Weight |
+|-----------|--------|
+| Name extraction | 20 pts |
+| Email extraction | 20 pts |
+| Phone extraction | 10 pts |
+| Section detection | 15 pts |
+| Education parsing | 10 pts |
+| Experience parsing | 15 pts |
+| Skills parsing | 10 pts |
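The weights sum to 100, so one plausible way to combine them is a weighted sum of per-dimension quality values. The table does not say how partial credit is awarded, so the 0 to 1 quality input, the rounding, and the names `WEIGHTS` and `atsScore` are all assumptions of this sketch.

```typescript
// Weights copied from the table above; they sum to 100.
const WEIGHTS = {
  name: 20,
  email: 20,
  phone: 10,
  sections: 15,
  education: 10,
  experience: 15,
  skills: 10,
};
type Dimension = keyof typeof WEIGHTS;

// quality: 0 (not extracted) to 1 (extracted cleanly) per dimension (assumed scale).
function atsScore(quality: Partial<Record<Dimension, number>>): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as Dimension[]) {
    const q = Math.max(0, Math.min(1, quality[key] ?? 0));
    total += WEIGHTS[key] * q;
  }
  return Math.round(total);
}
```

Under these assumptions, a resume with no detectable phone number and a half-parsed education section, and everything else clean, would score 20 + 20 + 0 + 15 + 5 + 15 + 10 = 85.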
+
+### Issue Severity Levels
+
+- **CRITICAL**: Name or email cannot be parsed (ATS will likely discard)
+- **HIGH**: Key sections missing, dates unparseable, phone not found
+- **MEDIUM**: Skills not extracted cleanly, formatting merge issues
+- **LOW**: Minor inconsistencies, optional fields missing