dev-laiser 0.2.2__tar.gz → 0.2.4__tar.gz
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/PKG-INFO +3 -3
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/README.md +2 -2
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/dev_laiser.egg-info/PKG-INFO +3 -3
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/llm_methods.py +57 -6
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/params.py +2 -4
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/skill_extractor.py +38 -16
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/setup.py +1 -1
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/LICENSE +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/dev_laiser.egg-info/SOURCES.txt +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/dev_laiser.egg-info/dependency_links.txt +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/dev_laiser.egg-info/requires.txt +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/dev_laiser.egg-info/top_level.txt +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/__init__.py +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/utils.py +0 -0
- {dev_laiser-0.2.2 → dev_laiser-0.2.4}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: dev-laiser
-Version: 0.2.
+Version: 0.2.4
 Summary: LAiSER (Leveraging Artificial Intelligence for Skill Extraction & Research) is a tool designed to help learners, educators, and employers extract and share trusted information about skills. It uses a fine-tuned language model to extract raw skill keywords from text, then aligns them with a predefined taxonomy. You can find more technical details in the project’s paper.md and an overview in the README.md.
 Home-page: https://github.com/LAiSER-Software/extract-module
 Author: Satya Phanindra Kumar Kalaga, Bharat Khandelwal, Prudhvi Chekuri
@@ -75,7 +75,7 @@ LAiSER is a tool that helps learners, educators and employers share trusted and
 Before proceeding to LAiSER, you'd want to follow the steps below to install the required dependencies:
 - Clone the repository using
 ```shell
-git clone https://github.com/
+git clone https://github.com/LAiSER-Software/extract-module.git
 ```
 or download the [zip(link)](https://github.com/Micah-Sanders/LAiSER/archive/refs/heads/main.zip) file and extract it.
 
@@ -104,7 +104,7 @@ To use LAiSER as a command line tool, follow the steps below:
 
 - Navigate to the root directory of the repository and run the command below:
 ```shell
-pip install laiser
+pip install dev-laiser
 ```
 
 - Once the installation is complete, you can run the tool using the command below:
@@ -34,7 +34,7 @@ LAiSER is a tool that helps learners, educators and employers share trusted and
 Before proceeding to LAiSER, you'd want to follow the steps below to install the required dependencies:
 - Clone the repository using
 ```shell
-git clone https://github.com/
+git clone https://github.com/LAiSER-Software/extract-module.git
 ```
 or download the [zip(link)](https://github.com/Micah-Sanders/LAiSER/archive/refs/heads/main.zip) file and extract it.
 
@@ -63,7 +63,7 @@ To use LAiSER as a command line tool, follow the steps below:
 
 - Navigate to the root directory of the repository and run the command below:
 ```shell
-pip install laiser
+pip install dev-laiser
 ```
 
 - Once the installation is complete, you can run the tool using the command below:
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: dev-laiser
-Version: 0.2.
+Version: 0.2.4
 Summary: LAiSER (Leveraging Artificial Intelligence for Skill Extraction & Research) is a tool designed to help learners, educators, and employers extract and share trusted information about skills. It uses a fine-tuned language model to extract raw skill keywords from text, then aligns them with a predefined taxonomy. You can find more technical details in the project’s paper.md and an overview in the README.md.
 Home-page: https://github.com/LAiSER-Software/extract-module
 Author: Satya Phanindra Kumar Kalaga, Bharat Khandelwal, Prudhvi Chekuri
@@ -75,7 +75,7 @@ LAiSER is a tool that helps learners, educators and employers share trusted and
 Before proceeding to LAiSER, you'd want to follow the steps below to install the required dependencies:
 - Clone the repository using
 ```shell
-git clone https://github.com/
+git clone https://github.com/LAiSER-Software/extract-module.git
 ```
 or download the [zip(link)](https://github.com/Micah-Sanders/LAiSER/archive/refs/heads/main.zip) file and extract it.
 
@@ -104,7 +104,7 @@ To use LAiSER as a command line tool, follow the steps below:
 
 - Navigate to the root directory of the repository and run the command below:
 ```shell
-pip install laiser
+pip install dev-laiser
 ```
 
 - Once the installation is complete, you can run the tool using the command below:
@@ -48,7 +48,7 @@ Rev No. Date Author Description
 [1.0.0] 07/10/2024 Satya Phanindra K. Define all the LLM methods being used in the project
 [1.0.1] 07/19/2024 Satya Phanindra K. Add descriptions to each method
 [1.0.2] 11/24/2024 Prudhvi Chekuri Add support for skills extraction from syllabi data
-[1.0.3]
+[1.0.3] 03/12/2025 Prudhvi Chekuri Implement functions to extract levels, KSAs from job descriptions and syllabi data using vLLM
 
 TODO:
 -----
@@ -220,20 +220,21 @@ def get_completion(input_text, text_columns, input_type, model, tokenizer) -> st
 
 
 def parse_output_vllm(response):
-
+
     """
-    Parse the
+    Parse the model's response to extract key skills, knowledge required, and task abilities.
 
     Parameters
     ----------
     response : str
-        The model's response
+        The model's response after processing the prompt.
 
     Returns
     -------
-    list: List of dictionaries
+    list: List of dictionaries that has levels, KSAs for all the data points in the input text.
+
     """
-
+
     out = []
     # Split into items, handling optional '->' prefix and multi-line input
     items = [item.strip() for item in response.split('->') if item.strip()]
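The last context line of this hunk shows how `parse_output_vllm` starts: it splits the response on `->` and discards empty fragments. A minimal sketch of that one step follows. The helper name `split_items` and the sample response text are assumptions for illustration only; the real model output format and the rest of the parser are not shown in this diff.

```python
def split_items(response):
    """Split a raw model response into items on the '->' marker."""
    # Same expression as the context line in the hunk: fragments that are
    # empty after stripping (e.g. the text before a leading '->') are dropped.
    return [item.strip() for item in response.split('->') if item.strip()]


# Hypothetical response text, not taken from the package.
sample = "-> Skill: Python\n-> Skill: SQL\n->   \n-> Skill: Data Analysis"
items = split_items(sample)
print(items)
```

Because empty fragments are filtered after stripping, a leading `->` or a whitespace-only item does not produce an empty entry in the result.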
@@ -289,6 +290,7 @@ def create_ksa_prompt(query, input_type, num_key_skills, num_key_kr, num_key_tas
     -------
     str
         The formatted prompt for the KSA extraction task.
+
     """
 
     prompt_template = """user
@@ -352,6 +354,32 @@ model
 
 def vllm_batch_generate(llm, queries, input_type, batch_size=32, num_key_skills=5, num_key_kr='3-5', num_key_tas='3-5'):
 
+    """
+    Generate completions for a batch of queries using the model.
+
+    Parameters
+    ----------
+    llm : model
+        The model to use for generating completions
+    queries : pandas DataFrame
+        The queries to get completions for using the model
+    input_type : str
+        Type of input data - 'job_desc' / 'syllabus' etc. (Default: 'job_desc')
+    batch_size : int, optional
+        Preferred batch size to use for generating completions
+    num_key_skills : int, optional
+        Number of key skills to extract from the input text
+    num_key_kr : str, optional
+        Number of key knowledge required items to extract from the input text
+    num_key_tas : str, optional
+        Number of key task abilities items to extract from the input text
+
+    Returns
+    -------
+    list: List of completions generated by the model for the input queries
+
+    """
+
     result = []
 
     sampling_params = SamplingParams(max_tokens=1000)
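The new docstring describes `batch_size` as the preferred batch size for generating completions, but the loop body of `vllm_batch_generate` is not part of this hunk. The sketch below only illustrates what such a parameter typically controls: slicing the prompts into consecutive chunks. The helper `iter_batches` is hypothetical and makes no vLLM calls.

```python
def iter_batches(items, batch_size=32):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 70 placeholder prompts split with the default batch size from the signature.
prompts = [f"prompt {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(prompts, batch_size=32)]
print(batch_sizes)
```

The final batch is simply shorter when the number of prompts is not a multiple of `batch_size`, so no padding is needed.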
@@ -367,6 +395,29 @@ def vllm_batch_generate(llm, queries, input_type, batch_size=32, num_key_skills=
 
 def get_completion_vllm(input_text, text_columns, id_column, input_type, llm, batch_size=4) -> list:
 
+    """
+    Get completions for whole input data and parse the required KSAs from the model responses. The input data can be a job description or syllabi data.
+
+    Parameters
+    ----------
+    input_text : pandas DataFrame
+        The input data to get completions for using the model
+    text_columns : list
+        List of columns in the input_text dataframe that contain the text data. (Default: ['description'])
+    id_column : str
+        Column name in the input_text dataframe that contains the unique identifier for each row
+    input_type : str
+        Type of input data - 'job_desc' / 'syllabus' etc. (Default: 'job_desc')
+    llm : model
+        The model to use for generating completions
+    batch_size : int, optional
+        Preferred batch size to use for generating completions
+
+    Returns
+    -------
+    list: List of dictionaries that has levels, KSAs for all the data points in the input text.
+    """
+
     result = vllm_batch_generate(llm, input_text, input_type=input_type, batch_size=batch_size)
 
     parsed_output = []
@@ -40,7 +40,7 @@ Rev No. Date Author Description
 [1.0.0] 06/01/2024 Vedant M. Initial Version
 [1.0.1] 06/10/2024 Vedant M. added paths for input and output
 [1.0.2] 07/01/2024 Satya Phanindra K. updated threshold for similarity and AI model ID
-
+[1.0.3] 03/12/2025 Prudhvi Chekuri Remove unnecessary params
 
 TODO:
 -----
@@ -51,10 +51,8 @@ import os
 from dotenv import load_dotenv
 
 ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
-INPUT_PATH = os.path.join(ROOT_DIR, 'input')
-OUTPUT_PATH = os.path.join(ROOT_DIR, 'output')
 
-SKILL_DB_PATH = os.path.join(
+SKILL_DB_PATH = os.path.join('https://raw.githubusercontent.com/LAiSER-Software/datasets/refs/heads/master/taxonomies/combined.csv')
 
 
 SIMILARITY_THRESHOLD = 0.85
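In 0.2.4 `SKILL_DB_PATH` wraps a single URL in `os.path.join`. With only one argument there is nothing to join, so the call returns the string unchanged and the constant is simply the raw GitHub URL. The short check below demonstrates that behavior; the variable names are illustrative, and how the package later reads the CSV is not shown in this diff.

```python
import os

# URL copied from the added line in the hunk.
URL = "https://raw.githubusercontent.com/LAiSER-Software/datasets/refs/heads/master/taxonomies/combined.csv"

# os.path.join with a single argument has nothing to join and
# returns that argument as-is.
skill_db_path = os.path.join(URL)
print(skill_db_path == URL)
```

A plain string assignment would be equivalent here. `os.path.join` is meant for filesystem paths, so passing URL segments through it is best avoided if the constant is ever split into parts.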
@@ -62,7 +62,7 @@ Rev No. Date Author Description
 [1.0.8] 07/11/2024 Satya Phanindra K. Calculate cosine similarities in bulk for optimal performance.
 [1.0.9] 07/15/2024 Satya Phanindra K. Error handling for empty list outputs from extract_raw function
 [1.0.10] 11/24/2024 Prudhvi Chekuri Added support for skills extraction from syllabi data
-[1.0
+[1.1.0] 03/12/2025 Prudhvi Chekuri Added support for extracting KSAs from text and aligning them to the taxonomy
 
 
 TODO:
@@ -105,22 +105,33 @@ class Skill_Extractor:
 
     Attributes
     ----------
-
-
+    model_id: string
+        Model ID for Large Language Model
+    HF_TOKEN: string
+        HuggingFace Token for restricted models under gated HF repos.
+    use_gpu: boolean
+        Flag to use GPU for Large Language Model
+    nlp: spacy model
+        Spacy model for NER
+    skill_db_df: pandas dataframe
+        Dataframe containing taxonomy skills
+    skill_db_embeddings: numpy array
+        Array containing embeddings of taxonomy skills
+    llm: LLM model
+        Large Language Model for skill extraction
+    ner_extractor: SkillExtractor
+        SkillNer model for CPU skill extraction
 
     Methods
     -------
     extract_raw(input_text: text)
         The function extracts skills from text using NER model
-
-    align_skills(raw_skills: list, document_id='0': string):
-        This function aligns the skills provided to the desired taxonomy
-
-    align_KSAs(extracted_df: pandas dataframe, id_column='Research ID'):
-        This function aligns the skills provided to the desired taxonomy
 
     extractor(data: pandas dataframe, id_column='Research ID', text_column='Text'):
         Function takes text dataset to extract and aligns skills based on available taxonomies
+
+    align_KSAs(extracted_df: pandas dataframe, id_column='Research ID'):
+        This function aligns the KSAs provided to the available taxonomy
     ....
 
     """
@@ -156,6 +167,8 @@ class Skill_Extractor:
         ----------
         input_text : pandas Series with text data
             Job advertisement / Job Description / Syllabus Description / Course Outcomes etc.
+        id_column: string
+            Name of id column in the dataset. Defaults to 'Research ID'
         text_columns: list
             Name of the text columns in the dataset. Defaults to 'description'
         input_type: string
@@ -165,11 +178,6 @@ class Skill_Extractor:
         -------
         list: List of extracted skills from text
 
-        Notes
-        -----
-        More details on which (pre-trained) language model is fine-tuned can be found in llm_methods.py
-        The Function is designed only to return list of skills based on prompt passed to OpenAI's Fine-tuned model.
-
         """
 
         if torch.cuda.is_available() and self.use_gpu:
@@ -247,13 +255,14 @@ class Skill_Extractor:
 
 
     def align_KSAs(self, extracted_df, id_column):
+
         """
-        This function aligns the
+        This function aligns the KSAs provided to the available taxonomy
 
         Parameters
         ----------
         extracted_df : pandas dataframe
-
+            Dataset containing extracted KSAs from text and their details.
         id_column: string
             Name of id column in the dataset. Defaults to 'Research ID'
 
@@ -319,6 +328,7 @@ class Skill_Extractor:
 
         Returns
         -------
+        For CPU:
         list: List of skill tags and similarity_score for all texts in from text in JSON format
         [
             {
@@ -334,6 +344,18 @@ class Skill_Extractor:
             },
             ...
         ]
+
+        For GPU:
+        pandas dataframe with below columns:
+        - "Research ID": text_id
+        - "Description": text description
+        - "Learning Outcomes": learning outcomes
+        - "Raw Skill": Raw skill extracted
+        - "Level": Level of the skill
+        - "Knowledge Required": Knowledge required for the skill
+        - "Task Abilities": Task abilities
+        - "Skill Tag": taxonomy skill tag
+        - "Correlation Coefficient": similarity_score
 
         """
 
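The updated docstring lists nine columns for the GPU result. Code that consumes this output may want to confirm that a row carries all of them before relying on it. The check below is a sketch: the column names are copied from the docstring, while the helper `missing_columns` and the sample values are hypothetical.

```python
# Column names as listed under "For GPU:" in the docstring.
EXPECTED_COLUMNS = [
    "Research ID",
    "Description",
    "Learning Outcomes",
    "Raw Skill",
    "Level",
    "Knowledge Required",
    "Task Abilities",
    "Skill Tag",
    "Correlation Coefficient",
]


def missing_columns(row):
    """Return the expected column names that are absent from `row`."""
    return [name for name in EXPECTED_COLUMNS if name not in row]


# Made-up row for illustration; real values come from the extractor.
row = {
    "Research ID": "doc-001",
    "Description": "Intro course on data handling",
    "Learning Outcomes": "Clean and summarize tabular data",
    "Raw Skill": "data cleaning",
    "Level": 3,
    "Knowledge Required": ["basic statistics"],
    "Task Abilities": ["remove duplicate records"],
    "Skill Tag": "TAXONOMY-TAG-PLACEHOLDER",
    "Correlation Coefficient": 0.91,
}
print(missing_columns(row))
```

The same helper works on a pandas row converted with `to_dict()`, since it only tests key membership.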
@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 
 setup(
     name='dev-laiser',
-    version='0.2.
+    version='0.2.4',
     author='Satya Phanindra Kumar Kalaga, Bharat Khandelwal, Prudhvi Chekuri',
     author_email='phanindra.connect@gmail.com',
     description='LAiSER (Leveraging Artificial Intelligence for Skill Extraction & Research) is a tool designed to help learners, educators, and employers extract and share trusted information about skills. It uses a fine-tuned language model to extract raw skill keywords from text, then aligns them with a predefined taxonomy. You can find more technical details in the project’s paper.md and an overview in the README.md.',
File without changes