PyPI: dev-laiser 0.2.2 (tar.gz) → 0.2.4 (tar.gz)


Files changed (15)

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: dev-laiser
-Version: 0.2.2
+Version: 0.2.4
 Summary: LAiSER (Leveraging Artificial Intelligence for Skill Extraction & Research) is a tool designed to help learners, educators, and employers extract and share trusted information about skills. It uses a fine-tuned language model to extract raw skill keywords from text, then aligns them with a predefined taxonomy. You can find more technical details in the project’s paper.md and an overview in the README.md.
 Home-page: https://github.com/LAiSER-Software/extract-module
 Author: Satya Phanindra Kumar Kalaga, Bharat Khandelwal, Prudhvi Chekuri
@@ -75,7 +75,7 @@ LAiSER is a tool that helps learners, educators and employers share trusted and
 Before proceeding to  LAiSER, you'd want to follow the steps below to install the required dependencies:
 - Clone the repository using
   ```shell
-  git clone https://github.com/Micah-Sanders/LAiSER.git
+  git clone https://github.com/LAiSER-Software/extract-module.git
   ```
   or download the [zip(link)](https://github.com/Micah-Sanders/LAiSER/archive/refs/heads/main.zip) file and extract it.
@@ -104,7 +104,7 @@ To use LAiSER as a command line tool, follow the steps below:
 - Navigate to the root directory of the repository and run the command below:
   ```shell
-  pip install laiser-dev
+  pip install dev-laiser
   ```
 - Once the installation is complete, you can run the tool using the command below:

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/README.md RENAMED

@@ -34,7 +34,7 @@ LAiSER is a tool that helps learners, educators and employers share trusted and
 Before proceeding to  LAiSER, you'd want to follow the steps below to install the required dependencies:
 - Clone the repository using
   ```shell
-  git clone https://github.com/Micah-Sanders/LAiSER.git
+  git clone https://github.com/LAiSER-Software/extract-module.git
   ```
   or download the [zip(link)](https://github.com/Micah-Sanders/LAiSER/archive/refs/heads/main.zip) file and extract it.
@@ -63,7 +63,7 @@ To use LAiSER as a command line tool, follow the steps below:
 - Navigate to the root directory of the repository and run the command below:
   ```shell
-  pip install laiser-dev
+  pip install dev-laiser
   ```
 - Once the installation is complete, you can run the tool using the command below:

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/dev_laiser.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: dev-laiser
-Version: 0.2.2
+Version: 0.2.4
 Summary: LAiSER (Leveraging Artificial Intelligence for Skill Extraction & Research) is a tool designed to help learners, educators, and employers extract and share trusted information about skills. It uses a fine-tuned language model to extract raw skill keywords from text, then aligns them with a predefined taxonomy. You can find more technical details in the project’s paper.md and an overview in the README.md.
 Home-page: https://github.com/LAiSER-Software/extract-module
 Author: Satya Phanindra Kumar Kalaga, Bharat Khandelwal, Prudhvi Chekuri
@@ -75,7 +75,7 @@ LAiSER is a tool that helps learners, educators and employers share trusted and
 Before proceeding to  LAiSER, you'd want to follow the steps below to install the required dependencies:
 - Clone the repository using
   ```shell
-  git clone https://github.com/Micah-Sanders/LAiSER.git
+  git clone https://github.com/LAiSER-Software/extract-module.git
   ```
   or download the [zip(link)](https://github.com/Micah-Sanders/LAiSER/archive/refs/heads/main.zip) file and extract it.
@@ -104,7 +104,7 @@ To use LAiSER as a command line tool, follow the steps below:
 - Navigate to the root directory of the repository and run the command below:
   ```shell
-  pip install laiser-dev
+  pip install dev-laiser
   ```
 - Once the installation is complete, you can run the tool using the command below:

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/llm_methods.py RENAMED

@@ -48,7 +48,7 @@ Rev No.     Date            Author              Description
 [1.0.0]     07/10/2024      Satya Phanindra K.  Define all the LLM methods being used in the project
 [1.0.1]     07/19/2024      Satya Phanindra K.  Add descriptions to each method
 [1.0.2]     11/24/2024      Prudhvi Chekuri     Add support for skills extraction from syllabi data
-[1.0.3]     11/25/2024      Satya Phanindra K.  Add support for skills extraction from course outcomes data
+[1.0.3]     03/12/2025      Prudhvi Chekuri     Implement functions to extract levels, KSAs from job descriptions and syllabi data using vLLM
 TODO:
 -----
@@ -220,20 +220,21 @@ def get_completion(input_text, text_columns, input_type, model, tokenizer) -> st
 def parse_output_vllm(response):
-    # TODO: Verify the docstring and update missing/incorrect information
     """
-    Parse the output from the VLLM model to extract skills, levels, knowledge required, and task abilities.
+    Parse the model's response to extract key skills, knowledge required, and task abilities.
     Parameters
     ----------
     response : str
-        The model's response containing the structured information about skills.
+        The model's response after processing the prompt.
     Returns
     -------
-    list: List of dictionaries containing the extracted skills, levels, knowledge required, and task abilities.
+    list: List of dictionaries that has levels, KSAs for all the data points in the input text.
     """
     out = []
     # Split into items, handling optional '->' prefix and multi-line input
     items = [item.strip() for item in response.split('->') if item.strip()]
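
Note on the splitting step above: the context line splits the response on the `->` marker and drops empty fragments. A minimal sketch of that one step in isolation follows; the helper name `split_items` and the sample response text are hypothetical, since the real model output format is not shown in this diff.

```python
def split_items(response: str) -> list:
    """Split a response on the '->' marker, trimming whitespace and dropping blanks."""
    return [item.strip() for item in response.split('->') if item.strip()]


# Hypothetical response text, used only to illustrate the splitting behaviour.
sample = "-> Skill: Python\n-> Skill: SQL\n->   "
print(split_items(sample))  # ['Skill: Python', 'Skill: SQL']
```

Because empty fragments are filtered out, a response with or without a leading `->` yields the same list of items.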
@@ -289,6 +290,7 @@ def create_ksa_prompt(query, input_type, num_key_skills, num_key_kr, num_key_tas
     -------
     str
         The formatted prompt for the KSA extraction task.
     """
     prompt_template = """user
@@ -352,6 +354,32 @@ model
 def vllm_batch_generate(llm, queries, input_type, batch_size=32, num_key_skills=5, num_key_kr='3-5', num_key_tas='3-5'):
+    """
+    Generate completions for a batch of queries using the model.
+    Parameters
+    ----------
+    llm : model
+        The model to use for generating completions
+    queries : pandas DataFrame
+        The queries to get completions for using the model
+    input_type : str
+        Type of input data - 'job_desc' / 'syllabus' etc. (Default: 'job_desc')
+    batch_size : int, optional
+        Preferred batch size to use for generating completions
+    num_key_skills : int, optional
+        Number of key skills to extract from the input text
+    num_key_kr : str, optional
+        Number of key knowledge required items to extract from the input text
+    num_key_tas : str, optional
+        Number of key task abilities items to extract from the input text
+    Returns
+    -------
+    list: List of completions generated by the model for the input queries
+    """
     result = []
     sampling_params = SamplingParams(max_tokens=1000)
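
Note on `batch_size`: the body of `vllm_batch_generate` is not part of this hunk, so the following is only a sketch of the general batching pattern the docstring describes, not the package's implementation. The names `batch_generate` and `fake_generate` are hypothetical, and a plain list of strings stands in for the pandas DataFrame and the vLLM model.

```python
from typing import Callable, List, Sequence


def batch_generate(queries: Sequence[str],
                   generate: Callable[[List[str]], List[str]],
                   batch_size: int = 32) -> List[str]:
    """Call `generate` on consecutive slices of at most `batch_size` queries."""
    results: List[str] = []
    for start in range(0, len(queries), batch_size):
        batch = list(queries[start:start + batch_size])
        results.extend(generate(batch))
    return results


calls = []  # records the size of each batch passed to the stand-in model


def fake_generate(prompts: List[str]) -> List[str]:
    # Stand-in for a model call (assumption): upper-cases each prompt.
    calls.append(len(prompts))
    return [p.upper() for p in prompts]


out = batch_generate(["a", "b", "c", "d", "e"], fake_generate, batch_size=2)
print(out)    # ['A', 'B', 'C', 'D', 'E']
print(calls)  # [2, 2, 1]
```

The last batch may be smaller than `batch_size`; results are returned in input order.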
@@ -367,6 +395,29 @@ def vllm_batch_generate(llm, queries, input_type, batch_size=32, num_key_skills=
 def get_completion_vllm(input_text, text_columns, id_column, input_type, llm, batch_size=4) -> list:
+    """
+    Get completions for whole input data and parse the required KSAs from the model responses. The input data can be a job description or syllabi data.
+    Parameters
+    ----------
+    input_text : pandas DataFrame
+        The input data to get completions for using the model
+    text_columns : list
+        List of columns in the input_text dataframe that contain the text data. (Default: ['description'])
+    id_column : str
+        Column name in the input_text dataframe that contains the unique identifier for each row
+    input_type : str
+        Type of input data - 'job_desc' / 'syllabus' etc. (Default: 'job_desc')
+    llm : model
+        The model to use for generating completions
+    batch_size : int, optional
+        Preferred batch size to use for generating completions
+    Returns
+    -------
+    list: List of dictionaries that has levels, KSAs for all the data points in the input text.
+    """
     result = vllm_batch_generate(llm, input_text, input_type=input_type, batch_size=batch_size)
     parsed_output = []

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/params.py RENAMED

@@ -40,7 +40,7 @@ Rev No.     Date            Author                Description
 [1.0.0]     06/01/2024      Vedant M.             Initial Version
 [1.0.1]     06/10/2024      Vedant M.             added paths for input and output
 [1.0.2]     07/01/2024      Satya Phanindra K.    updated threshold for similarity and AI model ID
+[1.0.3]     03/12/2025      Prudhvi Chekuri       Remove unnecessary params
 TODO:
 -----
@@ -51,10 +51,8 @@ import os
 from dotenv import load_dotenv
 ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
-INPUT_PATH = os.path.join(ROOT_DIR, 'input')
-OUTPUT_PATH = os.path.join(ROOT_DIR, 'output')
-SKILL_DB_PATH = os.path.join(INPUT_PATH, 'combined.csv')
+SKILL_DB_PATH = os.path.join('https://raw.githubusercontent.com/LAiSER-Software/datasets/refs/heads/master/taxonomies/combined.csv')
 SIMILARITY_THRESHOLD = 0.85
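
Note on the new `SKILL_DB_PATH`: `os.path.join` called with a single argument returns that argument unchanged, so the constant is simply the URL string. The sketch below checks this and shows one way a caller might tell a remote taxonomy location from a local path; the helper `is_remote` is hypothetical and not part of the package. How the package actually loads the CSV is not shown in this diff, although `pandas.read_csv` does accept URLs.

```python
import os
from urllib.parse import urlparse

SKILL_DB_URL = ('https://raw.githubusercontent.com/LAiSER-Software/datasets/'
                'refs/heads/master/taxonomies/combined.csv')

# With one argument, os.path.join has nothing to join and returns it as-is.
joined = os.path.join(SKILL_DB_URL)
print(joined == SKILL_DB_URL)  # True


def is_remote(path: str) -> bool:
    """Hypothetical helper: True when `path` is an http(s) URL."""
    return urlparse(path).scheme in ('http', 'https')


print(is_remote(joined))                # True
print(is_remote('input/combined.csv'))  # False
```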

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/laiser/skill_extractor.py RENAMED

@@ -62,7 +62,7 @@ Rev No.     Date            Author              Description
 [1.0.8]     07/11/2024      Satya Phanindra K.  Calculate cosine similarities in bulk for optimal performance.
 [1.0.9]     07/15/2024      Satya Phanindra K.  Error handling for empty list outputs from extract_raw function
 [1.0.10]    11/24/2024      Prudhvi Chekuri     Added support for skills extraction from syllabi data
-[1.0.11]    03/12/2025      Satya Phanindra K.  Update extractor function to handle syllabus data
+[1.1.0]     03/12/2025      Prudhvi Chekuri     Added support for extracting KSAs from text and aligning them to the taxonomy
 TODO:
@@ -105,22 +105,33 @@ class Skill_Extractor:
     Attributes
     ----------
-    client : HuggingFace API client
-    nlp : spacy nlp model
+    model_id: string
+        Model ID for Large Language Model
+    HF_TOKEN: string
+        HuggingFace Token for restricted models under gated HF repos.
+    use_gpu: boolean
+        Flag to use GPU for Large Language Model
+    nlp: spacy model
+        Spacy model for NER
+    skill_db_df: pandas dataframe
+        Dataframe containing taxonomy skills
+    skill_db_embeddings: numpy array
+        Array containing embeddings of taxonomy skills
+    llm: LLM model
+        Large Language Model for skill extraction
+    ner_extractor: SkillExtractor
+        SkillNer model for CPU skill extraction
     Methods
     -------
     extract_raw(input_text: text)
         The function extracts skills from text using NER model
-    align_skills(raw_skills: list, document_id='0': string):
-        This function aligns the skills provided to the desired taxonomy
-    align_KSAs(extracted_df: pandas dataframe, id_column='Research ID'):
-        This function aligns the skills provided to the desired taxonomy
     extractor(data: pandas dataframe, id_column='Research ID', text_column='Text'):
         Function takes text dataset to extract and aligns skills based on available taxonomies
+    align_KSAs(extracted_df: pandas dataframe, id_column='Research ID'):
+        This function aligns the KSAs provided to the available taxonomy
     ....
     """
@@ -156,6 +167,8 @@ class Skill_Extractor:
         ----------
         input_text : pandas Series with text data
             Job advertisement / Job Description / Syllabus Description / Course Outcomes etc.
+        id_column: string
+            Name of id column in the dataset. Defaults to 'Research ID'
         text_columns: list
             Name of the text columns in the dataset. Defaults to 'description'
         input_type: string
@@ -165,11 +178,6 @@ class Skill_Extractor:
         -------
         list: List of extracted skills from text
-        Notes
-        -----
-        More details on which (pre-trained) language model is fine-tuned can be found in llm_methods.py
-        The Function is designed only to return list of skills based on prompt passed to OpenAI's Fine-tuned model.
         """
         if torch.cuda.is_available() and self.use_gpu:
@@ -247,13 +255,14 @@ class Skill_Extractor:
     def align_KSAs(self, extracted_df, id_column):
         """
-        This function aligns the skills provided to the available taxonomy
+        This function aligns the KSAs provided to the available taxonomy
         Parameters
         ----------
         extracted_df : pandas dataframe
-            Provide dataframe of skills extracted from Job Descriptions / Syllabus.
+            Dataset containing extracted KSAs from text and their details.
         id_column: string
             Name of id column in the dataset. Defaults to 'Research ID'
@@ -319,6 +328,7 @@ class Skill_Extractor:
         Returns
         -------
+        For CPU:
         list: List of skill tags and similarity_score for all texts in  from text in JSON format
             [
                 {
@@ -334,6 +344,18 @@ class Skill_Extractor:
                 },
                 ...
             ]
+        For GPU:
+        pandas dataframe with below columns:
+            - "Research ID": text_id
+            - "Description": text description
+            - "Learning Outcomes": learning outcomes
+            - "Raw Skill": Raw skill extracted
+            - "Level": Level of the skill
+            - "Knowledge Required": Knowledge required for the skill
+            - "Task Abilities": Task abilities
+            - "Skill Tag": taxonomy skill tag
+            - "Correlation Coefficient": similarity_score
         """
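
Note on the GPU return shape: the docstring above lists nine columns for the returned dataframe. A small sketch of how a consumer might check one record against that list follows; the constant `GPU_OUTPUT_COLUMNS`, the helper `missing_columns`, and the sample record are hypothetical and use plain dictionaries rather than pandas.

```python
# Column names copied from the docstring in the hunk above.
GPU_OUTPUT_COLUMNS = [
    "Research ID", "Description", "Learning Outcomes", "Raw Skill", "Level",
    "Knowledge Required", "Task Abilities", "Skill Tag", "Correlation Coefficient",
]


def missing_columns(row: dict) -> list:
    """Return the documented columns that are absent from `row`."""
    return [col for col in GPU_OUTPUT_COLUMNS if col not in row]


# Hypothetical partial record, for illustration only.
record = {"Research ID": 1, "Raw Skill": "data analysis", "Skill Tag": "example-tag"}
print(missing_columns(record))
```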

{dev_laiser-0.2.2 → dev_laiser-0.2.4}/setup.py RENAMED

@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 setup(
     name='dev-laiser',
-    version='0.2.2',
+    version='0.2.4',
     author='Satya Phanindra Kumar Kalaga, Bharat Khandelwal, Prudhvi Chekuri',
     author_email='phanindra.connect@gmail.com',
     description='LAiSER (Leveraging Artificial Intelligence for Skill Extraction & Research) is a tool designed to help learners, educators, and employers extract and share trusted information about skills. It uses a fine-tuned language model to extract raw skill keywords from text, then aligns them with a predefined taxonomy. You can find more technical details in the project’s paper.md and an overview in the README.md.',