themefinder 0.0.3__py3-none-any.whl


@@ -0,0 +1,173 @@
1
+ Metadata-Version: 2.3
2
+ Name: themefinder
3
+ Version: 0.0.3
4
+ Summary: A topic modelling Python package designed for analysing one-to-many question-answer data eg free-text survey responses.
5
+ License: MIT
6
+ Author: i.AI
7
+ Author-email: packages@cabinetoffice.gov.uk
8
+ Requires-Python: >=3.10,<3.13
9
+ Classifier: Intended Audience :: Developers
10
+ Classifier: Intended Audience :: Science/Research
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Classifier: Topic :: Text Processing :: Linguistic
18
+ Requires-Dist: boto3 (>=1.29,<2.0)
19
+ Requires-Dist: langchain
20
+ Requires-Dist: langchain-openai (==0.1.17)
21
+ Requires-Dist: langfuse (==2.29.1)
22
+ Requires-Dist: openpyxl (>=3.1.5,<4.0.0)
23
+ Requires-Dist: pandas (>=2.2.2,<3.0.0)
24
+ Requires-Dist: pyarrow (>=15.0.0,<16.0.0)
25
+ Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
26
+ Requires-Dist: scikit-learn
27
+ Requires-Dist: toml (>=0.10.2,<0.11.0)
28
+ Project-URL: Documentation, https://i-dot-ai.github.io/themefinder/
29
+ Project-URL: Repository, https://github.com/i-dot-ai/themefinder/
30
+ Description-Content-Type: text/markdown
31
+
32
+ # ThemeFinder
33
+
34
+ ThemeFinder is a topic modelling Python package designed for analysing one-to-many question-answer data (e.g. survey responses and public consultations). See the [docs](https://i-dot-ai.github.io/themefinder/) for more info.
35
+
36
+ > [!IMPORTANT]
37
+ > Incubation project: This project is an incubation project; as such, we don't recommend using this for critical use cases yet. We are currently in a research stage, trialling the tool for case studies across the Civil Service. Find out more about our projects at https://ai.gov.uk/.
38
+
39
+
40
+ ## Quickstart
41
+
42
+ ### Install using your package manager of choice
43
+
44
+ For example, `pip install themefinder` or `poetry add themefinder`.
45
+
46
+ ### Usage
47
+
48
+ ThemeFinder takes as input a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) with two columns:
49
+ - `response_id`: A unique identifier for each response
50
+ - `response`: The free text survey response
51
+
52
+ ThemeFinder now supports a range of language models through structured outputs.
53
+
54
+ The function `find_themes` identifies common themes in responses and labels them. It also outputs the results of the intermediate steps in the theme-finding pipeline.
55
+
56
+ For this example, make sure the following Python packages are available in your virtual environment: `asyncio` (part of the standard library), `pandas` and `langchain`. Install `themefinder` as described above.
57
+
58
+ If you are using environment variables (e.g. for API keys), you can use `python-dotenv` to read variables from a `.env` file.
59
+
60
+ If you are using an Azure OpenAI endpoint, you will need the following variables:
61
+
62
+ - `AZURE_OPENAI_API_KEY`
63
+ - `AZURE_OPENAI_ENDPOINT`
64
+ - `OPENAI_API_VERSION`
65
+ - `DEPLOYMENT_NAME`
66
+ - `AZURE_OPENAI_BASE_URL`
67
+
68
+ Otherwise you will need whichever variables [LangChain](https://www.langchain.com/) requires for your LLM of choice.
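Before initialising the LLM, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and is not part of ThemeFinder or LangChain.

```python
import os

# Variable names copied from the Azure OpenAI list above.
AZURE_OPENAI_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "DEPLOYMENT_NAME",
    "AZURE_OPENAI_BASE_URL",
)


def missing_env_vars(names=AZURE_OPENAI_VARS, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All Azure OpenAI variables are set.")
```

If you load a `.env` file with `load_dotenv()`, call it before running this check so that the values it provides are visible in `os.environ`.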
69
+
70
+ ```python
71
+ import asyncio
72
+ from dotenv import load_dotenv
73
+ import pandas as pd
74
+ from langchain_openai import AzureChatOpenAI
75
+ from themefinder import find_themes
76
+
77
+ # If needed, load LLM API settings from .env file
78
+ load_dotenv()
79
+
80
+ # Initialise your LLM of choice using langchain
81
+ llm = AzureChatOpenAI(
82
+     model="gpt-4o",
83
+     temperature=0,
84
+ )
85
+
86
+ # Set up your data
87
+ responses_df = pd.DataFrame({
88
+     "response_id": ["1", "2", "3", "4", "5"],
89
+     "response": ["I think it's awesome, I can use it for consultation analysis.",
90
+         "It's great.", "It's a good approach to topic modelling.", "I'm not sure, I need to trial it more.", "I don't like it so much."]
91
+ })
92
+
93
+ # Add your question
94
+ question = "What do you think of ThemeFinder?"
95
+
96
+ # Make the system prompt specific to your use case
97
+ system_prompt = "You are an AI evaluation tool analyzing survey responses about a Python package."
98
+
99
+ # Run the function to find themes. find_themes queries LLM endpoints asynchronously, so it must be awaited inside a coroutine
100
+ async def main():
101
+     result = await find_themes(responses_df, llm, question, system_prompt=system_prompt)
102
+     print(result)
103
+
104
+ if __name__ == "__main__":
105
+     asyncio.run(main())
106
+ ```
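The example above awaits `find_themes` because the LLM endpoints are queried asynchronously. As a rough illustration of that pattern only, the sketch below sends several batches concurrently with `asyncio.gather`. It is not ThemeFinder's implementation: `fake_llm_call`, `make_batches` and `process_all` are hypothetical stand-ins, and no real LLM is contacted.

```python
import asyncio


async def fake_llm_call(batch):
    """Stand-in for a real LLM request: waits briefly, then reports the batch size."""
    await asyncio.sleep(0.01)
    return f"processed {len(batch)} responses"


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def process_all(responses, batch_size=2):
    """Run one call per batch concurrently; results come back in batch order."""
    batches = make_batches(responses, batch_size)
    return await asyncio.gather(*(fake_llm_call(batch) for batch in batches))


if __name__ == "__main__":
    print(asyncio.run(process_all(["a", "b", "c", "d", "e"])))
```

Because the calls overlap while each one waits, the total time is close to that of the slowest batch rather than the sum of all batches.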
107
+
108
+ ## ThemeFinder pipeline
109
+
110
+ ThemeFinder's pipeline consists of six stages, one of which is optional, each utilizing a specialized LLM prompt:
111
+
112
+ ### Sentiment analysis
113
+ - Analyses the emotional tone and position of each response using sentiment-focused prompts
114
+ - Provides structured sentiment categorisation based on LLM analysis
115
+
116
+ ### Theme generation
117
+ - Uses exploratory prompts to identify initial themes from response batches
118
+ - Groups related responses for better context through guided theme extraction
119
+
120
+ ### Theme condensation
121
+ - Employs comparative prompts to combine similar or overlapping themes
122
+ - Reduces redundancy in identified topics through systematic theme evaluation
123
+
124
+ ### Theme refinement
125
+ - Leverages standardisation prompts to normalise theme descriptions
126
+ - Creates clear, consistent theme definitions through structured refinement
127
+
128
+ ### Theme target alignment
129
+ - Optional step to consolidate themes down to a target number
130
+
131
+ ### Theme mapping
132
+ - Utilizes classification prompts to map individual responses to refined themes
133
+ - Supports multiple theme assignments per response through detailed analysis
134
+
135
+
136
+ The prompts used at each stage can be found in `src/themefinder/prompts/`.
137
+
138
+ The file `src/themefinder/core.py` contains the function `find_themes`, which runs the pipeline. It also contains functions for each individual stage.
139
+
140
+
141
+ **For more detail - see the docs: [https://i-dot-ai.github.io/themefinder/](https://i-dot-ai.github.io/themefinder/).**
142
+
143
+
144
+ ## Model Compatibility
145
+
146
+ ThemeFinder's structured output approach makes it compatible with a wide range of language models from various providers. This list is non-exhaustive, and other models may also work effectively:
147
+
148
+ ### OpenAI Models
149
+ - GPT-4, GPT-4o, GPT-4.1
150
+ - All Azure OpenAI deployments
151
+
152
+ ### Google Models
153
+ - Gemini series (1.5 Pro, 2.0 Pro, etc.)
154
+
155
+ ### Anthropic Models
156
+ - Claude series (Claude 3 Opus, Sonnet, Haiku, etc.)
157
+
158
+ ### Open Source Models
159
+ - Llama 2, Llama 3
160
+ - Mistral models (e.g., Mistral 7B, Mixtral)
161
+
162
+
163
+ ## License
164
+
165
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
166
+
167
+ The documentation is [© Crown copyright](https://www.nationalarchives.gov.uk/information-management/re-using-public-sector-information/uk-government-licensing-framework/crown-copyright/) and available under the terms of the [Open Government Licence v3.0](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).
168
+
169
+
170
+ ## Feedback
171
+
172
+ If you have feedback on this package, please fill in our [feedback form](https://forms.gle/85xUSMvxGzSSKQ499) or contact us with questions or feedback at packages@cabinetoffice.gov.uk.
173
+
@@ -0,0 +1,17 @@
1
+ themefinder/__init__.py,sha256=yfIyHWPMM59u23m79igHSllT-w3r4l_euLCDZygo22Q,431
2
+ themefinder/core.py,sha256=J4BJZO8BNN9xbX3LsKah4ZOGkW6YJcg_iYB9HCH7UR0,22768
3
+ themefinder/llm_batch_processor.py,sha256=zdrQH1bvMR9FHWDaDp1tvdiADTHTaNDg_Z-3QQ0771k,17641
4
+ themefinder/models.py,sha256=RN_7WzucXgKWSVXEoizijTgAM63rMVvXW6vdGD3o6Z8,12332
5
+ themefinder/prompts/consultation_system_prompt.txt,sha256=_A07oY_an4hnRx-9pQ0y-TLXJz0dd8vDI-MZne7Mdb4,89
6
+ themefinder/prompts/detail_detection.txt,sha256=6Vr_oN7rF5BCFipnCIHTSF8MmjerGyCixRWRT3vni1U,941
7
+ themefinder/prompts/sentiment_analysis.txt,sha256=vYCDhtEsG5I9xixwVhZbvKPJGU1Gqpw4-xAqGz72xhU,1671
8
+ themefinder/prompts/theme_condensation.txt,sha256=pHWuCtfU58gdtP2BfGZWOTvcb0MnTpb9OhOCGtkJv8U,1672
9
+ themefinder/prompts/theme_generation.txt,sha256=QRKW7DtcMSb2olT6j5jmdEPcXPMeZgogM-NYddEIKRk,1871
10
+ themefinder/prompts/theme_mapping.txt,sha256=HtGuStm-622TIEaqdb9LTaBs9xE-n9lvmcGQTG2_JOQ,2042
11
+ themefinder/prompts/theme_refinement.txt,sha256=evWMCIEdeZCJ8zn4SBNgP6bmfAb0vzKiR5C5wfAjkUk,2649
12
+ themefinder/prompts/theme_target_alignment.txt,sha256=g7AVZLiP_xIH010X5SIZyG3q7gA6OBAplPv3xvmstOY,855
13
+ themefinder/themefinder_logging.py,sha256=n5SUQovEZLC4skEbxicjz_fOGF9mOk3S-Wpj5uXsaL8,314
14
+ themefinder-0.0.3.dist-info/LICENCE,sha256=C9ULIN0ctF60ZxUWH_hw1H434bDLg49Z-Qzn6BUHgqs,1060
15
+ themefinder-0.0.3.dist-info/METADATA,sha256=Ix9gzGyZ-j7ji9tIzpFm0KNLb12UfK94KjxL5id0OsQ,6850
16
+ themefinder-0.0.3.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
17
+ themefinder-0.0.3.dist-info/RECORD,,
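Each RECORD entry above has the form `path,hash,size`, where the hash is the algorithm name followed by the URL-safe base64 encoding of the digest with the trailing `=` padding removed. The sketch below shows how such a value could be recomputed and how a line could be parsed, using only the standard library. The function names `record_hash` and `parse_record_line` are hypothetical.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Return a RECORD-style hash: 'sha256=' + URL-safe base64 digest, unpadded."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str):
    """Split one RECORD line into (path, hash, size); size is None when empty."""
    path, hash_spec, size = next(csv.reader([line]))
    return path, hash_spec, int(size) if size else None
```

To check an installed file, read it in binary mode and compare `record_hash(contents)` and `len(contents)` with the parsed entry. The RECORD file's own entry has an empty hash and size, as the last line above shows.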
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: poetry-core 2.1.3
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
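The WHEEL file uses the same `Key: value` header layout as METADATA, so it can be read with the standard library's email parser. The sketch below parses the four lines shown above and splits the tag into its Python, ABI and platform parts. The function name `parse_wheel_file` and the returned dictionary layout are assumptions made for illustration.

```python
from email.parser import Parser

# Contents of the WHEEL file as shown in the diff above.
WHEEL_TEXT = """\
Wheel-Version: 1.0
Generator: poetry-core 2.1.3
Root-Is-Purelib: true
Tag: py3-none-any
"""


def parse_wheel_file(text):
    """Parse WHEEL metadata into a plain dict (hypothetical layout)."""
    msg = Parser().parsestr(text)
    tags = []
    for tag in msg.get_all("Tag", []):
        # A tag has three dash-separated parts: python, abi, platform.
        python_tag, abi_tag, platform_tag = tag.split("-")
        tags.append({"python": python_tag, "abi": abi_tag, "platform": platform_tag})
    return {
        "wheel_version": msg["Wheel-Version"],
        "generator": msg["Generator"],
        "purelib": (msg["Root-Is-Purelib"] or "").lower() == "true",
        "tags": tags,
    }
```

Here `py3-none-any` means the wheel targets any Python 3 interpreter, requires no particular ABI and is not tied to a platform, which is consistent with `Root-Is-Purelib: true`.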