openai-gabriel 1.0.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- gabriel/__init__.py +61 -0
- gabriel/_version.py +1 -0
- gabriel/api.py +2284 -0
- gabriel/cli/__main__.py +60 -0
- gabriel/core/__init__.py +7 -0
- gabriel/core/llm_client.py +34 -0
- gabriel/core/pipeline.py +18 -0
- gabriel/core/prompt_template.py +152 -0
- gabriel/prompts/__init__.py +1 -0
- gabriel/prompts/bucket_prompt.jinja2 +113 -0
- gabriel/prompts/classification_prompt.jinja2 +50 -0
- gabriel/prompts/codify_prompt.jinja2 +95 -0
- gabriel/prompts/comparison_prompt.jinja2 +60 -0
- gabriel/prompts/deduplicate_prompt.jinja2 +41 -0
- gabriel/prompts/deidentification_prompt.jinja2 +112 -0
- gabriel/prompts/extraction_prompt.jinja2 +61 -0
- gabriel/prompts/filter_prompt.jinja2 +31 -0
- gabriel/prompts/ideation_prompt.jinja2 +80 -0
- gabriel/prompts/merge_prompt.jinja2 +47 -0
- gabriel/prompts/paraphrase_prompt.jinja2 +17 -0
- gabriel/prompts/rankings_prompt.jinja2 +49 -0
- gabriel/prompts/ratings_prompt.jinja2 +50 -0
- gabriel/prompts/regional_analysis_prompt.jinja2 +40 -0
- gabriel/prompts/seed.jinja2 +43 -0
- gabriel/prompts/snippets.jinja2 +117 -0
- gabriel/tasks/__init__.py +63 -0
- gabriel/tasks/_attribute_utils.py +69 -0
- gabriel/tasks/bucket.py +432 -0
- gabriel/tasks/classify.py +562 -0
- gabriel/tasks/codify.py +1033 -0
- gabriel/tasks/compare.py +235 -0
- gabriel/tasks/debias.py +1460 -0
- gabriel/tasks/deduplicate.py +341 -0
- gabriel/tasks/deidentify.py +316 -0
- gabriel/tasks/discover.py +524 -0
- gabriel/tasks/extract.py +455 -0
- gabriel/tasks/filter.py +169 -0
- gabriel/tasks/ideate.py +782 -0
- gabriel/tasks/merge.py +464 -0
- gabriel/tasks/paraphrase.py +531 -0
- gabriel/tasks/rank.py +2041 -0
- gabriel/tasks/rate.py +347 -0
- gabriel/tasks/seed.py +465 -0
- gabriel/tasks/whatever.py +344 -0
- gabriel/utils/__init__.py +64 -0
- gabriel/utils/audio_utils.py +42 -0
- gabriel/utils/file_utils.py +464 -0
- gabriel/utils/image_utils.py +22 -0
- gabriel/utils/jinja.py +31 -0
- gabriel/utils/logging.py +86 -0
- gabriel/utils/mapmaker.py +304 -0
- gabriel/utils/media_utils.py +78 -0
- gabriel/utils/modality_utils.py +148 -0
- gabriel/utils/openai_utils.py +5470 -0
- gabriel/utils/parsing.py +282 -0
- gabriel/utils/passage_viewer.py +2557 -0
- gabriel/utils/pdf_utils.py +20 -0
- gabriel/utils/plot_utils.py +2881 -0
- gabriel/utils/prompt_utils.py +42 -0
- gabriel/utils/word_matching.py +158 -0
- openai_gabriel-1.0.1.dist-info/METADATA +443 -0
- openai_gabriel-1.0.1.dist-info/RECORD +67 -0
- openai_gabriel-1.0.1.dist-info/WHEEL +5 -0
- openai_gabriel-1.0.1.dist-info/entry_points.txt +2 -0
- openai_gabriel-1.0.1.dist-info/licenses/LICENSE +201 -0
- openai_gabriel-1.0.1.dist-info/licenses/NOTICE +13 -0
- openai_gabriel-1.0.1.dist-info/top_level.txt +1 -0
@@ -0,0 +1,112 @@
+You will be given a block of text belonging to a single individual or group. Your job is to produce or update a JSON mapping from personal or sensitive entities mentioned in the text to anonymized substitutes. Each mapping entry uses a short description of the entity (e.g., "younger son's name", "street address") as the key.
+
+Below is the current mapping dictionary for this group, built from previously processed text chunks. You must preserve all existing entries, adding new real forms of existing entities if they appear in this text, and adding new entries for any additional entities that need anonymization. You may refine a description key (e.g., from "personal name" to "daughter's name") if you learn more detail, but keep the casted form consistent.
+
+Current mapping:
+{{ current_map }}
+
+
+If there is no existing mapping provided above, start a new one from scratch.
+
+By default you should anonymize common personally identifying information, including:
+- Names of the individual and their relatives or close contacts
+- Specific addresses, hometowns, or workplace names that would reveal location
+- Phone numbers, email addresses, credit card or SSN numbers
+Any public figures or information not considered identifying need not be changed.
+Ages also should not be included in the mapping dictionary by default.
+Importantly, all personal details that are not identifying should be retained and not included in the mapping dictionary.
+We don't want to override even very personal and sensitive details; we only wish to anonymize all information that would be easily identifying.
+Someone's hobbies and opinions are not identifying information and should be retained; their name and address are identifying information and should be anonymized.
+Similarly, "my mother" or a common workplace name like "Target" should not be included in the mapping dictionary, only include cases where the term could easily be used to identify an individual, where they live, and other PII.
+
+For each entity, output an object with two fields:
+- "real forms": list of verbatim strings exactly as they appear in the text across all processed chunks
+- "casted form": your chosen anonymized substitute, ideally matching the original's style (e.g., similar cultural background for names, similar city for hometown). Choose a conceptually close substitute that conveys the essence of the original entity without revealing the original identity
+
+Example mapping snippet:
+{
+"younger son's name": {
+"real forms": ["Tim", "Timmy", "Tim Scott", "T-I-M-O-T-H-Y", "T-I-M ... uh ... O-T-H-Y", "Timothy", "Timthy", "T-Tim"],
+"casted form": "Blake"
+},
+"hometown": {
+"real forms": ["Lubbock", "Lubbock, Texas"],
+"casted form": "Medford"
+},
+"mother's name": {
+"real forms": ["Susan", "Sue", "Sue Scott"],
+"casted form": "Mary"
+},
+"hospital of birth": {
+"real forms": ["Olmsted Medical Center", "Olmsted Medical", "OMC"],
+"casted form": "Orchard Medical Center"
+},
+"workplace name": {
+"real forms": [
+"Mayo Clinic", "Mayo-Clinic",
+"The Mayo Clinic", "mayo hospital", "Mayo Jacksonville",
+"Mayo Clinic Hospital"
+],
+"casted form": "Frontier Clinic"
+}
+}
+
+Here is the text you are to analyze:
+BEGIN TEXT
+{{ text }}
+END TEXT
+
+Current mapping:
+{{ current_map }}
+
+If there is no existing mapping provided above, start a new one from scratch.
+
+Return the full mapping dictionary as JSON, retaining all existing entries and adding only. Do not include any narrative or explanation outside the JSON object.
+Ensure the real forms are verbatim as they appear in the text.
+The keys like "younger son's name" should be descriptive and concise, as precise as you can confidently make them.
+For example, if a private name is mentioned but not the relationship, you can use "person's name" as the key (and "second person's name" and so on if needed).
+But if you can tell a more specific identifier, like "mother's name" or "doctor's name", use that instead.
+Similarly, if an existing mapping is under "person's name" but you learn it is a "mother's name", you can change the key to "mother's name". This also applies to other entities like addresses, phone numbers, etc.
+Above all, ensure that you never lose any of the existing mapping entries, especially the real forms. Only add new real forms to an existing mapping entry if you discover any new forms for that same entity.
+If you learn a new entity that should be deidentified, add it as a new entry.
+Be very thorough even if the text is long, capturing all entities you can identify in all forms.
+Ensure you return the full mapping dictionary as JSON, with all the existing entries maintained or updated, and any new ones you add.
+
+Choose casted forms that are conceptually close to the original and carry the same meaning, like a similar sized city or an ethnically similar name.
+Ensure when coming up with casted forms that you have some creativity and would not just use the same exact casted name or address on a different text.
+Unless you have been provided additional instructions that override the default, use common sense by default and don't overclassify non-sensitive information. Don't include ages in the mapping dictionary by default, don't put anything in the mapping that is not sensitive or personal, and there is no point in mapping something to itself.
+Keep to a convention like "person's name", "second person's name", "third person's name" or "child's name", "second child's name", "third child's name" etc if you have multiple names to map and can't tell the relationships more specifically.
+Be thorough and careful but don't overclassify either.
+
+It is crucial that all forms of the same entity are captured in the mapping, so be sure to include all variations and spellings of the same entity that appear in the text, including nicknames, abbreviations, forms where the name is spelled out letter by letter (e.g. "E-M-I-L-Y"), and even misspellings in their exact misspelled form. Any and all forms must be documented in the mapping dictionary.
+This is important because we need to ultimately use this mapping to deidentify the text, and if we miss any forms of an entity, it could lead to reidentification.
+Pay close attention to whether a misspelled or oddly formatted form of an entity is present in the text, and be sure to include all such unusual forms in the mapping dictionary. Same for any abbreviations or acronyms that are used in the text, especially for workplaces or organizations.
+It is also important to ensure that you are confident all real forms for an entity are for that one same entity. If in doubt, you can always add a new entry for a new entity.
+If you realize there is misorganization or wrong mappings in the existing mapping, you can fix it.
+You can reorganize and redo the mappings as much as needed, as long as all real forms from the existing mapping are somewhere in the new mapping, even if you move them around or change the keys.
+That said, only make changes if you are reasonably confident in them.
+
+Again, the existing mapping is:
+{{ current_map }}
+
+While you can and should add new entries, modify existing entries to reflect the new information you have, and you can reorganize the mapping if you are confident in the changes, you MUST ensure that all real forms from the existing mapping are somewhere in the new mapping verbatim, even if you move them around or change the keys.
+
+Again, remember to de-identify only sensitive information and retain all non-identifying details.
+Public figures (celebrity or politician names etc.) should not be anonymized.
+Common workplaces (e.g. Walmart, Uber, TaskRabbit, etc.) should not be anonymized -- only location specific workplaces like "Mayo Clinic" or "Pinocchio's Pizza" should be anonymized.
+Age should not be included in the mapping dictionary; by default, age should not be anonymized.
+Terms like "my mother" or "my mom" or "Dad" or "my workplace" or "email address" must not be included in the mapping dictionary, only include cases where the actual person's or entity's identifying name is used.
+Only map terms that are identifying; this is a fairly narrow set of terms and most personal details should be retained. By default, most personal details should not be anonymized because they are not identifying.
+Ensure that you do not confuse two distinct entities by assigning them the same mapping. For example, if you have two different local businesses, they should be mapped to different fictitious names.
+Map to casted forms that are conceptually close to the original and carry the same meaning, like a real similar sized city or an ethnically similar name.
+Only map to fictitious casted forms if necessary; otherwise, map to a casted form that is real and exists, like a real city or a real neighborhood.
+Have some creativity and stochasticity with choosing casted forms and don't just default to the same names/places every time.
+
+{% if additional_instructions %}
+The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate your judgment).
+If they conflict with the other instructions, these user instructions take precedence.
+
+BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{{ additional_instructions }}
+END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{% endif %}
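
The prompt above says the mapping is ultimately used to deidentify the text and that a missed form can lead to reidentification. A minimal sketch of that downstream step follows, assuming the mapping has the "real forms" / "casted form" shape shown in the example snippet. The function name `apply_mapping` is hypothetical and is not taken from the package; the actual logic in `gabriel/tasks/deidentify.py` is not visible in this diff and may differ.

```python
import re


def apply_mapping(text: str, mapping: dict) -> str:
    """Replace every real form in `text` with its entity's casted form.

    Hypothetical helper: assumes each mapping value looks like
    {"real forms": [...], "casted form": "..."}.
    """
    # Flatten the mapping into (real form, casted form) pairs.
    pairs = [
        (real, entry["casted form"])
        for entry in mapping.values()
        for real in entry.get("real forms", [])
        if real
    ]
    if not pairs:
        return text
    # Longest forms first, so "Lubbock, Texas" wins over "Lubbock".
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    lookup = dict(pairs)
    alternation = "|".join(re.escape(real) for real, _ in pairs)
    # Lookarounds stop "Tim" from matching inside a word like "Time".
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")
    # A single pass means casted forms are never substituted again.
    return pattern.sub(lambda match: lookup[match.group(0)], text)


example_mapping = {
    "younger son's name": {"real forms": ["Tim", "Timmy"], "casted form": "Blake"},
    "hometown": {"real forms": ["Lubbock", "Lubbock, Texas"], "casted form": "Medford"},
    "hospital of birth": {"real forms": ["OMC"], "casted form": "Orchard Medical Center"},
}
print(apply_mapping("Tim grew up in Lubbock, Texas. Timmy works at OMC.", example_mapping))
```

Matching is exact and case-sensitive here, which is why the prompt insists that real forms be recorded verbatim, including misspellings.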
@@ -0,0 +1,61 @@
+{% import "snippets.jinja2" as snip %}
+{% if modality is not defined or not modality %}
+{% set modality = "entity" %}
+{% endif %}
+{{ snip.single_entry(modality, text) }}
+
+Your task: for each attribute below, extract the specific information requested (e.g. year of invention, price, death year, product launch date, etc.).
+Provide each value as a string.
+Provide only the extracted datapoint requested in as precise a form as possible.
+Give the datapoint and nothing else; no extra commentary or explanation.
+Whenever possible/relevant, leave it in a purely numeric string form (e.g. "235" not "year is 235" or "235 CE" or "210-250").
+If the information cannot be found or is not known, return placeholder "unknown" (always exactly that word).
+
+BEGIN ATTRIBUTES
+{{ attributes | shuffled_dict }}
+END ATTRIBUTES
+Each attribute key tells you what to extract. If a definition is provided, closely follow it to anchor what you provide; otherwise rely on your best consistent interpretation.
+
+General formatting rules (pay attention to attribute definitions for exact requirements):
+Year: Return years only (e.g. "1984", "629"). Use negative numbers for BCE years (e.g. "-356" not "356 BCE"). Avoid vague year ranges (e.g. "1980s", "early 20th century", "1250-1260"); aim for a single, specific year.
+Date: Use ISO format "YYYY-MM-DD"; drop any extraneous words (e.g. not "June 5, 2020" but "2020-06-05").
+Prices: Don't include any currency symbols or commas (e.g. not "$16.85" but "16.85").
+Names/Places/Other Text: Provide the exact requested phrase without additional commentary (e.g. "Marilyn Monroe", "New York City", "The Beatles", etc.).
+Use the specific value that matches the attribute definition best (e.g. the primary year of invention rather than a range).
+If multiple values apply, choose the single most applicable value - don't list them all.
+Do not round (e.g. not "1860" but "1863", not "100" but "112"). Even if approximating a specific datapoint, don't round or use ranges - provide the best possible exact value.
+Consistent formatting is critical; stick to the attribute definition and these rules.
+If asked to extract text directly from content, be complete and verbatim.
+If any task is not possible, return "unknown" for that attribute.
+
+Output JSON only, in following format:
+{
+"<insert first entity name here>":
+{
+"<insert attribute name here>": "<insert corresponding extracted value or "unknown" here>",
+"<attribute name>": "<extracted value or unknown>",
+...
+},
+"<second entity name>":
+{
+"<attribute name>": "<extracted value or unknown>",
+...
+},
+...
+}
+
+Usually whole content is just one unified entity, so fill only first entity section with all attributes for whole content (no other entity sections).
+But if input clearly contains multiple relevant distinct entities (e.g. catalog with several product listings), output one top-level entry per entity. Use a short, informative identifier as each entity’s key, and place that entity’s unique extracted attributes inside its nested dictionary.
+
+Attributes you are extracting info on are: {{ attributes.keys() | shuffled }}
+Assess EVERY attribute; use "unknown" if needed but no drops. Use these attribute names verbatim, with absolutely no modification.
+Same case, same spelling, same punctuation, same formatting.
+
+{% if additional_instructions %}
+The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate yourself).
+If they conflict with the other instructions, these user instructions take precedence.
+
+BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{{ additional_instructions }}
+END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{% endif %}
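
The output contract above (one top-level entry per entity, every attribute present, "unknown" as the only placeholder) lends itself to a small post-processing check. The following is a sketch under those assumptions only; `normalize_extraction` is a hypothetical name, not a function confirmed to exist in `gabriel/tasks/extract.py`.

```python
import json


def normalize_extraction(raw: str, attributes: list[str]) -> dict[str, dict[str, str]]:
    """Parse the model's JSON and make sure every attribute is present.

    Hypothetical helper: missing, null, or empty values become "unknown",
    and attribute names not in `attributes` are dropped.
    """
    parsed = json.loads(raw)
    result: dict[str, dict[str, str]] = {}
    for entity, values in parsed.items():
        if not isinstance(values, dict):
            continue  # skip malformed entity sections
        row = {}
        for attr in attributes:
            value = values.get(attr)
            text = "" if value is None else str(value).strip()
            row[attr] = text or "unknown"
        result[entity] = row
    return result


raw_response = '{"iPhone": {"launch year": "2007", "price": ""}}'
print(normalize_extraction(raw_response, ["launch year", "price", "designer"]))
```

Because the lookup uses the attribute names exactly as given, a response that alters their case or punctuation falls back to "unknown", which is the failure the "verbatim" rule in the prompt guards against.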
@@ -0,0 +1,31 @@
+Consider the following condition:
+BEGIN CONDITION
+{{ condition }}
+END CONDITION
+Your task is to evaluate every entity in the list below on whether it meets the condition.
+You will return only those that do.
+BEGIN ENTITY LIST
+{{ entities }}
+END ENTITY LIST
+Carefully review every single entity; skip none.
+Do not skim: consider every entity with equal attention, including those buried in middle of list.
+Essential you output all entities that meet the condition; triple check that you do not miss any.
+Do not include any entities that don't meet the condition in output.
+
+Output in following JSON format:
+{
+"entities meeting condition": ["<insert entity verbatim here>", "<entity>", ...]
+}
+
+In output list, entity names must be taken verbatim from entity list with absolutely no modification.
+Same case, same spelling, same punctuation, same formatting.
+Only include entities from the provided list, and only those that meet the condition.
+Your output list must purely be every single entity from the entity list that meets the condition; nothing overlooked, nothing included that doesn't meet condition.
+{% if additional_instructions %}
+The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate yourself).
+If they conflict with the other instructions, these user instructions take precedence.
+
+BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{{ additional_instructions }}
+END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{% endif %}
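
The verbatim requirement in this prompt can be enforced on the caller's side. The sketch below assumes the response uses the "entities meeting condition" key shown in the format block; `parse_filter_response` is a hypothetical name and is not taken from `gabriel/tasks/filter.py`.

```python
import json


def parse_filter_response(raw: str, entities: list[str]) -> list[str]:
    """Keep only returned entities that appear verbatim in the input list.

    Hypothetical helper: results come back in the original list order,
    and anything the model altered or invented is discarded.
    """
    returned = json.loads(raw).get("entities meeting condition", [])
    allowed = set(entities)
    kept = {item for item in returned if isinstance(item, str) and item in allowed}
    return [entity for entity in entities if entity in kept]


candidates = ["Apple Inc.", "apple", "Banana Co"]
response = '{"entities meeting condition": ["apple", "Apple Inc", "Cherry"]}'
print(parse_filter_response(response, candidates))
```

Here "Apple Inc" is dropped because it lacks the trailing period of the list entry, and "Cherry" is dropped because it was never in the list.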
@@ -0,0 +1,80 @@
+Your task is to use your vast internal knowledge and thorough searching of the frontier and highest quality academic literature to ideate on the following topic:
+
+BEGIN TOPIC
+{{ topic }}
+END TOPIC
+
+Your goal is novelty and brilliance: you are to fully apply yourself and come up with a truly new and specific theory/hypothesis that pushes the frontier of the field described in the topic.
+Your thinking should be rooted in and inspired by existing literature, but you are to be like a professor and come up with a genuinely novel, important, and interesting theory relevant to the topic.
+Be abundantly creative and novel; be willing to challenge the existing and go in profoundly new directions.
+Leverage your deep interdisciplinary knowledge to potentially leverage ideas/observations from other fields too, if applicable.
+
+Your aim is to embody a high quality researcher: fully explore what is already known/theorized, think very deeply about it all, iterate and be truly creative and new in your thinking, and come to something new and specific.
+Your theory should be a genuine add to the literature relevant to the topic; not just new, but significant, specific, and capturing something important in the real world.
+Don't be afraid to return to the drawing board if your initial theories aren't meeting this high standard.
+Triple check that your theory is actually novel, or at the very least a novel twist/addendum in some important way on an existing theory.
+Be creative and willing to go in different directions: say something truly new, not just simple riffs on existing work.
+A good test is that your final theory has novel and unique testable predictions, that other existing theories wouldn't expect.
+Think deeply about testable predictions if applicable; ensure they are unique and non-trivial, and change tack on the theory if the testable predictions are uninteresting.
+Be new and don't be repetitive; better to be wrong than be uncreative and unambitious.
+The ideal theory is novel, testable with unique predictions, and cleverly gets at something non-trivial and non-niche in the topic and the real world.
+Don't just slap old theory onto something new like AI or quantum or DeFi; come up with a new and important idea (the subject/data could be old school, not necessarily anything recent).
+Your theory must actually matter, not tied to some niche or fad or newfangled thing but explain something crucial.
+
+If possible, invoke empirics in justifying and explaining your novel theory.
+This often means real world data/prior findings/examples in the wild (i.e. not in academic literature), but it can also mean great and realistic hypotheticals/illustrative examples/analogies to explain the theory in a real world and understandable way.
+Your audience is academics well versed in the topic, so don't dumb down.
+Aim for a significant addition to the frontier literature.
+At the same time, while the ideas should be frontier thought, your output writing should be a great read for both experts and a somewhat more casual audience as well (e.g. advanced college students).
+It should capture your advanced, novel ideas and the literature while avoiding unnecessary jargon.
+Don't hide behind a veil of complexity; be honest and clear about your theory and don't overhype.
+Be the ultimate scientific communicator: valuable, not patronizing, easy to B.S. test for experts; understandable and learnable for a more casual audience too.
+Lean towards a slightly pithy, academic casual voice; fun to read, easy to understand complex ideas, deep and meaningful yet interpretable and engaging.
+Do NOT be vague. Be clear and use excellent explanatory sentences.
+Use simple, easy to understand, casual language where possible; help the reader easily grasp the mechanisms and practical aspects of your theory.
+Clear, edifying, specific, and easy to follow language is key.
+Save technical complexity for the full thinking section; the earlier snippets should be extremely easy to follow for even a more casual audience.
+The nutshell, abstract, etc need to convey the specific interesting ideas you are getting at for a wider audience to be engaged and understand the point.
+
+Structure your output as follows (include the headings like "In a nutshell:" and the newlines in output):
+BEGIN OUTPUT FORMAT
+Title: [a concise, engaging, high quality, and fun title that captures your theory]
+
+In a nutshell: [one pithy, efficient sentence that captures the core thrust of your theory; reading this one sentence should give readers a good sense of the deep idea you are getting at]
+
+In one paragraph: [a single paragraph, like an abstract, that is concise but more thorough than the nutshell; specific and detailed, NOT vague, reading this alone should be enough to grasp the theory; captures the theory but also might touch on motivation/importance, where the theory is grounded, great examples/analogies that capture its empirical validity, how specifically it could be tested, etc]
+
+Illustrative examples: [in one paragraph, at least one well described example of what your theory looks like in a real world scenario; very clear and specific and vivid to inspire understanding of your theory in action]
+
+Testable predictions: [if applicable, in 1-3 concise paragraphs, outline some clear, testable predictions of your theory; novel predictions of real world phenomena that can be empirically tested and would be expected if your theory is true, but not if your theory is false; give a few novel predictions and be specific and unique to your theory]
+
+The full thinking: [a thorough research memo, covering the theory, the relevant literature, the motivation, the background/significance, relevant math/data/examples/models, what further specific empirical tests could be run (detail experimental design); a reader should deeply understand the theory, its grounding, its novelty and creativity and potential, and its limitations]
+END OUTPUT FORMAT
+Output nothing but a report in the above format; no prefacing, no text outside of the core report.
+
+Once more, the topic of focus to situate your novel, creative, and significant addition to the literature is:
+
+BEGIN TOPIC
+{{ topic }}
+END TOPIC
+
+{% if seed %}
+The following is a seed to inspire and anchor your creative ideation.
+Use it as a starting point within the topic for your thought process.
+It is a place to start for your thinking and a nudge, but your thinking should take you far beyond this initialization and perhaps in a new direction.
+Don't overindex on this seed but use it as an inspiration to get started; start here but go anywhere.
+BEGIN SEED
+{{ seed }}
+END SEED
+{% endif %}
+
+Think creatively, think new thoughts, go in new directions, think deeply about what is known and what needs to be known, aim for useful and testable discovery; think like a professor.
+
+{% if additional_instructions %}
+The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate your judgment).
+If they conflict with the other instructions, these user instructions take precedence.
+
+BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{{ additional_instructions }}
+END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{% endif %}
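
Since the prompt fixes the report headings ("Title:", "In a nutshell:", and so on), a report in that format can be split back into sections by heading. This is a rough sketch assuming each heading starts its own line; `split_report` and `HEADINGS` are hypothetical names, and nothing here is confirmed to match `gabriel/tasks/ideate.py`.

```python
import re

# Headings copied from the OUTPUT FORMAT block of the prompt.
HEADINGS = [
    "Title",
    "In a nutshell",
    "In one paragraph",
    "Illustrative examples",
    "Testable predictions",
    "The full thinking",
]


def split_report(report: str) -> dict[str, str]:
    """Split a report into {heading: body} using the fixed headings.

    Hypothetical helper: a heading only counts at the start of a line.
    """
    alternation = "|".join(re.escape(heading) for heading in HEADINGS)
    pattern = re.compile(r"^(" + alternation + r"):[ \t]*", re.MULTILINE)
    matches = list(pattern.finditer(report))
    sections: dict[str, str] = {}
    for index, match in enumerate(matches):
        # A section runs until the next heading or the end of the report.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(report)
        sections[match.group(1)] = report[match.end():end].strip()
    return sections


sample = "Title: Sticky Norms\n\nIn a nutshell: Norms outlast incentives.\n\nThe full thinking: Long memo."
print(split_report(sample))
```

One limitation worth noting: a body paragraph that happens to begin a line with one of these headings would be treated as a new section.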
@@ -0,0 +1,47 @@
+Consider the following short list of terms:
+BEGIN SHORT LIST
+{{ short_list }}
+END SHORT LIST
+
+Your task is to find matches for each short list term in the long list:
+BEGIN LONG LIST
+{{ long_list }}
+END LONG LIST
+Consider the long list options thoroughly: beginning, middle, end. Don't skim.
+
+For each short list term, find a suitable match in the long list that is as close as possible in meaning/substance to the short list term.
+Think of this as an intelligent fuzzy matching task - it might be as obvious as a different spelling/formatting of the same term (e.g. "Mary Monroe" and "Marilyn Monroe"; "John Kennedy" and "JFK"; "cook" and "Cook County, IL").
+But they might also be facially distinct but meaningfully equivalent (e.g. "heart rhythm doctor" and "927 - Cardiac Electrophysiology"; "Soviet folktales" and "Russian Folklore"; "minneapolis" and "Hennepin County").
+If no suitable match, use "no certain match".
+Only output short list/long list term pairs that are genuine matches.
+Output only the single best match or "no certain match".
+A single long list term can be used for multiple short list terms separately, if you are confident that is the best match for each (e.g. "minnesota" for both "olmsted" and "hennepin").
+Abbreviations/acronyms: ensure match is logical expansion.
+Especially if term is short or acronym, only match if certain.
+Use "no certain match" unless extremely confident in match.
+Don't force mediocre matches; default to "no certain match" unless you see a definitive match.
+
+Output JSON only, in following format:
+{
+"<insert short list term here>": "<insert single best matching long list term or "no certain match" here>",
+"<short list term>": "...",
+...
+}
+
+Again, the short list terms you are finding matches for are: {{ short_list.keys() | shuffled }}
+Include EVERY short list term in output; no drops.
+Use "no certain match" for any short list terms with no match/low confidence: DO NOT MATCH UNLESS EXTREMELY CONFIDENT IT IS CORRECT.
+Wrong matches are far worse than no match.
+Term names from both lists must be taken verbatim, with absolutely no modification.
+Same case, same spelling, same punctuation, same formatting.
+It is imperative your keys are verbatim from the short list specifically, while the values are verbatim from the long list.
+Any unique subtleties in spelling, formatting, etc must be preserved EXACTLY AS GIVEN and SPECIFIC to its corresponding list for mapping to work.
+
+{% if additional_instructions %}
+The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate yourself).
+If they conflict with the other instructions, these user instructions take precedence.
+
+BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{{ additional_instructions }}
+END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
+{% endif %}
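
The prompt stresses that keys must be verbatim from the short list and values verbatim from the long list "for mapping to work". A minimal sketch of the validation this implies is shown below; `parse_merge_response` and `NO_MATCH` are hypothetical names, and the real handling in `gabriel/tasks/merge.py` is not shown in this diff.

```python
import json
from typing import Optional

NO_MATCH = "no certain match"  # placeholder string defined by the prompt


def parse_merge_response(
    raw: str, short_terms: list[str], long_terms: list[str]
) -> dict[str, Optional[str]]:
    """Return {short term: long term or None}, one entry per short term.

    Hypothetical helper: a value is accepted only if it is a string found
    verbatim in `long_terms`; the placeholder, a missing key, or an
    altered spelling all map to None.
    """
    parsed = json.loads(raw)
    long_set = set(long_terms)
    result: dict[str, Optional[str]] = {}
    for term in short_terms:
        value = parsed.get(term)
        if isinstance(value, str) and value != NO_MATCH and value in long_set:
            result[term] = value
        else:
            result[term] = None
    return result


response = '{"cook": "Cook County, IL", "olmsted": "Minnesota", "xyz": "no certain match"}'
print(parse_merge_response(response, ["cook", "olmsted", "xyz"], ["Cook County, IL", "minnesota"]))
```

Treating a non-verbatim value as no match, rather than trying to repair it, follows the prompt's own priority that wrong matches are far worse than no match.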
@@ -0,0 +1,17 @@
+Consider the following text:
+BEGIN TEXT
+{{ text }}
+END TEXT
+Your task is to rewrite the above text in accordance with the following instructions:
+BEGIN INSTRUCTIONS
+{{ instructions }}
+END INSTRUCTIONS
+The text should be rewritten in a way that is consistent with the instructions.
+Be very diligent about following the instructions.
+Output only your revised text, nothing else, no preamble, no postamble, no explanation, no nothing other than the revised text.
+Consider the whole text, and follow the instructions carefully and completely.
+Be meticulous and exacting; your output of rewritten text must be as faithful as possible in following the instructions.
+Once again, the instructions are:
+BEGIN INSTRUCTIONS
+{{ instructions }}
+END INSTRUCTIONS
|
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
{% import "snippets.jinja2" as snip %}
|
|
2
|
+
{{ snip.pair_entries(modality, entry_square, entry_circle, circle_first=circle_first|default(false)) }}
|
|
3
|
+
|
|
4
|
+
Essential: do not confuse or conflate the two entries.
|
|
5
|
+
Fully separate entry square from entry circle in your mind.
|
|
6
|
+
The "square" and "circle" labels are purely for reference. Content was randomly assigned to each label, so give no weight to entry order.
|
|
7
|
+
Give both entries full and equal attention. Check yourself; ensure equal comprehension of both.
|
|
8
|
+
|
|
9
|
+
Your task: for each attribute below, decide which entry (square or circle) manifests that attribute to a greater degree than the other entry.
|
|
10
|
+
|
|
11
|
+
BEGIN ATTRIBUTES
|
|
12
|
+
{{ attributes | shuffled_dict }}
|
|
13
|
+
END ATTRIBUTES
|
|
14
|
+
Each dictionary key is an attribute. If a definition is provided, use it to anchor judgment; otherwise use your best consistent definition.
|
|
15
|
+
|
|
16
|
+
Decision options per attribute:
|
|
17
|
+
"square" - entry square shows stronger **direct** signal than circle
|
|
18
|
+
"circle" - entry circle shows stronger **direct** signal than square
|
|
19
|
+
"draw" - truly indistinguishable strength
|
|
20
|
+
"insufficient signal" - **both** lack any direct signal
|
|
21
|
+
|
|
22
|
+
Rules:
|
|
23
|
+
- Judge each attribute independently and separately from the others
|
|
24
|
+
- Absolutely no indirect inference from other attributes or cross-attribute leakage (example of bad: inferring environmentalism from general political lean; good: measuring direct evidence of environmental opinions, no inference/biasing)
|
|
25
|
+
- Only measure and compare direct signal of each attribute alone in each entry, NOT what is implied by other attributes; CRUCIAL each attribute measured independently on its own direct, specifically relevant signal
|
|
26
|
+
- Identify even subtle differences; use "draw" only in rare case signals are truly equal
|
|
27
|
+
- Deep, holistic comparison of both entries is critical
|
|
28
|
+
- Use "insufficient signal" readily, whenever **both circle and square** have no direct signal on the attribute (i.e. neither has any content directly relevant to measuring that specific attribute)
|
|
29
|
+
- If attribute can only be inferred and not directly measured, use "insufficient signal"
|
|
30
|
+
|
|
31
|
+
Output JSON only, in the following format:
|
|
32
|
+
{
|
|
33
|
+
"<insert attribute name here>": "<insert "square" or "circle" here, depending on which entry manifests this specific attribute more; else "draw" or "insufficient signal">",
|
|
34
|
+
"<attribute name>": "<insert one of "circle"|"square"|"draw"|"insufficient signal">",
|
|
35
|
+
...
|
|
36
|
+
}
|
|
37
|
+
|
|
38
|
+
Attributes you are adjudicating are: {{ attributes.keys() | shuffled }}
|
|
39
|
+
Assess EVERY attribute; no drops. Use these attribute names verbatim, with absolutely no modification.
|
|
40
|
+
Same case, same spelling, same punctuation, same formatting.
|
|
41
|
+
|
|
42
|
+
{% if additional_instructions %}
|
|
43
|
+
The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate your judgment).
|
|
44
|
+
If they conflict with the other instructions, these user instructions take precedence.
|
|
45
|
+
|
|
46
|
+
BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
|
|
47
|
+
{{ additional_instructions }}
|
|
48
|
+
END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
|
|
49
|
+
{% endif %}
|
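The `shuffled` and `shuffled_dict` filters used in this template are not built-in Jinja2 filters, so they must be supplied by the rendering code. Their implementation is not shown in this part of the diff. The sketch below is one plausible way to write them, assuming they only randomize order (presumably to reduce position bias across calls) and leave the content unchanged. The optional `rng` parameter is an addition for reproducible testing.

```python
import random


def shuffled(items, rng=random):
    """Return a new list with the same items in random order."""
    out = list(items)
    rng.shuffle(out)  # shuffles the copy in place; the input is untouched
    return out


def shuffled_dict(mapping, rng=random):
    """Return a new dict with the same key/value pairs in random key order."""
    keys = shuffled(mapping.keys(), rng)
    return {key: mapping[key] for key in keys}
```

With Jinja2, functions like these would typically be registered on the environment, for example `env.filters["shuffled"] = shuffled`, before the template is rendered. Whether the package registers them this way is an assumption.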
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
{% import "snippets.jinja2" as snip %}
|
|
2
|
+
{{ snip.single_entry(modality, text) }}
|
|
3
|
+
|
|
4
|
+
Your task: for each attribute below, rate how strongly the provided content manifests it.
|
|
5
|
+
|
|
6
|
+
BEGIN ATTRIBUTES
|
|
7
|
+
{{ attributes | shuffled_dict }}
|
|
8
|
+
END ATTRIBUTES
|
|
9
|
+
Each dictionary key is an attribute. If a definition is provided, use it to anchor judgment; otherwise use your best consistent definition.
|
|
10
|
+
|
|
11
|
+
BEGIN RATING SCALE
|
|
12
|
+
{% if scale %}
|
|
13
|
+
{{ scale }}
|
|
14
|
+
{% else %}
|
|
15
|
+
Use integers 0-100 (inclusive). low = absent; high = extreme; mid = moderate.
|
|
16
|
+
Use the full range and every increment; do not round to 5s/10s.
|
|
17
|
+
Extremes are rare: use near 0 only if truly absent and near 100 only if overwhelming.
|
|
18
|
+
Use moderate intermediates (e.g. 19, 67, 32) to account for nuance where applicable.
|
|
19
|
+
{% endif %}
|
|
20
|
+
END RATING SCALE
|
|
21
|
+
|
|
22
|
+
Method (per attribute): pick one exact integer. Stick to provided scale. Double-check before choosing extremes.
|
|
23
|
+
Interpret gradations as: absent→faint→moderate→abundant→extreme.
|
|
24
|
+
Don't overlook subtlety; don't default to extremes. Consider full spectrum, including intermediate gradations.
|
|
25
|
+
High accuracy/precision is critical; needs deep, holistic analysis of content.
|
|
26
|
+
|
|
27
|
+
Rules:
|
|
28
|
+
- Judge each attribute independently and separately from the others
|
|
29
|
+
- Absolutely no indirect inference from other attributes or cross-attribute leakage (example of bad: inferring environmentalism from general political lean; good: measuring direct evidence of environmental opinions, no inference/biasing)
|
|
30
|
+
- Only measure the direct signal of each attribute alone in the content, NOT what is implied by other attributes; CRUCIAL each attribute measured independently on its own direct, specifically relevant signal
|
|
31
|
+
|
|
32
|
+
Output JSON only, in the following format:
|
|
33
|
+
{
|
|
34
|
+
"<insert attribute name here>": <insert corresponding rating here>,
|
|
35
|
+
"<attribute name>": <corresponding rating>,
|
|
36
|
+
...
|
|
37
|
+
}
|
|
38
|
+
|
|
39
|
+
Attributes you are measuring in the content are: {{ attributes.keys() | shuffled }}
|
|
40
|
+
Assess EVERY attribute; no drops. Use these attribute names verbatim, with absolutely no modification.
|
|
41
|
+
Same case, same spelling, same punctuation, same formatting.
|
|
42
|
+
|
|
43
|
+
{% if additional_instructions %}
|
|
44
|
+
The user has provided the following additional instructions, clarifications, or labelled examples (if these are provided, rely heavily on them to calibrate your judgment).
|
|
45
|
+
If they conflict with the other instructions, these user instructions take precedence.
|
|
46
|
+
|
|
47
|
+
BEGIN ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
|
|
48
|
+
{{ additional_instructions }}
|
|
49
|
+
END ADDITIONAL INSTRUCTIONS/CLARIFICATIONS
|
|
50
|
+
{% endif %}
|
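The output contract of this template (one entry per attribute, names verbatim, one exact integer per attribute on the default 0-100 scale) can be enforced when parsing the reply. The following is a minimal sketch under those assumptions, not the package's own parser. `parse_ratings` and its `low`/`high` parameters are hypothetical, and a custom `scale` would need different bounds or a different check.

```python
import json


def parse_ratings(raw, attributes, low=0, high=100):
    """Parse a ratings reply; raise ValueError if it breaks the stated rules."""
    ratings = json.loads(raw)
    if not isinstance(ratings, dict):
        raise ValueError("reply is not a JSON object")
    # Rule: assess EVERY attribute, using the attribute names verbatim.
    missing = [name for name in attributes if name not in ratings]
    extra = [name for name in ratings if name not in attributes]
    if missing or extra:
        raise ValueError(f"missing={missing} unexpected={extra}")
    for name, value in ratings.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name!r}: rating must be an integer, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name!r}: rating {value} outside {low}-{high}")
    return ratings
```

Raising on the first violation keeps the caller's choice simple: accept the parsed ratings or retry the request.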
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
Consider the following geographical region and the following research topic:
|
|
2
|
+
|
|
3
|
+
Region: {{ region }}
|
|
4
|
+
Topic: {{ topic }}
|
|
5
|
+
|
|
6
|
+
Your task is to thoroughly research the region using web tools and your internal knowledge base, specifically focusing on the following topic as it exists within that specific region: {{ topic }}
|
|
7
|
+
Don't just research the topic at large; focus on gathering lots of information about the topic within this region specifically.
|
|
8
|
+
Be flexible in how you approach researching the region. If, for example, it is a ZIP code, first search what neighborhoods/communities make up the ZIP code, and then research those on the topic. Or say it is a college, consider the college as the "region" and research the college's website, student publications, faculty lists, RateMyProfessor, etc, whatever primary sources are relevant.
|
|
9
|
+
We want to have a comprehensive characterization of the topic within this region, especially of how it manifests in this region uniquely from the average of all regions of a similar class (e.g. counties, countries, ZIPs, whatever the relevant administrative unit is).
|
|
10
|
+
This means you should consider not only descriptions of the region, but also where the region is situated relative to its peer administrative units, in terms of how the topic manifests in this region relative to other regions.
|
|
11
|
+
It is critical that you look for a representative sample of relevant (ideally primary) sources within the region, not just the loudest voices.
|
|
12
|
+
This means exploring social media sources, local news outlets, blogs, Reddit forums, local business reviews, school websites, local neighborhood forums, etc.
|
|
13
|
+
Use whatever primary, representative sources will inform you about the real world manifestations of the topic in this specific region.
|
|
14
|
+
Think of at least a few creative primary sources you might explore, in addition to the obvious sources, so as to attack the topic from different angles.
|
|
15
|
+
Cast a wide net and gather as much information as possible.
|
|
16
|
+
Representative primary sources should be your north star. Be very creative in your search for sources and focus on collecting what real, regular people are saying about the topic in this region, not just the loudest voices or pundits or your preconceived notions.
|
|
17
|
+
Once you are done with your thorough research, your output is a long and highly detailed report that characterizes the topic within this region.
|
|
18
|
+
It should be 3000+ words long, and include many direct quotes from the many different sources you find.
|
|
19
|
+
It should give as broad, deep, comprehensive, and representative a picture of the topic within this region as possible.
|
|
20
|
+
Also crucial: be objective, not overly rosy or negative. Have a skeptical mindset -- people will often advertise/portray their region/institution in a positive light.
|
|
21
|
+
It is incumbent on you to put together a representative and real picture of the topic in this region, not just what is sold. Read between the lines when necessary.
|
|
22
|
+
|
|
23
|
+
Spare no detail, directly quote and paraphrase as many primary sources as possible, be creative and smart about deducing what set of sources will best inform you about the real world, authentic, representative manifestations of the topic in this specific region.
|
|
24
|
+
Your report should be objective and descriptive, almost entirely just restating through direct quotes and paraphrasing the primary sources you find.
|
|
25
|
+
You must explore the Internet smartly, first finding the best ways to get relevant primary sources specific to this region, and then going to them and exploring them thoroughly.
|
|
26
|
+
Your first, ideal option for primary sources might not work for this region, which is why you should explore multiple options for getting relevant and representative primary sources (e.g. exploring a school webpage, a school board meeting transcript, social media posts, business reviews, etc, not just one of these but multiple angles of attack).
|
|
27
|
+
If your current approach is not finding good sources for the topic in the region, be creative and try a completely different approach to get relevant primary sources.
|
|
28
|
+
Explore sources fully; don't just stop at a website homepage. If there is a promising avenue, follow it and explore it thoroughly, even if it means navigating to subpages or following links.
|
|
29
|
+
Remember: your final report should largely be a compendium of the relevant and locally representative primary sources you find, mostly just direct quotes or rich summaries of the sources you find.
|
|
30
|
+
Don't try to craft a narrative, just compile the sources you find. It is fine and expected to often have internal disagreements and contradictions amongst the sources you find.
|
|
31
|
+
|
|
32
|
+
Once more, the topic and the region:
|
|
33
|
+
|
|
34
|
+
Region: {{ region }}
|
|
35
|
+
Topic: {{ topic }}
|
|
36
|
+
|
|
37
|
+
{% if additional_instructions %}
|
|
38
|
+
**Additional instructions**
|
|
39
|
+
{{ additional_instructions }}
|
|
40
|
+
{% endif %}
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
Consider the following instructions on the sort of entities we wish to create:
|
|
2
|
+
|
|
3
|
+
BEGIN ENTITY INSTRUCTIONS
|
|
4
|
+
{{ instructions }}
|
|
5
|
+
END ENTITY INSTRUCTIONS
|
|
6
|
+
|
|
7
|
+
Your task is to generate {{ entities_per_generation }} such entities in accordance with the above instructions.
|
|
8
|
+
It is crucial that each entity is distinct from the others; ensure mutual exclusivity and accurate/representative coverage of the whole domain of entities specified in the instructions.
|
|
9
|
+
Do not repeat yourself; each entity in your output should be a unique creation, distinct from every other, and each must tread new ground and go in a novel direction.
|
|
10
|
+
If the instructions specify a distribution, ensure the overall set of entities represent that distribution accurately (NOT overrepresenting minorities, but accurately capturing the true statistical reality of said distribution).
|
|
11
|
+
|
|
12
|
+
For example, if asked to initialize personas that represent the US population, you would use your knowledge to ensure the overall set of entities matches known statistics, such as the fraction of entities of each race or state or educational background or job or personality traits.
|
|
13
|
+
In this case, carefully consider existing entities and your knowledge of the true statistical distributions; if a certain characteristic appears overrepresented (according to STATISTICS not perception), downweight it in your output so as to rebalance the overall set (e.g. too many women, output fewer women to reach true distribution; too few Christians, output extra).
|
|
14
|
+
If a certain demographic is only 1% of the population, it should rarely be included in output at all and avoided, to ensure its prevalence in the total set is accurate.
|
|
15
|
+
Consider the fraction of your {{ entities_per_generation }} entities that should reasonably be devoted to any group (i.e. a 7% demographic should only get that percent of slots, and less if previously overrepresented); rare demographics should get none or very few slots, based on their population frequency.
|
|
16
|
+
|
|
17
|
+
If the instructions instead emphasize true creativity and novelty, you must put extra effort into each entity going in a genuinely different direction than the others, with some going in truly unusual and hypercreative directions to ensure mutual exclusivity and radical novelty.
|
|
18
|
+
For example, if asked to generate novel areas for scientific research, each idea should be specific but going in a clearly novel direction from the others.
|
|
19
|
+
Mutual exclusivity and truly different approaches are key; each should chart its own unique path.
|
|
20
|
+
If some topic is overrepresented in the existing entities, avoid that area and go somewhere completely new.
|
|
21
|
+
|
|
22
|
+
{% if existing_entities %}
|
|
23
|
+
The following is a sample of entities you have already generated.
|
|
24
|
+
In addition to making each element of your generated entity list unique from one another, you must ensure they are unique from these too.
|
|
25
|
+
Carefully read these existing entities to ensure you don't tread the same ground.
|
|
26
|
+
BEGIN EXISTING ENTITIES
|
|
27
|
+
{{ existing_entities }}
|
|
28
|
+
END EXISTING ENTITIES
|
|
29
|
+
Again: your new entities must not repeat what has already been done.
|
|
30
|
+
Your set of entities is ALL of them (existing + new generations); consider the entire set as one.
|
|
31
|
+
{% endif %}
|
|
32
|
+
|
|
33
|
+
Output in the following JSON format:
|
|
34
|
+
BEGIN OUTPUT FORMAT
|
|
35
|
+
{
|
|
36
|
+
"entity 1": "<insert unique entity here>",
|
|
37
|
+
"entity 2": "<insert entity here, unique from others>",
|
|
38
|
+
...,
|
|
39
|
+
"entity n": "<insert unique entity here>"
|
|
40
|
+
}
|
|
41
|
+
END OUTPUT FORMAT
|
|
42
|
+
|
|
43
|
+
Ensure you output exactly {{ entities_per_generation }} unique entities.
|
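The slot arithmetic this template asks the model to do (a 7% demographic gets about 7% of the slots, and a 1% demographic usually gets none in a small batch) can be made concrete with a largest-remainder allocation. The template does not prescribe this method; the sketch below is only an illustration. `allocate_slots` is a hypothetical helper that takes whole-number percentage shares assumed to sum to 100.

```python
def allocate_slots(percent_shares, n):
    """Split n slots across groups in proportion to integer percent shares.

    Largest-remainder method: give each group the floor of its exact quota,
    then hand leftover slots to the groups with the largest remainders.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    slots = {group: pct * n // 100 for group, pct in percent_shares.items()}
    remainders = {group: pct * n % 100 for group, pct in percent_shares.items()}
    leftover = max(n - sum(slots.values()), 0)
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for group in ranked[:leftover]:
        slots[group] += 1
    return slots


# A batch of 10 entities: the 7% group gets one slot, a 1% group gets none.
print(allocate_slots({"A": 60, "B": 33, "C": 7}, 10))
print(allocate_slots({"majority": 99, "rare": 1}, 10))
```

This ignores the template's further adjustment for groups that are already overrepresented among existing entities, which would reduce a group's share before allocating.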