aurelian 0.3.2__py3-none-any.whl
- aurelian/__init__.py +9 -0
- aurelian/agents/__init__.py +0 -0
- aurelian/agents/amigo/__init__.py +3 -0
- aurelian/agents/amigo/amigo_agent.py +77 -0
- aurelian/agents/amigo/amigo_config.py +85 -0
- aurelian/agents/amigo/amigo_evals.py +73 -0
- aurelian/agents/amigo/amigo_gradio.py +52 -0
- aurelian/agents/amigo/amigo_mcp.py +152 -0
- aurelian/agents/amigo/amigo_tools.py +152 -0
- aurelian/agents/biblio/__init__.py +42 -0
- aurelian/agents/biblio/biblio_agent.py +94 -0
- aurelian/agents/biblio/biblio_config.py +40 -0
- aurelian/agents/biblio/biblio_gradio.py +67 -0
- aurelian/agents/biblio/biblio_mcp.py +115 -0
- aurelian/agents/biblio/biblio_tools.py +164 -0
- aurelian/agents/biblio_agent.py +46 -0
- aurelian/agents/checklist/__init__.py +44 -0
- aurelian/agents/checklist/checklist_agent.py +85 -0
- aurelian/agents/checklist/checklist_config.py +28 -0
- aurelian/agents/checklist/checklist_gradio.py +70 -0
- aurelian/agents/checklist/checklist_mcp.py +86 -0
- aurelian/agents/checklist/checklist_tools.py +141 -0
- aurelian/agents/checklist/content/checklists.yaml +7 -0
- aurelian/agents/checklist/content/streams.csv +136 -0
- aurelian/agents/checklist_agent.py +40 -0
- aurelian/agents/chemistry/__init__.py +3 -0
- aurelian/agents/chemistry/chemistry_agent.py +46 -0
- aurelian/agents/chemistry/chemistry_config.py +71 -0
- aurelian/agents/chemistry/chemistry_evals.py +79 -0
- aurelian/agents/chemistry/chemistry_gradio.py +50 -0
- aurelian/agents/chemistry/chemistry_mcp.py +120 -0
- aurelian/agents/chemistry/chemistry_tools.py +121 -0
- aurelian/agents/chemistry/image_agent.py +15 -0
- aurelian/agents/d4d/__init__.py +30 -0
- aurelian/agents/d4d/d4d_agent.py +72 -0
- aurelian/agents/d4d/d4d_config.py +46 -0
- aurelian/agents/d4d/d4d_gradio.py +58 -0
- aurelian/agents/d4d/d4d_mcp.py +71 -0
- aurelian/agents/d4d/d4d_tools.py +157 -0
- aurelian/agents/d4d_agent.py +64 -0
- aurelian/agents/diagnosis/__init__.py +33 -0
- aurelian/agents/diagnosis/diagnosis_agent.py +53 -0
- aurelian/agents/diagnosis/diagnosis_config.py +48 -0
- aurelian/agents/diagnosis/diagnosis_evals.py +76 -0
- aurelian/agents/diagnosis/diagnosis_gradio.py +52 -0
- aurelian/agents/diagnosis/diagnosis_mcp.py +141 -0
- aurelian/agents/diagnosis/diagnosis_tools.py +204 -0
- aurelian/agents/diagnosis_agent.py +28 -0
- aurelian/agents/draw/__init__.py +3 -0
- aurelian/agents/draw/draw_agent.py +39 -0
- aurelian/agents/draw/draw_config.py +26 -0
- aurelian/agents/draw/draw_gradio.py +50 -0
- aurelian/agents/draw/draw_mcp.py +94 -0
- aurelian/agents/draw/draw_tools.py +100 -0
- aurelian/agents/draw/judge_agent.py +18 -0
- aurelian/agents/filesystem/__init__.py +0 -0
- aurelian/agents/filesystem/filesystem_config.py +27 -0
- aurelian/agents/filesystem/filesystem_gradio.py +49 -0
- aurelian/agents/filesystem/filesystem_mcp.py +89 -0
- aurelian/agents/filesystem/filesystem_tools.py +95 -0
- aurelian/agents/filesystem/py.typed +0 -0
- aurelian/agents/github/__init__.py +0 -0
- aurelian/agents/github/github_agent.py +83 -0
- aurelian/agents/github/github_cli.py +248 -0
- aurelian/agents/github/github_config.py +22 -0
- aurelian/agents/github/github_gradio.py +152 -0
- aurelian/agents/github/github_mcp.py +252 -0
- aurelian/agents/github/github_tools.py +408 -0
- aurelian/agents/github/github_tools.py.tmp +413 -0
- aurelian/agents/goann/__init__.py +13 -0
- aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines.md +1000 -0
- aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines.pdf +0 -0
- aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines_Paper.md +693 -0
- aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines_Paper.pdf +0 -0
- aurelian/agents/goann/goann_agent.py +90 -0
- aurelian/agents/goann/goann_config.py +90 -0
- aurelian/agents/goann/goann_evals.py +104 -0
- aurelian/agents/goann/goann_gradio.py +62 -0
- aurelian/agents/goann/goann_mcp.py +0 -0
- aurelian/agents/goann/goann_tools.py +65 -0
- aurelian/agents/gocam/__init__.py +43 -0
- aurelian/agents/gocam/documents/DNA-binding transcription factor activity annotation guidelines.docx +0 -0
- aurelian/agents/gocam/documents/DNA-binding transcription factor activity annotation guidelines.pdf +0 -0
- aurelian/agents/gocam/documents/DNA-binding_transcription_factor_activity_annotation_guidelines.md +100 -0
- aurelian/agents/gocam/documents/E3 ubiquitin ligases.docx +0 -0
- aurelian/agents/gocam/documents/E3 ubiquitin ligases.pdf +0 -0
- aurelian/agents/gocam/documents/E3_ubiquitin_ligases.md +134 -0
- aurelian/agents/gocam/documents/GO-CAM annotation guidelines README.docx +0 -0
- aurelian/agents/gocam/documents/GO-CAM annotation guidelines README.pdf +0 -0
- aurelian/agents/gocam/documents/GO-CAM modelling guidelines TO DO.docx +0 -0
- aurelian/agents/gocam/documents/GO-CAM modelling guidelines TO DO.pdf +0 -0
- aurelian/agents/gocam/documents/GO-CAM_annotation_guidelines_README.md +1 -0
- aurelian/agents/gocam/documents/GO-CAM_modelling_guidelines_TO_DO.md +3 -0
- aurelian/agents/gocam/documents/How to annotate complexes in GO-CAM.docx +0 -0
- aurelian/agents/gocam/documents/How to annotate complexes in GO-CAM.pdf +0 -0
- aurelian/agents/gocam/documents/How to annotate molecular adaptors.docx +0 -0
- aurelian/agents/gocam/documents/How to annotate molecular adaptors.pdf +0 -0
- aurelian/agents/gocam/documents/How to annotate sequestering proteins.docx +0 -0
- aurelian/agents/gocam/documents/How to annotate sequestering proteins.pdf +0 -0
- aurelian/agents/gocam/documents/How_to_annotate_complexes_in_GO-CAM.md +29 -0
- aurelian/agents/gocam/documents/How_to_annotate_molecular_adaptors.md +31 -0
- aurelian/agents/gocam/documents/How_to_annotate_sequestering_proteins.md +42 -0
- aurelian/agents/gocam/documents/Molecular adaptor activity.docx +0 -0
- aurelian/agents/gocam/documents/Molecular adaptor activity.pdf +0 -0
- aurelian/agents/gocam/documents/Molecular carrier activity.docx +0 -0
- aurelian/agents/gocam/documents/Molecular carrier activity.pdf +0 -0
- aurelian/agents/gocam/documents/Molecular_adaptor_activity.md +51 -0
- aurelian/agents/gocam/documents/Molecular_carrier_activity.md +41 -0
- aurelian/agents/gocam/documents/Protein sequestering activity.docx +0 -0
- aurelian/agents/gocam/documents/Protein sequestering activity.pdf +0 -0
- aurelian/agents/gocam/documents/Protein_sequestering_activity.md +50 -0
- aurelian/agents/gocam/documents/Signaling receptor activity annotation guidelines.docx +0 -0
- aurelian/agents/gocam/documents/Signaling receptor activity annotation guidelines.pdf +0 -0
- aurelian/agents/gocam/documents/Signaling_receptor_activity_annotation_guidelines.md +187 -0
- aurelian/agents/gocam/documents/Transcription coregulator activity.docx +0 -0
- aurelian/agents/gocam/documents/Transcription coregulator activity.pdf +0 -0
- aurelian/agents/gocam/documents/Transcription_coregulator_activity.md +36 -0
- aurelian/agents/gocam/documents/Transporter activity annotation annotation guidelines.docx +0 -0
- aurelian/agents/gocam/documents/Transporter activity annotation annotation guidelines.pdf +0 -0
- aurelian/agents/gocam/documents/Transporter_activity_annotation_annotation_guidelines.md +43 -0
- Regulatory Processes in GO-CAM.docx +0 -0
- Regulatory Processes in GO-CAM.pdf +0 -0
- aurelian/agents/gocam/documents/WIP_-_Regulation_and_Regulatory_Processes_in_GO-CAM.md +31 -0
- aurelian/agents/gocam/documents/md/DNA-binding_transcription_factor_activity_annotation_guidelines.md +131 -0
- aurelian/agents/gocam/documents/md/E3_ubiquitin_ligases.md +166 -0
- aurelian/agents/gocam/documents/md/GO-CAM_annotation_guidelines_README.md +1 -0
- aurelian/agents/gocam/documents/md/GO-CAM_modelling_guidelines_TO_DO.md +5 -0
- aurelian/agents/gocam/documents/md/How_to_annotate_complexes_in_GO-CAM.md +28 -0
- aurelian/agents/gocam/documents/md/How_to_annotate_molecular_adaptors.md +19 -0
- aurelian/agents/gocam/documents/md/How_to_annotate_sequestering_proteins.md +38 -0
- aurelian/agents/gocam/documents/md/Molecular_adaptor_activity.md +52 -0
- aurelian/agents/gocam/documents/md/Molecular_carrier_activity.md +59 -0
- aurelian/agents/gocam/documents/md/Protein_sequestering_activity.md +52 -0
- aurelian/agents/gocam/documents/md/Signaling_receptor_activity_annotation_guidelines.md +271 -0
- aurelian/agents/gocam/documents/md/Transcription_coregulator_activity.md +54 -0
- aurelian/agents/gocam/documents/md/Transporter_activity_annotation_annotation_guidelines.md +38 -0
- aurelian/agents/gocam/documents/md/WIP_-_Regulation_and_Regulatory_Processes_in_GO-CAM.md +39 -0
- aurelian/agents/gocam/documents/pandoc_md/Signaling_receptor_activity_annotation_guidelines.md +334 -0
- aurelian/agents/gocam/gocam_agent.py +240 -0
- aurelian/agents/gocam/gocam_config.py +85 -0
- aurelian/agents/gocam/gocam_curator_agent.py +46 -0
- aurelian/agents/gocam/gocam_evals.py +67 -0
- aurelian/agents/gocam/gocam_gradio.py +89 -0
- aurelian/agents/gocam/gocam_mcp.py +224 -0
- aurelian/agents/gocam/gocam_tools.py +294 -0
- aurelian/agents/linkml/__init__.py +0 -0
- aurelian/agents/linkml/linkml_agent.py +62 -0
- aurelian/agents/linkml/linkml_config.py +48 -0
- aurelian/agents/linkml/linkml_evals.py +66 -0
- aurelian/agents/linkml/linkml_gradio.py +45 -0
- aurelian/agents/linkml/linkml_mcp.py +186 -0
- aurelian/agents/linkml/linkml_tools.py +102 -0
- aurelian/agents/literature/__init__.py +3 -0
- aurelian/agents/literature/literature_agent.py +55 -0
- aurelian/agents/literature/literature_config.py +35 -0
- aurelian/agents/literature/literature_gradio.py +52 -0
- aurelian/agents/literature/literature_mcp.py +174 -0
- aurelian/agents/literature/literature_tools.py +182 -0
- aurelian/agents/monarch/__init__.py +25 -0
- aurelian/agents/monarch/monarch_agent.py +44 -0
- aurelian/agents/monarch/monarch_config.py +45 -0
- aurelian/agents/monarch/monarch_gradio.py +51 -0
- aurelian/agents/monarch/monarch_mcp.py +65 -0
- aurelian/agents/monarch/monarch_tools.py +113 -0
- aurelian/agents/oak/__init__.py +0 -0
- aurelian/agents/oak/oak_config.py +27 -0
- aurelian/agents/oak/oak_gradio.py +57 -0
- aurelian/agents/ontology_mapper/__init__.py +31 -0
- aurelian/agents/ontology_mapper/ontology_mapper_agent.py +56 -0
- aurelian/agents/ontology_mapper/ontology_mapper_config.py +50 -0
- aurelian/agents/ontology_mapper/ontology_mapper_evals.py +108 -0
- aurelian/agents/ontology_mapper/ontology_mapper_gradio.py +58 -0
- aurelian/agents/ontology_mapper/ontology_mapper_mcp.py +81 -0
- aurelian/agents/ontology_mapper/ontology_mapper_tools.py +147 -0
- aurelian/agents/phenopackets/__init__.py +3 -0
- aurelian/agents/phenopackets/phenopackets_agent.py +58 -0
- aurelian/agents/phenopackets/phenopackets_config.py +72 -0
- aurelian/agents/phenopackets/phenopackets_evals.py +99 -0
- aurelian/agents/phenopackets/phenopackets_gradio.py +55 -0
- aurelian/agents/phenopackets/phenopackets_mcp.py +178 -0
- aurelian/agents/phenopackets/phenopackets_tools.py +127 -0
- aurelian/agents/rag/__init__.py +40 -0
- aurelian/agents/rag/rag_agent.py +83 -0
- aurelian/agents/rag/rag_config.py +80 -0
- aurelian/agents/rag/rag_gradio.py +67 -0
- aurelian/agents/rag/rag_mcp.py +107 -0
- aurelian/agents/rag/rag_tools.py +189 -0
- aurelian/agents/rag_agent.py +54 -0
- aurelian/agents/robot/__init__.py +0 -0
- aurelian/agents/robot/assets/__init__.py +3 -0
- aurelian/agents/robot/assets/template.md +384 -0
- aurelian/agents/robot/robot_config.py +25 -0
- aurelian/agents/robot/robot_gradio.py +46 -0
- aurelian/agents/robot/robot_mcp.py +100 -0
- aurelian/agents/robot/robot_ontology_agent.py +139 -0
- aurelian/agents/robot/robot_tools.py +50 -0
- aurelian/agents/talisman/__init__.py +3 -0
- aurelian/agents/talisman/talisman_agent.py +126 -0
- aurelian/agents/talisman/talisman_config.py +66 -0
- aurelian/agents/talisman/talisman_gradio.py +50 -0
- aurelian/agents/talisman/talisman_mcp.py +168 -0
- aurelian/agents/talisman/talisman_tools.py +720 -0
- aurelian/agents/ubergraph/__init__.py +40 -0
- aurelian/agents/ubergraph/ubergraph_agent.py +71 -0
- aurelian/agents/ubergraph/ubergraph_config.py +79 -0
- aurelian/agents/ubergraph/ubergraph_gradio.py +48 -0
- aurelian/agents/ubergraph/ubergraph_mcp.py +69 -0
- aurelian/agents/ubergraph/ubergraph_tools.py +118 -0
- aurelian/agents/uniprot/__init__.py +37 -0
- aurelian/agents/uniprot/uniprot_agent.py +43 -0
- aurelian/agents/uniprot/uniprot_config.py +43 -0
- aurelian/agents/uniprot/uniprot_evals.py +99 -0
- aurelian/agents/uniprot/uniprot_gradio.py +48 -0
- aurelian/agents/uniprot/uniprot_mcp.py +168 -0
- aurelian/agents/uniprot/uniprot_tools.py +136 -0
- aurelian/agents/web/__init__.py +0 -0
- aurelian/agents/web/web_config.py +27 -0
- aurelian/agents/web/web_gradio.py +48 -0
- aurelian/agents/web/web_mcp.py +50 -0
- aurelian/agents/web/web_tools.py +108 -0
- aurelian/chat.py +23 -0
- aurelian/cli.py +800 -0
- aurelian/dependencies/__init__.py +0 -0
- aurelian/dependencies/workdir.py +78 -0
- aurelian/mcp/__init__.py +0 -0
- aurelian/mcp/amigo_mcp_test.py +86 -0
- aurelian/mcp/config_generator.py +123 -0
- aurelian/mcp/example_config.json +43 -0
- aurelian/mcp/generate_sample_config.py +37 -0
- aurelian/mcp/gocam_mcp_test.py +126 -0
- aurelian/mcp/linkml_mcp_tools.py +190 -0
- aurelian/mcp/mcp_discovery.py +87 -0
- aurelian/mcp/mcp_test.py +31 -0
- aurelian/mcp/phenopackets_mcp_test.py +103 -0
- aurelian/tools/__init__.py +0 -0
- aurelian/tools/web/__init__.py +0 -0
- aurelian/tools/web/url_download.py +51 -0
- aurelian/utils/__init__.py +0 -0
- aurelian/utils/async_utils.py +15 -0
- aurelian/utils/data_utils.py +32 -0
- aurelian/utils/documentation_manager.py +59 -0
- aurelian/utils/doi_fetcher.py +238 -0
- aurelian/utils/ontology_utils.py +68 -0
- aurelian/utils/pdf_fetcher.py +23 -0
- aurelian/utils/process_logs.py +100 -0
- aurelian/utils/pubmed_utils.py +238 -0
- aurelian/utils/pytest_report_to_markdown.py +67 -0
- aurelian/utils/robot_ontology_utils.py +112 -0
- aurelian/utils/search_utils.py +95 -0
- aurelian-0.3.2.dist-info/LICENSE +22 -0
- aurelian-0.3.2.dist-info/METADATA +105 -0
- aurelian-0.3.2.dist-info/RECORD +254 -0
- aurelian-0.3.2.dist-info/WHEEL +4 -0
- aurelian-0.3.2.dist-info/entry_points.txt +3 -0
@@ -0,0 +1,1000 @@
|
|
1
|
+
Gene Ontology guidelines for transcription factor
|
2
|
+
annotation
|
3
|
+
|
4
|
+
Authors: Pascale Gaudet, Colin Logie, Ruth Lovering
|
5
|
+
Date last updated: 2023-10-24
|
6
|
+
|
7
|
+
GO has refactored the MF branch representing the activities of proteins involved in transcription.
|
8
|
+
In addition to RNA polymerase, we defined three different types of activities involved in
|
9
|
+
transcription and its regulation:
|
10
|
+
|
11
|
+
I. GTFs: General transcription factors, the molecular machine that assembles
|
12
|
+
with the RNA polymerase at the promoter to form the pre-initiation complex (PIC).
|
13
|
+
II. dbTFs: Specific DNA binding transcription factors that provide genomic
|
14
|
+
addresses and specify the cell types and the conditions under which specific
|
15
|
+
genes are expressed. Central to dbTF function is their binding to specific DNA
|
16
|
+
sequences (often named transcription factor binding sites (TFBS)), and
|
17
|
+
III. coTFs: Transcription coregulators (also known as transcription cofactors)
|
18
|
+
serve multiple functions, such as bridging the GTF and the dbTFs, specifying the
|
19
|
+
regulatory effect of dbTFs, and modifying the chromatin structure to render it more or
|
20
|
+
less accessible for transcription. coTFs normally exert their function independently
|
21
|
+
of high affinity binding to specific DNA sequences.
|
22
|
+
|
23
|
+
The present guidelines aim to help curators apply the revised transcription terms. Please use
|
24
|
+
more specific child terms whenever possible.
|
25
|
+
|
26
|
+
GTF annotations should include, depending on the evidence available:
|
27
|
+
|
28
|
+
● MF
|
29
|
+
|
30
|
+
○ GO:0140223 general transcription initiation factor activity
|
31
|
+
|
32
|
+
● BP
|
33
|
+
|
34
|
+
○ GO:0006351 transcription, DNA-templated
|
35
|
+
|
36
|
+
● CC
|
37
|
+
|
38
|
+
○ is active in GO:0000785 chromatin
|
39
|
+
|
40
|
+
|
41
|
+
○ part of GO:0097550 transcriptional preinitiation complex
|
42
|
+
|
43
|
+
dbTF annotations may include:
|
44
|
+
|
45
|
+
● MF
|
46
|
+
|
47
|
+
○ GO:0003700 DNA-binding transcription factor activity
|
48
|
+
|
49
|
+
|
50
|
+
|
51
|
+
■ GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific
|
52
|
+
|
53
|
+
|
54
|
+
|
55
|
+
■ GO:0001227 DNA-binding transcription repressor activity, RNA polymerase II-specific
|
56
|
+
|
57
|
+
|
58
|
+
|
59
|
+
○ GO:0000987 cis-regulatory region sequence-specific DNA binding
|
60
|
+
|
61
|
+
■ GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
|
62
|
+
|
63
|
+
● BP
|
64
|
+
|
65
|
+
|
66
|
+
|
67
|
+
○ GO:0006355 regulation of transcription, DNA-templated
|
68
|
+
|
69
|
+
■ GO:0045893 positive regulation of transcription, DNA-templated children
|
70
|
+
■ GO:0045892 negative regulation of transcription, DNA-templated children
|
71
|
+
|
72
|
+
● CC
|
73
|
+
|
74
|
+
○ is active in GO:0000785 chromatin
|
75
|
+
|
76
|
+
|
77
|
+
○ part of GO:0005667 transcription factor complex
|
78
|
+
|
79
|
+
coTF annotations may include:
|
80
|
+
|
81
|
+
Note that coTFs perform a wide range of functions. The common functions are listed below;
|
82
|
+
however, this list is not exhaustive:
|
83
|
+
|
84
|
+
● MF
|
85
|
+
|
86
|
+
○ GO:0003712 transcription coregulator activity
|
87
|
+
|
88
|
+
■ GO:0003713 transcription coactivator activity
|
89
|
+
■ GO:0003714 transcription corepressor activity
|
90
|
+
|
91
|
+
○ GO:0140097 catalytic activity, acting on DNA
|
92
|
+
|
93
|
+
■ GO:0009008 DNA-methyltransferase activity
|
94
|
+
■ etc
|
95
|
+
|
96
|
+
○ GO:0140096 catalytic activity, acting on a protein
|
97
|
+
■ GO:0033558 protein deacetylase activity
|
98
|
+
|
99
|
+
● GO:0004407 histone deacetylase activity
|
100
|
+
|
101
|
+
○ GO:0030234 enzyme regulator activity
|
102
|
+
|
103
|
+
■ GO:0035034 histone acetyltransferase regulator activity
|
104
|
+
■ GO:0035033 histone deacetylase regulator activity
|
105
|
+
|
106
|
+
○ GO:0140297 DNA-binding transcription factor binding
|
107
|
+
|
108
|
+
● BP
|
109
|
+
|
110
|
+
○ GO:0006355 regulation of transcription, DNA-templated
|
111
|
+
○ GO:0031507 heterochromatin assembly
|
112
|
+
|
113
|
+
■ GO:0140719 constitutive heterochromatin assembly
|
114
|
+
■ GO:0140718 facultative heterochromatin assembly
|
115
|
+
■ GO:0071514 genomic imprinting
|
116
|
+
■ GO:0033696 heterochromatin boundary formation
|
117
|
+
|
118
|
+
|
119
|
+
|
120
|
+
|
121
|
+
|
122
|
+
|
123
|
+
○ GO:0016570 histone modification
|
124
|
+
|
125
|
+
|
126
|
+
|
127
|
+
■ GO:0031056 regulation of histone modification
|
128
|
+
|
129
|
+
○ GO:0006304 DNA modification
|
130
|
+
|
131
|
+
■ GO:0006306 DNA methylation
|
132
|
+
|
133
|
+
● GO:0044030 regulation of DNA methylation
|
134
|
+
|
135
|
+
● CC
|
136
|
+
|
137
|
+
○ is active in GO:0000785 chromatin
|
138
|
+
|
139
|
+
|
140
|
+
○ part of GO:0005667 transcription factor complex
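The three term lists above share one pattern: each regulator class has a single top-level MF term under which more specific children are preferred. A minimal sketch of that mapping follows. The names `ROOT_MF_TERMS` and `root_mf_term` are hypothetical helpers, not part of any package; only the GO IDs and labels come from the lists above.

```python
# Top-level MF term per regulator class, copied from the lists above.
# ROOT_MF_TERMS and root_mf_term are hypothetical names used for illustration.
ROOT_MF_TERMS = {
    "GTF": ("GO:0140223", "general transcription initiation factor activity"),
    "dbTF": ("GO:0003700", "DNA-binding transcription factor activity"),
    "coTF": ("GO:0003712", "transcription coregulator activity"),
}


def root_mf_term(regulator_class: str) -> tuple[str, str]:
    """Return the (GO ID, label) pair for a regulator class."""
    try:
        return ROOT_MF_TERMS[regulator_class]
    except KeyError:
        raise ValueError(f"unknown regulator class: {regulator_class!r}") from None


if __name__ == "__main__":
    for cls in ROOT_MF_TERMS:
        go_id, label = root_mf_term(cls)
        print(f"{cls}: {go_id} {label}")
```

A curator-facing tool would still need to descend to the most specific child term; this table only records the starting point for each class.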
|
141
|
+
|
142
|
+
Other transcription regulator activities
|
143
|
+
|
144
|
+
There are also proteins that inhibit dbTFs, for example by sequestering them in the cytoplasm or
|
145
|
+
nucleus (‘dbTF-inhibitors’). The difference between coTFs and dbTF regulators is that the latter
|
146
|
+
do not act at the genomic location of the target regulated gene, whereas coTFs are associated with
|
147
|
+
the transcriptional regulatory complex. Therefore, the input of the coTF is the dbTF, not the
|
148
|
+
target gene.
|
149
|
+
● MF
|
150
|
+
|
151
|
+
|
152
|
+
|
153
|
+
○ GO:0140416 DNA-binding transcription factor inhibitor activity
|
154
|
+
○
|
155
|
+
● BP:
|
156
|
+
|
157
|
+
looking for example for positive regulators
|
158
|
+
|
159
|
+
○ GO:0006355 regulation of transcription, DNA-templated
|
160
|
+
|
161
|
+
● CC
|
162
|
+
|
163
|
+
○ as appropriate
|
164
|
+
|
165
|
+
Annotation extensions
|
166
|
+
|
167
|
+
- MF and BP: target gene of a dbTF or a coTF: has_input [dbTF]
|
168
|
+
- MF localization: is_active_in [GO:cellular component, cell, tissue, …]
|
169
|
+
- BP localization: occurs_in [GO:cellular component, cell, tissue, …]
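The extension rules above can be sketched as a small record type with a per-aspect check. This is an illustration under stated assumptions: the class and function names are hypothetical, the relation is spelled `is_active_in` with underscores, the `relation(target)` rendering is only one possible text form, and `GENE_A` / `GENE_B` are placeholder identifiers.

```python
from dataclasses import dataclass, field

# Relations permitted per aspect, following the three rules listed above.
# The spelling "is_active_in" is an assumption about the intended relation name.
ALLOWED_RELATIONS = {
    "MF": {"has_input", "is_active_in"},
    "BP": {"has_input", "occurs_in"},
}


@dataclass
class Extension:
    relation: str  # e.g. "has_input"
    target: str    # gene, cell type, tissue or GO cellular component ID


@dataclass
class Annotation:
    gene_product: str
    go_id: str
    aspect: str  # "MF" or "BP"
    extensions: list[Extension] = field(default_factory=list)

    def render(self) -> str:
        """Render as 'gene GO-ID relation(target),...' (hypothetical format)."""
        ext = ",".join(f"{e.relation}({e.target})" for e in self.extensions)
        return f"{self.gene_product} {self.go_id}" + (f" {ext}" if ext else "")


def invalid_relations(annotation: Annotation) -> list[str]:
    """List the relations that the rules above do not allow for this aspect."""
    allowed = ALLOWED_RELATIONS.get(annotation.aspect, set())
    return [e.relation for e in annotation.extensions if e.relation not in allowed]


if __name__ == "__main__":
    ann = Annotation(
        "GENE_A", "GO:0003700", "MF",
        [Extension("has_input", "GENE_B"), Extension("is_active_in", "GO:0000785")],
    )
    print(ann.render())
    print(invalid_relations(ann))
```

Keeping the allowed relations in one table makes it easy to flag, for example, an `occurs_in` extension attached to an MF annotation.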
|
170
|
+
|
171
|
+
Annotating a transcription regulator from experimental data
|
172
|
+
|
173
|
+
The following annotation approach follows the strategy that we currently recommend, to ensure
|
174
|
+
that curators use all information available, and do not restrict themselves to annotating papers individually
|
175
|
+
and out of the more general context. Note that the information does not necessarily need to be
|
176
|
+
extracted from a single paper; reviewing a wide range of papers is recommended to ensure
|
177
|
+
annotations are as accurate as possible, so that annotations are based on multiple observations
|
178
|
+
from different articles and independent research groups.
|
179
|
+
|
180
|
+
The following four questions provide a checklist to assess whether a gene can be annotated as
|
181
|
+
a transcriptional regulator:
|
182
|
+
|
183
|
+
|
184
|
+
|
185
|
+
1. What is the starting hypothesis: are the authors characterizing a transcription
|
186
|
+
|
187
|
+
regulator? Scientific models are built by adding new data to the existing corpus of
|
188
|
+
evidence. New data can either support or contradict existing models. The introduction
|
189
|
+
section of research articles can be used to understand what prior knowledge the article
|
190
|
+
builds on, and what aspect of the existing model or what new model the authors are
|
191
|
+
assessing. The intent of the authors is essential to understand what GO term should be
|
192
|
+
chosen, with the caveat that inconsistent terminology has been used in transcription
|
193
|
+
research articles and therefore may not always be consistent with the GO term labels.
|
194
|
+
Curators must look carefully at the GO term definitions and the placement of the term in
|
195
|
+
the ontology to ensure that the meaning of the GO term corresponds to the concept
|
196
|
+
being described in the article.
|
197
|
+
|
198
|
+
2. Does knowledge from specific protein domains or characterized orthologs
|
199
|
+
|
200
|
+
support the hypothesis?
|
201
|
+
The presence of specific domains and the existence of well characterized orthologs can
|
202
|
+
provide useful support for interpreting experimental data. Note that domain information
|
203
|
+
and sequence homology data should be used very carefully: not all domains have a
|
204
|
+
single function; and only closely related orthologs whose functions have been
|
205
|
+
unambiguously characterized can be used to support the association of a gene with a
|
206
|
+
GO term, if those are consistent with the experimental data presented in the article.
|
207
|
+
|
208
|
+
a. GTFs: GTFs have been characterized in several organisms, from bacteria, to
|
209
|
+
yeast, to mammalian cells (PMID:25693126), and therefore orthology should
|
210
|
+
provide strong support for the decision to associate these proteins with a child
|
211
|
+
specific for RNA polymerase I, II or III of the MF term "GO:0140223 general
|
212
|
+
transcription initiation factor activity". In addition, the naming of GTFs is well
|
213
|
+
established across human and model organism nomenclature groups and can be
|
214
|
+
used to help guide these decisions. Thus, for human GTFs the HUGO Gene
|
215
|
+
Nomenclature Committee (HGNC, www.genenames.org) provides the gene
|
216
|
+
symbols TAF#, for TATA-box binding protein associated factors, and GTF2#s and
|
217
|
+
GTF3#s, for general transcription factor II and III subunits respectively, although
|
218
|
+
a few GTFs have more specific names such as BTAF1: B-TFIID TATA-box
|
219
|
+
binding protein associated factor 1.
|
220
|
+
|
221
|
+
b. dbTFs: Gene products associated with the GO term "GO:0003700 DNA-binding
|
222
|
+
transcription factor activity" should have experimental evidence to confirm
|
223
|
+
their ability to bind DNA and that this binding regulates the expression of a
|
224
|
+
limited set of target genes. In these cases the direct target gene(s) can also be
|
225
|
+
included in the annotation using the "has input" relation. For proteins that belong to
|
226
|
+
families of well characterized transcription factors, such as those that contain
|
227
|
+
GATA and homeobox domains and proteins with a one-to-one ortholog already
|
228
|
+
demonstrated to be a dbTF, weaker evidence of DNA binding, such as ChIP
|
229
|
+
experiments, is sufficient. For proteins with domains known to be associated with
|
230
|
+
|
231
|
+
|
232
|
+
|
233
|
+
functions other than DNA binding (such as zinc fingers) or proteins with
|
234
|
+
enzymatic activity, strong evidence of DNA binding is required.
|
235
|
+
|
236
|
+
In addition, in some cases, neither the protein nor a member of the protein’s
|
237
|
+
family will have been previously associated with the dbTF activity term. In these
|
238
|
+
cases, clear experimental evidence of sequence-specific DNA binding and gene
|
239
|
+
transcription regulation via cognate DNA motifs located in gene-associated
|
240
|
+
cis-regulatory modules will be required for the protein to be classified as a dbTF.
|
241
|
+
|
242
|
+
c. coTFs: A coTF is defined as a protein that interacts specifically with a dbTF or a
|
243
|
+
coTF at a cis-regulatory region (GO:0003712). This interaction either activates or
|
244
|
+
represses the transcription of specific genes, often acting by altering chromatin
|
245
|
+
structure and modifications. There are many roles that coregulators can play: for
|
246
|
+
example, one class of transcription coregulators modifies chromatin structure
|
247
|
+
through covalent modification of histones. A second class of coregulators
|
248
|
+
modifies the conformation of chromatin in an ATP-dependent manner. A third
|
249
|
+
class modulates interactions of dbTFs with other coTFs.
|
250
|
+
|
251
|
+
Many coTFs have enzymatic activity and do not bind DNA. For coTFs that do
|
252
|
+
bind DNA, many recognize very short, common DNA sequences, not sufficiently
|
253
|
+
unique to enable the coTF to regulate the expression of a limited set of genes in
|
254
|
+
a discrete environmental or developmental stage (for example AT-hook coTFs).
|
255
|
+
|
256
|
+
The distinction between a dbTF and a coTF can be difficult to make, so a more
|
257
|
+
exhaustive review of the literature, including looking at the characterized
|
258
|
+
orthologs, is highly recommended before annotating a coTF.
|
259
|
+
|
260
|
+
3. Are there other GO annotations or published experimental results consistent with
|
261
|
+
|
262
|
+
the hypothesis?
|
263
|
+
Keeping in mind the gene-by-gene/pathway-by-pathway annotation approach, it can be
|
264
|
+
valuable to take into account results from other research articles, to make sure results
|
265
|
+
and annotations are consistent. Curators should avoid creating annotations that are
|
266
|
+
inconsistent with existing annotations, by either choosing a different term for annotation,
|
267
|
+
or by reviewing and, where warranted, disputing annotations that appear to be incorrect (see
|
268
|
+
next section).
|
269
|
+
|
270
|
+
4. Are the experimental results consistent with the hypothesis?
|
271
|
+
|
272
|
+
The curator should carefully look at the results presented in the paper and, if those are
|
273
|
+
consistent with the hypothesis that this is a transcription regulator, then appropriate GO
|
274
|
+
annotations can be made.
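The four questions above can be read as a simple gate. The sketch below assumes that all four must be answered "yes" before annotating, which is an interpretation of the checklist rather than a stated rule; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChecklistAnswers:
    """Curator's yes/no answers to the four questions (hypothetical field names)."""
    authors_test_regulator_hypothesis: bool  # question 1
    domains_or_orthologs_support: bool       # question 2
    consistent_with_other_evidence: bool     # question 3
    results_support_hypothesis: bool         # question 4


def unmet_questions(answers: ChecklistAnswers) -> list[int]:
    """Return the numbers of the questions answered 'no'."""
    flags = [
        answers.authors_test_regulator_hypothesis,
        answers.domains_or_orthologs_support,
        answers.consistent_with_other_evidence,
        answers.results_support_hypothesis,
    ]
    return [number for number, ok in enumerate(flags, start=1) if not ok]


def can_annotate(answers: ChecklistAnswers) -> bool:
    """Assumed rule: annotate only when no question is answered 'no'."""
    return not unmet_questions(answers)


if __name__ == "__main__":
    answers = ChecklistAnswers(True, True, False, True)
    print(can_annotate(answers), unmet_questions(answers))
```

Returning the failing question numbers, instead of a bare boolean, points the curator to the part of the literature review that still needs attention.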
|
275
|
+
|
276
|
+
Note that DNA-binding transcription factors as well as co-regulators often act as activators or
|
277
|
+
repressors at different promoters or depending on the context, so annotation to both activator
|
278
|
+
|
279
|
+
|
280
|
+
|
281
|
+
and repressor is not considered inconsistent. This may be further resolved through additional
|
282
|
+
context details, e.g. cell type, environmental conditions, etc.
|
283
|
+
|
284
|
+
Reviewing existing annotations
|
285
|
+
|
286
|
+
If time allows, it is very useful for the research community and for future annotation that other
|
287
|
+
annotations associated with the gene be reviewed. If there are conflicting annotations, then the
|
288
|
+
supporting data should be reviewed to see whether the annotations are inconsistent with the
|
289
|
+
data, in which case annotations should be fixed.
|
290
|
+
|
291
|
+
In cases where the primary data is conflicting across different papers (for example a protein is
|
292
|
+
sometimes described as a transcription factor, and sometimes as a coregulator), then the
|
293
|
+
literature should be reviewed carefully to decide whether:
|
294
|
+
|
295
|
+
i. the annotation is incorrect (bad choice of term, wrong protein annotated)
|
296
|
+
ii. knowledge has evolved. If necessary, some (usually older) papers should
|
297
|
+
be marked as ‘do not curate’ and the associated annotations should be
|
298
|
+
removed.
|
299
|
+
iii. the protein plays multiple roles under different conditions (i.e.,
|
300
|
+
acts as a DNA-binding transcription factor in certain contexts and as a
|
301
|
+
cofactor in others), as these two molecular functions are not mutually
|
302
|
+
exclusive.
|
303
|
+
iv. no clear activity has been established yet, in which case: either
|
304
|
+
annotations to both DNA-binding transcription factor and cofactor may be
|
305
|
+
made (if the evidence supports it), or if the data is not sufficient, do not
|
306
|
+
annotate.
|
307
|
+
|
308
|
+
|
309
|
+
|
310
|
+
|
311
|
+
|
312
|
+
|
313
|
+
|
314
|
+
|
315
|
+
GO-CAM representation of transcription
|
316
|
+
|
317
|
+
GO-CAM Modeling Guidelines: DNA binding transcription factor activity
|
318
|
+
|
319
|
+
Potential pitfalls
|
320
|
+
|
321
|
+
The annotation of dbTFs can be challenging for multiple reasons:
|
322
|
+
|
323
|
+
● Some activities are hard to distinguish from each other, and adding to this difficulty,
|
324
|
+
transcription regulators form complexes that are more or less fluid, so that it can be
|
325
|
+
difficult to detect which protein in a complex is responsible for a specific activity.
|
326
|
+
● It can be difficult to distinguish certain activities, and some proteins do have multiple
|
327
|
+
functions.
|
328
|
+
|
329
|
+
|
330
|
+
|
331
|
+
|
332
|
+
|
333
|
+
● Researchers use "transcription factor" loosely. It can mean a GTF, a dbTF, or a coTF.
|
334
|
+
● The terms "cofactor", "coactivator", and "corepressor" are also used for different
|
335
|
+
|
336
|
+
activities: either as described in this document or, sometimes (especially in older
|
337
|
+
papers), to describe a dbTF that acts as a dimer.
|
338
|
+
|
339
|
+
● The complexity of the transcription process means that no single experiment is usually
|
340
|
+
sufficient to define the function of a protein: interpretation of experimental results that
|
341
|
+
investigate dbTFs must rely on existing knowledge.
|
342
|
+
|
343
|
+
● Many proteins presumed to function as dbTFs have never been experimentally
|
344
|
+
|
345
|
+
demonstrated to bind DNA, but their role is indirectly inferred by the presence of specific
|
346
|
+
domains and in some cases, evidence of an effect on the transcription of putative direct
|
347
|
+
target genes.
|
348
|
+
|
349
|
+
● The presence of a DNA binding domain in a protein does not always imply the protein
|
350
|
+
|
351
|
+
functions as a dbTF.
|
352
|
+
|
353
|
+
________________________ADDITIONAL INFORMATION_____________________
|
354
|
+
|
355
|
+
Additional information
|
356
|
+
|
357
|
+
Ontology structure
|
358
|
+
|
359
|
+
Molecular Function
|
360
|
+
|
361
|
+
MF: GO:0140110 transcription regulator activity and children
|
362
|
+
|
363
|
+
The transcription regulator activity (GO:0140110) MF branch of the GO describes the activities
|
364
|
+
of transcription regulators: GTFs, dbTFs, and coTFs (Figure 1).
|
365
|
+
|
366
|
+
|
367
|
+
|
368
|
+
Figure 1. Transcription regulator activity branch of the Gene Ontology. This part of the
|
369
|
+
Molecular Function (MF) ontology describes the activities of transcription regulators: GTFs,
|
370
|
+
dbTFs, and coTFs. Highlighted in blue are the most specific GO terms available to describe these
|
371
|
+
activities in eukaryotic cells (prokaryotes use a single polymerase).
|
372
|
+
|
373
|
+
MF: GO:0000987 cis-regulatory region sequence-specific DNA binding
|
374
|
+
|
375
|
+
The transcription regulatory region sequence-specific DNA-binding sub-tree of GO includes
|
376
|
+
terms describing specific regulatory regions, such as the core promoter (including the TATA box
|
377
|
+
and the transcription start site), cis-regulatory regions (bound by dbTFs), and specific types of
|
378
|
+
cis-regulatory motifs (such as E-box and N-box). An overview of the GO structure for DNA binding
|
379
|
+
activities is shown in Figure 2.
|
380
|
+
|
381
|
+
|
382
|
+
|
383
|
+
Figure 2. DNA binding branch of the Gene Ontology. This part of the Molecular Function
|
384
|
+
(MF) ontology describes DNA binding. The key DNA binding terms that should be associated
|
385
|
+
with dbTFs are highlighted in blue; more specific GO terms are available to provide information
|
386
|
+
about the DNA motifs bound by the dbTF.
|
387
|
+
|
388
|
+
Biological process
|
389
|
+
|
390
|
+
**NOTE THAT THIS PART OF THE ONTOLOGY IS STILL BEING REVIEWED**
|
391
|
+
There are currently two axes of classification: by product (Figure 3) and by the RNA polymerase
|
392
|
+
generating the transcript (Figure 4). Ideally both terms should be used; the product generated
|
393
|
+
by the transcription event is the most meaningful biologically. We plan to remove the RNA
|
394
|
+
polymerase-specific branch of the BP ontology. Having a single branch will ensure that
|
395
|
+
consistent and efficient annotation can be achieved.
|
396
|
+
|
397
|
+
For the 'regulation of transcription' branch ("GO:0006355 regulation of transcription,
|
398
|
+
DNA-templated", and children; Figure 5), the 'product' axis does not currently exist. We will
|
399
|
+
rename the polymerase-centric terms to the product-specific equivalent as appropriate. If there
|
400
|
+
are two polymerases responsible for the same transcript, two terms will be instantiated (for
|
401
|
+
|
402
|
+
|
403
|
+
|
404
|
+
example, GO:1905380 regulation of snRNA transcription by RNA polymerase II and the NEW term
|
405
|
+
regulation of snRNA transcription by RNA polymerase III will both remain).
|
406
|
+
|
407
|
+
Figure 3. Transcription branch of the Gene Ontology. This part of the BP ontology provides
|
408
|
+
terms to describe the role of GTFs in transcription of a specific type of transcript.
|
409
|
+
|
410
|
+
Figure 4. Transcription branch of the Gene Ontology. This part of the BP ontology provides terms to
|
411
|
+
describe the role of GTFs in transcription mediated by a specific RNA polymerase.
|
412
|
+
|
413
|
+
Figure 5. Regulation of transcription branch of the Gene Ontology. This part of the BP
|
414
|
+
ontology provides terms to describe the role of dbTFs and coTFs in regulating transcription. Highlighted
|
415
|
+
in blue are the key GO terms available to describe regulation of transcription in eukaryotic cells
|
416
|
+
(prokaryotes use a single polymerase). The more specific child terms for the highlighted terms
|
417
|
+
describe the role of transcription in other processes, e.g. "GO:1900464 negative regulation of
|
418
|
+
cellular hyperosmotic salinity response by negative regulation of transcription from RNA
|
419
|
+
|
420
|
+
|
421
|
+
|
422
|
+
polymerase II promoter"; these terms are being phased out. Instead, aside from its biochemical
|
423
|
+
activity in the process of transcription, a transcription factor can be annotated to the biological
|
424
|
+
process it receives input from, as a molecular signaling pathway or biomolecular sensor
|
425
|
+
endpoint, and provides transcriptional output for through its target genes.
|
426
|
+
|
427
|
+
Cellular components: Complexes and cellular locations
|
428
|
+
|
429
|
+
**NOTE THAT THIS PART OF THE ONTOLOGY IS STILL BEING REVIEWED**
|
430
|
+
|
431
|
+
Annotation examples from research articles
|
432
|
+
|
433
|
+
Below we describe annotation workflows from a workshop held at EBI in April 2020.
|
434
|
+
|
435
|
+
This is a concept; perhaps another format is needed, and a landscape page may be better suited.
|
436
|
+
|
437
|
+
hypothesis-example/steps
|
439
|
+
|
440
|
+
GTF
|
441
|
+
|
442
|
+
dbTF
|
443
|
+
|
444
|
+
coTF
|
445
|
+
|
446
|
+
Example article
|
447
|
+
|
448
|
+
PMID:10924514
|
449
|
+
|
450
|
+
PMID:26314965
|
451
|
+
|
452
|
+
PMID:22783022
|
453
|
+
|
454
|
+
Step 1: Hypothesis
|
455
|
+
|
456
|
+
GTF2H2 is a subunit of
|
457
|
+
the TFIIH GTF.
|
458
|
+
|
459
|
+
NKX6.3 is a
|
460
|
+
transcription
|
461
|
+
regulator.
|
462
|
+
|
463
|
+
Step 2: Database
|
464
|
+
mining for protein
|
465
|
+
domains and
|
466
|
+
orthologs
|
467
|
+
|
468
|
+
GTF2H2 contains a
|
469
|
+
TFIIH subunit Ssl1/p44
|
470
|
+
domain (IPR012170),
|
471
|
+
described in InterPro as
|
472
|
+
a component of the
|
473
|
+
transcription factor
|
474
|
+
TFIIH core.
|
475
|
+
UniProt describes
|
476
|
+
GTF2H2 as a
|
477
|
+
"Component of the
|
478
|
+
TFIID-containing RNA
|
479
|
+
polymerase II
|
480
|
+
pre-initiation complex"
|
481
|
+
(https://www.uniprot.org/uniprot/Q13888#function).
|
484
|
+
|
485
|
+
NKX6.3 contains a
|
486
|
+
DNA binding
|
487
|
+
homeobox domain
|
488
|
+
(IPR020479).
|
489
|
+
UniProt describes
|
490
|
+
NKX6.3 as a
|
491
|
+
"Putative
|
492
|
+
transcription factor,
|
493
|
+
which may be
|
494
|
+
involved in
|
495
|
+
patterning of central
|
496
|
+
nervous system and
|
497
|
+
pancreas."
|
498
|
+
(https://www.uniprot.org/uniprot/A6NJ46#function).
|
501
|
+
|
502
|
+
SIN3A (Q96ST3) is a
|
503
|
+
transcription
|
504
|
+
co-repressor.
|
505
|
+
|
506
|
+
SIN3A contains a
|
507
|
+
SIN3A domain
|
508
|
+
(IPR037969), among
|
509
|
+
others. IPR037969 is
|
510
|
+
associated with the
|
511
|
+
GO Molecular
|
512
|
+
Function GO:0003714
|
513
|
+
transcription
|
514
|
+
corepressor activity
|
515
|
+
(https://www.ebi.ac.uk/interpro/entry/InterPro/IPR037969/)
|
518
|
+
UniProt describes
|
519
|
+
SIN3A as "Acts as a
|
520
|
+
transcriptional
|
521
|
+
repressor.
|
522
|
+
Corepressor for
|
523
|
+
|
524
|
+
|
525
|
+
|
526
|
+
Step 3: GO
|
527
|
+
annotation and
|
528
|
+
literature mining
|
529
|
+
|
530
|
+
Existing annotations
|
531
|
+
were consistent with
|
532
|
+
the hypothesis that the
|
533
|
+
gene encodes a GTF:
|
534
|
+
there are IDA
|
535
|
+
annotations to
|
536
|
+
GO:0016251 RNA
|
537
|
+
polymerase II general
|
538
|
+
transcription initiation
|
539
|
+
factor activity and
|
540
|
+
GO:0008353 RNA
|
541
|
+
polymerase II CTD
|
542
|
+
heptapeptide repeat
|
543
|
+
kinase activity, both
|
544
|
+
functions of GTF
|
545
|
+
subunits.
|
546
|
+
|
547
|
+
Existing annotations
|
548
|
+
were consistent with
|
549
|
+
the hypothesis that
|
550
|
+
the gene encodes a
|
551
|
+
dbTF.
|
552
|
+
- GO:0003700 (and
|
553
|
+
children)
|
554
|
+
DNA-binding
|
555
|
+
transcription factor
|
556
|
+
activity
|
557
|
+
IDA, IBA, ISA, ISM
|
558
|
+
- GO:0000978
|
559
|
+
RNA polymerase II
|
560
|
+
cis-regulatory region
|
561
|
+
sequence-specific
|
562
|
+
DNA binding
|
563
|
+
IBA
|
564
|
+
- GO:0043565
|
565
|
+
sequence-specific
|
566
|
+
DNA binding
|
567
|
+
IDA, IBA
|
568
|
+
|
569
|
+
REST."
|
570
|
+
(https://www.uniprot.org/uniprot/Q96ST3#function).
|
573
|
+
|
574
|
+
Existing annotations
|
575
|
+
were conflicting. After
|
576
|
+
review of the curated
|
577
|
+
articles, several
|
578
|
+
annotations were
|
579
|
+
removed, because
|
580
|
+
many of the activities
|
581
|
+
were those of proteins
|
582
|
+
SIN3A interacts with,
|
583
|
+
not its function.
|
584
|
+
Searching PubMed
|
585
|
+
for SIN3A, and
|
586
|
+
filtering for reviews,
|
587
|
+
since there is a lot of
|
588
|
+
literature describing
|
589
|
+
SIN3A, results in many
|
590
|
+
articles mentioning
|
591
|
+
epigenetics, further
|
592
|
+
supporting the coTF
|
593
|
+
repressor hypothesis (
|
594
|
+
https://pubmed.ncbi.nlm.nih.gov/?term=Sin3a&filter=pubt.review).
|
597
|
+
|
598
|
+
Step 4: New data
|
599
|
+
extraction from
|
600
|
+
publication
|
601
|
+
|
602
|
+
Figure 4 compares
|
603
|
+
wild type GTF2H2 and
|
604
|
+
a mutant in abortive
|
605
|
+
initiation and promoter
|
606
|
+
escape assays. The
|
607
|
+
results show that the
|
608
|
+
mutant GTF2H2 is
|
609
|
+
unable to initiate
|
610
|
+
transcription. This data
|
611
|
+
supports the annotation
|
612
|
+
GO:0016251 RNA
|
613
|
+
polymerase II general
|
614
|
+
|
615
|
+
Figure 5 shows
|
616
|
+
increased
|
617
|
+
expression of target
|
618
|
+
genes containing
|
619
|
+
predicted binding
|
620
|
+
motif sequences
|
621
|
+
(TAAT), which was
|
622
|
+
abolished by
|
623
|
+
decreasing NKX6.3
|
624
|
+
expression with an
|
625
|
+
antisense transcript.
|
626
|
+
Moreover, ChIP data
|
627
|
+
confirmed that
|
628
|
+
|
629
|
+
Figure 4 shows by
|
630
|
+
ChIP assay that the
|
631
|
+
SIN3A is found in the
|
632
|
+
vicinity of the SOCS3
|
633
|
+
gene, a gene
|
634
|
+
regulated by STAT3.
|
635
|
+
The same figure also
|
636
|
+
shows that STAT3
|
637
|
+
does not interact with
|
638
|
+
the SOCS3 promoter
|
639
|
+
in the presence of
|
640
|
+
SIN3A, indicating a
|
641
|
+
repressor effect of
|
642
|
+
|
643
|
+
|
644
|
+
|
645
|
+
transcription initiation
|
646
|
+
factor activity
|
647
|
+
|
648
|
+
SIN3A on STAT3's
|
649
|
+
transcription activator
|
650
|
+
function. This
|
651
|
+
supports an
|
652
|
+
annotation of SIN3A
|
653
|
+
to:
|
654
|
+
- GO:0003714
|
655
|
+
transcription
|
656
|
+
corepressor activity
|
657
|
+
- GO:0000122
|
658
|
+
negative regulation of
|
659
|
+
transcription by RNA
|
660
|
+
polymerase II
|
661
|
+
|
662
|
+
NKX6.3 is bound to
|
663
|
+
regulatory regions of
|
664
|
+
the target genes
|
665
|
+
(see notes above on
|
666
|
+
the use of ChIP data
|
667
|
+
to support an
|
668
|
+
annotation). The
|
669
|
+
dbTF activity
|
670
|
+
annotation for
|
671
|
+
NKX6.3 can
|
672
|
+
therefore include the
|
673
|
+
direct target genes
|
674
|
+
information using
|
675
|
+
the relation "has
|
676
|
+
input".
|
677
|
+
This supports an
|
678
|
+
annotation of
|
679
|
+
NKX6.3 to:
|
680
|
+
- GO:0000978 RNA
|
681
|
+
polymerase II
|
682
|
+
cis-regulatory region
|
683
|
+
sequence-specific
|
684
|
+
DNA binding
|
685
|
+
- GO:0001228
|
686
|
+
DNA-binding
|
687
|
+
transcription
|
688
|
+
activator activity,
|
689
|
+
RNA polymerase
|
690
|
+
II-specific
|
691
|
+
- GO:0001227
|
692
|
+
DNA-binding
|
693
|
+
transcription
|
694
|
+
repressor activity,
|
695
|
+
RNA polymerase
|
696
|
+
II-specific
|
697
|
+
- GO:0045944
|
698
|
+
positive regulation of
|
699
|
+
transcription by RNA
|
700
|
+
polymerase II
|
701
|
+
GO:0000122
|
702
|
+
negative regulation
|
703
|
+
|
704
|
+
|
705
|
+
|
706
|
+
of transcription by
|
707
|
+
RNA polymerase II
|
708
|
+
|
709
|
+
Evidence for a GTF
|
710
|
+
|
711
|
+
PMID:10924514 Studies of the function of the GTF2H2 subunit of the TFIIH GTF
|
712
|
+
|
713
|
+
1. The hypothesis is that GTF2H2 is a subunit of the TFIIH GTF.
|
714
|
+
2. Protein domains or characterized orthologs: GTF2H2 contains a TFIIH subunit
|
715
|
+
|
716
|
+
Ssl1/p44 domain (IPR012170), described in InterPro as a component of the transcription
|
717
|
+
factor TFIIH core.
|
718
|
+
UniProt describes GTF2H2 as a "Component of the TFIID-containing RNA polymerase II
|
719
|
+
pre-initiation complex" (https://www.uniprot.org/uniprot/Q13888#function).
|
720
|
+
|
721
|
+
3. Are there other GO annotations or published experimental results consistent with
|
722
|
+
the hypothesis? Yes, existing annotations were consistent with the hypothesis that the
|
723
|
+
gene encodes a GTF: there are IDA annotations to GO:0016251 RNA polymerase II
|
724
|
+
general transcription initiation factor activity and GO:0008353 RNA polymerase II CTD
|
725
|
+
heptapeptide repeat kinase activity, both functions of GTF subunits.
|
726
|
+
|
727
|
+
4. Are the experimental results consistent with the hypothesis? Yes: the GTF annotation to
|
728
|
+
|
729
|
+
GO:0016251 RNA polymerase II general transcription initiation factor activity is supported by Figure
|
730
|
+
4, which compares wild type GTF2H2 and a mutant in abortive initiation and promoter
|
731
|
+
escape assays. The results show that the mutant GTF2H2 is unable to initiate
|
732
|
+
transcription.
|
733
|
+
|
734
|
+
Evidence for a dbTF
|
735
|
+
|
736
|
+
PMID:26314965 studies the activity of the NKX6.3 (UniProt:A6NJ46) transcription factor. Going
|
737
|
+
through the checklist shows the following:
|
738
|
+
|
739
|
+
1. The hypothesis stated by the authors is that NKX6.3 is a transcription regulator.
|
740
|
+
2. Protein domains or characterized orthologs: NKX6.3 contains a DNA binding
|
741
|
+
|
742
|
+
homeobox domain (IPR020479). UniProt describes NKX6.3 as a "Putative transcription
|
743
|
+
factor, which may be involved in patterning of central nervous system and pancreas."
|
744
|
+
(https://www.uniprot.org/uniprot/A6NJ46#function).
|
745
|
+
|
746
|
+
3. Are there other GO annotations or published experimental results consistent with
|
747
|
+
|
748
|
+
the hypothesis?
|
749
|
+
All existing annotations were consistent with the hypothesis that the gene
|
750
|
+
encodes a dbTF.
|
751
|
+
|
752
|
+
GO ID | Term label | Evidence(s) | Consistent with hypothesis?
|
763
|
+
GO:0003700 (and children) | DNA-binding transcription factor activity | IDA, IBA, ISA, ISM | Yes
|
773
|
+
GO:0000978 | RNA polymerase II cis-regulatory region sequence-specific DNA binding | IBA | Yes
|
780
|
+
GO:0043565 | sequence-specific DNA binding | IDA, IBA | Yes
|
789
|
+
|
790
|
+
4. Are the experimental results consistent with the hypothesis? Yes, Figure 5 shows
|
791
|
+
increased expression of target genes containing predicted binding motif sequences
|
792
|
+
(TAAT), which was abolished by decreasing NKX6.3 expression with an antisense
|
793
|
+
transcript. Moreover, ChIP data confirmed that NKX6.3 is bound to regulatory regions of
|
794
|
+
the target genes (see notes above on the use of ChIP data to support an annotation).
|
795
|
+
The dbTF activity annotation for NKX6.3 can therefore include the direct target genes
|
796
|
+
information using the relation "has input".
|
797
|
+
This supports an annotation of NKX6.3 to:
|
798
|
+
|
799
|
+
- GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA
|
800
|
+
|
801
|
+
binding
|
802
|
+
|
803
|
+
- GO:0001228 DNA-binding transcription activator activity, RNA polymerase
|
804
|
+
|
805
|
+
II-specific
|
806
|
+
|
807
|
+
- GO:0001227 DNA-binding transcription repressor activity, RNA polymerase
|
808
|
+
|
809
|
+
II-specific
|
810
|
+
|
811
|
+
- GO:0045944 positive regulation of transcription by RNA polymerase II
|
812
|
+
- GO:0000122 negative regulation of transcription by RNA polymerase II
|
813
|
+
|
814
|
+
Evidence for a coTF
|
815
|
+
|
816
|
+
PMID:22783022 studies the activity of the SIN3A transcription co-repressor. This paper shows
|
817
|
+
that the transcription factor STAT3 is a target of regulation by SIN3A. Going through the
|
818
|
+
checklist shows the following:
|
819
|
+
|
820
|
+
1. The starting hypothesis stated by the authors is that SIN3A (Q96ST3) is a transcription
|
821
|
+
|
822
|
+
co-repressor.
|
823
|
+
|
824
|
+
2. Protein domains or characterized orthologs:
|
825
|
+
|
826
|
+
SIN3A contains a SIN3A domain (IPR037969), among others. IPR037969 is associated
|
827
|
+
with the GO Molecular Function GO:0003714 transcription corepressor activity
|
828
|
+
(https://www.ebi.ac.uk/interpro/entry/InterPro/IPR037969/)
|
829
|
+
UniProt describes SIN3A as "Acts as a transcriptional repressor. Corepressor for REST."
|
830
|
+
(https://www.uniprot.org/uniprot/Q96ST3#function).
|
831
|
+
|
832
|
+
3. Are there other GO annotations or published experimental results consistent with
|
833
|
+
|
834
|
+
the hypothesis?
|
835
|
+
|
836
|
+
a. Existing annotations are conflicting. After review of the original articles, several
|
837
|
+
|
838
|
+
annotations were removed, because many of the activities were those of proteins
|
839
|
+
SIN3A interacts with, not its function.
|
840
|
+
|
841
|
+
|
842
|
+
|
843
|
+
b. Searching PubMed for SIN3A, and filtering for reviews, since there is a lot of
|
844
|
+
literature describing SIN3A, results in many articles mentioning epigenetics,
|
845
|
+
further supporting the coTF repressor hypothesis (
|
846
|
+
https://pubmed.ncbi.nlm.nih.gov/?term=Sin3a&filter=pubt.review).
|
847
|
+
|
848
|
+
GO ID | Term label | Evidence(s) | Consistent with hypothesis? / Action if not
|
858
|
+
GO:0003714 | transcription corepressor activity | IBA | Yes
|
866
|
+
GO:0003700 | DNA-binding transcription factor activity | IEA (Ensembl) | No / Removed
|
872
|
+
GO:0000976 | transcription regulatory region sequence-specific DNA binding | ISS | No / Removed
|
877
|
+
GO:0033558 | protein deacetylase activity | IMP | No / Removed
|
881
|
+
GO:0004407 | histone deacetylase activity | IBA | No / Removed
|
885
|
+
GO:0000976 | transcription regulatory region sequence-specific DNA binding | ISS | No / Removed
|
905
|
+
|
906
|
+
4. Are the experimental results consistent with the hypothesis?
|
907
|
+
|
908
|
+
Yes, Figure 4 shows by ChIP assay that SIN3A is found in the vicinity of the SOCS3
|
909
|
+
gene, a gene regulated by STAT3. The same figure also shows that STAT3 does not
|
910
|
+
interact with the SOCS3 promoter in the presence of SIN3A, indicating a repressor effect
|
911
|
+
of SIN3A on STAT3's transcription activator function. This supports an annotation of
|
912
|
+
SIN3A to:
|
913
|
+
|
914
|
+
- GO:0003714 transcription corepressor activity
|
915
|
+
- GO:0000122 negative regulation of transcription by RNA polymerase II
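
The review of existing SIN3A annotations in step 3 above can be recorded in a small data structure. The following is only a sketch: the `ReviewedAnnotation` type and the `split_review` helper are hypothetical names introduced for illustration and are not part of any GO tooling. The GO IDs, labels, and keep/remove outcomes are taken from the SIN3A review table; evidence codes are left out.

```python
from typing import NamedTuple


class ReviewedAnnotation(NamedTuple):
    """One existing annotation and the outcome of its review (hypothetical type)."""
    go_id: str
    label: str
    consistent: bool  # True = consistent with the coTF hypothesis, False = removed


# Outcomes as listed in the SIN3A review table.
SIN3A_REVIEW = [
    ReviewedAnnotation("GO:0003714", "transcription corepressor activity", True),
    ReviewedAnnotation("GO:0003700", "DNA-binding transcription factor activity", False),
    ReviewedAnnotation("GO:0000976", "transcription regulatory region sequence-specific DNA binding", False),
    ReviewedAnnotation("GO:0033558", "protein deacetylase activity", False),
    ReviewedAnnotation("GO:0004407", "histone deacetylase activity", False),
]


def split_review(rows):
    """Return (kept GO IDs, removed GO IDs), preserving input order."""
    kept = [r.go_id for r in rows if r.consistent]
    removed = [r.go_id for r in rows if not r.consistent]
    return kept, removed


kept, removed = split_review(SIN3A_REVIEW)
print("kept:", kept)
print("removed:", removed)
```

Keeping the removed rows in the record, instead of deleting them, preserves the reason an annotation was rejected: the activity belonged to a protein SIN3A interacts with, not to SIN3A itself.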
|
916
|
+
|
917
|
+
Guidance on specific small scale experiments
|
918
|
+
|
919
|
+
Experiments providing evidence for DNA binding transcription factor activity can be found in
|
920
|
+
Tables 3 and 4 of Tripathi 2013 (PMID:27270715).
|
921
|
+
|
922
|
+
Guidance on specific high-throughput experiments
|
923
|
+
|
924
|
+
The strategy outlined above for the annotation of transcription regulators should be applied
|
925
|
+
when curating high-throughput data. The guidelines below are mostly relevant to dbTFs; if the
|
926
|
+
experimental data can be used to capture information about coTFs, this will be indicated.
|
927
|
+
|
928
|
+
|
929
|
+
|
930
|
+
High-throughput data can be used to capture the ‘direct’ target of a dbTF or a coTF. The direct
|
931
|
+
target may be the genomic coordinates and/or the target gene.
|
932
|
+
|
933
|
+
HT-SELEX experiments (dbTF data)
|
934
|
+
High throughput selection experiments such as the HT-SELEX protocol (PMID:1697402,
|
935
|
+
PMID:2200121) yield data that provides strong experimental evidence of the DNA sequence
|
936
|
+
recognised by a protein. This is then presented as an optimal DNA motif that represents the
|
937
|
+
consensus of many binding observations. Crucially, sequence-specific DNA binding does not
|
938
|
+
necessarily imply transcription regulation function. For instance, PRMT14 has a specific DNA
|
939
|
+
binding activity, but it is involved in meiotic recombination site determination. For annotation to
|
940
|
+
the dbTF MF term, it is still necessary to have evidence that at least one target gene is
|
941
|
+
regulated. However, often that evidence was already obtained in the course of other published
|
942
|
+
studies. The HT-SELEX experiments, then, provide support for annotation to:
|
943
|
+
|
944
|
+
- GO:0000987 cis-regulatory region sequence-specific DNA binding or the descendant
|
945
|
+
GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
|
946
|
+
However, if there is also evidence of regulation of transcription of a gene associated with the
|
947
|
+
regulatory region (either by transfection and measuring mRNA or by a reporter gene assay, etc),
|
948
|
+
then HT-SELEX data provides support for the following annotations:
|
949
|
+
|
950
|
+
- GO:0003700 DNA-binding transcription factor activity or a child
|
951
|
+
- GO:0006357 regulation of transcription by RNA polymerase II (or GONEW: regulation of
|
952
|
+
|
953
|
+
mRNA transcription)
|
954
|
+
|
955
|
+
For the TF, IDA (HT-SELEX) is one of the strongest experimental evidence types for assigning
|
956
|
+
dbTF activity. Still, further evidence strengthens this conclusion, e.g. PRMT14 is a meiosis factor, not a TF.
|
957
|
+
Cumulating evidence is important (Oriol Formes email of 18 February).
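
The HT-SELEX rules above amount to a two-level decision: a motif alone supports only a DNA binding term, and the dbTF and regulation terms need separate evidence that a target gene is regulated. A minimal sketch of that logic follows. The `HtSelexEvidence` record and the `ht_selex_supported_terms` function are hypothetical names introduced for illustration; only the GO IDs come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HtSelexEvidence:
    """What is known about one protein (hypothetical record)."""
    has_ht_selex_motif: bool            # a consensus DNA motif was obtained by HT-SELEX
    target_gene_regulation_shown: bool  # e.g. transfection + mRNA measurement, reporter assay


def ht_selex_supported_terms(ev: HtSelexEvidence) -> List[str]:
    """Return the GO IDs that HT-SELEX data would support under the rules above."""
    if not ev.has_ht_selex_motif:
        return []
    # Sequence-specific binding alone: DNA binding term only
    # (or the descendant GO:0000978 when RNA polymerase II is involved).
    terms = ["GO:0000987"]
    if ev.target_gene_regulation_shown:
        # With evidence that a target gene is regulated, the dbTF activity
        # and regulation-of-transcription terms are also supported.
        terms += ["GO:0003700", "GO:0006357"]
    return terms


print(ht_selex_supported_terms(HtSelexEvidence(True, False)))
print(ht_selex_supported_terms(HtSelexEvidence(True, True)))
```

The first call models a protein that binds a specific motif without any demonstrated transcriptional effect, which is the situation the text warns about.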
|
958
|
+
|
959
|
+
ChIP-seq experiments (dbTF or coTF data)
|
960
|
+
|
961
|
+
As ChIP-seq experiments include cross-linking of proteins to other proteins and/or DNA, on their
|
962
|
+
own ChIP-seq experiments are inconclusive. However, if (a) the protein is known to bear a DNA
|
963
|
+
binding domain and (b) there is evidence of regulation of transcription (either by transfection and
|
964
|
+
measuring mRNA or by a reporter gene assay, etc), then the ChIP-seq data provides support
|
965
|
+
for the following annotations:
|
966
|
+
|
967
|
+
- GO:0000987 cis-regulatory region sequence-specific DNA binding or the descendant
|
968
|
+
GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
|
969
|
+
|
970
|
+
- GO:0003700 DNA-binding transcription factor activity or a child
|
971
|
+
- GO:0006357 regulation of transcription by RNA polymerase II (or GONEW: regulation of
|
972
|
+
|
973
|
+
mRNA transcription)
|
974
|
+
|
975
|
+
If (c) the protein is known to be a coTF and (b) there is evidence of regulation of transcription
|
976
|
+
(either by transfection and measuring mRNA or by a reporter gene assay, etc), then the
|
977
|
+
ChIP-seq data provides support for the following annotations:
|
978
|
+
- GO:0003712 transcription coregulator activity or a child
|
979
|
+
|
980
|
+
|
981
|
+
|
982
|
+
- GO:0006357 regulation of transcription by RNA polymerase II (or GONEW: regulation of
|
983
|
+
|
984
|
+
mRNA transcription)
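
The ChIP-seq conditions (a), (b) and (c) above can be summarised as a small decision function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the precedence given to the DNA binding domain case when a protein satisfies both (a) and (c) is an assumption made here, not something the guidelines state.

```python
from typing import List


def chip_seq_supported_terms(
    has_dna_binding_domain: bool,  # condition (a)
    regulation_shown: bool,        # condition (b)
    is_known_cotf: bool,           # condition (c)
) -> List[str]:
    """Return the GO IDs that ChIP-seq data would support under the rules above."""
    if not regulation_shown:
        # Cross-linking captures indirect contacts, so ChIP-seq alone is inconclusive.
        return []
    if has_dna_binding_domain:
        # (a) + (b): DNA binding, dbTF activity, regulation of transcription.
        # Assumption: this branch takes precedence if (c) also holds.
        return ["GO:0000987", "GO:0003700", "GO:0006357"]
    if is_known_cotf:
        # (c) + (b): coregulator activity, regulation of transcription.
        return ["GO:0003712", "GO:0006357"]
    return []


print(chip_seq_supported_terms(True, True, False))
print(chip_seq_supported_terms(False, True, True))
print(chip_seq_supported_terms(True, False, False))
```

In every branch, evidence of transcriptional regulation is required before any term is returned, which mirrors the statement that ChIP-seq experiments on their own are inconclusive.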
|
985
|
+
|
986
|
+
Bacterial 1-hybrid experiments
|
987
|
+
|
988
|
+
The bacterial one-hybrid (B1H) system is a method for identifying the sequence-specific target
|
989
|
+
site of a DNA-binding domain. In this system, a given transcription factor (TF) is expressed as a
|
990
|
+
fusion to a subunit of RNA polymerase. In parallel, a library of randomized oligonucleotides
|
991
|
+
representing potential TF target sequences is cloned into a separate vector containing the
|
992
|
+
selectable genes HIS3 and URA3. If the DNA-binding domain (bait) binds a potential DNA target
|
993
|
+
site (prey) in vivo, it will recruit RNA polymerase to the promoter and activate transcription of the
|
994
|
+
reporter genes in that clone (https://en.wikipedia.org/wiki/Bacterial_one-hybrid_system;
|
995
|
+
PMC1435991).
|
996
|
+
|
997
|
+
|
998
|
+
|
999
|
+
|
1000
|
+
|