aurelian 0.3.2__py3-none-any.whl

Files changed (254)
  1. aurelian/__init__.py +9 -0
  2. aurelian/agents/__init__.py +0 -0
  3. aurelian/agents/amigo/__init__.py +3 -0
  4. aurelian/agents/amigo/amigo_agent.py +77 -0
  5. aurelian/agents/amigo/amigo_config.py +85 -0
  6. aurelian/agents/amigo/amigo_evals.py +73 -0
  7. aurelian/agents/amigo/amigo_gradio.py +52 -0
  8. aurelian/agents/amigo/amigo_mcp.py +152 -0
  9. aurelian/agents/amigo/amigo_tools.py +152 -0
  10. aurelian/agents/biblio/__init__.py +42 -0
  11. aurelian/agents/biblio/biblio_agent.py +94 -0
  12. aurelian/agents/biblio/biblio_config.py +40 -0
  13. aurelian/agents/biblio/biblio_gradio.py +67 -0
  14. aurelian/agents/biblio/biblio_mcp.py +115 -0
  15. aurelian/agents/biblio/biblio_tools.py +164 -0
  16. aurelian/agents/biblio_agent.py +46 -0
  17. aurelian/agents/checklist/__init__.py +44 -0
  18. aurelian/agents/checklist/checklist_agent.py +85 -0
  19. aurelian/agents/checklist/checklist_config.py +28 -0
  20. aurelian/agents/checklist/checklist_gradio.py +70 -0
  21. aurelian/agents/checklist/checklist_mcp.py +86 -0
  22. aurelian/agents/checklist/checklist_tools.py +141 -0
  23. aurelian/agents/checklist/content/checklists.yaml +7 -0
  24. aurelian/agents/checklist/content/streams.csv +136 -0
  25. aurelian/agents/checklist_agent.py +40 -0
  26. aurelian/agents/chemistry/__init__.py +3 -0
  27. aurelian/agents/chemistry/chemistry_agent.py +46 -0
  28. aurelian/agents/chemistry/chemistry_config.py +71 -0
  29. aurelian/agents/chemistry/chemistry_evals.py +79 -0
  30. aurelian/agents/chemistry/chemistry_gradio.py +50 -0
  31. aurelian/agents/chemistry/chemistry_mcp.py +120 -0
  32. aurelian/agents/chemistry/chemistry_tools.py +121 -0
  33. aurelian/agents/chemistry/image_agent.py +15 -0
  34. aurelian/agents/d4d/__init__.py +30 -0
  35. aurelian/agents/d4d/d4d_agent.py +72 -0
  36. aurelian/agents/d4d/d4d_config.py +46 -0
  37. aurelian/agents/d4d/d4d_gradio.py +58 -0
  38. aurelian/agents/d4d/d4d_mcp.py +71 -0
  39. aurelian/agents/d4d/d4d_tools.py +157 -0
  40. aurelian/agents/d4d_agent.py +64 -0
  41. aurelian/agents/diagnosis/__init__.py +33 -0
  42. aurelian/agents/diagnosis/diagnosis_agent.py +53 -0
  43. aurelian/agents/diagnosis/diagnosis_config.py +48 -0
  44. aurelian/agents/diagnosis/diagnosis_evals.py +76 -0
  45. aurelian/agents/diagnosis/diagnosis_gradio.py +52 -0
  46. aurelian/agents/diagnosis/diagnosis_mcp.py +141 -0
  47. aurelian/agents/diagnosis/diagnosis_tools.py +204 -0
  48. aurelian/agents/diagnosis_agent.py +28 -0
  49. aurelian/agents/draw/__init__.py +3 -0
  50. aurelian/agents/draw/draw_agent.py +39 -0
  51. aurelian/agents/draw/draw_config.py +26 -0
  52. aurelian/agents/draw/draw_gradio.py +50 -0
  53. aurelian/agents/draw/draw_mcp.py +94 -0
  54. aurelian/agents/draw/draw_tools.py +100 -0
  55. aurelian/agents/draw/judge_agent.py +18 -0
  56. aurelian/agents/filesystem/__init__.py +0 -0
  57. aurelian/agents/filesystem/filesystem_config.py +27 -0
  58. aurelian/agents/filesystem/filesystem_gradio.py +49 -0
  59. aurelian/agents/filesystem/filesystem_mcp.py +89 -0
  60. aurelian/agents/filesystem/filesystem_tools.py +95 -0
  61. aurelian/agents/filesystem/py.typed +0 -0
  62. aurelian/agents/github/__init__.py +0 -0
  63. aurelian/agents/github/github_agent.py +83 -0
  64. aurelian/agents/github/github_cli.py +248 -0
  65. aurelian/agents/github/github_config.py +22 -0
  66. aurelian/agents/github/github_gradio.py +152 -0
  67. aurelian/agents/github/github_mcp.py +252 -0
  68. aurelian/agents/github/github_tools.py +408 -0
  69. aurelian/agents/github/github_tools.py.tmp +413 -0
  70. aurelian/agents/goann/__init__.py +13 -0
  71. aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines.md +1000 -0
  72. aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines.pdf +0 -0
  73. aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines_Paper.md +693 -0
  74. aurelian/agents/goann/documents/Transcription_Factors_Annotation_Guidelines_Paper.pdf +0 -0
  75. aurelian/agents/goann/goann_agent.py +90 -0
  76. aurelian/agents/goann/goann_config.py +90 -0
  77. aurelian/agents/goann/goann_evals.py +104 -0
  78. aurelian/agents/goann/goann_gradio.py +62 -0
  79. aurelian/agents/goann/goann_mcp.py +0 -0
  80. aurelian/agents/goann/goann_tools.py +65 -0
  81. aurelian/agents/gocam/__init__.py +43 -0
  82. aurelian/agents/gocam/documents/DNA-binding transcription factor activity annotation guidelines.docx +0 -0
  83. aurelian/agents/gocam/documents/DNA-binding transcription factor activity annotation guidelines.pdf +0 -0
  84. aurelian/agents/gocam/documents/DNA-binding_transcription_factor_activity_annotation_guidelines.md +100 -0
  85. aurelian/agents/gocam/documents/E3 ubiquitin ligases.docx +0 -0
  86. aurelian/agents/gocam/documents/E3 ubiquitin ligases.pdf +0 -0
  87. aurelian/agents/gocam/documents/E3_ubiquitin_ligases.md +134 -0
  88. aurelian/agents/gocam/documents/GO-CAM annotation guidelines README.docx +0 -0
  89. aurelian/agents/gocam/documents/GO-CAM annotation guidelines README.pdf +0 -0
  90. aurelian/agents/gocam/documents/GO-CAM modelling guidelines TO DO.docx +0 -0
  91. aurelian/agents/gocam/documents/GO-CAM modelling guidelines TO DO.pdf +0 -0
  92. aurelian/agents/gocam/documents/GO-CAM_annotation_guidelines_README.md +1 -0
  93. aurelian/agents/gocam/documents/GO-CAM_modelling_guidelines_TO_DO.md +3 -0
  94. aurelian/agents/gocam/documents/How to annotate complexes in GO-CAM.docx +0 -0
  95. aurelian/agents/gocam/documents/How to annotate complexes in GO-CAM.pdf +0 -0
  96. aurelian/agents/gocam/documents/How to annotate molecular adaptors.docx +0 -0
  97. aurelian/agents/gocam/documents/How to annotate molecular adaptors.pdf +0 -0
  98. aurelian/agents/gocam/documents/How to annotate sequestering proteins.docx +0 -0
  99. aurelian/agents/gocam/documents/How to annotate sequestering proteins.pdf +0 -0
  100. aurelian/agents/gocam/documents/How_to_annotate_complexes_in_GO-CAM.md +29 -0
  101. aurelian/agents/gocam/documents/How_to_annotate_molecular_adaptors.md +31 -0
  102. aurelian/agents/gocam/documents/How_to_annotate_sequestering_proteins.md +42 -0
  103. aurelian/agents/gocam/documents/Molecular adaptor activity.docx +0 -0
  104. aurelian/agents/gocam/documents/Molecular adaptor activity.pdf +0 -0
  105. aurelian/agents/gocam/documents/Molecular carrier activity.docx +0 -0
  106. aurelian/agents/gocam/documents/Molecular carrier activity.pdf +0 -0
  107. aurelian/agents/gocam/documents/Molecular_adaptor_activity.md +51 -0
  108. aurelian/agents/gocam/documents/Molecular_carrier_activity.md +41 -0
  109. aurelian/agents/gocam/documents/Protein sequestering activity.docx +0 -0
  110. aurelian/agents/gocam/documents/Protein sequestering activity.pdf +0 -0
  111. aurelian/agents/gocam/documents/Protein_sequestering_activity.md +50 -0
  112. aurelian/agents/gocam/documents/Signaling receptor activity annotation guidelines.docx +0 -0
  113. aurelian/agents/gocam/documents/Signaling receptor activity annotation guidelines.pdf +0 -0
  114. aurelian/agents/gocam/documents/Signaling_receptor_activity_annotation_guidelines.md +187 -0
  115. aurelian/agents/gocam/documents/Transcription coregulator activity.docx +0 -0
  116. aurelian/agents/gocam/documents/Transcription coregulator activity.pdf +0 -0
  117. aurelian/agents/gocam/documents/Transcription_coregulator_activity.md +36 -0
  118. aurelian/agents/gocam/documents/Transporter activity annotation annotation guidelines.docx +0 -0
  119. aurelian/agents/gocam/documents/Transporter activity annotation annotation guidelines.pdf +0 -0
  120. aurelian/agents/gocam/documents/Transporter_activity_annotation_annotation_guidelines.md +43 -0
  121. Regulatory Processes in GO-CAM.docx +0 -0
  122. Regulatory Processes in GO-CAM.pdf +0 -0
  123. aurelian/agents/gocam/documents/WIP_-_Regulation_and_Regulatory_Processes_in_GO-CAM.md +31 -0
  124. aurelian/agents/gocam/documents/md/DNA-binding_transcription_factor_activity_annotation_guidelines.md +131 -0
  125. aurelian/agents/gocam/documents/md/E3_ubiquitin_ligases.md +166 -0
  126. aurelian/agents/gocam/documents/md/GO-CAM_annotation_guidelines_README.md +1 -0
  127. aurelian/agents/gocam/documents/md/GO-CAM_modelling_guidelines_TO_DO.md +5 -0
  128. aurelian/agents/gocam/documents/md/How_to_annotate_complexes_in_GO-CAM.md +28 -0
  129. aurelian/agents/gocam/documents/md/How_to_annotate_molecular_adaptors.md +19 -0
  130. aurelian/agents/gocam/documents/md/How_to_annotate_sequestering_proteins.md +38 -0
  131. aurelian/agents/gocam/documents/md/Molecular_adaptor_activity.md +52 -0
  132. aurelian/agents/gocam/documents/md/Molecular_carrier_activity.md +59 -0
  133. aurelian/agents/gocam/documents/md/Protein_sequestering_activity.md +52 -0
  134. aurelian/agents/gocam/documents/md/Signaling_receptor_activity_annotation_guidelines.md +271 -0
  135. aurelian/agents/gocam/documents/md/Transcription_coregulator_activity.md +54 -0
  136. aurelian/agents/gocam/documents/md/Transporter_activity_annotation_annotation_guidelines.md +38 -0
  137. aurelian/agents/gocam/documents/md/WIP_-_Regulation_and_Regulatory_Processes_in_GO-CAM.md +39 -0
  138. aurelian/agents/gocam/documents/pandoc_md/Signaling_receptor_activity_annotation_guidelines.md +334 -0
  139. aurelian/agents/gocam/gocam_agent.py +240 -0
  140. aurelian/agents/gocam/gocam_config.py +85 -0
  141. aurelian/agents/gocam/gocam_curator_agent.py +46 -0
  142. aurelian/agents/gocam/gocam_evals.py +67 -0
  143. aurelian/agents/gocam/gocam_gradio.py +89 -0
  144. aurelian/agents/gocam/gocam_mcp.py +224 -0
  145. aurelian/agents/gocam/gocam_tools.py +294 -0
  146. aurelian/agents/linkml/__init__.py +0 -0
  147. aurelian/agents/linkml/linkml_agent.py +62 -0
  148. aurelian/agents/linkml/linkml_config.py +48 -0
  149. aurelian/agents/linkml/linkml_evals.py +66 -0
  150. aurelian/agents/linkml/linkml_gradio.py +45 -0
  151. aurelian/agents/linkml/linkml_mcp.py +186 -0
  152. aurelian/agents/linkml/linkml_tools.py +102 -0
  153. aurelian/agents/literature/__init__.py +3 -0
  154. aurelian/agents/literature/literature_agent.py +55 -0
  155. aurelian/agents/literature/literature_config.py +35 -0
  156. aurelian/agents/literature/literature_gradio.py +52 -0
  157. aurelian/agents/literature/literature_mcp.py +174 -0
  158. aurelian/agents/literature/literature_tools.py +182 -0
  159. aurelian/agents/monarch/__init__.py +25 -0
  160. aurelian/agents/monarch/monarch_agent.py +44 -0
  161. aurelian/agents/monarch/monarch_config.py +45 -0
  162. aurelian/agents/monarch/monarch_gradio.py +51 -0
  163. aurelian/agents/monarch/monarch_mcp.py +65 -0
  164. aurelian/agents/monarch/monarch_tools.py +113 -0
  165. aurelian/agents/oak/__init__.py +0 -0
  166. aurelian/agents/oak/oak_config.py +27 -0
  167. aurelian/agents/oak/oak_gradio.py +57 -0
  168. aurelian/agents/ontology_mapper/__init__.py +31 -0
  169. aurelian/agents/ontology_mapper/ontology_mapper_agent.py +56 -0
  170. aurelian/agents/ontology_mapper/ontology_mapper_config.py +50 -0
  171. aurelian/agents/ontology_mapper/ontology_mapper_evals.py +108 -0
  172. aurelian/agents/ontology_mapper/ontology_mapper_gradio.py +58 -0
  173. aurelian/agents/ontology_mapper/ontology_mapper_mcp.py +81 -0
  174. aurelian/agents/ontology_mapper/ontology_mapper_tools.py +147 -0
  175. aurelian/agents/phenopackets/__init__.py +3 -0
  176. aurelian/agents/phenopackets/phenopackets_agent.py +58 -0
  177. aurelian/agents/phenopackets/phenopackets_config.py +72 -0
  178. aurelian/agents/phenopackets/phenopackets_evals.py +99 -0
  179. aurelian/agents/phenopackets/phenopackets_gradio.py +55 -0
  180. aurelian/agents/phenopackets/phenopackets_mcp.py +178 -0
  181. aurelian/agents/phenopackets/phenopackets_tools.py +127 -0
  182. aurelian/agents/rag/__init__.py +40 -0
  183. aurelian/agents/rag/rag_agent.py +83 -0
  184. aurelian/agents/rag/rag_config.py +80 -0
  185. aurelian/agents/rag/rag_gradio.py +67 -0
  186. aurelian/agents/rag/rag_mcp.py +107 -0
  187. aurelian/agents/rag/rag_tools.py +189 -0
  188. aurelian/agents/rag_agent.py +54 -0
  189. aurelian/agents/robot/__init__.py +0 -0
  190. aurelian/agents/robot/assets/__init__.py +3 -0
  191. aurelian/agents/robot/assets/template.md +384 -0
  192. aurelian/agents/robot/robot_config.py +25 -0
  193. aurelian/agents/robot/robot_gradio.py +46 -0
  194. aurelian/agents/robot/robot_mcp.py +100 -0
  195. aurelian/agents/robot/robot_ontology_agent.py +139 -0
  196. aurelian/agents/robot/robot_tools.py +50 -0
  197. aurelian/agents/talisman/__init__.py +3 -0
  198. aurelian/agents/talisman/talisman_agent.py +126 -0
  199. aurelian/agents/talisman/talisman_config.py +66 -0
  200. aurelian/agents/talisman/talisman_gradio.py +50 -0
  201. aurelian/agents/talisman/talisman_mcp.py +168 -0
  202. aurelian/agents/talisman/talisman_tools.py +720 -0
  203. aurelian/agents/ubergraph/__init__.py +40 -0
  204. aurelian/agents/ubergraph/ubergraph_agent.py +71 -0
  205. aurelian/agents/ubergraph/ubergraph_config.py +79 -0
  206. aurelian/agents/ubergraph/ubergraph_gradio.py +48 -0
  207. aurelian/agents/ubergraph/ubergraph_mcp.py +69 -0
  208. aurelian/agents/ubergraph/ubergraph_tools.py +118 -0
  209. aurelian/agents/uniprot/__init__.py +37 -0
  210. aurelian/agents/uniprot/uniprot_agent.py +43 -0
  211. aurelian/agents/uniprot/uniprot_config.py +43 -0
  212. aurelian/agents/uniprot/uniprot_evals.py +99 -0
  213. aurelian/agents/uniprot/uniprot_gradio.py +48 -0
  214. aurelian/agents/uniprot/uniprot_mcp.py +168 -0
  215. aurelian/agents/uniprot/uniprot_tools.py +136 -0
  216. aurelian/agents/web/__init__.py +0 -0
  217. aurelian/agents/web/web_config.py +27 -0
  218. aurelian/agents/web/web_gradio.py +48 -0
  219. aurelian/agents/web/web_mcp.py +50 -0
  220. aurelian/agents/web/web_tools.py +108 -0
  221. aurelian/chat.py +23 -0
  222. aurelian/cli.py +800 -0
  223. aurelian/dependencies/__init__.py +0 -0
  224. aurelian/dependencies/workdir.py +78 -0
  225. aurelian/mcp/__init__.py +0 -0
  226. aurelian/mcp/amigo_mcp_test.py +86 -0
  227. aurelian/mcp/config_generator.py +123 -0
  228. aurelian/mcp/example_config.json +43 -0
  229. aurelian/mcp/generate_sample_config.py +37 -0
  230. aurelian/mcp/gocam_mcp_test.py +126 -0
  231. aurelian/mcp/linkml_mcp_tools.py +190 -0
  232. aurelian/mcp/mcp_discovery.py +87 -0
  233. aurelian/mcp/mcp_test.py +31 -0
  234. aurelian/mcp/phenopackets_mcp_test.py +103 -0
  235. aurelian/tools/__init__.py +0 -0
  236. aurelian/tools/web/__init__.py +0 -0
  237. aurelian/tools/web/url_download.py +51 -0
  238. aurelian/utils/__init__.py +0 -0
  239. aurelian/utils/async_utils.py +15 -0
  240. aurelian/utils/data_utils.py +32 -0
  241. aurelian/utils/documentation_manager.py +59 -0
  242. aurelian/utils/doi_fetcher.py +238 -0
  243. aurelian/utils/ontology_utils.py +68 -0
  244. aurelian/utils/pdf_fetcher.py +23 -0
  245. aurelian/utils/process_logs.py +100 -0
  246. aurelian/utils/pubmed_utils.py +238 -0
  247. aurelian/utils/pytest_report_to_markdown.py +67 -0
  248. aurelian/utils/robot_ontology_utils.py +112 -0
  249. aurelian/utils/search_utils.py +95 -0
  250. aurelian-0.3.2.dist-info/LICENSE +22 -0
  251. aurelian-0.3.2.dist-info/METADATA +105 -0
  252. aurelian-0.3.2.dist-info/RECORD +254 -0
  253. aurelian-0.3.2.dist-info/WHEEL +4 -0
  254. aurelian-0.3.2.dist-info/entry_points.txt +3 -0
@@ -0,0 +1,1000 @@
1
+ Gene Ontology guidelines for transcription factor
2
+ annotation
3
+
4
+ Authors: Pascale Gaudet, Colin Logie, Ruth Lovering
5
+ Date last updated: 2023-10-24
6
+
7
+ GO has refactored the MF branch representing the activities of proteins involved in transcription.
8
+ In addition to RNA polymerase, we defined three different types of activities involved in
9
+ transcription and its regulation:
10
+
11
+ I. GTFs: General transcription factors, the molecular machine that assembles
12
+ with the RNA polymerase at the promoter to form the pre-initiation complex (PIC).
13
+ II. dbTFs: Specific DNA binding transcription factors that provide genomic
14
+ addresses and specify the cell types and the conditions under which specific
15
+ genes are expressed. Central to dbTF function is their binding to specific DNA
16
+ sequences (often named transcription factor binding sites (TFBS)), and
17
+ III. coTFs: Transcription coregulators (also known as transcription cofactors)
18
+ serve multiple functions, such as bridging the GTF and the dbTFs, specifying the
19
+ regulatory effect of dbTFs, modifying the chromatin structure to render it more or
20
+ less accessible for transcription. coTFs normally exert their function independent
21
+ of high affinity binding to specific DNA sequences.
22
+
23
+ The present guidelines aim to help curators apply the revised transcription terms. Please use
24
+ more specific child terms whenever possible.
25
+
26
+ GTF annotations should include, depending on the evidence available:
27
+
28
+ ● MF
29
+
30
+ ○ GO:0140223 general transcription initiation factor activity
31
+
32
+ ● BP
33
+
34
+ ○ GO:0006351 transcription, DNA-templated
35
+
36
+ ● CC
37
+
38
+ ○ is active in GO:0000785 chromatin
39
+
40
+
41
+ ○ part of GO:0097550 transcriptional preinitiation complex
42
+
43
+ dbTF annotations may include:
44
+
45
+ ● MF
46
+
47
+ ○ GO:0003700 DNA-binding transcription factor activity
48
+
50
+
51
+ ■ GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific
54
+
55
+ ■ GO:0001227 DNA-binding transcription repressor activity, RNA polymerase II-specific
58
+
59
+ ○ GO:0000987 cis-regulatory region sequence-specific DNA binding
60
+
61
+ ■ GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
62
+
63
+ ● BP
66
+
67
+ ○ GO:0006355 regulation of transcription, DNA-templated
68
+
69
+ ■ GO:0045893 positive regulation of transcription, DNA-templated, and children
70
+ ■ GO:0045892 negative regulation of transcription, DNA-templated, and children
71
+
72
+ ● CC
73
+
74
+ ○ is active in GO:0000785 chromatin
75
+
76
+
77
+ ○ part of GO:0005667 transcription factor complex
78
+
79
+ coTF annotations may include:
80
+
81
+ Note that coTFs perform a wide range of functions. Common functions are listed below;
82
+ however, this list is not exhaustive:
83
+
84
+ ● MF
85
+
86
+ ○ GO:0003712 transcription coregulator activity
87
+
88
+ ■ GO:0003713 transcription coactivator activity
89
+ ■ GO:0003714 transcription corepressor activity
90
+
91
+ ○ GO:0140097 catalytic activity, acting on DNA
92
+
93
+ ■ GO:0009008 DNA-methyltransferase activity
94
+ ■ etc
95
+
96
+ ○ GO:0140096 catalytic activity, acting on a protein
97
+ ■ GO:0033558 protein deacetylase activity
98
+
99
+ ● GO:0004407 histone deacetylase activity
100
+
101
+ ○ GO:0030234 enzyme regulator activity
102
+
103
+ ■ GO:0035034 histone acetyltransferase regulator activity
104
+ ■ GO:0035033 histone deacetylase regulator activity
105
+
106
+ ○ GO:0140297 DNA-binding transcription factor binding
107
+
108
+ ● BP
109
+
110
+ ○ GO:0006355 regulation of transcription, DNA-templated
111
+ ○ GO:0031507 heterochromatin assembly
112
+
113
+
114
+
115
+
116
+
117
+
118
+ GO:0140719 constitutive heterochromatin assembly
119
+ GO:0140718 facultative heterochromatin assembly
120
+ GO:0071514 genomic imprinting
121
+ GO:0033696 heterochromatin boundary formation
122
+
123
+ ○ GO:0016570 histone modification
124
+
126
+
127
+ ■ GO:0031056 regulation of histone modification
128
+
129
+ ○ GO:0006304 DNA modification
130
+
131
+ ■ GO:0006306 DNA methylation
132
+
133
+ ● GO:0044030 regulation of DNA methylation
134
+
135
+ ● CC
136
+
137
+ ○ is active in GO:0000785 chromatin
138
+
139
+
140
+ ○ part of GO:0005667 transcription factor complex
141
+
142
+ Other transcription regulator activities
143
+
144
+ There are also proteins that inhibit dbTFs, for example by sequestering them in the cytoplasm or
145
+ nucleus (‘dbTF-inhibitors’). The difference between coTFs and dbTF regulators is that the latter
146
+ do not act at the genomic location of the regulated target gene, whereas coTFs are associated with
147
+ the transcriptional regulatory complex. Therefore, the input of the coTF is the dbTF, not the
148
+ target gene.
149
+ ● MF
150
+
151
+
152
+
153
+ ○ GO:0140416 DNA-binding transcription factor inhibitor activity
154
+
155
+ ● BP:
156
+
157
+ looking for example for positive regulators
158
+
159
+ ○ GO:0006355 regulation of transcription, DNA-templated
160
+
161
+ ● CC
162
+
163
+ ○ as appropriate
164
+
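As a reading aid, the root Molecular Function terms given above for each class of regulator can be collected in a small lookup table. This is a minimal Python sketch and not code from the aurelian package: the names `ROOT_MF_TERM` and `root_mf_term` are hypothetical, and only the GO identifiers and labels are taken from the guidelines.

```python
# Root Molecular Function (MF) term for each class of transcription regulator,
# as listed in the guidelines. The dictionary and helper names are illustrative.
ROOT_MF_TERM = {
    "GTF": ("GO:0140223", "general transcription initiation factor activity"),
    "dbTF": ("GO:0003700", "DNA-binding transcription factor activity"),
    "coTF": ("GO:0003712", "transcription coregulator activity"),
    "dbTF-inhibitor": ("GO:0140416", "DNA-binding transcription factor inhibitor activity"),
}


def root_mf_term(regulator_class: str) -> str:
    """Return 'GO:ID label' for a regulator class; raises KeyError if unknown."""
    go_id, label = ROOT_MF_TERM[regulator_class]
    return f"{go_id} {label}"


print(root_mf_term("dbTF"))
```

The table only records the starting point for each class. The guidelines ask curators to use more specific child terms whenever the evidence allows.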
165
+ Annotation extensions
166
+
167
+ - MF and BP: target gene of a dbTF or a coTF: has_input [dbTF]
168
+ - MF localization: is_active_in [GO:cellular component, cell, tissue, …]
169
+ - BP localization: occurs_in [GO:cellular component, cell, tissue, …]
170
+
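The extension rules above can be illustrated with a small data structure. The sketch below is an assumption-laden example rather than the package's implementation: the `Annotation` class, the `ALLOWED_RELATIONS` table, and the gene names `GENE_A` and `GENE_B` are hypothetical, and the relation names are written with underscores (`has_input`, `is_active_in`, `occurs_in`).

```python
from dataclasses import dataclass, field

# Relations permitted per GO aspect, following the list of annotation extensions.
# This table is an illustrative reading of that list, not an official schema.
ALLOWED_RELATIONS = {
    "MF": {"has_input", "is_active_in"},
    "BP": {"has_input", "occurs_in"},
}


@dataclass
class Annotation:
    """Hypothetical in-memory record for one GO annotation."""

    gene: str
    aspect: str  # "MF" or "BP"
    go_term: str
    extensions: list = field(default_factory=list)  # (relation, target) pairs

    def add_extension(self, relation: str, target: str) -> None:
        """Attach an extension, rejecting relations not listed for this aspect."""
        if relation not in ALLOWED_RELATIONS.get(self.aspect, set()):
            raise ValueError(f"{relation!r} is not listed for {self.aspect} annotations")
        self.extensions.append((relation, target))


# Placeholder gene names; GO:0003700 and GO:0000785 come from the guidelines.
ann = Annotation(gene="GENE_A", aspect="MF", go_term="GO:0003700")
ann.add_extension("has_input", "GENE_B")
ann.add_extension("is_active_in", "GO:0000785")
print(ann.extensions)
```

Checking the relation against the aspect at insertion time catches a common slip, such as attaching `occurs_in` to an MF annotation, before the record is stored.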
171
+ Annotating a transcription regulator from experimental data
172
+
173
+ The following annotation approach is the strategy that we currently recommend. It ensures
174
+ that curators use all available information and do not restrict themselves to annotating papers
175
+ individually, out of their more general context. Note that the information does not need to be
176
+ extracted from a single paper; reviewing a wide range of papers is recommended to ensure
177
+ annotations are as accurate as possible, so that annotations are based on multiple observations
178
+ from different articles and independent research groups.
179
+
180
+ The following four questions provide a checklist to assess whether a gene can be annotated as
181
+ a transcriptional regulator:
182
+
184
+
185
+ 1. What is the starting hypothesis: are the authors characterizing a transcription regulator?
186
+ Scientific models are built by adding new data to the existing corpus of
188
+ evidence. New data can either support or contradict existing models. The introduction
189
+ section of research articles can be used to understand what prior knowledge the article
190
+ builds on, and what aspect of the existing model or what new model the authors are
191
+ assessing. The intent of the authors is essential to understand what GO term should be
192
+ chosen, with the caveat that inconsistent terminology has been used in transcription
193
+ research articles and therefore may not always be consistent with the GO term labels.
194
+ Curators must look carefully at the GO term definitions and the placement of the term in
195
+ the ontology to ensure that the meaning of the GO term corresponds to the concept
196
+ being described in the article.
197
+
198
+ 2. Does knowledge from specific protein domains or characterized orthologs support the hypothesis?
201
+ The presence of specific domains and the existence of well characterized orthologs can
202
+ provide useful support for interpreting experimental data. Note that domain information
203
+ and sequence homology data should be used very carefully: not all domains have a
204
+ single function; and only closely related orthologs whose function has been
205
+ unambiguously characterized can be used to support the association of a gene with a
206
+ GO term, if those are consistent with the experimental data presented in the article.
207
+
208
+ a. GTFs: GTFs have been characterized in several organisms, from bacteria, to
209
+ yeast, to mammalian cells (PMID:25693126), and therefore orthology should
210
+ provide strong support for the decision to associate these proteins with a child
211
+ specific for RNA polymerase I, II or III of the MF term "GO:0140223 general
212
+ transcription initiation factor activity". In addition, the naming of GTFs is well
213
+ established across human and model organism nomenclature groups and can be
214
+ used to help guide these decisions. Thus, for human GTFs the HUGO Gene
215
+ Nomenclature Committee (HGNC, www.genenames.org) provides the gene
216
+ symbols TAF#, for TATA-box binding protein associated factors, and GTF2#s and
217
+ GTF3#s, for general transcription factor II and III subunits respectively, although
218
+ a few GTFs have more specific names such as BTAF1: B-TFIID TATA-box
219
+ binding protein associated factor 1.
220
+
221
+ b. dbTFs: Gene products associated with the GO term "GO:0003700 DNA-binding
222
+ transcription factor activity" should have experimental evidence to confirm
223
+ their ability to bind DNA and that this binding regulates the expression of a
224
+ limited set of target genes. In these cases the direct target gene(s) can also be
225
+ included in the annotation using the "has_input" relation. For proteins that belong to
226
+ families of well characterized transcription factors, such as those that contain
227
+ GATA and homeobox domains, and for proteins with a one-to-one ortholog already
228
+ demonstrated to be a dbTF, weaker evidence of DNA binding, such as ChIP
229
+ experiments, is sufficient. For proteins with domains known to be associated with
233
+ functions other than DNA binding (such as zinc fingers) or proteins with
234
+ enzymatic activity, strong evidence of DNA binding is required.
235
+
236
+ In addition, in some cases, neither the protein nor a member of the protein’s
237
+ family will have been previously associated with the dbTF activity term. In these
238
+ cases, clear experimental evidence of sequence-specific DNA binding and gene
239
+ transcription regulation via cognate DNA motifs located in gene-associated
240
+ cis-regulatory modules will be required for the protein to be classified as a dbTF.
241
+
242
+ c. coTFs: A coTF is defined as a protein that interacts specifically with a dbTF or a
243
+ coTF at a cis-regulatory region (GO:0003712). This interaction either activates or
244
+ represses the transcription of specific genes, often acting by altering chromatin
245
+ structure and modifications. There are many roles that coregulators can play: for
246
+ example, one class of transcription coregulators modifies chromatin structure
247
+ through covalent modification of histones. A second class of coregulators
248
+ modifies the conformation of chromatin in an ATP-dependent manner. A third
249
+ class modulates interactions of dbTFs with other coTFs.
250
+
251
+ Many coTFs have enzymatic activity and do not bind DNA. For coTFs that do
252
+ bind DNA, many recognize very short, common DNA sequences, not sufficiently
253
+ unique to enable the coTF to regulate the expression of a limited set of genes in
254
+ a discrete environmental or developmental stage (for example AT-hook coTFs).
255
+
256
+ The distinction between a dbTF and a coTF can be difficult to make, so a more
257
+ exhaustive review of the literature, including looking at the characterized
258
+ orthologs, is highly recommended before annotating a coTF.
259
+
260
+ 3. Are there other GO annotations or published experimental results consistent with the hypothesis?
263
+ Keeping in mind the gene-by-gene/pathway-by-pathway annotation approach, it can be
264
+ valuable to take into account results from other research articles, to make sure results
265
+ and annotations are consistent. Curators should avoid creating annotations that are
266
+ inconsistent with existing annotations, by either choosing a different term for annotation,
267
+ or by reviewing and eventually disputing annotations that appear to be incorrect (see
268
+ next section).
269
+
270
+ 4. Are the experimental results consistent with the hypothesis?
271
+
272
+ The curator should carefully look at the results presented in the paper and, if those are
273
+ consistent with the hypothesis that this is a transcription regulator, then appropriate GO
274
+ annotations can be made.
275
+
276
+ Note that DNA-binding transcription factors as well as coregulators often act as activators or
277
+ repressors at different promoters or depending on the context, so annotation to both activator
281
+ and repressor is not considered inconsistent. This may be further resolved through additional
282
+ context details, e.g. cell type, environmental conditions, etc.
283
+
284
+ Reviewing existing annotations
285
+
286
+ If time allows, it is very useful for the research community and for future annotation that other
287
+ annotations associated with the gene be reviewed. If there are conflicting annotations, then the
288
+ supporting data should be reviewed to see whether the annotations are inconsistent with the
289
+ data, in which case annotations should be fixed.
290
+
291
+ In cases where the primary data is conflicting across different papers (for example a protein is
292
+ sometimes described as a transcription factor, and sometimes as a coregulator), then the
293
+ literature should be reviewed carefully to decide whether:
294
+
295
+ i. the annotation is incorrect (bad choice of term, wrong protein annotated)
296
+ ii. knowledge has evolved. If necessary, some (usually older) papers should
297
+ be marked as ‘do not curate’ and the associated annotations should be
298
+ removed.
299
+ iii. the protein plays multiple roles under different conditions (i.e., it
300
+ acts as a DNA-binding transcription factor in certain contexts and as a
301
+ cofactor in others), as these two molecular functions are not mutually
302
+ exclusive.
303
+ iv. no clear activity has been established yet, in which case either
304
+ annotations to both DNA-binding transcription factor and cofactor may be
305
+ made (if the evidence supports it) or, if the data is not sufficient, no
306
+ annotation should be made.
314
+
315
+ GO-CAM representation of transcription
316
+
317
+ GO-CAM Modeling Guidelines: DNA binding transcription factor activity
318
+
319
+ Potential pitfalls
320
+
321
+ The annotation of dbTFs can be challenging for multiple reasons:
322
+
323
+ ● Some activities are hard to distinguish from each other, and adding to this difficulty,
324
+ transcription regulators form complexes that are more or less fluid, so that it can be
325
+ difficult to detect which protein in a complex is responsible for a specific activity.
326
+ It can be difficult to distinguish certain activities, and some proteins do have multiple
327
+ functions.
328
+
333
+ ● Researchers use "transcription factor" loosely. It can mean a GTF, a dbTF, or a coTF.
334
+ ● The terms "cofactor", "coactivator", and "corepressor" are also used for different activities:
335
+ either as described in this document or, sometimes (especially in older
336
+ papers), to describe a dbTF that acts as a dimer.
338
+
339
+ ● The complexity of the transcription process means that no single experiment is usually
340
+ sufficient to define the function of a protein: interpretation of experimental results that
341
+ investigate dbTFs must rely on existing knowledge.
342
+
343
+ ● Many proteins presumed to function as dbTFs have never been experimentally
344
+ demonstrated to bind DNA, but their role is indirectly inferred by the presence of specific
346
+ domains and in some cases, evidence of an effect on the transcription of putative direct
347
+ target genes.
348
+
349
+ ● The presence of a DNA binding domain in a protein does not always imply the protein
350
+ functions as a dbTF.
352
+
353
+ ________________________ADDITIONAL INFORMATION_____________________
354
+
355
+ Additional information
356
+
357
+ Ontology structure
358
+
359
+ Molecular Function
360
+
361
+ MF: GO:0140110 transcription regulator activity and children
362
+
363
+ The transcription regulator activity (GO:0140110) MF branch of the GO describes the activities
364
+ of transcription regulators: GTFs, dbTFs, and coTFs (Figure 1).
365
+
367
+
368
+ Figure 1. Transcription regulator activity branch of the Gene Ontology. This part of the
369
+ Molecular Function (MF) ontology describes the activities of transcription regulators: GTFs,
370
+ dbTFs, and coTFs. Highlighted in blue are the most specific GO terms available to describe these
371
+ activities in eukaryotic cells (prokaryotes use a single polymerase).
372
+
373
+ MF: GO:0000987 cis-regulatory region sequence-specific DNA binding
374
+
375
+ The transcription regulatory region sequence-specific DNA-binding sub-tree of GO includes
376
+ terms describing specific regulatory regions, such as the core promoter (including the TATA box
377
+ and the transcription start site), cis-regulatory regions (bound by dbTFs), and specific types of
378
+ cis-regulatory motifs (such as E-box and N-box). An overview of the GO structure for DNA binding
379
+ activities is shown in Figure 2.
380
+
382
+
383
+ Figure 2. DNA binding branch of the Gene Ontology. This part of the Molecular Function
384
+ (MF) ontology describes DNA binding. The key DNA binding terms that should be associated
385
+ with dbTFs are highlighted in blue; more specific GO terms are available to provide information
386
+ about the DNA motifs bound by the dbTF.
387
+
388
+ Biological process
389
+
390
+ **NOTE THAT THIS PART OF THE ONTOLOGY IS STILL BEING REVIEWED**
391
+ There are currently two axes of classification: by product (Figure 3) and by the RNA polymerase
392
+ generating the transcript (Figure 4). Ideally both terms should be used; the product generated
393
+ by the transcription event is the most meaningful biologically. We plan to remove the RNA
394
+ polymerase-specific branch of the BP ontology. Having a single branch will ensure that
395
+ consistent and efficient annotation can be achieved.
396
+
397
+ For the 'regulation of transcription' branch ("GO:0006355 regulation of transcription,
398
+ DNA-templated", and children; Figure 5), the 'product' axis does not currently exist. We will
399
+ rename the polymerase-centric terms to the product-specific equivalent as appropriate. If there
400
+ are two polymerases responsible for the same transcript, two terms will be instantiated (for
404
+ example, GO:1905380 regulation of snRNA transcription by RNA polymerase II and the NEW term
405
+ regulation of snRNA transcription by RNA polymerase III will both remain).
406
+
407
+ Figure 3. Transcription branch of the Gene Ontology. This part of the BP ontology provides
408
+ terms to describe the role of GTFs in transcription of a specific type of transcript.
409
+
410
+ Figure 4. Transcription branch of the Gene Ontology. This part of the BP ontology provides terms to
411
+ describe the role of GTFs in transcription mediated by a specific RNA polymerase.
412
+
413
+ Figure 5. Regulation of transcription branch of the Gene Ontology. This part of the BP
414
+ ontology provides terms to describe the role of dbTFs and coTFs in regulating transcription. Highlighted
415
+ in blue are the key GO terms available to describe regulation of transcription in eukaryotic cells
416
+ (prokaryotes use a single polymerase). The more specific child terms for the highlighted terms
417
+ describe the role of transcription in other processes, e.g. "GO:1900464 negative regulation of
418
+ cellular hyperosmotic salinity response by negative regulation of transcription from RNA
422
+ polymerase II promoter". These terms are being phased out. Instead, aside from its biochemical
423
+ activity in the process of transcription, a transcription factor can be annotated to the biological
424
+ process it receives input from, as a molecular signaling pathway or biomolecular sensor
425
+ endpoint, and provides transcriptional output for through its target genes.
426
+
427
+ Cellular components: Complexes and cellular locations
428
+
429
+ **NOTE THAT THIS PART OF THE ONTOLOGY IS STILL BEING REVIEWED**
430
+
431
+ Annotation examples from research articles
432
+
433
+ Below we describe annotation workflows from a workshop held at EBI in April 2020.
434
+
435
+ This is a concept; perhaps another format is needed, and a landscape page may be better suited.
436
+
437
+ Hypothesis example/steps for a GTF, a dbTF, and a coTF
+
+ Example article
+ GTF: PMID:10924514
+ dbTF: PMID:26314965
+ coTF: PMID:22783022
+
+ Step 1: Hypothesis
+ GTF: GTF2H2 is a subunit of the TFIIH GTF.
+ dbTF: NKX6.3 is a transcription regulator.
+ coTF: SIN3A (Q96ST3) is a transcription co-repressor.
+
+ Step 2: Database mining for protein domains and orthologs
+ GTF: GTF2H2 contains a TFIIH subunit Ssl1/p44 domain (IPR012170), described in InterPro as
+   a component of the transcription factor TFIIH core. UniProt describes GTF2H2 as a
+   "Component of the TFIID-containing RNA polymerase II pre-initiation complex"
+   (https://www.uniprot.org/uniprot/Q13888#function).
+ dbTF: NKX6.3 contains a DNA binding homeobox domain (IPR020479). UniProt describes
+   NKX6.3 as a "Putative transcription factor, which may be involved in patterning of central
+   nervous system and pancreas." (https://www.uniprot.org/uniprot/A6NJ46#function).
+ coTF: SIN3A contains a SIN3A domain (IPR037969), among others. IPR037969 is associated
+   with the GO Molecular Function GO:0003714 transcription corepressor activity
+   (https://www.ebi.ac.uk/interpro/entry/InterPro/IPR037969/)
+   UniProt describes SIN3A as "Acts as a transcriptional repressor. Corepressor for REST."
+   (https://www.uniprot.org/uniprot/Q96ST3#function).
+
+ Step 3: GO annotation and literature mining
+ GTF: Existing annotations were consistent with the hypothesis that the gene encodes a GTF:
+   there are IDA annotations to GO:0016251 RNA polymerase II general transcription initiation
+   factor activity and GO:0008353 RNA polymerase II CTD heptapeptide repeat kinase activity,
+   both functions of GTFs subunits.
+ dbTF: Existing annotations were consistent with the hypothesis that the gene encodes a dbTF.
+   - GO:0003700 (and children) DNA-binding transcription factor activity: IDA, IBA, ISA, ISM
+   - GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding: IBA
+   - GO:0043565 sequence-specific DNA binding: IDA, IBA
+ coTF: Existing annotations were conflicting. After review of the curated articles, several
+   annotations were removed, because many of the activities were those of proteins SIN3A
+   interacts with, not its function. Searching PubMed for SIN3A, and filtering for reviews,
+   since there is a lot of literature describing SIN3A, result in many articles mentioning
+   epigenetics, further supporting the coTF repressor hypothesis
+   (https://pubmed.ncbi.nlm.nih.gov/?term=Sin3a&filter=pubt.review).
+
+ Step 4: New data extraction from publication
+ GTF: Figure 4, compares wild type GTF2H2 and a mutant in abortive initiation and promoter
+   escape assays. The results show that the mutant GTF2H2 is unable to initiate transcription.
+   This data supports the annotation GO:0016251 RNA polymerase II general transcription
+   initiation factor activity
+ dbTF: Figure 5 shows increased expression of target genes containing predicted binding motif
+   sequences (TAAT), which was abolished by decreasing NKX6.3 expression with an antisense
+   transcript. Moreover, ChIP data confirmed that NKX6.3 is bound to regulatory regions of
+   the target genes (see notes above on the use of ChIP data to support an annotation). The
+   dbTF activity annotation for NKX6.3 can therefore include the direct target genes
+   information using the relation "has input".
+   This supports an annotation of NKX6.3 to:
+   - GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
+   - GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific
+   - GO:0001227 DNA-binding transcription repressor activity, RNA polymerase II-specific
+   - GO:0045944 positive regulation of transcription by RNA polymerase II
+   - GO:0000122 negative regulation of transcription by RNA polymerase II
+ coTF: Figure 4 shows by ChIP assay that the SIN3A is found in the vicinity of the SOCS3
+   gene, a gene regulated by STAT3. The same figure also shows that STAT3 does not interact
+   with the SOCS3 promoter in the presence of SIN3A, indicating a repressor effect of SIN3A
+   on STAT3's transcription activator function. This supports an annotation of SIN3A to:
+   - GO:0003714 transcription corepressor activity
+   - GO:0000122 negative regulation of transcription by RNA polymerase II
708
+
709
+ Evidence for a GTF
710
+
711
+ PMID:10924514 Studies of the function of the GTF2H2 subunit of the TFIIH GTF
712
+
713
+ 1. The hypothesis is that GTF2H2 is a subunit of the TFIIH GTF.
714
+ 2. Protein domains or characterized orthologs: GTF2H2 contains a TFIIH subunit
715
+
716
+ Ssl1/p44 domain (IPR012170), described in InterPro as a component of the transcription
717
+ factor TFIIH core.
718
+ UniProt describes GTF2H2 as a "Component of the TFIID-containing RNA polymerase II
719
+ pre-initiation complex" (https://www.uniprot.org/uniprot/Q13888#function).
720
+
721
+ 3. Are there other GO annotations or published experimental results consistent with
722
+ the hypothesis? Yes, existing annotations were consistent with the hypothesis that the
723
+ gene encodes a GTF: there are IDA annotations to GO:0016251 RNA polymerase II
724
+ general transcription initiation factor activity and GO:0008353 RNA polymerase II CTD
725
+ heptapeptide repeat kinase activity, both functions of GTF subunits.
726
+
727
+ 4. Are the experimental results consistent with the hypothesis? Yes. Figure 4 supports the GTF
729
+ annotation GO:0016251 RNA polymerase II general transcription initiation factor activity: it
730
+ compares wild type GTF2H2 and a mutant in abortive initiation and promoter
731
+ escape assays. The results show that the mutant GTF2H2 is unable to initiate
732
+ transcription.
733
+
734
+ Evidence for a dbTF
735
+
736
+ PMID:26314965 studies the activity of the NKX6.3 (UniProt:A6NJ46) transcription factor. Going
737
+ through the checklist shows the following:
738
+
739
+ 1. The hypothesis stated by the authors is that NKX6.3 is a transcription regulator.
740
+ 2. Protein domains or characterized orthologs: NKX6.3 contains a DNA binding
741
+
742
+ homeobox domain (IPR020479). UniProt describes NKX6.3 as a "Putative transcription
743
+ factor, which may be involved in patterning of central nervous system and pancreas."
744
+ (https://www.uniprot.org/uniprot/A6NJ46#function).
745
+
746
+ 3. Are there other GO annotations or published experimental results consistent with
747
+
748
+ the hypothesis?
749
+ All existing annotations were consistent with the hypothesis that the gene
750
+ encodes a dbTF.
751
+
752
+ GO ID | Term label | Evidence(s) | Consistent with hypothesis?
+ GO:0003700 (and children) | DNA-binding transcription factor activity | IDA, IBA, ISA, ISM | Yes
+ GO:0000978 | RNA polymerase II cis-regulatory region sequence-specific DNA binding | IBA | Yes
+ GO:0043565 | sequence-specific DNA binding | IDA, IBA | Yes
789
+
790
+ 4. Are the experimental results consistent with the hypothesis? Yes, Figure 5 shows
791
+ increased expression of target genes containing predicted binding motif sequences
792
+ (TAAT), which was abolished by decreasing NKX6.3 expression with an antisense
793
+ transcript. Moreover, ChIP data confirmed that NKX6.3 is bound to regulatory regions of
794
+ the target genes (see notes above on the use of ChIP data to support an annotation).
795
+ The dbTF activity annotation for NKX6.3 can therefore include the direct target genes
796
+ information using the relation "has input".
797
+ This supports an annotation of NKX6.3 to:
798
+
799
+ - GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
803
+ - GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific
807
+ - GO:0001227 DNA-binding transcription repressor activity, RNA polymerase II-specific
810
+
811
+ - GO:0045944 positive regulation of transcription by RNA polymerase II
812
+ - GO:0000122 negative regulation of transcription by RNA polymerase II
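
As a rough illustration of how direct target genes could be attached to the dbTF activity annotation with the relation "has input", the sketch below builds a minimal annotation record. The `Annotation` class, its field names, the `relation(target)` rendering, and the placeholder target `EXAMPLE:target_gene_1` are assumptions made for illustration; only the gene, GO term, and PMID come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical, minimal annotation record (not an official GO file format)."""
    gene: str
    go_id: str
    go_label: str
    reference: str
    # Each extension is a (relation, target) pair, e.g. ("has_input", "<gene id>").
    extensions: list = field(default_factory=list)

    def extension_string(self) -> str:
        """Render the extensions as relation(target) items joined by commas."""
        return ",".join(f"{rel}({target})" for rel, target in self.extensions)


# Gene, GO term and PMID are taken from the text; the target is a placeholder.
nkx63_activator = Annotation(
    gene="NKX6.3",
    go_id="GO:0001228",
    go_label="DNA-binding transcription activator activity, RNA polymerase II-specific",
    reference="PMID:26314965",
    extensions=[("has_input", "EXAMPLE:target_gene_1")],
)

print(nkx63_activator.extension_string())
```

In practice the placeholder would be replaced by the identifier of each direct target gene reported in the curated article.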
813
+
814
+ Evidence for a coTF
815
+
816
+ PMID:22783022 studies the activity of the SIN3A transcription co-repressor. This paper shows
817
+ that the transcription factor STAT3 is a target of regulation by SIN3A. Going through the
818
+ checklist shows the following:
819
+
820
+ 1. The starting hypothesis stated by the authors is that SIN3A (Q96ST3) is a transcription
821
+
822
+ co-repressor.
823
+
824
+ 2. Protein domains or characterized orthologs:
825
+
826
+ SIN3A contains a SIN3A domain (IPR037969), among others. IPR037969 is associated
827
+ with the GO Molecular Function GO:0003714 transcription corepressor activity
828
+ (https://www.ebi.ac.uk/interpro/entry/InterPro/IPR037969/)
829
+ UniProt describes SIN3A as "Acts as a transcriptional repressor. Corepressor for REST."
830
+ (https://www.uniprot.org/uniprot/Q96ST3#function).
831
+
832
+ 3. Are there other GO annotations or published experimental results consistent with
833
+
834
+ the hypothesis?
835
+
836
+ a. Existing annotations are conflicting. After review of the original articles, several
837
+
838
+ annotations were removed, because many of the activities were those of proteins
839
+ SIN3A interacts with, not functions of SIN3A itself.
840
+
842
+
843
+ b. Searching PubMed for SIN3A and filtering for reviews (since there is a lot of
844
+ literature describing SIN3A) returns many articles mentioning epigenetics,
845
+ further supporting the coTF repressor hypothesis (
846
+ https://pubmed.ncbi.nlm.nih.gov/?term=Sin3a&filter=pubt.review).
847
+
848
+ GO ID | Term label | Evidence(s) | Consistent with hypothesis? / Action if not
+ GO:0003714 | transcription corepressor activity | IBA | Yes
+ GO:0003700 | DNA-binding transcription factor activity | IEA (Ensembl) | No / Removed
+ GO:0000976 | transcription regulatory region sequence-specific DNA binding | ISS | No / Removed
+ GO:0033558 | protein deacetylase activity | IMP | No / Removed
+ GO:0004407 | histone deacetylase activity | IBA | No / Removed
+ GO:0000976 | transcription regulatory region sequence-specific DNA binding | ISS | No / Removed
905
+
906
+ 4. Are the experimental results consistent with the hypothesis?
907
+
908
+ Yes, Figure 4 shows by ChIP assay that SIN3A is found in the vicinity of the SOCS3
909
+ gene, a gene regulated by STAT3. The same figure also shows that STAT3 does not
910
+ interact with the SOCS3 promoter in the presence of SIN3A, indicating a repressor effect
911
+ of SIN3A on STAT3's transcription activator function. This supports an annotation of
912
+ SIN3A to:
913
+
914
+ - GO:0003714 transcription corepressor activity
915
+ - GO:0000122 negative regulation of transcription by RNA polymerase II
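
The review of existing SIN3A annotations in step 3 amounts to a simple filter: annotations consistent with the coTF hypothesis are kept and the others are removed. The sketch below encodes the rows of the table above; the tuple layout and the `review` helper are illustrative assumptions, and the consistency flags simply transcribe the curators' decisions rather than compute them.

```python
# Existing SIN3A annotations as listed in the table above:
# (GO ID, term label, evidence code, consistent with the coTF hypothesis?)
EXISTING_SIN3A_ANNOTATIONS = [
    ("GO:0003714", "transcription corepressor activity", "IBA", True),
    ("GO:0003700", "DNA-binding transcription factor activity", "IEA", False),
    ("GO:0000976", "transcription regulatory region sequence-specific DNA binding", "ISS", False),
    ("GO:0033558", "protein deacetylase activity", "IMP", False),
    ("GO:0004407", "histone deacetylase activity", "IBA", False),
    ("GO:0000976", "transcription regulatory region sequence-specific DNA binding", "ISS", False),
]


def review(annotations):
    """Split annotations into (kept, removed) using the curator's consistency flag."""
    kept = [a for a in annotations if a[3]]
    removed = [a for a in annotations if not a[3]]
    return kept, removed


kept, removed = review(EXISTING_SIN3A_ANNOTATIONS)
print("kept:", [a[0] for a in kept])
print("removed:", [a[0] for a in removed])
```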
916
+
917
+ Guidance on specific small scale experiments
918
+
919
+ Experiments providing evidence for DNA binding transcription factor activity can be found in
920
+ Tables 3 and 4 of Tripathi 2013 (PMID:27270715).
921
+
922
+ Guidance on specific high-throughput experiments
923
+
924
+ The strategy outlined above for the annotation of transcription regulators should be applied
925
+ when curating high-throughput data. The guidelines below are mostly relevant to dbTFs; if the
926
+ experimental data can be used to capture information about coTFs, this will be indicated.
927
+
929
+
930
+ High-throughput data can be used to capture the ‘direct’ target of a dbTF or a coTF. The direct
931
+ target may be the genomic coordinates and/or the target gene.
932
+
933
+ HT-SELEX experiments (dbTF data)
934
+ High throughput selection experiments such as the HT-SELEX protocol (PMID:1697402,
935
+ PMID:2200121) yield data that provides strong experimental evidence of the DNA sequence
936
+ recognised by a protein. This is then presented as an optimal DNA motif that represents the
937
+ consensus of many binding observations. Crucially, sequence-specific DNA binding does not
938
+ necessarily imply transcription regulation function. For instance, PRMT14 has a specific DNA
939
+ binding activity, but it is involved in meiotic recombination site determination. For annotation to
940
+ the dbTF MF term, it is still necessary to have evidence that at least one target gene is
941
+ regulated. However, often that evidence was already obtained in the course of other published
942
+ studies. The HT-SELEX experiments, then, provide support for annotation to:
943
+
944
+ - GO:0000987 cis-regulatory region sequence-specific DNA binding or the descendent
945
+ GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
946
+ However, if there is also evidence of regulation of transcription of a gene associated with the
947
+ regulatory region (either by transfection and measuring mRNA or by a reporter gene assay, etc),
948
+ then HT-SELEX data provides support for the following annotations:
949
+
950
+ - GO:0003700 DNA-binding transcription factor activity or a child
951
+ - GO:0006357 regulation of transcription by RNA polymerase II (or GONEW: regulation of
952
+
953
+ mRNA transcription)
954
+
955
+ For the TF, IDA (HT-SELEX) is one of the strongest experimental evidence types for assigning
956
+ dbTF activity. Still, further evidence strengthens this conclusion (e.g. PRMT14 is a meiosis factor, not a TF).
957
+ Accumulating evidence is important (Oriol Formes email of 18 February).
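
The HT-SELEX rule above can be summarised as a small decision function. This is only a sketch of the guideline as worded here: the function name and its boolean parameter are assumptions, and the GO IDs are the ones listed in this section (the descendant GO:0000978 may be used in place of GO:0000987 where appropriate).

```python
def ht_selex_supported_terms(has_regulation_evidence: bool) -> list:
    """Return GO IDs that HT-SELEX data can support for a protein.

    has_regulation_evidence: True if there is independent evidence that at
    least one target gene is regulated (e.g. a reporter gene assay).
    """
    # HT-SELEX on its own supports only the sequence-specific DNA binding term.
    terms = ["GO:0000987"]  # cis-regulatory region sequence-specific DNA binding
    if has_regulation_evidence:
        # With evidence of regulation, the dbTF activity and process terms apply.
        terms.append("GO:0003700")  # DNA-binding transcription factor activity
        terms.append("GO:0006357")  # regulation of transcription by RNA polymerase II
    return terms


print(ht_selex_supported_terms(False))
print(ht_selex_supported_terms(True))
```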
958
+
959
+ ChIP-seq experiments (dbTF or coTF data)
960
+
961
+ As ChIP-seq experiments include cross-linking of proteins to other proteins and/or DNA, on their
962
+ own ChIP-seq experiments are inconclusive. However, if (a) the protein is known to bear a DNA
963
+ binding domain and (b) there is evidence of regulation of transcription (either by transfection and
964
+ measuring mRNA or by a reporter gene assay, etc), then the ChIP-seq data provides support
965
+ for the following annotations:
966
+
967
+ - GO:0000987 cis-regulatory region sequence-specific DNA binding or the descendent
968
+ GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding
969
+
970
+ - GO:0003700 DNA-binding transcription factor activity or a child
971
+ - GO:0006357 regulation of transcription by RNA polymerase II (or GONEW: regulation of
972
+
973
+ mRNA transcription)
974
+
975
+ If (c) the protein is known to be a coTF and (b) there is evidence of regulation of transcription
976
+ (either by transfection and measuring mRNA or by a reporter gene assay, etc), then the
977
+ ChIP-seq data provides support for the following annotations:
978
+ - GO:0003712 transcription coregulator activity or a child
979
+
981
+
982
+ - GO:0006357 regulation of transcription by RNA polymerase II (or GONEW: regulation of
983
+
984
+ mRNA transcription)
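
The ChIP-seq conditions (a), (b) and (c) above can be sketched as a decision function. The function and parameter names are hypothetical, and checking the dbTF case before the coTF case is an assumption made here, since the text does not say what to do when both (a) and (c) hold.

```python
# GO IDs as listed in this section for each case.
DBTF_TERMS = ["GO:0000987", "GO:0003700", "GO:0006357"]
COTF_TERMS = ["GO:0003712", "GO:0006357"]


def chip_seq_supported_terms(has_dna_binding_domain: bool,
                             is_known_cotf: bool,
                             has_regulation_evidence: bool) -> list:
    """Return GO IDs that ChIP-seq data can support, following conditions (a)-(c)."""
    # Condition (b) is required in both cases: ChIP-seq alone is inconclusive.
    if not has_regulation_evidence:
        return []
    # Conditions (a) and (b): the protein bears a DNA binding domain.
    if has_dna_binding_domain:
        return list(DBTF_TERMS)
    # Conditions (c) and (b): the protein is a known coTF.
    if is_known_cotf:
        return list(COTF_TERMS)
    return []


print(chip_seq_supported_terms(True, False, True))
print(chip_seq_supported_terms(False, True, True))
print(chip_seq_supported_terms(True, False, False))
```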
985
+
986
+ Bacterial 1-hybrid experiments
987
+
988
+ The bacterial one-hybrid (B1H) system is a method for identifying the sequence-specific target
989
+ site of a DNA-binding domain. In this system, a given transcription factor (TF) is expressed as a
990
+ fusion to a subunit of RNA polymerase. In parallel, a library of randomized oligonucleotides
991
+ representing potential TF target sequences is cloned into a separate vector containing the
992
+ selectable genes HIS3 and URA3. If the DNA-binding domain (bait) binds a potential DNA target
993
+ site (prey) in vivo, it will recruit RNA polymerase to the promoter and activate transcription of the
994
+ reporter genes in that clone (https://en.wikipedia.org/wiki/Bacterial_one-hybrid_system;
995
+ PMC1435991).
996
+
1000
+