rota 0.0.post1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (220)
  1. rota-0.0.post1/.env.example +15 -0
  2. rota-0.0.post1/.github/RELEASE_NOTES_v0.1.1.md +122 -0
  3. rota-0.0.post1/.github/workflows/create-release.yml +38 -0
  4. rota-0.0.post1/.github/workflows/publish-to-pypi.yml +45 -0
  5. rota-0.0.post1/.kiro/specs/cve-neo4j-integration/design.md +490 -0
  6. rota-0.0.post1/.kiro/specs/cve-neo4j-integration/requirements.md +97 -0
  7. rota-0.0.post1/.kiro/specs/cve-neo4j-integration/tasks.md +96 -0
  8. rota-0.0.post1/.kiro/specs/kev-integration/design.md +419 -0
  9. rota-0.0.post1/.kiro/specs/kev-integration/requirements.md +140 -0
  10. rota-0.0.post1/.kiro/specs/kev-integration/tasks.md +392 -0
  11. rota-0.0.post1/.kiro/specs/paper-evaluation-framework/design.md +574 -0
  12. rota-0.0.post1/.kiro/specs/paper-evaluation-framework/requirements.md +130 -0
  13. rota-0.0.post1/.kiro/specs/paper-evaluation-framework/tasks.md +348 -0
  14. rota-0.0.post1/.kiro/specs/refactor-to-rota/design.md +587 -0
  15. rota-0.0.post1/.kiro/specs/refactor-to-rota/requirements.md +233 -0
  16. rota-0.0.post1/.kiro/specs/refactor-to-rota/tasks.md +566 -0
  17. rota-0.0.post1/.kiro/specs/zero-day-prediction-system/design.md +1022 -0
  18. rota-0.0.post1/.kiro/specs/zero-day-prediction-system/requirements.md +137 -0
  19. rota-0.0.post1/.kiro/specs/zero-day-prediction-system/tasks.md +279 -0
  20. rota-0.0.post1/.kiro/steering/product.md +21 -0
  21. rota-0.0.post1/.kiro/steering/research.md +70 -0
  22. rota-0.0.post1/.kiro/steering/structure.md +55 -0
  23. rota-0.0.post1/.kiro/steering/tech.md +50 -0
  24. rota-0.0.post1/.pypirc.example +16 -0
  25. rota-0.0.post1/CHANGELOG.md +90 -0
  26. rota-0.0.post1/LICENSE +21 -0
  27. rota-0.0.post1/MANIFEST.in +17 -0
  28. rota-0.0.post1/PKG-INFO +426 -0
  29. rota-0.0.post1/QUICKSTART.md +332 -0
  30. rota-0.0.post1/README.md +375 -0
  31. rota-0.0.post1/config/cve_config.yaml +42 -0
  32. rota-0.0.post1/config/cve_test_config.yaml +29 -0
  33. rota-0.0.post1/config/epss_config.yaml +17 -0
  34. rota-0.0.post1/config/example_config.yaml +16 -0
  35. rota-0.0.post1/config/exploit_config.yaml +15 -0
  36. rota-0.0.post1/config/github_advisory_config.yaml +30 -0
  37. rota-0.0.post1/dashboard/README.md +66 -0
  38. rota-0.0.post1/dashboard/app.py +270 -0
  39. rota-0.0.post1/docs/README.md +53 -0
  40. rota-0.0.post1/docs/data-roadmap.md +196 -0
  41. rota-0.0.post1/docs/graphiti-comparison.md +137 -0
  42. rota-0.0.post1/docs/guides/release-guide.md +192 -0
  43. rota-0.0.post1/docs/guides/temporal-setup.md +160 -0
  44. rota-0.0.post1/docs/performance-optimization.md +185 -0
  45. rota-0.0.post1/docs/research-directions.md +268 -0
  46. rota-0.0.post1/docs/system-overview.md +357 -0
  47. rota-0.0.post1/docs/usage-guide.md +531 -0
  48. rota-0.0.post1/graph/.env.example +21 -0
  49. rota-0.0.post1/graph/.kiro/steering/agent.md +190 -0
  50. rota-0.0.post1/graph/.kiro/steering/article.md +266 -0
  51. rota-0.0.post1/graph/.kiro/steering/methodology.md +444 -0
  52. rota-0.0.post1/graph/.kiro/steering/product.md +195 -0
  53. rota-0.0.post1/graph/.kiro/steering/structure.md +125 -0
  54. rota-0.0.post1/graph/.kiro/steering/tech.md +172 -0
  55. rota-0.0.post1/graph/ARCHITECTURE.md +322 -0
  56. rota-0.0.post1/graph/QUICKSTART.md +205 -0
  57. rota-0.0.post1/graph/README.md +285 -0
  58. rota-0.0.post1/graph/configs/llm_config.yaml +81 -0
  59. rota-0.0.post1/graph/configs/signal_weights.yaml +52 -0
  60. rota-0.0.post1/graph/docker/Dockerfile.python +41 -0
  61. rota-0.0.post1/graph/docker-compose.yml +56 -0
  62. rota-0.0.post1/graph/requirements.txt +48 -0
  63. rota-0.0.post1/graph/scripts/download_neo4j_benchmark.py +130 -0
  64. rota-0.0.post1/graph/scripts/download_synthcypher.py +179 -0
  65. rota-0.0.post1/graph/scripts/generate_research_proposal.py +331 -0
  66. rota-0.0.post1/graph/scripts/generate_research_proposal_simple.py +267 -0
  67. rota-0.0.post1/graph/scripts/generate_semester_proposal.py +476 -0
  68. rota-0.0.post1/graph/scripts/phase1_collect_data.py +112 -0
  69. rota-0.0.post1/graph/scripts/phase2_extract_signals.py +118 -0
  70. rota-0.0.post1/graph/scripts/phase4_llm_prediction.py +107 -0
  71. rota-0.0.post1/graph/scripts/run_full_pipeline.py +138 -0
  72. rota-0.0.post1/graph/scripts/simple_data_collector.py +80 -0
  73. rota-0.0.post1/graph/scripts/validate_log4shell.py +139 -0
  74. rota-0.0.post1/graph/setup_environment.ps1 +37 -0
  75. rota-0.0.post1/graph/setup_wsl_environment.sh +61 -0
  76. rota-0.0.post1/graph/src/__init__.py +6 -0
  77. rota-0.0.post1/graph/src/api/main.py +209 -0
  78. rota-0.0.post1/graph/src/data_collection/__init__.py +6 -0
  79. rota-0.0.post1/graph/src/data_collection/package_collector.py +142 -0
  80. rota-0.0.post1/graph/src/data_collection/vulnerability_collector.py +166 -0
  81. rota-0.0.post1/graph/src/graph_analysis/__init__.py +5 -0
  82. rota-0.0.post1/graph/src/graph_analysis/dependency_graph.py +223 -0
  83. rota-0.0.post1/graph/src/llm_reasoning/__init__.py +5 -0
  84. rota-0.0.post1/graph/src/llm_reasoning/risk_predictor.py +257 -0
  85. rota-0.0.post1/graph/src/risk_scoring/__init__.py +5 -0
  86. rota-0.0.post1/graph/src/risk_scoring/latent_risk_calculator.py +205 -0
  87. rota-0.0.post1/graph/src/signal_extraction/__init__.py +5 -0
  88. rota-0.0.post1/graph/src/signal_extraction/vulnerability_patterns.py +163 -0
  89. rota-0.0.post1/graph/src/validation/__init__.py +5 -0
  90. rota-0.0.post1/graph/src/validation/historical_validator.py +215 -0
  91. rota-0.0.post1/graph/tests/test_graph_analysis.py +105 -0
  92. rota-0.0.post1/graph/tests/test_signal_extraction.py +93 -0
  93. rota-0.0.post1/graph/연구계획서.md +300 -0
  94. rota-0.0.post1/graph/연구계획서_한학기.md +391 -0
  95. rota-0.0.post1/pyproject.toml +90 -0
  96. rota-0.0.post1/requirements.txt +20 -0
  97. rota-0.0.post1/results/paper/dataset_test/statistics.json +27 -0
  98. rota-0.0.post1/results/paper/dataset_test3/statistics.json +220 -0
  99. rota-0.0.post1/results/paper/validation_real/metrics.json +16 -0
  100. rota-0.0.post1/results/paper/validation_test/metrics.json +16 -0
  101. rota-0.0.post1/scripts/GIT_COMMANDS.ps1 +33 -0
  102. rota-0.0.post1/scripts/GIT_COMMANDS.sh +47 -0
  103. rota-0.0.post1/scripts/README.md +77 -0
  104. rota-0.0.post1/scripts/archive/benchmark_signal_collection.py +180 -0
  105. rota-0.0.post1/scripts/archive/collect_critical_cves.py +140 -0
  106. rota-0.0.post1/scripts/archive/collect_data.py +43 -0
  107. rota-0.0.post1/scripts/archive/collect_opensource_cves.py +209 -0
  108. rota-0.0.post1/scripts/archive/load_cve_with_graphiti.py +285 -0
  109. rota-0.0.post1/scripts/archive/run_historical_validation_mock.py +187 -0
  110. rota-0.0.post1/scripts/archive/test_prediction_concept.py +190 -0
  111. rota-0.0.post1/scripts/archive/tune_threshold.py +148 -0
  112. rota-0.0.post1/scripts/check_neo4j_data.py +67 -0
  113. rota-0.0.post1/scripts/collection/collect_cve_data.py +128 -0
  114. rota-0.0.post1/scripts/collection/collect_epss.py +106 -0
  115. rota-0.0.post1/scripts/collection/collect_exploits.py +124 -0
  116. rota-0.0.post1/scripts/collection/collect_github_advisory.py +118 -0
  117. rota-0.0.post1/scripts/collection/collect_paper_dataset.py +188 -0
  118. rota-0.0.post1/scripts/create_release.ps1 +51 -0
  119. rota-0.0.post1/scripts/create_release.sh +55 -0
  120. rota-0.0.post1/scripts/deployment/publish_to_pypi.py +125 -0
  121. rota-0.0.post1/scripts/experiments/historical_validation.py +218 -0
  122. rota-0.0.post1/scripts/experiments/run_historical_validation.py +139 -0
  123. rota-0.0.post1/scripts/experiments/run_prediction_demo.py +199 -0
  124. rota-0.0.post1/scripts/loading/load_advisory_to_neo4j.py +192 -0
  125. rota-0.0.post1/scripts/loading/load_cve_to_neo4j.py +249 -0
  126. rota-0.0.post1/scripts/loading/load_epss_to_neo4j.py +105 -0
  127. rota-0.0.post1/scripts/loading/load_exploits_to_neo4j.py +147 -0
  128. rota-0.0.post1/setup.cfg +4 -0
  129. rota-0.0.post1/src/rota/__init__.py +17 -0
  130. rota-0.0.post1/src/rota/__main__.py +6 -0
  131. rota-0.0.post1/src/rota/__version__.py +12 -0
  132. rota-0.0.post1/src/rota/_version.py +34 -0
  133. rota-0.0.post1/src/rota/axle/__init__.py +14 -0
  134. rota-0.0.post1/src/rota/cli/__init__.py +14 -0
  135. rota-0.0.post1/src/rota/cli/main.py +457 -0
  136. rota-0.0.post1/src/rota/config.py +116 -0
  137. rota-0.0.post1/src/rota/hub/__init__.py +15 -0
  138. rota-0.0.post1/src/rota/hub/connection.py +72 -0
  139. rota-0.0.post1/src/rota/hub/loader.py +603 -0
  140. rota-0.0.post1/src/rota/hub/query.py +377 -0
  141. rota-0.0.post1/src/rota/hub/supply_chain.py +440 -0
  142. rota-0.0.post1/src/rota/oracle/__init__.py +6 -0
  143. rota-0.0.post1/src/rota/oracle/commit_analyzer.py +443 -0
  144. rota-0.0.post1/src/rota/oracle/integrated_oracle.py +366 -0
  145. rota-0.0.post1/src/rota/oracle/predictor.py +583 -0
  146. rota-0.0.post1/src/rota/oracle/prompts/analysis.jinja2 +42 -0
  147. rota-0.0.post1/src/rota/oracle/prompts/prediction.jinja2 +116 -0
  148. rota-0.0.post1/src/rota/py.typed +1 -0
  149. rota-0.0.post1/src/rota/spokes/__init__.py +30 -0
  150. rota-0.0.post1/src/rota/spokes/base.py +218 -0
  151. rota-0.0.post1/src/rota/spokes/cve.py +251 -0
  152. rota-0.0.post1/src/rota/spokes/cwe.py +159 -0
  153. rota-0.0.post1/src/rota/spokes/epss.py +120 -0
  154. rota-0.0.post1/src/rota/spokes/github.py +323 -0
  155. rota-0.0.post1/src/rota/spokes/kev.py +85 -0
  156. rota-0.0.post1/src/rota/spokes/package.py +382 -0
  157. rota-0.0.post1/src/rota/utils/__init__.py +11 -0
  158. rota-0.0.post1/src/rota/wheel/__init__.py +14 -0
  159. rota-0.0.post1/src/rota.egg-info/PKG-INFO +426 -0
  160. rota-0.0.post1/src/rota.egg-info/SOURCES.txt +218 -0
  161. rota-0.0.post1/src/rota.egg-info/dependency_links.txt +1 -0
  162. rota-0.0.post1/src/rota.egg-info/entry_points.txt +2 -0
  163. rota-0.0.post1/src/rota.egg-info/requires.txt +23 -0
  164. rota-0.0.post1/src/rota.egg-info/top_level.txt +2 -0
  165. rota-0.0.post1/src/zero_day_defense/__init__.py +43 -0
  166. rota-0.0.post1/src/zero_day_defense/cli.py +149 -0
  167. rota-0.0.post1/src/zero_day_defense/config.py +68 -0
  168. rota-0.0.post1/src/zero_day_defense/data_sources/__init__.py +17 -0
  169. rota-0.0.post1/src/zero_day_defense/data_sources/base.py +73 -0
  170. rota-0.0.post1/src/zero_day_defense/data_sources/cve.py +186 -0
  171. rota-0.0.post1/src/zero_day_defense/data_sources/epss.py +75 -0
  172. rota-0.0.post1/src/zero_day_defense/data_sources/exploit_db.py +94 -0
  173. rota-0.0.post1/src/zero_day_defense/data_sources/github.py +124 -0
  174. rota-0.0.post1/src/zero_day_defense/data_sources/github_advisory.py +128 -0
  175. rota-0.0.post1/src/zero_day_defense/data_sources/maven.py +58 -0
  176. rota-0.0.post1/src/zero_day_defense/data_sources/npm.py +42 -0
  177. rota-0.0.post1/src/zero_day_defense/data_sources/pypi.py +48 -0
  178. rota-0.0.post1/src/zero_day_defense/evaluation/__init__.py +18 -0
  179. rota-0.0.post1/src/zero_day_defense/evaluation/ablation/__init__.py +9 -0
  180. rota-0.0.post1/src/zero_day_defense/evaluation/baselines/__init__.py +15 -0
  181. rota-0.0.post1/src/zero_day_defense/evaluation/dataset/__init__.py +11 -0
  182. rota-0.0.post1/src/zero_day_defense/evaluation/dataset/collector.py +400 -0
  183. rota-0.0.post1/src/zero_day_defense/evaluation/dataset/statistics.py +336 -0
  184. rota-0.0.post1/src/zero_day_defense/evaluation/dataset/validator.py +311 -0
  185. rota-0.0.post1/src/zero_day_defense/evaluation/results/__init__.py +13 -0
  186. rota-0.0.post1/src/zero_day_defense/evaluation/statistics/__init__.py +11 -0
  187. rota-0.0.post1/src/zero_day_defense/evaluation/validation/__init__.py +9 -0
  188. rota-0.0.post1/src/zero_day_defense/evaluation/validation/metrics.py +125 -0
  189. rota-0.0.post1/src/zero_day_defense/evaluation/validation/temporal_splitter.py +198 -0
  190. rota-0.0.post1/src/zero_day_defense/pipeline.py +86 -0
  191. rota-0.0.post1/src/zero_day_defense/prediction/__init__.py +27 -0
  192. rota-0.0.post1/src/zero_day_defense/prediction/agents/__init__.py +11 -0
  193. rota-0.0.post1/src/zero_day_defense/prediction/agents/recommendation.py +123 -0
  194. rota-0.0.post1/src/zero_day_defense/prediction/agents/signal_analyzer.py +226 -0
  195. rota-0.0.post1/src/zero_day_defense/prediction/agents/threat_assessment.py +205 -0
  196. rota-0.0.post1/src/zero_day_defense/prediction/engine/__init__.py +9 -0
  197. rota-0.0.post1/src/zero_day_defense/prediction/engine/clusterer.py +272 -0
  198. rota-0.0.post1/src/zero_day_defense/prediction/engine/scorer.py +208 -0
  199. rota-0.0.post1/src/zero_day_defense/prediction/exceptions.py +57 -0
  200. rota-0.0.post1/src/zero_day_defense/prediction/feature_engineering/__init__.py +11 -0
  201. rota-0.0.post1/src/zero_day_defense/prediction/feature_engineering/builder.py +159 -0
  202. rota-0.0.post1/src/zero_day_defense/prediction/feature_engineering/embedder.py +191 -0
  203. rota-0.0.post1/src/zero_day_defense/prediction/feature_engineering/extractor.py +438 -0
  204. rota-0.0.post1/src/zero_day_defense/prediction/models.py +163 -0
  205. rota-0.0.post1/src/zero_day_defense/prediction/signal_collectors/__init__.py +11 -0
  206. rota-0.0.post1/src/zero_day_defense/prediction/signal_collectors/github_signals.py +534 -0
  207. rota-0.0.post1/src/zero_day_defense/prediction/signal_collectors/github_signals_fast.py +373 -0
  208. rota-0.0.post1/src/zero_day_defense/prediction/signal_collectors/package_signals.py +56 -0
  209. rota-0.0.post1/src/zero_day_defense/prediction/signal_collectors/storage.py +172 -0
  210. rota-0.0.post1/src/zero_day_defense/prediction/validation/__init__.py +9 -0
  211. rota-0.0.post1/src/zero_day_defense/prediction/validation/feedback.py +38 -0
  212. rota-0.0.post1/src/zero_day_defense/prediction/validation/validator.py +137 -0
  213. rota-0.0.post1/src/zero_day_defense/py.typed +0 -0
  214. rota-0.0.post1/src/zero_day_defense.egg-info/PKG-INFO +55 -0
  215. rota-0.0.post1/src/zero_day_defense.egg-info/SOURCES.txt +17 -0
  216. rota-0.0.post1/src/zero_day_defense.egg-info/dependency_links.txt +1 -0
  217. rota-0.0.post1/src/zero_day_defense.egg-info/requires.txt +3 -0
  218. rota-0.0.post1/src/zero_day_defense.egg-info/top_level.txt +1 -0
  219. rota-0.0.post1/tests/test_end_to_end.py +277 -0
  220. rota-0.0.post1/tests/test_integrated_oracle.py +147 -0
@@ -0,0 +1,15 @@
1
+ # GitHub API Token
2
+ # Get your token from: https://github.com/settings/tokens
3
+ # Required scopes: repo (for private repos) or public_repo (for public repos only)
4
+ GITHUB_TOKEN=your_github_personal_access_token_here
5
+
6
+ # Gemini API Key
7
+ # Get your key from: https://makersuite.google.com/app/apikey
8
+ GEMINI_API_KEY=your_gemini_api_key_here
9
+ # Or use GOOGLE_API_KEY instead
10
+ # GOOGLE_API_KEY=your_google_api_key_here
11
+
12
+ # Neo4j Configuration (Optional - for data persistence)
13
+ NEO4J_URI=bolt://localhost:7687
14
+ NEO4J_USERNAME=neo4j
15
+ NEO4J_PASSWORD=your_neo4j_password_here
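The example file above implies that either `GEMINI_API_KEY` or `GOOGLE_API_KEY` may carry the Gemini credential and that the Neo4j settings are optional. A minimal sketch of how a consumer might resolve these values follows. The helper name `load_settings` and the returned dictionary layout are hypothetical and are not taken from the package.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the variables named in .env.example (hypothetical helper)."""
    env = os.environ if env is None else env
    # Either variable may hold the Gemini key; GEMINI_API_KEY wins if both are set.
    gemini_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    return {
        "github_token": env.get("GITHUB_TOKEN"),
        "gemini_api_key": gemini_key,
        # Neo4j is optional; the defaults mirror the example file.
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "neo4j_username": env.get("NEO4J_USERNAME", "neo4j"),
        "neo4j_password": env.get("NEO4J_PASSWORD"),
    }
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests.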
@@ -0,0 +1,122 @@
1
+ # ROTA v0.1.1 - Initial PyPI Release 🎉
2
+
3
+ We're excited to announce the first official release of **ROTA** (Real-time Operational Threat Assessment) on PyPI!
4
+
5
+ ## 🚀 Installation
6
+
7
+ ```bash
8
+ pip install rota
9
+ ```
10
+
11
+ ## ✨ What's New
12
+
13
+ ### Core Features
14
+ - **Real-time Vulnerability Prediction**: AI-powered analysis of code changes
15
+ - **Multi-source Data Collection**: CVE, GitHub, EPSS, Exploit-DB integration
16
+ - **Historical Validation Framework**: Validated on 80+ real CVE cases
17
+ - **CLI Interface**: Easy-to-use command-line tools
18
+ - **Python API**: Programmatic access for automation
19
+
20
+ ### Command-Line Interface
21
+ ```bash
22
+ # Analyze repository risk
23
+ rota predict --repo django/django --commit abc123
24
+
25
+ # Collect security data
26
+ rota collect --source cve --output data.jsonl
27
+
28
+ # Run historical validation
29
+ rota validate --dataset cves.jsonl --output results/
30
+ ```
31
+
32
+ ### Python API
33
+ ```python
34
+ from rota import analyze_code_push
35
+
36
+ result = analyze_code_push("django/django", "abc123")
37
+ print(f"Risk Score: {result['risk_score']}")
38
+ ```
39
+
40
+ ## 📊 Validation Results
41
+
42
+ - **Dataset**: 80 CVEs from Django (2007-2024)
43
+ - **Pilot Study**: 3 CVEs validated with real GitHub data
44
+ - **Average Lead Time**: 90 days before CVE disclosure
45
+ - **Execution Time**: ~22 minutes per CVE
46
+
47
+ ## 🏗️ Architecture
48
+
49
+ - **Spokes**: Multi-source data collectors (CVE, GitHub, EPSS, etc.)
50
+ - **Hub**: Neo4j-based knowledge graph integration
51
+ - **Wheel**: Pattern analysis and clustering
52
+ - **Oracle**: AI-powered prediction engine
53
+ - **Axle**: Historical validation framework
54
+
55
+ ## 📦 What's Included
56
+
57
+ ### Data Sources
58
+ - CVE (NVD)
59
+ - GitHub Advisory
60
+ - EPSS Scores
61
+ - Exploit-DB
62
+ - Package Registries (PyPI, npm, Maven)
63
+
64
+ ### Prediction Components
65
+ - GitHub signal collectors
66
+ - Feature engineering (20+ behavioral features)
67
+ - Risk scoring engine
68
+ - Temporal pattern analysis
69
+
70
+ ### Evaluation Framework
71
+ - Dataset collection automation
72
+ - Historical validation with temporal splitting
73
+ - Performance metrics (Precision, Recall, F1, Lead Time)
74
+ - Baseline comparisons
75
+
76
+ ## 🔧 Requirements
77
+
78
+ - Python 3.10+
79
+ - GitHub API token (for full functionality)
80
+ - Optional: Neo4j for graph analysis
81
+
82
+ ## 📚 Documentation
83
+
84
+ - [Quick Start Guide](https://github.com/susie-Choi/rota/blob/main/HOW_TO_PUBLISH.md)
85
+ - [API Documentation](https://github.com/susie-Choi/rota/blob/main/docs/)
86
+ - [Paper Evaluation Framework](https://github.com/susie-Choi/rota/blob/main/docs/PAPER_FRAMEWORK_SUMMARY.md)
87
+
88
+ ## 🐛 Known Issues
89
+
90
+ - Historical validation can be slow (~22 min/CVE) due to GitHub API rate limits
91
+ - Currently focused on GitHub repositories
92
+ - Limited to English language repositories
93
+
94
+ ## 🔮 Future Plans
95
+
96
+ - GraphQL API integration for better performance
97
+ - Support for additional version control systems
98
+ - Enhanced machine learning models
99
+ - Real-time dashboard improvements
100
+ - Enterprise features
101
+
102
+ ## 🙏 Acknowledgments
103
+
104
+ This project is part of ongoing research on LLM-based pre-signal analysis for predicting potential vulnerabilities in software ecosystems.
105
+
106
+ ## 📝 Changelog
107
+
108
+ See [CHANGELOG.md](https://github.com/susie-Choi/rota/blob/main/CHANGELOG.md) for detailed changes.
109
+
110
+ ## 🤝 Contributing
111
+
112
+ Contributions are welcome! Please feel free to submit issues and pull requests.
113
+
114
+ ## 📄 License
115
+
116
+ MIT License - see [LICENSE](https://github.com/susie-Choi/rota/blob/main/LICENSE) for details.
117
+
118
+ ---
119
+
120
+ **Install now**: `pip install rota`
121
+
122
+ **Star us on GitHub**: https://github.com/susie-Choi/rota ⭐
@@ -0,0 +1,38 @@
1
+ name: Create Release
2
+
3
+ on:
4
+   push:
5
+     tags:
6
+       - 'v*' # Trigger on version tags like v0.1.3
7
+
8
+ jobs:
9
+   create-release:
10
+     runs-on: ubuntu-latest
11
+     permissions:
12
+       contents: write
13
+
14
+     steps:
15
+       - uses: actions/checkout@v4
16
+         with:
17
+           fetch-depth: 0
18
+
19
+       - name: Extract version from tag
20
+         id: get_version
21
+         run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> $GITHUB_OUTPUT
22
+
23
+       - name: Extract changelog
24
+         id: changelog
25
+         run: |
26
+           # Extract changelog for this version
27
+           VERSION=${{ steps.get_version.outputs.VERSION }}
28
+           sed -n "/## \[$VERSION\]/,/## \[/p" CHANGELOG.md | sed '$d' > release_notes.md
29
+
30
+       - name: Create Release
31
+         uses: softprops/action-gh-release@v1
32
+         with:
33
+           name: v${{ steps.get_version.outputs.VERSION }}
34
+           body_path: release_notes.md
35
+           draft: false
36
+           prerelease: false
37
+         env:
38
+           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
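The "Extract changelog" step above selects one version's section with a `sed` range. Two details of that range are worth noting: the dots in `$VERSION` act as regex wildcards, and when the requested version is the last section in the file the range runs to the end, so `sed '$d'` removes a real line of notes. The sketch below shows the same extraction in Python under the assumption that headings look like `## [x.y.z]`. The function name `extract_release_notes` is hypothetical and is not part of the workflow.

```python
import re


def extract_release_notes(changelog: str, version: str) -> str:
    """Return the changelog section for `version`, heading included."""
    # re.escape keeps the dots in a version such as "0.1.3" literal.
    start = re.compile(r"^## \[" + re.escape(version) + r"\]")
    section = []
    inside = False
    for line in changelog.splitlines():
        if inside and line.startswith("## ["):
            break  # the next version heading ends the section
        if not inside and start.match(line):
            inside = True
        if inside:
            section.append(line)
    return "\n".join(section).strip()
```

Because the loop stops at the next heading rather than trimming the final line, the last section of the file is returned intact.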
@@ -0,0 +1,45 @@
1
+ name: Publish to PyPI
2
+
3
+ on:
4
+   push:
5
+     branches:
6
+       - main
7
+     paths:
8
+       - 'pyproject.toml'
9
+       - 'src/**'
10
+       - '.github/workflows/publish-to-pypi.yml'
11
+   workflow_dispatch: # Manual trigger option
12
+   release:
13
+     types: [published]
14
+
15
+ jobs:
16
+   build-and-publish:
17
+     runs-on: ubuntu-latest
18
+
19
+     steps:
20
+       - uses: actions/checkout@v4
21
+
22
+       - name: Set up Python
23
+         uses: actions/setup-python@v4
24
+         with:
25
+           python-version: '3.10'
26
+
27
+       - name: Install dependencies
28
+         run: |
29
+           python -m pip install --upgrade pip
30
+           pip install build twine
31
+
32
+       - name: Clean dist directory
33
+         run: rm -rf dist/ build/ *.egg-info
34
+
35
+       - name: Build package
36
+         run: python -m build
37
+
38
+       - name: Check package
39
+         run: python -m twine check dist/*
40
+
41
+       - name: Publish to PyPI
42
+         env:
43
+           TWINE_USERNAME: __token__
44
+           TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
45
+         run: python -m twine upload dist/*
@@ -0,0 +1,490 @@
1
+ # Design Document
2
+
3
+ ## Overview
4
+
5
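+ The CVE-Neo4j integration collects CVE data from the NVD (National Vulnerability Database) and loads it into a Neo4j graph database so that relationships between vulnerabilities can be analyzed. The system extends the existing Zero-Day Defense data pipeline architecture and provides a graph-based analysis experience similar to Palantir Ontology.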
6
+
7
+ ## Architecture
8
+
9
+ ### High-Level Architecture
10
+
11
+ ```mermaid
12
+ graph LR
13
+ A[NVD API] -->|HTTP Request| B[CVEDataSource]
14
+ B -->|SourceResult| C[JSONL Storage]
15
+ C -->|Read| D[CVEGraphLoader]
16
+ D -->|Cypher Queries| E[Neo4j Database]
17
+ F[YAML Config] -->|Configuration| B
18
+ F -->|Configuration| D
19
+ ```
20
+
21
+ ### Component Layers
22
+
23
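+ 1. **Data Collection Layer**: Communicates with the NVD API to collect CVE data
24
+ 2. **Storage Layer**: Stores intermediate data in JSONL format
25
+ 3. **Graph Loading Layer**: Transforms the data into a graph structure and loads it into Neo4j
26
+ 4. **Configuration Layer**: YAML-based configuration management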
27
+
28
+ ## Components and Interfaces
29
+
30
+ ### 1. CVEDataSource
31
+
32
+ **Purpose**: Data source that collects CVE data through the NVD API 2.0
33
+
34
+ **Class Structure**:
35
+ ```python
36
+ class CVEDataSource(BaseDataSource):
37
+     source_name: str = "nvd_cve"
38
+     BASE_URL: str = "https://services.nvd.nist.gov/rest/json/cves/2.0"
39
+
40
+     def __init__(
41
+         self,
42
+         *,
43
+         timeout: float = 30.0,
44
+         rate_limit_sleep: float = 6.0,
45
+         api_key: Optional[str] = None,
46
+         **kwargs
47
+     ): ...
48
+
49
+     def collect_by_cve_id(self, cve_id: str, *, cutoff: datetime) -> SourceResult: ...
50
+     def collect_by_keyword(self, keyword: str, *, cutoff: datetime, max_results: int = 100) -> SourceResult: ...
51
+     def collect_by_cpe(self, cpe_name: str, *, cutoff: datetime, max_results: int = 100) -> SourceResult: ...
52
+     def collect(self, package: str, *, cutoff: datetime) -> SourceResult: ...
53
+ ```
54
+
55
+ **Key Features**:
56
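+ - Inherits from BaseDataSource to integrate with the existing architecture
57
+ - Supports three collection modes: CVE ID, keyword, and CPE
58
+ - Adjusts the rate limit dynamically depending on whether an API key is set (6 s → 0.6 s)
59
+ - Temporal filtering based on cutoff_date
60
+
61
+ **Rate Limiting Strategy**:
62
+ - Without an API key: wait 6 seconds (NVD public rate limit)
63
+ - With an API key: wait 0.6 seconds (50 requests per 30 seconds)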
64
+
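The rate limiting strategy above can be sketched as a small helper. The two delays come from the design document (30 s / 50 requests = 0.6 s with a key, 6 s without). The helper name `effective_rate_limit_sleep` and its clamping behavior are assumptions made for illustration, not the package's actual logic.

```python
from typing import Optional

# Delays from the design document: 6 s without a key, 0.6 s with one.
PUBLIC_SLEEP_SECONDS = 6.0
KEYED_SLEEP_SECONDS = 0.6


def effective_rate_limit_sleep(
    api_key: Optional[str], configured: float = PUBLIC_SLEEP_SECONDS
) -> float:
    """Pick the delay between NVD requests (hypothetical helper)."""
    if api_key:
        # With a key, never wait longer than the keyed delay.
        return min(configured, KEYED_SLEEP_SECONDS)
    # Without a key, never go below the public delay.
    return max(configured, PUBLIC_SLEEP_SECONDS)
```

Clamping in both directions means a misconfigured `rate_limit_sleep` cannot push an unauthenticated client past the public limit.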
65
+ ### 2. CVEGraphLoader
66
+
67
+ **Purpose**: Reads CVE data from a JSONL file and transforms it into a Neo4j graph
68
+
69
+ **Class Structure**:
70
+ ```python
71
+ class CVEGraphLoader:
72
+     def __init__(self, uri: str, username: str, password: str): ...
73
+     def close(self): ...
74
+     def create_constraints(self): ...
75
+     def load_cve(self, cve_data: Dict[str, Any]) -> None: ...
76
+     def load_from_jsonl(self, jsonl_path: Path) -> None: ...
77
+ ```
78
+
79
+ **Key Features**:
80
+ - Uses the Neo4j Python driver
81
+ - Uniqueness constraints prevent duplicates and improve performance
82
+ - The MERGE pattern guarantees idempotency
83
+ - Batch processing and progress logging
84
+
85
+ ### 3. Command-Line Scripts
86
+
87
+ #### collect_cve_data.py
88
+
89
+ **Purpose**: Runs CVE data collection
90
+
91
+ **Arguments**:
92
+ - `config`: Path to the YAML configuration file (required)
93
+ - `--log-level`: Logging level (default: INFO)
94
+ - `--output`: Override for the output file path (optional)
95
+
96
+ **Workflow**:
97
+ 1. Load the YAML configuration
98
+ 2. Look up the NVD API key (config or environment variable)
99
+ 3. Initialize CVEDataSource
100
+ 4. Collect data for each CVE target
101
+ 5. Save in JSONL format
102
+ 6. Print collection statistics
103
+
104
+ #### load_cve_to_neo4j.py
105
+
106
+ **Purpose**: Loads JSONL data into Neo4j
107
+
108
+ **Arguments**:
109
+ - `jsonl_file`: Path to the JSONL file (required)
110
+ - `--uri`: Neo4j URI (default: bolt://localhost:7687)
111
+ - `--username`: Neo4j username (default: neo4j)
112
+ - `--password`: Neo4j password (required)
113
+ - `--log-level`: Logging level (default: INFO)
114
+
115
+ **Workflow**:
116
+ 1. Set up the Neo4j connection
117
+ 2. Create uniqueness constraints
118
+ 3. Read the JSONL file
119
+ 4. Create graph nodes and relationships for each CVE
120
+ 5. Print load statistics
121
+
122
+ ## Data Models
123
+
124
+ ### Neo4j Graph Schema
125
+
126
+ #### Node Types
127
+
128
+ **CVE Node**:
129
+ ```cypher
130
+ (:CVE {
131
+   id: String, // CVE-YYYY-NNNNN
132
+   sourceIdentifier: String, // Source (e.g., cve@mitre.org)
133
+   published: DateTime, // Publication date
134
+   lastModified: DateTime, // Last modified date
135
+   vulnStatus: String, // Status (e.g., "Analyzed")
136
+   description: String, // English description
137
+   cvssVersion: String, // CVSS version (e.g., "3.1")
138
+   cvssScore: Float, // CVSS score (0.0-10.0)
139
+   cvssSeverity: String, // Severity (LOW, MEDIUM, HIGH, CRITICAL)
140
+   cvssVector: String // CVSS vector string
141
+ })
142
+ ```
143
+
144
+ **CPE Node** (Common Platform Enumeration):
145
+ ```cypher
146
+ (:CPE {
147
+   uri: String, // CPE URI (unique)
148
+   version: String, // Version string
149
+   versionStartIncluding: String, // Start of affected versions (inclusive)
150
+   versionEndExcluding: String // End of affected versions (exclusive)
151
+ })
152
+ ```
153
+
154
+ **CWE Node** (Common Weakness Enumeration):
155
+ ```cypher
156
+ (:CWE {
157
+   id: String // CWE-NNN (unique)
158
+ })
159
+ ```
160
+
161
+ **Vendor Node**:
162
+ ```cypher
163
+ (:Vendor {
164
+   name: String // Vendor name (unique)
165
+ })
166
+ ```
167
+
168
+ **Product Node**:
169
+ ```cypher
170
+ (:Product {
171
+   vendor: String, // Vendor name
172
+   name: String // Product name
173
+   // (vendor, name) composite unique
174
+ })
175
+ ```
176
+
177
+ **Reference Node**:
178
+ ```cypher
179
+ (:Reference {
180
+   url: String, // URL (unique)
181
+   source: String // Source
182
+ })
183
+ ```
184
+
185
+ #### Relationship Types
186
+
187
+ ```mermaid
188
+ graph TD
189
+ CVE[CVE Node]
190
+ CPE[CPE Node]
191
+ CWE[CWE Node]
192
+ Vendor[Vendor Node]
193
+ Product[Product Node]
194
+ Ref[Reference Node]
195
+
196
+ CVE -->|AFFECTS| CPE
197
+ CVE -->|HAS_WEAKNESS| CWE
198
+ CVE -->|HAS_REFERENCE| Ref
199
+ Vendor -->|PRODUCES| Product
200
+ Product -->|HAS_VERSION| CPE
201
+ ```
202
+
203
+ **Relationship Descriptions**:
204
+ - `(:CVE)-[:AFFECTS]->(:CPE)`: The CVE affects a specific CPE (product version)
205
+ - `(:CVE)-[:HAS_WEAKNESS]->(:CWE)`: The CVE is associated with a specific CWE weakness type
206
+ - `(:CVE)-[:HAS_REFERENCE]->(:Reference)`: The CVE has a reference link
207
+ - `(:Vendor)-[:PRODUCES]->(:Product)`: The vendor produces the product
208
+ - `(:Product)-[:HAS_VERSION]->(:CPE)`: The product has a specific version (CPE)
209
+
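The Vendor, Product, and CPE nodes above all derive from fields of a CPE 2.3 string. A minimal sketch of that split follows. The function name `parse_cpe23` is hypothetical, and the naive `split(":")` assumes no component contains an escaped colon (`\:`), which real CPE strings may.

```python
from typing import Dict


def parse_cpe23(cpe_uri: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string into the fields the graph schema uses."""
    parts = cpe_uri.split(":")
    # Layout: cpe:2.3:<part>:<vendor>:<product>:<version>:... (13 components)
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 string: {cpe_uri!r}")
    return {
        "uri": cpe_uri,
        "vendor": parts[3],
        "product": parts[4],
        "version": parts[5],
    }
```

The `vendor` and `product` values map onto the `Vendor` and `Product` nodes, while `uri` and `version` map onto the `CPE` node.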
210
+ ### JSONL Data Format
211
+
212
+ Each record has the following structure:
213
+
214
+ ```json
215
+ {
216
+ "source": "nvd_cve",
217
+ "package": "CVE-2021-44228",
218
+ "collected_at": "2025-10-15T12:34:56.789012",
219
+ "payload": {
220
+ "vulnerabilities": [
221
+ {
222
+ "cve": {
223
+ "id": "CVE-2021-44228",
224
+ "sourceIdentifier": "cve@mitre.org",
225
+ "published": "2021-12-10T10:15:09.000",
226
+ "lastModified": "2021-12-14T01:15:00.000",
227
+ "vulnStatus": "Analyzed",
228
+ "descriptions": [...],
229
+ "metrics": {...},
230
+ "weaknesses": [...],
231
+ "configurations": [...],
232
+ "references": [...]
233
+ }
234
+ }
235
+ ],
236
+ "total_results": 1
237
+ },
238
+ "metadata": {
239
+ "description": "Log4Shell - Apache Log4j2 RCE",
240
+ "target_id": "CVE-2021-44228"
241
+ }
242
+ }
243
+ ```
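Given the record layout above, a reader only needs to walk `payload.vulnerabilities[*].cve`. The sketch below pulls out every CVE id from an iterable of JSONL lines. The function name `iter_cve_ids` is hypothetical, and the field names are taken from the example record rather than from the loader's source.

```python
import json
from typing import Iterable, Iterator


def iter_cve_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield every CVE id found in JSONL records shaped like the example."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for item in record.get("payload", {}).get("vulnerabilities", []):
            cve_id = item.get("cve", {}).get("id")
            if cve_id:
                yield cve_id
```

Since an open file object is itself an iterable of lines, it can be passed directly as `lines`.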
244
+
245
+ ### Configuration Schema
246
+
247
+ **cve_config.yaml**:
248
+ ```yaml
249
+ cutoff_date: "2021-12-31T23:59:59"
250
+ output_dir: "data/raw"
251
+ request_timeout: 30
252
+ rate_limit_sleep: 6.0
253
+ nvd_api_key: "optional-api-key" # or the NVD_API_KEY environment variable
254
+
255
+ cve_targets:
256
+   - id: "CVE-2021-44228"
257
+     description: "Log4Shell"
258
+   - id: "keyword:log4j"
259
+     description: "All Log4j CVEs"
260
+   - id: "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"
261
+     description: "Log4j 2.14.1 specific"
262
+ ```
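The `cve_targets` entries above encode the collection mode in the `id` prefix: a bare CVE id, `keyword:`, or `cpe:`. A minimal sketch of that dispatch follows. The function name `classify_target` and the mode labels are illustrative assumptions, not names from the package.

```python
from typing import Tuple


def classify_target(target_id: str) -> Tuple[str, str]:
    """Map a cve_targets id to (mode, value); mode names are illustrative."""
    if target_id.startswith("keyword:"):
        return "keyword", target_id[len("keyword:"):]
    if target_id.startswith("cpe:"):
        # The CPE string is passed through whole, prefix included.
        return "cpe", target_id
    return "cve_id", target_id
```

A caller could use the returned mode to choose among `collect_by_keyword`, `collect_by_cpe`, and `collect_by_cve_id`.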
263
+
264
+ ## Error Handling
265
+
266
+ ### Data Collection Errors
267
+
268
+ **Strategy**: Fail-safe with logging
269
+
270
+ 1. **Network Errors**:
271
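+    - Handled by the _request method of BaseDataSource
272
+    - Retries automatically on 429 (Rate Limit)
273
+    - Other HTTP errors raise DataSourceError
274
+
275
+ 2. **CVE Not Found**:
276
+    - Raises DataSourceError
277
+    - The error is logged and collection proceeds to the next target
278
+
279
+ 3. **Invalid Response**:
280
+    - Raises an exception when JSON parsing fails
281
+    - The error is logged and collection proceeds to the next target
282
+
283
+ 4. **Timeout**:
284
+    - requests timeout setting (default: 30 seconds)
285
+    - A timeout raises an exception and is logged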
286
+
287
+ **Error Logging**:
288
+ ```python
289
+ try:
290
+     result = cve_source.collect(cve_id, cutoff=cutoff_date)
291
+     collected_count += 1
292
+ except Exception as e:
293
+     logger.error(f"Error collecting {cve_id}: {e}")
294
+     error_count += 1
295
+     continue
296
+ ```
297
+
298
+ ### Neo4j Loading Errors
299
+
300
+ **Strategy**: Transaction-based with constraint handling
301
+
302
+ 1. **Connection Errors**:
303
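+    - The Neo4j driver attempts to reconnect automatically
304
+    - A clear error message is printed when the connection fails
305
+
306
+ 2. **Constraint Violations**:
307
+    - MERGE prevents duplicates
308
+    - Constraint creation failures are treated as warnings (the constraint may already exist)
309
+
310
+ 3. **Invalid Data**:
311
+    - Records without a CVE ID are skipped
312
+    - A warning is logged
313
+
314
+ 4. **Cypher Query Errors**:
315
+    - Each CVE load is wrapped in try-except to isolate failures
316
+    - A single failing CVE does not stop the whole process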
317
+
318
+ **Error Logging**:
319
+ ```python
320
+ try:
321
+     self.load_cve(cve_data)
322
+     count += 1
323
+ except Exception as e:
324
+     logger.error(f"Error loading CVE: {e}")
325
+ ```
326
+
327
+ ## Testing Strategy
328
+
329
+ ### Unit Testing
330
+
331
+ **CVEDataSource Tests**:
332
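+ - Test each collection method with mocked NVD API responses
333
+ - Verify the cutoff_date filtering logic
334
+ - Verify rate limit handling
335
+ - Verify the behavioral differences with and without an API key
336
+
337
+ **CVEGraphLoader Tests**:
338
+ - Test the load logic with a mocked Neo4j driver
339
+ - Verify the CPE parsing logic
340
+ - Verify Cypher query generation
341
+ - Verify duplicate data handling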
342
+
343
+ ### Integration Testing
344
+
345
+ **End-to-End Data Flow**:
346
+ 1. Collect test CVE data (real NVD API or mock)
347
+ 2. Verify that the JSONL file is created
348
+ 3. Load into a Neo4j test instance
349
+ 4. Verify data integrity with Cypher queries
350
+
351
+ **Sample Verification Queries**:
352
+ ```cypher
353
+ // Count CVE nodes
354
+ MATCH (c:CVE) RETURN count(c)
355
+
356
+ // Inspect Log4Shell relationships
357
+ MATCH (c:CVE {id: 'CVE-2021-44228'})-[r]->(n)
358
+ RETURN type(r), labels(n), n LIMIT 10
359
+
360
+ // Vulnerability counts for a specific vendor
361
+ MATCH (v:Vendor {name: 'apache'})-[:PRODUCES]->(p:Product)
362
+       -[:HAS_VERSION]->(cpe:CPE)<-[:AFFECTS]-(c:CVE)
363
+ RETURN p.name, count(DISTINCT c) as vuln_count
364
+ ORDER BY vuln_count DESC
365
+ ```
366
+
367
+ ### Manual Testing
368
+
369
+ **Test Scenarios**:
370
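+ 1. Run the full pipeline on a small CVE set
371
+ 2. Check the graph visualization in Neo4j Browser
372
+ 3. Explore relationships with a variety of Cypher queries
373
+ 4. Rate limit test (with and without an API key)
374
+ 5. Error scenario tests (invalid CVE ID, network errors, etc.)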
375
+
376
+ ## Performance Considerations
377
+
378
+ ### Data Collection
379
+
380
+ **Bottleneck**: NVD API rate limits
381
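+ - Without an API key: ~10 requests/minute
382
+ - With an API key: ~50 requests/30 seconds
383
+
384
+ **Optimization**:
385
+ - Parallel processing offers limited benefit because of the rate limit
386
+ - Consider batched collection and resumable runs instead (future improvement)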
387
+
388
+ ### Neo4j Loading
389
+
390
+ **Bottleneck**: Cypher query execution time
391
+
392
+ **Optimization**:
393
+ 1. Uniqueness constraints create indexes automatically
394
+ 2. MERGE optimizes the duplicate check
395
+ 3. Batch size is adjustable (currently one transaction per CVE)
396
+
397
+ **Future Improvements**:
398
+ - Batch insert using UNWIND
399
+ - Transaction batching (e.g., 1 transaction per 100 CVEs)
400
+ - Parallel loading with connection pooling
401
+
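The UNWIND and transaction-batching ideas listed above can be sketched as a chunking helper plus one parameterized query. The query text, the property names it sets, and the helper name `batched` are illustrative assumptions and are not taken from the loader.

```python
from typing import Any, Dict, Iterator, List, Sequence

# Illustrative query: one round trip MERGEs a whole batch of CVE nodes.
UNWIND_MERGE_CVE = """
UNWIND $rows AS row
MERGE (c:CVE {id: row.id})
SET c.published = row.published, c.lastModified = row.lastModified
"""


def batched(
    rows: Sequence[Dict[str, Any]], size: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])
```

With the Neo4j Python driver, each batch could then be sent as `session.run(UNWIND_MERGE_CVE, rows=batch)`. That call is not exercised here because it needs a running database.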
402
+ ## Security Considerations
403
+
404
+ 1. **API Key Management**:
405
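+    - Environment variables are recommended
406
+    - If the key is stored in a config file, add that file to .gitignore
407
+
408
+ 2. **Neo4j Credentials**:
409
+    - Pass only as command-line arguments (do not store in config files)
410
+    - Consider using environment variables
411
+
412
+ 3. **Input Validation**:
413
+    - Validate the CVE ID format (CVE-YYYY-NNNNN)
414
+    - URL and string escaping (handled automatically by the Neo4j driver)
415
+
416
+ 4. **Rate Limiting**:
417
+    - Set an appropriate wait time to avoid abusing the NVD API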
418
+
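The CVE ID format check listed under Input Validation can be sketched with a regular expression. One caveat: the `CVE-YYYY-NNNNN` notation shows five digits, but CVE sequence numbers have four or more digits, so the pattern below accepts any length from four up. The names `CVE_ID_PATTERN` and `is_valid_cve_id` are hypothetical.

```python
import re

# "CVE-", a four-digit year, then a sequence number of four or more digits.
CVE_ID_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def is_valid_cve_id(value: str) -> bool:
    """Return True when `value` is a syntactically valid CVE identifier."""
    return CVE_ID_PATTERN.fullmatch(value) is not None
```

Using `fullmatch` rejects trailing text such as a newline or an appended query fragment.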
419
+ ## Deployment Considerations
420
+
421
+ ### Prerequisites
422
+
423
+ 1. **Python Environment**:
424
+    - Python 3.10+
425
+    - Dependencies: pyyaml, requests, tqdm, neo4j
426
+
427
+ 2. **Neo4j Installation**:
428
+    - Neo4j Desktop or Docker
429
+    - Minimum version: 4.x+
430
+    - Recommended: 5.x+ (supports the constraint syntax)
431
+
432
+ 3. **NVD API Key** (optional):
433
+    - https://nvd.nist.gov/developers/request-an-api-key
434
+
435
+ ### Installation Steps
436
+
437
+ ```bash
438
+ # 1. Install dependencies
439
+ pip install -r requirements.txt
440
+
441
+ # 2. Install Neo4j (Docker example)
442
+ docker run -d \
443
+   --name neo4j \
444
+   -p 7474:7474 -p 7687:7687 \
445
+   -e NEO4J_AUTH=neo4j/password \
446
+   neo4j:latest
447
+
448
+ # 3. Collect CVE data
449
+ python scripts/collection/collect_cve_data.py config/cve_config.yaml
450
+
451
+ # 4. Load into Neo4j
452
+ python scripts/loading/load_cve_to_neo4j.py data/raw/cve_data.jsonl --password password
453
+ ```
454
+
455
+ ### Monitoring
456
+
457
+ **Logging**:
458
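+ - All scripts use Python logging
459
+ - The log level is adjustable (--log-level)
460
+ - Collection and load statistics are printed automatically
461
+
462
+ **Metrics to Track**:
463
+ - Number of CVEs collected
464
+ - Number of collection failures
465
+ - Neo4j load time
466
+ - Number of nodes and relationships created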
467
+
468
+ ## Future Enhancements
469
+
470
+ 1. **Incremental Updates**:
471
+    - Skip CVEs that have already been collected
472
+    - Detect updates based on lastModified
473
+
474
+ 2. **Advanced Graph Queries**:
475
+    - Vulnerability propagation path analysis
476
+    - Clustering of similar CVEs
477
+    - Vulnerability trends over time
478
+
479
+ 3. **Integration with Existing Pipeline**:
480
+    - Link package data to CVEs
481
+    - Map GitHub issues/PRs to CVEs
482
+
483
+ 4. **Visualization**:
484
+    - Neo4j Bloom integration
485
+    - Custom dashboard development
486
+
487
+ 5. **Performance**:
488
+    - Asynchronous data collection
489
+    - Batched Neo4j loading
490
+    - Add a caching layer
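The Incremental Updates item in the list above proposes skipping already-collected CVEs and detecting changes through `lastModified`. A minimal sketch of that comparison follows, assuming timestamps in the format shown in the JSONL example (for instance `2021-12-14T01:15:00.000`). The function name `needs_update` is hypothetical, and how the stored timestamp is looked up is left open.

```python
from datetime import datetime
from typing import Dict, Optional


def needs_update(cve: Dict[str, str], stored_last_modified: Optional[str]) -> bool:
    """Decide whether a fetched CVE should be (re)loaded."""
    if stored_last_modified is None:
        return True  # never collected before
    fetched = datetime.fromisoformat(cve["lastModified"])
    stored = datetime.fromisoformat(stored_last_modified)
    # Reload only when the fetched record is strictly newer.
    return fetched > stored
```

A record whose `lastModified` matches the stored value is skipped, which keeps repeated runs cheap.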