EvoScientist 0.1.0rc1-py3-none-any.whl → 0.1.0rc2-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (108)
  1. EvoScientist/EvoScientist.py +1 -1
  2. EvoScientist/cli.py +450 -178
  3. EvoScientist/middleware.py +5 -1
  4. EvoScientist/skills/accelerate/SKILL.md +332 -0
  5. EvoScientist/skills/accelerate/references/custom-plugins.md +453 -0
  6. EvoScientist/skills/accelerate/references/megatron-integration.md +489 -0
  7. EvoScientist/skills/accelerate/references/performance.md +525 -0
  8. EvoScientist/skills/bitsandbytes/SKILL.md +411 -0
  9. EvoScientist/skills/bitsandbytes/references/memory-optimization.md +521 -0
  10. EvoScientist/skills/bitsandbytes/references/qlora-training.md +521 -0
  11. EvoScientist/skills/bitsandbytes/references/quantization-formats.md +447 -0
  12. EvoScientist/skills/clip/SKILL.md +253 -0
  13. EvoScientist/skills/clip/references/applications.md +207 -0
  14. EvoScientist/skills/find-skills/SKILL.md +133 -0
  15. EvoScientist/skills/find-skills/scripts/install_skill.py +211 -0
  16. EvoScientist/skills/flash-attention/SKILL.md +367 -0
  17. EvoScientist/skills/flash-attention/references/benchmarks.md +215 -0
  18. EvoScientist/skills/flash-attention/references/transformers-integration.md +293 -0
  19. EvoScientist/skills/langgraph-docs/SKILL.md +36 -0
  20. EvoScientist/skills/llama-cpp/SKILL.md +258 -0
  21. EvoScientist/skills/llama-cpp/references/optimization.md +89 -0
  22. EvoScientist/skills/llama-cpp/references/quantization.md +213 -0
  23. EvoScientist/skills/llama-cpp/references/server.md +125 -0
  24. EvoScientist/skills/lm-evaluation-harness/SKILL.md +490 -0
  25. EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  26. EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  27. EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  28. EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  29. EvoScientist/skills/ml-paper-writing/SKILL.md +937 -0
  30. EvoScientist/skills/ml-paper-writing/references/checklists.md +361 -0
  31. EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  32. EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  33. EvoScientist/skills/ml-paper-writing/references/sources.md +159 -0
  34. EvoScientist/skills/ml-paper-writing/references/writing-guide.md +476 -0
  35. EvoScientist/skills/ml-paper-writing/templates/README.md +251 -0
  36. EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  37. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  38. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  39. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  40. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  41. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  42. EvoScientist/skills/ml-paper-writing/templates/acl/README.md +50 -0
  43. EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  44. EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  45. EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  46. EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  47. EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  48. EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  49. EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  50. EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  51. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  52. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  53. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  54. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  55. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  56. EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  57. EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  58. EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  59. EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  60. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  61. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  62. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  63. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  64. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  65. EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  66. EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  67. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  68. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  69. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  70. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  71. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  72. EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  73. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  74. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  75. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  76. EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  77. EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  78. EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  79. EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  80. EvoScientist/skills/peft/SKILL.md +431 -0
  81. EvoScientist/skills/peft/references/advanced-usage.md +514 -0
  82. EvoScientist/skills/peft/references/troubleshooting.md +480 -0
  83. EvoScientist/skills/ray-data/SKILL.md +326 -0
  84. EvoScientist/skills/ray-data/references/integration.md +82 -0
  85. EvoScientist/skills/ray-data/references/transformations.md +83 -0
  86. EvoScientist/skills/skill-creator/LICENSE.txt +202 -0
  87. EvoScientist/skills/skill-creator/SKILL.md +356 -0
  88. EvoScientist/skills/skill-creator/references/output-patterns.md +82 -0
  89. EvoScientist/skills/skill-creator/references/workflows.md +28 -0
  90. EvoScientist/skills/skill-creator/scripts/init_skill.py +303 -0
  91. EvoScientist/skills/skill-creator/scripts/package_skill.py +110 -0
  92. EvoScientist/skills/skill-creator/scripts/quick_validate.py +95 -0
  93. EvoScientist/skills/tensorboard/SKILL.md +629 -0
  94. EvoScientist/skills/tensorboard/references/integrations.md +638 -0
  95. EvoScientist/skills/tensorboard/references/profiling.md +545 -0
  96. EvoScientist/skills/tensorboard/references/visualization.md +620 -0
  97. EvoScientist/skills/vllm/SKILL.md +364 -0
  98. EvoScientist/skills/vllm/references/optimization.md +226 -0
  99. EvoScientist/skills/vllm/references/quantization.md +284 -0
  100. EvoScientist/skills/vllm/references/server-deployment.md +255 -0
  101. EvoScientist/skills/vllm/references/troubleshooting.md +447 -0
  102. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/METADATA +26 -3
  103. evoscientist-0.1.0rc2.dist-info/RECORD +119 -0
  104. evoscientist-0.1.0rc1.dist-info/RECORD +0 -21
  105. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/WHEEL +0 -0
  106. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/entry_points.txt +0 -0
  107. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/licenses/LICENSE +0 -0
  108. {evoscientist-0.1.0rc1.dist-info → evoscientist-0.1.0rc2.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,213 @@
# GGUF Quantization Guide

Complete guide to GGUF quantization formats and model conversion.

## Quantization Overview

**GGUF** (GPT-Generated Unified Format) is the standard model format for llama.cpp.

### Format Comparison

| Format | Perplexity | Size (7B) | Tokens/sec | Notes |
|--------|------------|-----------|------------|-------|
| FP16 | 5.9565 (baseline) | 13.0 GB | 15 tok/s | Original quality |
| Q8_0 | 5.9584 (+0.03%) | 7.0 GB | 25 tok/s | Nearly lossless |
| **Q6_K** | 5.9642 (+0.13%) | 5.5 GB | 30 tok/s | Best quality/size |
| **Q5_K_M** | 5.9796 (+0.39%) | 4.8 GB | 35 tok/s | Balanced |
| **Q4_K_M** | 6.0565 (+1.68%) | 4.1 GB | 40 tok/s | **Recommended** |
| Q4_K_S | 6.1125 (+2.62%) | 3.9 GB | 42 tok/s | Faster, lower quality |
| Q3_K_M | 6.3184 (+6.07%) | 3.3 GB | 45 tok/s | Small models only |
| Q2_K | 6.8673 (+15.3%) | 2.7 GB | 50 tok/s | Not recommended |

**Recommendation**: Use **Q4_K_M** for the best balance of quality and speed.

## Converting Models

### HuggingFace to GGUF

```bash
# 1. Download the HuggingFace model
huggingface-cli download meta-llama/Llama-2-7b-chat-hf \
    --local-dir models/llama-2-7b-chat/

# 2. Convert to FP16 GGUF
python convert_hf_to_gguf.py \
    models/llama-2-7b-chat/ \
    --outtype f16 \
    --outfile models/llama-2-7b-chat-f16.gguf

# 3. Quantize to Q4_K_M
./llama-quantize \
    models/llama-2-7b-chat-f16.gguf \
    models/llama-2-7b-chat-Q4_K_M.gguf \
    Q4_K_M
```

### Batch quantization

```bash
# Quantize to multiple formats
for quant in Q4_K_M Q5_K_M Q6_K Q8_0; do
    ./llama-quantize \
        model-f16.gguf \
        model-${quant}.gguf \
        $quant
done
```

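After converting or quantizing, it is worth a quick smoke test before going further. A minimal sketch using `llama-cli` (the model path and prompt are placeholders):

```bash
# Sanity check: load the quantized model and generate a few tokens.
# Coherent output suggests the conversion and quantization succeeded.
./llama-cli \
    -m models/llama-2-7b-chat-Q4_K_M.gguf \
    -p "The capital of France is" \
    -n 32
```
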
## K-Quantization Methods

**K-quants** use mixed precision for better quality:
- Attention weights: higher precision
- Feed-forward weights: lower precision

**Variants**:
- `_S` (Small): Faster, lower quality
- `_M` (Medium): Balanced (recommended)
- `_L` (Large): Better quality, larger size

**Example**: `Q4_K_M`
- `Q4`: 4-bit quantization
- `K`: Mixed-precision (K-quant) method
- `M`: Medium quality

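The exact set of K-quant types depends on your llama.cpp build. As a quick check (an assumption about the tool's usage output, which varies by version), invoking `llama-quantize` without the required arguments prints the supported type names and their approximate bits-per-weight:

```bash
# Print usage, including the list of supported quantization types
# (exact output format differs between llama.cpp versions)
./llama-quantize 2>&1 | head -n 60
```
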
## Quality Testing

```bash
# Calculate perplexity (quality metric)
./llama-perplexity \
    -m model.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    -c 512

# Lower perplexity = better quality
# Baseline (FP16): ~5.96
# Q4_K_M: ~6.06 (+1.7%)
# Q2_K: ~6.87 (+15.3% - too much degradation)
```

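To compare several quantization levels of the same model in one pass, the perplexity run can be looped over the files produced by the batch quantization step above (a sketch; file names follow the earlier naming):

```bash
# Measure perplexity for each quantized variant and keep a log
for quant in Q4_K_M Q5_K_M Q6_K Q8_0; do
    echo "=== ${quant} ==="
    ./llama-perplexity \
        -m model-${quant}.gguf \
        -f wikitext-2-raw/wiki.test.raw \
        -c 512
done | tee perplexity-comparison.log
```
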
## Use Case Guide

### General purpose (chatbots, assistants)
```
Q4_K_M - Best balance
Q5_K_M - If you have extra RAM
```

### Code generation
```
Q5_K_M or Q6_K - Higher precision helps with code
```

### Creative writing
```
Q4_K_M - Sufficient quality
Q3_K_M - Acceptable for draft generation
```

### Technical/medical
```
Q6_K or Q8_0 - Maximum accuracy
```

### Edge devices (Raspberry Pi)
```
Q2_K or Q3_K_S - Fit in limited RAM
```

## Model Size Scaling

### 7B parameter models

| Format | Size | RAM needed |
|--------|------|------------|
| Q2_K | 2.7 GB | 5 GB |
| Q3_K_M | 3.3 GB | 6 GB |
| Q4_K_M | 4.1 GB | 7 GB |
| Q5_K_M | 4.8 GB | 8 GB |
| Q6_K | 5.5 GB | 9 GB |
| Q8_0 | 7.0 GB | 11 GB |

### 13B parameter models

| Format | Size | RAM needed |
|--------|------|------------|
| Q2_K | 5.1 GB | 8 GB |
| Q3_K_M | 6.2 GB | 10 GB |
| Q4_K_M | 7.9 GB | 12 GB |
| Q5_K_M | 9.2 GB | 14 GB |
| Q6_K | 10.7 GB | 16 GB |

### 70B parameter models

| Format | Size | RAM needed |
|--------|------|------------|
| Q2_K | 26 GB | 32 GB |
| Q3_K_M | 32 GB | 40 GB |
| Q4_K_M | 41 GB | 48 GB |
| Q4_K_S | 39 GB | 46 GB |
| Q5_K_M | 48 GB | 56 GB |

**Recommendation for 70B**: Use Q3_K_M or Q4_K_S to fit on consumer hardware.

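These sizes follow from simple bits-per-weight arithmetic: file size ≈ parameter count × bits per weight / 8. A minimal sketch for estimating other model sizes (the bits-per-weight figures in the comments are approximations, not exact values):

```bash
# Rough GGUF size estimate: params (billions) x bits-per-weight / 8 = GB
# Approximate bpw: Q4_K_M ~4.85, Q8_0 ~8.5 (assumed, version-dependent)
estimate_gguf_size() {  # usage: estimate_gguf_size <params_in_billions> <bpw>
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f GB\n", p * b / 8 }'
}

estimate_gguf_size 7 4.85    # Q4_K_M at 7B  -> ~4.2 GB
estimate_gguf_size 70 4.85   # Q4_K_M at 70B -> ~42 GB
```

Leave headroom above the file size itself: the RAM column in the tables above includes context and runtime overhead on top of the weights.
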
## Finding Pre-Quantized Models

**TheBloke** on HuggingFace:
- https://huggingface.co/TheBloke
- Most models available in all GGUF formats
- No conversion needed

**Example**:
```bash
# Download pre-quantized Llama 2-7B
huggingface-cli download \
    TheBloke/Llama-2-7B-Chat-GGUF \
    llama-2-7b-chat.Q4_K_M.gguf \
    --local-dir models/
```

## Importance Matrices (imatrix)

**What**: Calibration data used to improve quantization quality.

**Benefits**:
- 10-20% perplexity improvement with Q4
- Essential for Q3 and below

**Usage**:
```bash
# 1. Generate importance matrix
./llama-imatrix \
    -m model-f16.gguf \
    -f calibration-data.txt \
    -o model.imatrix

# 2. Quantize with imatrix
./llama-quantize \
    --imatrix model.imatrix \
    model-f16.gguf \
    model-Q4_K_M.gguf \
    Q4_K_M
```

**Calibration data** (see the assembly sketch below):
- Use domain-specific text (e.g., code for code models)
- ~100 MB of representative text
- Higher-quality data = better quantization

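One way to assemble `calibration-data.txt` is simply to concatenate representative text from the target domain. A minimal sketch (the corpus paths are placeholders; `head -c` assumes GNU coreutils):

```bash
# Concatenate domain-representative text and cap it at roughly 100 MB
cat corpus/*.txt | head -c 100M > calibration-data.txt
ls -lh calibration-data.txt
```
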
## Troubleshooting

**Model outputs gibberish**:
- Quantization too aggressive (e.g., Q2_K)
- Try Q4_K_M or Q5_K_M instead
- Verify the model converted correctly

**Out of memory** (the mitigations combine; see the sketch below):
- Use a more aggressive quantization (Q4_K_S instead of Q5_K_M)
- Offload fewer layers to the GPU (`-ngl`)
- Use a smaller context (`-c 2048`)

**Slow inference**:
- Higher-precision formats use more compute per token
- Q8_0 is much slower than Q4_K_M
- Consider the speed vs. quality trade-off
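For the out-of-memory case, a minimal sketch combining those mitigations in one run (model name and prompt are placeholders):

```bash
# Reduce memory pressure: smaller quant, fewer GPU layers, shorter context
./llama-cli \
    -m model-Q4_K_S.gguf \
    -ngl 20 \
    -c 2048 \
    -p "Hello" \
    -n 64
```
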
@@ -0,0 +1,125 @@
# Server Deployment Guide

Production deployment of the llama.cpp server with an OpenAI-compatible API.

## Server Modes

### llama-server

```bash
# Basic server
./llama-server \
    -m models/llama-2-7b-chat.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 4096  # Context size

# With GPU acceleration
./llama-server \
    -m models/llama-2-70b.Q4_K_M.gguf \
    -ngl 40  # Offload 40 layers to GPU
```

## OpenAI-Compatible API

### Chat completions
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2",
    "messages": [
      {"role": "system", "content": "You are helpful"},
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

### Streaming
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2",
    "messages": [{"role": "user", "content": "Count to 10"}],
    "stream": true
  }'
```

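Besides the OpenAI-style routes, llama-server also exposes its native completion endpoint; a quick sketch (field names taken from the upstream llama.cpp server docs, so verify against your build):

```bash
# Native llama.cpp endpoint: raw prompt in, up to n_predict tokens out
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 64
  }'
```
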
## Docker Deployment

**Dockerfile**:
```dockerfile
# A CUDA-capable base image is required to compile with LLAMA_CUDA=1
# (plain ubuntu:22.04 lacks the CUDA toolkit / nvcc)
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y git build-essential
RUN git clone https://github.com/ggerganov/llama.cpp
WORKDIR /llama.cpp
RUN make LLAMA_CUDA=1
COPY models/ /models/
EXPOSE 8080
CMD ["./llama-server", "-m", "/models/model.gguf", "--host", "0.0.0.0", "--port", "8080"]
```

**Build and run**:
```bash
docker build -t llama-cpp .
docker run --gpus all -p 8080:8080 llama-cpp:latest
```

## Monitoring

```bash
# Server metrics endpoint
curl http://localhost:8080/metrics

# Health check
curl http://localhost:8080/health
```

**Metrics**:
- requests_total
- tokens_generated
- prompt_tokens
- completion_tokens
- kv_cache_tokens

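For a quick liveness probe outside a full monitoring stack, the `/health` endpoint shown above can be polled in a loop; a minimal sketch:

```bash
# Poll /health every 10 seconds and log the result with a timestamp
while true; do
    if curl -fsS http://localhost:8080/health > /dev/null; then
        echo "$(date -Is) OK"
    else
        echo "$(date -Is) DOWN"
    fi
    sleep 10
done
```
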
## Load Balancing

**NGINX**:
```nginx
upstream llama_cpp {
    server llama1:8080;
    server llama2:8080;
}

server {
    location / {
        proxy_pass http://llama_cpp;
        proxy_read_timeout 300s;
    }
}
```

## Performance Tuning

**Parallel requests**:
```bash
./llama-server \
    -m model.gguf \
    -np 4  # 4 parallel slots
```

**Continuous batching**:
```bash
./llama-server \
    -m model.gguf \
    --cont-batching  # Enable continuous batching
```

**Context caching**:
```bash
./llama-server \
    -m model.gguf \
    --cache-prompt  # Cache processed prompts
```
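
Putting the pieces together, a single production invocation might combine the flags documented above (a sketch only; check `./llama-server --help` for the exact flag set in your build, since options change between versions):

```bash
# Combined example: quantized model, full context, GPU offload,
# four parallel slots, continuous batching
./llama-server \
    -m models/llama-2-7b-chat.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 4096 \
    -ngl 40 \
    -np 4 \
    --cont-batching
```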