deepseek-coder-agent-cli 1.0.53 → 1.0.55
- package/README.md +28 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -159,6 +159,34 @@ For any serious coder - not a stupid computer science student at Anthropic Acade

An AI that just says "done, should work now" with no verification steps is an AI that hallucinates success. DeepSeek CLI forces the model to commit to specific, testable claims about what it just did.

+**Why this matters even with the best models:**
+
+Even when Opus 4.5 - Anthropic's most capable model - has full contextual understanding of what it just did, it **neglects to generate next steps** in Claude Code. Why? Because Anthropic simply did not require Claude to generate them. The system prompt doesn't enforce it. The completion detection doesn't check for it.
+
+The model *knows* what verification steps would be appropriate. It has the full context. But it doesn't output them because nothing in Claude Code's architecture demands it.
+
+This leads to **extremely hallucinatory outcomes** when users (understandably) assume the AI's "done" means 100% certainty:
+
+```
+Claude Code: "Fixed the bug."
+User: [assumes it's fixed, deploys to production]
+Production: [crashes]
+```
+
+The model wasn't lying - it believed it fixed the bug based on its edits. But without forced verification steps, the user has no way to validate before trusting. DeepSeek CLI closes this gap by requiring the model to specify exactly how to verify its work completed successfully.
+
+### 3. Only 100% Wins Get Documented
+
+While DeepSeek CLI has many under-the-hood upgrades - both shipped and potential - Bo Shang only writes about **100% wins on all tries**: the ones already uploaded to YouTube.
+
+This isn't cherry-picking. This is quality control.
+
+If a feature doesn't work 100% of the time in real usage, it doesn't get documented as a win. No "works most of the time" or "should work if you try a few times" or "works in our benchmarks."
+
+Bo will spend the next 10 minutes looking for win videos on Cursor and Antigravity, since Codex CLI and Claude Code are already finished - their limitations are fully documented above.
+
+The point: when you see a claimed capability in DeepSeek CLI, it's because it was demonstrated working completely, recorded, and uploaded. Not theorized. Not benchmarked in isolation. Actually working, on video, for real tasks.
+
### The o4-mini Potential
If adapted for DeepSeek CLI, o4-mini could offer the same reasoning capabilities as Codex CLI 5.2 xhigh but without the sandbox prison. The insights from making o4-mini work in an unrestricted environment would benefit all coding agent development - you learn what's actually possible when you remove artificial limitations.
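
The "forced verification" gate the README hunk above describes is easy to picture as a check in the agent loop. The TypeScript below is a minimal sketch of that idea, not code from this package: `AgentTurn`, `gateCompletion`, and the heuristic regex hints are all invented names for illustration.

```ts
// Minimal sketch of a completion gate (illustrative; not the actual
// deepseek-coder-agent-cli implementation).

interface AgentTurn {
  text: string;     // the model's message for this turn
  isFinal: boolean; // true when the model claims the task is complete
}

// Heuristic patterns suggesting the model committed to a testable claim,
// e.g. "run `npm test`" or "verify the output contains ...".
const VERIFICATION_HINTS: RegExp[] = [
  /run\s+`[^`]+`/i,
  /\bverify\b/i,
  /\bexpected (output|result)\b/i,
  /\bcheck that\b/i,
];

function hasVerificationSteps(text: string): boolean {
  return VERIFICATION_HINTS.some((pattern) => pattern.test(text));
}

// Accept a final turn only if it names concrete checks; otherwise send it
// back with an instruction to add verification steps.
function gateCompletion(turn: AgentTurn): { accept: boolean; followUp?: string } {
  if (!turn.isFinal || hasVerificationSteps(turn.text)) {
    return { accept: true };
  }
  return {
    accept: false,
    followUp:
      "Do not report completion yet. List the exact commands to run and the " +
      "output that proves this change works.",
  };
}

// A bare "Fixed the bug." is bounced back to the model:
// gateCompletion({ text: "Fixed the bug.", isFinal: true })
//   -> { accept: false, followUp: "Do not report completion yet. ..." }
```

Under a gate like this, the "Fixed the bug." transcript quoted in the README could not reach the user without an accompanying way to check it.
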
package/package.json
CHANGED