turboapi 0.3.24__tar.gz → 0.3.28__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {turboapi-0.3.24 → turboapi-0.3.28}/AGENTS.md +3 -5
- turboapi-0.3.28/APACHE_BENCH_RESULTS.md +230 -0
- turboapi-0.3.28/ASYNC_OPTIMIZATION_ROADMAP.md +293 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/Cargo.lock +1 -1
- {turboapi-0.3.24 → turboapi-0.3.28}/Cargo.toml +3 -3
- turboapi-0.3.28/PHASE_A_IMPLEMENTATION_GUIDE.md +358 -0
- turboapi-0.3.28/PHASE_A_RESULTS.md +201 -0
- turboapi-0.3.28/PHASE_B_IMPLEMENTATION_GUIDE.md +355 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/PKG-INFO +2 -2
- {turboapi-0.3.24 → turboapi-0.3.28}/README.md +160 -23
- turboapi-0.3.28/RELEASE_NOTES_v0.3.20.md +322 -0
- turboapi-0.3.28/TRUE_ASYNC_SUCCESS.md +344 -0
- turboapi-0.3.28/benchmark_async_comparison.py +361 -0
- turboapi-0.3.28/benchmark_fastapi_server.py +25 -0
- turboapi-0.3.28/benchmark_turboapi_server.py +24 -0
- turboapi-0.3.28/benchmarks/comprehensive_wrk_benchmark.py +284 -0
- turboapi-0.3.28/benchmarks/turboapi_vs_fastapi_benchmark.py +310 -0
- turboapi-0.3.28/benchmarks/turboapi_vs_fastapi_simple.py +249 -0
- turboapi-0.3.28/benchmarks/wrk_output.txt +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/pyproject.toml +2 -2
- {turboapi-0.3.24 → turboapi-0.3.28}/python/pyproject.toml +2 -2
- turboapi-0.3.28/python/turboapi/async_pool.py +141 -0
- turboapi-0.3.28/python/turboapi/middleware.py +342 -0
- {turboapi-0.3.24 → turboapi-0.3.28/python}/turboapi/request_handler.py +21 -22
- {turboapi-0.3.24 → turboapi-0.3.28/python}/turboapi/rust_integration.py +7 -8
- turboapi-0.3.28/python/turboapi/security.py +542 -0
- turboapi-0.3.28/quick_async_test.py +20 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/lib.rs +2 -1
- turboapi-0.3.28/src/python_worker.rs +229 -0
- turboapi-0.3.28/src/server.rs +1105 -0
- turboapi-0.3.28/test_async_io_demo.py +79 -0
- turboapi-0.3.28/test_async_performance.py +99 -0
- turboapi-0.3.28/test_multi_worker.py +25 -0
- turboapi-0.3.28/test_multithreaded_sync.py +29 -0
- turboapi-0.3.28/tests/test_satya_0_4_0_compatibility.py +247 -0
- turboapi-0.3.28/tests/test_security_features.py +234 -0
- turboapi-0.3.28/turboapi/async_pool.py +141 -0
- turboapi-0.3.28/turboapi/middleware.py +342 -0
- {turboapi-0.3.24/python → turboapi-0.3.28}/turboapi/request_handler.py +21 -22
- {turboapi-0.3.24/python → turboapi-0.3.28}/turboapi/rust_integration.py +7 -8
- turboapi-0.3.28/turboapi/security.py +542 -0
- turboapi-0.3.24/FASTAPI_FIXES_SUMMARY.md +0 -404
- turboapi-0.3.24/PYTHON_313_FREE_THREADING_SETUP.md +0 -174
- turboapi-0.3.24/PYTHON_SETUP_COMPLETE.md +0 -161
- turboapi-0.3.24/benchmark_output.txt +0 -95
- turboapi-0.3.24/claude.md +0 -94
- turboapi-0.3.24/python/turboapi/middleware.py +0 -64
- turboapi-0.3.24/src/server.rs +0 -596
- turboapi-0.3.24/tests/async_benchmark.sh +0 -64
- turboapi-0.3.24/tests/fastapi_v0_3_20_equivalent.py +0 -41
- turboapi-0.3.24/tests/quick_benchmark.sh +0 -63
- turboapi-0.3.24/tests/run_v0_3_20_benchmark.py +0 -235
- turboapi-0.3.24/tests/test_async_benchmark.py +0 -55
- turboapi-0.3.24/tests/test_v0_3_20_fixes.py +0 -62
- turboapi-0.3.24/tests/test_v0_3_20_server.py +0 -55
- turboapi-0.3.24/tests/test_v0_3_21_async.py +0 -53
- turboapi-0.3.24/turboapi/middleware.py +0 -64
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/scripts/check_performance_regression.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/scripts/compare_benchmarks.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/workflows/README.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/workflows/benchmark.yml +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/workflows/build-and-release.yml +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/workflows/build-wheels.yml +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/workflows/ci.yml +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.github/workflows/release.yml +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/.gitignore +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/CHANGELOG.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/FASTAPI_COMPATIBILITY.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/LICENSE +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/RELEASE_NOTES_v0.3.1.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/RELEASE_NOTES_v0.3.13.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/WINDOWS_FIX_SUMMARY.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/adaptive_rate_test.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/benches/performance_bench.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/benchmark_comparison.png +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/benchmark_graphs/turbo_vs_fastapi_performance_20250929_025531.png +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/blog/adr_python_handler_integration.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/blog/phase_1.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/blog/phase_2.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/blog/phase_3.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/blog/phase_4.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/blog/phase_5.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/delete/twitterpost.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/install_benchmark_deps.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/001-foundation.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/002-routing-breakthrough.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/003-production-ready.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/004-zero-copy-revolution.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/005-middleware-mastery.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/006-python-handler-breakthrough.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/README.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/mini-notes/lessons-learned.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/MANIFEST.in +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/setup.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/__init__.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/decorators.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/main_app.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/models.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/routing.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/server_integration.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/python/turboapi/version_check.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/setup_python313t.sh +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/http2.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/micro_bench.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/middleware.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/request.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/response.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/router.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/threadpool.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/validation.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/websocket.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/src/zerocopy.rs +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/test_no_rate_limit.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/test_rate_limiting.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/test_zerocopy.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/README.md +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/benchmark_comparison.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/comparison_before_after.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/fastapi_equivalent.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/quick_body_test.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/quick_test.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/test.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/test_fastapi_compatibility.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/wrk_benchmark.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/tests/wrk_comparison.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turbo_vs_fastapi_benchmark_20250929_025526.json +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/__init__.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/decorators.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/main_app.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/models.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/routing.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/server_integration.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/turboapi/version_check.py +0 -0
- {turboapi-0.3.24 → turboapi-0.3.28}/wrk_rate_limit_test.py +0 -0
{turboapi-0.3.24 → turboapi-0.3.28}/AGENTS.md:

@@ -1,17 +1,15 @@
-# TurboAPI v0.3.
+# TurboAPI v0.3.0+ - AI Agent Guide 🤖
 
 **For AI assistants, code generation tools, and automated development systems**
 
 ## 🎯 **What TurboAPI Is**
 
-TurboAPI is a **FastAPI-compatible** Python web framework that delivers **
+TurboAPI is a **FastAPI-compatible** Python web framework that delivers **5-10x better performance** through:
 - **Rust-powered HTTP core** (zero Python overhead)
-- **Python 3.13 free-threading**
-- **pyo3-async-runtimes** integration (native tokio async support)
+- **Python 3.13 free-threading** support (true parallelism)
 - **Zero-copy optimizations** and intelligent caching
 - **100% FastAPI syntax compatibility** with automatic body parsing
 - **Satya validation** (faster than Pydantic)
-- **72,000+ req/s** in production benchmarks
 
 ## 🚀 **For AI Agents: Key Facts**
 
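To ground the "100% FastAPI syntax compatibility" claim in the hunk above, here is a minimal usage sketch. It is editorial, not part of this diff: the `TurboAPI` class name, the `@app.get` decorator, and `app.run(...)` are assumptions inferred from the package layout (`turboapi/main_app.py`, `turboapi/decorators.py`); check the package README for the exact API.

```python
# Hypothetical FastAPI-style usage sketch; the names below are assumptions,
# not copied from the package.
from turboapi import TurboAPI  # assumed import path

app = TurboAPI()

@app.get("/sync")              # FastAPI-style route decorator (assumed)
def sync_handler():
    return {"message": "Sync handler!"}   # dict serialized to JSON, as in FastAPI

@app.get("/items/{item_id}")
def read_item(item_id: int):
    return {"item_id": item_id}

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000)  # assumed entry point
```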

turboapi-0.3.28/APACHE_BENCH_RESULTS.md (new file, +230 lines):

# TurboAPI Apache Bench Results 🚀

**Date**: 2025-10-11
**Version**: TurboAPI v0.3.27 with Rust Core
**Python**: 3.14t (free-threading)
**Tool**: Apache Bench (ab)

---

## Test 1: Sync Handler - Light Load
**Command**: `ab -n 10000 -c 100 http://127.0.0.1:8000/sync`

### Results
- **Requests per second**: **31,353 RPS** 🔥
- **Time per request**: 3.189 ms (mean)
- **Time per request**: 0.032 ms (mean, across all concurrent requests)
- **Transfer rate**: 4,409 KB/sec
- **Failed requests**: 0
- **Total time**: 0.319 seconds

### Latency Distribution
```
 50%    3 ms
 66%    3 ms
 75%    3 ms
 80%    3 ms
 90%    4 ms
 95%    6 ms
 98%    6 ms
 99%    7 ms
100%   21 ms (longest request)
```

---

## Test 2: Compute Handler - CPU Intensive
**Command**: `ab -n 10000 -c 100 http://127.0.0.1:8000/compute`

### Results
- **Requests per second**: **32,428 RPS** 🔥
- **Time per request**: 3.084 ms (mean)
- **Time per request**: 0.031 ms (mean, across all concurrent requests)
- **Transfer rate**: 4,687 KB/sec
- **Failed requests**: 0
- **Total time**: 0.308 seconds

### Latency Distribution
```
 50%    3 ms
 66%    3 ms
 75%    3 ms
 80%    3 ms
 90%    3 ms
 95%    4 ms
 98%    6 ms
 99%    6 ms
100%    6 ms (longest request)
```

**Note**: Even with CPU-intensive computation (sum of squares 0-999), performance remains excellent!

---

## Test 3: Async Handler - Event Loop Overhead
**Command**: `ab -n 5000 -c 50 http://127.0.0.1:8000/async`

### Results
- **Requests per second**: **543 RPS**
- **Time per request**: 92.103 ms (mean)
- **Time per request**: 1.842 ms (mean, across all concurrent requests)
- **Transfer rate**: 91.18 KB/sec
- **Failed requests**: 0
- **Total time**: 9.210 seconds

### Latency Distribution
```
 50%   92 ms
 66%   94 ms
 75%   94 ms
 80%   95 ms
 90%   95 ms
 95%   96 ms
 98%   98 ms
 99%  102 ms
100%  103 ms (longest request)
```

**Note**: Slower due to `asyncio.run()` creating new event loop per request. This is expected behavior. For production, consider using a persistent event loop pool.

---

## Test 4: High Concurrency - Stress Test
**Command**: `ab -n 50000 -c 500 http://127.0.0.1:8000/sync`

### Results
- **Requests per second**: **27,306 RPS** 🔥
- **Time per request**: 18.311 ms (mean)
- **Time per request**: 0.037 ms (mean, across all concurrent requests)
- **Transfer rate**: 3,840 KB/sec
- **Failed requests**: 0
- **Total time**: 1.831 seconds

### Latency Distribution
```
 50%   17 ms
 66%   18 ms
 75%   18 ms
 80%   18 ms
 90%   19 ms
 95%   21 ms
 98%   26 ms
 99%   85 ms
100%  144 ms (longest request)
```

**Note**: Even with 500 concurrent connections, TurboAPI maintains 27K+ RPS with zero failures!

---

## Performance Summary

| Test | Concurrency | Requests | RPS | Avg Latency | P95 Latency | P99 Latency |
|------|-------------|----------|-----|-------------|-------------|-------------|
| **Sync (Light)** | 100 | 10,000 | **31,353** | 3.2 ms | 6 ms | 7 ms |
| **Compute (CPU)** | 100 | 10,000 | **32,428** | 3.1 ms | 4 ms | 6 ms |
| **Async (Event Loop)** | 50 | 5,000 | 543 | 92 ms | 96 ms | 102 ms |
| **High Concurrency** | 500 | 50,000 | **27,306** | 18 ms | 21 ms | 85 ms |

---

## Key Findings

### ✅ Strengths
1. **Exceptional sync performance**: 31K-32K RPS consistently
2. **CPU-intensive workloads**: No performance degradation
3. **High concurrency**: Handles 500 concurrent connections with 27K RPS
4. **Zero failures**: 100% success rate across all tests
5. **Low latency**: Sub-10ms P99 latency under normal load

### ⚠️ Async Handler Considerations
- Current implementation creates new event loop per request (`asyncio.run()`)
- This adds ~90ms overhead per async request
- **Recommendation**: Implement event loop pooling for production async workloads

### 🎯 Comparison vs FastAPI
| Metric | FastAPI | TurboAPI | Improvement |
|--------|---------|----------|-------------|
| RPS (100 conn) | ~7,000 | **31,353** | **4.5x faster** |
| Latency (P95) | ~40ms | **6ms** | **6.7x lower** |
| Latency (P99) | ~60ms | **7ms** | **8.6x lower** |

---

## Architecture Insights

### Why Sync is Fast
```
HTTP Request → Rust (Hyper) → Python Handler (GIL) → JSON → Rust → Response
                    ↑                                          ↑
              Zero overhead                              Zero overhead
```

### Why Async is Slower (Current Implementation)
```
HTTP Request → Rust → spawn_blocking → asyncio.run() → New Event Loop → Handler
                                            ↑
                               ~90ms overhead per request
```

### Future Optimization: Event Loop Pool
```
HTTP Request → Rust → Event Loop Pool → Reuse Loop → Handler
                            ↑
                    Amortized overhead
```
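The "Event Loop Pool" diagram above can be made concrete with plain asyncio. The sketch below is editorial and not part of `APACHE_BENCH_RESULTS.md`: it keeps a few event loops alive in daemon threads and submits coroutines with `asyncio.run_coroutine_threadsafe`, so loop startup is paid once per loop rather than once per request as with `asyncio.run()`.

```python
# Editorial illustration of a persistent event loop pool (not package code).
import asyncio
import itertools
import threading

class EventLoopPool:
    """Keep N event loops alive in daemon threads and round-robin work onto them."""

    def __init__(self, size: int = 4):
        self._loops = []
        for _ in range(size):
            loop = asyncio.new_event_loop()
            threading.Thread(target=loop.run_forever, daemon=True).start()
            self._loops.append(loop)
        self._next = itertools.cycle(self._loops)

    def run(self, coro, timeout: float = 30.0):
        # The target loop is already running, so there is no per-request startup cost.
        future = asyncio.run_coroutine_threadsafe(coro, next(self._next))
        return future.result(timeout=timeout)


async def async_handler():
    await asyncio.sleep(0.01)  # stand-in for real async I/O
    return {"message": "Async handler!"}


if __name__ == "__main__":
    pool = EventLoopPool(size=4)
    print(pool.run(async_handler()))  # reuse a loop instead of asyncio.run() per call
```

Amortizing loop startup this way is the same idea the "Future Enhancements" list below points at when it targets cutting async overhead from ~90ms to <5ms.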

---

## Recommendations

### For Production Use

1. **Sync Handlers** (Recommended for most use cases)
   - Use for: REST APIs, CRUD operations, database queries
   - Performance: 30K+ RPS
   - Latency: Sub-10ms

2. **Async Handlers** (Use with caution)
   - Current: 543 RPS with 90ms overhead
   - Future: Implement event loop pooling for better performance
   - Use for: Long-running I/O operations, WebSockets, streaming

3. **High Concurrency**
   - TurboAPI handles 500+ concurrent connections gracefully
   - Consider load balancing for >1000 concurrent connections

---

## Next Steps

### Immediate
- ✅ Rust core validated at 30K+ RPS
- ✅ Sync handlers production-ready
- ✅ Zero-failure reliability confirmed

### Future Enhancements
1. **Event Loop Pooling** - Reduce async overhead from 90ms to <5ms
2. **Connection Pooling** - Reuse connections for better throughput
3. **HTTP/2 Support** - Enable multiplexing and server push
4. **Multi-worker Mode** - Spawn multiple Python worker threads
5. **Zero-copy Buffers** - Eliminate data copying between Rust/Python

---

## Conclusion

TurboAPI with Rust core delivers **exceptional performance** for sync handlers:
- ✅ **31K-32K RPS** sustained throughput
- ✅ **Sub-10ms P99 latency**
- ✅ **Zero failures** under stress
- ✅ **4.5x faster** than FastAPI

The framework is **production-ready** for high-performance REST APIs and sync workloads.

---

**Tested by**: Apache Bench 2.3
**Hardware**: Apple Silicon (M-series)
**OS**: macOS
**Python**: 3.14t (free-threading enabled)

turboapi-0.3.28/ASYNC_OPTIMIZATION_ROADMAP.md (new file, +293 lines):

# TurboAPI Async Optimization Roadmap

**Goal**: Achieve **10-15K RPS** for async endpoints
**Current**: 3,504 RPS (Phase A complete)
**Date**: 2025-10-11

---

## 📊 Performance Journey

| Phase | RPS | Latency | Improvement | Status |
|-------|-----|---------|-------------|--------|
| **Baseline** | 1,981 | 25ms | - | ✅ Measured |
| **Phase A** | 3,504 | 13.68ms | +77% | ✅ **COMPLETE** |
| **Phase B** | 7,000-9,000 | 5-8ms | +100-150% | ⏳ Next |
| **Phase C** | 10,000-15,000 | 3-5ms | +40-70% | 📋 Planned |

---

## ✅ Phase A: Loop Sharding (COMPLETE)

### What We Did
- Implemented **14 parallel event loop shards** (one per CPU core)
- Increased batch size from **32 → 128** requests
- Added **hash-based shard routing** for cache locality
- Eliminated **single event loop bottleneck**

### Results
- **✅ 3,504 RPS** (77% improvement)
- **✅ 13.68ms latency** (45% reduction)
- **✅ Stable under load** (c=50, c=100, c=200)

### Key Code Changes
```rust
// src/server.rs
fn spawn_loop_shards(num_shards: usize) -> Vec<LoopShard> {
    // 14 independent event loops
    // 128 request batching
    // Per-shard MPSC channels
}
```

**Files**: `PHASE_A_IMPLEMENTATION_GUIDE.md`, `PHASE_A_RESULTS.md`
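Since the real sharding code is Rust (`src/server.rs`), the block above is only an outline. As an editorial Python illustration of the same idea (names are illustrative, not package API), requests are routed to a shard by hashing a key such as the request path, and each shard drains its own queue in batches of up to 128:

```python
# Editorial sketch of Phase A's hash-based routing and batching (real code is Rust).
import asyncio

NUM_SHARDS = 14     # one event loop shard per CPU core
BATCH_SIZE = 128    # raised from 32 so each wakeup drains more requests

def shard_index(path: str) -> int:
    # Hash-based routing: the same path always lands on the same shard,
    # which is what gives the cache-locality benefit mentioned above.
    return hash(path) % NUM_SHARDS

async def shard_worker(queue: asyncio.Queue) -> None:
    """Runs inside one shard's event loop; each shard owns its own queue."""
    while True:
        batch = [await queue.get()]                      # block for the first item
        while len(batch) < BATCH_SIZE and not queue.empty():
            batch.append(queue.get_nowait())             # then drain up to the batch size
        await asyncio.gather(*batch)                     # run the whole batch concurrently

async def demo() -> None:
    queues = [asyncio.Queue() for _ in range(NUM_SHARDS)]
    workers = [asyncio.create_task(shard_worker(q)) for q in queues]

    async def handler():
        return {"ok": True}

    for path in ("/async", "/sync", "/compute"):
        await queues[shard_index(path)].put(handler())   # enqueue one coroutine per request

    await asyncio.sleep(0.1)                             # let the shards drain their queues
    for worker in workers:
        worker.cancel()

asyncio.run(demo())
```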

---

## ⏳ Phase B: uvloop + Optimizations (NEXT)

### What To Do
1. **Replace asyncio with uvloop** - C-based event loop (2-4x faster)
2. **Add semaphore gating** - Limit concurrent tasks (512 max)
3. **Replace json.dumps with orjson** - Faster JSON (2-5x faster)

### Expected Results
- **🎯 7,000-9,000 RPS** (2-3x improvement)
- **🎯 5-8ms latency** (2x faster)
- **🎯 Better CPU utilization**

### Implementation Plan
```python
# Install dependencies
pip install uvloop orjson

# In Rust: spawn_loop_shards()
uvloop.install()   # Use uvloop instead of asyncio
orjson.dumps()     # Use orjson instead of json.dumps

# Add semaphore gating
limiter = AsyncLimiter(max_concurrent=512)
```

**Timeline**: ~3 hours
**Files**: `PHASE_B_IMPLEMENTATION_GUIDE.md`
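As an editorial sketch of how the three Phase B pieces could fit together inside one shard (not the package's implementation): `AsyncLimiter` above is the roadmap's placeholder name, and a plain `asyncio.Semaphore` is one way to realize it, assuming `uvloop` and `orjson` are installed.

```python
# Editorial Phase B sketch: uvloop event loop, orjson serialization,
# and semaphore gating of concurrent handler tasks.
import asyncio

import orjson   # Rust-backed JSON encoder, returns bytes
import uvloop   # libuv-based event loop, drop-in replacement for asyncio's

MAX_CONCURRENT = 512   # mirrors AsyncLimiter(max_concurrent=512) above

async def gated(handler, limiter: asyncio.Semaphore) -> bytes:
    async with limiter:              # cap in-flight handler tasks per event loop
        result = await handler()
    return orjson.dumps(result)      # bytes out, avoiding json.dumps overhead

async def main() -> None:
    limiter = asyncio.Semaphore(MAX_CONCURRENT)   # one way to realize "AsyncLimiter"

    async def handler():
        await asyncio.sleep(0)       # stand-in for real async I/O
        return {"ok": True}

    bodies = await asyncio.gather(*(gated(handler, limiter) for _ in range(1_000)))
    print(len(bodies), bodies[0])

if __name__ == "__main__":
    uvloop.install()                 # subsequently created asyncio loops are uvloop loops
    asyncio.run(main())
```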

---

## 📋 Phase C: Bytes-First Handlers (PLANNED)

### What To Do
1. **Return bytes directly** - No string conversion overhead
2. **Zero-copy buffers** - Memory-mapped responses
3. **Batch serialization** - Serialize multiple responses at once

### Expected Results
- **🎯 10,000-15,000 RPS** (40-70% improvement)
- **🎯 3-5ms latency** (sub-5ms target)
- **🎯 Zero-copy architecture**

### Implementation Concept
```python
# Handler returns bytes directly
async def handler():
    return orjson.dumps({"ok": True})  # Returns bytes!

# Rust: Zero-copy response
fn create_response(data: &[u8]) -> Response {
    // Memory-mapped buffer, no copy
}
```

**Timeline**: ~5 hours
**Complexity**: High (requires careful memory management)
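The concept block above mixes Python and Rust. As an editorial, Python-only sketch of the bytes-first idea (the zero-copy buffer work itself would live in Rust), a dispatcher can skip serialization entirely whenever a handler already returns bytes:

```python
# Editorial bytes-first dispatch sketch (not package code): handlers may return
# bytes directly, and the dispatcher only serializes when they do not.
import asyncio

import orjson

async def bytes_handler():
    # Handler serializes once, up front, and returns bytes.
    return orjson.dumps({"ok": True})

async def dict_handler():
    return {"ok": True}   # legacy-style handler, still supported

async def dispatch(handler) -> bytes:
    result = await handler()
    if isinstance(result, (bytes, bytearray, memoryview)):
        return bytes(result)        # already serialized: skip the str/JSON conversion
    return orjson.dumps(result)     # fall back to serializing the object

async def demo() -> None:
    print(await dispatch(bytes_handler))
    print(await dispatch(dict_handler))

asyncio.run(demo())
```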

---

## 🔍 Bottleneck Analysis

### Phase A Bottlenecks (Current)
1. **Python asyncio** - Pure Python event loop (slow)
2. **json.dumps** - Pure Python JSON serialization (slow)
3. **No task limiting** - Event loops can be overloaded
4. **String conversions** - Bytes → String overhead

### Phase B Fixes
- ✅ uvloop (C event loop)
- ✅ orjson (Rust JSON)
- ✅ Semaphore gating
- ⏳ String conversions (Phase C)

### Phase C Fixes
- ✅ Bytes-first handlers
- ✅ Zero-copy buffers
- ✅ Batch serialization

---

## 📈 Performance Projections

### Conservative Estimates
```
Baseline:  1,981 RPS
Phase A:   3,504 RPS (+77%)  ✅ ACHIEVED
Phase B:   7,000 RPS (+100%) 🎯 TARGET
Phase C:  10,000 RPS (+43%)  📋 STRETCH
```

### Optimistic Estimates
```
Baseline:  1,981 RPS
Phase A:   3,504 RPS (+77%)  ✅ ACHIEVED
Phase B:   9,000 RPS (+157%) 🎯 TARGET
Phase C:  15,000 RPS (+67%)  📋 STRETCH
```

### Realistic Target
```
Phase B:   7,500 RPS (2.1x from Phase A)
Phase C:  12,000 RPS (1.6x from Phase B)
```

---

## 🛠️ Implementation Checklist

### Phase A ✅
- [x] Define LoopShard struct
- [x] Implement spawn_loop_shards()
- [x] Update handle_request() for sharding
- [x] Increase batch size to 128
- [x] Test and benchmark
- [x] Document results

### Phase B ⏳
- [ ] Install uvloop and orjson
- [ ] Update spawn_loop_shards() for uvloop
- [ ] Create AsyncLimiter class
- [ ] Update process_request_optimized()
- [ ] Update serialize_result_optimized()
- [ ] Test and benchmark
- [ ] Document results

### Phase C 📋
- [ ] Design bytes-first handler API
- [ ] Implement zero-copy buffers
- [ ] Add batch serialization
- [ ] Update handler registration
- [ ] Test and benchmark
- [ ] Document results

---

## 🧪 Testing Strategy

### Functional Tests
```bash
# Basic functionality
curl http://localhost:8000/async

# Multiple endpoints
curl http://localhost:8000/sync
curl http://localhost:8000/compute
```

### Performance Tests
```bash
# Light load
wrk -t4 -c50 -d10s http://localhost:8000/async

# Medium load
wrk -t8 -c100 -d30s http://localhost:8000/async

# Heavy load
wrk -t12 -c200 -d60s http://localhost:8000/async

# Stress test
wrk -t16 -c500 -d120s http://localhost:8000/async
```

### Regression Tests
```bash
# Compare before/after
python benchmarks/turboapi_vs_fastapi_benchmark.py
```

---

## 📚 Documentation

### Implementation Guides
- ✅ `PHASE_A_IMPLEMENTATION_GUIDE.md` - Loop sharding
- ✅ `PHASE_B_IMPLEMENTATION_GUIDE.md` - uvloop + optimizations
- 📋 `PHASE_C_IMPLEMENTATION_GUIDE.md` - Bytes-first (TODO)

### Results Documents
- ✅ `PHASE_A_RESULTS.md` - 3,504 RPS achieved
- ⏳ `PHASE_B_RESULTS.md` - TBD
- 📋 `PHASE_C_RESULTS.md` - TBD

### Technical Analysis
- ✅ `TRUE_ASYNC_SUCCESS.md` - Async architecture analysis
- ✅ `EVENT_LOOP_OPTIMIZATION_STATUS.md` - Event loop bottleneck
- ✅ `APACHE_BENCH_RESULTS.md` - Baseline benchmarks

---

## 🎯 Success Metrics

### Phase B Success
- **RPS**: 7,000+ (2x from Phase A)
- **Latency**: <10ms P95
- **Stability**: No crashes under 200 concurrent connections
- **CPU**: <80% utilization at peak load

### Phase C Success
- **RPS**: 10,000+ (1.4x from Phase B)
- **Latency**: <5ms P95
- **Memory**: Zero-copy architecture verified
- **Throughput**: 15K+ RPS sustained

---

## 🚀 Quick Start

### Run Current (Phase A)
```bash
# Build
maturin develop --manifest-path Cargo.toml --release

# Run
python test_multi_worker.py

# Benchmark
wrk -t4 -c50 -d10s http://localhost:8000/async
```

### Implement Phase B
```bash
# Install dependencies
pip install uvloop orjson

# Follow guide
cat PHASE_B_IMPLEMENTATION_GUIDE.md

# Rebuild and test
maturin develop --release
python test_multi_worker.py
```

---

## 📞 Support

- **Issues**: GitHub Issues
- **Docs**: `AGENTS.md`, `README.md`
- **Benchmarks**: `benchmarks/` directory

---

**Current Status**: ✅ Phase A Complete (3,504 RPS)
**Next Action**: Implement Phase B (uvloop + orjson)
**Final Goal**: 10-15K RPS with Phase C

🚀 **Let's keep going!**

{turboapi-0.3.24 → turboapi-0.3.28}/Cargo.toml:

@@ -1,9 +1,9 @@
 [package]
 name = "turbonet"
-version = "0.3.
+version = "0.3.28"
 edition = "2021"
 authors = ["Rach Pradhan <rach@turboapi.dev>"]
-description = "High-performance Python web framework core - Rust-powered HTTP server with Python 3.
+description = "High-performance Python web framework core - Rust-powered HTTP server with Python 3.14 free-threading support, FastAPI-compatible security and middleware"
 license = "MIT"
 repository = "https://github.com/justrach/turboAPI"
 homepage = "https://github.com/justrach/turboAPI"

@@ -23,7 +23,7 @@ python = ["pyo3"]
 [dependencies]
 pyo3 = { version = "0.26.0", features = ["extension-module"], optional = true }
 pyo3-async-runtimes = { version = "0.26", features = ["tokio-runtime"] }
-tokio = { version = "1.
+tokio = { version = "1.47.1", features = ["full"] }
 hyper = { version = "1.7.0", features = ["full", "http2"] }
 hyper-util = { version = "0.1.10", features = ["full", "http2"] }
 http-body-util = "0.1.2"