slick-queue-py 1.1.0__cp38-cp38-win_amd64.whl

@@ -0,0 +1,816 @@
1
+ Metadata-Version: 2.1
2
+ Name: slick-queue-py
3
+ Version: 1.1.0
4
+ Summary: Lock-free MPMC queue with C++ interoperability via shared memory
5
+ Home-page: https://github.com/SlickQuant/slick-queue-py
6
+ Author: Slick Quant
7
+ Author-email: Slick Quant <slickquant@slickquant.com>
8
+ License: MIT
9
+ Project-URL: Homepage, https://github.com/SlickQuant/slick-queue-py
10
+ Project-URL: Documentation, https://github.com/SlickQuant/slick-queue-py#readme
11
+ Project-URL: Repository, https://github.com/SlickQuant/slick-queue-py
12
+ Project-URL: Bug Tracker, https://github.com/SlickQuant/slick-queue-py/issues
13
+ Keywords: queue,lock-free,atomic,shared-memory,ipc,multiprocessing,mpmc
14
+ Classifier: Development Status :: 4 - Beta
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
17
+ Classifier: Topic :: System :: Distributed Computing
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Programming Language :: Python :: 3
20
+ Classifier: Programming Language :: Python :: 3.8
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Programming Language :: C++
26
+ Classifier: Operating System :: Microsoft :: Windows
27
+ Classifier: Operating System :: POSIX :: Linux
28
+ Classifier: Operating System :: MacOS
29
+ Requires-Python: >=3.8
30
+ Description-Content-Type: text/markdown
31
+ License-File: LICENSE
32
+
33
+ # slick-queue-py
34
+
35
+ Python implementation of SlickQueue - a lock-free multi-producer multi-consumer (MPMC) queue with C++ interoperability through shared memory.
36
+
37
+ This is the Python binding for the [SlickQueue C++ library](https://github.com/SlickQuant/slick-queue). The Python implementation maintains exact binary compatibility with the C++ version, enabling seamless interprocess communication between Python and C++ applications.
38
+
39
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
40
+ [![CI](https://github.com/SlickQuant/slick-queue-py/actions/workflows/ci.yml/badge.svg)](https://github.com/SlickQuant/slick-queue-py/actions/workflows/ci.yml)
41
+ [![GitHub release](https://img.shields.io/github/v/release/SlickQuant/slick-queue-py)](https://github.com/SlickQuant/slick-queue-py/releases)
42
+
43
+ ## Features
44
+
45
+ - **Dual Mode Operation**:
46
+   - **Local Memory Mode**: In-process queue using local memory (no shared memory overhead)
47
+   - **Shared Memory Mode**: Shared-memory-backed queue for inter-process communication
48
+ - **Lock-Free Multi-Producer Multi-Consumer**: True MPMC support using atomic operations
49
+ - **C++/Python Interoperability**: Python and C++ processes can share the same queue
50
+ - **Cross-Platform**: Windows and Linux/macOS support (x86-64)
51
+ - **Memory Layout Compatible**: Exact binary compatibility with C++ `slick::SlickQueue<T>`
52
+ - **High Performance**: Hardware atomic operations for minimal overhead
53
+
54
+ ## Requirements
55
+
56
+ - Python 3.8+ (uses `multiprocessing.shared_memory`)
57
+ - 64-bit platform
58
+ - For true lock-free operation: x86-64 CPU with CMPXCHG16B support (most CPUs since 2006)
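The requirements above can be checked before creating a queue. The following is a minimal sketch using only the standard library. `check_requirements` is a hypothetical helper and not part of slick-queue-py, and the `cx16` flag lookup assumes a Linux-style `/proc/cpuinfo`, so the CMPXCHG16B result is `None` wherever that file is missing.

```python
import platform
import struct
import sys
from pathlib import Path


def check_requirements():
    """Return basic checks mirroring the requirements list (hypothetical helper)."""
    checks = {
        "python_3_8_plus": sys.version_info >= (3, 8),
        # Pointer size in bits: 64 on a 64-bit interpreter.
        "is_64_bit": struct.calcsize("P") * 8 == 64,
        "machine": platform.machine(),
        # None means "could not determine" (for example, not on Linux).
        "has_cmpxchg16b": None,
    }
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.is_file():
        try:
            text = cpuinfo.read_text()
        except OSError:
            text = ""
        flag_lines = [line for line in text.splitlines() if line.startswith("flags")]
        if flag_lines:
            # Linux reports CMPXCHG16B support as the "cx16" CPU flag.
            checks["has_cmpxchg16b"] = "cx16" in flag_lines[0].split()
    return checks


if __name__ == "__main__":
    for name, value in check_requirements().items():
        print(f"{name}: {value}")
```

The package's own `check_platform_support()` remains the authoritative check; this sketch only shows what the listed requirements mean in practice.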
59
+
60
+ ## Installation
61
+
62
+ ```bash
63
+ pip install -e .
64
+ ```
65
+
66
+ Or just copy the Python files to your project.
67
+
68
+ ## Quick Start
69
+
70
+ ### Local Memory Mode (Single Process)
71
+
72
+ ```python
73
+ from slick_queue_py import SlickQueue
74
+
75
+ # Create a queue in local memory (no shared memory)
76
+ q = SlickQueue(size=1024, element_size=256)
77
+
78
+ # Producer: Reserve a slot, write data, and publish
79
+ idx = q.reserve()
80
+ buf = q[idx]
81
+ buf[:len(b'hello')] = b'hello'
82
+ q.publish(idx)
83
+
84
+ # Consumer: Read data
85
+ read_index = 0
86
+ data, size, read_index = q.read(read_index)
87
+ if data is not None:
88
+     print(f"Received: {data[:size]}")
89
+
90
+ q.close() # unlink() does nothing for local mode
91
+ ```
92
+
93
+ ### Shared Memory Mode (Multi-Process)
94
+
95
+ ```python
96
+ from slick_queue_py import SlickQueue
97
+
98
+ # Create a new shared memory queue (size must be power of two)
99
+ q = SlickQueue(name='my_queue', size=1024, element_size=256)
100
+
101
+ # Producer: Reserve a slot, write data, and publish
102
+ idx = q.reserve()
103
+ buf = q[idx]
104
+ buf[:len(b'hello')] = b'hello'
105
+ q.publish(idx)
106
+
107
+ # Consumer: Read data
108
+ read_index = 0
109
+ data, size, read_index = q.read(read_index)
110
+ if data is not None:
111
+     print(f"Received: {data[:size]}")
112
+
113
+ q.close()
114
+ q.unlink() # Delete shared memory segment
115
+ ```
116
+
117
+ ### Multi-Producer Usage
118
+
119
+ ```python
120
+ from multiprocessing import Process
121
+ from slick_queue_py import SlickQueue
122
+ import struct
123
+
124
+ def producer_worker(queue_name, worker_id, num_items):
125
+     # Open existing queue
126
+     q = SlickQueue(name=queue_name, element_size=32)
127
+
128
+     for i in range(num_items):
129
+         # Reserve slot (thread-safe with atomic CAS)
130
+         idx = q.reserve(1)
131
+
132
+         # Write unique data
133
+         data = struct.pack("<I I", worker_id, i)
134
+         slot = q[idx]
135
+         slot[:len(data)] = data
136
+
137
+         # Publish (makes data visible to consumers)
138
+         q.publish(idx, 1)
139
+
140
+     q.close()
141
+
142
+ # Create queue
143
+ q = SlickQueue(name='mpmc_queue', size=64, element_size=32)
144
+
145
+ # Start multiple producers
146
+ producers = []
147
+ for i in range(4):
148
+     p = Process(target=producer_worker, args=('mpmc_queue', i, 100))
149
+     p.start()
150
+     producers.append(p)
151
+
152
+ # Wait for completion
153
+ for p in producers:
154
+     p.join()
155
+
156
+ q.close()
157
+ q.unlink()
158
+ ```
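Each producer above writes an 8-byte record (worker id and sequence number) into the start of a 32-byte slot. As a rough illustration of that encoding, the sketch below round-trips one record through a slot-sized buffer. `write_record` and `read_record` are hypothetical helpers, and a `bytearray`-backed `memoryview` stands in for `q[idx]`.

```python
import struct

ELEMENT_SIZE = 32              # matches element_size=32 in the example above
RECORD = struct.Struct("<II")  # same layout as "<I I": two little-endian uint32


def write_record(slot, worker_id, seq):
    """Pack (worker_id, seq) into the first 8 bytes of a writable slot buffer."""
    RECORD.pack_into(slot, 0, worker_id, seq)


def read_record(data):
    """Unpack (worker_id, seq) from the first 8 bytes of a slot."""
    return RECORD.unpack_from(data, 0)


# A bytearray-backed memoryview stands in for q[idx] here (assumption).
slot = memoryview(bytearray(ELEMENT_SIZE))
write_record(slot, worker_id=3, seq=42)
print(read_record(slot))  # (3, 42)
```

Precompiling the format with `struct.Struct` avoids re-parsing the format string on every item, which may matter in a tight producer loop.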
159
+
160
+ ### Multi-Consumer Work-Stealing
161
+
162
+ For multiple consumers sharing work from a single queue, use an `AtomicCursor` to enable work-stealing patterns where each item is consumed by exactly one consumer.
163
+
164
+ #### Local Mode (Multi-Threading)
165
+
166
+ ```python
167
+ from threading import Thread
168
+ from slick_queue_py import SlickQueue, AtomicCursor
169
+ import struct
170
+
171
+ def consumer_worker(q, cursor, worker_id, results):
172
+     items_processed = 0
173
+     while True:
174
+         # Atomically claim next item (work-stealing)
175
+         data, size, index = q.read(cursor)
176
+
177
+         if data is None:
178
+             break  # No more data
179
+
180
+         # Process the claimed item
181
+         worker, seq = struct.unpack("<I I", data[:8])
182
+         items_processed += 1
183
+
184
+     results[worker_id] = items_processed
185
+
186
+ # Create local queue and cursor
187
+ q = SlickQueue(size=64, element_size=32)
188
+ cursor_buf = bytearray(8)
189
+ cursor = AtomicCursor(cursor_buf, 0)
190
+ cursor.store(0) # Initialize cursor to 0
191
+
192
+ # Producer writes items
193
+ for i in range(100):
194
+     idx = q.reserve()
195
+     data = struct.pack("<I I", 0, i)
196
+     q[idx][:len(data)] = data
197
+     q.publish(idx)
198
+
199
+ # Start multiple consumer threads that share the work
200
+ results = {}
201
+ threads = []
202
+ for i in range(4):
203
+     t = Thread(target=consumer_worker, args=(q, cursor, i, results))
204
+     t.start()
205
+     threads.append(t)
206
+
207
+ # Wait for all consumers
208
+ for t in threads:
209
+     t.join()
210
+
211
+ print(f"Total items processed: {sum(results.values())}")
212
+ q.close()
213
+ ```
214
+
215
+ #### Shared Memory Mode (Multi-Process)
216
+
217
+ ```python
218
+ from multiprocessing import Process, shared_memory
219
+ from slick_queue_py import SlickQueue, AtomicCursor
220
+ import struct
221
+
222
+ def consumer_worker(queue_name, cursor_name, worker_id):
223
+     # Open shared queue and cursor
224
+     q = SlickQueue(name=queue_name, element_size=32)
225
+     cursor_shm = shared_memory.SharedMemory(name=cursor_name)
226
+     cursor = AtomicCursor(cursor_shm.buf, 0)
227
+
228
+     items_processed = 0
229
+     while True:
230
+         # Atomically claim next item (work-stealing)
231
+         data, size, index = q.read(cursor)
232
+
233
+         if data is None:
234
+             break  # No more data
235
+
236
+         # Process the claimed item
237
+         worker, seq = struct.unpack("<I I", data[:8])
238
+         items_processed += 1
239
+
240
+     print(f"Worker {worker_id} processed {items_processed} items")
241
+     cursor_shm.close()
242
+     q.close()
243
+
244
+ # Create queue and shared cursor
245
+ q = SlickQueue(name='work_queue', size=64, element_size=32)
246
+ cursor_shm = shared_memory.SharedMemory(name='work_cursor', create=True, size=8)
247
+ cursor = AtomicCursor(cursor_shm.buf, 0)
248
+ cursor.store(0) # Initialize cursor to 0
249
+
250
+ # Producer writes items
251
+ for i in range(100):
252
+     idx = q.reserve()
253
+     data = struct.pack("<I I", 0, i)
254
+     q[idx][:len(data)] = data
255
+     q.publish(idx)
256
+
257
+ # Start multiple consumer processes that share the work
258
+ consumers = []
259
+ for i in range(4):
260
+     p = Process(target=consumer_worker, args=('work_queue', 'work_cursor', i))
261
+     p.start()
262
+     consumers.append(p)
263
+
264
+ # Wait for all consumers
265
+ for p in consumers:
266
+     p.join()
267
+
268
+ cursor_shm.close()
269
+ cursor_shm.unlink()
270
+ q.close()
271
+ q.unlink()
272
+ ```
273
+
274
+ ### C++/Python Interoperability
275
+
276
+ The Python implementation is fully compatible with the C++ [SlickQueue](https://github.com/SlickQuant/slick-queue) library. Python and C++ processes can produce and consume from the same queue with:
277
+
278
+ - **Exact memory layout compatibility**: Binary-compatible with `slick::SlickQueue<T>`
279
+ - **Atomic operation compatibility**: Same 16-byte and 8-byte CAS semantics
280
+ - **Bidirectional communication**: C++ ↔ Python in both directions
281
+ - **Multi-producer support**: Mix C++ and Python producers on the same queue
282
+
283
+ **Platform Support for C++/Python Interop:**
284
+ - ✅ **Linux/macOS**: Full interoperability (both use POSIX `shm_open`)
285
+ - ✅ **Windows**: Full interoperability
286
+ - ✅ **Python-only**: Works on all platforms (Windows/Linux/macOS)
287
+
288
+ #### Basic C++ → Python Example
289
+
290
+ **C++ Producer:**
291
+ ```cpp
292
+ #include <cstdint>
+ #include <cstring>
+ #include "queue.h"
293
+
294
+ int main() {
295
+     // Open existing queue created by Python
296
+     slick::SlickQueue<uint8_t> q(32, "shared_queue");
297
+
298
+     for (int i = 0; i < 100; i++) {
299
+         auto idx = q.reserve();
300
+         uint32_t value = i;
301
+         std::memcpy(q[idx], &value, sizeof(value));
302
+         q.publish(idx);
303
+     }
304
+ }
305
+ ```
306
+
307
+ **Python Consumer:**
308
+ ```python
309
+ from slick_queue_py import SlickQueue
310
+ import struct
311
+
312
+ # Create queue that C++ will write to
313
+ q = SlickQueue(name='shared_queue', size=64, element_size=32)
314
+
315
+ read_index = 0
316
+ for _ in range(100):
317
+     data, size, read_index = q.read(read_index)
318
+     if data is not None:
319
+         value = struct.unpack("<I", data[:4])[0]
320
+         print(f"Received from C++: {value}")
321
+
322
+ q.close()
323
+ q.unlink()
324
+ ```
325
+
326
+ #### Building C++ Programs
327
+
328
+ To use the C++ SlickQueue library with your Python queues:
329
+
330
+ ```bash
331
+ # Clone the C++ library
332
+ git clone https://github.com/SlickQuant/slick-queue.git
333
+
334
+ # Build your C++ program
335
+ g++ -std=c++17 -I slick-queue/include my_program.cpp -o my_program
336
+ ```
337
+
338
+ Or use CMake (see [CMakeLists.txt](CMakeLists.txt) for reference):
339
+
340
+ ```cmake
341
+ include(FetchContent)
342
+ FetchContent_Declare(
343
+     slick-queue
344
+     GIT_REPOSITORY https://github.com/SlickQuant/slick-queue.git
345
+     GIT_TAG main
346
+ )
347
+ FetchContent_MakeAvailable(slick-queue)
348
+
349
+ add_executable(my_program my_program.cpp)
350
+ target_link_libraries(my_program PRIVATE slick::queue)
351
+ ```
352
+
353
+ See [tests/test_interop.py](tests/test_interop.py) and [tests/cpp_*.cpp](tests/) for comprehensive examples.
354
+
355
+ ## API Reference
356
+
357
+ ### SlickQueue
358
+
359
+ #### `__init__(*, name=None, size=None, element_size=None)`
360
+
361
+ Create a queue in local memory or shared memory mode.
362
+
363
+ **Parameters:**
364
+ - `name` (str, optional): Shared memory segment name. If None, uses local memory mode (single process).
365
+ - `size` (int): Queue capacity (must be power of 2). Required for local mode or when creating shared memory.
366
+ - `element_size` (int, required): Size of each element in bytes
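Because `size` must be a power of two, it can be useful to validate or round up a requested capacity before constructing the queue. The helpers below are a hypothetical sketch (they are not part of the slick-queue-py API) using the usual bit tricks.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


print(is_power_of_two(1024))    # True
print(is_power_of_two(1000))    # False
print(next_power_of_two(1000))  # 1024
```

A caller could pass `next_power_of_two(requested)` as `size` so that an arbitrary requested capacity still satisfies the constraint.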
367
+
368
+ **Examples:**
369
+ ```python
370
+ # Local memory mode (single process)
371
+ q = SlickQueue(size=256, element_size=64)
372
+
373
+ # Create new shared memory queue
374
+ q = SlickQueue(name='my_queue', size=256, element_size=64)
375
+
376
+ # Open existing shared memory queue
377
+ q2 = SlickQueue(name='my_queue', element_size=64)
378
+ ```
379
+
380
+ #### `reserve(n=1) -> int`
381
+
382
+ Reserve `n` elements for writing. **Multi-producer safe** using atomic CAS.
383
+
384
+ **Parameters:**
385
+ - `n` (int): Number of elements to reserve (default 1)
386
+
387
+ **Returns:**
388
+ - `int`: Starting index of reserved space
389
+
390
+ **Example:**
391
+ ```python
392
+ idx = q.reserve(1)  # Reserve 1 element
393
+ ```
394
+
395
+ #### `publish(index, n=1)`
396
+
397
+ Publish data written to reserved space. Uses atomic operations with release memory ordering.
398
+
399
+ **Parameters:**
400
+ - `index` (int): Index returned by `reserve()`
401
+ - `n` (int): Number of elements to publish (default 1)
402
+
403
+ **Example:**
404
+ ```python
405
+ idx = q.reserve()
406
+ q[idx][:data_len] = data
407
+ q.publish(idx)
408
+ ```
409
+
410
+ #### `read(read_index) -> Tuple[Optional[bytes], int, int]` or `read(atomic_cursor) -> Tuple[Optional[bytes], int]`
411
+
412
+ Read from queue with two modes:
413
+
414
+ **Single-Consumer Mode** (when `read_index` is `int`):
415
+ Uses a plain int cursor for single-consumer scenarios. Returns the new read_index.
416
+
417
+ **Multi-Consumer Mode** (when `read_index` is `AtomicCursor`):
418
+ Uses an atomic cursor for work-stealing/load-balancing across multiple consumers.
419
+ Each consumer atomically claims items, ensuring each item is consumed exactly once.
420
+
421
+ **Parameters:**
422
+ - `read_index` (int or AtomicCursor): Current read position or shared atomic cursor
423
+
424
+ **Returns:**
425
+ - Single-consumer: `Tuple[Optional[bytes], int, int]` - (data or None, size, new_read_index)
426
+ - Multi-consumer: `Tuple[Optional[bytes], int]` - (data or None, size)
427
+
428
+ **API Difference from C++:**
429
+ Unlike C++ where `read_index` is updated by reference, the Python single-consumer version returns the new index.
430
+ This is the idiomatic Python pattern: integers are immutable, so a function cannot update the caller's `read_index` in place.
431
+
432
+ ```python
433
+ # Python single-consumer (returns new index)
434
+ data, size, read_index = q.read(read_index)
435
+
436
+ # Python multi-consumer (atomic cursor)
437
+ from slick_queue_py import AtomicCursor
438
+ cursor = AtomicCursor(cursor_shm.buf, 0)
439
+ data, size, index = q.read(cursor) # Atomically claim next item
440
+
441
+ # C++ (updates by reference for both)
442
+ auto [data, size] = queue.read(read_index); // read_index modified in-place
443
+ auto [data, size] = queue.read(atomic_cursor); // atomic_cursor modified in-place
444
+ ```
445
+
446
+ **Single-Consumer Example:**
447
+ ```python
448
+ read_index = 0
449
+ while True:
450
+     data, size, read_index = q.read(read_index)
451
+     if data is not None:
452
+         process(data)
453
+ ```
454
+
455
+ **Multi-Consumer Example (Local Mode - Threading):**
456
+ ```python
457
+ from slick_queue_py import AtomicCursor
458
+
459
+ # Create local cursor for multi-threading
460
+ cursor_buf = bytearray(8)
461
+ cursor = AtomicCursor(cursor_buf, 0)
462
+ cursor.store(0)
463
+
464
+ # Multiple threads can share this cursor
465
+ while True:
466
+     data, size, index = q.read(cursor)  # Each thread atomically claims items
467
+     if data is not None:
468
+         process(data)
469
+ ```
470
+
471
+ **Multi-Consumer Example (Shared Memory Mode - Multiprocess):**
472
+ ```python
473
+ from multiprocessing import shared_memory
474
+ from slick_queue_py import AtomicCursor
475
+
476
+ # Create shared cursor for multi-process
477
+ cursor_shm = shared_memory.SharedMemory(name='cursor', create=True, size=8)
478
+ cursor = AtomicCursor(cursor_shm.buf, 0)
479
+ cursor.store(0)
480
+
481
+ # Multiple processes can share this cursor
482
+ while True:
483
+     data, size, index = q.read(cursor)  # Each process atomically claims items
484
+     if data is not None:
485
+         process(data)
486
+ ```
487
+
488
+ #### `read_last() -> Tuple[Optional[bytes], int]`
489
+
490
+ Read the most recently published item.
491
+
492
+ **Returns:**
493
+ - `Tuple[Optional[bytes], int]`: Tuple of (data, size)
494
+   - `data`: Last published data or None if queue is empty
495
+   - `size`: Number of slots the item occupies (0 if queue is empty)
496
+
497
+ **Example:**
498
+ ```python
499
+ data, size = q.read_last()
500
+ if data is not None:
501
+     print(f"Last item: {data[:size * element_size]}")
502
+ ```
503
+
504
+ #### `__getitem__(index) -> memoryview`
505
+
506
+ Get memoryview for writing to reserved slot.
507
+
508
+ **Parameters:**
509
+ - `index` (int): Index from `reserve()`
510
+
511
+ **Returns:**
512
+ - `memoryview`: View into the data array
513
+
514
+ #### `close()`
515
+
516
+ Close the shared memory connection. Always call this before unlinking.
517
+
518
+ #### `unlink()`
519
+
520
+ Delete the shared memory segment. Only call from the process that created it.
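The close-then-unlink ordering described above mirrors how the standard library's shared memory segments behave. The sketch below uses `multiprocessing.shared_memory` directly rather than `SlickQueue`, so it only illustrates the lifecycle rule. The segment name is made up, and a second handle in the same process stands in for another process.

```python
import os
from multiprocessing import shared_memory


def demo_lifecycle():
    """Create a segment, attach a second handle, then clean up in order."""
    name = f"sq_demo_{os.getpid()}"  # hypothetical segment name
    owner = shared_memory.SharedMemory(name=name, create=True, size=64)
    try:
        owner.buf[:5] = b"hello"
        # Attach by name, as a separate process would.
        peer = shared_memory.SharedMemory(name=name)
        try:
            seen = bytes(peer.buf[:5])
        finally:
            peer.close()   # non-creators only close their handle
    finally:
        owner.close()      # close first ...
        owner.unlink()     # ... then the creator removes the segment
    return seen


if __name__ == "__main__":
    print(demo_lifecycle())  # b'hello'
```

Wrapping the work in `try`/`finally` keeps the segment from being left behind if an exception occurs between creation and cleanup.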
521
+
522
+ ### AtomicCursor
523
+
524
+ The `AtomicCursor` class enables multi-consumer work-stealing patterns by providing an atomic read cursor that multiple consumers can coordinate through. Works in both local mode (multi-threading) and shared memory mode (multi-process).
525
+
526
+ #### `__init__(buffer, offset=0)`
527
+
528
+ Create an atomic cursor wrapper around a memory buffer.
529
+
530
+ **Parameters:**
531
+ - `buffer` (memoryview or bytearray): Memory buffer
532
+   - For local mode (threading): use `bytearray(8)`
533
+   - For shared memory mode (multiprocess): use `SharedMemory.buf`
534
+ - `offset` (int, optional): Byte offset in buffer (default 0)
535
+
536
+ **Local Mode Example (Multi-Threading):**
537
+ ```python
538
+ from slick_queue_py import AtomicCursor
539
+
540
+ # Create local cursor for multi-threading
541
+ cursor_buf = bytearray(8)
542
+ cursor = AtomicCursor(cursor_buf, 0)
543
+ cursor.store(0) # Initialize to 0
544
+ ```
545
+
546
+ **Shared Memory Mode Example (Multi-Process):**
547
+ ```python
548
+ from multiprocessing import shared_memory
549
+ from slick_queue_py import AtomicCursor
550
+
551
+ # Create shared cursor for multi-process
552
+ cursor_shm = shared_memory.SharedMemory(name='cursor', create=True, size=8)
553
+ cursor = AtomicCursor(cursor_shm.buf, 0)
554
+ cursor.store(0) # Initialize to 0
555
+ ```
556
+
557
+ #### `load() -> int`
558
+
559
+ Load the cursor value with atomic acquire semantics.
560
+
561
+ **Returns:**
562
+ - `int`: Current cursor value
563
+
564
+ #### `store(value)`
565
+
566
+ Store a new cursor value with atomic release semantics.
567
+
568
+ **Parameters:**
569
+ - `value` (int): New cursor value
570
+
571
+ #### `compare_exchange_weak(expected, desired) -> Tuple[bool, int]`
572
+
573
+ Atomically compare and swap the cursor value.
574
+
575
+ **Parameters:**
576
+ - `expected` (int): Expected cursor value
577
+ - `desired` (int): Desired cursor value
578
+
579
+ **Returns:**
580
+ - `Tuple[bool, int]`: (success, actual_value)
581
+
582
+ **Note:** This is used internally by `read(atomic_cursor)` and typically doesn't need to be called directly.
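To make the compare-and-swap pattern concrete, here is a toy model of a claim loop built on `compare_exchange_weak`. It is only an illustration: `ToyCursor` emulates the atomic operation with a `threading.Lock`, which the real `AtomicCursor` does not do, and the meaning of the returned `actual_value` (the value observed before any update) is an assumption of this sketch.

```python
import threading


class ToyCursor:
    """Lock-based stand-in for an atomic cursor (illustration only)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange_weak(self, expected, desired):
        """Store `desired` only if the current value equals `expected`.

        Returns (success, actual_value) where actual_value is the value
        observed before any update.
        """
        with self._lock:
            actual = self._value
            if actual == expected:
                self._value = desired
                return True, actual
            return False, actual


def claim_next(cursor, limit):
    """Claim one index below `limit`, or return None when none remain."""
    expected = cursor.load()
    while expected < limit:
        ok, actual = cursor.compare_exchange_weak(expected, expected + 1)
        if ok:
            return expected
        expected = actual  # lost the race: retry from the observed value
    return None


def run(num_items=1000, num_workers=4):
    """Let several threads claim indices and return all claims, sorted."""
    cursor = ToyCursor()
    claimed = [[] for _ in range(num_workers)]

    def worker(i):
        while True:
            idx = claim_next(cursor, num_items)
            if idx is None:
                break
            claimed[i].append(idx)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(idx for chunk in claimed for idx in chunk)


if __name__ == "__main__":
    assert run() == list(range(1000))  # every index claimed exactly once
```

The retry loop is the essential part: a failed exchange means another consumer advanced the cursor, so the loser retries from the newly observed value instead of re-claiming the same index.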
583
+
584
+ ## Memory Layout
585
+
586
+ The queue uses the same memory layout as C++ `slick::SlickQueue<T>`:
587
+
588
+ ```
589
+ Offset | Size          | Content
590
+ -------|---------------|--------------------------------
591
+ 0      | 16 bytes      | reserved_info (atomic)
592
+        |   0-7         |   uint64_t index_
593
+        |   8-11        |   uint32_t size_
594
+        |   12-15       |   padding
595
+ 16     | 4 bytes       | uint32_t size_ (queue capacity)
596
+ 20     | 44 bytes      | padding (to 64 bytes)
597
+ 64     | 16*size bytes | slot array
598
+        |   per slot:   |
599
+        |   0-7         |   uint64_t data_index (atomic)
600
+        |   8-11        |   uint32_t size
601
+        |   12-15       |   padding
602
+ 64+... | elem*size     | data array
603
+ ```
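The offsets in the table above can be reproduced with the `struct` module. This is a sketch derived only from that table: the format strings assume little-endian byte order (as on x86-64), and `layout` is a hypothetical helper rather than a library function.

```python
import struct

# Header: index_ (8), size_ (4), pad (4), capacity (4), pad (44) = 64 bytes
HEADER = struct.Struct("<QI4xI44x")
# Slot: data_index (8), size (4), pad (4) = 16 bytes
SLOT = struct.Struct("<QI4x")


def layout(size, element_size):
    """Compute section offsets for a queue of `size` slots (hypothetical helper)."""
    slots_offset = HEADER.size
    data_offset = slots_offset + SLOT.size * size
    total_bytes = data_offset + element_size * size
    return {
        "slots_offset": slots_offset,
        "data_offset": data_offset,
        "total_bytes": total_bytes,
    }


print(HEADER.size, SLOT.size)  # 64 16
print(layout(1024, 256))
# {'slots_offset': 64, 'data_offset': 16448, 'total_bytes': 278592}
```

For `size=1024` and `element_size=256`, the slot array takes 16 KiB and the data array 256 KiB, so the segment is a little over 272 KiB in total.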
604
+
605
+ ## Platform Support
606
+
607
+ ### Fully Supported (Lock-Free)
608
+ - **Windows x86-64**: Uses C++ extension (`atomic_ops_ext.pyd`) with `std::atomic`
609
+ - **Linux x86-64**: Uses C++ extension (`atomic_ops_ext.so`) with `std::atomic`, fallback to `libatomic`
610
+ - **macOS x86-64**: Uses C++ extension (`atomic_ops_ext.so`) with `std::atomic`, fallback to compiler builtins
611
+
612
+ **Platform-specific atomic operation implementations:**
613
+ - **All platforms**: The `atomic_ops_ext` C++ extension is now used on all platforms for the most reliable cross-process atomic operations
614
+ - **Fallback support**: Linux/macOS can fall back to `libatomic` or compiler builtins if the extension isn't available
615
+
616
+ ### Building and Installation
617
+
618
+ The C++ extension is built automatically during installation:
619
+
620
+ ```bash
621
+ # Install with automatic extension build
622
+ pip install -e .
623
+
624
+ # Or build manually first
625
+ python setup.py build_ext --inplace
626
+ pip install -e .
627
+ ```
628
+
629
+ **Build requirements:**
630
+ - **Windows**: Visual Studio 2017+ or MSVC build tools
631
+ - **Linux**: GCC 5+ or Clang 3.8+
632
+ - **macOS**: Xcode command line tools (clang)
633
+ - **All platforms**: Python development headers (included with standard Python installation)
634
+
635
+ The extension will be built as:
636
+ - Windows: `atomic_ops_ext.cp3XX-win_amd64.pyd`
637
+ - Linux: `atomic_ops_ext.cpython-3XX-x86_64-linux-gnu.so`
638
+ - macOS: `atomic_ops_ext.cpython-3XX-darwin.so`
639
+
640
+ (where `XX` is your Python version, e.g., `312` for Python 3.12)
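To see which filename the running interpreter would look for, the platform-specific suffix can be read from `importlib.machinery`. This is a small sketch; `expected_extension_filename` is a hypothetical helper, and the module name is taken from the list above.

```python
import importlib.machinery


def expected_extension_filename(module_name="atomic_ops_ext"):
    """Build the extension filename using this interpreter's preferred suffix."""
    # EXTENSION_SUFFIXES lists accepted suffixes; the first is the most specific.
    suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    return f"{module_name}{suffix}"


if __name__ == "__main__":
    print(expected_extension_filename())
```

Comparing this name with the file produced by `build_ext` is a quick way to spot an extension built for a different Python version.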
641
+
642
+ ### Requirements for Lock-Free Operation
643
+
644
+ **All platforms require hardware support for lock-free atomic operations:**
645
+ - x86-64 CPU with CMPXCHG16B instruction (Intel since ~2006, AMD since ~2007)
646
+ - For C++/Python interoperability, both must use the same atomic hardware instructions
647
+ - No fallback implementation exists - lock-free atomics are mandatory for multi-producer queues
648
+
649
+ **Why no fallback?**
650
+ The queue requires true atomic CAS operations for correctness in multi-producer scenarios. A lock-based fallback would:
651
+ - Break binary compatibility with C++ SlickQueue
652
+ - Fail to work correctly in multi-process scenarios (Python ↔ C++)
653
+ - Not provide the performance guarantees of a lock-free queue
654
+
655
+ ### Not Supported
656
+ - 32-bit platforms (no 16-byte atomic CAS)
657
+ - ARM64 (requires ARMv8.1+ CASP instruction - future support planned)
658
+ - CPUs without CMPXCHG16B support (very old x86-64 CPUs from before 2006)
659
+
660
+ Check platform support:
661
+ ```python
662
+ from atomic_ops import check_platform_support
663
+
664
+ supported, message = check_platform_support()
665
+ print(f"Platform: {message}")
666
+ ```
667
+
668
+ ## Performance
669
+
670
+ Typical throughput on modern hardware (x86-64):
671
+ - Single producer/consumer: ~5-10M items/sec
672
+ - 4 producers/1 consumer: ~3-8M items/sec
673
+ - High contention (8+ producers): ~1-5M items/sec
674
+
675
+ Performance depends on:
676
+ - CPU cache topology
677
+ - Queue size (smaller = more contention)
678
+ - Item size
679
+ - Memory bandwidth
680
+
681
+ ## Advanced Usage
682
+
683
+ ### Batch Operations
684
+
685
+ Reserve and publish multiple elements at once:
686
+
687
+ ```python
688
+ # Reserve 10 elements
689
+ idx = q.reserve(10)
690
+
691
+ # Write data to each slot
692
+ for i in range(10):
693
+     element = q[idx + i]
694
+     element[:data_len] = data[i]
695
+
696
+ # Publish all 10 elements at once
697
+ q.publish(idx, 10)
698
+ ```
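One way to prepare the per-slot `data[i]` values for a batch like the one above is to split a larger payload into element-sized chunks. The helper below is a hypothetical sketch; whether consumers reassemble multi-slot payloads this way is an assumption, not documented library behavior.

```python
def split_into_slots(payload: bytes, element_size: int):
    """Split payload into chunks of at most element_size bytes each."""
    if element_size <= 0:
        raise ValueError("element_size must be positive")
    return [payload[i:i + element_size] for i in range(0, len(payload), element_size)]


payload = bytes(range(100))
chunks = split_into_slots(payload, 32)
print(len(chunks), [len(c) for c in chunks])  # 4 [32, 32, 32, 4]
```

Here `len(chunks)` would be the count passed to `reserve()` and `publish()`, and each chunk would be written to `q[idx + i]`.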
699
+
700
+ ### Wrap-Around Handling
701
+
702
+ The queue automatically handles ring buffer wrap-around:
703
+
704
+ ```python
705
+ # Queue with size=8
706
+ q = SlickQueue(name='wrap_test', size=8, element_size=32)
707
+
708
+ # Reserve more items than queue size - wraps automatically
709
+ for i in range(100):
710
+     idx = q.reserve()
711
+     q[idx][:4] = struct.pack("<I", i)
712
+     q.publish(idx)
713
+ ```
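A power-of-two capacity lets a ring buffer map an ever-increasing index onto a slot with a single bit mask. The sketch below shows that general technique; that slick-queue-py maps indices exactly this way is an assumption, and `slot_for` is a hypothetical helper.

```python
def slot_for(index: int, size: int) -> int:
    """Map a monotonically increasing index onto a slot in a ring of `size` slots."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return index & (size - 1)  # same as index % size when size is a power of two


print([slot_for(i, 8) for i in range(6, 11)])  # [6, 7, 0, 1, 2]
```

With `size=8`, index 8 lands back on slot 0, which is why writing 100 items into an 8-slot queue overwrites older slots rather than failing.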
714
+
715
+ ## Testing
716
+
717
+ ### Python Tests
718
+
719
+ Run the Python test suite:
720
+
721
+ ```bash
722
+ # Atomic operations tests (clean output)
723
+ python tests/run_test.py tests/test_atomic_ops.py
724
+
725
+ # Basic queue tests (clean output)
726
+ python tests/run_test.py tests/test_queue.py
727
+
728
+ # Local mode tests
729
+ python tests/test_local_mode.py
730
+
731
+ # Multi-producer/consumer tests
732
+ # Note: If tests fail with "File exists" errors, run cleanup first:
733
+ python tests/cleanup_shm.py
734
+ python tests/test_multi_producer.py
735
+ ```
736
+
737
+ ### C++/Python Interoperability Tests
738
+
739
+ Build and run comprehensive interop tests:
740
+
741
+ ```bash
742
+ # 1. Build C++ test programs with CMake
743
+ mkdir build && cd build
744
+ cmake ..
745
+ cmake --build .
746
+
747
+ # 2. Run interoperability test suite
748
+ cd ..
749
+ python tests/test_interop.py
750
+
751
+ # Or run specific tests:
752
+ python tests/test_interop.py --test python_producer_cpp_consumer
753
+ python tests/test_interop.py --test cpp_producer_python_consumer
754
+ python tests/test_interop.py --test multi_producer_interop
755
+ python tests/test_interop.py --test stress_interop
756
+ python tests/test_interop.py --test cpp_shm_creation
757
+ ```
758
+
759
+ The interop tests verify:
760
+ - **Python → C++**: Python producers write data that C++ consumers read
761
+ - **C++ → Python**: C++ producers write data that Python consumers read
762
+ - **Mixed Multi-Producer**: Multiple C++ and Python producers writing to same queue
763
+ - **Stress Test**: High-volume bidirectional communication
764
+ - **SHM created by C++**: C++ producers create the SHM and write data that Python consumers read
765
+
766
+ **Note on Windows**: If child processes from previous test runs don't terminate properly, you may need to manually kill orphaned python.exe processes before running tests again.
767
+
768
+ ## Known Issues
769
+
770
+ 1. **Buffer Cleanup Warning**: You may see `BufferError: cannot close exported pointers exist` during garbage collection. Python's ctypes creates internal buffer references that outlive explicit cleanup, so the message appears at program exit. It is harmless and **does not affect functionality, performance, or correctness**.
771
+
772
+ 2. **UserWarning**: On Linux you may see `UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown`. This warning is also emitted at program exit and **does not affect functionality, performance, or correctness**.
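The mechanism behind the first issue can be reproduced in isolation. The sketch below uses an anonymous `mmap` in place of a shared memory segment (an assumption made to keep the demo self-contained) and shows that a ctypes object created with `from_buffer` keeps the mapping exported until the object is released. It relies on CPython's reference counting to release the ctypes object as soon as `del` runs.

```python
import ctypes
import mmap


def demo_exported_buffer():
    """Show that a live ctypes view blocks closing the underlying mapping."""
    region = mmap.mmap(-1, 8)                      # anonymous 8-byte mapping
    counter = ctypes.c_uint64.from_buffer(region)  # view keeps the buffer exported
    counter.value = 7
    try:
        region.close()
        raised = False
    except BufferError:
        raised = True   # closing fails while the ctypes view is still alive
    del counter         # drop the last reference to the view (CPython frees it now)
    region.close()      # closing succeeds once nothing is exported
    return raised, region.closed


if __name__ == "__main__":
    print(demo_exported_buffer())
```

At interpreter shutdown the order in which such objects are released is not under the program's control, which is consistent with the error showing up only at exit.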
773
+
774
+ ## Architecture
775
+
776
+ ### Atomic Operations
777
+
778
+ The queue uses platform-specific atomic operations:
779
+
780
+ - **16-byte CAS**: For `reserved_info` structure (multi-producer coordination)
781
+ - **8-byte CAS**: For slot `data_index` fields (publish/read synchronization)
782
+ - **Memory barriers**: Acquire/release semantics for proper ordering
783
+
784
+ ### Memory Ordering
785
+
786
+ - `reserve()`: Uses `memory_order_release` on successful CAS
787
+ - `publish()`: Uses `memory_order_release` for data_index store
788
+ - `read()`: Uses `memory_order_acquire` for data_index load
789
+
790
+ This ensures:
791
+ - All writes to data are visible before publishing
792
+ - All reads of data happen after acquiring the index
793
+ - No reordering that could cause data races
794
+
795
+ ## Comparison with C++
796
+
797
+ | Feature | C++ | Python |
798
+ |---------|-----|--------|
799
+ | Multi-producer | ✅ | ✅ |
800
+ | Multi-consumer (work-stealing) | ✅ | ✅ (with AtomicCursor) |
801
+ | Lock-free (x86-64) | ✅ | ✅ |
802
+ | Memory layout | Reference | Matches exactly |
803
+ | Performance | Baseline | ~50-80% of C++ |
804
+ | Ease of use | Medium | High |
805
+ | read(int) single-consumer | ✅ | ✅ |
806
+ | read(atomic cursor) multi-consumer | ✅ | ✅ |
807
+
808
+ ## Contributing
809
+
810
+ Issues and pull requests welcome at [SlickQuant/slick-queue-py](https://github.com/SlickQuant/slick-queue-py).
811
+
812
+ ## License
813
+
814
+ MIT License - see LICENSE file for details.
815
+
816
+ **Made with ⚡ by [SlickQuant](https://github.com/SlickQuant)**