slick-queue-py 1.0.0__cp312-cp312-win_amd64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,805 @@
1
+ Metadata-Version: 2.4
2
+ Name: slick_queue_py
3
+ Version: 1.0.0
4
+ Summary: Lock-free MPMC queue with C++ interoperability via shared memory
5
+ Home-page: https://github.com/SlickQuant/slick_queue_py
6
+ Author: Slick Quant
7
+ Author-email: Slick Quant <slickquant@slickquant.com>
8
+ License: MIT
9
+ Project-URL: Homepage, https://github.com/SlickQuant/slick_queue_py
10
+ Project-URL: Documentation, https://github.com/SlickQuant/slick_queue_py#readme
11
+ Project-URL: Repository, https://github.com/SlickQuant/slick_queue_py
12
+ Project-URL: Bug Tracker, https://github.com/SlickQuant/slick_queue_py/issues
13
+ Keywords: queue,lock-free,atomic,shared-memory,ipc,multiprocessing,mpmc
14
+ Classifier: Development Status :: 4 - Beta
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
17
+ Classifier: Topic :: System :: Distributed Computing
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Programming Language :: Python :: 3
20
+ Classifier: Programming Language :: Python :: 3.8
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Programming Language :: C++
26
+ Classifier: Operating System :: Microsoft :: Windows
27
+ Classifier: Operating System :: POSIX :: Linux
28
+ Classifier: Operating System :: MacOS
29
+ Requires-Python: >=3.8
30
+ Description-Content-Type: text/markdown
31
+ License-File: LICENSE
32
+ Dynamic: author
33
+ Dynamic: home-page
34
+ Dynamic: license-file
35
+ Dynamic: requires-python
36
+
37
+ # slick_queue_py
38
+
39
+ Python implementation of SlickQueue - a lock-free multi-producer multi-consumer (MPMC) queue with C++ interoperability through shared memory.
40
+
41
+ This is the Python binding for the [SlickQueue C++ library](https://github.com/SlickQuant/slick_queue). The Python implementation maintains exact binary compatibility with the C++ version, enabling seamless interprocess communication between Python and C++ applications.
42
+
43
+ ## Features
44
+
45
+ - **Dual Mode Operation**:
46
+ - **Local Memory Mode**: In-process queue using local memory (no shared memory overhead)
47
+ - **Shared Memory Mode**: Inter-process queue for interprocess communication
48
+ - **Lock-Free Multi-Producer Multi-Consumer**: True MPMC support using atomic operations
49
+ - **C++/Python Interoperability**: Python and C++ processes can share the same queue
50
+ - **Cross-Platform**: Windows and Linux/macOS support (x86-64)
51
+ - **Memory Layout Compatible**: Exact binary compatibility with C++ `slick::SlickQueue<T>`
52
+ - **High Performance**: Hardware atomic operations for minimal overhead
53
+
54
+ ## Requirements
55
+
56
+ - Python 3.8+ (uses `multiprocessing.shared_memory`)
57
+ - 64-bit platform
58
+ - For true lock-free operation: x86-64 CPU with CMPXCHG16B support (most CPUs since 2006)
59
+
60
+ ## Installation
61
+
62
+ ```bash
63
+ pip install -e .
64
+ ```
65
+
66
+ Or just copy the Python files to your project.
67
+
68
+ ## Quick Start
69
+
70
+ ### Local Memory Mode (Single Process)
71
+
72
+ ```python
73
+ from slick_queue_py import SlickQueue
74
+
75
+ # Create a queue in local memory (no shared memory)
76
+ q = SlickQueue(size=1024, element_size=256)
77
+
78
+ # Producer: Reserve a slot, write data, and publish
79
+ idx = q.reserve()
80
+ buf = q[idx]
81
+ buf[:len(b'hello')] = b'hello'
82
+ q.publish(idx)
83
+
84
+ # Consumer: Read data
85
+ read_index = 0
86
+ data, size, read_index = q.read(read_index)
87
+ if data is not None:
88
+ print(f"Received: {data[:size]}")
89
+
90
+ q.close() # unlink() does nothing for local mode
91
+ ```
92
+
93
+ ### Shared Memory Mode (Multi-Process)
94
+
95
+ ```python
96
+ from slick_queue_py import SlickQueue
97
+
98
+ # Create a new shared memory queue (size must be power of two)
99
+ q = SlickQueue(name='my_queue', size=1024, element_size=256)
100
+
101
+ # Producer: Reserve a slot, write data, and publish
102
+ idx = q.reserve()
103
+ buf = q[idx]
104
+ buf[:len(b'hello')] = b'hello'
105
+ q.publish(idx)
106
+
107
+ # Consumer: Read data
108
+ read_index = 0
109
+ data, size, read_index = q.read(read_index)
110
+ if data is not None:
111
+ print(f"Received: {data[:size]}")
112
+
113
+ q.close()
114
+ q.unlink() # Delete shared memory segment
115
+ ```
116
+
117
+ ### Multi-Producer Usage
118
+
119
+ ```python
120
+ from multiprocessing import Process
121
+ from slick_queue_py import SlickQueue
122
+ import struct
123
+
124
+ def producer_worker(queue_name, worker_id, num_items):
125
+ # Open existing queue
126
+ q = SlickQueue(name=queue_name, element_size=32)
127
+
128
+ for i in range(num_items):
129
+ # Reserve slot (thread-safe with atomic CAS)
130
+ idx = q.reserve(1)
131
+
132
+ # Write unique data
133
+ data = struct.pack("<I I", worker_id, i)
134
+ slot = q[idx]
135
+ slot[:len(data)] = data
136
+
137
+ # Publish (makes data visible to consumers)
138
+ q.publish(idx, 1)
139
+
140
+ q.close()
141
+
142
+ # Create queue
143
+ q = SlickQueue(name='mpmc_queue', size=64, element_size=32)
144
+
145
+ # Start multiple producers
146
+ producers = []
147
+ for i in range(4):
148
+ p = Process(target=producer_worker, args=('mpmc_queue', i, 100))
149
+ p.start()
150
+ producers.append(p)
151
+
152
+ # Wait for completion
153
+ for p in producers:
154
+ p.join()
155
+
156
+ q.close()
157
+ q.unlink()
158
+ ```
159
+
160
+ ### Multi-Consumer Work-Stealing
161
+
162
+ For multiple consumers sharing work from a single queue, use an `AtomicCursor` to enable work-stealing patterns where each item is consumed by exactly one consumer.
163
+
164
+ #### Local Mode (Multi-Threading)
165
+
166
+ ```python
167
+ from threading import Thread
168
+ from slick_queue_py import SlickQueue, AtomicCursor
169
+ import struct
170
+
171
+ def consumer_worker(q, cursor, worker_id, results):
172
+ items_processed = 0
173
+ while True:
174
+ # Atomically claim next item (work-stealing)
175
+ data, size = q.read(cursor)
176
+
177
+ if data is None:
178
+ break # No more data
179
+
180
+ # Process the claimed item
181
+ worker, seq = struct.unpack("<I I", data[:8])
182
+ items_processed += 1
183
+
184
+ results[worker_id] = items_processed
185
+
186
+ # Create local queue and cursor
187
+ q = SlickQueue(size=64, element_size=32)
188
+ cursor_buf = bytearray(8)
189
+ cursor = AtomicCursor(cursor_buf, 0)
190
+ cursor.store(0) # Initialize cursor to 0
191
+
192
+ # Producer writes items
193
+ for i in range(100):
194
+ idx = q.reserve()
195
+ data = struct.pack("<I I", 0, i)
196
+ q[idx][:len(data)] = data
197
+ q.publish(idx)
198
+
199
+ # Start multiple consumer threads that share the work
200
+ results = {}
201
+ threads = []
202
+ for i in range(4):
203
+ t = Thread(target=consumer_worker, args=(q, cursor, i, results))
204
+ t.start()
205
+ threads.append(t)
206
+
207
+ # Wait for all consumers
208
+ for t in threads:
209
+ t.join()
210
+
211
+ print(f"Total items processed: {sum(results.values())}")
212
+ q.close()
213
+ ```
214
+
215
+ #### Shared Memory Mode (Multi-Process)
216
+
217
+ ```python
218
+ from multiprocessing import Process, shared_memory
219
+ from slick_queue_py import SlickQueue, AtomicCursor
220
+ import struct
221
+
222
+ def consumer_worker(queue_name, cursor_name, worker_id):
223
+ # Open shared queue and cursor
224
+ q = SlickQueue(name=queue_name, element_size=32)
225
+ cursor_shm = shared_memory.SharedMemory(name=cursor_name)
226
+ cursor = AtomicCursor(cursor_shm.buf, 0)
227
+
228
+ items_processed = 0
229
+ while True:
230
+ # Atomically claim next item (work-stealing)
231
+ data, size = q.read(cursor)
232
+
233
+ if data is None:
234
+ break # No more data
235
+
236
+ # Process the claimed item
237
+ worker, seq = struct.unpack("<I I", data[:8])
238
+ items_processed += 1
239
+
240
+ print(f"Worker {worker_id} processed {items_processed} items")
241
+ cursor_shm.close()
242
+ q.close()
243
+
244
+ # Create queue and shared cursor
245
+ q = SlickQueue(name='work_queue', size=64, element_size=32)
246
+ cursor_shm = shared_memory.SharedMemory(name='work_cursor', create=True, size=8)
247
+ cursor = AtomicCursor(cursor_shm.buf, 0)
248
+ cursor.store(0) # Initialize cursor to 0
249
+
250
+ # Producer writes items
251
+ for i in range(100):
252
+ idx = q.reserve()
253
+ data = struct.pack("<I I", 0, i)
254
+ q[idx][:len(data)] = data
255
+ q.publish(idx)
256
+
257
+ # Start multiple consumer processes that share the work
258
+ consumers = []
259
+ for i in range(4):
260
+ p = Process(target=consumer_worker, args=('work_queue', 'work_cursor', i))
261
+ p.start()
262
+ consumers.append(p)
263
+
264
+ # Wait for all consumers
265
+ for p in consumers:
266
+ p.join()
267
+
268
+ cursor_shm.close()
269
+ cursor_shm.unlink()
270
+ q.close()
271
+ q.unlink()
272
+ ```
273
+
274
+ ### C++/Python Interoperability
275
+
276
+ The Python implementation is fully compatible with the C++ [SlickQueue](https://github.com/SlickQuant/slick_queue) library. Python and C++ processes can produce and consume from the same queue with:
277
+
278
+ - **Exact memory layout compatibility**: Binary-compatible with `slick::SlickQueue<T>`
279
+ - **Atomic operation compatibility**: Same 16-byte and 8-byte CAS semantics
280
+ - **Bidirectional communication**: C++ ↔ Python in both directions
281
+ - **Multi-producer support**: Mix C++ and Python producers on the same queue
282
+
283
+ **Platform Support for C++/Python Interop:**
284
+ - ✅ **Linux/macOS**: Full interoperability (both use POSIX `shm_open`)
285
+ - ✅ **Windows**: Full interoperability
286
+ - ✅ **Python-only**: Works on all platforms (Windows/Linux/macOS)
287
+
288
+ #### Basic C++ → Python Example
289
+
290
+ **C++ Producer:**
291
+ ```cpp
292
+ #include "queue.h"
293
+
294
+ int main() {
295
+ // Open existing queue created by Python
296
+ slick::SlickQueue<uint8_t> q(32, "shared_queue");
297
+
298
+ for (int i = 0; i < 100; i++) {
299
+ auto idx = q.reserve();
300
+ uint32_t value = i;
301
+ std::memcpy(q[idx], &value, sizeof(value));
302
+ q.publish(idx);
303
+ }
304
+ }
305
+ ```
306
+
307
+ **Python Consumer:**
308
+ ```python
309
+ from slick_queue_py import SlickQueue
310
+ import struct
311
+
312
+ # Create queue that C++ will write to
313
+ q = SlickQueue(name='shared_queue', size=64, element_size=32)
314
+
315
+ read_index = 0
316
+ for _ in range(100):
317
+ data, size, read_index = q.read(read_index)
318
+ if data is not None:
319
+ value = struct.unpack("<I", data[:4])[0]
320
+ print(f"Received from C++: {value}")
321
+
322
+ q.close()
323
+ q.unlink()
324
+ ```
325
+
326
+ #### Building C++ Programs
327
+
328
+ To use the C++ SlickQueue library with your Python queues:
329
+
330
+ ```bash
331
+ # Clone the C++ library
332
+ git clone https://github.com/SlickQuant/slick_queue.git
333
+
334
+ # Build your C++ program
335
+ g++ -std=c++17 -I slick_queue/include my_program.cpp -o my_program
336
+ ```
337
+
338
+ Or use CMake (see [CMakeLists.txt](CMakeLists.txt) for reference):
339
+
340
+ ```cmake
341
+ include(FetchContent)
342
+ FetchContent_Declare(
343
+ slick_queue
344
+ GIT_REPOSITORY https://github.com/SlickQuant/slick_queue.git
345
+ GIT_TAG main
346
+ )
347
+ FetchContent_MakeAvailable(slick_queue)
348
+
349
+ add_executable(my_program my_program.cpp)
350
+ target_link_libraries(my_program PRIVATE slick_queue)
351
+ ```
352
+
353
+ See [tests/test_interop.py](tests/test_interop.py) and [tests/cpp_*.cpp](tests/) for comprehensive examples.
354
+
355
+ ## API Reference
356
+
357
+ ### SlickQueue
358
+
359
+ #### `__init__(*, name=None, size=None, element_size=None)`
360
+
361
+ Create a queue in local memory or shared memory mode.
362
+
363
+ **Parameters:**
364
+ - `name` (str, optional): Shared memory segment name. If None, uses local memory mode (single process).
365
+ - `size` (int): Queue capacity (must be power of 2). Required for local mode or when creating shared memory.
366
+ - `element_size` (int, required): Size of each element in bytes
367
+
368
+ **Examples:**
369
+ ```python
370
+ # Local memory mode (single process)
371
+ q = SlickQueue(size=256, element_size=64)
372
+
373
+ # Create new shared memory queue
374
+ q = SlickQueue(name='my_queue', size=256, element_size=64)
375
+
376
+ # Open existing shared memory queue
377
+ q2 = SlickQueue(name='my_queue', element_size=64)
378
+ ```
379
+
380
+ #### `reserve(n=1) -> int`
381
+
382
+ Reserve `n` elements for writing. **Multi-producer safe** using atomic CAS.
383
+
384
+ **Parameters:**
385
+ - `n` (int): Number of elements to reserve (default 1)
386
+
387
+ **Returns:**
388
+ - `int`: Starting index of reserved space
389
+
390
+ **Example:**
391
+ ```python
392
+ idx = q.reserve(1) # Reserve 1 element
393
+ ```
394
+
395
+ #### `publish(index, n=1)`
396
+
397
+ Publish data written to reserved space. Uses atomic operations with release memory ordering.
398
+
399
+ **Parameters:**
400
+ - `index` (int): Index returned by `reserve()`
401
+ - `n` (int): Number of elements to publish (default 1)
402
+
403
+ **Example:**
404
+ ```python
405
+ idx = q.reserve()
406
+ q[idx][:data_len] = data
407
+ q.publish(idx)
408
+ ```
409
+
410
+ #### `read(read_index) -> Tuple[Optional[bytes], int, int]` or `read(atomic_cursor) -> Tuple[Optional[bytes], int]`
411
+
412
+ Read from queue with two modes:
413
+
414
+ **Single-Consumer Mode** (when `read_index` is `int`):
415
+ Uses a plain int cursor for single-consumer scenarios. Returns the new read_index.
416
+
417
+ **Multi-Consumer Mode** (when `read_index` is `AtomicCursor`):
418
+ Uses an atomic cursor for work-stealing/load-balancing across multiple consumers.
419
+ Each consumer atomically claims items, ensuring each item is consumed exactly once.
420
+
421
+ **Parameters:**
422
+ - `read_index` (int or AtomicCursor): Current read position or shared atomic cursor
423
+
424
+ **Returns:**
425
+ - Single-consumer: `Tuple[Optional[bytes], int, int]` - (data or None, size, new_read_index)
426
+ - Multi-consumer: `Tuple[Optional[bytes], int]` - (data or None, size)
427
+
428
+ **API Difference from C++:**
429
+ Unlike C++ where `read_index` is updated by reference, the Python single-consumer version returns the new index.
430
+ This is the Pythonic pattern since Python doesn't have true pass-by-reference.
431
+
432
+ ```python
433
+ # Python single-consumer (returns new index)
434
+ data, size, read_index = q.read(read_index)
435
+
436
+ # Python multi-consumer (atomic cursor)
437
+ from slick_queue_py import AtomicCursor
438
+ cursor = AtomicCursor(cursor_shm.buf, 0)
439
+ data, size = q.read(cursor) # Atomically claim next item
440
+
441
+ # C++ (updates by reference for both)
442
+ auto [data, size] = queue.read(read_index); // read_index modified in-place
443
+ auto [data, size] = queue.read(atomic_cursor); // atomic_cursor modified in-place
444
+ ```
445
+
446
+ **Single-Consumer Example:**
447
+ ```python
448
+ read_index = 0
449
+ while True:
450
+ data, size, read_index = q.read(read_index)
451
+ if data is not None:
452
+ process(data)
453
+ ```
454
+
455
+ **Multi-Consumer Example (Local Mode - Threading):**
456
+ ```python
457
+ from slick_queue_py import AtomicCursor
458
+
459
+ # Create local cursor for multi-threading
460
+ cursor_buf = bytearray(8)
461
+ cursor = AtomicCursor(cursor_buf, 0)
462
+ cursor.store(0)
463
+
464
+ # Multiple threads can share this cursor
465
+ while True:
466
+ data, size = q.read(cursor) # Each thread atomically claims items
467
+ if data is not None:
468
+ process(data)
469
+ ```
470
+
471
+ **Multi-Consumer Example (Shared Memory Mode - Multiprocess):**
472
+ ```python
473
+ from multiprocessing import shared_memory
474
+ from slick_queue_py import AtomicCursor
475
+
476
+ # Create shared cursor for multi-process
477
+ cursor_shm = shared_memory.SharedMemory(name='cursor', create=True, size=8)
478
+ cursor = AtomicCursor(cursor_shm.buf, 0)
479
+ cursor.store(0)
480
+
481
+ # Multiple processes can share this cursor
482
+ while True:
483
+ data, size = q.read(cursor) # Each process atomically claims items
484
+ if data is not None:
485
+ process(data)
486
+ ```
487
+
488
+ #### `read_last() -> Optional[bytes]`
489
+
490
+ Read the most recently published item.
491
+
492
+ **Returns:**
493
+ - `Optional[bytes]`: Last published data or None
494
+
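+ A minimal usage sketch (assuming the queue `q` from the examples above):
+ 
+ ```python
+ last = q.read_last()
+ if last is not None:
+     print(f"Most recent item (first 8 bytes): {last[:8]}")
+ ```
+ 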
495
+ #### `__getitem__(index) -> memoryview`
496
+
497
+ Get memoryview for writing to reserved slot.
498
+
499
+ **Parameters:**
500
+ - `index` (int): Index from `reserve()`
501
+
502
+ **Returns:**
503
+ - `memoryview`: View into the data array
504
+
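+ A short sketch of writing through the returned view (assuming `q` was created with `element_size` of at least 5):
+ 
+ ```python
+ idx = q.reserve()
+ slot = q[idx]           # writable memoryview into the slot's data area
+ slot[:5] = b'hello'     # copy payload bytes into the slot
+ q.publish(idx)          # make the write visible to consumers
+ ```
+ 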
505
+ #### `close()`
506
+
507
+ Close the shared memory connection. Always call this before unlinking.
508
+
509
+ #### `unlink()`
510
+
511
+ Delete the shared memory segment. Only call from the process that created it.
512
+
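+ A typical teardown sequence, assuming your application tracks which process created the segment (the `creator` flag below is hypothetical, not part of the API):
+ 
+ ```python
+ q.close()        # every process closes its own handle first
+ if creator:      # hypothetical flag: True only in the process that created the queue
+     q.unlink()   # the creator removes the shared memory segment
+ ```
+ 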
513
+ ### AtomicCursor
514
+
515
+ The `AtomicCursor` class enables multi-consumer work-stealing patterns by providing an atomic read cursor that multiple consumers can coordinate through. Works in both local mode (multi-threading) and shared memory mode (multi-process).
516
+
517
+ #### `__init__(buffer, offset=0)`
518
+
519
+ Create an atomic cursor wrapper around a memory buffer.
520
+
521
+ **Parameters:**
522
+ - `buffer` (memoryview or bytearray): Memory buffer
523
+ - For local mode (threading): use `bytearray(8)`
524
+ - For shared memory mode (multiprocess): use `SharedMemory.buf`
525
+ - `offset` (int, optional): Byte offset in buffer (default 0)
526
+
527
+ **Local Mode Example (Multi-Threading):**
528
+ ```python
529
+ from slick_queue_py import AtomicCursor
530
+
531
+ # Create local cursor for multi-threading
532
+ cursor_buf = bytearray(8)
533
+ cursor = AtomicCursor(cursor_buf, 0)
534
+ cursor.store(0) # Initialize to 0
535
+ ```
536
+
537
+ **Shared Memory Mode Example (Multi-Process):**
538
+ ```python
539
+ from multiprocessing import shared_memory
540
+ from slick_queue_py import AtomicCursor
541
+
542
+ # Create shared cursor for multi-process
543
+ cursor_shm = shared_memory.SharedMemory(name='cursor', create=True, size=8)
544
+ cursor = AtomicCursor(cursor_shm.buf, 0)
545
+ cursor.store(0) # Initialize to 0
546
+ ```
547
+
548
+ #### `load() -> int`
549
+
550
+ Load the cursor value with atomic acquire semantics.
551
+
552
+ **Returns:**
553
+ - `int`: Current cursor value
554
+
555
+ #### `store(value)`
556
+
557
+ Store a new cursor value with atomic release semantics.
558
+
559
+ **Parameters:**
560
+ - `value` (int): New cursor value
561
+
562
+ #### `compare_exchange_weak(expected, desired) -> Tuple[bool, int]`
563
+
564
+ Atomically compare and swap the cursor value.
565
+
566
+ **Parameters:**
567
+ - `expected` (int): Expected cursor value
568
+ - `desired` (int): Desired cursor value
569
+
570
+ **Returns:**
571
+ - `Tuple[bool, int]`: (success, actual_value)
572
+
573
+ **Note:** This is used internally by `read(atomic_cursor)` and typically doesn't need to be called directly.
574
+
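+ A sketch of the claim-and-retry pattern this enables (the same pattern `read(atomic_cursor)` applies internally, per the note above):
+ 
+ ```python
+ # Atomically advance a shared cursor by one, retrying until the CAS succeeds.
+ current = cursor.load()
+ while True:
+     ok, actual = cursor.compare_exchange_weak(current, current + 1)
+     if ok:
+         break            # this consumer claimed index `current`
+     current = actual     # another consumer won the race; retry from the observed value
+ ```
+ 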
575
+ ## Memory Layout
576
+
577
+ The queue uses the same memory layout as C++ `slick::SlickQueue<T>`:
578
+
579
+ ```
580
+ Offset     | Size              | Content
582
+ -----------|-------------------|--------------------------------
583
+ 0          | 16 bytes          | reserved_info (atomic)
584
+            |   0-7             |   uint64_t index_
585
+            |   8-11            |   uint32_t size_
586
+            |   12-15           |   padding
587
+ 16         | 4 bytes           | uint32_t size_ (queue capacity)
588
+ 20         | 44 bytes          | padding (to 64 bytes)
589
+ 64         | 16*size bytes     | slot array
590
+            | per slot:         |
591
+            |   0-7             |   uint64_t data_index (atomic)
592
+            |   8-11            |   uint32_t size
593
+            |   12-15           |   padding
594
+ 64+16*size | element_size*size | data array
594
+ ```
595
+
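+ As an illustration only (not part of the public API), the header fields above could be decoded with `struct`, assuming the little-endian x86-64 layout shown:
+ 
+ ```python
+ import struct
+ 
+ def parse_header(buf: bytes):
+     """Decode the queue header described in the table above."""
+     index_, size_ = struct.unpack_from("<Q I", buf, 0)   # reserved_info: uint64 index_, uint32 size_
+     capacity = struct.unpack_from("<I", buf, 16)[0]      # queue capacity (power of two)
+     return index_, size_, capacity
+ ```
+ 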
596
+ ## Platform Support
597
+
598
+ ### Fully Supported (Lock-Free)
599
+ - **Windows x86-64**: Uses native C++ extension (`atomic_ops_ext.pyd`) with MSVC intrinsics
600
+ - **Linux x86-64**: Uses `libatomic` directly via ctypes (no extension needed)
601
+ - **macOS x86-64**: Uses `libatomic` directly via ctypes (no extension needed)
602
+
603
+ **Platform-specific atomic operation implementations:**
604
+ - **Windows**: Requires building the `atomic_ops_ext` C++ extension (uses `std::atomic`)
605
+ - **Linux/macOS**: Uses `libatomic` library directly via ctypes (uses `__sync_val_compare_and_swap_8`)
606
+
607
+ ### Building the Windows Extension
608
+
609
+ On Windows, the native extension is required for lock-free multi-producer support:
610
+
611
+ ```bash
612
+ # Install build dependencies
613
+ pip install setuptools wheel
614
+
615
+ # Build and install the extension
616
+ python setup.py build_ext --inplace
617
+
618
+ # Or install in development mode (builds automatically)
619
+ pip install -e .
620
+ ```
621
+
622
+ **Windows requirements:**
623
+ - Visual Studio 2017+ or MSVC build tools
624
+ - Python development headers (included with standard Python installation)
625
+
626
+ The extension will be built as `atomic_ops_ext.cp312-win_amd64.pyd` (or similar based on Python version).
627
+
628
+ **Linux/macOS:**
629
+ No build step required! The `libatomic` library is typically included with GCC/Clang toolchains and is automatically loaded via ctypes.
630
+
631
+ ### Requirements for Lock-Free Operation
632
+
633
+ **All platforms require hardware support for lock-free atomic operations:**
634
+ - x86-64 CPU with CMPXCHG16B instruction (Intel since ~2006, AMD since ~2007)
635
+ - For C++/Python interoperability, both must use the same atomic hardware instructions
636
+ - No fallback implementation exists - lock-free atomics are mandatory for multi-producer queues
637
+
638
+ **Why no fallback?**
639
+ The queue requires true atomic CAS operations for correctness in multi-producer scenarios. A lock-based fallback would:
640
+ - Break binary compatibility with C++ SlickQueue
641
+ - Fail to work correctly in multi-process scenarios (Python ↔ C++)
642
+ - Not provide the performance guarantees of a lock-free queue
643
+
644
+ ### Not Supported
645
+ - 32-bit platforms (no 16-byte atomic CAS)
646
+ - ARM64 (requires ARMv8.1+ CASP instruction - future support planned)
647
+ - CPUs without CMPXCHG16B support (very old x86-64 CPUs from before 2006)
648
+
649
+ Check platform support:
650
+ ```python
651
+ from atomic_ops import check_platform_support
652
+
653
+ supported, message = check_platform_support()
654
+ print(f"Platform: {message}")
655
+ ```
656
+
657
+ ## Performance
658
+
659
+ Typical throughput on modern hardware (x86-64):
660
+ - Single producer/consumer: ~5-10M items/sec
661
+ - 4 producers/1 consumer: ~3-8M items/sec
662
+ - High contention (8+ producers): ~1-5M items/sec
663
+
664
+ Performance depends on:
665
+ - CPU cache topology
666
+ - Queue size (smaller = more contention)
667
+ - Item size
668
+ - Memory bandwidth
669
+
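+ A rough way to sanity-check throughput on your own hardware is a single-process round-trip loop like the sketch below; absolute numbers will differ from the figures above:
+ 
+ ```python
+ import struct
+ import time
+ 
+ from slick_queue_py import SlickQueue
+ 
+ q = SlickQueue(size=1024, element_size=32)
+ N = 1_000_000
+ read_index = 0
+ 
+ start = time.perf_counter()
+ for i in range(N):
+     idx = q.reserve()
+     q[idx][:4] = struct.pack("<I", i)
+     q.publish(idx)
+     _, _, read_index = q.read(read_index)
+ elapsed = time.perf_counter() - start
+ 
+ print(f"{N / elapsed / 1e6:.2f}M items/sec (produce + consume, single process)")
+ q.close()
+ ```
+ 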
670
+ ## Advanced Usage
671
+
672
+ ### Batch Operations
673
+
674
+ Reserve and publish multiple elements at once:
675
+
676
+ ```python
677
+ # Reserve 10 elements
678
+ idx = q.reserve(10)
679
+
680
+ # Write data to each slot
681
+ for i in range(10):
682
+ element = q[idx + i]
683
+ element[:data_len] = data[i]
684
+
685
+ # Publish all 10 elements at once
686
+ q.publish(idx, 10)
687
+ ```
688
+
689
+ ### Wrap-Around Handling
690
+
691
+ The queue automatically handles ring buffer wrap-around:
692
+
693
+ ```python
694
+ import struct
+ 
+ from slick_queue_py import SlickQueue
+ 
+ # Queue with size=8
695
+ q = SlickQueue(name='wrap_test', size=8, element_size=32)
696
+
697
+ # Reserve more items than queue size - wraps automatically
698
+ for i in range(100):
699
+ idx = q.reserve()
700
+ q[idx][:4] = struct.pack("<I", i)
701
+ q.publish(idx)
702
+ ```
703
+
704
+ ## Testing
705
+
706
+ ### Python Tests
707
+
708
+ Run the Python test suite:
709
+
710
+ ```bash
711
+ # Atomic operations tests (clean output)
712
+ python tests/run_test.py tests/test_atomic_ops.py
713
+
714
+ # Basic queue tests (clean output)
715
+ python tests/run_test.py tests/test_queue.py
716
+
717
+ # Local mode tests
718
+ python tests/test_local_mode.py
719
+
720
+ # Multi-producer/consumer tests
721
+ # Note: If tests fail with "File exists" errors, run cleanup first:
722
+ python tests/cleanup_shm.py
723
+ python tests/test_multi_producer.py
724
+ ```
725
+
726
+ ### C++/Python Interoperability Tests
727
+
728
+ Build and run comprehensive interop tests:
729
+
730
+ ```bash
731
+ # 1. Build C++ test programs with CMake
732
+ mkdir build && cd build
733
+ cmake ..
734
+ cmake --build .
735
+
736
+ # 2. Run interoperability test suite
737
+ cd ..
738
+ python tests/test_interop.py
739
+
740
+ # Or run specific tests:
741
+ python tests/test_interop.py --test python_producer_cpp_consumer
742
+ python tests/test_interop.py --test cpp_producer_python_consumer
743
+ python tests/test_interop.py --test multi_producer_interop
744
+ python tests/test_interop.py --test stress_interop
745
+ python tests/test_interop.py --test cpp_shm_creation
746
+ ```
747
+
748
+ The interop tests verify:
749
+ - **Python → C++**: Python producers write data that C++ consumers read
750
+ - **C++ → Python**: C++ producers write data that Python consumers read
751
+ - **Mixed Multi-Producer**: Multiple C++ and Python producers writing to same queue
752
+ - **Stress Test**: High-volume bidirectional communication
753
+ - **SHM created by C++**: C++ producers create the SHM and write data that Python consumers read
754
+
755
+ **Note on Windows**: If child processes from previous test runs don't terminate properly, you may need to manually kill orphaned python.exe processes before running tests again.
756
+
757
+ ## Known Issues
758
+
759
+ 1. **Buffer Cleanup Warning**: You may see a `BufferError: cannot close exported pointers exist` warning during garbage collection. This is a **harmless warning** caused by Python's ctypes creating internal buffer references that persist beyond explicit cleanup. It occurs during program exit and **does not affect functionality, performance, or correctness**. The queue works perfectly despite this warning.
760
+
761
+ 2. **UserWarning**: On Linux you may see `UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown`. This is a **harmless warning** from Python's `multiprocessing.resource_tracker`, which registers shared memory segments in every process that opens them and may still report them at shutdown even after another process has unlinked them. It occurs during program exit and **does not affect functionality, performance, or correctness**. The queue works perfectly despite this warning.
762
+
763
+ ## Architecture
764
+
765
+ ### Atomic Operations
766
+
767
+ The queue uses platform-specific atomic operations:
768
+
769
+ - **16-byte CAS**: For the `reserved_info` structure (multi-producer coordination)
770
+ - **8-byte CAS**: For slot `data_index` fields (publish/read synchronization)
771
+ - **Memory barriers**: Acquire/release semantics for proper ordering
772
+
773
+ ### Memory Ordering
774
+
775
+ - `reserve()`: Uses `memory_order_release` on successful CAS
776
+ - `publish()`: Uses `memory_order_release` for data_index store
777
+ - `read()`: Uses `memory_order_acquire` for data_index load
778
+
779
+ This ensures:
780
+ - All writes to data are visible before publishing
781
+ - All reads of data happen after acquiring the index
782
+ - No reordering that could cause data races
783
+
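+ In terms of the Python API, the pairing looks like the sketch below; the ordering itself is handled inside `publish()` and `read()`, and the comments only mark where it applies:
+ 
+ ```python
+ # Producer
+ idx = q.reserve()
+ q[idx][:4] = b'\x2a\x00\x00\x00'   # plain writes into the slot
+ q.publish(idx)                      # release store: the writes above become visible
+ 
+ # Consumer (possibly another thread or process)
+ data, size, read_index = q.read(read_index)   # acquire load pairs with the release store
+ if data is not None:
+     value = data[:size]             # guaranteed to observe the producer's completed writes
+ ```
+ 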
784
+ ## Comparison with C++
785
+
786
+ | Feature | C++ | Python |
787
+ |---------|-----|--------|
788
+ | Multi-producer | ✅ | ✅ |
789
+ | Multi-consumer (work-stealing) | ✅ | ✅ (with AtomicCursor) |
790
+ | Lock-free (x86-64) | ✅ | ✅ |
791
+ | Memory layout | Reference | Matches exactly |
792
+ | Performance | Baseline | ~50-80% of C++ |
793
+ | Ease of use | Medium | High |
794
+ | read(int) single-consumer | ✅ | ✅ |
795
+ | read(atomic cursor) multi-consumer | ✅ | ✅ |
796
+
797
+ ## Contributing
798
+
799
+ Issues and pull requests welcome at [SlickQuant/slick_queue_py](https://github.com/SlickQuant/slick_queue_py).
800
+
801
+ ## License
802
+
803
+ MIT License - see LICENSE file for details.
804
+
805
+ **Made with ⚡ by [SlickQuant](https://github.com/SlickQuant)**