slick-queue-py 1.1.0__cp313-cp313-macosx_10_13_universal2.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,820 @@
+ Metadata-Version: 2.4
+ Name: slick-queue-py
+ Version: 1.1.0
+ Summary: Lock-free MPMC queue with C++ interoperability via shared memory
+ Home-page: https://github.com/SlickQuant/slick-queue-py
+ Author: Slick Quant
+ Author-email: Slick Quant <slickquant@slickquant.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/SlickQuant/slick-queue-py
+ Project-URL: Documentation, https://github.com/SlickQuant/slick-queue-py#readme
+ Project-URL: Repository, https://github.com/SlickQuant/slick-queue-py
+ Project-URL: Bug Tracker, https://github.com/SlickQuant/slick-queue-py/issues
+ Keywords: queue,lock-free,atomic,shared-memory,ipc,multiprocessing,mpmc
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: System :: Distributed Computing
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: C++
+ Classifier: Operating System :: Microsoft :: Windows
+ Classifier: Operating System :: POSIX :: Linux
+ Classifier: Operating System :: MacOS
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Dynamic: author
+ Dynamic: home-page
+ Dynamic: license-file
+ Dynamic: requires-python
+
+ # slick-queue-py
+
+ Python implementation of SlickQueue - a lock-free multi-producer multi-consumer (MPMC) queue with C++ interoperability through shared memory.
+
+ This is the Python binding for the [SlickQueue C++ library](https://github.com/SlickQuant/slick-queue). The Python implementation maintains exact binary compatibility with the C++ version, enabling seamless interprocess communication between Python and C++ applications.
+
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+ [![CI](https://github.com/SlickQuant/slick-queue-py/actions/workflows/ci.yml/badge.svg)](https://github.com/SlickQuant/slick-queue-py/actions/workflows/ci.yml)
+ [![GitHub release](https://img.shields.io/github/v/release/SlickQuant/slick-queue-py)](https://github.com/SlickQuant/slick-queue-py/releases)
+
+ ## Features
+
+ - **Dual Mode Operation**:
+   - **Local Memory Mode**: In-process queue using local memory (no shared memory overhead)
+   - **Shared Memory Mode**: Inter-process queue for interprocess communication
+ - **Lock-Free Multi-Producer Multi-Consumer**: True MPMC support using atomic operations
+ - **C++/Python Interoperability**: Python and C++ processes can share the same queue
+ - **Cross-Platform**: Windows and Linux/macOS support (x86-64)
+ - **Memory Layout Compatible**: Exact binary compatibility with C++ `slick::SlickQueue<T>`
+ - **High Performance**: Hardware atomic operations for minimal overhead
+
+ ## Requirements
+
+ - Python 3.8+ (uses `multiprocessing.shared_memory`)
+ - 64-bit platform
+ - For true lock-free operation: x86-64 CPU with CMPXCHG16B support (most CPUs since 2006)
+
+ ## Installation
+
+ ```bash
+ pip install -e .
+ ```
+
+ Or just copy the Python files to your project.
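+
+ Since this wheel was released to a public registry, installing by name should also work (an assumption based on the release itself, not on anything documented in this README):
+
+ ```bash
+ # Assumes the package is published under this name on the index
+ pip install slick-queue-py
+ ```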
+
+ ## Quick Start
+
+ ### Local Memory Mode (Single Process)
+
+ ```python
+ from slick_queue_py import SlickQueue
+
+ # Create a queue in local memory (no shared memory)
+ q = SlickQueue(size=1024, element_size=256)
+
+ # Producer: Reserve a slot, write data, and publish
+ idx = q.reserve()
+ buf = q[idx]
+ buf[:len(b'hello')] = b'hello'
+ q.publish(idx)
+
+ # Consumer: Read data
+ read_index = 0
+ data, size, read_index = q.read(read_index)
+ if data is not None:
+     print(f"Received: {data[:size]}")
+
+ q.close()  # unlink() does nothing for local mode
+ ```
+
+ ### Shared Memory Mode (Multi-Process)
+
+ ```python
+ from slick_queue_py import SlickQueue
+
+ # Create a new shared memory queue (size must be a power of two)
+ q = SlickQueue(name='my_queue', size=1024, element_size=256)
+
+ # Producer: Reserve a slot, write data, and publish
+ idx = q.reserve()
+ buf = q[idx]
+ buf[:len(b'hello')] = b'hello'
+ q.publish(idx)
+
+ # Consumer: Read data
+ read_index = 0
+ data, size, read_index = q.read(read_index)
+ if data is not None:
+     print(f"Received: {data[:size]}")
+
+ q.close()
+ q.unlink()  # Delete shared memory segment
+ ```
+
+ ### Multi-Producer Usage
+
+ ```python
+ from multiprocessing import Process
+ from slick_queue_py import SlickQueue
+ import struct
+
+ def producer_worker(queue_name, worker_id, num_items):
+     # Open existing queue
+     q = SlickQueue(name=queue_name, element_size=32)
+
+     for i in range(num_items):
+         # Reserve slot (thread-safe with atomic CAS)
+         idx = q.reserve(1)
+
+         # Write unique data
+         data = struct.pack("<I I", worker_id, i)
+         slot = q[idx]
+         slot[:len(data)] = data
+
+         # Publish (makes data visible to consumers)
+         q.publish(idx, 1)
+
+     q.close()
+
+ # Run the driver under a __main__ guard (required for the multiprocessing
+ # 'spawn' start method on Windows/macOS)
+ if __name__ == '__main__':
+     # Create queue
+     q = SlickQueue(name='mpmc_queue', size=64, element_size=32)
+
+     # Start multiple producers
+     producers = []
+     for i in range(4):
+         p = Process(target=producer_worker, args=('mpmc_queue', i, 100))
+         p.start()
+         producers.append(p)
+
+     # Wait for completion
+     for p in producers:
+         p.join()
+
+     q.close()
+     q.unlink()
+ ```
+
+ ### Multi-Consumer Work-Stealing
+
+ For multiple consumers sharing work from a single queue, use an `AtomicCursor` to enable work-stealing patterns where each item is consumed by exactly one consumer.
+
+ #### Local Mode (Multi-Threading)
+
+ ```python
+ from threading import Thread
+ from slick_queue_py import SlickQueue, AtomicCursor
+ import struct
+
+ def consumer_worker(q, cursor, worker_id, results):
+     items_processed = 0
+     while True:
+         # Atomically claim next item (work-stealing)
+         data, size, index = q.read(cursor)
+
+         if data is None:
+             break  # No more data
+
+         # Process the claimed item
+         worker, seq = struct.unpack("<I I", data[:8])
+         items_processed += 1
+
+     results[worker_id] = items_processed
+
+ # Create local queue and cursor
+ q = SlickQueue(size=64, element_size=32)
+ cursor_buf = bytearray(8)
+ cursor = AtomicCursor(cursor_buf, 0)
+ cursor.store(0)  # Initialize cursor to 0
+
+ # Producer writes items
+ for i in range(100):
+     idx = q.reserve()
+     data = struct.pack("<I I", 0, i)
+     q[idx][:len(data)] = data
+     q.publish(idx)
+
+ # Start multiple consumer threads that share the work
+ results = {}
+ threads = []
+ for i in range(4):
+     t = Thread(target=consumer_worker, args=(q, cursor, i, results))
+     t.start()
+     threads.append(t)
+
+ # Wait for all consumers
+ for t in threads:
+     t.join()
+
+ print(f"Total items processed: {sum(results.values())}")
+ q.close()
+ ```
+
+ #### Shared Memory Mode (Multi-Process)
+
+ ```python
+ from multiprocessing import Process, shared_memory
+ from slick_queue_py import SlickQueue, AtomicCursor
+ import struct
+
+ def consumer_worker(queue_name, cursor_name, worker_id):
+     # Open shared queue and cursor
+     q = SlickQueue(name=queue_name, element_size=32)
+     cursor_shm = shared_memory.SharedMemory(name=cursor_name)
+     cursor = AtomicCursor(cursor_shm.buf, 0)
+
+     items_processed = 0
+     while True:
+         # Atomically claim next item (work-stealing)
+         data, size, index = q.read(cursor)
+
+         if data is None:
+             break  # No more data
+
+         # Process the claimed item
+         worker, seq = struct.unpack("<I I", data[:8])
+         items_processed += 1
+
+     print(f"Worker {worker_id} processed {items_processed} items")
+     cursor_shm.close()
+     q.close()
+
+ # Run the driver under a __main__ guard (required for the multiprocessing
+ # 'spawn' start method on Windows/macOS)
+ if __name__ == '__main__':
+     # Create queue and shared cursor
+     q = SlickQueue(name='work_queue', size=64, element_size=32)
+     cursor_shm = shared_memory.SharedMemory(name='work_cursor', create=True, size=8)
+     cursor = AtomicCursor(cursor_shm.buf, 0)
+     cursor.store(0)  # Initialize cursor to 0
+
+     # Producer writes items
+     for i in range(100):
+         idx = q.reserve()
+         data = struct.pack("<I I", 0, i)
+         q[idx][:len(data)] = data
+         q.publish(idx)
+
+     # Start multiple consumer processes that share the work
+     consumers = []
+     for i in range(4):
+         p = Process(target=consumer_worker, args=('work_queue', 'work_cursor', i))
+         p.start()
+         consumers.append(p)
+
+     # Wait for all consumers
+     for p in consumers:
+         p.join()
+
+     cursor_shm.close()
+     cursor_shm.unlink()
+     q.close()
+     q.unlink()
+ ```
+
+ ### C++/Python Interoperability
+
+ The Python implementation is fully compatible with the C++ [SlickQueue](https://github.com/SlickQuant/slick-queue) library. Python and C++ processes can produce to and consume from the same queue, with:
+
+ - **Exact memory layout compatibility**: Binary-compatible with `slick::SlickQueue<T>`
+ - **Atomic operation compatibility**: Same 16-byte and 8-byte CAS semantics
+ - **Bidirectional communication**: C++ ↔ Python in either direction
+ - **Multi-producer support**: Mix C++ and Python producers on the same queue
+
+ **Platform Support for C++/Python Interop:**
+ - ✅ **Linux/macOS**: Full interoperability (both use POSIX `shm_open`)
+ - ✅ **Windows**: Full interoperability
+ - ✅ **Python-only**: Works on all platforms (Windows/Linux/macOS)
+
+ #### Basic C++ → Python Example
+
+ **C++ Producer:**
+ ```cpp
+ #include "queue.h"
+
+ #include <cstdint>
+ #include <cstring>
+
+ int main() {
+     // Open existing queue created by Python
+     slick::SlickQueue<uint8_t> q(32, "shared_queue");
+
+     for (int i = 0; i < 100; i++) {
+         auto idx = q.reserve();
+         uint32_t value = i;
+         std::memcpy(q[idx], &value, sizeof(value));
+         q.publish(idx);
+     }
+ }
+ ```
+
+ **Python Consumer:**
+ ```python
+ from slick_queue_py import SlickQueue
+ import struct
+
+ # Create queue that C++ will write to
+ q = SlickQueue(name='shared_queue', size=64, element_size=32)
+
+ read_index = 0
+ for _ in range(100):
+     data, size, read_index = q.read(read_index)
+     if data is not None:
+         value = struct.unpack("<I", data[:4])[0]
+         print(f"Received from C++: {value}")
+
+ q.close()
+ q.unlink()
+ ```
+
+ #### Building C++ Programs
+
+ To use the C++ SlickQueue library with your Python queues:
+
+ ```bash
+ # Clone the C++ library
+ git clone https://github.com/SlickQuant/slick-queue.git
+
+ # Build your C++ program
+ g++ -std=c++17 -I slick-queue/include my_program.cpp -o my_program
+ ```
+
+ Or use CMake (see [CMakeLists.txt](CMakeLists.txt) for reference):
+
+ ```cmake
+ include(FetchContent)
+ FetchContent_Declare(
+     slick-queue
+     GIT_REPOSITORY https://github.com/SlickQuant/slick-queue.git
+     GIT_TAG main
+ )
+ FetchContent_MakeAvailable(slick-queue)
+
+ add_executable(my_program my_program.cpp)
+ target_link_libraries(my_program PRIVATE slick::queue)
+ ```
+
+ See [tests/test_interop.py](tests/test_interop.py) and [tests/cpp_*.cpp](tests/) for comprehensive examples.
+
+ ## API Reference
+
+ ### SlickQueue
+
+ #### `__init__(*, name=None, size=None, element_size=None)`
+
+ Create a queue in local memory or shared memory mode.
+
+ **Parameters:**
+ - `name` (str, optional): Shared memory segment name. If None, uses local memory mode (single process).
+ - `size` (int): Queue capacity (must be a power of 2). Required for local mode or when creating shared memory.
+ - `element_size` (int, required): Size of each element in bytes
+
+ **Examples:**
+ ```python
+ # Local memory mode (single process)
+ q = SlickQueue(size=256, element_size=64)
+
+ # Create new shared memory queue
+ q = SlickQueue(name='my_queue', size=256, element_size=64)
+
+ # Open existing shared memory queue
+ q2 = SlickQueue(name='my_queue', element_size=64)
+ ```
+
+ #### `reserve(n=1) -> int`
+
+ Reserve `n` elements for writing. **Multi-producer safe** using atomic CAS.
+
+ **Parameters:**
+ - `n` (int): Number of elements to reserve (default 1)
+
+ **Returns:**
+ - `int`: Starting index of reserved space
+
+ **Example:**
+ ```python
+ idx = q.reserve(1)  # Reserve 1 element
+ ```
+
+ #### `publish(index, n=1)`
+
+ Publish data written to reserved space. Uses atomic operations with release memory ordering.
+
+ **Parameters:**
+ - `index` (int): Index returned by `reserve()`
+ - `n` (int): Number of elements to publish (default 1)
+
+ **Example:**
+ ```python
+ idx = q.reserve()
+ q[idx][:data_len] = data
+ q.publish(idx)
+ ```
+
+ #### `read(read_index) -> Tuple[Optional[bytes], int, int]` or `read(atomic_cursor) -> Tuple[Optional[bytes], int, int]`
+
+ Read from the queue in one of two modes:
+
+ **Single-Consumer Mode** (when `read_index` is `int`):
+ Uses a plain int cursor for single-consumer scenarios. Returns the new read_index.
+
+ **Multi-Consumer Mode** (when `read_index` is `AtomicCursor`):
+ Uses an atomic cursor for work-stealing/load-balancing across multiple consumers.
+ Each consumer atomically claims items, ensuring each item is consumed exactly once.
+
+ **Parameters:**
+ - `read_index` (int or AtomicCursor): Current read position or shared atomic cursor
+
+ **Returns:**
+ - Single-consumer: `Tuple[Optional[bytes], int, int]` - (data or None, size, new_read_index)
+ - Multi-consumer: `Tuple[Optional[bytes], int, int]` - (data or None, size, claimed index)
+
+ **API Difference from C++:**
+ Unlike C++, where `read_index` is updated by reference, the Python single-consumer version returns the new index.
+ This is the Pythonic pattern, since Python doesn't have true pass-by-reference.
+
+ ```python
+ # Python single-consumer (returns new index)
+ data, size, read_index = q.read(read_index)
+
+ # Python multi-consumer (atomic cursor)
+ from slick_queue_py import AtomicCursor
+ cursor = AtomicCursor(cursor_shm.buf, 0)
+ data, size, index = q.read(cursor)  # Atomically claim next item
+
+ # C++ (updates by reference for both)
+ auto [data, size] = queue.read(read_index);     // read_index modified in-place
+ auto [data, size] = queue.read(atomic_cursor);  // atomic_cursor modified in-place
+ ```
+
+ **Single-Consumer Example:**
+ ```python
+ read_index = 0
+ while True:
+     data, size, read_index = q.read(read_index)
+     if data is not None:
+         process(data)
+ ```
+
+ **Multi-Consumer Example (Local Mode - Threading):**
+ ```python
+ from slick_queue_py import AtomicCursor
+
+ # Create local cursor for multi-threading
+ cursor_buf = bytearray(8)
+ cursor = AtomicCursor(cursor_buf, 0)
+ cursor.store(0)
+
+ # Multiple threads can share this cursor
+ while True:
+     data, size, index = q.read(cursor)  # Each thread atomically claims items
+     if data is not None:
+         process(data)
+ ```
+
+ **Multi-Consumer Example (Shared Memory Mode - Multiprocess):**
+ ```python
+ from multiprocessing import shared_memory
+ from slick_queue_py import AtomicCursor
+
+ # Create shared cursor for multi-process
+ cursor_shm = shared_memory.SharedMemory(name='cursor', create=True, size=8)
+ cursor = AtomicCursor(cursor_shm.buf, 0)
+ cursor.store(0)
+
+ # Multiple processes can share this cursor
+ while True:
+     data, size, index = q.read(cursor)  # Each process atomically claims items
+     if data is not None:
+         process(data)
+ ```
+
+ #### `read_last() -> Tuple[Optional[bytes], int]`
+
+ Read the most recently published item.
+
+ **Returns:**
+ - `Tuple[Optional[bytes], int]`: Tuple of (data, size)
+   - `data`: Last published data or None if queue is empty
+   - `size`: Number of slots the item occupies (0 if queue is empty)
+
+ **Example:**
+ ```python
+ data, size = q.read_last()
+ if data is not None:
+     # size is in slots, so the byte length is size * element_size
+     print(f"Last item: {data[:size * element_size]}")
+ ```
+
+ #### `__getitem__(index) -> memoryview`
+
+ Get a memoryview for writing to a reserved slot.
+
+ **Parameters:**
+ - `index` (int): Index from `reserve()`
+
+ **Returns:**
+ - `memoryview`: View into the data array
+
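+ **Example** (the reserve/write/publish flow used throughout this README):
+ ```python
+ idx = q.reserve()
+ view = q[idx]        # zero-copy view of the reserved slot
+ view[:5] = b'hello'  # write payload bytes in place
+ q.publish(idx)
+ ```
+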
+ #### `close()`
+
+ Close the shared memory connection. Always call this before unlinking.
+
+ #### `unlink()`
+
+ Delete the shared memory segment. Only call from the process that created it.
+
+ ### AtomicCursor
+
+ The `AtomicCursor` class enables multi-consumer work-stealing patterns by providing an atomic read cursor that multiple consumers can coordinate through. Works in both local mode (multi-threading) and shared memory mode (multi-process).
+
+ #### `__init__(buffer, offset=0)`
+
+ Create an atomic cursor wrapper around a memory buffer.
+
+ **Parameters:**
+ - `buffer` (memoryview or bytearray): Memory buffer
+   - For local mode (threading): use `bytearray(8)`
+   - For shared memory mode (multiprocess): use `SharedMemory.buf`
+ - `offset` (int, optional): Byte offset in buffer (default 0)
+
+ **Local Mode Example (Multi-Threading):**
+ ```python
+ from slick_queue_py import AtomicCursor
+
+ # Create local cursor for multi-threading
+ cursor_buf = bytearray(8)
+ cursor = AtomicCursor(cursor_buf, 0)
+ cursor.store(0)  # Initialize to 0
+ ```
+
+ **Shared Memory Mode Example (Multi-Process):**
+ ```python
+ from multiprocessing import shared_memory
+ from slick_queue_py import AtomicCursor
+
+ # Create shared cursor for multi-process
+ cursor_shm = shared_memory.SharedMemory(name='cursor', create=True, size=8)
+ cursor = AtomicCursor(cursor_shm.buf, 0)
+ cursor.store(0)  # Initialize to 0
+ ```
+
+ #### `load() -> int`
+
+ Load the cursor value with atomic acquire semantics.
+
+ **Returns:**
+ - `int`: Current cursor value
+
+ #### `store(value)`
+
+ Store a new cursor value with atomic release semantics.
+
+ **Parameters:**
+ - `value` (int): New cursor value
+
+ #### `compare_exchange_weak(expected, desired) -> Tuple[bool, int]`
+
+ Atomically compare and swap the cursor value.
+
+ **Parameters:**
+ - `expected` (int): Expected cursor value
+ - `desired` (int): Desired cursor value
+
+ **Returns:**
+ - `Tuple[bool, int]`: (success, actual_value)
+
+ **Note:** This is used internally by `read(atomic_cursor)` and typically doesn't need to be called directly.
+
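+ For illustration, a sketch of the claim loop that `read(atomic_cursor)` performs internally, built only from the documented `load`/`compare_exchange_weak` signatures:
+
+ ```python
+ # Claim the next index with a CAS retry loop (q.read(cursor) does this for you)
+ current = cursor.load()
+ while True:
+     ok, actual = cursor.compare_exchange_weak(current, current + 1)
+     if ok:
+         break          # this consumer claimed index `current`
+     current = actual   # lost the race; retry from the observed value
+ ```
+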
+ ## Memory Layout
+
+ The queue uses the same memory layout as C++ `slick::SlickQueue<T>`:
+
+ ```
+ Offset | Size          | Content
+ -------|---------------|------------------
+ 0      | 16 bytes      | reserved_info (atomic)
+        |   0-7         |   uint64_t index_
+        |   8-11        |   uint32_t size_
+        |   12-15       |   padding
+ 16     | 4 bytes       | uint32_t size_ (queue capacity)
+ 20     | 44 bytes      | padding (to 64 bytes)
+ 64     | 16*size bytes | slot array
+        | per slot:     |
+        |   0-7         |   uint64_t data_index (atomic)
+        |   8-11        |   uint32_t size
+        |   12-15       |   padding
+ 64+... | elem*size     | data array
+ ```
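+
+ As a sanity check, a minimal sketch (assuming no alignment beyond the layout above) of the expected segment footprint:
+
+ ```python
+ HEADER_SIZE = 64  # reserved_info (16) + capacity (4) + padding (44)
+ SLOT_SIZE = 16    # per slot: data_index (8) + size (4) + padding (4)
+
+ def segment_bytes(size: int, element_size: int) -> int:
+     """Expected shared-memory footprint for `size` slots of `element_size` bytes."""
+     return HEADER_SIZE + SLOT_SIZE * size + element_size * size
+
+ print(segment_bytes(1024, 256))  # 64 + 16384 + 262144 = 278592
+ ```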
+
+ ## Platform Support
+
+ ### Fully Supported (Lock-Free)
+ - **Windows x86-64**: Uses C++ extension (`atomic_ops_ext.pyd`) with `std::atomic`
+ - **Linux x86-64**: Uses C++ extension (`atomic_ops_ext.so`) with `std::atomic`, falling back to `libatomic`
+ - **macOS x86-64**: Uses C++ extension (`atomic_ops_ext.so`) with `std::atomic`, falling back to compiler builtins
+
+ **Platform-specific atomic operation implementations:**
+ - **All platforms**: The `atomic_ops_ext` C++ extension is now used on all platforms for the most reliable cross-process atomic operations
+ - **Fallback support**: Linux/macOS can fall back to `libatomic` or compiler builtins if the extension isn't available
+
+ ### Building and Installation
+
+ The C++ extension is built automatically during installation:
+
+ ```bash
+ # Install with automatic extension build
+ pip install -e .
+
+ # Or build manually first
+ python setup.py build_ext --inplace
+ pip install -e .
+ ```
+
+ **Build requirements:**
+ - **Windows**: Visual Studio 2017+ or MSVC build tools
+ - **Linux**: GCC 5+ or Clang 3.8+
+ - **macOS**: Xcode command line tools (clang)
+ - **All platforms**: Python development headers (included with a standard Python installation)
+
+ The extension will be built as:
+ - Windows: `atomic_ops_ext.cp3XX-win_amd64.pyd`
+ - Linux: `atomic_ops_ext.cpython-3XX-x86_64-linux-gnu.so`
+ - macOS: `atomic_ops_ext.cpython-3XX-darwin.so`
+
+ (where `XX` is your Python version, e.g., `312` for Python 3.12)
+
+ ### Requirements for Lock-Free Operation
+
+ **All platforms require hardware support for lock-free atomic operations:**
+ - x86-64 CPU with the CMPXCHG16B instruction (Intel since ~2006, AMD since ~2007)
+ - For C++/Python interoperability, both sides must use the same atomic hardware instructions
+ - No fallback implementation exists: lock-free atomics are mandatory for multi-producer queues
+
+ **Why no fallback?**
+ The queue requires true atomic CAS operations for correctness in multi-producer scenarios. A lock-based fallback would:
+ - Break binary compatibility with C++ SlickQueue
+ - Fail to work correctly in multi-process scenarios (Python ↔ C++)
+ - Not provide the performance guarantees of a lock-free queue
+
+ ### Not Supported
+ - 32-bit platforms (no 16-byte atomic CAS)
+ - ARM64 (requires the ARMv8.1+ CASP instruction; future support planned)
+ - CPUs without CMPXCHG16B support (very old x86-64 CPUs from before 2006)
+
+ Check platform support:
+ ```python
+ from atomic_ops import check_platform_support
+
+ supported, message = check_platform_support()
+ print(f"Platform: {message}")
+ ```
+
+ ## Performance
+
+ Typical throughput on modern hardware (x86-64):
+ - Single producer/consumer: ~5-10M items/sec
+ - 4 producers/1 consumer: ~3-8M items/sec
+ - High contention (8+ producers): ~1-5M items/sec
+
+ Performance depends on:
+ - CPU cache topology
+ - Queue size (smaller = more contention)
+ - Item size
+ - Memory bandwidth
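+
+ A rough way to gauge throughput on your own hardware (an uncalibrated single-threaded sketch, not the benchmark behind the numbers above):
+
+ ```python
+ import struct
+ import time
+ from slick_queue_py import SlickQueue
+
+ q = SlickQueue(size=1 << 16, element_size=16)  # local memory mode
+ N = 1_000_000
+
+ start = time.perf_counter()
+ read_index = 0
+ for i in range(N):
+     idx = q.reserve()
+     q[idx][:4] = struct.pack("<I", i)
+     q.publish(idx)
+     data, size, read_index = q.read(read_index)
+ elapsed = time.perf_counter() - start
+
+ print(f"{N / elapsed / 1e6:.2f}M items/sec (single producer/consumer, in-process)")
+ q.close()
+ ```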
+
+ ## Advanced Usage
+
+ ### Batch Operations
+
+ Reserve and publish multiple elements at once:
+
+ ```python
+ # Reserve 10 elements
+ idx = q.reserve(10)
+
+ # Write data to each slot
+ for i in range(10):
+     element = q[idx + i]
+     element[:data_len] = data[i]
+
+ # Publish all 10 elements at once
+ q.publish(idx, 10)
+ ```
+
+ ### Wrap-Around Handling
+
+ The queue automatically handles ring buffer wrap-around:
+
+ ```python
+ import struct
+
+ from slick_queue_py import SlickQueue
+
+ # Queue with size=8
+ q = SlickQueue(name='wrap_test', size=8, element_size=32)
+
+ # Reserve more items than the queue size - wraps automatically
+ for i in range(100):
+     idx = q.reserve()
+     q[idx][:4] = struct.pack("<I", i)
+     q.publish(idx)
+ ```
+
+ ## Testing
+
+ ### Python Tests
+
+ Run the Python test suite:
+
+ ```bash
+ # Atomic operations tests (clean output)
+ python tests/run_test.py tests/test_atomic_ops.py
+
+ # Basic queue tests (clean output)
+ python tests/run_test.py tests/test_queue.py
+
+ # Local mode tests
+ python tests/test_local_mode.py
+
+ # Multi-producer/consumer tests
+ # Note: If tests fail with "File exists" errors, run cleanup first:
+ python tests/cleanup_shm.py
+ python tests/test_multi_producer.py
+ ```
+
+ ### C++/Python Interoperability Tests
+
+ Build and run comprehensive interop tests:
+
+ ```bash
+ # 1. Build C++ test programs with CMake
+ mkdir build && cd build
+ cmake ..
+ cmake --build .
+
+ # 2. Run interoperability test suite
+ cd ..
+ python tests/test_interop.py
+
+ # Or run specific tests:
+ python tests/test_interop.py --test python_producer_cpp_consumer
+ python tests/test_interop.py --test cpp_producer_python_consumer
+ python tests/test_interop.py --test multi_producer_interop
+ python tests/test_interop.py --test stress_interop
+ python tests/test_interop.py --test cpp_shm_creation
+ ```
+
+ The interop tests verify:
+ - **Python → C++**: Python producers write data that C++ consumers read
+ - **C++ → Python**: C++ producers write data that Python consumers read
+ - **Mixed Multi-Producer**: Multiple C++ and Python producers writing to the same queue
+ - **Stress Test**: High-volume bidirectional communication
+ - **SHM created by C++**: C++ producers create the shared memory segment and write data that Python consumers read
+
+ **Note on Windows**: If child processes from previous test runs don't terminate properly, you may need to manually kill orphaned python.exe processes before running tests again.
+
+ ## Known Issues
+
+ 1. **Buffer Cleanup Warning**: You may see a `BufferError: cannot close exported pointers exist` warning during garbage collection. This is a **harmless warning** caused by Python's ctypes creating internal buffer references that persist beyond explicit cleanup. It occurs during program exit and **does not affect functionality, performance, or correctness**.
+
+ 2. **Resource Tracker Warning**: On Linux you may see `UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown`. This comes from Python's `multiprocessing.resource_tracker`, which registers shared memory segments in every process that attaches to them, so segments cleaned up elsewhere (or intentionally left alive for other processes) can still be reported at shutdown. It is likewise harmless and **does not affect functionality, performance, or correctness**.
+
+ ## Architecture
+
+ ### Atomic Operations
+
+ The queue uses platform-specific atomic operations:
+
+ - **16-byte CAS**: For the `reserved_info` structure (multi-producer coordination)
+ - **8-byte CAS**: For slot `data_index` fields (publish/read synchronization)
+ - **Memory barriers**: Acquire/release semantics for proper ordering
+
+ ### Memory Ordering
+
+ - `reserve()`: Uses `memory_order_release` on successful CAS
+ - `publish()`: Uses `memory_order_release` for the data_index store
+ - `read()`: Uses `memory_order_acquire` for the data_index load
+
+ This ensures:
+ - All writes to data are visible before publishing
+ - All reads of data happen after acquiring the index
+ - No reordering that could cause data races
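+
+ Concretely, a producer must finish its slot writes before calling `publish()`, and a consumer may only use data after `read()` returns it. A minimal sketch (assuming a queue `q` created as in Quick Start):
+
+ ```python
+ payload = b'tick'
+
+ # Producer: the plain slot write happens-before the release-store
+ # inside publish(), so consumers never observe torn data.
+ idx = q.reserve()
+ q[idx][:len(payload)] = payload
+ q.publish(idx)
+
+ # Consumer: read() acquire-loads the slot's data_index before copying
+ # bytes out, so the returned buffer reflects the published writes.
+ read_index = 0
+ data, size, read_index = q.read(read_index)
+ ```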
+
+ ## Comparison with C++
+
+ | Feature | C++ | Python |
+ |---------|-----|--------|
+ | Multi-producer | ✅ | ✅ |
+ | Multi-consumer (work-stealing) | ✅ | ✅ (with AtomicCursor) |
+ | Lock-free (x86-64) | ✅ | ✅ |
+ | Memory layout | Reference | Matches exactly |
+ | Performance | Baseline | ~50-80% of C++ |
+ | Ease of use | Medium | High |
+ | read(int) single-consumer | ✅ | ✅ |
+ | read(atomic cursor) multi-consumer | ✅ | ✅ |
+
+ ## Contributing
+
+ Issues and pull requests welcome at [SlickQuant/slick-queue-py](https://github.com/SlickQuant/slick-queue-py).
+
+ ## License
+
+ MIT License - see LICENSE file for details.
+
+ **Made with ⚡ by [SlickQuant](https://github.com/SlickQuant)**