michi-ai 0.1.0__py3-none-any.whl → 0.1.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,12 +1,12 @@
 Metadata-Version: 2.1
 Name: michi-ai
-Version: 0.1.
+Version: 0.1.1
 Summary: Full-duplex speech LLM client for MichiAI
 Home-page: https://ketsuilabs.io
 License: Apache-2.0
 Keywords: llm,speech,full-duplex,audio
 Author: Damian Krystkiewicz
-Author-email:
+Author-email: 45499236+dkrystki@users.noreply.github.com
 Requires-Python: >=3.9,<4.0
 Classifier: License :: OSI Approved :: Apache Software License
 Classifier: Programming Language :: Python :: 3
@@ -40,7 +40,7 @@ Unlike traditional serial pipelines (ASR → LLM → TTS), MichiAI can listen an
 | **Latency (TTFA)** | ~75ms (tested on RTX 4090) |
 | **Architecture** | Continuous Embeddings + Rectified Flow Matching |
 | **Base Backbone** | SmolLM-360m |
-| **Key Innovation** | No Coherence Loss / Single
+| **Key Innovation** | No Coherence Loss / Single Step Decoding |
 
 
 ## 🌟 Key Features
@@ -56,7 +56,7 @@ Unlike traditional serial pipelines (ASR → LLM → TTS), MichiAI can listen an
 ## 🤖 Architecture Overview
 
 ### 1. The Listening Head
-A multi-modal encoder mapping raw audio into
+A multi-modal encoder mapping raw audio into continuous embeddings while simultaneously generating text tokens. This ensures the model understands both the semantic meaning and the emotional context.
 
 ### 2. The Speaking Head
 Predicts audio embeddings using **Rectified Flow Matching**. This allows for fast, high-quality, and diverse speech generation. The embeddings are then processed through a lightweight, causal **HiFi-GAN vocoder** for real-time streaming.
@@ -0,0 +1,6 @@
+michiai/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+michiai/client.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+michi_ai-0.1.1.dist-info/LICENSE.txt,sha256=TGMMvGlYQvvAXSjlrZvpXJjiaxnwDsarhDmLY1KIWr0,11351
+michi_ai-0.1.1.dist-info/METADATA,sha256=Kxrbe0rKJ4EEiiaJLPHrtxYuJQuOZPFqEa8yBC9odjU,4133
+michi_ai-0.1.1.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
+michi_ai-0.1.1.dist-info/RECORD,,
michi_ai-0.1.0.dist-info/RECORD
DELETED
@@ -1,6 +0,0 @@
-michiai/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-michiai/client.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-michi_ai-0.1.0.dist-info/LICENSE.txt,sha256=TGMMvGlYQvvAXSjlrZvpXJjiaxnwDsarhDmLY1KIWr0,11351
-michi_ai-0.1.0.dist-info/METADATA,sha256=oKqjq5PuboN4yquaUsRi2pX-IsmUXjy2WRtCRUpDo80,4121
-michi_ai-0.1.0.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88
-michi_ai-0.1.0.dist-info/RECORD,,
File without changes
File without changes
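Note on the RECORD entries: each line has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with the trailing `=` padding removed. The sketch below is illustrative only; the helper names `record_digest` and `record_line` are hypothetical and not part of michi-ai or any packaging tool. It shows that the digest listed for `michiai/__init__.py` and `michiai/client.py` is the digest of zero bytes, which agrees with the recorded size of 0 in both versions.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return a RECORD-style digest: URL-safe base64 of SHA-256, '=' padding stripped."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def record_line(path: str, data: bytes) -> str:
    """Build one RECORD entry (hypothetical helper): path, hash, and size in bytes."""
    return f"{path},sha256={record_digest(data)},{len(data)}"


# An empty file reproduces the entry shown in the diff for michiai/__init__.py.
empty_entry = record_line("michiai/__init__.py", b"")
print(empty_entry)
```

Read this way, the RECORD hunks suggest that both Python modules are empty files in 0.1.0 and 0.1.1, and that the only payload change between the two wheels is in METADATA (4121 bytes to 4133 bytes).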