agora-python-server-sdk 2.1.3__tar.gz → 2.1.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.



Files changed (45)
  1. agora_python_server_sdk-2.1.5/PKG-INFO +140 -0
  2. agora_python_server_sdk-2.1.5/README.md +125 -0
  3. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_ctypes_handle/_ctypes_data.py +27 -8
  4. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/agora_base.py +3 -3
  5. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/agora_service.py +11 -2
  6. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/local_user.py +1 -1
  7. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/rtc_connection.py +7 -4
  8. agora_python_server_sdk-2.1.5/agora/rtc/utils/audio_consumer.py +133 -0
  9. agora_python_server_sdk-2.1.5/agora/rtc/utils/vad_dump.py +104 -0
  10. agora_python_server_sdk-2.1.5/agora_python_server_sdk.egg-info/PKG-INFO +140 -0
  11. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora_python_server_sdk.egg-info/SOURCES.txt +2 -1
  12. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/setup.py +2 -2
  13. agora_python_server_sdk-2.1.3/PKG-INFO +0 -51
  14. agora_python_server_sdk-2.1.3/README.md +0 -36
  15. agora_python_server_sdk-2.1.3/agora/rtc/audio_vad.py +0 -164
  16. agora_python_server_sdk-2.1.3/agora_python_server_sdk.egg-info/PKG-INFO +0 -51
  17. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/MANIFEST.in +0 -0
  18. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/__init__.py +0 -0
  19. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_ctypes_handle/_audio_frame_observer.py +0 -0
  20. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_ctypes_handle/_local_user_observer.py +0 -0
  21. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_ctypes_handle/_rtc_connection_observer.py +0 -0
  22. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_ctypes_handle/_video_encoded_frame_observer.py +0 -0
  23. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_ctypes_handle/_video_frame_observer.py +0 -0
  24. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/_utils/globals.py +0 -0
  25. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/agora_parameter.py +0 -0
  26. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/audio_encoded_frame_sender.py +0 -0
  27. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/audio_frame_observer.py +0 -0
  28. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/audio_pcm_data_sender.py +0 -0
  29. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/audio_sessionctrl.py +0 -0
  30. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/local_audio_track.py +0 -0
  31. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/local_user_observer.py +0 -0
  32. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/local_video_track.py +0 -0
  33. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/media_node_factory.py +0 -0
  34. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/remote_audio_track.py +0 -0
  35. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/remote_video_track.py +0 -0
  36. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/rtc_connection_observer.py +0 -0
  37. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/video_encoded_frame_observer.py +0 -0
  38. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/video_encoded_image_sender.py +0 -0
  39. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/video_frame_observer.py +0 -0
  40. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/video_frame_sender.py +0 -0
  41. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora/rtc/voice_detection.py +0 -0
  42. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora_python_server_sdk.egg-info/dependency_links.txt +0 -0
  43. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/agora_python_server_sdk.egg-info/top_level.txt +0 -0
  44. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/pyproject.toml +0 -0
  45. {agora_python_server_sdk-2.1.3 → agora_python_server_sdk-2.1.5}/setup.cfg +0 -0
@@ -0,0 +1,140 @@
+ Metadata-Version: 2.1
+ Name: agora_python_server_sdk
+ Version: 2.1.5
+ Summary: A Python SDK for Agora Server
+ Home-page: https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Topic :: Multimedia :: Sound/Audio
+ Classifier: Topic :: Multimedia :: Video
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+
+ # Note
+ - This is a Python SDK wrapper for the Agora RTC SDK.
+ - It supports Linux and Mac platforms.
+ - The examples are provided as very simple demonstrations and are not recommended for use in production environments.
+
+ # Very Important Notice !!!
+ - A process can only have one instance.
+ - An instance can have multiple connections.
+ - In all observers and callbacks, you must not call the SDK's own APIs or perform CPU-intensive tasks; data copying is allowed.
+
+ # Required Operating Systems and Python Versions
+ - Supported Linux versions:
+   - Ubuntu 18.04 LTS and above
+   - CentOS 7.0 and above
+
+ - Supported Mac versions:
+   - macOS 13 and above
+
+ - Python version:
+   - Python 3.10 and above
+
+ # Using Agora-Python-Server-SDK
+ ```
+ pip install agora_python_server_sdk
+ ```
+
+ # Running Examples
+
+ ## Preparing Test Data
+ - Download and unzip [test_data.zip](https://download.agora.io/demo/test/test_data_202408221437.zip) to the Agora-Python-Server-SDK directory.
+
+ ## Executing Test Script
+ ```
+ python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx --userId=xxx --audioFile=./test_data/demo.pcm --sampleRate=16000 --numOfChannels=1
+ ```
+
+ # Change log
+
+ ## 2024.12.03 release Version 2.1.5
+ - Modifications:
+   - LocalUser/audioTrack:
+     - When the scenario is chorus, developers don't need to call setSendDelayInMs.
+     - When the scenario is chorus, developers don't need to set the audio scenario of the track to chorus.
+     - NOTE: This reduces the work for developers. In AI scenarios, developers only need to set the service to chorus.
+ - Additions:
+   - Added the VadDump class, which can assist in troubleshooting VAD issues in the testing environment. However, it should not be enabled in the production environment.
+   - Added the on_volume_indication callback.
+   - Added the on_remote_video_track_state_changed callback.
+ - Removals:
+   - Removed the VAD V1 version, retaining only the V2 version. Refer to voice_detection.py and sample_audio_vad.py.
+ - Updates:
+   - Updated the relevant samples: audio consume and VAD.
+
+ ## 2024.11.12 release 2.1.4
+ - Modified the type of metadata in VideoFrame from str to bytes to be consistent with C++, so it can support byte streams.
+ - Modified the internal encapsulation of ExternalVideoFrame to support byte streams. For alpha encoding support, a logical check has been added: if fill_alpha_buffer is 0, the alpha buffer is not processed.
+
+ ## 2024.11.11 release 2.1.3
+ - Added a new sample: example_jpeg_send.py, which can push JPEG files or JPEG streams to a channel.
+ - Performance overhead, as noted in the example comments: for a 1920x1080 JPEG file, the process from reading the file to converting it to an RGBA bytearray takes approximately 11 milliseconds.
+
+ ## 2024.11.07 release 2.1.2
+ - Updated `user_id` in the `AudioVolumeInfoInner` and `AudioVolumeInfo` structures to `str` type.
+ - Fixed the bug in the `_on_audio_volume_indication` callback where it could only handle one speaker per callback.
+ - Corrected the parameter type in the `IRTCLocalUserObserver::on_audio_volume_indication` callback to `list`.
+
+ ## 2024.10.29 release 2.1.1
+ - Added the audio VAD interface version 2 and a corresponding example.
+
+ ## 2024.10.24 release 2.1.0
+ - Fixed some bugs.
+
+ ### Common Usage Q&A
+ ## What is the relationship between service and process?
+ - A process can only have one service, and the service can only be initialized once.
+ - A service can only have one media_node_factory.
+ - A service can have multiple connections.
+ - Call media_node_factory.release() and service.release() when the process exits.
+ ## If using Docker with one user per Docker instance, how should Docker be released when the user starts Docker and then logs out?
+ - In this case, create the service/media_node_factory and connection when the process starts.
+ - Release the service/media_node_factory and connection when the process exits, ensuring that...
+ ## If Docker is used to support multiple users and runs for a long time, what should be done?
+ - In this case, we recommend using a connection pool.
+ - Create the service/media_node_factory and a connection pool (new connections only, without initialization) when the process starts.
+ - When a user logs in, get a connection from the pool, initialize it, execute con.connect() and set up callbacks, and then join the channel.
+ - Handle business operations.
+ - When a user logs out, execute con.disconnect() and release the audio/video tracks and observers associated with the connection, but do not call con.release(); then put the connection back into the pool.
+ - When the process exits, release the connection pool (call con.release() on each connection) and then the service/media_node_factory, ensuring full resource release.
+
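The pool lifecycle described above can be sketched as follows. This is a minimal illustration, not part of the SDK: `FakeConnection` is a hypothetical stand-in for the SDK's `RTCConnection`, and only the connect/disconnect/release names mirror the real API.

```python
import queue

class FakeConnection:
    """Hypothetical stand-in for the SDK's RTCConnection; models only the lifecycle."""
    def __init__(self):
        self.connected = False
        self.released = False
    def connect(self, token, chan_id, user_id):
        self.connected = True
        return 0
    def disconnect(self):
        self.connected = False
        return 0
    def release(self):
        self.released = True

class ConnectionPool:
    def __init__(self, size, factory):
        self._pool = queue.Queue()
        for _ in range(size):       # create connections up front, do not initialize them
            self._pool.put(factory())
    def acquire(self):              # user login: take a connection from the pool
        return self._pool.get()
    def recycle(self, con):         # user logout: disconnect, but do NOT call release()
        con.disconnect()
        self._pool.put(con)
    def shutdown(self):             # process exit: release every pooled connection
        while not self._pool.empty():
            self._pool.get().release()

pool = ConnectionPool(2, FakeConnection)
con = pool.acquire()
con.connect("token", "chan", "uid")
pool.recycle(con)
pool.shutdown()
```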
+ ## Use of VAD
+ # Source code: voice_detection.py
+ # Sample code: example_audio_vad.py
+ # It is recommended to use the VAD V2 version; the class is AudioVadV2. Reference: voice_detection.py.
+ # Use of VAD:
+ 1. Call _vad_instance.init(AudioVadConfigV2) to initialize the vad instance. Reference: voice_detection.py. Assume the instance is _vad_instance.
+ 2. In audio_frame_observer::on_playback_audio_frame_before_mixing(audio_frame), call the vad module's process method: state, bytes = _vad_instance.process(audio_frame)
+ 3. Judge the value of the returned state and do the corresponding processing:
+    A. If state is _vad_instance._vad_state_startspeaking, the user is "starting to speak": speech recognition (STT/ASR) can be started.
+    B. If state is _vad_instance._vad_state_stopspeaking, the user is "stopping speaking": speech recognition (STT/ASR) can be stopped.
+    C. If state is _vad_instance._vad_state_speaking, the user is "speaking": speech recognition (STT/ASR) can be continued.
+    In all three cases, be sure to pass the returned bytes to the recognition module instead of the original audio_frame; otherwise the recognition result will be incorrect.
+ # Note:
+ If the vad module is used for speech recognition (STT/ASR) and similar operations, always pass the returned bytes to the recognition module instead of the original audio_frame; otherwise the recognition result will be incorrect.
+ # How to better troubleshoot VAD issues: this involves two aspects, configuration and debugging.
+ 1. Ensure that the initialization parameters of the vad module are correct. Reference: voice_detection.py.
+ 2. In state, bytes = on_playback_audio_frame_before_mixing(audio_frame):
+    A. Save the data of audio_frame to a local file to record the original audio. Reference: example_audio_pcm_send.py. For example, name it source_{time.time()*1000}.pcm.
+    B. Save the result of each vad processing step:
+       a. When state == start_speaking: create a new binary file, for example vad_{time.time()*1000}.pcm, and write bytes to it.
+       b. When state == speaking: write bytes to the file.
+       c. When state == stop_speaking: write bytes to the file and close the file.
+ Note: In this way, problems can be troubleshot by comparing the original audio file with the audio file processed by vad. This function can be disabled in the production environment.
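The dump-file steps above can be sketched as a small helper. This is a hypothetical debugging aid, not SDK code: the state constants stand in for the values exposed by the AudioVadV2 instance (e.g. `_vad_state_startspeaking`), and `send_to_asr` is a placeholder for the recognition hook.

```python
import os
import time

# Hypothetical state constants; the real values come from the AudioVadV2
# instance in voice_detection.py (e.g. _vad_instance._vad_state_startspeaking).
VAD_START_SPEAKING = 1
VAD_SPEAKING = 2
VAD_STOP_SPEAKING = 3

class VadSegmentWriter:
    """Debug helper: writes each detected speech segment to its own PCM file
    while forwarding the VAD output bytes (never the raw frame) to recognition."""
    def __init__(self, out_dir="."):
        self._out_dir = out_dir
        self._file = None
    def handle(self, state, data, send_to_asr):
        if state == VAD_START_SPEAKING:
            path = os.path.join(self._out_dir, f"vad_{int(time.time() * 1000)}.pcm")
            self._file = open(path, "wb")
        if data:
            send_to_asr(data)               # always the bytes returned by process()
            if self._file is not None:
                self._file.write(data)      # mirror the same bytes into the dump file
        if state == VAD_STOP_SPEAKING and self._file is not None:
            self._file.close()
            self._file = None
```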
+ ### How to push the audio generated by TTS into the channel?
+ # Source code: audio_consumer.py
+ # Sample code: example_audio_consumer.py
+ ### How to release resources?
@@ -151,7 +151,7 @@ class VideoFrameInner(ctypes.Structure):
              rotation=self.rotation,
              render_time_ms=self.render_time_ms,
              avsync_type=self.avsync_type,
-             metadata=ctypes.string_at(self.metadata_buffer, self.metadata_size).decode() if self.metadata_buffer else None,
+             metadata=ctypes.string_at(self.metadata_buffer, self.metadata_size) if self.metadata_buffer else None,
              shared_context=self.shared_context.decode() if self.shared_context else None,
              texture_id=self.texture_id,
              matrix=self.matrix,
@@ -770,12 +770,31 @@ class ExternalVideoFrameInner(ctypes.Structure):
 
      @staticmethod
      def create(frame: ExternalVideoFrame) -> 'ExternalVideoFrameInner':
-         c_buffer = (ctypes.c_uint8 * len(frame.buffer)).from_buffer(frame.buffer)
-         c_buffer_ptr = ctypes.cast(c_buffer, ctypes.c_void_p)
-         c_metadata = bytearray(frame.metadata.encode('utf-8'))
-         c_metadata_ptr = (ctypes.c_uint8 * len(c_metadata)).from_buffer(c_metadata)
-         c_alpha_buffer = (ctypes.c_uint8 * len(frame.alpha_buffer)).from_buffer(frame.alpha_buffer)
-         c_alpha_buffer_ptr = ctypes.cast(c_alpha_buffer, ctypes.c_void_p)
+         if frame.buffer is not None:
+             c_buffer = (ctypes.c_uint8 * len(frame.buffer)).from_buffer(frame.buffer)
+             c_buffer_ptr = ctypes.cast(c_buffer, ctypes.c_void_p)
+         else:
+             c_buffer_ptr = ctypes.c_void_p(0)
+
+         # alpha_buffer is only used when fill_alpha_buffer is set
+         if (frame.fill_alpha_buffer > 0) and (frame.alpha_buffer is not None):
+             c_alpha_buffer = (ctypes.c_uint8 * len(frame.alpha_buffer)).from_buffer(frame.alpha_buffer)
+             c_alpha_buffer_ptr = ctypes.cast(c_alpha_buffer, ctypes.c_void_p)
+         else:
+             c_alpha_buffer_ptr = ctypes.c_void_p(0)
+
+         if frame.metadata is not None and len(frame.metadata) > 0:
+             c_metadata_ptr = (ctypes.c_uint8 * len(frame.metadata)).from_buffer(frame.metadata)
+             c_metadata_size = len(frame.metadata)
+         else:
+             c_metadata_ptr = ctypes.c_void_p(0)
+             c_metadata_size = 0
+
          return ExternalVideoFrameInner(
              frame.type,
              frame.format,
@@ -793,7 +812,7 @@ class ExternalVideoFrameInner(ctypes.Structure):
              frame.texture_id,
              (ctypes.c_float * 16)(*frame.matrix),
              c_metadata_ptr,
-             len(c_metadata),
+             c_metadata_size,
              c_alpha_buffer_ptr,
              frame.fill_alpha_buffer,
              frame.alpha_mode
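The hunk above replaces unconditional `from_buffer` calls with per-buffer guards so that absent buffers become null pointers. The general pattern, written here as a hypothetical standalone helper (not part of the SDK), checks for `None` before taking `len()` and falls back to a null `c_void_p`:

```python
import ctypes

def buffer_to_ptr(buf):
    """Return (pointer, size) for an optional bytearray.

    Checking for None before calling len() avoids a TypeError, and a null
    c_void_p is passed to the native layer when the buffer is absent."""
    if buf is None or len(buf) == 0:
        return ctypes.c_void_p(0), 0
    # from_buffer requires a writable buffer such as bytearray
    c_arr = (ctypes.c_uint8 * len(buf)).from_buffer(buf)
    return ctypes.cast(c_arr, ctypes.c_void_p), len(buf)
```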
@@ -286,7 +286,7 @@ class VideoFrame():
      rotation: int = 0
      render_time_ms: int = 0
      avsync_type: int = 0
-     metadata: str = None
+     metadata: bytearray = None
      shared_context: str = None
      texture_id: int = 0
      matrix: list = None
@@ -441,8 +441,8 @@ class ExternalVideoFrame:
      egl_type: int = 0
      texture_id: int = 0
      matrix: list = field(default_factory=list)
-     metadata: str = ""
-     alpha_buffer: bytearray = field(default_factory=bytearray)
+     metadata: bytearray = None  # changed from str to bytearray to match the C++ version, i.e. to support bytes as metadata
+     alpha_buffer: bytearray = None
      fill_alpha_buffer: int = 0
      alpha_mode: int = 0
 
@@ -75,6 +75,8 @@ class AgoraService:
          if result == 0:
              self.inited = True
          logger.debug(f'Initialization result: {result}')
+         self._is_low_delay = True if config.audio_scenario == AudioScenarioType.AUDIO_SCENARIO_CHORUS else False
+
 
          # to enable plugin
          provider = "agora.builtin"
@@ -87,6 +89,9 @@ class AgoraService:
          agora_parameter = self.get_agora_parameter()
          agora_parameter.set_int("rtc.set_app_type", 18)
 
+         # force audio vad v2 to be enabled
+         agora_parameter.set_parameters("{\"che.audio.label.enable\": true}")
+
          if config.log_path:
              log_size = 512 * 1024
              if config.log_size > 0:
@@ -128,7 +133,7 @@ class AgoraService:
          rtc_conn_handle = agora_rtc_conn_create(self.service_handle, ctypes.byref(RTCConnConfigInner.create(con_config)))
          if rtc_conn_handle is None:
              return None
-         return RTCConnection(rtc_conn_handle)
+         return RTCConnection(rtc_conn_handle, self._is_low_delay)
 
      # createCustomAudioTrackPcm: create a custom audio track from a pcm data sender
      def create_custom_audio_track_pcm(self, audio_pcm_data_sender: AudioPcmDataSender) -> LocalAudioTrack:
@@ -138,7 +143,11 @@ class AgoraService:
          custom_audio_track = agora_service_create_custom_audio_track_pcm(self.service_handle, audio_pcm_data_sender.sender_handle)
          if custom_audio_track is None:
              return None
-         return LocalAudioTrack(custom_audio_track)
+         local_track = LocalAudioTrack(custom_audio_track)
+         # default for the AI scenario: set the minimum send delay to 10 ms
+         if local_track is not None:
+             local_track.set_send_delay_ms(10)
+         return local_track
      # mix_mode: MIX_ENABLED = 0, MIX_DISABLED = 1
 
      def create_custom_audio_track_encoded(self, audio_encoded_frame_sender: AudioEncodedFrameSender, mix_mode: int):
@@ -506,7 +506,7 @@ class LocalUser:
          if ret < 0:
              logger.error("Failed to unsubscribe all video")
          else:
-             self.del_remote_video_map_all(None)
+             self.del_remote_video_map(None)
          return ret
 
      def set_audio_volume_indication_parameters(self, interval_in_ms, smooth, report_vad):
@@ -53,16 +53,19 @@ agora_rtc_conn_renew_token.argtypes = [AGORA_HANDLE, ctypes.c_char_p]
 
 
  class RTCConnection:
-     def __init__(self, conn_handle) -> None:
+     def __init__(self, conn_handle, is_low_delay: bool = False) -> None:
          self.conn_handle = conn_handle
          self.con_observer = None
          self.local_user = None
          self.local_user_handle = agora_rtc_conn_get_local_user(conn_handle)
          if self.local_user_handle:
              self.local_user = LocalUser(self.local_user_handle, self)
+             # added to set low delay mode
+             if is_low_delay == True:
+                 self.local_user.set_audio_scenario(AudioScenarioType.AUDIO_SCENARIO_CHORUS)
          # add to map
-         AgoraHandleInstanceMap().set_local_user_map(self.conn_handle, self)
-         AgoraHandleInstanceMap().set_con_map(self.conn_handle, self)
+         # AgoraHandleInstanceMap().set_local_user_map(self.conn_handle, self)
+         # AgoraHandleInstanceMap().set_con_map(self.conn_handle, self)
 
      #
      def connect(self, token: str, chan_id: str, user_id: str) -> int:
@@ -130,7 +133,7 @@ class RTCConnection:
      def release(self):
          # release local user map
          if self.conn_handle:
-             AgoraHandleInstanceMap().del_local_user_map(self.conn_handle)
+             # AgoraHandleInstanceMap().del_local_user_map(self.conn_handle)
              agora_rtc_conn_release(self.conn_handle)
              self.conn_handle = None
              self.local_user = None
@@ -0,0 +1,133 @@
+ #!env python
+ import threading
+ import time
+ from agora.rtc.audio_pcm_data_sender import PcmAudioFrame
+ from agora.rtc.audio_pcm_data_sender import AudioPcmDataSender
+ import logging
+ import asyncio
+ logger = logging.getLogger(__name__)
+
+
+ """
+ # AudioConsumer
+ # Base class for consuming PCM data and pushing it into an RTC channel.
+ # In AI scenarios:
+ # When TTS returns data: call AudioConsumer::push_pcm_data to push the returned TTS data directly into the AudioConsumer.
+ # In a separate "timer" callback, call AudioConsumer::consume() to push the data to RTC.
+ # Recommendations:
+ # The "timer" can be asyncio-based, a threading.Timer, or combined with an existing business timer; any of these works. Just call AudioConsumer::consume() in the timer callback.
+ # The timer interval can match an existing business timer, or be adjusted to business needs; 40-80 ms is recommended.
+
+ How to use AudioConsumer:
+ 1. Prerequisites:
+    - The application layer must implement its own timer with an interval in [40 ms, 80 ms]. Its callback is referred to below as app::TimerFunc.
+    - One user corresponds to exactly one AudioConsumer object, i.e. one producer's output maps to one consumer.
+ 2. Usage:
+    A. Create an AudioConsumer object for each user_id that produces PCM data, so that one producer's output maps to one consumer.
+    B. When PCM data is generated (for example, returned by TTS), call AudioConsumer::push_pcm_data(data).
+    C. When it is time to consume (usually in app::TimerFunc), call AudioConsumer::consume(), which automatically consumes the data, i.e. pushes it into the RTC channel.
+    D. To interrupt (in AI scenarios, to stop playing the current AI response), call AudioConsumer::clear(), which clears the data currently buffered.
+    E. On exit, call release() to free resources.
+ """
33
+ class AudioConsumer:
34
+ def __init__(self, pcm_sender: AudioPcmDataSender, sample_rate: int, channels: int) -> None:
35
+ self._lock = threading.Lock()
36
+ self._start_time = 0
37
+ self._data = bytearray()
38
+ self._consumed_packages = 0
39
+ self._pcm_sender = pcm_sender
40
+ self._frame = PcmAudioFrame()
41
+ #init sample rate and channels
42
+ self._frame.sample_rate = sample_rate
43
+ self._frame.number_of_channels = channels
44
+
45
+ #audio parame
46
+ self._frame.bytes_per_sample = 2
47
+ self._bytes_per_frame = sample_rate // 100 * channels * 2
48
+ self._samples_per_channel = sample_rate // 100* channels
49
+ #init pcmaudioframe
50
+ self._frame.timestamp = 0
51
+
52
+ self._init = True
53
+
54
+ pass
55
+     def push_pcm_data(self, data) -> None:
+         if not self._init:
+             return
+         # append to the buffer under the lock
+         with self._lock:
+             self._data += data
+
+     def _reset(self):
+         if not self._init:
+             return
+         self._start_time = time.time() * 1000
+         self._consumed_packages = 0
69
+     def consume(self):
+         if not self._init:
+             return
+         now = time.time() * 1000
+         elapsed_time = int(now - self._start_time)
+         expected_total_packages = elapsed_time // 10  # one 10 ms package per 10 ms elapsed
+         besent_packages = expected_total_packages - self._consumed_packages
+         data_len = len(self._data)
+
+         # first run (or after a long gap): wait until at least 18 packages are buffered
+         if besent_packages > 18 and data_len // self._bytes_per_frame < 18:
+             return
+         if besent_packages > 18:  # reset to the start state and push up to 18 packages at once
+             self._reset()
+             besent_packages = min(18, data_len // self._bytes_per_frame)
+             self._consumed_packages = -besent_packages
+
+         # send no more packages than the buffer can cover
+         act_besent_packages = min(besent_packages, data_len // self._bytes_per_frame)
+         if act_besent_packages < 1:
+             return
+
+         # construct an audio frame and push it
+         with self._lock:
+             self._frame.data = self._data[:self._bytes_per_frame * act_besent_packages]
+             self._frame.timestamp = 0
+             self._frame.samples_per_channel = self._samples_per_channel * act_besent_packages
+
+             # drop the consumed bytes from the buffer
+             self._data = self._data[self._bytes_per_frame * act_besent_packages:]
+             self._consumed_packages += act_besent_packages
+
+         self._pcm_sender.send_audio_pcm_data(self._frame)
109
+
110
+     def len(self) -> int:
+         if not self._init:
+             return 0
+         with self._lock:
+             return len(self._data)
+
+     def clear(self):
+         if not self._init:
+             return
+         with self._lock:
+             self._data = bytearray()
+
+     def release(self):
+         if not self._init:
+             return
+
+         self._init = False
+         with self._lock:
+             self._data = None
+             self._frame = None
+             self._pcm_sender = None
+         self._lock = None
132
+
133
+
@@ -0,0 +1,104 @@
1
+ #!/usr/bin/env python
+ import time
+ from datetime import datetime
+ import logging
+ import os
+ from agora.rtc.agora_base import AudioFrame
+ logger = logging.getLogger(__name__)
+
+
+ """
+ ## VadDump helper class
+ """
+ class VadDump():
+     def __init__(self, path: str) -> None:
+         self._file_path = path
+         self._count = 0
+         self._frame_count = 0
+         self._is_open = False
+         self._source_file = None
+         self._label_file = None
+         self._vad_file = None
+         # create the base directory if it does not exist
+         if self._check_directory_exists(path) is False:
+             os.makedirs(path)
+         # create a timestamped subdirectory: <path>/YYYYMMDDHHMMSS
+         now = datetime.now()
+         self._file_path = "%s/%04d%02d%02d%02d%02d%02d" % (path, now.year, now.month, now.day, now.hour, now.minute, now.second)
+         os.makedirs(self._file_path)
34
+     def _check_directory_exists(self, path: str) -> bool:
+         return os.path.exists(path) and os.path.isdir(path)
+
+     def _create_vad_file(self) -> None:
+         self._close_vad_file()
+         # create a new vad segment file
+         vad_file_path = "%s/vad_%d.pcm" % (self._file_path, self._count)
+         self._vad_file = open(vad_file_path, "wb")
+         # increment the segment count
+         self._count += 1
+
+     def _close_vad_file(self) -> None:
+         if self._vad_file:
+             self._vad_file.close()
+             self._vad_file = None
49
+     def open(self) -> int:
+         if self._is_open is True:
+             return 1
+         self._is_open = True
+         # open the source pcm file
+         source_file_path = self._file_path + "/source.pcm"
+         self._source_file = open(source_file_path, "wb")
+         # open the label file
+         label_file_path = self._file_path + "/label.txt"
+         self._label_file = open(label_file_path, "w")
+         # vad segment files are created lazily on the first start-speaking frame
+         return 0
61
+     def write(self, frame: AudioFrame, vad_result_bytes: bytearray, vad_result_state: int) -> None:
+         if self._is_open is False:
+             return
+         # write the raw pcm to the source file
+         if self._source_file:
+             self._source_file.write(frame.buffer)
+         # format the frame's label information and write it to the label file
+         if self._label_file:
+             label_str = "ct:%d fct:%d state:%d far:%d vop:%d rms:%d pitch:%d mup:%d\n" % (self._count, self._frame_count, vad_result_state, frame.far_field_flag, frame.voice_prob, frame.rms, frame.pitch, frame.music_prob)
+             self._label_file.write(label_str)
+         # write the vad result
+         if vad_result_state == 1:  # start speaking: open a new vad segment file
+             self._create_vad_file()
+             if self._vad_file:
+                 self._vad_file.write(vad_result_bytes)
+         if vad_result_state == 2:  # speaking
+             if self._vad_file:
+                 self._vad_file.write(vad_result_bytes)
+         if vad_result_state == 3:  # stop speaking: write the final bytes and close the segment
+             if self._vad_file:
+                 self._vad_file.write(vad_result_bytes)
+             self._close_vad_file()
+         # increment the frame counter
+         self._frame_count += 1
87
+     def close(self) -> None:
+         if self._is_open == False:
+             return
+         self._is_open = False
+         self._close_vad_file()
+         if self._label_file:
+             self._label_file.close()
+             self._label_file = None
+         if self._source_file:
+             self._source_file.close()
+             self._source_file = None
+
+         # reset counters and path
+         self._count = 0
+         self._frame_count = 0
+         self._file_path = None
@@ -0,0 +1,140 @@
1
+ Metadata-Version: 2.1
2
+ Name: agora_python_server_sdk
3
+ Version: 2.1.5
4
+ Summary: A Python SDK for Agora Server
5
+ Home-page: https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK
6
+ Classifier: Intended Audience :: Developers
7
+ Classifier: License :: OSI Approved :: MIT License
8
+ Classifier: Topic :: Multimedia :: Sound/Audio
9
+ Classifier: Topic :: Multimedia :: Video
10
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: Programming Language :: Python :: 3 :: Only
13
+ Requires-Python: >=3.10
14
+ Description-Content-Type: text/markdown
15
+
16
+ # Note
17
+ - This is a Python SDK wrapper for the Agora RTC SDK.
18
+ - It supports Linux and Mac platforms.
19
+ - The examples are provided as very simple demonstrations and are not recommended for use in production environments.
20
+
21
+ # Very Important Notice !!!
22
+ - A process can only have one instance.
23
+ - An instance can have multiple connections.
24
+ - In all observers or callbacks, you must not call the SDK's own APIs, nor perform CPU-intensive tasks in the callbacks; data copying is allowed.
25
+
26
+ # Required Operating Systems and Python Versions
27
+ - Supported Linux versions:
28
+ - Ubuntu 18.04 LTS and above
29
+ - CentOS 7.0 and above
30
+
31
+ - Supported Mac versions:
32
+ - MacOS 13 and above
33
+
34
+ - Python version:
35
+ - Python 3.10 and above
36
+
37
+ # Using Agora-Python-Server-SDK
38
+ ```
39
+ pip install agora_python_server_sdk
40
+ ```
41
+
42
+ # Running Examples
43
+
44
+ ## Preparing Test Data
45
+ - Download and unzip [test_data.zip](https://download.agora.io/demo/test/test_data_202408221437.zip) to the Agora-Python-Server-SDK directory.
46
+
47
+ ## Executing Test Script
48
+ ```
49
+ python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx --userId=xxx --audioFile=./test_data/demo.pcm --sampleRate=16000 --numOfChannels=1
50
+ ```
51
+
52
+ # Change log
53
+
54
+ ## 2024.12.03 release Version 2.1.5
+ - Modifications:
+ - LocalUser/audioTrack:
+ -- When the scenario is chorus, developers no longer need to call setSendDelayInMs.
+ -- When the scenario is chorus, developers no longer need to set the audio scenario of the track to chorus.
+ -- NOTE: This reduces the burden on developers. In AI scenarios, developers only need to set the service to chorus.
+ - Additions:
+ -- Added the VadDump class, which can assist in troubleshooting vad issues in the testing environment. It should not be enabled in the online environment.
+ -- Added the on_volume_indication callback.
+ -- Added the on_remote_video_track_state_changed callback.
+ - Removals:
+ -- Removed the Vad V1 version, retaining only the V2 version. Refer to voice_detection.py and sample_audio_vad.py.
+ - Updates:
+ -- Updated the relevant samples: audio consumer and vad samples.
68
+
69
+ ## 2024.11.12 release 2.1.4
+ - Changed the type of metadata in videoFrame from str to bytes to be consistent with C++, so it can carry byte streams.
+ - Modified the internal encapsulation of ExternalVideoFrame to support byte streams. For alpha-encoding support, a check was added: if fill_alpha_buffer is 0, the alpha buffer is not processed.
+ ## 2024.11.11 release 2.1.3
+ - Added a new sample, example_jpeg_send.py, which can push JPEG files or JPEG streams to a channel.
+ - Performance overhead, as noted in the example comments: for a 1920x1080 JPEG file, the process from reading the file to converting it to an RGBA bytearray takes approximately 11 milliseconds.
78
+
79
+ ## 2024.11.07 release 2.1.2
+ - Updated `user_id` in the `AudioVolumeInfoInner` and `AudioVolumeInfo` structures to `str` type.
+ - Fixed the bug in the `_on_audio_volume_indication` callback, which previously handled only one speaker per callback instead of speaker_number entries.
+ - Corrected the parameter type in the `IRTCLocalUserObserver::on_audio_volume_indication` callback to `list`.
83
+
84
+ ## 2024.10.29 release 2.1.1
85
+
86
+ Add audio VAD interface of version 2 and corresponding example.
87
+
88
+ ## 2024.10.24 release 2.1.0
89
+
90
+ Fixed some bugs.
91
+
92
+
93
+ ### Common Usage Q&A
94
+ ## The relationship between service and process?
95
+ - A process can only have one service, and the service can only be initialized once.
96
+ - A service can only have one media_node_factory.
97
+ - A service can have multiple connections.
98
+ - Call media_node_factory.release() and service.release() when the process exits.
99
+ ## If using Docker with one user per Docker, when the user starts Docker and logs out, how should Docker be released?
100
+ - In this case, create service/media_node_factory and connection when the process starts.
101
+ - Release service/media_node_factory and connection when the process exits, ensuring that...
102
+ ## If Docker is used to support multiple users and Docker runs for a long time, what should be done?
103
+ - In this case, we recommend using the concept of a connection pool.
104
+ - Create service/media_node_factory and a connection pool (only new connections, without initialization) when the process starts.
105
+ - When a user logs in, get a connection from the connection pool, initialize it, execute con.connect() and set up callbacks, and then join the channel.
106
+ - Handle business operations.
107
+ - When a user logs out, execute con.disconnect() and release the audio/video tracks and observers associated with the connection, but do not call con.release(); then put the connection back into the connection pool.
108
+ - When the process exits, release each connection in the pool (con.release()) and then release service/media_node_factory, to ensure resource release and optimal performance.
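The pooling pattern described above can be sketched generically. The sketch below is illustrative, not SDK code: `factory` stands for any callable that creates an unconnected connection object, and only `release()` is assumed on the connection (connect/disconnect and observer handling stay in the caller, as the steps above describe):

```python
import queue


class ConnectionPool:
    """Minimal pool: connections are created up front and reused across users."""

    def __init__(self, factory, size: int):
        self._pool = queue.Queue()
        for _ in range(size):
            # only new connections here; initialization/con.connect() happens on checkout
            self._pool.put(factory())

    def acquire(self):
        # blocks until a pooled connection is free
        return self._pool.get()

    def release_to_pool(self, con) -> None:
        # caller must already have called con.disconnect() and detached tracks/observers
        self._pool.put(con)

    def close(self) -> None:
        # process exit: con.release() every pooled connection
        while not self._pool.empty():
            self._pool.get().release()
```

On logout the connection goes back into the pool without `con.release()`; only `close()` at process exit actually releases connections.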
109
+
110
+ ## Use of VAD
111
+ # Source code: voice_detection.py
112
+ # Sample code: example_audio_vad.py
113
+ # It is recommended to use the VAD V2 version; the class is AudioVadV2. Reference: voice_detection.py.
114
+ # Use of VAD:
115
+ 1. Call _vad_instance.init(AudioVadConfigV2) to initialize the vad instance. Reference: voice_detection.py. Assume the instance is _vad_instance.
+ 2. In audio_frame_observer::on_playback_audio_frame_before_mixing(audio_frame):
+ 3. Call the vad module's process method: state, bytes = _vad_instance.process(audio_frame)
+ Judge the value of the returned state and handle it accordingly.
+
+ A. If state is _vad_instance._vad_state_startspeaking, the user is "starting to speak": speech recognition (STT/ASR) can be started.
+ B. If state is _vad_instance._vad_state_stopspeaking, the user is "stopping speaking": speech recognition (STT/ASR) can be stopped.
+ C. If state is _vad_instance._vad_state_speaking, the user is "speaking": speech recognition (STT/ASR) can continue.
+ # Note:
+ In all three states, be sure to pass the returned bytes to the recognition module instead of the original audio_frame; otherwise the recognition result will be incorrect.
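As a minimal illustration of the A/B/C state handling, the sketch below accumulates the vad-returned bytes (never the raw audio_frame) into one utterance and hands it to a recognition callback. The state constants mirror the start/speaking/stop convention, but the class and names are illustrative, not part of the SDK:

```python
# Illustrative state constants mirroring the start/speaking/stop convention.
VAD_START_SPEAKING, VAD_SPEAKING, VAD_STOP_SPEAKING = 1, 2, 3


class SpeechSegmenter:
    """Accumulates the vad-returned bytes into one utterance for STT/ASR."""

    def __init__(self, on_utterance):
        self._buf = bytearray()
        self._on_utterance = on_utterance  # e.g. hand-off to an STT/ASR module

    def feed(self, state: int, vad_bytes: bytes) -> None:
        if state == VAD_START_SPEAKING:
            self._buf = bytearray(vad_bytes)       # a new utterance begins
        elif state == VAD_SPEAKING:
            self._buf += vad_bytes                 # keep accumulating
        elif state == VAD_STOP_SPEAKING:
            self._buf += vad_bytes
            self._on_utterance(bytes(self._buf))   # hand the whole utterance to ASR
            self._buf = bytearray()
```

The `feed` method would be called from on_playback_audio_frame_before_mixing with the state and bytes returned by `_vad_instance.process(audio_frame)`.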
126
+ # How to better troubleshoot VAD issues: this covers two aspects, configuration and debugging.
+ 1. Ensure that the initialization parameters of the vad module are correct. Reference: voice_detection.py.
+ 2. In state, bytes = _vad_instance.process(audio_frame) inside on_playback_audio_frame_before_mixing:
+
+ - A. Save the data of audio_frame to a local file; reference: example_audio_pcm_send.py. This records the original audio data; for example, name it source_{time.time()*1000}.pcm.
+ - B. Save the result of each vad processing step:
+
+ - a. When state == start_speaking: create a new binary file, for example named vad_{time.time()*1000}.pcm, and write bytes to it.
+ - b. When state == speaking: write bytes to the file.
+ - c. When state == stop_speaking: write bytes to the file and close it.
+ Note: This way, problems can be troubleshot by comparing the original audio file with the vad-processed audio file. This function should be disabled in the production environment.
137
+ ### How to push the audio generated by TTS into the channel?
138
+ # Source code: audio_consumer.py
139
+ # Sample code: example_audio_consumer.py
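A minimal sketch of the recommended timer loop, assuming a consumer object that exposes `consume()` (as AudioConsumer does); threading.Timer is one of the suggested options, and the helper name is illustrative:

```python
import threading


def start_consume_timer(consumer, interval_s: float = 0.05):
    """Call consumer.consume() roughly every 50 ms (within the recommended 40-80 ms window)."""
    stop = threading.Event()

    def tick():
        if stop.is_set():
            return
        consumer.consume()                         # drains buffered TTS pcm into the channel
        threading.Timer(interval_s, tick).start()  # re-arm the timer

    tick()
    return stop  # call stop.set() before consumer.release() on shutdown
```

With this in place, the TTS callback only needs to call push_pcm_data; pacing into the channel is handled entirely by the timer.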
140
+ ### How to release resources?
@@ -10,7 +10,6 @@ agora/rtc/audio_encoded_frame_sender.py
10
10
  agora/rtc/audio_frame_observer.py
11
11
  agora/rtc/audio_pcm_data_sender.py
12
12
  agora/rtc/audio_sessionctrl.py
13
- agora/rtc/audio_vad.py
14
13
  agora/rtc/local_audio_track.py
15
14
  agora/rtc/local_user.py
16
15
  agora/rtc/local_user_observer.py
@@ -32,6 +31,8 @@ agora/rtc/_ctypes_handle/_rtc_connection_observer.py
32
31
  agora/rtc/_ctypes_handle/_video_encoded_frame_observer.py
33
32
  agora/rtc/_ctypes_handle/_video_frame_observer.py
34
33
  agora/rtc/_utils/globals.py
34
+ agora/rtc/utils/audio_consumer.py
35
+ agora/rtc/utils/vad_dump.py
35
36
  agora_python_server_sdk.egg-info/PKG-INFO
36
37
  agora_python_server_sdk.egg-info/SOURCES.txt
37
38
  agora_python_server_sdk.egg-info/dependency_links.txt
@@ -45,12 +45,12 @@ class CustomInstallCommand(install):
45
45
 
46
46
  setup(
47
47
  name='agora_python_server_sdk',
48
- version='2.1.3',
48
+ version='2.1.5',
49
49
  description='A Python SDK for Agora Server',
50
50
  long_description=open('README.md').read(),
51
51
  long_description_content_type='text/markdown',
52
52
  url='https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK',
53
- packages=["agora.rtc", "agora.rtc._ctypes_handle", "agora.rtc._utils"],
53
+ packages=["agora.rtc", "agora.rtc._ctypes_handle", "agora.rtc._utils","agora.rtc.utils"],
54
54
  classifiers=[
55
55
  "Intended Audience :: Developers",
56
56
  'License :: OSI Approved :: MIT License',
@@ -1,51 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: agora_python_server_sdk
3
- Version: 2.1.3
4
- Summary: A Python SDK for Agora Server
5
- Home-page: https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK
6
- Classifier: Intended Audience :: Developers
7
- Classifier: License :: OSI Approved :: MIT License
8
- Classifier: Topic :: Multimedia :: Sound/Audio
9
- Classifier: Topic :: Multimedia :: Video
10
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
11
- Classifier: Programming Language :: Python :: 3.10
12
- Classifier: Programming Language :: Python :: 3 :: Only
13
- Requires-Python: >=3.10
14
- Description-Content-Type: text/markdown
15
-
16
- # Note
17
- - This is a Python SDK wrapper for the Agora RTC SDK.
18
- - It supports Linux and Mac platforms.
19
- - The examples are provided as very simple demonstrations and are not recommended for use in production environments.
20
-
21
- # Very Important Notice !!!
22
- - A process can only have one instance.
23
- - An instance can have multiple connections.
24
- - In all observers or callbacks, you must not call the SDK's own APIs, nor perform CPU-intensive tasks in the callbacks; data copying is allowed.
25
-
26
- # Required Operating Systems and Python Versions
27
- - Supported Linux versions:
28
- - Ubuntu 18.04 LTS and above
29
- - CentOS 7.0 and above
30
-
31
- - Supported Mac versions:
32
- - MacOS 13 and above
33
-
34
- - Python version:
35
- - Python 3.10 and above
36
-
37
- # Using Agora-Python-Server-SDK
38
- ```
39
- pip install agora_python_server_sdk
40
- ```
41
-
42
- # Running Examples
43
-
44
- ## Preparing Test Data
45
- - Download and unzip [test_data.zip](https://download.agora.io/demo/test/test_data_202408221437.zip) to the Agora-Python-Server-SDK directory.
46
-
47
- ## Executing Test Script
48
- ```
49
- python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx --audioFile=./test_data/demo.pcm --sampleRate=16000 --numOfChannels=1
50
- ```
51
-
@@ -1,36 +0,0 @@
1
- # Note
2
- - This is a Python SDK wrapper for the Agora RTC SDK.
3
- - It supports Linux and Mac platforms.
4
- - The examples are provided as very simple demonstrations and are not recommended for use in production environments.
5
-
6
- # Very Important Notice !!!
7
- - A process can only have one instance.
8
- - An instance can have multiple connections.
9
- - In all observers or callbacks, you must not call the SDK's own APIs, nor perform CPU-intensive tasks in the callbacks; data copying is allowed.
10
-
11
- # Required Operating Systems and Python Versions
12
- - Supported Linux versions:
13
- - Ubuntu 18.04 LTS and above
14
- - CentOS 7.0 and above
15
-
16
- - Supported Mac versions:
17
- - MacOS 13 and above
18
-
19
- - Python version:
20
- - Python 3.10 and above
21
-
22
- # Using Agora-Python-Server-SDK
23
- ```
24
- pip install agora_python_server_sdk
25
- ```
26
-
27
- # Running Examples
28
-
29
- ## Preparing Test Data
30
- - Download and unzip [test_data.zip](https://download.agora.io/demo/test/test_data_202408221437.zip) to the Agora-Python-Server-SDK directory.
31
-
32
- ## Executing Test Script
33
- ```
34
- python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx --audioFile=./test_data/demo.pcm --sampleRate=16000 --numOfChannels=1
35
- ```
36
-
@@ -1,164 +0,0 @@
1
- from . import lib_path
2
- import ctypes
3
- import os
4
- import sys
5
- from enum import Enum, IntEnum
6
- import logging
7
- logger = logging.getLogger(__name__)
8
-
9
-
10
- if sys.platform == 'darwin':
11
- agora_vad_lib_path = os.path.join(lib_path, 'libuap_aed.dylib')
12
- elif sys.platform == 'linux':
13
- agora_vad_lib_path = os.path.join(lib_path, 'libagora_uap_aed.so')
14
- try:
15
- agora_vad_lib = ctypes.CDLL(agora_vad_lib_path)
16
- except OSError as e:
17
- logger.error(f"Error loading the library: {e}")
18
- logger.error(f"Attempted to load from: {agora_vad_lib_path}")
19
- sys.exit(1)
20
-
21
-
22
- class VAD_STATE(ctypes.c_int):
23
- VAD_STATE_NONE_SPEAKING = 0
24
- VAD_STATE_START_SPEAKING = 1
25
- VAD_STATE_SPEAKING = 2
26
- VAD_STATE_STOP_SPEAKING = 3
27
-
28
-
29
- # struct def
30
- """
31
- typedef struct Vad_Config_ {
32
- int fftSz; // fft-size, only support: 128, 256, 512, 1024, default value is 1024
33
- int hopSz; // fft-Hop Size, will be used to check, default value is 160
34
- int anaWindowSz; // fft-window Size, will be used to calc rms, default value is 768
35
- int frqInputAvailableFlag; // whether Aed_InputData will contain external freq. power-sepctra, default value is 0
36
- int useCVersionAIModule; // whether to use the C version of AI submodules, default value is 0
37
- float voiceProbThr; // voice probability threshold 0.0f ~ 1.0f, default value is 0.8
38
- float rmsThr; // rms threshold in dB, default value is -40.0
39
- float jointThr; // joint threshold in dB, default value is 0.0
40
- float aggressive; // aggressive factor, greater value means more aggressive, default value is 5.0
41
- int startRecognizeCount; // start recognize count, buffer size for 10ms 16KHz 16bit 1channel PCM, default value is 10
42
- int stopRecognizeCount; // max recognize count, buffer size for 10ms 16KHz 16bit 1channel PCM, default value is 6
43
- int preStartRecognizeCount; // pre start recognize count, buffer size for 10ms 16KHz 16bit 1channel PCM, default value is 10
44
- float activePercent; // active percent, if over this percent, will be recognized as speaking, default value is 0.6
45
- float inactivePercent; // inactive percent, if below this percent, will be recognized as non-speaking, default value is 0.2
46
- } Vad_Config;
47
- """
48
-
49
-
50
- class VadConfig(ctypes.Structure):
51
- _fields_ = [
52
- ("fftSz", ctypes.c_int),
53
- ("hopSz", ctypes.c_int),
54
- ("anaWindowSz", ctypes.c_int),
55
- ("frqInputAvailableFlag", ctypes.c_int),
56
- ("useCVersionAIModule", ctypes.c_int),
57
- ("voiceProbThr", ctypes.c_float),
58
- ("rmsThr", ctypes.c_float),
59
- ("jointThr", ctypes.c_float),
60
- ("aggressive", ctypes.c_float),
61
- ("startRecognizeCount", ctypes.c_int),
62
- ("stopRecognizeCount", ctypes.c_int),
63
- ("preStartRecognizeCount", ctypes.c_int),
64
- ("activePercent", ctypes.c_float),
65
- ("inactivePercent", ctypes.c_float)
66
- ]
67
-
68
- def __init__(self) -> None:
69
- self.fftSz = 1024
70
- self.hopSz = 160
71
- self.anaWindowSz = 768
72
- self.frqInputAvailableFlag = 0
73
- self.useCVersionAIModule = 0
74
- self.voiceProbThr = 0.8
75
- self.rmsThr = -40.0
76
- self.jointThr = 0.0
77
- self.aggressive = 5.0
78
- self.startRecognizeCount = 10
79
- self.stopRecognizeCount = 6
80
- self.preStartRecognizeCount = 10
81
- self.activePercent = 0.6
82
- self.inactivePercent = 0.2
83
-
84
-
85
- # struct def
86
- class VadAudioData(ctypes.Structure):
87
- _fields_ = [
88
- ("audioData", ctypes.c_void_p),
89
- ("size", ctypes.c_int)
90
- ]
91
- # def __init__(self) -> None:
92
- # self.data = None
93
-
94
-
95
- """
96
- int Agora_UAP_VAD_Create(void** stPtr, const Vad_Config* config);
97
- int Agora_UAP_VAD_Destroy(void** stPtr);
98
- int Agora_UAP_VAD_Proc(void* stPtr, const Vad_AudioData* pIn, Vad_AudioData* pOut, VAD_STATE* state);
99
- """
100
- agora_uap_vad_create = agora_vad_lib.Agora_UAP_VAD_Create
101
- agora_uap_vad_create.restype = ctypes.c_int
102
- agora_uap_vad_create.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(VadConfig)]
103
-
104
- agora_uap_vad_destroy = agora_vad_lib.Agora_UAP_VAD_Destroy
105
- agora_uap_vad_destroy.restype = ctypes.c_int
106
- agora_uap_vad_destroy.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
107
-
108
- agora_uap_vad_proc = agora_vad_lib.Agora_UAP_VAD_Proc
109
- agora_uap_vad_proc.restype = ctypes.c_int
110
- agora_uap_vad_proc.argtypes = [ctypes.c_void_p, ctypes.POINTER(VadAudioData), ctypes.POINTER(VadAudioData), ctypes.POINTER(VAD_STATE)]
111
-
112
-
113
- class AudioVad:
114
- def __init__(self) -> None:
115
- self.vadCfg = VadConfig()
116
-
117
- self.handler = None
118
- self.lastOutTs = 0
119
- self.initialized = False
120
- # return 0 if success, -1 if failed
121
-
122
- def Create(self, vadCfg):
123
- if self.initialized:
124
- return 0
125
- self.vadCfg = vadCfg
126
- self.initialized = True
127
- # creat handler
128
- self.handler = ctypes.c_void_p()
129
- ret = agora_uap_vad_create(ctypes.byref(self.handler), ctypes.byref(self.vadCfg))
130
- return ret
131
-
132
- # Destroy
133
- # return 0 if success, -1 if failed
134
- def Destroy(self):
135
- if self.initialized:
136
- agora_uap_vad_destroy(ctypes.byref(self.handler))
137
- self.initialized = False
138
- self.handler = None
139
- return 0
140
-
141
- # Proc
142
- # framein: bytearray object, include audio data
143
- # return ret, frameout, flag, ret: 0 if success, -1 if failed; frameout: bytearray object, include audio data; flag: 0 if non-speaking, 1 if speaking
144
- def Proc(self, framein):
145
- ret = -1
146
- if not self.initialized:
147
- return -1
148
-
149
- # supporse vadout is empty,vadin byte array
150
- inVadData = VadAudioData()
151
- buffer = (ctypes.c_ubyte * len(framein)).from_buffer(framein) # only a pointer to the buffer is needed, not a copy
152
- inVadData.audioData = ctypes.cast(buffer, ctypes.c_void_p)
153
- inVadData.size = len(framein)
154
-
155
- outVadData = VadAudioData(None, 0) # c api will allocate memory
156
- vadflag = VAD_STATE(0)
157
- ret = agora_uap_vad_proc(self.handler, ctypes.byref(inVadData), ctypes.byref(outVadData), ctypes.byref(vadflag))
158
-
159
- # convert from c_char to bytearray
160
- bytes_from_c = ctypes.string_at(outVadData.audioData, outVadData.size)
161
- frameout = bytearray(bytes_from_c)
162
- flag = vadflag.value
163
-
164
- return ret, frameout, flag
@@ -1,51 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: agora_python_server_sdk
3
- Version: 2.1.3
4
- Summary: A Python SDK for Agora Server
5
- Home-page: https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK
6
- Classifier: Intended Audience :: Developers
7
- Classifier: License :: OSI Approved :: MIT License
8
- Classifier: Topic :: Multimedia :: Sound/Audio
9
- Classifier: Topic :: Multimedia :: Video
10
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
11
- Classifier: Programming Language :: Python :: 3.10
12
- Classifier: Programming Language :: Python :: 3 :: Only
13
- Requires-Python: >=3.10
14
- Description-Content-Type: text/markdown
15
-
16
- # Note
17
- - This is a Python SDK wrapper for the Agora RTC SDK.
18
- - It supports Linux and Mac platforms.
19
- - The examples are provided as very simple demonstrations and are not recommended for use in production environments.
20
-
21
- # Very Important Notice !!!
22
- - A process can only have one instance.
23
- - An instance can have multiple connections.
24
- - In all observers or callbacks, you must not call the SDK's own APIs, nor perform CPU-intensive tasks in the callbacks; data copying is allowed.
25
-
26
- # Required Operating Systems and Python Versions
27
- - Supported Linux versions:
28
- - Ubuntu 18.04 LTS and above
29
- - CentOS 7.0 and above
30
-
31
- - Supported Mac versions:
32
- - MacOS 13 and above
33
-
34
- - Python version:
35
- - Python 3.10 and above
36
-
37
- # Using Agora-Python-Server-SDK
38
- ```
39
- pip install agora_python_server_sdk
40
- ```
41
-
42
- # Running Examples
43
-
44
- ## Preparing Test Data
45
- - Download and unzip [test_data.zip](https://download.agora.io/demo/test/test_data_202408221437.zip) to the Agora-Python-Server-SDK directory.
46
-
47
- ## Executing Test Script
48
- ```
49
- python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx --audioFile=./test_data/demo.pcm --sampleRate=16000 --numOfChannels=1
50
- ```
51
-