agora-python-server-sdk 2.1.6__tar.gz → 2.1.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


Files changed (42)
  1. {agora_python_server_sdk-2.1.6/agora_python_server_sdk.egg-info → agora_python_server_sdk-2.1.7}/PKG-INFO +45 -1
  2. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/README.md +45 -1
  3. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/local_user.py +34 -9
  4. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/utils/audio_consumer.py +1 -1
  5. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7/agora_python_server_sdk.egg-info}/PKG-INFO +45 -1
  6. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/setup.py +1 -1
  7. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/MANIFEST.in +0 -0
  8. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/__init__.py +0 -0
  9. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_ctypes_handle/_audio_frame_observer.py +0 -0
  10. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_ctypes_handle/_ctypes_data.py +0 -0
  11. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_ctypes_handle/_local_user_observer.py +0 -0
  12. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_ctypes_handle/_rtc_connection_observer.py +0 -0
  13. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_ctypes_handle/_video_encoded_frame_observer.py +0 -0
  14. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_ctypes_handle/_video_frame_observer.py +0 -0
  15. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/_utils/globals.py +0 -0
  16. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/agora_base.py +0 -0
  17. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/agora_parameter.py +0 -0
  18. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/agora_service.py +0 -0
  19. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/audio_encoded_frame_sender.py +0 -0
  20. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/audio_frame_observer.py +0 -0
  21. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/audio_pcm_data_sender.py +0 -0
  22. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/audio_sessionctrl.py +0 -0
  23. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/audio_vad_manager.py +0 -0
  24. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/local_audio_track.py +0 -0
  25. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/local_user_observer.py +0 -0
  26. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/local_video_track.py +0 -0
  27. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/media_node_factory.py +0 -0
  28. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/remote_audio_track.py +0 -0
  29. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/remote_video_track.py +0 -0
  30. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/rtc_connection.py +0 -0
  31. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/rtc_connection_observer.py +0 -0
  32. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/utils/vad_dump.py +0 -0
  33. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/video_encoded_frame_observer.py +0 -0
  34. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/video_encoded_image_sender.py +0 -0
  35. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/video_frame_observer.py +0 -0
  36. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/video_frame_sender.py +0 -0
  37. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora/rtc/voice_detection.py +0 -0
  38. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora_python_server_sdk.egg-info/SOURCES.txt +0 -0
  39. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora_python_server_sdk.egg-info/dependency_links.txt +0 -0
  40. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/agora_python_server_sdk.egg-info/top_level.txt +0 -0
  41. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/pyproject.toml +0 -0
  42. {agora_python_server_sdk-2.1.6 → agora_python_server_sdk-2.1.7}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: agora_python_server_sdk
- Version: 2.1.6
+ Version: 2.1.7
  Summary: A Python SDK for Agora Server
  Home-page: https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK
  Classifier: Intended Audience :: Developers
@@ -51,6 +51,12 @@ python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx
 
  # Change log
 
+ ## 2024.12.17 Release 2.1.7
+ - Changes:
+ -- Fixed the TypeError issue in LocalUser::sub/unsub audio/video.
+ -- Adjusted the default stopRecognizeCount for VAD from 30 to 50.
+ -- Modified sample_vad.
+ 
  ## 2024.12.09 Release 2.1.6
  - New Features:
  -- Added AudioVadManager to manage VAD (Voice Activity Detection) instances.
@@ -279,3 +285,41 @@ Store the LLM results in a cache as they are received.
  Perform a reverse scan of the cached data to find the most recent punctuation mark.
  Truncate the data from the start to the most recent punctuation mark and pass it to TTS for synthesis.
  Remove the truncated data from the cache. The remaining data should be moved to the beginning of the cache and continue waiting for additional data from the LLM.
+ 
+ ## VAD Configuration Parameters
+ AgoraAudioVadConfigV2 properties:
+ 
+ | Property Name | Type | Description | Default Value | Value Range |
+ | --- | --- | --- | --- | --- |
+ | preStartRecognizeCount | int | Number of audio frames saved before detecting speech | 16 | [0, ] |
+ | startRecognizeCount | int | Total number of audio frames to detect speech start | 30 | [1, max] |
+ | stopRecognizeCount | int | Number of audio frames to detect speech stop | 50 | [1, max] |
+ | activePercent | float | Percentage of active frames in startRecognizeCount frames | 0.7 | [0.0, 1.0] |
+ | inactivePercent | float | Percentage of inactive frames in stopRecognizeCount frames | 0.5 | [0.0, 1.0] |
+ | startVoiceProb | int | Probability that an audio frame contains human voice | 70 | [0, 100] |
+ | stopVoiceProb | int | Probability that an audio frame contains human voice | 70 | [0, 100] |
+ | startRmsThreshold | int | Energy dB threshold for detecting speech start | -50 | [-100, 0] |
+ | stopRmsThreshold | int | Energy dB threshold for detecting speech stop | -50 | [-100, 0] |
+ 
+ Notes:
+ 
+ startRmsThreshold and stopRmsThreshold:
+ The higher the value, the louder the speaker's voice needs to be relative to the surrounding background noise.
+ In quiet environments, it is recommended to use the default value of -50.
+ In noisy environments, you can raise the threshold to between -40 and -30 to reduce false positives.
+ Adjusting these thresholds based on the actual use case and audio characteristics achieves optimal performance.
+ 
+ stopRecognizeCount:
+ This value reflects how long to wait after detecting non-human voice before concluding that the user has stopped speaking. It controls the gap between consecutive speech utterances; within this gap, VAD treats adjacent sentences as part of the same speech segment.
+ A shorter gap increases the likelihood of adjacent sentences being recognized as separate speech segments. Typically, it is recommended to set this value between 50 and 80.
+ For example: "Good afternoon, [interval_between_sentences] what are some fun places to visit in Beijing?"
+ 
+ If the interval_between_sentences between the speaker's phrases is greater than stopRecognizeCount, the VAD will recognize the above as two separate segments:
+ 
+ VAD1: Good afternoon
+ VAD2: What are some fun places to visit in Beijing?
+ 
+ If the interval_between_sentences is less than stopRecognizeCount, the VAD will recognize the above as a single segment:
+ 
+ VAD: Good afternoon, what are some fun places to visit in Beijing?
+ 
+ If latency is a concern, you can lower this value, or consult with the development team to determine how to manage latency while ensuring semantic continuity in speech recognition. This helps avoid the AI being interrupted too sensitively.
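To make the table above concrete, here is a minimal, self-contained sketch of the tuning advice from the notes (raise the RMS thresholds in noisy rooms, keep stopRecognizeCount in the 50 to 80 range). The dataclass below is a stand-in that mirrors the documented AgoraAudioVadConfigV2 fields rather than the SDK's own class; copy the values onto the real config object from the installed package when creating a VAD instance through AudioVadManager.

```python
# Stand-in sketch only: mirrors the documented AgoraAudioVadConfigV2 fields,
# not the SDK's own class. Map the values onto the real config object
# (see agora/rtc/voice_detection.py in the installed package) before use.
from dataclasses import dataclass


@dataclass
class VadConfigSketch:
    preStartRecognizeCount: int = 16   # frames kept before speech onset
    startRecognizeCount: int = 30      # frames used to confirm speech start
    stopRecognizeCount: int = 50       # 2.1.7 default; 50 to 80 recommended
    activePercent: float = 0.7
    inactivePercent: float = 0.5
    startVoiceProb: int = 70
    stopVoiceProb: int = 70
    startRmsThreshold: int = -50       # dB; default suits quiet environments
    stopRmsThreshold: int = -50


# Noisy environment: raise the RMS thresholds and tolerate longer pauses so
# adjacent phrases stay in one VAD segment.
noisy_room = VadConfigSketch(startRmsThreshold=-40, stopRmsThreshold=-40,
                             stopRecognizeCount=80)
print(noisy_room)
```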
@@ -36,6 +36,12 @@ python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx
 
  # Change log
 
+ ## 2024.12.17 Release 2.1.7
+ - Changes:
+ -- Fixed the TypeError issue in LocalUser::sub/unsub audio/video.
+ -- Adjusted the default stopRecognizeCount for VAD from 30 to 50.
+ -- Modified sample_vad.
+ 
  ## 2024.12.09 Release 2.1.6
  - New Features:
  -- Added AudioVadManager to manage VAD (Voice Activity Detection) instances.
@@ -263,4 +269,42 @@ To achieve a balance between clarity and minimal delay, the following steps shou
  Store the LLM results in a cache as they are received.
  Perform a reverse scan of the cached data to find the most recent punctuation mark.
  Truncate the data from the start to the most recent punctuation mark and pass it to TTS for synthesis.
- Remove the truncated data from the cache. The remaining data should be moved to the beginning of the cache and continue waiting for additional data from the LLM.
+ Remove the truncated data from the cache. The remaining data should be moved to the beginning of the cache and continue waiting for additional data from the LLM.
+ 
+ ## VAD Configuration Parameters
+ AgoraAudioVadConfigV2 properties:
+ 
+ | Property Name | Type | Description | Default Value | Value Range |
+ | --- | --- | --- | --- | --- |
+ | preStartRecognizeCount | int | Number of audio frames saved before detecting speech | 16 | [0, ] |
+ | startRecognizeCount | int | Total number of audio frames to detect speech start | 30 | [1, max] |
+ | stopRecognizeCount | int | Number of audio frames to detect speech stop | 50 | [1, max] |
+ | activePercent | float | Percentage of active frames in startRecognizeCount frames | 0.7 | [0.0, 1.0] |
+ | inactivePercent | float | Percentage of inactive frames in stopRecognizeCount frames | 0.5 | [0.0, 1.0] |
+ | startVoiceProb | int | Probability that an audio frame contains human voice | 70 | [0, 100] |
+ | stopVoiceProb | int | Probability that an audio frame contains human voice | 70 | [0, 100] |
+ | startRmsThreshold | int | Energy dB threshold for detecting speech start | -50 | [-100, 0] |
+ | stopRmsThreshold | int | Energy dB threshold for detecting speech stop | -50 | [-100, 0] |
+ 
+ Notes:
+ 
+ startRmsThreshold and stopRmsThreshold:
+ The higher the value, the louder the speaker's voice needs to be relative to the surrounding background noise.
+ In quiet environments, it is recommended to use the default value of -50.
+ In noisy environments, you can raise the threshold to between -40 and -30 to reduce false positives.
+ Adjusting these thresholds based on the actual use case and audio characteristics achieves optimal performance.
+ 
+ stopRecognizeCount:
+ This value reflects how long to wait after detecting non-human voice before concluding that the user has stopped speaking. It controls the gap between consecutive speech utterances; within this gap, VAD treats adjacent sentences as part of the same speech segment.
+ A shorter gap increases the likelihood of adjacent sentences being recognized as separate speech segments. Typically, it is recommended to set this value between 50 and 80.
+ For example: "Good afternoon, [interval_between_sentences] what are some fun places to visit in Beijing?"
+ 
+ If the interval_between_sentences between the speaker's phrases is greater than stopRecognizeCount, the VAD will recognize the above as two separate segments:
+ 
+ VAD1: Good afternoon
+ VAD2: What are some fun places to visit in Beijing?
+ 
+ If the interval_between_sentences is less than stopRecognizeCount, the VAD will recognize the above as a single segment:
+ 
+ VAD: Good afternoon, what are some fun places to visit in Beijing?
+ 
+ If latency is a concern, you can lower this value, or consult with the development team to determine how to manage latency while ensuring semantic continuity in speech recognition. This helps avoid the AI being interrupted too sensitively.
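The four buffering steps quoted as context above (cache the streamed LLM output, reverse-scan for the latest punctuation mark, hand that prefix to TTS, keep the remainder) fit in a few lines of plain Python. This is an illustrative sketch only; the punctuation set and the synthesize callback are placeholders rather than SDK APIs.

```python
# Illustrative sketch of the LLM-to-TTS buffering strategy described above.
# `synthesize` is a placeholder for whatever TTS entry point is in use.
PUNCTUATION = set("。！？；，.!?;,")


def feed_llm_chunk(cache: list, chunk: str, synthesize) -> None:
    cache.append(chunk)
    text = "".join(cache)
    # Reverse scan for the most recent punctuation mark.
    cut = next((i for i in range(len(text) - 1, -1, -1) if text[i] in PUNCTUATION), -1)
    if cut == -1:
        return                      # no sentence boundary yet; keep buffering
    synthesize(text[:cut + 1])      # pass the completed clause(s) to TTS
    cache[:] = [text[cut + 1:]]     # remainder moves to the start of the cache


# Example with chunks arriving from a streaming LLM:
buf = []
feed_llm_chunk(buf, "Good afternoon, wha", print)  # prints "Good afternoon,"
feed_llm_chunk(buf, "t a nice day!", print)        # prints " what a nice day!"
```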
@@ -62,7 +62,7 @@ agora_local_user_subscribe_all_audio.argtypes = [AGORA_HANDLE]
 
  agora_local_user_unsubscribe_audio = agora_lib.agora_local_user_unsubscribe_audio
  agora_local_user_unsubscribe_audio.restype = AGORA_API_C_INT
- agora_local_user_unsubscribe_audio.argtypes = [AGORA_HANDLE, ctypes.c_uint]
+ agora_local_user_unsubscribe_audio.argtypes = [AGORA_HANDLE, user_id_t]
 
  agora_local_user_unsubscribe_all_audio = agora_lib.agora_local_user_unsubscribe_all_audio
  agora_local_user_unsubscribe_all_audio.restype = AGORA_API_C_INT
@@ -184,7 +184,7 @@ agora_local_user_subscribe_all_video.argtypes = [AGORA_HANDLE, ctypes.POINTER(Vi
 
  agora_local_user_unsubscribe_video = agora_lib.agora_local_user_unsubscribe_video
  agora_local_user_unsubscribe_video.restype = AGORA_API_C_INT
- agora_local_user_unsubscribe_video.argtypes = [AGORA_HANDLE, ctypes.c_uint]
+ agora_local_user_unsubscribe_video.argtypes = [AGORA_HANDLE, user_id_t]
 
  agora_local_user_unsubscribe_all_video = agora_lib.agora_local_user_unsubscribe_all_video
  agora_local_user_unsubscribe_all_video.restype = AGORA_API_C_INT
@@ -361,7 +361,13 @@ class LocalUser:
          return ret
 
      def subscribe_audio(self, user_id):
-         ret = agora_local_user_subscribe_audio(self.user_handle, ctypes.c_char_p(user_id.encode()))
+         if user_id is None:
+             return -1
+         uid_str = user_id.encode('utf-8')
+         # ret = agora_local_user_subscribe_audio(self.user_handle, ctypes.create_string_buffer(uid_str))
+         # note: both ctypes.create_string_buffer and ctypes.c_char_p can turn a Python str into a c_char_p,
+         # but ctypes.c_char_p is the better fit here because the C API never modifies the string content
+         ret = agora_local_user_subscribe_audio(self.user_handle, ctypes.c_char_p(uid_str))
          return ret
 
      def subscribe_all_audio(self):
@@ -369,7 +375,11 @@ class LocalUser:
          return ret
 
      def unsubscribe_audio(self, user_id):
-         ret = agora_local_user_unsubscribe_audio(self.user_handle, ctypes.c_char_p(user_id.encode()))
+         # validity check
+         if user_id is None:
+             return -1
+         uid_str = user_id.encode('utf-8')
+         ret = agora_local_user_unsubscribe_audio(self.user_handle, ctypes.c_char_p(uid_str))
          if ret < 0:
              logger.error("Failed to unsubscribe audio")
          else:
@@ -485,18 +495,33 @@ class LocalUser:
          # return ret
 
      def subscribe_video(self, user_id, options: VideoSubscriptionOptions):
-         user_id_t = user_id.encode('utf-8')
+         if user_id is None:
+             return -1
+         uid_str = user_id.encode('utf-8')
 
-         ret = agora_local_user_subscribe_video(self.user_handle, user_id_t, ctypes.byref(options))
+ 
+         if options is None:
+             inner = VideoSubscriptionOptionsInner()
+         else:
+             inner = VideoSubscriptionOptionsInner.create(options)
+ 
+         c_ptr = ctypes.byref(inner)
+         ret = agora_local_user_subscribe_video(self.user_handle, ctypes.c_char_p(uid_str), c_ptr)
          return ret
 
      def subscribe_all_video(self, options: VideoSubscriptionOptions):
-         ret = agora_local_user_subscribe_all_video(self.user_handle, ctypes.byref(VideoSubscriptionOptionsInner.create(options)))
+         if options is None:
+             inner = VideoSubscriptionOptionsInner()
+         else:
+             inner = VideoSubscriptionOptionsInner.create(options)
+         ret = agora_local_user_subscribe_all_video(self.user_handle, ctypes.byref(inner))
          return ret
 
      def unsubscribe_video(self, user_id):
-         user_id_t = user_id.encode('utf-8')
-         ret = agora_local_user_unsubscribe_video(self.user_handle, user_id_t)
+         if user_id is None:
+             return -1
+         uid_str = user_id.encode('utf-8')
+         ret = agora_local_user_unsubscribe_video(self.user_handle, ctypes.c_char_p(uid_str))
          if ret < 0:
              logger.error("Failed to unsubscribe video")
          else:
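The local_user.py hunks above all address the same root cause: ctypes validates every argument against the declared argtypes, so a binding declared with ctypes.c_uint rejects the c_char_p user id before the C function is ever called, which is the TypeError noted in the 2.1.7 changelog. A standalone sketch against libc's strlen (not the Agora native library) reproduces the same behavior:

```python
# Standalone illustration using libc's strlen (not the Agora native library):
# a mismatched argtypes declaration rejects the argument inside ctypes itself.
# Assumes a POSIX libc can be located on this system.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.restype = ctypes.c_size_t

# Declared with a char-pointer argtype, a c_char_p argument is accepted.
libc.strlen.argtypes = [ctypes.c_char_p]
print(libc.strlen(ctypes.c_char_p(b"12345")))  # -> 5

# Declared with an integer argtype (as in the pre-2.1.7 bindings), the same
# call never reaches C; ctypes raises ArgumentError wrapping a TypeError.
libc.strlen.argtypes = [ctypes.c_uint]
try:
    libc.strlen(ctypes.c_char_p(b"12345"))
except ctypes.ArgumentError as exc:
    print("rejected:", exc)
```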
@@ -67,7 +67,7 @@ class AudioConsumer:
          pass
 
      def consume(self):
-         print("consume begin")
+         # print("consume begin")
          if self._init == False:
              return -1
          now = time.time()*1000
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: agora_python_server_sdk
- Version: 2.1.6
+ Version: 2.1.7
  Summary: A Python SDK for Agora Server
  Home-page: https://github.com/AgoraIO-Extensions/Agora-Python-Server-SDK
  Classifier: Intended Audience :: Developers
@@ -51,6 +51,12 @@ python agora_rtc/examples/example_audio_pcm_send.py --appId=xxx --channelId=xxx
 
  # Change log
 
+ ## 2024.12.17 Release 2.1.7
+ - Changes:
+ -- Fixed the TypeError issue in LocalUser::sub/unsub audio/video.
+ -- Adjusted the default stopRecognizeCount for VAD from 30 to 50.
+ -- Modified sample_vad.
+ 
  ## 2024.12.09 Release 2.1.6
  - New Features:
  -- Added AudioVadManager to manage VAD (Voice Activity Detection) instances.
@@ -279,3 +285,41 @@ Store the LLM results in a cache as they are received.
  Perform a reverse scan of the cached data to find the most recent punctuation mark.
  Truncate the data from the start to the most recent punctuation mark and pass it to TTS for synthesis.
  Remove the truncated data from the cache. The remaining data should be moved to the beginning of the cache and continue waiting for additional data from the LLM.
+ 
+ ## VAD Configuration Parameters
+ AgoraAudioVadConfigV2 properties:
+ 
+ | Property Name | Type | Description | Default Value | Value Range |
+ | --- | --- | --- | --- | --- |
+ | preStartRecognizeCount | int | Number of audio frames saved before detecting speech | 16 | [0, ] |
+ | startRecognizeCount | int | Total number of audio frames to detect speech start | 30 | [1, max] |
+ | stopRecognizeCount | int | Number of audio frames to detect speech stop | 50 | [1, max] |
+ | activePercent | float | Percentage of active frames in startRecognizeCount frames | 0.7 | [0.0, 1.0] |
+ | inactivePercent | float | Percentage of inactive frames in stopRecognizeCount frames | 0.5 | [0.0, 1.0] |
+ | startVoiceProb | int | Probability that an audio frame contains human voice | 70 | [0, 100] |
+ | stopVoiceProb | int | Probability that an audio frame contains human voice | 70 | [0, 100] |
+ | startRmsThreshold | int | Energy dB threshold for detecting speech start | -50 | [-100, 0] |
+ | stopRmsThreshold | int | Energy dB threshold for detecting speech stop | -50 | [-100, 0] |
+ 
+ Notes:
+ 
+ startRmsThreshold and stopRmsThreshold:
+ The higher the value, the louder the speaker's voice needs to be relative to the surrounding background noise.
+ In quiet environments, it is recommended to use the default value of -50.
+ In noisy environments, you can raise the threshold to between -40 and -30 to reduce false positives.
+ Adjusting these thresholds based on the actual use case and audio characteristics achieves optimal performance.
+ 
+ stopRecognizeCount:
+ This value reflects how long to wait after detecting non-human voice before concluding that the user has stopped speaking. It controls the gap between consecutive speech utterances; within this gap, VAD treats adjacent sentences as part of the same speech segment.
+ A shorter gap increases the likelihood of adjacent sentences being recognized as separate speech segments. Typically, it is recommended to set this value between 50 and 80.
+ For example: "Good afternoon, [interval_between_sentences] what are some fun places to visit in Beijing?"
+ 
+ If the interval_between_sentences between the speaker's phrases is greater than stopRecognizeCount, the VAD will recognize the above as two separate segments:
+ 
+ VAD1: Good afternoon
+ VAD2: What are some fun places to visit in Beijing?
+ 
+ If the interval_between_sentences is less than stopRecognizeCount, the VAD will recognize the above as a single segment:
+ 
+ VAD: Good afternoon, what are some fun places to visit in Beijing?
+ 
+ If latency is a concern, you can lower this value, or consult with the development team to determine how to manage latency while ensuring semantic continuity in speech recognition. This helps avoid the AI being interrupted too sensitively.
@@ -45,7 +45,7 @@ class CustomInstallCommand(install):
 
  setup(
      name='agora_python_server_sdk',
-     version='2.1.6',
+     version='2.1.7',
      description='A Python SDK for Agora Server',
      long_description=open('README.md').read(),
      long_description_content_type='text/markdown',
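After upgrading, a quick check confirms that the installed distribution matches the 2.1.7 metadata shown in this diff; the distribution name below is the one from the package title at the top of the page.

```python
# Post-upgrade sanity check using only the standard library.
from importlib.metadata import version

print(version("agora-python-server-sdk"))  # expected output: 2.1.7
```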