PyPI - nonebot-plugin-parser - Versions diffs - 2.0.0__tar.gz → 2.0.1__tar.gz - Mend

nonebot-plugin-parser 2.0.0 (tar.gz) → 2.0.1 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: nonebot-plugin-parser
-Version: 2.0.0
+Version: 2.0.1
 Summary: NoneBot2 automatic link-share parser: BV IDs/links/mini-programs/cards | Bilibili/Douyin/Kuaishou/Weibo/Xiaohongshu/youtube/tiktok/twitter/acfun
 Keywords: nonebot,nonebot2,video,bilibili,youtube,tiktok,twitter,kuaishou,acfun,weibo,xiaohongshu,nga,douyin
 Author: fllesser
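@@ -150,9 +150,10 @@ Windows reference (recommended by the original project): https://www.jianshu.com/p/5015a477de3c
 |    parser_need_upload     |  No   |          False           | For audio parsing, whether to also upload the audio file to the group files |
 |     parser_use_base64     |  No   |          False           | Whether to send video, images and audio as base64. Note: encoding, decoding and transmitting base64 costs more memory, CPU and bandwidth and may even crash the websocket connection, so this option is only recommended when nonebot and the protocol implementation run on different machines |
 |  parser_duration_maximum  |  No   |           480            | Maximum video duration to parse, in _seconds_ |
-|      parser_max_size      |  No   |           90            | Maximum audio/video download size in MB; downloads larger than this are blocked |
+|      parser_max_size      |  No   |            90            | Maximum audio/video download size in MB; downloads larger than this are blocked |
 | parser_disabled_platforms |  No   |            []            | Platforms disabled globally, e.g. parser_disabled_platforms=["bilibili", "douyin"] disables Bilibili and Douyin; pick from ["bilibili", "douyin", "kuaishou", "twitter", "youtube", "acfun", "tiktok", "weibo", "xiaohongshu"] as needed |
-| parser_render_type        |  No   |         "common"        | Renderer type: "default" (no image rendering), "common" (general-purpose PIL image rendering) or "htmlkit" (htmlkit) |
+|    parser_render_type     |  No   |         "common"         | Renderer type: "default" (no image rendering), "common" (general-purpose PIL image rendering) or "htmlkit" (htmlkit) |
+|     parser_append_url     |  No   |          False           | Whether to append the original URL to the parse result |
 ## 🎉 Usage
 ### Command list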

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/README.md RENAMED

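@@ -120,9 +120,10 @@ Windows reference (recommended by the original project): https://www.jianshu.com/p/5015a477de3c
 |    parser_need_upload     |  No   |          False           | For audio parsing, whether to also upload the audio file to the group files |
 |     parser_use_base64     |  No   |          False           | Whether to send video, images and audio as base64. Note: encoding, decoding and transmitting base64 costs more memory, CPU and bandwidth and may even crash the websocket connection, so this option is only recommended when nonebot and the protocol implementation run on different machines |
 |  parser_duration_maximum  |  No   |           480            | Maximum video duration to parse, in _seconds_ |
-|      parser_max_size      |  No   |           90            | Maximum audio/video download size in MB; downloads larger than this are blocked |
+|      parser_max_size      |  No   |            90            | Maximum audio/video download size in MB; downloads larger than this are blocked |
 | parser_disabled_platforms |  No   |            []            | Platforms disabled globally, e.g. parser_disabled_platforms=["bilibili", "douyin"] disables Bilibili and Douyin; pick from ["bilibili", "douyin", "kuaishou", "twitter", "youtube", "acfun", "tiktok", "weibo", "xiaohongshu"] as needed |
-| parser_render_type        |  No   |         "common"        | Renderer type: "default" (no image rendering), "common" (general-purpose PIL image rendering) or "htmlkit" (htmlkit) |
+|    parser_render_type     |  No   |         "common"         | Renderer type: "default" (no image rendering), "common" (general-purpose PIL image rendering) or "htmlkit" (htmlkit) |
+|     parser_append_url     |  No   |          False           | Whether to append the original URL to the parse result |
 ## 🎉 Usage
 ### Command list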

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "nonebot-plugin-parser"
-version = "2.0.0"
+version = "2.0.1"
 description = "NoneBot2 automatic link-share parser: BV IDs/links/mini-programs/cards | Bilibili/Douyin/Kuaishou/Weibo/Xiaohongshu/youtube/tiktok/twitter/acfun"
 authors = [{ "name" = "fllesser", "email" = "fllessive@gmail.com" }]
 readme = "README.md"
@@ -185,7 +185,7 @@ build-backend = "uv_build"
 [tool.bumpversion]
-current_version = "2.0.0"
+current_version = "2.0.1"
 commit = true
 message = "🔖 release: bump version from {current_version} to {new_version}"
 tag = true

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/src/nonebot_plugin_parser/config.py RENAMED

@@ -43,6 +43,8 @@ class Config(BaseModel):
     """Maximum resource size, default 100, in MB"""
     parser_duration_maximum: int = 480
     """Maximum video/audio duration"""
+    parser_append_url: bool = False
+    """Whether to append the original URL to the parse result"""
     parser_disabled_platforms: list[PlatformNames] = []
     """Disabled parsers"""
     parser_bili_video_codes: list[VideoCodecs] = [VideoCodecs.AVC, VideoCodecs.AV1, VideoCodecs.HEV]
@@ -120,6 +122,11 @@ class Config(BaseModel):
         """Whether to send images, audio and video base64-encoded"""
         return self.parser_use_base64
+    @property
+    def append_url(self) -> bool:
+        """Whether to append the original URL to the parse result"""
+        return self.parser_append_url
 pconfig: Config = get_plugin_config(Config)
 """Configuration"""

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/src/nonebot_plugin_parser/parsers/data.py RENAMED

@@ -195,8 +195,6 @@ class ParseResult:
         for cont in self.contents:
             if isinstance(cont, VideoContent):
                 return await cont.get_cover_path()
-            if isinstance(cont, ImageContent):
-                return await cont.get_path()
         return None
     async def contents_to_segs(self):
@@ -261,7 +259,7 @@ class ParseData:
     url: str | None = None
     video_url: str | None = None
     cover_url: str | None = None
-    images_urls: list[str] | None = None
-    dynamic_urls: list[str] | None = None
+    images_urls: list[str] = field(default_factory=list)
+    dynamic_urls: list[str] = field(default_factory=list)
     extra: dict[str, Any] = field(default_factory=dict)
     repost: "ParseData | None" = None
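Switching `images_urls` and `dynamic_urls` from `None` to `field(default_factory=list)` gives every instance its own empty list, so callers such as the new Twitter parser can `append` without a `None` guard. A bare `= []` default is not an option, because `dataclasses` refuses mutable defaults that would be shared across instances. A minimal sketch, assuming `ParseData` is a standard `dataclasses.dataclass` (the `field(...)` calls suggest so) and using a hypothetical two-field stand-in:

```python
from dataclasses import dataclass, field


@dataclass
class MiniParseData:
    """Hypothetical stand-in for ParseData, keeping only the two list fields."""

    images_urls: list[str] = field(default_factory=list)
    dynamic_urls: list[str] = field(default_factory=list)


a = MiniParseData()
b = MiniParseData()
a.images_urls.append("https://example.com/1.jpg")  # no `is None` guard needed

# A plain `= []` default is refused when the class is created.
error = ""
try:

    @dataclass
    class SharedDefault:
        images_urls: list[str] = []  # would be shared by every instance

except ValueError as exc:
    error = str(exc)
```

A side effect of the new contract is that consumers test emptiness (`if data.images_urls:`) rather than `is None`, which is exactly the change made in the Douyin and Kuaishou parsers.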

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/src/nonebot_plugin_parser/parsers/douyin/video.py RENAMED

@@ -43,8 +43,8 @@ class VideoData(Struct):
     video: Video | None = None
     @property
-    def images_urls(self) -> list[str] | None:
-        return [image.url_list[0] for image in self.images] if self.images else None
+    def images_urls(self) -> list[str]:
+        return [image.url_list[0] for image in self.images] if self.images else []
     @property
     def video_url(self) -> str | None:
@@ -65,14 +65,14 @@ class VideoData(Struct):
     @property
     def parse_data(self) -> ParseData:
         """Convert to a ParseData object"""
+        images_urls = self.images_urls
         return ParseData(
             title=self.desc,
             name=self.author.nickname,
             avatar_url=self.avatar_url,
             timestamp=self.create_time,
-            images_urls=self.images_urls,
-            video_url=self.video_url if self.images_urls is None else None,
+            images_urls=images_urls,
+            video_url=self.video_url if len(images_urls) == 0 else None,
             cover_url=self.cover_url,
         )

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/src/nonebot_plugin_parser/parsers/kuaishou.py RENAMED

@@ -86,7 +86,7 @@ class Atlas(Struct):
     @property
     def img_urls(self):
         if len(self.cdn_list) == 0 or len(self.img_route_list) == 0:
-            return None
+            return []
         cdn = random.choice(self.cdn_list).cdn
         return [f"https://{cdn}/{url}" for url in self.img_route_list]

nonebot_plugin_parser-2.0.1/src/nonebot_plugin_parser/parsers/twitter.py ADDED

@@ -0,0 +1,121 @@
+import re
+from typing import Any, ClassVar
+import httpx
+from ..exception import ParseException
+from .base import BaseParser
+from .data import ParseResult, Platform
+class TwitterParser(BaseParser):
+    # Platform info; the display name "小蓝鸟" means "little blue bird"
+    platform: ClassVar[Platform] = Platform(name="twitter", display_name="小蓝鸟")
+    # URL regex patterns (keyword, pattern)
+    patterns: ClassVar[list[tuple[str, str]]] = [
+        ("x.com", r"https?://x\.com/[0-9-a-zA-Z_]{1,20}/status/([0-9]+)"),
+    ]
+    async def _req_xdown_api(self, url: str) -> dict[str, Any]:
+        headers = {
+            "Accept": "application/json, text/plain, */*",
+            "Content-Type": "application/x-www-form-urlencoded",
+            "Origin": "https://xdown.app",
+            "Referer": "https://xdown.app/",
+            **self.headers,
+        }
+        data = {"q": url, "lang": "zh-cn"}
+        api_url = "https://xdown.app/api/ajaxSearch"
+        async with httpx.AsyncClient(headers=headers, timeout=self.timeout) as client:
+            # Separate name so the `url` parameter (the tweet URL) is not shadowed
+            response = await client.post(api_url, data=data)
+            return response.json()
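+    async def parse(self, matched: re.Match[str]) -> ParseResult:
+        """Parse the URL to get the content info and download the resources
+        Args:
+            matched: regex match object produced by the platform's pattern
+        Returns:
+            ParseResult: parse result (resources downloaded, including their Path)
+        Raises:
+            ParseException: raised when parsing fails
+        """
+        # Get the original URL from the match object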
+        url = matched.group(0)
+        resp = await self._req_xdown_api(url)
+        if resp.get("status") != "ok":
+            raise ParseException("解析失败")  # "parse failed"
+        html_content = resp.get("data")
+        if html_content is None:
+            raise ParseException("解析失败, 数据为空")  # "parse failed, data is empty"
+        data = self.parse_twitter_html(html_content)
+        return self.build_result(data)
+    @classmethod
+    def parse_twitter_html(cls, html_content: str):
+        """Parse Twitter HTML content
+        Args:
+            html_content (str): Twitter HTML content
+        Returns:
+            ParseData: parsed data
+        """
+        from bs4 import BeautifulSoup, Tag
+        from .data import ParseData
+        soup = BeautifulSoup(html_content, "html.parser")
+        data = ParseData()
+        # 1. Extract the thumbnail URL
+        img_tag = soup.find("img")
+        if img_tag and isinstance(img_tag, Tag):
+            src = img_tag.get("src")
+            if src and isinstance(src, str):
+                data.cover_url = src
+        # 2. Extract the download links
+        download_links = soup.find_all("a", class_="tw-button-dl")
+        # class="abutton is-success is-fullwidth  btn-premium mt-3"
+        download_items = soup.find_all("a", class_="abutton")
+        for link in download_links + download_items:
+            if not isinstance(link, Tag):
+                continue
+            href = link.get("href")
+            if not href or not isinstance(href, str):
+                continue
+            text = link.get_text(strip=True)
+            if "下载图片" in text:  # "Download image"
+                # The image download link points at the original image
+                data.images_urls.append(href)
+            elif "下载 gif" in text:  # "Download gif"
+                data.dynamic_urls.append(href)  # the GIF and the MP4 are the same file
+            elif "下载 MP4" in text:  # "Download MP4"
+                # The GIF/MP4 download link points at the original video
+                data.video_url = href
+                break
+        # 3. Extract the title
+        title_tag = soup.find("h3")
+        if title_tag:
+            data.title = title_tag.get_text(strip=True)
+        # # 4. Extract the Twitter ID
+        # twitter_id_input = soup.find("input", {"id": "TwitterId"})
+        # if (
+        #     twitter_id_input
+        #     and isinstance(twitter_id_input, Tag)
+        #     and (value := twitter_id_input.get("value"))
+        #     and isinstance(value, str)
+        # ):
+        data.name = "暂时无法获取用户名"  # "username currently unavailable"
+        return data

{nonebot_plugin_parser-2.0.0 → nonebot_plugin_parser-2.0.1}/src/nonebot_plugin_parser/parsers/weibo.py RENAMED

@@ -175,7 +175,6 @@ class WeiBoParser(BaseParser):
         # Decoding from bytes is more robust and avoids encoding ambiguity
         weibo_data = msgspec.json.decode(response.content, type=WeiboResponse).data
-        url = f"https://weibo.com/{weibo_data.user.id}/{weibo_data.bid}"
         return self.build_result(weibo_data.parse_data)
     def _base62_encode(self, number: int) -> str:
@@ -269,7 +268,7 @@ class WeiboData(Struct):
     @property
     def title(self) -> str:
-        return self.status_title or self.page_info.title if self.page_info else ""
+        return self.page_info.title if self.page_info else ""
     @property
     def display_name(self) -> str:
@@ -303,6 +302,14 @@ class WeiboData(Struct):
             return [x.large.url for x in self.pics]
         return []
+    @property
+    def url(self) -> str:
+        return f"https://weibo.com/{self.user.id}/{self.bid}"
+    @property
+    def timestamp(self) -> int:
+        return int(time.mktime(time.strptime(self.created_at, "%a %b %d %H:%M:%S %z %Y")))
     @property
     def parse_data(self) -> ParseData:
         return ParseData(
@@ -310,10 +317,11 @@ class WeiboData(Struct):
             name=self.display_name,
             avatar_url=self.user.profile_image_url,
             text=self.text_content,
-            timestamp=int(time.mktime(time.strptime(self.created_at, "%a %b %d %H:%M:%S %z %Y"))),
+            timestamp=self.timestamp,
             video_url=self.video_url,
             cover_url=self.cover_url,
             images_urls=self.pic_urls,
+            url=self.url,
             repost=self.retweeted_status.parse_data if self.retweeted_status else None,
         )
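The new `timestamp` property keeps the `time.mktime(time.strptime(...))` conversion. `time.mktime` interprets its argument as local time and does not apply the parsed `%z` offset, so the result is only correct when the bot's host runs in the same time zone as the `created_at` string (for example `+0800`). A timezone-aware alternative, sketched as a hypothetical standalone function, uses `datetime.strptime`, which returns an aware `datetime` when `%z` is present:

```python
from datetime import datetime

# Same format string as WeiboData.timestamp
WEIBO_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def weibo_timestamp(created_at: str) -> int:
    """Convert a Weibo created_at string to a Unix timestamp.

    datetime.strptime with %z yields an aware datetime, so .timestamp()
    honours the embedded UTC offset regardless of the host's time zone.
    """
    return int(datetime.strptime(created_at, WEIBO_TIME_FORMAT).timestamp())
```

Using this inside `WeiboData.timestamp` would also need `from datetime import datetime` in weibo.py; as with the original, `%a` and `%b` assume English day and month names (the C locale).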
