media-tagging 0.2.0.dev2__tar.gz → 0.3.0.dev1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/PKG-INFO +13 -1
- media_tagging-0.3.0.dev1/README.md +72 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/cli.py +5 -43
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/server.py +23 -25
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/tagger.py +43 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/base.py +2 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/llm.py +111 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/tools.py +4 -8
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/utils.py +10 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/PKG-INFO +13 -1
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/requires.txt +5 -4
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/setup.py +6 -5
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/end_to_end/test_main.py +5 -6
- media-tagging-0.2.0.dev2/README.md +0 -48
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/__init__.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/__init__.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/llms.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/__init__.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/api.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/writer.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/SOURCES.txt +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/dependency_links.txt +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/entry_points.txt +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/top_level.txt +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/setup.cfg +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/__init__.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/conftest.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/end_to_end/__init__.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/unit/__init__.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/unit/test_tagger.py +0 -0
- {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/unit/test_writer.py +0 -0
{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/PKG-INFO

```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: media-tagging
-Version: 0.
+Version: 0.3.0.dev1
 Author: Google Inc. (gTech gPS CSE team)
 Author-email: no-reply@google.com
 License: Apache 2.0
@@ -11,3 +11,15 @@ Classifier: Topic :: Software Development :: Libraries :: Python Modules
 Classifier: Operating System :: OS Independent
 Classifier: License :: OSI Approved :: Apache Software License
 Description-Content-Type: text/markdown
+Requires-Dist: fastapi==0.111.0
+Requires-Dist: pillow
+Requires-Dist: google-cloud-vision
+Requires-Dist: google-cloud-videointelligence
+Requires-Dist: smart_open
+Requires-Dist: google-ads-api-report-fetcher==1.14.3
+Requires-Dist: langchain==0.2.7
+Requires-Dist: langchain-core==0.2.21
+Requires-Dist: langchain-community==0.2.7
+Requires-Dist: langchain-google-genai==1.0.7
+Requires-Dist: langchain-google-vertexai
+Requires-Dist: jq
```

media_tagging-0.3.0.dev1/README.md

```diff
@@ -0,0 +1,72 @@
+# Media Tagger
+
+## Problem statement
+
+When analyzing large amount of creatives of any nature (being images and videos)
+it might be challenging to quickly and reliably understand their content
+and gain insights.
+
+## Solution
+
+`media-tagger` performs tagging of image and videos based on various taggers
+- simply provide a path to your media files and `media-tagger` will do the rest.
+
+## Deliverable (implementation)
+
+`media-tagger` is implemented as a:
+
+* **library** - Use it in your projects with a help of `media_tagging.tagger.create_tagger` function.
+* **CLI tool** - `media-tagger` tool is available to be used in the terminal.
+* **HTTP endpoint** - `media-tagger` can be easily exposed as HTTP endpoint.
+* **Langchain tool** - integrated `media-tagger` into your Langchain applications.
+
+## Deployment
+
+### Prerequisites
+
+- Python 3.11+
+- A GCP project with billing account attached
+- [Video Intelligence API](https://console.cloud.google.com/apis/library/videointelligence.googleapis.com) and [Vision API](https://console.cloud.google.com/apis/library/vision.googleapis.com) enabled.
+* [API key](https://support.google.com/googleapi/answer/6158862?hl=en) to access to access Google Gemini.
+  - Once you created API key export it as an environmental variable
+
+```
+export GOOGLE_API_KEY=<YOUR_API_KEY_HERE>
+```
+
+
+### Installation
+
+Install `media-tagger` with `pip install media-tagging` command.
+
+### Usage
+
+> This section is focused on using `media-tagger` as a CLI tool.
+> Check [library](docs/how-to-use-media-tagger-as-a-library.md),
+> [http endpoint](docs/how-to-use-media-tagger-as-a-http-endpoint.md),
+> [langchain tool](docs/how-to-use-media-tagger-as-a-langchain-tool.md)
+> sections to learn more.
+
+Once `media-tagger` is installed you can call it:
+
+```
+media-tagger --media-path MEDIA_PATH --tagger TAGGER_TYPE --writer WRITER_TYPE
+```
+where:
+* MEDIA_PATH - comma-separated names of files for tagging (can be urls).
+* TAGGER_TYPE - name of tagger, supported options:
+  * `vision-api` - tags images based on [Google Cloud Vision API](https://cloud.google.com/vision/),
+  * `video-api` for videos based on [Google Cloud Video Intelligence API](https://cloud.google.com/video-intelligence/)
+  * `gemini-image` - Uses Gemini to tags images. Add `--tagger.n_tags=<N_TAGS>`
+    parameter to control number of tags returned by tagger.
+  * `gemini-structured-image` - Uses Gemini to find certain tags in the images.
+    Add `--tagger.tags='tag1, tag2, ..., tagN` parameter to find certain tags
+    in the image.
+  * `gemini-description-image` - Provides brief description of the image,
+* WRITER_TYPE - name of writer, one of `csv`, `json`
+
+By default script will create a single file with tagging results for each media_path.
+If you want to combine results into a single file add `--output OUTPUT_NAME` flag (without extension, i.e. `--output tagging_sample`.
+
+## Disclaimer
+This is not an officially supported Google product.
```

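The CLI forwards tagger options given as dotted flags (`--tagger.n_tags=<N_TAGS>`, `--tagger.tags='tag1, tag2'`), and `cli.py` later reads them back with `tagging_parameters.get('tagger')`. The actual parsing is done by `gaarf.cli.utils`, which this diff does not show, so the helper below (`group_dotted_flags`, a hypothetical name) is only a sketch, under that assumption, of how such flags could be grouped into the nested dict that `cli.py` expects.

```python
from __future__ import annotations


def group_dotted_flags(extra_args: list[str]) -> dict[str, dict[str, str]]:
  """Groups flags like --tagger.n_tags=10 into {'tagger': {'n_tags': '10'}}.

  Hypothetical helper: a sketch of the idea, not the gaarf implementation.
  """
  grouped: dict[str, dict[str, str]] = {}
  for arg in extra_args:
    # Only flags of the form --group.key=value are considered.
    if not arg.startswith('--') or '=' not in arg:
      continue
    key, value = arg[2:].split('=', 1)
    if '.' not in key:
      continue
    group, param = key.split('.', 1)
    # Values stay strings here; any type conversion would happen later.
    grouped.setdefault(group, {})[param] = value
  return grouped


params = group_dotted_flags(
    ['--tagger.n_tags=10', '--tagger.tags=cat, dog', '--verbose']
)
print(params.get('tagger'))  # {'n_tags': '10', 'tags': 'cat, dog'}
```

In this sketch `params.get('tagger')` plays the role of the `tagging_parameters` argument that `cli.py` passes on to `tagger.tag_media`.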
{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/cli.py

```diff
@@ -15,48 +15,10 @@
 
 import argparse
 import logging
-import os
 
-import smart_open
 from gaarf.cli import utils as gaarf_utils
 
-from media_tagging import tagger,
-from media_tagging.taggers import base as base_tagger
-
-
-def tag_media(
-    media_path: str | os.PathLike,
-    tagger_type: base_tagger.BaseTagger,
-    writer_type: writer.BaseWriter = writer.JsonWriter(),
-    single_output_name: str | None = None,
-    tagging_parameters: dict[str, str] | None = None,
-) -> None:
-  """Runs media tagging algorithm.
-
-  Args:
-    media_path: Local or remote path to media file.
-    tagger_type: Initialized tagger.
-    writer_type: Initialized writer for saving tagging results.
-    single_output_name: Parameter for saving results to a single file.
-    tagging_parameters: Optional keywords arguments to be sent for tagging.
-  """
-  media_paths = media_path.split(',')
-  if not tagging_parameters:
-    tagging_parameters = {}
-  results = []
-  for path in media_paths:
-    media_name = utils.convert_path_to_media_name(path)
-    logging.info('Processing media: %s', path)
-    with smart_open.open(path, 'rb') as f:
-      media_bytes = f.read()
-    results.append(
-        tagger_type.tag(
-            media_name,
-            media_bytes,
-            tagging_options=base_tagger.TaggingOptions(**tagging_parameters),
-        )
-    )
-  writer_type.write(results, single_output_name)
+from media_tagging import tagger, writer
 
 
 def main():
@@ -80,13 +42,13 @@ def main():
   )
   logging.getLogger(__file__)
 
-
-
+  logging.info('Initializing tagger: %s', args.tagger)
+  tagging_results = tagger.tag_media(
+      media_paths=args.media_path.split(','),
       tagger_type=concrete_tagger,
-      writer_type=concrete_writer,
-      single_output_name=args.output,
       tagging_parameters=tagging_parameters.get('tagger'),
   )
+  concrete_writer.write(tagging_results, args.output)
 
 
 if __name__ == '__main__':
```

{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/server.py

```diff
@@ -16,7 +16,6 @@
 import logging
 
 import fastapi
-import smart_open
 from typing_extensions import TypedDict
 
 from media_tagging import tagger, utils
@@ -30,10 +29,12 @@ class MediaPostRequest(TypedDict):
   """Specifies structure of request for tagging media.
 
   Attributes:
+    tagger_type: Type of tagger.
     media_url: Local or remote URL of media.
   """
 
   media_url: str
+  tagger_type: str
   tagging_parameters: dict[str, int | list[str]]
 
 
@@ -49,24 +50,7 @@ async def tag_with_llm(
   Returns:
     Json results of tagging.
   """
-
-  llm_tagger = tagger.create_tagger('gemini-image')
-  taggers['gemini-image'] = llm_tagger
-  if media_url := data.get('media_url'):
-    media_name = utils.convert_path_to_media_name(media_url)
-    logging.info('Processing media: %s', media_url)
-    with smart_open.open(media_url, 'rb') as f:
-      media_bytes = f.read()
-    tagging_options = base_tagger.TaggingOptions(
-        **data.get('tagging_parameters')
-    )
-    tagging_result = llm_tagger.tag(
-        name=media_name, content=media_bytes, tagging_options=tagging_options
-    )
-    return fastapi.responses.JSONResponse(
-        content=fastapi.encoders.jsonable_encoder(tagging_result.dict())
-    )
-  raise ValueError('No path to media is provided.')
+  return process_post_request(data)
 
 
 @app.post('/tagger/api')
@@ -81,18 +65,32 @@ async def tag_with_api(
   Returns:
     Json results of tagging.
   """
-
-
-
+  return process_post_request(data)
+
+
+def process_post_request(
+    data: MediaPostRequest,
+) -> fastapi.responses.JSONResponse:
+  """Helper method for performing tagging.
+
+  Args:
+    data: Post request for media tagging.
+
+  Returns:
+    Json results of tagging.
+  """
+  tagger_type = data.get('tagger_type')
+  if not (concrete_tagger := taggers.get(tagger_type)):
+    concrete_tagger = tagger.create_tagger(tagger_type)
+    taggers[tagger_type] = concrete_tagger
   if media_url := data.get('media_url'):
     media_name = utils.convert_path_to_media_name(media_url)
+    media_bytes = utils.read_media_as_bytes(media_url)
     logging.info('Processing media: %s', media_url)
-    with smart_open.open(media_url, 'rb') as f:
-      media_bytes = f.read()
     tagging_options = base_tagger.TaggingOptions(
         **data.get('tagging_parameters')
     )
-    tagging_result =
+    tagging_result = concrete_tagger.tag(
         name=media_name, content=media_bytes, tagging_options=tagging_options
     )
     return fastapi.responses.JSONResponse(
```

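In 0.3.0.dev1 both endpoints delegate to `process_post_request`, which picks the tagger from the new `tagger_type` request field and stores it in the module-level `taggers` dict so later requests with the same type reuse it. The sketch below reproduces only that get-or-create step. `StubTagger`, `get_or_create_tagger`, `counting_factory` and the bucket URL are hypothetical, and the factory stands in for `tagger.create_tagger`.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import TypedDict


class MediaPostRequest(TypedDict):
  """Request body fields, as declared in server.py."""

  media_url: str
  tagger_type: str
  tagging_parameters: dict[str, int | list[str]]


class StubTagger:
  """Hypothetical stand-in for a tagger returned by tagger.create_tagger."""

  def __init__(self, tagger_type: str) -> None:
    self.tagger_type = tagger_type


def get_or_create_tagger(
    cache: dict[str, StubTagger],
    tagger_type: str,
    factory: Callable[[str], StubTagger],
) -> StubTagger:
  """Returns the cached tagger for tagger_type, building it on first use."""
  if not (concrete_tagger := cache.get(tagger_type)):
    concrete_tagger = factory(tagger_type)
    cache[tagger_type] = concrete_tagger
  return concrete_tagger


created: list[str] = []


def counting_factory(tagger_type: str) -> StubTagger:
  """Records every construction so the caching effect is visible."""
  created.append(tagger_type)
  return StubTagger(tagger_type)


taggers: dict[str, StubTagger] = {}
request: MediaPostRequest = {
    'media_url': 'gs://example-bucket/clip.mp4',  # hypothetical location
    'tagger_type': 'gemini-video',
    'tagging_parameters': {'n_tags': 5},
}
first = get_or_create_tagger(taggers, request['tagger_type'], counting_factory)
second = get_or_create_tagger(taggers, request['tagger_type'], counting_factory)
print(first is second, created)  # True ['gemini-video']
```

Only the first request for a given type pays the construction cost; the second lookup returns the same object.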
{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/tagger.py

```diff
@@ -17,6 +17,11 @@ Media tagging sends API requests to tagging engine (i.e. Google Vision API)
 and returns tagging results that can be easily written.
 """
 
+import logging
+import os
+from collections.abc import Sequence
+
+from media_tagging import utils
 from media_tagging.taggers import api, base, llm
 
 _TAGGERS = {
@@ -25,12 +30,18 @@ _TAGGERS = {
     'gemini-image': llm.GeminiImageTagger,
     'gemini-structured-image': llm.GeminiImageTagger,
     'gemini-description-image': llm.GeminiImageTagger,
+    'gemini-video': llm.GeminiVideoTagger,
+    'gemini-structured-video': llm.GeminiVideoTagger,
+    'gemini-description-video': llm.GeminiVideoTagger,
 }
 
 _LLM_TAGGERS_TYPES = {
     'gemini-image': llm.LLMTaggerTypeEnum.UNSTRUCTURED,
     'gemini-structured-image': llm.LLMTaggerTypeEnum.STRUCTURED,
     'gemini-description-image': llm.LLMTaggerTypeEnum.DESCRIPTION,
+    'gemini-video': llm.LLMTaggerTypeEnum.UNSTRUCTURED,
+    'gemini-structured-video': llm.LLMTaggerTypeEnum.STRUCTURED,
+    'gemini-description-video': llm.LLMTaggerTypeEnum.DESCRIPTION,
 }
 
 
@@ -58,3 +69,35 @@ def create_tagger(
       f'Incorrect tagger {type} is provided, '
       f'valid options: {list(_TAGGERS.keys())}'
   )
+
+
+def tag_media(
+    media_paths: Sequence[str | os.PathLike],
+    tagger_type: base.BaseTagger,
+    tagging_parameters: dict[str, str] | None = None,
+) -> list[base.TaggingResult]:
+  """Runs media tagging algorithm.
+
+  Args:
+    media_paths: Local or remote path to media file.
+    tagger_type: Initialized tagger.
+    tagging_parameters: Optional keywords arguments to be sent for tagging.
+
+  Returns:
+    Results of tagging for all media.
+  """
+  if not tagging_parameters:
+    tagging_parameters = {}
+  results = []
+  for path in media_paths:
+    media_name = utils.convert_path_to_media_name(path)
+    logging.info('Processing media: %s', path)
+    media_bytes = utils.read_media_as_bytes(path)
+    results.append(
+        tagger_type.tag(
+            media_name,
+            media_bytes,
+            tagging_options=base.TaggingOptions(**tagging_parameters),
+        )
+    )
+  return results
```

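The new library-level `tagger.tag_media` loops over media paths, derives a name from each path, reads the bytes and collects one result per file, while `_TAGGERS` and `_LLM_TAGGERS_TYPES` map each public tagger name to a class and a prompt mode. The sketch below mimics both pieces with stand-ins so that it runs without Google credentials. `StubVideoTagger`, the injectable `read_bytes` reader and the simplified `TaggingResult` are hypothetical, the body of `create_tagger` is assumed (the diff only shows its error message), and `tagging_options` is left out.

```python
from __future__ import annotations

import dataclasses
import enum
from collections.abc import Callable, Sequence


class LLMTaggerTypeEnum(enum.Enum):
  """Hypothetical stand-in for llm.LLMTaggerTypeEnum."""

  UNSTRUCTURED = 'unstructured'
  STRUCTURED = 'structured'
  DESCRIPTION = 'description'


@dataclasses.dataclass
class TaggingResult:
  """Simplified, hypothetical stand-in for base.TaggingResult."""

  identifier: str
  type: str
  content: str


class StubVideoTagger:
  """Hypothetical tagger that only reports how many bytes it received."""

  def __init__(self, tagger_type: LLMTaggerTypeEnum) -> None:
    self.llm_tagger_type = tagger_type

  def tag(self, name: str, content: bytes) -> TaggingResult:
    return TaggingResult(
        identifier=name,
        type='video',
        content=f'{self.llm_tagger_type.value}: {len(content)} bytes',
    )


# Several public names map to one class; the second table selects the mode.
_TAGGERS = {
    'gemini-video': StubVideoTagger,
    'gemini-description-video': StubVideoTagger,
}
_LLM_TAGGERS_TYPES = {
    'gemini-video': LLMTaggerTypeEnum.UNSTRUCTURED,
    'gemini-description-video': LLMTaggerTypeEnum.DESCRIPTION,
}


def create_tagger(tagger_type: str) -> StubVideoTagger:
  """Assumed name-based dispatch; the real body is not shown in the diff."""
  if tagger_class := _TAGGERS.get(tagger_type):
    return tagger_class(_LLM_TAGGERS_TYPES[tagger_type])
  raise ValueError(
      f'Incorrect tagger {tagger_type} is provided, '
      f'valid options: {list(_TAGGERS.keys())}'
  )


def tag_media(
    media_paths: Sequence[str],
    concrete_tagger: StubVideoTagger,
    read_bytes: Callable[[str], bytes],
) -> list[TaggingResult]:
  """Mirrors the loop in tagger.tag_media, with an injectable reader."""
  results = []
  for path in media_paths:
    # Same naming rule as utils.convert_path_to_media_name.
    media_name = path.split('/')[-1].split('.')[0]
    results.append(concrete_tagger.tag(media_name, read_bytes(path)))
  return results


fake_storage = {'clips/intro.mp4': b'\x00' * 16, 'clips/outro.mp4': b'\x00' * 8}
results = tag_media(
    media_paths='clips/intro.mp4,clips/outro.mp4'.split(','),  # as in cli.py
    concrete_tagger=create_tagger('gemini-description-video'),
    read_bytes=fake_storage.__getitem__,
)
for r in results:
  print(r.identifier, r.content)
# intro description: 16 bytes
# outro description: 8 bytes
```

Keeping the two lookup tables separate lets one class serve the plain, structured and description variants, which is how the diff adds three `gemini-*-video` names with a single `GeminiVideoTagger`.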
{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/llm.py

```diff
@@ -14,10 +14,14 @@
 """Module for performing media tagging with LLMs."""
 
 import base64
+import dataclasses
 import enum
+import json
 import logging
+import tempfile
 from typing import Final
 
+import google.generativeai as google_genai
 import langchain_google_genai as genai
 from langchain_core import (
     language_models,
@@ -59,6 +63,9 @@ _UNSTRUCTURED_PROMPT: Final[prompts.ChatPromptTemplate] = (
     )
 )
 
+_UNSTRUCTURED_PROMPT_VIDEO: Final[str] = (
+    'Generate {n_tags} tags for the following video.'
+)
 _STRUCTURED_PROMPT: Final[prompts.ChatPromptTemplate] = (
     prompts.ChatPromptTemplate.from_messages(
         [
@@ -72,6 +79,11 @@ _STRUCTURED_PROMPT: Final[prompts.ChatPromptTemplate] = (
     )
 )
 
+_STRUCTURED_PROMPT_VIDEO: Final[str] = (
+    'Find whether the following tags can be found in the video: {tags}.'
+)
+
+
 _DESCRIPTION_PROMPT: Final[prompts.ChatPromptTemplate] = (
     prompts.ChatPromptTemplate.from_messages(
         [
@@ -84,6 +96,7 @@ _DESCRIPTION_PROMPT: Final[prompts.ChatPromptTemplate] = (
     )
 )
 
+_DESCRIPTION_PROMPT_VIDEO: Final[str] = 'Describe the following video.'
 
 llm_tagger_promps: dict[LLMTaggerTypeEnum, prompts.ChatPromptTemplate] = {
     LLMTaggerTypeEnum.UNSTRUCTURED: _UNSTRUCTURED_PROMPT,
@@ -91,6 +104,12 @@ llm_tagger_promps: dict[LLMTaggerTypeEnum, prompts.ChatPromptTemplate] = {
     LLMTaggerTypeEnum.DESCRIPTION: _DESCRIPTION_PROMPT,
 }
 
+video_llm_tagger_promps: dict[LLMTaggerTypeEnum, str] = {
+    LLMTaggerTypeEnum.UNSTRUCTURED: _UNSTRUCTURED_PROMPT_VIDEO,
+    LLMTaggerTypeEnum.STRUCTURED: _STRUCTURED_PROMPT_VIDEO,
+    LLMTaggerTypeEnum.DESCRIPTION: _DESCRIPTION_PROMPT_VIDEO,
+}
+
 
 class LLMTagger(base.BaseTagger):
   """Tags media via LLM."""
@@ -190,3 +209,95 @@ class GeminiImageTagger(LLMTagger):
         llm_tagger_type=tagger_type,
         llm=genai.ChatGoogleGenerativeAI(model=model_name),
     )
+
+
+class GeminiVideoTagger(LLMTagger):
+  """Tags video based on Gemini."""
+
+  def __init__(
+      self,
+      tagger_type: LLMTaggerTypeEnum,
+      model_name: str = 'models/gemini-1.5-flash',
+  ) -> None:
+    """Initializes GeminiVideoTagger.
+
+    Args:
+      tagger_type: Type of LLM tagger.
+      model_name: Name of the model to perform the tagging.
+    """
+    self.llm_tagger_type = tagger_type
+    self.model_name = model_name
+
+  @property
+  def model(self) -> google_genai.GenerativeModel:
+    """Initializes GenerativeModel."""
+    return google_genai.GenerativeModel(model_name=self.model_name)
+
+  @override
+  def tag(
+      self,
+      name: str,
+      content: bytes,
+      tagging_options: base.TaggingOptions = base.TaggingOptions(),
+  ):
+    logging.debug('Tagging video "%s" with GeminiVideoTagger', name)
+    with tempfile.NamedTemporaryFile(suffix='.mp4') as f:
+      f.write(content)
+      try:
+        video_file = google_genai.upload_file(f.name)
+
+        result = self.model.generate_content(
+            [
+                video_file,
+                '\n\n',
+                f'{self.format_prompt(tagging_options)} ',
+            ],
+            generation_config=google_genai.GenerationConfig(
+                response_mime_type='application/json',
+                response_schema=self.response_schema,
+            ),
+        )
+      finally:
+        video_file.delete()
+
+    if self.llm_tagger_type == LLMTaggerTypeEnum.DESCRIPTION:
+      return base.TaggingResult(
+          identifier=name,
+          type='video',
+          content=base.Description(text=json.loads(result.text).get('text')),
+      )
+    tags = [
+        base.Tag(name=r.get('name'), score=r.get('score'))
+        for r in json.loads(result.text)
+    ]
+    return base.TaggingResult(identifier=name, type='video', content=tags)
+
+  def format_prompt(self, tagging_options: base.TaggingOptions) -> str:
+    """Builds correct prompt to send to LLM.
+
+    Prompt contains format instructions to get output result.
+
+    Args:
+      tagging_options: Parameters to refine the prompt.
+
+    Returns:
+      Formatted prompt.
+    """
+    base_prompt = video_llm_tagger_promps[self.llm_tagger_type]
+    formatting_instructions = (
+        ' For each tag provide name and a score from 0 to 1 '
+        'where 0 is tag absence and 1 complete tag presence.'
+    )
+    prompt = base_prompt.format(**dataclasses.asdict(tagging_options))
+    if self.llm_tagger_type == LLMTaggerTypeEnum.DESCRIPTION:
+      return prompt
+    return prompt + formatting_instructions
+
+  @property
+  def response_schema(self) -> list[base.Tag] | base.Description:
+    """Generates correct response schema based on type of LLM tagger."""
+    return (
+        base.Description
+        if self.llm_tagger_type == LLMTaggerTypeEnum.DESCRIPTION
+        else list[base.Tag]
+    )
```

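`GeminiVideoTagger` builds a plain-string prompt from `video_llm_tagger_promps`, fills it from `dataclasses.asdict(tagging_options)`, appends scoring instructions unless a description was requested, and then parses the JSON text returned by Gemini. The upload and `generate_content` calls need network access and an API key, so the sketch below keeps only the pure-Python prompt and parsing logic. The `TaggingOptions` fields (`n_tags`, `tags`) are assumed from the prompt placeholders, and the JSON strings are made-up examples of the shape the code expects.

```python
from __future__ import annotations

import dataclasses
import enum
import json


class LLMTaggerTypeEnum(enum.Enum):
  """Stand-in for llm.LLMTaggerTypeEnum (member values assumed)."""

  UNSTRUCTURED = 1
  STRUCTURED = 2
  DESCRIPTION = 3


@dataclasses.dataclass
class TaggingOptions:
  """Hypothetical stand-in for base.TaggingOptions; fields assumed."""

  n_tags: int = 10
  tags: str | None = None


# Prompt strings copied from the diff.
VIDEO_PROMPTS = {
    LLMTaggerTypeEnum.UNSTRUCTURED: (
        'Generate {n_tags} tags for the following video.'
    ),
    LLMTaggerTypeEnum.STRUCTURED: (
        'Find whether the following tags can be found in the video: {tags}.'
    ),
    LLMTaggerTypeEnum.DESCRIPTION: 'Describe the following video.',
}
FORMATTING_INSTRUCTIONS = (
    ' For each tag provide name and a score from 0 to 1 '
    'where 0 is tag absence and 1 complete tag presence.'
)


def format_prompt(
    tagger_type: LLMTaggerTypeEnum, options: TaggingOptions
) -> str:
  """Fills the template; str.format ignores fields the template lacks."""
  prompt = VIDEO_PROMPTS[tagger_type].format(**dataclasses.asdict(options))
  if tagger_type == LLMTaggerTypeEnum.DESCRIPTION:
    return prompt
  return prompt + FORMATTING_INSTRUCTIONS


def parse_response(
    tagger_type: LLMTaggerTypeEnum, response_text: str
) -> str | list[tuple[str, float]]:
  """Parses the JSON text: an object for descriptions, else a list of tags."""
  payload = json.loads(response_text)
  if tagger_type == LLMTaggerTypeEnum.DESCRIPTION:
    return payload.get('text')
  return [(item.get('name'), item.get('score')) for item in payload]


print(format_prompt(LLMTaggerTypeEnum.UNSTRUCTURED, TaggingOptions(n_tags=3)))
print(parse_response(
    LLMTaggerTypeEnum.UNSTRUCTURED,
    '[{"name": "beach", "score": 0.9}, {"name": "dog", "score": 0.2}]',
))  # [('beach', 0.9), ('dog', 0.2)]
```

Because `str.format` silently ignores unused keyword arguments, one options object can feed all three templates, including the description prompt that has no placeholders at all.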
{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/tools.py

```diff
@@ -14,14 +14,13 @@
 """Exposes media tagger as a tool for Langchain agents."""
 
 import langchain_core
-import smart_open
 
 from media_tagging import tagger, utils
 from media_tagging.taggers import base as base_tagger
 
 
 class MediaTaggingInput(langchain_core.pydantic_v1.BaseModel):
-  """Input for
+  """Input for media tagging."""
 
   tagger_type: str = langchain_core.pydantic_v1.Field(
       description='Type of media tagger'
@@ -32,16 +31,14 @@ class MediaTaggingInput(langchain_core.pydantic_v1.BaseModel):
 
 
 class MediaTaggingResults(langchain_core.tools.BaseTool):
-  """Tools that performs
+  """Tools that performs media tagging.
 
   Attributes:
-    llm_parameters: Parameter for LLM initialization.
     name: Name of the tool.
     description: Description the tool.
     args_schema: Input model for the tool.
   """
 
-  llm_parameters: dict[str, str] = {'model': 'gemini-1.5-flash'}
   name: str = 'media_tagging_results_json'
   description: str = 'tag media (image or videos)'
   args_schema: type[langchain_core.pydantic_v1.BaseModel] = MediaTaggingInput
@@ -51,7 +48,7 @@ class MediaTaggingResults(langchain_core.tools.BaseTool):
       tagger_type: str,
       media_url: str,
   ) -> list[dict[str, str]]:
-    """Performs media tagging based on
+    """Performs media tagging based on selected tagger.
 
     Args:
       tagger_type: Type of tagger to use for media tagging.
@@ -62,8 +59,7 @@ class MediaTaggingResults(langchain_core.tools.BaseTool):
     """
     media_tagger = tagger.create_tagger(tagger_type)
     media_name = utils.convert_path_to_media_name(media_url)
-
-      media_bytes = f.read()
+    media_bytes = utils.read_media_as_bytes(media_url)
     tagging_options = base_tagger.TaggingOptions()
     return media_tagger.tag(
         name=media_name, content=media_bytes, tagging_options=tagging_options
```

{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/utils.py

```diff
@@ -13,8 +13,18 @@
 # limitations under the License.
 """Various utils."""
 
+import os
+
+import smart_open
+
 
 def convert_path_to_media_name(media_path: str) -> str:
   """Extracts file name without extension."""
   base_name = media_path.split('/')[-1]
   return base_name.split('.')[0]
+
+
+def read_media_as_bytes(media_path: str | str | os.PathLike) -> bytes:
+  """Reads media content from local or remote storage."""
+  with smart_open.open(media_path, 'rb') as f:
+    return f.read()
```

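`convert_path_to_media_name` keeps the last `/`-separated segment and cuts it at the first dot, so names are derived the same way for local paths, `gs://` URIs and URLs. The sketch below copies that function and pairs it with a local-only stand-in for `read_media_as_bytes`: the real helper uses `smart_open`, which also handles remote URIs, while this hypothetical `read_local_media_as_bytes` uses the built-in `open`. (The annotation in the diff repeats `str`; `str | os.PathLike` is equivalent.) The sample paths are made up.

```python
from __future__ import annotations

import os
import tempfile


def convert_path_to_media_name(media_path: str) -> str:
  """Copied from utils.py: last '/'-separated part, cut at the first dot."""
  base_name = media_path.split('/')[-1]
  return base_name.split('.')[0]


def read_local_media_as_bytes(media_path: str | os.PathLike) -> bytes:
  """Local-only stand-in for read_media_as_bytes (no smart_open)."""
  with open(media_path, 'rb') as f:
    return f.read()


for path in [
    'gs://bucket/ads/banner.png',
    'videos/my.clip.v2.mp4',
    'https://example.com/img.jpg?size=large',
]:
  print(path, '->', convert_path_to_media_name(path))
# gs://bucket/ads/banner.png -> banner
# videos/my.clip.v2.mp4 -> my
# https://example.com/img.jpg?size=large -> img

with tempfile.TemporaryDirectory() as tmp_dir:
  sample = os.path.join(tmp_dir, 'sample.jpg')
  with open(sample, 'wb') as f:
    f.write(b'\xff\xd8\xff')  # first bytes of a JPEG header
  print(read_local_media_as_bytes(sample))  # b'\xff\xd8\xff'
```

Note that a file name containing several dots (`my.clip.v2.mp4`) is cut at the first one, so two files that differ only after that dot map to the same media name.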
{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/PKG-INFO

```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: media-tagging
-Version: 0.
+Version: 0.3.0.dev1
 Author: Google Inc. (gTech gPS CSE team)
 Author-email: no-reply@google.com
 License: Apache 2.0
@@ -11,3 +11,15 @@ Classifier: Topic :: Software Development :: Libraries :: Python Modules
 Classifier: Operating System :: OS Independent
 Classifier: License :: OSI Approved :: Apache Software License
 Description-Content-Type: text/markdown
+Requires-Dist: fastapi==0.111.0
+Requires-Dist: pillow
+Requires-Dist: google-cloud-vision
+Requires-Dist: google-cloud-videointelligence
+Requires-Dist: smart_open
+Requires-Dist: google-ads-api-report-fetcher==1.14.3
+Requires-Dist: langchain==0.2.7
+Requires-Dist: langchain-core==0.2.21
+Requires-Dist: langchain-community==0.2.7
+Requires-Dist: langchain-google-genai==1.0.7
+Requires-Dist: langchain-google-vertexai
+Requires-Dist: jq
```

{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/requires.txt

```diff
@@ -1,11 +1,12 @@
+fastapi==0.111.0
 google-ads-api-report-fetcher==1.14.3
 google-cloud-videointelligence
 google-cloud-vision
 jq
-langchain
-langchain-
-langchain-
-langchain-google-genai
+langchain-community==0.2.7
+langchain-core==0.2.21
+langchain-google-genai==1.0.7
 langchain-google-vertexai
+langchain==0.2.7
 pillow
 smart_open
```

{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/setup.py

```diff
@@ -17,7 +17,7 @@ from setuptools import find_packages, setup
 
 setup(
     name='media-tagging',
-    version='0.
+    version='0.3.0dev1',
     long_description_content_type='text/markdown',
     author='Google Inc. (gTech gPS CSE team)',
     author_email='no-reply@google.com',
@@ -32,15 +32,16 @@ setup(
     ],
     packages=find_packages(),
     install_requires=[
+        'fastapi==0.111.0',
         'pillow',
         'google-cloud-vision',
         'google-cloud-videointelligence',
         'smart_open',
         'google-ads-api-report-fetcher==1.14.3',
-        'langchain',
-        'langchain-core',
-        'langchain-community',
-        'langchain-google-genai',
+        'langchain==0.2.7',
+        'langchain-core==0.2.21',
+        'langchain-community==0.2.7',
+        'langchain-google-genai==1.0.7',
         'langchain-google-vertexai',
         'jq',
     ],
```

{media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/end_to_end/test_main.py

```diff
@@ -15,8 +15,7 @@
 import json
 import pathlib
 
-from
-from media_tagging import writer
+from media_tagging import tagger, writer
 from media_tagging.taggers import api
 
 _SCRIPT_DIR = pathlib.Path(__file__).parent
@@ -32,13 +31,13 @@ def test_image_tagging(fake_tagger, mocker):
   concrete_writer = writer.JsonWriter()
   image_path = f'{_SCRIPT_DIR}/../unit/data/test_image.jpg'
   image_name = 'test'
-
-
+  tagging_results = tagger.tag_media(
+      media_paths=[image_path],
       tagger_type=concrete_tagger,
-      writer_type=concrete_writer,
   )
+  concrete_writer.write(tagging_results, 'test')
   with open('test.json', 'r', encoding='utf-8') as f:
-    data = json.load(f)
+    data = json.load(f)[0]
 
   assert data.get('identifier') == image_name
   assert data.get('type') == 'image'
```

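Because tagging and writing are now separate steps, the end-to-end test passes the list returned by `tag_media` to `concrete_writer.write(...)` and then reads `json.load(f)[0]`, which implies the output file holds a JSON array. The real `JsonWriter` file layout is not shown in this diff; the hypothetical `write_json` helper and simplified `TaggingResult` below only illustrate why the test indexes the first element.

```python
from __future__ import annotations

import dataclasses
import json
import os
import tempfile


@dataclasses.dataclass
class TaggingResult:
  """Hypothetical, simplified stand-in for base.TaggingResult."""

  identifier: str
  type: str
  content: list[dict[str, float | str]]


def write_json(
    results: list[TaggingResult], output_name: str, directory: str
) -> str:
  """Writes all results as one JSON array to <directory>/<output_name>.json."""
  path = os.path.join(directory, f'{output_name}.json')
  with open(path, 'w', encoding='utf-8') as f:
    json.dump([dataclasses.asdict(r) for r in results], f)
  return path


results = [TaggingResult('test', 'image', [{'name': 'cat', 'score': 0.8}])]
with tempfile.TemporaryDirectory() as tmp_dir:
  path = write_json(results, 'test', tmp_dir)
  with open(path, 'r', encoding='utf-8') as f:
    data = json.load(f)[0]  # the file holds a list, hence [0]
print(data['identifier'], data['type'])  # test image
```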
media-tagging-0.2.0.dev2/README.md

```diff
@@ -1,48 +0,0 @@
-# Welltech Media Tagging
-
-## Prerequisites
-
-* Google Cloud project with billing enabled.
-* [Video Intelligence API](https://console.cloud.google.com/apis/library/videointelligence.googleapis.com) and [Vision API](https://console.cloud.google.com/apis/library/vision.googleapis.com) enabled.
-* Python3.8+
-* Access to repository configured. In order to clone this repository you need
-to do the following:
-  * Visit https://professional-services.googlesource.com/new-password and
-    login with your account.
-  * Once authenticated please copy all lines in box
-    and paste them in the terminal.
-
-
-## Run
-
-
-1. Install `media-tagger`
-
-```
-pip install media-tagging
-```
-
-2. Perform tagging
-
-```
-media-tagger --media-path MEDIA_PATH --tagger TAGGER_TYPE --writer WRITER_TYPE
-```
-where:
-* MEDIA_PATH - comma-separated names of files for tagging (can be urls).
-* TAGGER_TYPE - name of tagger, supported options:
-  * `vision-api` - tags images based on [Google Cloud Vision API](https://cloud.google.com/vision/),
-  * `video-api` for videos based on [Google Cloud Video Intelligence API](https://cloud.google.com/video-intelligence/)
-  * `gemini-image` - Uses Gemini to tags images. Add `--tagger.n_tags=<N_TAGS>`
-    parameter to control number of tags returned by tagger.
-  * `gemini-structured-image` - Uses Gemini to find certain tags in the images.
-    Add `--tagger.tags='tag1, tag2, ..., tagN` parameter to find certain tags
-    in the image.
-  * `gemini-description-image` - Provides brief description of the image,
-* WRITER_TYPE - name of writer, one of `csv`, `json`
-
-By default script will create a single file with tagging results for each media_path.
-If you want to combine results into a single file add `--output OUTPUT_NAME` flag (without extension, i.e. `--output tagging_sample`.
-
-
-## Disclaimer
-This is not an officially supported Google product.
```
