media-tagging 0.2.0.dev2__tar.gz → 0.3.0.dev1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/PKG-INFO +13 -1
  2. media_tagging-0.3.0.dev1/README.md +72 -0
  3. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/cli.py +5 -43
  4. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/server.py +23 -25
  5. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/tagger.py +43 -0
  6. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/base.py +2 -0
  7. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/llm.py +111 -0
  8. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/tools.py +4 -8
  9. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/utils.py +10 -0
  10. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/PKG-INFO +13 -1
  11. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/requires.txt +5 -4
  12. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/setup.py +6 -5
  13. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/end_to_end/test_main.py +5 -6
  14. media-tagging-0.2.0.dev2/README.md +0 -48
  15. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/entrypoints/__init__.py +0 -0
  16. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/__init__.py +0 -0
  17. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/llms.py +0 -0
  18. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/__init__.py +0 -0
  19. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/taggers/api.py +0 -0
  20. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging/writer.py +0 -0
  21. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/SOURCES.txt +0 -0
  22. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/dependency_links.txt +0 -0
  23. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/entry_points.txt +0 -0
  24. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/media_tagging.egg-info/top_level.txt +0 -0
  25. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/setup.cfg +0 -0
  26. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/__init__.py +0 -0
  27. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/conftest.py +0 -0
  28. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/end_to_end/__init__.py +0 -0
  29. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/unit/__init__.py +0 -0
  30. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/unit/test_tagger.py +0 -0
  31. {media-tagging-0.2.0.dev2 → media_tagging-0.3.0.dev1}/tests/unit/test_writer.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: media-tagging
- Version: 0.2.0.dev2
+ Version: 0.3.0.dev1
  Author: Google Inc. (gTech gPS CSE team)
  Author-email: no-reply@google.com
  License: Apache 2.0
@@ -11,3 +11,15 @@ Classifier: Topic :: Software Development :: Libraries :: Python Modules
  Classifier: Operating System :: OS Independent
  Classifier: License :: OSI Approved :: Apache Software License
  Description-Content-Type: text/markdown
+ Requires-Dist: fastapi==0.111.0
+ Requires-Dist: pillow
+ Requires-Dist: google-cloud-vision
+ Requires-Dist: google-cloud-videointelligence
+ Requires-Dist: smart_open
+ Requires-Dist: google-ads-api-report-fetcher==1.14.3
+ Requires-Dist: langchain==0.2.7
+ Requires-Dist: langchain-core==0.2.21
+ Requires-Dist: langchain-community==0.2.7
+ Requires-Dist: langchain-google-genai==1.0.7
+ Requires-Dist: langchain-google-vertexai
+ Requires-Dist: jq
@@ -0,0 +1,72 @@
+ # Media Tagger
+
+ ## Problem statement
+
+ When analyzing large amount of creatives of any nature (being images and videos)
+ it might be challenging to quickly and reliably understand their content
+ and gain insights.
+
+ ## Solution
+
+ `media-tagger` performs tagging of image and videos based on various taggers
+ - simply provide a path to your media files and `media-tagger` will do the rest.
+
+ ## Deliverable (implementation)
+
+ `media-tagger` is implemented as a:
+
+ * **library** - Use it in your projects with a help of `media_tagging.tagger.create_tagger` function.
+ * **CLI tool** - `media-tagger` tool is available to be used in the terminal.
+ * **HTTP endpoint** - `media-tagger` can be easily exposed as HTTP endpoint.
+ * **Langchain tool** - integrated `media-tagger` into your Langchain applications.
+
+ ## Deployment
+
+ ### Prerequisites
+
+ - Python 3.11+
+ - A GCP project with billing account attached
+ - [Video Intelligence API](https://console.cloud.google.com/apis/library/videointelligence.googleapis.com) and [Vision API](https://console.cloud.google.com/apis/library/vision.googleapis.com) enabled.
+ * [API key](https://support.google.com/googleapi/answer/6158862?hl=en) to access to access Google Gemini.
+   - Once you created API key export it as an environmental variable
+
+ ```
+ export GOOGLE_API_KEY=<YOUR_API_KEY_HERE>
+ ```
+
+
+ ### Installation
+
+ Install `media-tagger` with `pip install media-tagging` command.
+
+ ### Usage
+
+ > This section is focused on using `media-tagger` as a CLI tool.
+ > Check [library](docs/how-to-use-media-tagger-as-a-library.md),
+ > [http endpoint](docs/how-to-use-media-tagger-as-a-http-endpoint.md),
+ > [langchain tool](docs/how-to-use-media-tagger-as-a-langchain-tool.md)
+ > sections to learn more.
+
+ Once `media-tagger` is installed you can call it:
+
+ ```
+ media-tagger --media-path MEDIA_PATH --tagger TAGGER_TYPE --writer WRITER_TYPE
+ ```
+ where:
+ * MEDIA_PATH - comma-separated names of files for tagging (can be urls).
+ * TAGGER_TYPE - name of tagger, supported options:
+   * `vision-api` - tags images based on [Google Cloud Vision API](https://cloud.google.com/vision/),
+   * `video-api` for videos based on [Google Cloud Video Intelligence API](https://cloud.google.com/video-intelligence/)
+   * `gemini-image` - Uses Gemini to tags images. Add `--tagger.n_tags=<N_TAGS>`
+     parameter to control number of tags returned by tagger.
+   * `gemini-structured-image` - Uses Gemini to find certain tags in the images.
+     Add `--tagger.tags='tag1, tag2, ..., tagN` parameter to find certain tags
+     in the image.
+   * `gemini-description-image` - Provides brief description of the image,
+ * WRITER_TYPE - name of writer, one of `csv`, `json`
+
+ By default script will create a single file with tagging results for each media_path.
+ If you want to combine results into a single file add `--output OUTPUT_NAME` flag (without extension, i.e. `--output tagging_sample`.
+
+ ## Disclaimer
+ This is not an officially supported Google product.
@@ -15,48 +15,10 @@

  import argparse
  import logging
- import os

- import smart_open
  from gaarf.cli import utils as gaarf_utils

- from media_tagging import tagger, utils, writer
- from media_tagging.taggers import base as base_tagger
-
-
- def tag_media(
-   media_path: str | os.PathLike,
-   tagger_type: base_tagger.BaseTagger,
-   writer_type: writer.BaseWriter = writer.JsonWriter(),
-   single_output_name: str | None = None,
-   tagging_parameters: dict[str, str] | None = None,
- ) -> None:
-   """Runs media tagging algorithm.
-
-   Args:
-     media_path: Local or remote path to media file.
-     tagger_type: Initialized tagger.
-     writer_type: Initialized writer for saving tagging results.
-     single_output_name: Parameter for saving results to a single file.
-     tagging_parameters: Optional keywords arguments to be sent for tagging.
-   """
-   media_paths = media_path.split(',')
-   if not tagging_parameters:
-     tagging_parameters = {}
-   results = []
-   for path in media_paths:
-     media_name = utils.convert_path_to_media_name(path)
-     logging.info('Processing media: %s', path)
-     with smart_open.open(path, 'rb') as f:
-       media_bytes = f.read()
-     results.append(
-       tagger_type.tag(
-         media_name,
-         media_bytes,
-         tagging_options=base_tagger.TaggingOptions(**tagging_parameters),
-       )
-     )
-   writer_type.write(results, single_output_name)
+ from media_tagging import tagger, writer


  def main():
@@ -80,13 +42,13 @@ def main():
    )
    logging.getLogger(__file__)

-   tag_media(
-     media_path=args.media_path,
+   logging.info('Initializing tagger: %s', args.tagger)
+   tagging_results = tagger.tag_media(
+     media_paths=args.media_path.split(','),
      tagger_type=concrete_tagger,
-     writer_type=concrete_writer,
-     single_output_name=args.output,
      tagging_parameters=tagging_parameters.get('tagger'),
    )
+   concrete_writer.write(tagging_results, args.output)


  if __name__ == '__main__':
@@ -16,7 +16,6 @@
  import logging

  import fastapi
- import smart_open
  from typing_extensions import TypedDict

  from media_tagging import tagger, utils
@@ -30,10 +29,12 @@ class MediaPostRequest(TypedDict):
    """Specifies structure of request for tagging media.

    Attributes:
+     tagger_type: Type of tagger.
      media_url: Local or remote URL of media.
    """

    media_url: str
+   tagger_type: str
    tagging_parameters: dict[str, int | list[str]]


@@ -49,24 +50,7 @@ async def tag_with_llm(
    Returns:
      Json results of tagging.
    """
-   if not (llm_tagger := taggers.get('gemini-image')):
-     llm_tagger = tagger.create_tagger('gemini-image')
-     taggers['gemini-image'] = llm_tagger
-   if media_url := data.get('media_url'):
-     media_name = utils.convert_path_to_media_name(media_url)
-     logging.info('Processing media: %s', media_url)
-     with smart_open.open(media_url, 'rb') as f:
-       media_bytes = f.read()
-     tagging_options = base_tagger.TaggingOptions(
-       **data.get('tagging_parameters')
-     )
-     tagging_result = llm_tagger.tag(
-       name=media_name, content=media_bytes, tagging_options=tagging_options
-     )
-     return fastapi.responses.JSONResponse(
-       content=fastapi.encoders.jsonable_encoder(tagging_result.dict())
-     )
-   raise ValueError('No path to media is provided.')
+   return process_post_request(data)


  @app.post('/tagger/api')
@@ -81,18 +65,32 @@ async def tag_with_api(
    Returns:
      Json results of tagging.
    """
-   if not (api_tagger := taggers.get('vision-api')):
-     api_tagger = tagger.create_tagger('vision-api')
-     taggers['vision-api'] = api_tagger
+   return process_post_request(data)
+
+
+ def process_post_request(
+   data: MediaPostRequest,
+ ) -> fastapi.responses.JSONResponse:
+   """Helper method for performing tagging.
+
+   Args:
+     data: Post request for media tagging.
+
+   Returns:
+     Json results of tagging.
+   """
+   tagger_type = data.get('tagger_type')
+   if not (concrete_tagger := taggers.get(tagger_type)):
+     concrete_tagger = tagger.create_tagger(tagger_type)
+     taggers[tagger_type] = concrete_tagger
    if media_url := data.get('media_url'):
      media_name = utils.convert_path_to_media_name(media_url)
+     media_bytes = utils.read_media_as_bytes(media_url)
      logging.info('Processing media: %s', media_url)
-     with smart_open.open(media_url, 'rb') as f:
-       media_bytes = f.read()
      tagging_options = base_tagger.TaggingOptions(
        **data.get('tagging_parameters')
      )
-     tagging_result = api_tagger.tag(
+     tagging_result = concrete_tagger.tag(
        name=media_name, content=media_bytes, tagging_options=tagging_options
      )
      return fastapi.responses.JSONResponse(
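The server hunk above replaces two hard-coded tagger lookups with a single create-on-first-use cache keyed by `tagger_type`. The following is a minimal sketch of that pattern in isolation. The `get_or_create` helper and `fake_factory` are hypothetical names used only for illustration; in the package the factory role is played by `tagger.create_tagger`.

```python
from collections.abc import Callable


def get_or_create(
    cache: dict[str, object],
    key: str,
    factory: Callable[[str], object],
) -> object:
    """Returns the cached instance for key, building it on first use."""
    # Same walrus-based check as in the diff: a missing (or falsy) entry
    # triggers creation, and the new instance is stored for later requests.
    if not (instance := cache.get(key)):
        instance = factory(key)
        cache[key] = instance
    return instance


created: list[str] = []


def fake_factory(tagger_type: str) -> dict[str, str]:
    """Hypothetical stand-in for tagger.create_tagger; records each call."""
    created.append(tagger_type)
    return {'type': tagger_type}


taggers: dict[str, object] = {}
first = get_or_create(taggers, 'vision-api', fake_factory)
second = get_or_create(taggers, 'vision-api', fake_factory)
other = get_or_create(taggers, 'gemini-image', fake_factory)
```

With this shape, both endpoints can share one code path, and each tagger type is constructed at most once per process.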
@@ -17,6 +17,11 @@ Media tagging sends API requests to tagging engine (i.e. Google Vision API)
  and returns tagging results that can be easily written.
  """

+ import logging
+ import os
+ from collections.abc import Sequence
+
+ from media_tagging import utils
  from media_tagging.taggers import api, base, llm

  _TAGGERS = {
@@ -25,12 +30,18 @@ _TAGGERS = {
    'gemini-image': llm.GeminiImageTagger,
    'gemini-structured-image': llm.GeminiImageTagger,
    'gemini-description-image': llm.GeminiImageTagger,
+   'gemini-video': llm.GeminiVideoTagger,
+   'gemini-structured-video': llm.GeminiVideoTagger,
+   'gemini-description-video': llm.GeminiVideoTagger,
  }

  _LLM_TAGGERS_TYPES = {
    'gemini-image': llm.LLMTaggerTypeEnum.UNSTRUCTURED,
    'gemini-structured-image': llm.LLMTaggerTypeEnum.STRUCTURED,
    'gemini-description-image': llm.LLMTaggerTypeEnum.DESCRIPTION,
+   'gemini-video': llm.LLMTaggerTypeEnum.UNSTRUCTURED,
+   'gemini-structured-video': llm.LLMTaggerTypeEnum.STRUCTURED,
+   'gemini-description-video': llm.LLMTaggerTypeEnum.DESCRIPTION,
  }


@@ -58,3 +69,35 @@ def create_tagger(
      f'Incorrect tagger {type} is provided, '
      f'valid options: {list(_TAGGERS.keys())}'
    )
+
+
+ def tag_media(
+   media_paths: Sequence[str | os.PathLike],
+   tagger_type: base.BaseTagger,
+   tagging_parameters: dict[str, str] | None = None,
+ ) -> list[base.TaggingResult]:
+   """Runs media tagging algorithm.
+
+   Args:
+     media_paths: Local or remote path to media file.
+     tagger_type: Initialized tagger.
+     tagging_parameters: Optional keywords arguments to be sent for tagging.
+
+   Returns:
+     Results of tagging for all media.
+   """
+   if not tagging_parameters:
+     tagging_parameters = {}
+   results = []
+   for path in media_paths:
+     media_name = utils.convert_path_to_media_name(path)
+     logging.info('Processing media: %s', path)
+     media_bytes = utils.read_media_as_bytes(path)
+     results.append(
+       tagger_type.tag(
+         media_name,
+         media_bytes,
+         tagging_options=base.TaggingOptions(**tagging_parameters),
+       )
+     )
+   return results
@@ -46,6 +46,8 @@ class Tag(pydantic.BaseModel):
      score: Score assigned to the tag.
    """

+   model_config = pydantic.ConfigDict(frozen=True)
+
    name: str = pydantic.Field(description='tag_name')
    score: float = pydantic.Field(description='tag_score from 0 to 1')

@@ -14,10 +14,14 @@
  """Module for performing media tagging with LLMs."""

  import base64
+ import dataclasses
  import enum
+ import json
  import logging
+ import tempfile
  from typing import Final

+ import google.generativeai as google_genai
  import langchain_google_genai as genai
  from langchain_core import (
    language_models,
@@ -59,6 +63,9 @@ _UNSTRUCTURED_PROMPT: Final[prompts.ChatPromptTemplate] = (
    )
  )

+ _UNSTRUCTURED_PROMPT_VIDEO: Final[str] = (
+   'Generate {n_tags} tags for the following video.'
+ )
  _STRUCTURED_PROMPT: Final[prompts.ChatPromptTemplate] = (
    prompts.ChatPromptTemplate.from_messages(
      [
@@ -72,6 +79,11 @@ _STRUCTURED_PROMPT: Final[prompts.ChatPromptTemplate] = (
    )
  )

+ _STRUCTURED_PROMPT_VIDEO: Final[str] = (
+   'Find whether the following tags can be found in the video: {tags}.'
+ )
+
+
  _DESCRIPTION_PROMPT: Final[prompts.ChatPromptTemplate] = (
    prompts.ChatPromptTemplate.from_messages(
      [
@@ -84,6 +96,7 @@ _DESCRIPTION_PROMPT: Final[prompts.ChatPromptTemplate] = (
    )
  )

+ _DESCRIPTION_PROMPT_VIDEO: Final[str] = 'Describe the following video.'

  llm_tagger_promps: dict[LLMTaggerTypeEnum, prompts.ChatPromptTemplate] = {
    LLMTaggerTypeEnum.UNSTRUCTURED: _UNSTRUCTURED_PROMPT,
@@ -91,6 +104,12 @@ llm_tagger_promps: dict[LLMTaggerTypeEnum, prompts.ChatPromptTemplate] = {
    LLMTaggerTypeEnum.DESCRIPTION: _DESCRIPTION_PROMPT,
  }

+ video_llm_tagger_promps: dict[LLMTaggerTypeEnum, str] = {
+   LLMTaggerTypeEnum.UNSTRUCTURED: _UNSTRUCTURED_PROMPT_VIDEO,
+   LLMTaggerTypeEnum.STRUCTURED: _STRUCTURED_PROMPT_VIDEO,
+   LLMTaggerTypeEnum.DESCRIPTION: _DESCRIPTION_PROMPT_VIDEO,
+ }
+

  class LLMTagger(base.BaseTagger):
    """Tags media via LLM."""
@@ -190,3 +209,95 @@ class GeminiImageTagger(LLMTagger):
        llm_tagger_type=tagger_type,
        llm=genai.ChatGoogleGenerativeAI(model=model_name),
      )
+
+
+ class GeminiVideoTagger(LLMTagger):
+   """Tags video based on Gemini."""
+
+   def __init__(
+     self,
+     tagger_type: LLMTaggerTypeEnum,
+     model_name: str = 'models/gemini-1.5-flash',
+   ) -> None:
+     """Initializes GeminiVideoTagger.
+
+     Args:
+       tagger_type: Type of LLM tagger.
+       model_name: Name of the model to perform the tagging.
+     """
+     self.llm_tagger_type = tagger_type
+     self.model_name = model_name
+
+   @property
+   def model(self) -> google_genai.GenerativeModel:
+     """Initializes GenerativeModel."""
+     return google_genai.GenerativeModel(model_name=self.model_name)
+
+   @override
+   def tag(
+     self,
+     name: str,
+     content: bytes,
+     tagging_options: base.TaggingOptions = base.TaggingOptions(),
+   ):
+     logging.debug('Tagging video "%s" with GeminiVideoTagger', name)
+     with tempfile.NamedTemporaryFile(suffix='.mp4') as f:
+       f.write(content)
+       try:
+         video_file = google_genai.upload_file(f.name)
+
+         result = self.model.generate_content(
+           [
+             video_file,
+             '\n\n',
+             f'{self.format_prompt(tagging_options)} ',
+           ],
+           generation_config=google_genai.GenerationConfig(
+             response_mime_type='application/json',
+             response_schema=self.response_schema,
+           ),
+         )
+       finally:
+         video_file.delete()
+
+     if self.llm_tagger_type == LLMTaggerTypeEnum.DESCRIPTION:
+       return base.TaggingResult(
+         identifier=name,
+         type='video',
+         content=base.Description(text=json.loads(result.text).get('text')),
+       )
+     tags = [
+       base.Tag(name=r.get('name'), score=r.get('score'))
+       for r in json.loads(result.text)
+     ]
+     return base.TaggingResult(identifier=name, type='video', content=tags)
+
+   def format_prompt(self, tagging_options: base.TaggingOptions) -> str:
+     """Builds correct prompt to send to LLM.
+
+     Prompt contains format instructions to get output result.
+
+     Args:
+       tagging_options: Parameters to refine the prompt.
+
+     Returns:
+       Formatted prompt.
+     """
+     base_prompt = video_llm_tagger_promps[self.llm_tagger_type]
+     formatting_instructions = (
+       ' For each tag provide name and a score from 0 to 1 '
+       'where 0 is tag absence and 1 complete tag presence.'
+     )
+     prompt = base_prompt.format(**dataclasses.asdict(tagging_options))
+     if self.llm_tagger_type == LLMTaggerTypeEnum.DESCRIPTION:
+       return prompt
+     return prompt + formatting_instructions
+
+   @property
+   def response_schema(self) -> list[base.Tag] | base.Description:
+     """Generates correct response schema based on type of LLM tagger."""
+     return (
+       base.Description
+       if self.llm_tagger_type == LLMTaggerTypeEnum.DESCRIPTION
+       else list[base.Tag]
+     )
@@ -14,14 +14,13 @@
  """Exposes media tagger as a tool for Langchain agents."""

  import langchain_core
- import smart_open

  from media_tagging import tagger, utils
  from media_tagging.taggers import base as base_tagger


  class MediaTaggingInput(langchain_core.pydantic_v1.BaseModel):
-   """Input for text categorization."""
+   """Input for media tagging."""

    tagger_type: str = langchain_core.pydantic_v1.Field(
      description='Type of media tagger'
@@ -32,16 +31,14 @@ class MediaTaggingInput(langchain_core.pydantic_v1.BaseModel):


  class MediaTaggingResults(langchain_core.tools.BaseTool):
-   """Tools that performs text categorization.
+   """Tools that performs media tagging.

    Attributes:
-     llm_parameters: Parameter for LLM initialization.
      name: Name of the tool.
      description: Description the tool.
      args_schema: Input model for the tool.
    """

-   llm_parameters: dict[str, str] = {'model': 'gemini-1.5-flash'}
    name: str = 'media_tagging_results_json'
    description: str = 'tag media (image or videos)'
    args_schema: type[langchain_core.pydantic_v1.BaseModel] = MediaTaggingInput
@@ -51,7 +48,7 @@ class MediaTaggingResults(langchain_core.tools.BaseTool):
      tagger_type: str,
      media_url: str,
    ) -> list[dict[str, str]]:
-     """Performs media tagging based on LLM and vectorstore.
+     """Performs media tagging based on selected tagger.

      Args:
        tagger_type: Type of tagger to use for media tagging.
@@ -62,8 +59,7 @@ class MediaTaggingResults(langchain_core.tools.BaseTool):
      """
      media_tagger = tagger.create_tagger(tagger_type)
      media_name = utils.convert_path_to_media_name(media_url)
-     with smart_open.open(media_url, 'rb') as f:
-       media_bytes = f.read()
+     media_bytes = utils.read_media_as_bytes(media_url)
      tagging_options = base_tagger.TaggingOptions()
      return media_tagger.tag(
        name=media_name, content=media_bytes, tagging_options=tagging_options
@@ -13,8 +13,18 @@
  # limitations under the License.
  """Various utils."""

+ import os
+
+ import smart_open
+

  def convert_path_to_media_name(media_path: str) -> str:
    """Extracts file name without extension."""
    base_name = media_path.split('/')[-1]
    return base_name.split('.')[0]
+
+
+ def read_media_as_bytes(media_path: str | str | os.PathLike) -> bytes:
+   """Reads media content from local or remote storage."""
+   with smart_open.open(media_path, 'rb') as f:
+     return f.read()
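Every entry point in this release derives the result identifier through `convert_path_to_media_name`, so it helps to see how that function behaves on a few inputs. The function body below is copied from the hunk above; the sample paths are made up for illustration.

```python
def convert_path_to_media_name(media_path: str) -> str:
    """Extracts file name without extension."""
    base_name = media_path.split('/')[-1]
    return base_name.split('.')[0]


# Remote-style path: everything before the last '/' is dropped.
remote_name = convert_path_to_media_name('gs://bucket/ads/banner.jpg')

# Bare file name: only the extension is removed.
local_name = convert_path_to_media_name('video.mp4')

# Dotted file name: the name is cut at the FIRST dot, not the last one.
dotted_name = convert_path_to_media_name('media/clip.final.mp4')

print(remote_name, local_name, dotted_name)  # banner video clip
```

The third case shows that two files such as `clip.final.mp4` and `clip.draft.mp4` would map to the same identifier, which may matter when results are written per media name.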
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: media-tagging
- Version: 0.2.0.dev2
+ Version: 0.3.0.dev1
  Author: Google Inc. (gTech gPS CSE team)
  Author-email: no-reply@google.com
  License: Apache 2.0
@@ -11,3 +11,15 @@ Classifier: Topic :: Software Development :: Libraries :: Python Modules
  Classifier: Operating System :: OS Independent
  Classifier: License :: OSI Approved :: Apache Software License
  Description-Content-Type: text/markdown
+ Requires-Dist: fastapi==0.111.0
+ Requires-Dist: pillow
+ Requires-Dist: google-cloud-vision
+ Requires-Dist: google-cloud-videointelligence
+ Requires-Dist: smart_open
+ Requires-Dist: google-ads-api-report-fetcher==1.14.3
+ Requires-Dist: langchain==0.2.7
+ Requires-Dist: langchain-core==0.2.21
+ Requires-Dist: langchain-community==0.2.7
+ Requires-Dist: langchain-google-genai==1.0.7
+ Requires-Dist: langchain-google-vertexai
+ Requires-Dist: jq
@@ -1,11 +1,12 @@
+ fastapi==0.111.0
  google-ads-api-report-fetcher==1.14.3
  google-cloud-videointelligence
  google-cloud-vision
  jq
- langchain
- langchain-community
- langchain-core
- langchain-google-genai
+ langchain-community==0.2.7
+ langchain-core==0.2.21
+ langchain-google-genai==1.0.7
  langchain-google-vertexai
+ langchain==0.2.7
  pillow
  smart_open
@@ -17,7 +17,7 @@ from setuptools import find_packages, setup

  setup(
    name='media-tagging',
-   version='0.2.0dev2',
+   version='0.3.0dev1',
    long_description_content_type='text/markdown',
    author='Google Inc. (gTech gPS CSE team)',
    author_email='no-reply@google.com',
@@ -32,15 +32,16 @@ setup(
    ],
    packages=find_packages(),
    install_requires=[
+     'fastapi==0.111.0',
      'pillow',
      'google-cloud-vision',
      'google-cloud-videointelligence',
      'smart_open',
      'google-ads-api-report-fetcher==1.14.3',
-     'langchain',
-     'langchain-core',
-     'langchain-community',
-     'langchain-google-genai',
+     'langchain==0.2.7',
+     'langchain-core==0.2.21',
+     'langchain-community==0.2.7',
+     'langchain-google-genai==1.0.7',
      'langchain-google-vertexai',
      'jq',
    ],
@@ -15,8 +15,7 @@
  import json
  import pathlib

- from entrypoints import cli
- from media_tagging import writer
+ from media_tagging import tagger, writer
  from media_tagging.taggers import api

  _SCRIPT_DIR = pathlib.Path(__file__).parent
@@ -32,13 +31,13 @@ def test_image_tagging(fake_tagger, mocker):
    concrete_writer = writer.JsonWriter()
    image_path = f'{_SCRIPT_DIR}/../unit/data/test_image.jpg'
    image_name = 'test'
-   cli.tag_media(
-     media_path=image_path,
+   tagging_results = tagger.tag_media(
+     media_paths=[image_path],
      tagger_type=concrete_tagger,
-     writer_type=concrete_writer,
    )
+   concrete_writer.write(tagging_results, 'test')
    with open('test.json', 'r', encoding='utf-8') as f:
-     data = json.load(f)
+     data = json.load(f)[0]

    assert data.get('identifier') == image_name
    assert data.get('type') == 'image'
@@ -1,48 +0,0 @@
- # Welltech Media Tagging
-
- ## Prerequisites
-
- * Google Cloud project with billing enabled.
- * [Video Intelligence API](https://console.cloud.google.com/apis/library/videointelligence.googleapis.com) and [Vision API](https://console.cloud.google.com/apis/library/vision.googleapis.com) enabled.
- * Python3.8+
- * Access to repository configured. In order to clone this repository you need
-   to do the following:
-   * Visit https://professional-services.googlesource.com/new-password and
-     login with your account.
-   * Once authenticated please copy all lines in box
-     and paste them in the terminal.
-
-
- ## Run
-
-
- 1. Install `media-tagger`
-
- ```
- pip install media-tagging
- ```
-
- 2. Perform tagging
-
- ```
- media-tagger --media-path MEDIA_PATH --tagger TAGGER_TYPE --writer WRITER_TYPE
- ```
- where:
- * MEDIA_PATH - comma-separated names of files for tagging (can be urls).
- * TAGGER_TYPE - name of tagger, supported options:
-   * `vision-api` - tags images based on [Google Cloud Vision API](https://cloud.google.com/vision/),
-   * `video-api` for videos based on [Google Cloud Video Intelligence API](https://cloud.google.com/video-intelligence/)
-   * `gemini-image` - Uses Gemini to tags images. Add `--tagger.n_tags=<N_TAGS>`
-     parameter to control number of tags returned by tagger.
-   * `gemini-structured-image` - Uses Gemini to find certain tags in the images.
-     Add `--tagger.tags='tag1, tag2, ..., tagN` parameter to find certain tags
-     in the image.
-   * `gemini-description-image` - Provides brief description of the image,
- * WRITER_TYPE - name of writer, one of `csv`, `json`
-
- By default script will create a single file with tagging results for each media_path.
- If you want to combine results into a single file add `--output OUTPUT_NAME` flag (without extension, i.e. `--output tagging_sample`.
-
-
- ## Disclaimer
- This is not an officially supported Google product.
- This is not an officially supported Google product.