@groupby/ai-dev 0.5.5 → 0.5.8

Files changed (43)
  1. package/package.json +1 -1
  2. package/teams/OOF/skills/jira-ticket-creator/README.md +22 -0
  3. package/teams/OOF/skills/jira-ticket-creator/SKILL.md +266 -0
  4. package/teams/fhr-ai-team/github/PULL_REQUEST_TEMPLATE/full.md +31 -0
  5. package/teams/fhr-ai-team/github/PULL_REQUEST_TEMPLATE/light.md +7 -0
  6. package/teams/fhr-ai-team/github/copilot-instructions.md +24 -0
  7. package/teams/fhr-ai-team/github/instructions/python.instructions.md +23 -0
  8. package/teams/fhr-ai-team/github/pull_request_template.md +21 -0
  9. package/teams/fhr-ai-team/prompts/brainstorm.md +7 -0
  10. package/teams/fhr-ai-team/prompts/plan-algo-tests.md +7 -0
  11. package/teams/fhr-ai-team/prompts/plan.md +7 -0
  12. package/teams/fhr-ai-team/prompts/pr-description.md +7 -0
  13. package/teams/fhr-ai-team/prompts/test.md +7 -0
  14. package/teams/fhr-ai-team/resources/AGENTS.md +55 -0
  15. package/teams/fhr-ai-team/resources/CLAUDE.md +52 -0
  16. package/teams/fhr-ai-team/resources/README.md +51 -0
  17. package/teams/fhr-ai-team/resources/claude-code-setup.md +60 -0
  18. package/teams/fhr-ai-team/resources/copilot-setup.md +64 -0
  19. package/teams/fhr-ai-team/resources/onboarding.md +179 -0
  20. package/teams/fhr-ai-team/resources/opencode-install.md +29 -0
  21. package/teams/fhr-ai-team/resources/opencode-setup.md +43 -0
  22. package/teams/fhr-ai-team/skills/algo-test-planning/SKILL.md +192 -0
  23. package/teams/fhr-ai-team/skills/algo-test-planning/references/pipeline-registry.md +280 -0
  24. package/teams/fhr-ai-team/skills/brainstorming/SKILL.md +111 -0
  25. package/teams/fhr-ai-team/skills/e2e-testing/SKILL.md +163 -0
  26. package/teams/fhr-ai-team/skills/grill-me/SKILL.md +10 -0
  27. package/teams/fhr-ai-team/skills/ml-tooling-dev/SKILL.md +313 -0
  28. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/kubectl-debug.md +165 -0
  29. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/mongodb-config.md +218 -0
  30. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/pipeline-configs.md +190 -0
  31. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/pipeline-steps.md +182 -0
  32. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/kf_logs.py +203 -0
  33. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/kf_query.py +233 -0
  34. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/kf_wait.py +195 -0
  35. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/mlflow_query.py +252 -0
  36. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/mongo_predictor.py +352 -0
  37. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/SKILL.md +230 -0
  38. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/references/dataset-naming.md +190 -0
  39. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/references/domain-vocabulary.md +447 -0
  40. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/references/repo-dependency-graph.md +264 -0
  41. package/teams/fhr-ai-team/skills/planning/SKILL.md +138 -0
  42. package/teams/fhr-ai-team/skills/pr-description/SKILL.md +94 -0
  43. package/teams/snpd/skills/code-review-github/SKILL.md +475 -0
@@ -0,0 +1,447 @@
1
+ # Domain Vocabulary & Variable Naming Conventions
2
+
3
+ ## Table of Contents
4
+ - [Language-Specific Casing Rules](#language-casing)
5
+ - [E-commerce Domain Terms](#ecommerce-terms)
6
+ - [Search & Recommendation Terms](#search-terms)
7
+ - [ML Model Terms](#ml-terms)
8
+ - [Pipeline & Orchestration Terms](#pipeline-terms)
9
+ - [Image Processing Terms](#image-terms)
10
+ - [User Behavior Terms](#user-terms)
11
+ - [Configuration Naming Patterns](#config-patterns)
12
+ - [Identifier Patterns](#id-patterns)
13
+ - [Algorithm Name Registry](#algorithm-names)
+ - [Strategy ID Registry](#strategy-ids)
14
+ - [Service Model Name Registry](#service-model-names)
15
+ - [KPI Template Names](#kpi-templates)
+ - [Launcher Class Naming](#launcher-classes)
16
+
17
+ ## Language-Specific Casing Rules {#language-casing}
18
+
19
+ | Context | Convention | Example |
20
+ |---|---|---|
21
+ | Python variables | snake_case | `item_id`, `query_data` |
22
+ | Python constants | SCREAMING_SNAKE_CASE | `ITEM_ID_LABEL`, `FM_ALGO_NAME` |
23
+ | Python classes | PascalCase | `QueryDataDataset`, `SearchModelConfig` |
24
+ | Scala variables | camelCase | `itemId`, `tenantId` |
25
+ | Scala classes | PascalCase | `ItemDataDatasetRecord`, `TenantKey` |
26
+ | JSON/YAML keys | camelCase | `"itemDataDataLoaderConfig"`, `"modelConfig"` |
27
+ | Parquet columns | camelCase | `"tenantId"`, `"itemKind"`, `"sortedNbUniqueSearches"` |
28
+ | GCS paths | kebab-case | `item-data-dataset-preprocessing-dataframe` |
29
+ | Docker images | kebab-case | `algo-fm-batch`, `semantic-search` |
30
+ | Strategy IDs | kebab-case | `semantic-search-learning`, `item-images-single-encoding` |
31
+ | Environment vars | SCREAMING_SNAKE_CASE | `POSTGRES_KPI_DB_HOST` |
32
+ | Algorithm names (constants) | kebab-case values | `FM_ALGO_NAME = "fm"`, `SEARCH_ALGO_NAME = "search"` |
33
+
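The casing rules in the table above can be sketched as a small set of converters. This is an illustrative helper, not code from the package: the function names (`split_words`, `to_snake_case`, and so on) are hypothetical, and acronym runs such as `itemSEOKeyPhrasesOpt` are not handled specially.

```python
import re


def split_words(identifier: str) -> list:
    """Split a snake_case, kebab-case, camelCase or PascalCase name into lowercase words."""
    # Turn separators into spaces, then break at lower-to-upper boundaries.
    spaced = re.sub(r"[_\-]+", " ", identifier)
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return [word.lower() for word in spaced.split()]


def to_snake_case(identifier: str) -> str:
    """Python variables: item_id."""
    return "_".join(split_words(identifier))


def to_screaming_snake_case(identifier: str) -> str:
    """Python constants and environment variables: ITEM_ID."""
    return to_snake_case(identifier).upper()


def to_kebab_case(identifier: str) -> str:
    """GCS paths, Docker images, strategy IDs: item-id."""
    return "-".join(split_words(identifier))


def to_pascal_case(identifier: str) -> str:
    """Python and Scala classes: ItemId."""
    return "".join(word.capitalize() for word in split_words(identifier))


def to_camel_case(identifier: str) -> str:
    """Scala variables, JSON/YAML keys, Parquet columns: itemId."""
    pascal = to_pascal_case(identifier)
    return pascal[:1].lower() + pascal[1:]
```

For example, `to_camel_case("item_id")` gives the Parquet column form `itemId`, and `to_kebab_case("item_data_dataset_preprocessing_dataframe")` gives the GCS path form shown in the table.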
34
+ ## E-commerce Domain Terms {#ecommerce-terms}
35
+
36
+ ### Core Entity Names
37
+ | Canonical Name | Python | Scala | JSON/Parquet | Notes |
38
+ |---|---|---|---|---|
39
+ | Item | `item` | `Item` | `item` | Primary product entity (NOT "product" in ML code) |
40
+ | Item ID | `item_id` | `itemId` / `ItemId` | `"itemId"` | Scala has `case class ItemId(kind, value)` |
41
+ | Item Kind | `item_kind` | `itemKind` / `ItemKind` | `"itemKind"` | Item type classifier |
42
+ | Tenant | `tenant` | `Tenant` | `tenant` | Customer/account |
43
+ | Tenant ID | `tenant_id` | `tenantId` / `TenantId` | `"tenantId"` | |
44
+ | Variant | `variant` | `Variant` | `variant` | Product variant |
45
+ | Variant ID | `variant_id` | `variantId` | `"variantId"` | |
46
+ | Category | `categories` | `categories` | `"categories"` | Always plural |
47
+ | Attribute | `attributes`, `named_attributes` | `attributes`, `namedAttributes` | `"attributes"`, `"namedAttributes"` | |
48
+ | Locale | `locale` | `locale` | `"locale"` | Language/region code |
49
+
50
+ ### Label Constants (Python)
51
+ ```python
52
+ ITEM_ID_LABEL = "itemId"
53
+ ITEM_KIND_LABEL = "itemKind"
54
+ TENANT_ID_LABEL = "tenantId"
55
+ CATEGORIES_LABEL = "categories"
56
+ ATTRIBUTES_LABEL = "attributes"
57
+ LOCALE_LABEL = "locale"
58
+ SEO_KEYPHRASES_OPT_LABEL = "itemSEOKeyPhrasesOpt"
59
+ ```
60
+
61
+ **Pattern**: `{ENTITY}_{FIELD}_LABEL = "{camelCaseFieldName}"`: maps a Python constant to its Parquet/JSON column name.
62
+
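A reviewer could check the label pattern above mechanically. The sketch below assumes the constants are available as a plain name-to-value mapping; `find_label_violations` and both regular expressions are hypothetical and are not part of the package.

```python
import re

# Constant name: SCREAMING_SNAKE_CASE ending in _LABEL.
LABEL_NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*_LABEL$")
# Constant value: camelCase column name (acronym runs such as "SEO" are allowed).
CAMEL_CASE_PATTERN = re.compile(r"^[a-z][A-Za-z0-9]*$")


def find_label_violations(constants: dict) -> list:
    """Return one message per naming violation found in a name -> value mapping."""
    violations = []
    for name, value in constants.items():
        if not LABEL_NAME_PATTERN.match(name):
            violations.append(f"{name}: name should be SCREAMING_SNAKE_CASE ending in _LABEL")
        if not CAMEL_CASE_PATTERN.match(value):
            violations.append(f"{name}: value {value!r} should be camelCase")
    return violations
```

Called with the constants listed above, the function returns an empty list; a pair such as `"item_id_label": "item_id"` would be reported twice, once for the name and once for the value.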
63
+ ## Search & Recommendation Terms {#search-terms}
64
+
65
+ ### Query-Related
66
+ | Canonical Name | Constant | Value |
67
+ |---|---|---|
68
+ | Query | `QUERY_LABEL` | `"query"` |
69
+ | Query Token Indices | `QUERY_TOKEN_INDICES_LABEL` | `"queryTokenIndices"` |
70
+ | Query Encoding Service | `QUERY_ENCODING_SERVICE_MODEL_NAME` | `"query-encoding-service-model"` |
71
+ | Sorted Tenant Item IDs | `TENANT_ITEM_IDS_LABEL` | `"sortedTenantItemIds"` |
72
+ | Sorted Tenant Item ID Locales | `TENANT_ITEM_ID_LOCALES_LABEL` | `"sortedTenantItemIdLocales"` |
73
+ | Unique Searches Count | `NB_UNIQUE_SEARCHES_LABEL` | `"sortedNbUniqueSearches"` |
74
+
75
+ ### Embedding/Vector Terms
76
+ | Term | Usage |
77
+ |---|---|
78
+ | `embeddings` / `encodings` | Used interchangeably, but `encodings` is more common in this codebase |
79
+ | `item_embeddings` | Item vector representations |
80
+ | `query_encodings` | Query vector representations |
81
+ | `input_ids` | Tokenizer output IDs |
82
+ | `attention_mask` | Transformer attention mask |
83
+ | `query_item_similarities` | Similarity scores between queries and items |
84
+ | `embedding_dimension` | Vector dimensionality |
85
+
86
+ ### Model Types
87
+ | Constant | Value |
88
+ |---|---|
89
+ | `ONNX_MODEL_TYPE` | `"onnx"` |
90
+ | `PYTORCH_MODEL_TYPE` | `"pytorch"` |
91
+ | `MODEL_TYPE_KWARG` | `"model_type"` |
92
+
93
+ ## ML Model Terms {#ml-terms}
94
+
95
+ ### Config Class Naming (VERY CONSISTENT)
96
+ Pattern: `{Domain}{Purpose}Config`
97
+
98
+ | Pattern | Examples |
99
+ |---|---|
100
+ | `*ModelConfig` | `SearchModelConfig`, `SemanticSearchModelConfig` |
101
+ | `*LearningAlgorithmConfig` | `SemanticSearchLearningAlgorithmConfig`, `LlmSearchLearningAlgorithmConfig` |
102
+ | `*EvaluatorConfig` | `SearchEvaluatorConfig` |
103
+ | `*PreProcessingPipelineConfig` | `SearchPreProcessingPipelineConfig` |
104
+ | `*BatchConfig` | `SearchLearningBatchConfig`, `TransformerTaggingLearningBatchConfig` |
105
+
106
+ ### Parameter Constants
107
+ Pattern: `{NAME}_PARAM = *Param(...)`
108
+
109
+ ```python
110
+ # Common params
111
+ BATCH_SIZE_PARAM
112
+ LEARNING_RATE_PARAM
113
+ WARMUP_STEPS_PARAM
114
+ OUTPUT_DIR_PARAM
115
+ LOGGING_DIR_PARAM
116
+ PER_DEVICE_TRAIN_BATCH_SIZE_PARAM
117
+ GRADIENT_ACCUMULATION_STEPS_PARAM
118
+ DATALOADER_NUM_WORKERS_PARAM
119
+
120
+ # Optional params end in _OPT_PARAM
121
+ NB_OBSERVATION_BY_TENANT_OPT_PARAM
122
+ TF32_OPT_PARAM
123
+ EVAL_STEPS_OPT_PARAM
124
+ DATALOADER_PERSISTENT_WORKERS_OPT_PARAM
125
+
126
+ # Config-level params
127
+ MODEL_CONFIG_PARAM
128
+ LEARNING_CONFIG_PARAM
129
+ TRAINING_ARGUMENTS_CONFIG_PARAM
130
+ ITEM_DATA_LOADER_CONFIG_PARAM
131
+ EVALUATOR_CONFIG_PARAM
132
+ PIPELINE_CONFIG_PARAM
133
+ ```
134
+
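The `_OPT_PARAM` convention makes it possible to tell required from optional parameters by name alone. A minimal sketch, assuming only the constant names are known (the `*Param(...)` classes themselves are not shown in this document, and `split_param_constants` is a hypothetical helper):

```python
def split_param_constants(names):
    """Partition *_PARAM constant names into (required, optional) lists."""
    required, optional = [], []
    for name in names:
        if not (name.isupper() and name.endswith("_PARAM")):
            raise ValueError(f"not a parameter constant name: {name}")
        # Optional parameters carry _OPT immediately before the _PARAM suffix.
        (optional if name.endswith("_OPT_PARAM") else required).append(name)
    return required, optional
```

With the names listed above, `BATCH_SIZE_PARAM` and `MODEL_CONFIG_PARAM` land in the required list, while `TF32_OPT_PARAM` and `EVAL_STEPS_OPT_PARAM` land in the optional list.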
135
+ ### Model Registry
136
+ | Constant | Value |
137
+ |---|---|
138
+ | `SEMANTIC_SEARCH_MODEL_NAME` | `"semantic-search-model-onnx"` |
139
+ | `SEMANTIC_SEARCH_PYTORCH_MODEL_NAME` | `"semantic-search-model"` |
140
+ | `SEMANTIC_SEARCH_QUANTIZED_MODEL_NAME` | `"semantic-search-model-quantized-onnx"` |
141
+
142
+ ### Predictor/Strategy (Scala Domain)
143
+ ```scala
144
+ case class BatchParams(predictorId: String, ...)
145
+ sealed trait Predictor extends WithLogPrefix
146
+ object SubPredictor
147
+ object AggregatorPredictor
148
+ case class TenantKey(id: String, name: String, algorithmName: String)
149
+ case class StandalonePredictorKey(id: String, name: String, algorithmName: String)
150
+ ```
151
+
152
+ ## Pipeline & Orchestration Terms {#pipeline-terms}
153
+
154
+ ### Pipeline Types
155
+ | Pipeline Name | Description |
156
+ |---|---|
157
+ | `python_batch_pipeline` | Standard Python ML batch |
158
+ | `scala_batch_pipeline` | Scala/Spark batch |
159
+ | `spark_scala_batch_pipeline` | Spark Scala on Dataproc |
160
+ | `large_python_batch_pipeline` | Large-scale Python batch |
161
+
162
+ ### Pipeline Task Naming
163
+ Pattern: `create_{step_name}_task`
164
+
165
+ Common steps:
166
+ - `create_preprocessing_task`
167
+ - `create_query_dataset_generation_task`
168
+ - `create_item_data_dataset_preprocessing_task`
169
+ - `create_items_encoding_task`
170
+ - `create_learning_task`
171
+ - `create_evaluation_task`
172
+ - `create_ftp_exporter_task`
173
+ - `create_fhr_exporter_task`
174
+
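The `create_{step_name}_task` pattern can be applied in both directions. The helpers below are hypothetical and only manipulate names; they do not create any pipeline tasks.

```python
import re

# create_<snake_case step name>_task
TASK_FACTORY_PATTERN = re.compile(r"^create_([a-z0-9]+(?:_[a-z0-9]+)*)_task$")


def task_factory_name(step_name: str) -> str:
    """Build the factory function name for a snake_case step name."""
    return f"create_{step_name}_task"


def step_name_from_factory(factory_name: str):
    """Recover the step name, or return None if the name does not follow the pattern."""
    match = TASK_FACTORY_PATTERN.match(factory_name)
    return match.group(1) if match else None
```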
175
+ ### Config Parameters (YAML)
176
+ | Parameter | Description |
177
+ |---|---|
178
+ | `predictor_id` | MongoDB ObjectId for predictor |
179
+ | `strategy_id` | Strategy identifier (kebab-case) |
180
+ | `image_name` | Docker image reference |
181
+ | `pipeline_name` | Pipeline template name |
182
+ | `version_name` | Config version (e.g., "0.1.269") |
183
+ | `experiment_name` | Human-readable experiment name |
184
+ | `algo_name` | Algorithm identifier |
185
+
186
+ ### Batch Config Arguments (camelCase)
187
+ | Argument | Description |
188
+ |---|---|
189
+ | `preprocessingRootPath` | Root GCS path for preprocessing |
190
+ | `modelRootPath` | Root GCS path for model artifacts |
191
+ | `itemDataPreprocessingRootPath` | Item data preprocessing path |
192
+ | `queryDatasetPreprocessingDirectoryPath` | Query dataset prep path |
193
+ | `itemDataDatasetPreprocessingDirectoryPath` | Item dataset prep path |
194
+ | `itemImageEncodingsDirectoryPath` | Image encoding path |
195
+ | `outputRootPath` | Output directory root |
196
+ | `yoloModelName` | YOLO model identifier |
197
+ | `itemOutfitImagePathsListPath` | Outfit image paths |
198
+
199
+ ### Resource Config Keys
200
+ | Key | Example Values |
201
+ |---|---|
202
+ | `cpu` | `"11"`, `"1000m"` |
203
+ | `memory` | `"24G"`, `"2G"` |
204
+ | `gpu` | `1`, `2` |
205
+ | `gpu_vendor` | `"nvidia.com/gpu"` |
206
+ | `gpu_accelerator_name` | `"nvidia-l4"` |
207
+ | `java_memory` | `"8g"` |
208
+ | `timeout_ms` / `timeout_s` | Timeout values |
209
+
210
+ ## Image Processing Terms {#image-terms}
211
+
212
+ ### Image Type Names
213
+ | Name | Description |
214
+ |---|---|
215
+ | `stdImages` | Standard product images |
216
+ | `cropImages` | Cropped images |
217
+ | `cutoutImages` | Transparent background cutout images |
218
+ | `topTotalImages` | Top/total view images |
219
+ | `otherImages` | Miscellaneous images |
220
+
221
+ ### Image Processing Constants
222
+ | Constant | Value |
223
+ |---|---|
224
+ | `IMAGE_ENCODINGS_LABEL` | `"imageEncodings"` |
225
+ | `CLOSEUP_IMAGE_ENCODINGS_LABEL` | `"closeupImageEncodings"` |
226
+ | `IMAGE_ENCODER_ALGORITHM_NAME` | `"image-encoder"` |
227
+ | `IMAGE_ENCODING_SERVICE_MODEL_NAME` | `"image-encoding-service-model"` |
228
+
229
+ ### Cutout/Detection Functions
230
+ - `get_item_image_cutout_bounding_boxes()`
231
+ - `get_item_image_cutout_segmentation_masks()`
232
+ - `item_image_cutout_bounding_box` (module path)
233
+ - `item_image_cutout_segmentation_mask` (module path)
234
+
235
+ ## User Behavior Terms {#user-terms}
236
+
237
+ ### Core Identifiers
238
+ | Python | Scala | JSON/Parquet |
239
+ |---|---|---|
240
+ | `user_id` | `userId` | `"userId"` |
241
+ | `tenant_id` | `tenantId` | `"tenantId"` |
242
+ | `session_id` | `sessionId` | `"sessionId"` |
243
+
244
+ ### Activity Models (Scala)
245
+ - `Activity`, `ItemActivity`, `ProfileActivity`
246
+ - `case class ItemActivity(itemId: ItemId, action: String, ...)`
247
+
248
+ ### Environment Variables
249
+ Pattern: `POSTGRES_{SERVICE}_DB_{PARAM}`
250
+ - `POSTGRES_KPI_DB_HOST`
251
+ - `POSTGRES_SESSIONS_DB_PORT`
252
+ - `POSTGRES_AI_INFERENCE_DATA_DB_DATABASE_NAME`
253
+
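The `POSTGRES_{SERVICE}_DB_{PARAM}` pattern lends itself to a small lookup helper. This is a sketch under the assumption that settings are read from the process environment; `postgres_env_var` and `read_postgres_setting` are hypothetical names.

```python
import os


def postgres_env_var(service: str, param: str) -> str:
    """Build the variable name, e.g. ("kpi", "host") -> POSTGRES_KPI_DB_HOST."""

    def normalise(part: str) -> str:
        # Accept kebab-case, snake_case or spaced input and emit SCREAMING_SNAKE_CASE.
        return part.strip().replace("-", "_").replace(" ", "_").upper()

    return f"POSTGRES_{normalise(service)}_DB_{normalise(param)}"


def read_postgres_setting(service: str, param: str, default=None):
    """Read the setting from the environment, falling back to `default` when unset."""
    return os.environ.get(postgres_env_var(service, param), default)
```

For instance, `postgres_env_var("ai-inference-data", "database_name")` yields `POSTGRES_AI_INFERENCE_DATA_DB_DATABASE_NAME`, matching the third example above.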
254
+ ## Configuration Naming Patterns {#config-patterns}
255
+
256
+ ### Config File Locations
257
+ Always in `config/batch.py` or `config/batches.py`:
258
+ - `*LearningBatchConfig` class
259
+ - `*EvaluationBatchConfig` class
260
+ - Related parameter definitions
261
+
262
+ ### JSON Config Keys
263
+ Always **camelCase**:
264
+ ```json
265
+ {
266
+ "chunkSize": 100,
267
+ "nbParallelProcesses": 4,
268
+ "llmConfig": {...},
269
+ "itemDataDataLoaderConfig": {...},
270
+ "queryAugmentationBatchSize": 32,
271
+ "maxNumWorkers": 8,
272
+ "modelName": "...",
273
+ "temperature": 0.7,
274
+ "maxTokens": 512
275
+ }
276
+ ```
277
+
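The camelCase rule for JSON config keys can be checked recursively. The following is a minimal sketch that assumes the config has already been parsed into Python dicts and lists; `find_non_camel_case_keys` is a hypothetical helper.

```python
import re

CAMEL_CASE_KEY = re.compile(r"^[a-z][A-Za-z0-9]*$")


def find_non_camel_case_keys(config, path=""):
    """Return dotted paths of every key in a parsed JSON config that is not camelCase."""
    offenders = []
    if isinstance(config, dict):
        for key, value in config.items():
            key_path = f"{path}.{key}" if path else key
            if not CAMEL_CASE_KEY.match(key):
                offenders.append(key_path)
            # Nested objects such as "llmConfig" follow the same rule.
            offenders.extend(find_non_camel_case_keys(value, key_path))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            offenders.extend(find_non_camel_case_keys(item, f"{path}[{index}]"))
    return offenders
```

A config containing `{"llmConfig": {"max_tokens": 512}}` would be reported as `llmConfig.max_tokens`, while the keys in the example above all pass.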
278
+ ### Serialization Label Pattern
279
+ Suffix: `*Label` for data field name constants
280
+ ```python
281
+ TenantIdLabel = "tenantId"
282
+ ItemsSamplePercentageLabel = "itemsSamplePercentage"
283
+ LocalesFilterLabel = "localesFilter"
284
+ UseTokenLanguageLabel = "useTokenLanguage"
285
+ ```
286
+
287
+ ## Identifier Patterns {#id-patterns}
288
+
289
+ ### ID Type Summary
290
+ | Concept | Python | Scala | JSON |
291
+ |---|---|---|---|
292
+ | Item ID | `item_id` | `itemId` / `ItemId(kind, value)` | `"itemId"` |
293
+ | Tenant ID | `tenant_id` | `tenantId` / `TenantId` | `"tenantId"` |
294
+ | User ID | `user_id` | `userId` | `"userId"` |
295
+ | Predictor ID | `predictor_id` | `predictorId` / `PredictorId` | `"predictorId"` |
296
+ | Strategy ID | `strategy_id` | `strategyId` | `"strategyId"` |
297
+ | Session ID | `session_id` | `sessionId` | `"sessionId"` |
298
+ | Image ID | `image_id` | `imageId` | `"imageId"` |
299
+ | Variant ID | `variant_id` | `variantId` | `"variantId"` |
300
+
301
+ ### Scala Key Classes
302
+ ```scala
303
+ case class TenantKey(id: String, name: String, algorithmName: String)
304
+ case class StandalonePredictorKey(id: String, name: String, algorithmName: String)
305
+ case class SubPredictorKey(id: String, ...)
306
+ case class ItemId(kind: String, value: String)
307
+ // Default: DefaultItemId(s"item-$value")
308
+ ```
309
+
310
+ ## Algorithm Name Registry {#algorithm-names}
311
+
312
+ Source: `attraqt-kubeflow-pipelines/kubeflow_pipelines/pipelines/utils/constants.py`
313
+
314
+ | Constant | Value | Domain |
315
+ |---|---|---|
316
+ | `FM_ALGO_NAME` | `"fm"` | Factorization Machines |
317
+ | `ALS_ALGO_NAME` | `"als"` | Alternating Least Squares |
318
+ | `BASIC_ALGO_NAME` | `"basic"` | Basic scoring (popularity, trendiness) |
319
+ | `CONTENT_BASED_ALGO_NAME` | `"content-based"` | Content-based filtering |
320
+ | `GRAPH_ALGO_NAME` | `"graph"` | Graph algorithms (FP-growth) |
321
+ | `NLP_ALGO_NAME` | `"nlp"` | NLP preprocessing (tokenization) |
322
+ | `AUTOCOMPLETE_ALGO_NAME` | `"autocomplete"` | Query autocompletion |
323
+ | `SEARCH_ALGO_NAME` | `"search"` | Semantic Search |
324
+ | `TAGGING_ALGO_NAME` | `"tagging"` | Item Tagging |
325
+ | `COMPUTER_VISION_ALGO_NAME` | `"computer-vision"` | Image encoding |
326
+ | `CLIP_ALGO_NAME` | `"clip"` | CLIP Vision-Language |
327
+ | `SAM_ALGO_NAME` | `"sam"` | Segment Anything |
328
+ | `YOLO_ALGO_NAME` | `"yolo"` | Object Detection |
329
+ | `GPT_ALGO_NAME` | `"gpt"` | GPT Models |
330
+ | `SHOP_THE_LOOK_ALGO_NAME` | `"shop-the-look"` | Shop The Look |
331
+ | `GIBBERISH_DETECTOR_ALGO_NAME` | `"gibberish-detector"` | Gibberish detection |
332
+ | `MERCH_AGENT_ALGO_NAME` | `"merch-agent"` | Merchandising agent |
333
+ | `PASS_THROUGH_ALGO_NAME` | `"pass-through"` | Model import/pass-through |
334
+
335
+ ## Strategy ID Registry {#strategy-ids}
336
+
337
+ Strategy IDs use **kebab-case** and generally follow the pattern `{domain}-{operation}`; a few legacy IDs use underscores and are flagged in the lists below.
338
+
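Because this registry mixes kebab-case IDs with legacy underscore IDs, a reviewer may want to classify an ID before flagging it. The sketch below is illustrative only; `classify_strategy_id` and its return labels are hypothetical.

```python
import re

KEBAB_CASE_ID = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
LEGACY_UNDERSCORE_ID = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)+$")


def classify_strategy_id(strategy_id: str) -> str:
    """Return "kebab-case", "legacy-underscore" or "invalid" for a strategy ID."""
    if KEBAB_CASE_ID.match(strategy_id):
        return "kebab-case"
    if LEGACY_UNDERSCORE_ID.match(strategy_id):
        return "legacy-underscore"
    # Mixed separators, uppercase letters, or empty strings end up here.
    return "invalid"
```

New IDs would be expected to classify as `"kebab-case"`; `"legacy-underscore"` is tolerated only for the existing IDs listed in this registry.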
339
+ ### Learning Strategies
340
+ - `learning` - Generic learning
341
+ - `semantic-search-learning` - Semantic search training
342
+ - `text-encoder-learning` - Text encoder fine-tuning
343
+ - `image-classifier-learning` - Image classifier training
344
+ - `transformer-tagging-learning` - Transformer-based tagging
345
+ - `clip-learning` - CLIP model training
346
+ - `visual-search-learning` - Visual search training
347
+ - `segmentation-learning` - Segmentation learning
348
+ - `sam-item-segmentation` - SAM item segmentation
349
+ - `shopping-graph-learning` - Shopping graph training
350
+ - `global-learning`, `complementarity-learning` - FM model training
351
+ - `macro-tags-learning`, `macro-tags-learning-albert-base` - Macro tag classification
352
+ - `item-images-outfit-detection` - Outfit detection learning
353
+ - `learning-denoise` - Denoising learning
354
+
355
+ ### Evaluation Strategies
356
+ - `evaluation` - Generic evaluation
357
+ - `search-evaluation` - Search evaluation
358
+ - `image-classifier-evaluation` - Image classifier evaluation
359
+ - `visual-search-evaluation` - Visual search evaluation
360
+ - `transformer-tagging-evaluation` - Tagging evaluation
361
+ - `segmentation-evaluation`, `segmentation-calibration` - Segmentation eval/calibration
362
+ - `item-images-outfit-evaluation` - Outfit detection evaluation
363
+
364
+ ### Preprocessing Strategies (NOTE: legacy uses underscores)
365
+ - `query_dataset_preprocessing` - Query dataset generation
366
+ - `item_data_dataset_preprocessing` - Item data preprocessing
367
+ - `search_item_data_dataset_preprocessing` - Search-specific item data
368
+ - `clip_item_data_dataset_preprocessing` - CLIP-specific item data
369
+ - `gpt_item_data_dataset_preprocessing` - GPT-specific item data
370
+ - `character_tokenizer_preprocessing`, `word_tokenizer_preprocessing` - NLP tokenization
371
+ - `global_preprocessing`, `complementarity_preprocessing` - FM preprocessing
372
+ - `recommend_to_user_evaluation_preprocessing`, `recommend_to_items_evaluation_preprocessing` - Eval data prep
373
+ - `item-image-cutout-images-preprocessing` - Image cutout preprocessing (kebab-case)
374
+ - `item-images-preprocessing`, `item-images-single-encoding` - Image preprocessing
375
+ - `cutout-item-images-preprocessing` - Cutout image processing
376
+ - `visual-search-preprocessing` - Visual search data prep
377
+ - `item-image-outfit-detection-preprocessing` - Outfit detection preprocessing
378
+ - `item-object-detection` - Object detection preprocessing
379
+
380
+ ### Encoding/Export Strategies
381
+ - `items-encoding`, `item-cutout-image-encoding` - Item encoding
382
+ - `item-encodings-updater` - Encoding updates
383
+ - `item-encoding-export` - Encoding export
384
+ - `item-ids-diff-dumper` - Item ID diff dumping
385
+ - `ftp-exporter`, `fhr-exporter` - Model exporters
386
+ - `aleph-product-mapper`, `aleph-analytic-incremental-mapper` - Aleph mapping
387
+ - `aleph-product-feed-download`, `aleph-fhr-analytic-feed-download` - Data download
388
+
389
+ ### Scoring/Recommendation Strategies
390
+ - `most_popular_scorer`, `most_trendy_scorer` - Scoring (underscore, legacy)
391
+ - `scored-graph`, `unscored-graph-format-1`, `unscored-graph-format-2` - Graph formats
392
+ - `shop-the-look-recommendation` - STL recommendation
393
+ - `recommendation-enrichment` - Enrichment
394
+ - `items-tagging`, `tagging`, `image-cutout-image-tagging` - Tagging
395
+ - `bounding-box-computation` - YOLO bounding boxes
396
+ - `outfit-image-enrichment` - Outfit image enrichment
397
+ - `item-segmentation` - Item segmentation
398
+ - `gcs_activities_copy` - GCS activity copying (underscore, legacy)
399
+
400
+ ## Service Model Name Registry {#service-model-names}
401
+
402
+ Pattern: `{DOMAIN}_{TYPE}_SERVICE_MODEL_NAME = "{type}-service-model"`
403
+
404
+ | Constant | Value |
405
+ |---|---|
406
+ | `SEARCH_ITEM_ENCODING_SERVICE_MODEL_NAME` | `"item-encoding-service-model"` |
407
+ | `SEARCH_QUERY_ENCODING_SERVICE_MODEL_NAME` | `"query-encoding-service-model"` |
408
+ | `FM_USER_ENCODING_SERVICE_MODEL_NAME` | `"user-encoding-service-model"` |
409
+ | `ITEM_IMAGE_ENCODING_SERVICE_MODEL_NAME` | `"image-encoding-service-model"` |
410
+ | `GPT_ITEM_ENCODING_SERVICE_MODEL_NAME` | `"item-encoding-service-model"` |
411
+ | `GPT_USER_ENCODING_SERVICE_MODEL_NAME` | `"user-encoding-service-model"` |
412
+ | `SEMANTIC_SEARCH_MODEL_NAME` | `"semantic-search-model-onnx"` |
413
+
414
+ Note: Some service model names are domain-prefixed (`SEARCH_*`, `FM_*`, `GPT_*`) to disambiguate when the same model type serves different algorithms.
415
+
416
+ ## KPI Template Names {#kpi-templates}
417
+
418
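+ Pattern: `*_KPI_TEMPLATE_NAME = "batch-*-kpi"` (non-KPI templates such as `SESSIONS_TEMPLATE_NAME` drop `_KPI` from the name and `-kpi` from the value)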
419
+
420
+ | Constant | Value |
421
+ |---|---|
422
+ | `TENANT_ATTRIBUTION_KPI_TEMPLATE_NAME` | `"batch-tenant-attribution-kpi"` |
423
+ | `WIDGET_ATTRIBUTION_KPI_TEMPLATE_NAME` | `"batch-widget-attribution-kpi"` |
424
+ | `WIDGET_ABTEST_ATTRIBUTION_KPI_TEMPLATE_NAME` | `"batch-widget-abtest-attribution-kpi"` |
425
+ | `STRATEGY_ABTEST_ATTRIBUTION_KPI_TEMPLATE_NAME` | `"batch-strategy-abtest-attribution-kpi"` |
426
+ | `ITEMS_ACTIVITIES_HISTOGRAM_KPI_TEMPLATE_NAME` | `"batch-items-activities-histogram-kpi"` |
427
+ | `ITEMS_ACTIVITIES_STATS_KPI_TEMPLATE_NAME` | `"batch-items-activities-stats-kpi"` |
428
+ | `ITEMS_WITH_MOST_ACTIVITIES_KPI_TEMPLATE_NAME` | `"batch-items-with-most-activities-kpi"` |
429
+ | `SESSIONS_TEMPLATE_NAME` | `"batch-sessions"` |
430
+ | `ITEM_TAGGING_TEMPLATE_NAME` | `"batch-item-tagging"` |
431
+
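Every row in the table above relates the constant name to its value in the same way: drop the `batch-` prefix, convert to SCREAMING_SNAKE_CASE, and append `_TEMPLATE_NAME`. A minimal sketch of that derivation, with `template_constant_name` as a hypothetical helper:

```python
def template_constant_name(template_value: str) -> str:
    """Derive the constant name for a template value such as "batch-sessions"."""
    prefix = "batch-"
    if not template_value.startswith(prefix):
        raise ValueError(f"template value must start with {prefix!r}: {template_value}")
    stem = template_value[len(prefix):]
    # "tenant-attribution-kpi" -> "TENANT_ATTRIBUTION_KPI"
    return stem.replace("-", "_").upper() + "_TEMPLATE_NAME"
```

Applied to `"batch-widget-abtest-attribution-kpi"`, this yields `WIDGET_ABTEST_ATTRIBUTION_KPI_TEMPLATE_NAME`; applied to `"batch-sessions"`, it yields `SESSIONS_TEMPLATE_NAME`.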
432
+ ## Launcher Class Naming {#launcher-classes}
433
+
434
+ Scala/Java launcher classes generally follow `earlybirds.algo.{domain}.batch.{Domain}BatchLauncher`, sometimes with a variant such as `Dataproc` or `Simple` before `BatchLauncher`:
435
+
436
+ ```
437
+ earlybirds.algo.als.batch.AlsBatchLauncher
438
+ earlybirds.algo.graph.batch.GraphBatchLauncher
439
+ earlybirds.algo.nlp.batch.NlpDataprocBatchLauncher
440
+ earlybirds.algo.search.batch.SearchDataprocBatchLauncher
441
+ earlybirds.algo.gpt.batch.GPTDataprocBatchLauncher
442
+ earlybirds.algo.item.batch.ItemBatchLauncher
443
+ earlybirds.algo.tagging.batch.TaggingSimpleBatchLauncher
444
+ earlybirds.algo.gpt.batch.GPTSimpleBatchLauncher
445
+ earlybirds.model.updater.EncodingsUpdaterLauncher
446
+ earlybirds.algo.evaluation.preprocessing.EvaluationPreprocessingLauncher
447
+ ```
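The launcher names listed above can be checked against the `earlybirds.algo.{domain}.batch.{Domain}BatchLauncher` pattern with a regular expression. This is a sketch only: `parse_batch_launcher` is a hypothetical helper, and treating the text between the domain and `BatchLauncher` as a "variant" is an assumption drawn from the examples (`Dataproc`, `Simple`).

```python
import re

BATCH_LAUNCHER_PATTERN = re.compile(
    r"^earlybirds\.algo\.(?P<domain>[a-z]+)\.batch\.(?P<stem>[A-Za-z]+)BatchLauncher$"
)


def parse_batch_launcher(class_name: str):
    """Return (domain, variant) for a conforming launcher class, else None."""
    match = BATCH_LAUNCHER_PATTERN.match(class_name)
    if match is None:
        return None
    domain, stem = match.group("domain"), match.group("stem")
    # The class name must start with the package domain, compared case-insensitively
    # so that "gpt" matches "GPT" and "nlp" matches "Nlp".
    if not stem.lower().startswith(domain):
        return None
    return domain, stem[len(domain):]
```

Names outside the `earlybirds.algo.*.batch` package, such as `earlybirds.model.updater.EncodingsUpdaterLauncher`, return `None` and would need to be allow-listed separately.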