tdsql-mcp (PyPI) 1.3.2.tar.gz → 1.3.4.tar.gz


Files changed (56)

{tdsql_mcp-1.3.2 → tdsql_mcp-1.3.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tdsql-mcp
-Version: 1.3.2
+Version: 1.3.4
 Summary: MCP server for Teradata Vantage — SQL execution and native analytics function reference for AI agents
 Project-URL: Homepage, https://github.com/ksturgeon-td/tdsql-mcp
 Project-URL: Repository, https://github.com/ksturgeon-td/tdsql-mcp

{tdsql_mcp-1.3.2 → tdsql_mcp-1.3.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tdsql-mcp"
-version = "1.3.2"
+version = "1.3.4"
 description = "MCP server for Teradata Vantage — SQL execution and native analytics function reference for AI agents"
 readme = "README.md"
 requires-python = ">=3.10"

{tdsql_mcp-1.3.2 → tdsql_mcp-1.3.4}/skills/teradata-sql-analytics/syntax/guidelines.md RENAMED

@@ -195,6 +195,31 @@ Native functions distribute across all AMPs. The result set returned to the agen
 | External TF-IDF | `TD_TFIDF` | `text-analytics` |
 | External word embeddings | `TD_WordEmbeddings` | `text-analytics` |
+### LLM-Powered Text Analytics (AI_* Functions)
+> **Prerequisites:** Requires an authorization object and LLM provider configuration before use. See `authorization-objects` and `llm-providers` topics. All functions use `TD_SYSFNLIB.<FunctionName>(ON ...)` — do **not** add a `PARTITION BY` clause.
+| Instead of this | Use this (native function) | Topic |
+|-----------------|---------------------------|-------|
+| External/post-hoc sentiment scoring | `AI_AnalyzeSentiment` | `ai-text-analytics` |
+| Per-row LLM API calls with question + context data | `AI_AskLLM` (two-table: InputTable + ContextTable, co-partitioned by key) | `ai-text-analytics` |
+| External language detection | `AI_DetectLanguage` | `ai-text-analytics` |
+| External key phrase extraction | `AI_ExtractKeyPhrases` | `ai-text-analytics` |
+| External PII masking | `AI_MaskPII` (detects PII + returns `Masked_Phrase` with `*` replacement) | `ai-text-analytics` |
+| External NER (general named entities — people, places, orgs, dates) | `AI_RecognizeEntities` | `ai-text-analytics` |
+| External PII entity detection (structured metadata, no masking) | `AI_RecognizePIIEntities` | `ai-text-analytics` |
+| External text classification with custom label set | `AI_TextClassifier` — supports single-label and multi-label | `ai-text-analytics` |
+| External summarization | `AI_TextSummarize` — supports 1–5 compression levels | `ai-text-analytics` |
+| External translation | `AI_TextTranslate` | `ai-text-analytics` |
+**NER/PII function selection:**
+| Need | Function |
+|------|----------|
+| General named entities (people, places, orgs, dates) | `AI_RecognizeEntities` |
+| PII detection + masked text output | `AI_MaskPII` |
+| PII detection + structured metadata only (no masking) | `AI_RecognizePIIEntities` |
 ### Vector Search
 | Instead of this | Use this (native function) | Topic |
@@ -211,6 +236,18 @@ Native functions distribute across all AMPs. The result set returned to the agen
 **Finding the embedding model for an existing corpus:** Query `TD_SYSAI.TD_CollectionsV` or `TD_SYSAI.TD_VectorStores` to discover the model name, provider, and embedding size used to build a corpus. The query pipeline must use the exact same model — mismatched embeddings produce meaningless scores. See `vector-search` topic, "Discovering Existing Vector Stores" section.
+### Embeddings
+| Scenario | Use this | Topic |
+|----------|---------|-------|
+| REST-based embedding API (Azure, AWS Bedrock, GCP, NVIDIA NIM, LiteLLM) | `AI_TextEmbeddings` with `OutputFormat('VECTOR')` | `embeddings` |
+| In-database inference — no external API (air-gapped or latency-sensitive) | `ONNXEmbeddings` — model stored as BLOB in Vantage; requires tokenizer table | `embeddings`, `byom-model-loading` |
+| Classical word/document embeddings (GloVe-style) | `TD_WordEmbeddings` | `text-analytics` |
+| Store embeddings for reuse | CTAS with `VECTOR` column, then `TD_VectorNormalize(Approach('UNITVECTOR'))` at storage time | `embeddings`, `vector-search` |
+| Build fast approximate search index | `TD_HNSW` on normalized VECTOR column | `vector-search` |
+> **Always use `OutputFormat('VECTOR')`** when embeddings will be stored, normalized, or used with `TD_VectorDistance` / `TD_HNSW` / `TD_HNSWPredict`. The `VECTOR` type integrates directly with all vector search functions. See `data-types-casting` for VECTOR sizing (bytes, not dimensions).
 ### JSON Data
 | Operation | Use this | Topic |
@@ -236,6 +273,23 @@ Native functions distribute across all AMPs. The result set returned to the agen
 | Vantage ARRAY → JSON | `ARRAY_TO_JSON(arr_col)` | `json-functions` |
 | ST_Geometry ↔ GeoJSON | `GeoJSONFromGeom(geom)` / `GeomFromGeoJSON(json, srid)` | `json-functions` |
+### BYOM — Bring Your Own Model
+Apply externally trained models to in-database data without moving data out of Teradata. All scoring functions share the same two-table pattern: `InputTable` + `ModelTable DIMENSION`. See `byom-model-loading` topic for model ingestion; see `byom-scoring` for full syntax.
+| Instead of this | Use this (native function) | Topic |
+|-----------------|---------------------------|-------|
+| Running PMML model inference externally | `PMMLPredict` | `byom-scoring` |
+| Running H2O MOJO or Driverless AI model externally | `H2OPredict` (supports contributions, stage probabilities, leaf node assignments) | `byom-scoring` |
+| Running ONNX tabular model externally | `ONNXPredict` (use `ShowModelInputFieldsMap('true')` to inspect tensor mapping) | `byom-scoring` |
+| Running Dataiku Thin JAR externally | `DataikuPredict` (model_id = fully qualified Java class name) | `byom-scoring` |
+| Running DataRobot Scoring Code externally | `DataRobotPredict` (cast DATE/TIMESTAMP to VARCHAR before scoring) | `byom-scoring` |
+| Running MLeap model externally | `MLeapPredict` | `byom-scoring` |
+| In-database text generation / seq-to-seq (translation, summarization) | `ONNXSeq2Seq` — ONNX transformer, no external API | `byom-scoring` |
+| In-database text classification (transformer) | `ONNXClassification` — supports softmax, argmax, custom output column mapping | `byom-scoring` |
+> **Architecture distinction:** `ONNXSeq2Seq` and `ONNXClassification` run Hugging Face ONNX transformer models entirely in-database. For REST-based LLM text tasks (sentiment, PII, translation, summarization), use the `AI_*` functions in `ai-text-analytics` instead.
 ### Statistical Testing
 | Instead of this | Use this (native function) | Topic |
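
Reviewer note on the BYOM table added in the hunk above: the mapping from model format to scoring function is simple enough to express as a lookup. The sketch below is illustrative only and is not part of the package. The function names come from the table; the dictionary keys (`"pmml"`, `"h2o"`, and so on) and the helper name `scoring_function_for` are hypothetical labels chosen for this example.

```python
# Function names are copied from the BYOM table in the diff above.
# The lowercase keys are hypothetical labels, not identifiers from the package.
BYOM_SCORING_FUNCTIONS = {
    "pmml": "PMMLPredict",
    "h2o": "H2OPredict",
    "onnx": "ONNXPredict",
    "dataiku": "DataikuPredict",
    "datarobot": "DataRobotPredict",
    "mleap": "MLeapPredict",
}


def scoring_function_for(model_format: str) -> str:
    """Return the scoring function name listed for a model format label."""
    key = model_format.strip().lower()
    try:
        return BYOM_SCORING_FUNCTIONS[key]
    except KeyError:
        # Unknown formats are rejected rather than guessed.
        raise ValueError(
            f"no BYOM scoring function listed for format {model_format!r}"
        ) from None


print(scoring_function_for("ONNX"))
```

The lookup covers only the tabular scoring rows. `ONNXSeq2Seq` and `ONNXClassification` are left out because the table lists them by task rather than by model format.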

{tdsql_mcp-1.3.2 → tdsql_mcp-1.3.4}/skills/teradata-sql-analytics/syntax/sql-basics.md RENAMED

@@ -140,15 +140,70 @@ CREATE USER myuser AS
 -- Semicolons: required in BTEQ; optional in most client tools
 ```
-## Reserved Words as Column Names in Table Operator Clauses
+## Reserved Words and Identifier Quoting
-When a column name is a Teradata reserved word (e.g. `type`, `date`, `time`, `value`, `name`, `format`, `title`), it must be double-quoted. In regular SQL projections this looks normal:
+Quote any identifier (column, table, alias) that conflicts with a reserved word using **double quotes**:
 ```sql
-SELECT "type", "date" FROM db.my_table;
+SELECT id, "type", "format" FROM db.events;
+CREATE TABLE db.events (
+    id       INTEGER,
+    "type"   VARCHAR(30),   -- reserved word — must quote
+    label    VARCHAR(100)
+) PRIMARY INDEX (id);
 ```
-In table operator string arguments (`ACCUMULATE`, `IDColumn`, `TargetColumns`, `Accumulate`, etc.), the double-quotes must be embedded **inside** the single-quoted string:
+> **Quoted identifiers are case-sensitive.** Use consistent casing — `"type"` and `"TYPE"` are different identifiers.
+### Teradata-Specific Reserved Words — Common Identifier Conflicts
+The ANSI SQL reserved words are well-known. The words below are **Teradata-only** — not in the ANSI SQL-99 standard — so agents may not recognize them as reserved. Always quote these when using them as column, table, or alias names.
+**Words that frequently appear as column or table names:**
+| Reserved word | Commonly appears as | TD since |
+|--------------|---------------------|----------|
+| `TYPE` | transaction type, event type, record type | V2R3 |
+| `FORMAT` | file format, output format, date format | V2R3 |
+| `TITLE` | document title; also controls the TD column display header | V2R3 |
+| `MODE` | processing mode, run mode, lock mode | V2R3 |
+| `ACCOUNT` | account_id, account-related tables | V2R3 |
+| `LOG` | log tables, audit logs, log level | V2R3 |
+| `LOCK` | lock status, concurrency tables | V2R3 |
+| `HASH` | hash keys, checksums, deduplication columns | V2R3 |
+| `REQUEST` | request_id, API and service event tables | V2R3 |
+| `STATISTICS` | monitoring tables, collected stats columns | V2R3 |
+| `JOURNAL` | financial journals, transaction audit logs | V2R3 |
+| `CLUSTER` | cluster_id, partition or segment label | V2R3 |
+| `NAMED` | TD column alias syntax (`expr (NAMED alias)`) — risky as a column name | V2R3 |
+| `ENABLED` / `DISABLED` | feature flag columns, configuration status | V2R3 |
+| `CLASS` | object class, classification, CSS class | V2R5 |
+| `PROFILE` | user profiles, configuration profiles | V2R5 |
+| `SUMMARY` | summary text columns, reporting tables | V2R5 |
+| `THRESHOLD` | alert thresholds, monitoring limit columns | V2R5 |
+| `TRACE` | trace_id, debug or telemetry columns | V2R5 |
+**Teradata SQL extension keywords — these are clause keywords, not identifiers:**
+| Keyword | Purpose |
+|---------|---------|
+| `QUALIFY` | Filters window function results — like WHERE for OVER clauses; not in ANSI SQL |
+| `SAMPLE` | Random row sampling: `SELECT * FROM t SAMPLE 100` or `SAMPLE .05` |
+| `VOLATILE` | Session-scoped temp table: `CREATE VOLATILE TABLE ...` |
+| `LOCKING` | Lock modifier: `LOCKING TABLE t FOR ACCESS SELECT ...` |
+| `REPLACE` | TD DDL: `REPLACE VIEW` — not `CREATE OR REPLACE` |
+| `EXPLAIN` | Execution plan: `EXPLAIN SELECT ...` |
+| `FALLBACK` | Table-level data protection option at `CREATE TABLE` time |
+| `MULTISET` | Table type allowing duplicate rows: `CREATE MULTISET TABLE ...` |
+| `MACRO` | Stored parameterized query: `CREATE MACRO ...` |
+| `COLLECT` | Statistics collection: `COLLECT STATISTICS ON db.t COLUMN (col)` |
+| `BT` / `ET` | Begin Transaction / End Transaction |
+| `SEL` | Teradata shorthand for `SELECT` |
+### Reserved Words in Table Operator String Arguments
+In table operator clauses that take column names as **string arguments** (`Accumulate`, `IDColumn`, `TargetColumns`, `ResponseColumn`, etc.), double-quotes must be embedded **inside** the single-quoted string:
 ```sql
 -- WRONG: 'type' is a reserved word — Teradata will reject or misparse this
@@ -158,19 +213,16 @@ USING IDColumn('id') Accumulate('type', 'value')
 USING IDColumn('id') Accumulate('"type"', '"value"')
 ```
-This applies to **any** table operator clause that takes column names as string arguments:
+This applies to any table operator clause that takes column names as string arguments:
 ```sql
--- All of these follow the same rule
 IDColumn('"type"')
 TargetColumns('"value"', '"date"', 'non_reserved_col')
 Accumulate('"type"', 'amount', '"date"')
 ResponseColumn('"value"')
 ```
-**When in doubt, quote it.** Double-quoting a non-reserved word in a string argument is harmless; leaving a reserved word unquoted will cause a parse error.
-Common Teradata reserved words that appear as column names: `type`, `date`, `time`, `timestamp`, `value`, `name`, `format`, `title`, `level`, `mode`, `status`, `class`, `key`, `index`, `year`, `month`, `day`, `hour`, `minute`, `second`.
+**When in doubt, quote it.** Double-quoting a non-reserved word inside a string argument is harmless; leaving a reserved word unquoted will cause a parse error.
 ## Teradata Operator Differences
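
Reviewer note on the "Reserved Words in Table Operator String Arguments" section in the hunk above: the embedded-quote rule can be applied mechanically when a clause is generated from a list of column names. The sketch below is a minimal illustration, not code from the package. The helper names `quote_if_reserved` and `operator_clause` are hypothetical, and `RESERVED_WORDS` is only a subset taken from the words and examples shown in this diff, not the full Teradata reserved word list.

```python
# Illustrative subset only: words from the table and SQL examples in this diff.
# This is NOT the complete Teradata reserved word list.
RESERVED_WORDS = {
    "type", "format", "title", "mode", "account", "log", "lock", "hash",
    "request", "statistics", "journal", "cluster", "named", "enabled",
    "disabled", "class", "profile", "summary", "threshold", "trace",
    "date", "value",
}


def quote_if_reserved(column: str) -> str:
    """Wrap a column name in double quotes if it is in RESERVED_WORDS."""
    if len(column) >= 2 and column.startswith('"') and column.endswith('"'):
        return column  # already quoted, leave untouched
    if column.lower() in RESERVED_WORDS:
        # Casing is preserved because quoted identifiers are case-sensitive.
        return f'"{column}"'
    return column


def operator_clause(clause: str, columns: list[str]) -> str:
    """Render a clause such as Accumulate('"type"', 'amount')."""
    args = ", ".join(f"'{quote_if_reserved(c)}'" for c in columns)
    return f"{clause}({args})"


print(operator_clause("Accumulate", ["type", "amount", "date"]))
# Accumulate('"type"', 'amount', '"date"')
```

The output matches the `Accumulate('"type"', 'amount', '"date"')` line in the diff. Quoting only known reserved words keeps the generated SQL readable. Under the "when in doubt, quote it" advice, a stricter variant could quote every column name, provided the casing matches the table definition.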