PyPI - txt2stix - Versions diffs - 1.0.1.post1__tar.gz → 1.0.2__tar.gz - Mend

txt2stix-1.0.1.post1.tar.gz → txt2stix-1.0.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (260)

{txt2stix-1.0.1.post1 → txt2stix-1.0.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: txt2stix
-Version: 1.0.1.post1
+Version: 1.0.2
 Summary: txt2stix is a Python script that is designed to identify and extract IoCs and TTPs from text files, identify the relationships between them, convert them to STIX 2.1 objects, and output as a STIX 2.1 bundle.
 Project-URL: Homepage, https://github.com/muchdogesec/txt2stix
 Project-URL: Issues, https://github.com/muchdogesec/txt2stix/issues
@@ -26,15 +26,19 @@ Requires-Dist: stix2extensions
 Requires-Dist: tld>=0.13
 Requires-Dist: tldextract>=5.1.2
 Requires-Dist: validators>=0.28.3
-Provides-Extra: full
-Requires-Dist: llama-index-llms-anthropic>=0.7.2; extra == 'full'
-Requires-Dist: llama-index-llms-deepseek>=0.1.2; extra == 'full'
-Requires-Dist: llama-index-llms-gemini>=0.5.0; extra == 'full'
-Requires-Dist: llama-index-llms-openrouter>=0.3.2; extra == 'full'
+Provides-Extra: anthropic
+Requires-Dist: llama-index-llms-anthropic>=0.7.2; extra == 'anthropic'
+Provides-Extra: deepseek
+Requires-Dist: llama-index-llms-deepseek>=0.1.2; extra == 'deepseek'
+Provides-Extra: gemini
+Requires-Dist: llama-index-llms-gemini>=0.5.0; extra == 'gemini'
+Provides-Extra: openrouter
+Requires-Dist: llama-index-llms-openrouter>=0.3.2; extra == 'openrouter'
 Provides-Extra: tests
 Requires-Dist: pytest; extra == 'tests'
 Requires-Dist: pytest-cov; extra == 'tests'
 Requires-Dist: pytest-subtests; extra == 'tests'
+Requires-Dist: python-dateutil; extra == 'tests'
 Requires-Dist: requests; extra == 'tests'
 Description-Content-Type: text/markdown
@@ -86,7 +90,13 @@ cd txt2stix
 python3 -m venv txt2stix-venv
 source txt2stix-venv/bin/activate
 # install requirements
-pip3 install .
+pip3 install txt2stix
+```
+Note: by default, txt2stix installs OpenAI as the AI provider. You can also use Anthropic, Gemini, OpenRouter or DeepSeek, but you need to install these manually as follows (remove any that don't apply):
+```shell
+pip3 install txt2stix[deepseek,gemini,anthropic,openrouter]
 ```
 ### Set variables
@@ -114,39 +124,39 @@ The following arguments are available:
 #### Input settings
-* `--input_file` (REQUIRED): the file to be converted. Must be `.txt`
+* `--input_file` (`path/to/file.txt`, required): the file to be converted. Must be `.txt`
 #### STIX Report generation settings
-* `--name` (REQUIRED): name of file, max 72 chars. Will be used in the STIX Report Object created.
-* `--report_id` (OPTIONAL): Sometimes it is required to control the id of the `report` object generated. You can therefore pass a valid UUIDv4 in this field to be assigned to the report. e.g. passing `2611965-930e-43db-8b95-30a1e119d7e2` would create a STIX object id `report--2611965-930e-43db-8b95-30a1e119d7e2`. If this argument is not passed, the UUID will be randomly generated.
-* `--tlp_level` (OPTIONAL): Options are `clear`, `green`, `amber`, `amber_strict`, `red`. Default if not passed, is `clear`.
-* `--confidence` (OPTIONAL): value between 0-100. Default if not passed is null.
+* `--name` (text, required): name of file, max 72 chars. Will be used in the STIX Report Object created.
+* `--report_id` (UUIDv4, default is random UUIDv4): Sometimes it is required to control the id of the `report` object generated. You can therefore pass a valid UUIDv4 in this field to be assigned to the report. e.g. passing `2611965-930e-43db-8b95-30a1e119d7e2` would create a STIX object id `report--2611965-930e-43db-8b95-30a1e119d7e2`. If this argument is not passed, the UUID will be randomly generated.
+* `--tlp_level` (dictionary, default `clear`): Options are `clear`, `green`, `amber`, `amber_strict`, `red`.
+* `--confidence` (value between 0-100): If not passed, the report will be assigned no confidence score value.
 * `--labels` (OPTIONAL): comma-separated list of labels. Case-insensitive (all will be converted to lower-case). Allowed characters: `a-z`, `0-9`. e.g. `label1,label2` would create 2 labels.
-* `--created` (OPTIONAL): by default all object `created` times will take the time the script was run. If you want to explicitly set these times you can do so using this flag. Pass the value in the format `YYYY-MM-DDTHH:MM:SS.sssZ` e.g. `2020-01-01T00:00:00.000Z`
-* `--use_identity` (OPTIONAL): can pass a full STIX 2.1 identity object (make sure to properly escape). Will be validated by the STIX2 library.
+* `--created` (datetime, optional): by default all object `created` times will take the time the script was run. If you want to explicitly set these times you can do so using this flag. Pass the value in the format `YYYY-MM-DDTHH:MM:SS.sssZ` e.g. `2020-01-01T00:00:00.000Z`
+* `--use_identity` (stix identity, optional, default txt2stix identity): can pass a full STIX 2.1 identity object (make sure to properly escape). Will be validated by the STIX2 library.
 * `--external_refs` (OPTIONAL): txt2stix will automatically populate the `external_references` of the report object it creates for the input. You can use this value to add additional objects to `external_references`. Note, you can only add `source_name` and `external_id` values currently. Pass as `source_name=external_id`. e.g. `--external_refs txt2stix=demo1 source=id` would create the following objects under the `external_references` property: `{"source_name":"txt2stix","external_id":"demo1"},{"source_name":"source","external_id":"id"}`
 #### Output settings
 How the extractions are performed
-* `--use_extractions` (REQUIRED): if you only want to use certain extraction types, you can pass their slug found in either `includes/ai/config.yaml`, `includes/lookup/config.yaml` `includes/pattern/config.yaml` (e.g. `pattern_ipv4_address_only`). Default if not passed, no extractions applied. You can also pass a catch all wildcard `*` which will match all extraction paths (e.g. `'pattern_*'` would run all extractions starting with `pattern_` -- make sure to use quotes when using a wildcard)
+* `--use_extractions` (dictionary, required): if you only want to use certain extraction types, you can pass their slug found in either `includes/ai/config.yaml`, `includes/lookup/config.yaml` `includes/pattern/config.yaml` (e.g. `pattern_ipv4_address_only`). Default if not passed, no extractions applied. You can also pass a catch all wildcard `*` which will match all extraction paths (e.g. `'pattern_*'` would run all extractions starting with `pattern_` -- make sure to use quotes when using a wildcard)
 	* Important: if using any AI extractions (`ai_*`), you must set an AI API key in your `.env` file
 	* Important: if you are using any MITRE ATT&CK, CAPEC, CWE, ATLAS or Location extractions you must set the `CTIBUTLER` settings, and if you are using any NVD CPE or CVE extractions you must set the `VULMATCH` settings, in your `.env` file
-* `--relationship_mode` (REQUIRED): either.
+* `--relationship_mode` (dictionary, required): either.
 	* `ai`: AI provider must be enabled. extractions performed by either regex or AI for extractions user selected. Rich relationships created from AI provider from extractions.
 	* `standard`: extractions performed by either regex or AI (AI provider must be enabled) for extractions user selected. Basic relationships created from extractions back to master Report object generated.
-* `--ignore_extraction_boundary` (OPTIONAL, default `false`, not compatible with AI extractions): in some cases the same string will create multiple extractions depending on extractions set (e.g. `https://www.google.com/file.txt` could create a url, url with file, domain, subdomain, and file). The default behaviour is for txt2stix to take the longest extraction and ignore everything else (e.g. only extract url with file, and ignore url, file, domain, subdomain, and file). If you want to override this behaviour and get all extractions in the output, set this flag to `true`.
-* `--ignore_image_refs` (default `true`): images references in documents don't usually need extracting. e.g. `<img src="https://example.com/image.png" alt="something">` you would not want domain or file extractions extracting `example.com` and `image.png`. Hence these are ignored by default (they are removed from text sent to extraction). Note, only the `img src` is ignored, all other values e.g. `alt` are considered. If you want extractions to consider this data, set it to `false`
-* `--ignore_link_refs` (default `true`): link references in documents don't usually need extracting e.g. `<a href="https://example.com/link.html" title="something">Bad Actor</a>` you would only want `Bad actor` to be considered for extraction. Hence these part of the link are ignored by default (they are removed from text sent to extraction). Note, only the `a href` is ignored, all other values e.g. `title` are considered. Setting this to `false` will also include everything inside the link tag (e.g. `example.com` would extract as a domain)
+* `--ignore_extraction_boundary` (boolean, default `false`, not compatible with AI extractions): in some cases the same string will create multiple extractions depending on extractions set (e.g. `https://www.google.com/file.txt` could create a url, url with file, domain, subdomain, and file). The default behaviour is for txt2stix to take the longest extraction and ignore everything else (e.g. only extract url with file, and ignore url, file, domain, subdomain, and file). If you want to override this behaviour and get all extractions in the output, set this flag to `true`.
+* `--ignore_image_refs` (boolean, default `true`): images references in documents don't usually need extracting. e.g. `<img src="https://example.com/image.png" alt="something">` you would not want domain or file extractions extracting `example.com` and `image.png`. Hence these are ignored by default (they are removed from text sent to extraction). Note, only the `img src` is ignored, all other values e.g. `alt` are considered. If you want extractions to consider this data, set it to `false`
+* `--ignore_link_refs` (boolean, default `true`): link references in documents don't usually need extracting e.g. `<a href="https://example.com/link.html" title="something">Bad Actor</a>` you would only want `Bad actor` to be considered for extraction. Hence these part of the link are ignored by default (they are removed from text sent to extraction). Note, only the `a href` is ignored, all other values e.g. `title` are considered. Setting this to `false` will also include everything inside the link tag (e.g. `example.com` would extract as a domain)
 #### AI settings
 If any AI extractions, or AI relationship mode is set, you must set the following accordingly
-* `--ai_settings_extractions`:
+* `--ai_settings_extractions` (`provider:model`, required if one or more AI extractions set):
 	* defines the `provider:model` to be used for extractions. You can supply more than one provider. Separate with a space (e.g. `openrouter:openai/gpt-4o` `openrouter:deepseek/deepseek-chat`). If more than one provider is passed, txt2stix will take extractions from all models, de-duplicate them, and then package them in the output. Currently supports:
 		* Provider (env var required `OPENROUTER_API_KEY`): `openrouter:`, providers/models `openai/gpt-4o`, `deepseek/deepseek-chat` ([More here](https://openrouter.ai/models))
 		* Provider (env var required `OPENAI_API_KEY`): `openai:`, models e.g.: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4` ([More here](https://platform.openai.com/docs/models))
@@ -154,11 +164,16 @@ If any AI extractions, or AI relationship mode is set, you must set the followin
 		* Provider (env var required `GOOGLE_API_KEY`): `gemini:models/`, models: `gemini-1.5-pro-latest`, `gemini-1.5-flash-latest` ([More here](https://ai.google.dev/gemini-api/docs/models/gemini))
 		* Provider (env var required `DEEPSEEK_API_KEY`): `deepseek:`, models `deepseek-chat` ([More here](https://api-docs.deepseek.com/quick_start/pricing))
 	* See `tests/manual-tests/cases-ai-extraction-type.md` for some examples
-* `--ai_settings_relationships`:
+* `--ai_settings_relationships` (`provider:model`, required if AI relationship mode set):
 	* similar to `ai_settings_extractions` but defines the model used to generate relationships. Only one model can be provided. Passed in same format as `ai_settings_extractions`
 	* See `tests/manual-tests/cases-ai-relationships.md` for some examples
-* `--ai_content_check_provider`: Passing this flag will get the AI to try and classify the text in the input to 1) determine if it is talking about threat intelligence, and 2) what type of threat intelligence it is talking about. For context, we use this to filter out non-threat intel posts in Obstracts and Stixify. You pass `provider:model` with this flag to determine the AI model you wish to use to perform the check.
-* `--ai_create_attack_flow`: passing this flag will also prompt the AI model (the same entered for `--ai_settings_relationships`) to generate an [Attack Flow](https://center-for-threat-informed-defense.github.io/attack-flow/) for the MITRE ATT&CK extractions to define the logical order in which they are being described. You must pass `--ai_settings_relationships` for this to work.
+#### Other AI related settings
+* `--ai_content_check_provider` (`provider:model`, required if passed): Passing this flag will get the AI to try and classify the text in the input to 1) determine if it is talking about threat intelligence, and 2) what type of threat intelligence it is talking about. For context, we use this to filter out non-threat intel posts in Obstracts and Stixify. You pass `provider:model` with this flag to determine the AI model you wish to use to perform the check. It will also create a summary of the content passed (and store this into a STIX Note).
+* `--ai_extract_if_no_incidence` (boolean, default `true`, will only work if `ai_content_check_provider` set): if the content check decides the report is not related to cyber security intelligence (e.g. vendor marketing), you can use this setting to decide whether or not the script should proceed. Setting it to `false` will stop processing. It is designed to save AI tokens when processing unknown content at scale in an automated way.
+* `--ai_create_attack_flow` (boolean, default `false`): passing this flag will also prompt the AI model (the same entered for `--ai_settings_relationships`) to generate an [Attack Flow](https://center-for-threat-informed-defense.github.io/attack-flow/) for the MITRE ATT&CK extractions to define the logical order in which they are being described. You must pass `--ai_settings_relationships` for this to work.
+* `--ai_create_attack_navigator_layer` (boolean, default `false`): passing this flag will generate [MITRE ATT&CK Navigator layers](https://mitre-attack.github.io/attack-navigator/) for MITRE ATT&CK extractions. For each ATT&CK domain (Enterprise, ICS, Mobile) txt2stix will generate a layer. You must pass `--ai_settings_relationships` for this to work because the AI is tasked with linking extracted Techniques to the correct Tactic. Known issues with `openai:gpt-3.5` (avoid using this model if possible when using ATT&CK Navigator).
 ## Adding new extractions
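
The `provider:model` values described for `--ai_settings_extractions` in the PKG-INFO diff above can be split at the first colon, because model names such as `openai/gpt-4o` or `models/gemini-1.5-pro-latest` contain slashes but the provider prefix does not. The following is a minimal Python sketch of that parsing, not txt2stix's actual implementation. `parse_ai_setting`, `missing_api_key` and `REQUIRED_ENV` are hypothetical names, and the mapping only covers providers whose environment variables appear in the lines shown in this diff.

```python
# Hypothetical mapping built only from the env vars named in the diff above.
REQUIRED_ENV = {
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def parse_ai_setting(value):
    """Split 'provider:model' at the first colon only."""
    provider, sep, model = value.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider:model, got {value!r}")
    return provider, model


def missing_api_key(value, environ):
    """Return the name of the env var that is required but unset, else None."""
    provider, _model = parse_ai_setting(value)
    env_var = REQUIRED_ENV.get(provider)
    if env_var is not None and not environ.get(env_var):
        return env_var
    return None


if __name__ == "__main__":
    print(parse_ai_setting("openrouter:openai/gpt-4o"))
    print(missing_api_key("openai:gpt-4o", {}))
```

Passing `os.environ` as the second argument to `missing_api_key` would check the real environment; a plain dict is used here so the sketch stays side-effect free.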

{txt2stix-1.0.1.post1 → txt2stix-1.0.2}/README.md RENAMED

@@ -46,7 +46,13 @@ cd txt2stix
 python3 -m venv txt2stix-venv
 source txt2stix-venv/bin/activate
 # install requirements
-pip3 install .
+pip3 install txt2stix
+```
+Note: by default, txt2stix installs OpenAI as the AI provider. You can also use Anthropic, Gemini, OpenRouter or DeepSeek, but you need to install these manually as follows (remove any that don't apply):
+```shell
+pip3 install txt2stix[deepseek,gemini,anthropic,openrouter]
 ```
 ### Set variables
@@ -74,39 +80,39 @@ The following arguments are available:
 #### Input settings
-* `--input_file` (REQUIRED): the file to be converted. Must be `.txt`
+* `--input_file` (`path/to/file.txt`, required): the file to be converted. Must be `.txt`
 #### STIX Report generation settings
-* `--name` (REQUIRED): name of file, max 72 chars. Will be used in the STIX Report Object created.
-* `--report_id` (OPTIONAL): Sometimes it is required to control the id of the `report` object generated. You can therefore pass a valid UUIDv4 in this field to be assigned to the report. e.g. passing `2611965-930e-43db-8b95-30a1e119d7e2` would create a STIX object id `report--2611965-930e-43db-8b95-30a1e119d7e2`. If this argument is not passed, the UUID will be randomly generated.
-* `--tlp_level` (OPTIONAL): Options are `clear`, `green`, `amber`, `amber_strict`, `red`. Default if not passed, is `clear`.
-* `--confidence` (OPTIONAL): value between 0-100. Default if not passed is null.
+* `--name` (text, required): name of file, max 72 chars. Will be used in the STIX Report Object created.
+* `--report_id` (UUIDv4, default is random UUIDv4): Sometimes it is required to control the id of the `report` object generated. You can therefore pass a valid UUIDv4 in this field to be assigned to the report. e.g. passing `2611965-930e-43db-8b95-30a1e119d7e2` would create a STIX object id `report--2611965-930e-43db-8b95-30a1e119d7e2`. If this argument is not passed, the UUID will be randomly generated.
+* `--tlp_level` (dictionary, default `clear`): Options are `clear`, `green`, `amber`, `amber_strict`, `red`.
+* `--confidence` (value between 0-100): If not passed, the report will be assigned no confidence score value.
 * `--labels` (OPTIONAL): comma-separated list of labels. Case-insensitive (all will be converted to lower-case). Allowed characters: `a-z`, `0-9`. e.g. `label1,label2` would create 2 labels.
-* `--created` (OPTIONAL): by default all object `created` times will take the time the script was run. If you want to explicitly set these times you can do so using this flag. Pass the value in the format `YYYY-MM-DDTHH:MM:SS.sssZ` e.g. `2020-01-01T00:00:00.000Z`
-* `--use_identity` (OPTIONAL): can pass a full STIX 2.1 identity object (make sure to properly escape). Will be validated by the STIX2 library.
+* `--created` (datetime, optional): by default all object `created` times will take the time the script was run. If you want to explicitly set these times you can do so using this flag. Pass the value in the format `YYYY-MM-DDTHH:MM:SS.sssZ` e.g. `2020-01-01T00:00:00.000Z`
+* `--use_identity` (stix identity, optional, default txt2stix identity): can pass a full STIX 2.1 identity object (make sure to properly escape). Will be validated by the STIX2 library.
 * `--external_refs` (OPTIONAL): txt2stix will automatically populate the `external_references` of the report object it creates for the input. You can use this value to add additional objects to `external_references`. Note, you can only add `source_name` and `external_id` values currently. Pass as `source_name=external_id`. e.g. `--external_refs txt2stix=demo1 source=id` would create the following objects under the `external_references` property: `{"source_name":"txt2stix","external_id":"demo1"},{"source_name":"source","external_id":"id"}`
 #### Output settings
 How the extractions are performed
-* `--use_extractions` (REQUIRED): if you only want to use certain extraction types, you can pass their slug found in either `includes/ai/config.yaml`, `includes/lookup/config.yaml` `includes/pattern/config.yaml` (e.g. `pattern_ipv4_address_only`). Default if not passed, no extractions applied. You can also pass a catch all wildcard `*` which will match all extraction paths (e.g. `'pattern_*'` would run all extractions starting with `pattern_` -- make sure to use quotes when using a wildcard)
+* `--use_extractions` (dictionary, required): if you only want to use certain extraction types, you can pass their slug found in either `includes/ai/config.yaml`, `includes/lookup/config.yaml` `includes/pattern/config.yaml` (e.g. `pattern_ipv4_address_only`). Default if not passed, no extractions applied. You can also pass a catch all wildcard `*` which will match all extraction paths (e.g. `'pattern_*'` would run all extractions starting with `pattern_` -- make sure to use quotes when using a wildcard)
 	* Important: if using any AI extractions (`ai_*`), you must set an AI API key in your `.env` file
 	* Important: if you are using any MITRE ATT&CK, CAPEC, CWE, ATLAS or Location extractions you must set the `CTIBUTLER` settings, and if you are using any NVD CPE or CVE extractions you must set the `VULMATCH` settings, in your `.env` file
-* `--relationship_mode` (REQUIRED): either.
+* `--relationship_mode` (dictionary, required): either.
 	* `ai`: AI provider must be enabled. extractions performed by either regex or AI for extractions user selected. Rich relationships created from AI provider from extractions.
 	* `standard`: extractions performed by either regex or AI (AI provider must be enabled) for extractions user selected. Basic relationships created from extractions back to master Report object generated.
-* `--ignore_extraction_boundary` (OPTIONAL, default `false`, not compatible with AI extractions): in some cases the same string will create multiple extractions depending on extractions set (e.g. `https://www.google.com/file.txt` could create a url, url with file, domain, subdomain, and file). The default behaviour is for txt2stix to take the longest extraction and ignore everything else (e.g. only extract url with file, and ignore url, file, domain, subdomain, and file). If you want to override this behaviour and get all extractions in the output, set this flag to `true`.
-* `--ignore_image_refs` (default `true`): images references in documents don't usually need extracting. e.g. `<img src="https://example.com/image.png" alt="something">` you would not want domain or file extractions extracting `example.com` and `image.png`. Hence these are ignored by default (they are removed from text sent to extraction). Note, only the `img src` is ignored, all other values e.g. `alt` are considered. If you want extractions to consider this data, set it to `false`
-* `--ignore_link_refs` (default `true`): link references in documents don't usually need extracting e.g. `<a href="https://example.com/link.html" title="something">Bad Actor</a>` you would only want `Bad actor` to be considered for extraction. Hence these part of the link are ignored by default (they are removed from text sent to extraction). Note, only the `a href` is ignored, all other values e.g. `title` are considered. Setting this to `false` will also include everything inside the link tag (e.g. `example.com` would extract as a domain)
+* `--ignore_extraction_boundary` (boolean, default `false`, not compatible with AI extractions): in some cases the same string will create multiple extractions depending on extractions set (e.g. `https://www.google.com/file.txt` could create a url, url with file, domain, subdomain, and file). The default behaviour is for txt2stix to take the longest extraction and ignore everything else (e.g. only extract url with file, and ignore url, file, domain, subdomain, and file). If you want to override this behaviour and get all extractions in the output, set this flag to `true`.
+* `--ignore_image_refs` (boolean, default `true`): images references in documents don't usually need extracting. e.g. `<img src="https://example.com/image.png" alt="something">` you would not want domain or file extractions extracting `example.com` and `image.png`. Hence these are ignored by default (they are removed from text sent to extraction). Note, only the `img src` is ignored, all other values e.g. `alt` are considered. If you want extractions to consider this data, set it to `false`
+* `--ignore_link_refs` (boolean, default `true`): link references in documents don't usually need extracting e.g. `<a href="https://example.com/link.html" title="something">Bad Actor</a>` you would only want `Bad actor` to be considered for extraction. Hence these part of the link are ignored by default (they are removed from text sent to extraction). Note, only the `a href` is ignored, all other values e.g. `title` are considered. Setting this to `false` will also include everything inside the link tag (e.g. `example.com` would extract as a domain)
 #### AI settings
 If any AI extractions, or AI relationship mode is set, you must set the following accordingly
-* `--ai_settings_extractions`:
+* `--ai_settings_extractions` (`provider:model`, required if one or more AI extractions set):
 	* defines the `provider:model` to be used for extractions. You can supply more than one provider. Separate with a space (e.g. `openrouter:openai/gpt-4o` `openrouter:deepseek/deepseek-chat`). If more than one provider is passed, txt2stix will take extractions from all models, de-duplicate them, and then package them in the output. Currently supports:
 		* Provider (env var required `OPENROUTER_API_KEY`): `openrouter:`, providers/models `openai/gpt-4o`, `deepseek/deepseek-chat` ([More here](https://openrouter.ai/models))
 		* Provider (env var required `OPENAI_API_KEY`): `openai:`, models e.g.: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4` ([More here](https://platform.openai.com/docs/models))
@@ -114,11 +120,16 @@ If any AI extractions, or AI relationship mode is set, you must set the followin
 		* Provider (env var required `GOOGLE_API_KEY`): `gemini:models/`, models: `gemini-1.5-pro-latest`, `gemini-1.5-flash-latest` ([More here](https://ai.google.dev/gemini-api/docs/models/gemini))
 		* Provider (env var required `DEEPSEEK_API_KEY`): `deepseek:`, models `deepseek-chat` ([More here](https://api-docs.deepseek.com/quick_start/pricing))
 	* See `tests/manual-tests/cases-ai-extraction-type.md` for some examples
-* `--ai_settings_relationships`:
+* `--ai_settings_relationships` (`provider:model`, required if AI relationship mode set):
 	* similar to `ai_settings_extractions` but defines the model used to generate relationships. Only one model can be provided. Passed in same format as `ai_settings_extractions`
 	* See `tests/manual-tests/cases-ai-relationships.md` for some examples
-* `--ai_content_check_provider`: Passing this flag will get the AI to try and classify the text in the input to 1) determine if it is talking about threat intelligence, and 2) what type of threat intelligence it is talking about. For context, we use this to filter out non-threat intel posts in Obstracts and Stixify. You pass `provider:model` with this flag to determine the AI model you wish to use to perform the check.
-* `--ai_create_attack_flow`: passing this flag will also prompt the AI model (the same entered for `--ai_settings_relationships`) to generate an [Attack Flow](https://center-for-threat-informed-defense.github.io/attack-flow/) for the MITRE ATT&CK extractions to define the logical order in which they are being described. You must pass `--ai_settings_relationships` for this to work.
+#### Other AI related settings
+* `--ai_content_check_provider` (`provider:model`, required if passed): Passing this flag will get the AI to try and classify the text in the input to 1) determine if it is talking about threat intelligence, and 2) what type of threat intelligence it is talking about. For context, we use this to filter out non-threat intel posts in Obstracts and Stixify. You pass `provider:model` with this flag to determine the AI model you wish to use to perform the check. It will also create a summary of the content passed (and store this into a STIX Note).
+* `--ai_extract_if_no_incidence` (boolean, default `true`, will only work if `ai_content_check_provider` set): if the content check decides the report is not related to cyber security intelligence (e.g. vendor marketing), you can use this setting to decide whether or not the script should proceed. Setting it to `false` will stop processing. It is designed to save AI tokens when processing unknown content at scale in an automated way.
+* `--ai_create_attack_flow` (boolean, default `false`): passing this flag will also prompt the AI model (the same entered for `--ai_settings_relationships`) to generate an [Attack Flow](https://center-for-threat-informed-defense.github.io/attack-flow/) for the MITRE ATT&CK extractions to define the logical order in which they are being described. You must pass `--ai_settings_relationships` for this to work.
+* `--ai_create_attack_navigator_layer` (boolean, default `false`): passing this flag will generate [MITRE ATT&CK Navigator layers](https://mitre-attack.github.io/attack-navigator/) for MITRE ATT&CK extractions. For each ATT&CK domain (Enterprise, ICS, Mobile) txt2stix will generate a layer. You must pass `--ai_settings_relationships` for this to work because the AI is tasked with linking extracted Techniques to the correct Tactic. Known issues with `openai:gpt-3.5` (avoid using this model if possible when using ATT&CK Navigator).
 ## Adding new extractions
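
The default behaviour described for `--ignore_extraction_boundary` in the README diff above (keep the longest extraction and drop the shorter ones it covers) can be illustrated with a small interval filter. This is a sketch under stated assumptions, not the code txt2stix ships: the `Extraction` type, the `keep_longest` helper and the `kind` labels are all hypothetical, and spans are assumed to be character offsets with an exclusive end.

```python
from typing import List, NamedTuple


class Extraction(NamedTuple):
    start: int  # offset of the first character
    end: int    # offset one past the last character
    kind: str   # hypothetical label, e.g. "url-file"


def keep_longest(extractions: List[Extraction]) -> List[Extraction]:
    """Keep the longest spans and drop any span that overlaps a kept one."""
    kept: List[Extraction] = []
    # Visit longer spans first so they win over the shorter spans they cover.
    for cand in sorted(extractions, key=lambda e: e.end - e.start, reverse=True):
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda e: e.start)


if __name__ == "__main__":
    # Offsets for "https://www.google.com/file.txt" (31 characters).
    found = [
        Extraction(0, 31, "url-file"),
        Extraction(8, 22, "domain-name"),
        Extraction(23, 31, "file"),
    ]
    print(keep_longest(found))
```

Setting `--ignore_extraction_boundary true` would correspond to skipping this filter entirely and returning every extraction.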

{txt2stix-1.0.1.post1 → txt2stix-1.0.2}/includes/lookups/_generate_lookups.py RENAMED

@@ -1,3 +1,5 @@
+## IMPORTANT: if using CTI Butler database locally in arangodb (i.e is not app.ctibutler.com in .env) you need to follow these steps to import the data needed to populate these lookups: https://github.com/muchdogesec/stix2arango/blob/main/utilities/arango_cti_processor/README.md (use `--database ctibutler_database` in the s2a script or change it in this script)
 import os
 from arango import ArangoClient

{txt2stix-1.0.1.post1 → txt2stix-1.0.2}/pyproject.toml RENAMED

@@ -4,13 +4,9 @@ build-backend = "hatchling.build"
 [project]
 name = "txt2stix"
-version = "1.0.1-1"
-authors = [
-  { name = "dogesec" }
-]
-maintainers = [
-  { name = "dogesec" }
-]
+version = "1.0.2"
+authors = [{ name = "dogesec" }]
+maintainers = [{ name = "dogesec" }]
 description = "txt2stix is a Python script that is designed to identify and extract IoCs and TTPs from text files, identify the relationships between them, convert them to STIX 2.1 objects, and output as a STIX 2.1 bundle."
 readme = "README.md"
 requires-python = ">=3.9"
@@ -21,7 +17,6 @@ classifiers = [
 ]
 dependencies = [
   "pathvalidate>=3.2.0",
   "phonenumbers>=8.13.39",
@@ -55,15 +50,8 @@ stix2arango = "txt2stix.txt2stix:main"
 "includes" = "txt2stix/includes"
 [project.optional-dependencies]
-full = [
-  'llama-index-llms-anthropic>=0.7.2',
-  'llama-index-llms-gemini>=0.5.0',
-  'llama-index-llms-deepseek>=0.1.2',
-  'llama-index-llms-openrouter>=0.3.2',
-]
-tests = [
-"pytest",
-"requests",
-"pytest-subtests",
-"pytest-cov",
-]
+anthropic = ['llama-index-llms-anthropic>=0.7.2']
+gemini = ['llama-index-llms-gemini>=0.5.0']
+deepseek = ['llama-index-llms-deepseek>=0.1.2']
+openrouter = ['llama-index-llms-openrouter>=0.3.2']
+tests = ["pytest", "requests", "pytest-subtests", "pytest-cov", "python-dateutil"]
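
Because the single `full` extra is replaced by one extra per provider in this hunk, a caller may want to check whether the distribution behind a given provider is installed before using it. The sketch below uses the standard library's `importlib.metadata` for that. `EXTRA_TO_DIST`, `installed_version` and `install_hint` are hypothetical helpers (txt2stix is not known to expose anything like them); the distribution names are copied from the `pyproject.toml` lines above.

```python
from importlib.metadata import PackageNotFoundError, version

# Extra name -> distribution name, as listed under
# [project.optional-dependencies] in the diff above.
EXTRA_TO_DIST = {
    "anthropic": "llama-index-llms-anthropic",
    "gemini": "llama-index-llms-gemini",
    "deepseek": "llama-index-llms-deepseek",
    "openrouter": "llama-index-llms-openrouter",
}


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def install_hint(provider):
    """Return a pip command for a missing optional provider, else None."""
    dist_name = EXTRA_TO_DIST.get(provider)
    if dist_name is None:
        # Not an optional extra (for example openai, installed by default).
        return None
    if installed_version(dist_name) is not None:
        return None
    return f"pip3 install 'txt2stix[{provider}]'"


if __name__ == "__main__":
    for name in ("openai", "gemini"):
        print(name, install_hint(name))
```

Quoting the requirement in the hint avoids shells that treat square brackets as glob patterns.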

{txt2stix-1.0.1.post1 → txt2stix-1.0.2}/requirements.txt RENAMED

@@ -6,7 +6,7 @@
 #
 aiohappyeyeballs==2.6.1
     # via aiohttp
-aiohttp==3.12.13
+aiohttp==3.12.14
     # via llama-index-core
 aiosignal==1.4.0
     # via aiohttp

txt2stix-1.0.2/tests/data/manually_generated_reports/attack_navigator_demo.txt ADDED

@@ -0,0 +1,9 @@
+Enterprise
+T1595 is used during TA0043
+T1587.001 is then used to T1587.
+Mobile
+T1451 is used for TA0027 to achieve T1662
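
The demo file above mixes ATT&CK technique IDs (`T1595`, `T1587.001`) and tactic IDs (`TA0043`) under a domain heading. To see what a per-domain Navigator layer would need to contain, the IDs can be grouped with a regular expression. This is only an illustrative sketch: txt2stix itself relies on the `ai_mitre_attack_*` extractions rather than this regex, and `ATTACK_ID`, `DOMAINS` and `group_attack_ids` are hypothetical names.

```python
import re

# Tactic IDs look like TA0043; technique IDs look like T1595 or T1587.001.
ATTACK_ID = re.compile(r"\b(TA\d{4}|T\d{4}(?:\.\d{3})?)\b")
# Domain headings, taken from the README text (Enterprise, ICS, Mobile).
DOMAINS = {"Enterprise", "ICS", "Mobile"}


def group_attack_ids(text):
    """Collect ATT&CK IDs under the most recent domain heading line."""
    groups = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in DOMAINS:
            current = stripped
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].extend(ATTACK_ID.findall(stripped))
    return groups


if __name__ == "__main__":
    demo = (
        "Enterprise\n"
        "T1595 is used during TA0043\n"
        "T1587.001 is then used to T1587.\n"
        "Mobile\n"
        "T1451 is used for TA0027 to achieve T1662\n"
    )
    print(group_attack_ids(demo))
```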

txt2stix-1.0.2/tests/data/manually_generated_reports/not_security_content.txt ADDED

@@ -0,0 +1 @@
+this is not security content

{txt2stix-1.0.1.post1 → txt2stix-1.0.2}/tests/manual-tests/cases-standard-tests.md RENAMED

@@ -417,6 +417,41 @@ python3 txt2stix.py \
     --report_id 4fa18f2d-278b-4fd4-8470-62a8807d35ad
 ```
+The following should not be passed to AI (not security content)
+```shell
+python3 txt2stix.py \
+    --relationship_mode standard \
+    --input_file tests/data/manually_generated_reports/not_security_content.txt \
+    --name 'Test AI Content check failure' \
+    --tlp_level clear \
+    --confidence 100 \
+	--use_extractions ai_ipv4_address_only \
+	--ai_settings_extractions openai:gpt-4o \
+    --ai_content_check_provider openai:gpt-4o \
+    --ai_extract_if_no_incidence false \
+    --report_id ed6039d6-699c-44f0-9bf0-957d4d0ff99f
+```
+ Will pass but still process, as `ai_content_check_provider` is omitted
+```shell
+python3 txt2stix.py \
+    --relationship_mode standard \
+    --input_file tests/data/extraction_types/all_cases.txt \
+    --name 'Test AI Content check failure' \
+    --tlp_level clear \
+    --confidence 100 \
+	--use_extractions ai_ipv4_address_only \
+	--ai_settings_extractions openai:gpt-4o \
+    --tlp_level clear \
+    --confidence 100 \
+	--use_extractions ai_ipv4_address_only \
+	--ai_settings_extractions openai:gpt-4o \
+	--ai_extract_if_no_incidence false \
+    --report_id 2880d1c1-0211-45b6-8565-befe596ff81f
+```
 ### attack flow demo
 no indicators
@@ -449,4 +484,42 @@ python3 txt2stix.py \
     --ai_settings_extractions openai:gpt-4o \
     --ai_create_attack_flow \
     --report_id 3b160a8d-12dd-4e7c-aee8-5af6e371b425
+```
+### attack navigator demo
+```shell
+python3 txt2stix.py \
+    --relationship_mode ai \
+    --ai_settings_relationships openai:gpt-4o \
+    --input_file tests/data/manually_generated_reports/attack_navigator_demo.txt \
+    --name 'Test MITRE ATT&CK Navigator' \
+    --tlp_level clear \
+    --confidence 100 \
+    --use_extractions 'ai_mitre_attack_*' \
+    --ai_settings_extractions openai:gpt-4o \
+    --ai_create_attack_navigator_layer \
+    --ai_content_check_provider openai:gpt-4o \
+    --report_id b599f044-f22c-4e38-a2ed-3ef43442ccd2
+```
+`ai_content_check_provider` checked to ensure summary is used as description
+### attack navigator and attack flow
+used to check prompts only sent once
+```shell
+python3 txt2stix.py \
+    --relationship_mode ai \
+    --ai_settings_relationships openai:gpt-4o \
+    --input_file tests/data/manually_generated_reports/attack_navigator_demo.txt \
+    --name 'Test MITRE ATT&CK Flow and Navigator' \
+    --tlp_level clear \
+    --confidence 100 \
+    --use_extractions 'ai_mitre_attack_enterprise' \
+    --ai_settings_extractions openai:gpt-4o \
+    --ai_create_attack_flow \
+    --ai_create_attack_navigator_layer \
+    --report_id c0d48262-1d9f-42d2-aa29-f0cba1bfa2e0
 ```
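
The two content-check cases added to this test file exercise the interaction between `--ai_content_check_provider` and `--ai_extract_if_no_incidence`. As the README describes it, the second flag only has an effect when the first is set. A minimal sketch of that decision, assuming the content check yields a simple yes/no verdict, is shown below; `should_extract` is a hypothetical function and not part of txt2stix.

```python
from typing import Optional


def should_extract(
    content_check_provider: Optional[str],
    describes_incident: Optional[bool],
    extract_if_no_incidence: bool = True,
) -> bool:
    """Decide whether extraction should proceed after the content check."""
    if content_check_provider is None:
        # No check was run, so the flag has no effect.
        return True
    if describes_incident:
        # The check classified the text as threat intelligence.
        return True
    # Not security content: proceed only if the caller allows it.
    return extract_if_no_incidence


if __name__ == "__main__":
    # First test case: check enabled, not security content, flag set to false.
    print(should_extract("openai:gpt-4o", False, False))
    # Second test case: check omitted, so processing continues regardless.
    print(should_extract(None, None, False))
```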
