npm - @intentsolutionsio/penetration-tester - Versions diffs - 2.0.0 → 3.0.4 - Mend

@intentsolutionsio/penetration-tester 2.0.0 → 3.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (112)

package/skills/detecting-insecure-deserialization/SKILL.md ADDED

@@ -0,0 +1,148 @@
+---
+name: detecting-insecure-deserialization
+description: |
+  Scan a source tree for unsafe-by-default deserialization APIs:
+  Python pickle.loads / cPickle / shelve / dill, Ruby Marshal.load /
+  YAML.load (pre-3.1 default), Java ObjectInputStream.readObject,
+  PHP unserialize, .NET BinaryFormatter / NetDataContractSerializer,
+  Node.js node-serialize, JavaScript JSON.parse with reviver
+  containing eval.
+  Use when: pre-commit gate on services that accept binary blobs,
+  audit of legacy job-queue code (workers deserializing tasks),
+  post-bug-report when "we accept user-uploaded archives."
+  Threshold: any call to a known-unsafe deserialization API on
+  data that originates from user input, network, file upload,
+  or untrusted storage.
+  Trigger with: "scan deserialization", "pickle audit", "java
+  readObject scan", "yaml.load check".
+allowed-tools:
+  - Read
+  - Bash(python3:*)
+  - Glob
+  - Grep
+disallowed-tools:
+  - Bash(rm:*)
+  - Bash(curl:*)
+version: 3.0.0-dev
+author: Jeremy Longshore <jeremy@intentsolutions.io>
+license: MIT
+compatibility: Designed for Claude Code
+tags:
+  - security
+  - static-analysis
+  - deserialization
+  - pentest
+---
+# Detecting Insecure Deserialization
+## Overview
+Insecure deserialization (CWE-502, OWASP A08:2021) is the highest-
+severity injection class in many language stacks because it directly
+maps to RCE. Pickle, Java serialization, PHP unserialize, and
+BinaryFormatter all execute object-construction code during
+deserialization. If that code includes `__reduce__` /
+`readObject` / `__wakeup` / `OnDeserialization` callbacks that
+the attacker controls, the deserialization step IS code execution.
+Most legitimate use cases have safer alternatives (JSON for data,
+YAML with safe-load, Protocol Buffers, Avro). The remaining cases
+need explicit type allow-lists and HMAC-signed payloads.
+## When the skill produces findings
+| Finding | Severity | Threshold | Affected control |
+|---|---|---|---|
+| Python `pickle.loads(...)` | **CRITICAL** | always (untrusted input) | CWE-502 |
+| Python `pickle.load(file)` | **CRITICAL** | always | CWE-502 |
+| Python `dill.loads` | **CRITICAL** | always | CWE-502 |
+| Python `yaml.load(...)` without Loader= | **CRITICAL** | unsafe legacy default | CWE-502 |
+| Python `yaml.unsafe_load(...)` | **CRITICAL** | explicit unsafe | CWE-502 |
+| Python `shelve.open(...)` | **HIGH** | pickle-backed; user-controllable filename | CWE-502 |
+| Java `ObjectInputStream.readObject()` | **CRITICAL** | always | CWE-502 |
+| PHP `unserialize($input)` | **CRITICAL** | non-literal input | CWE-502 |
+| .NET `BinaryFormatter.Deserialize(...)` | **CRITICAL** | deprecated unsafe API | CWE-502 |
+| .NET `NetDataContractSerializer` | **CRITICAL** | also unsafe | CWE-502 |
+| .NET `LosFormatter.Deserialize` | **CRITICAL** | ViewState path | CWE-502 |
+| Ruby `Marshal.load(...)` | **CRITICAL** | non-literal | CWE-502 |
+| Ruby `YAML.load(...)` (Psych < 4.0, i.e. Ruby < 3.1) | **CRITICAL** | safe in Psych 4.0+; needs version check | CWE-502 |
+| Node.js `node-serialize.unserialize` | **CRITICAL** | known-vulnerable lib | CWE-502 |
+| Node.js `serialize-javascript` reviver | **HIGH** | if used to deserialize untrusted | CWE-502 |
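To make the table concrete, here is a minimal sketch of how a few of the Python rows could be detected with the standard-library `ast` module. It is not the package's `scan_deserialization.py`, whose implementation is not shown in this diff. `UNSAFE_CALLS`, `find_unsafe_calls`, and `SAMPLE` are hypothetical names, and the sketch only matches direct `module.function(...)` calls.

```python
import ast

# Hypothetical rule set covering a few Python rows from the table above.
UNSAFE_CALLS = {
    ("pickle", "loads"),
    ("pickle", "load"),
    ("dill", "loads"),
    ("yaml", "unsafe_load"),
    ("shelve", "open"),
}


def find_unsafe_calls(source: str) -> list:
    """Return sorted (line, call) pairs for unsafe deserialization calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only direct `module.function(...)` calls are matched; aliased
        # imports such as `from pickle import loads` are out of scope here.
        if not (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)):
            continue
        call = (func.value.id, func.attr)
        if call in UNSAFE_CALLS:
            findings.append((node.lineno, ".".join(call)))
        elif call == ("yaml", "load"):
            # yaml.load is flagged only when no Loader is passed.
            has_loader = len(node.args) > 1 or any(
                kw.arg == "Loader" for kw in node.keywords
            )
            if not has_loader:
                findings.append((node.lineno, "yaml.load (no Loader)"))
    return sorted(findings)


SAMPLE = """\
import pickle, yaml
a = pickle.loads(blob)
b = yaml.load(text)
c = yaml.load(text, Loader=yaml.SafeLoader)
d = yaml.safe_load(text)
"""

for line, call in find_unsafe_calls(SAMPLE):
    print(f"line {line}: {call}")
```

A syntactic match like this cannot tell whether the argument is attacker-controlled, which is why each finding still needs the manual origin check.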
+## Prerequisites
+- Python 3.9+
+- Source tree on local filesystem
+## Instructions
+### Run
+```bash
+python3 ${CLAUDE_PLUGIN_ROOT}/skills/detecting-insecure-deserialization/scripts/scan_deserialization.py /path/to/repo
+```
+Options match the other scanner skills: `--output`, `--format`,
+`--min-severity`, `--include-tests`, `--languages`.
+### Interpret
+Most findings are CRITICAL because these APIs allow RCE during
+deserialization when the input is attacker-controlled. To verify a
+finding, ask whether the input can ever originate from an
+untrusted source. If it can, fix it immediately.
+### Remediation
+The fix depends on the data shape:
+- **Data is structured (JSON-shaped):** switch to `json.loads`.
+- **Data needs polymorphism / arbitrary types:** define a strict
+  schema (Pydantic / dataclasses / Protocol Buffers) and validate
+  on parse.
+- **Data must round-trip exact Python / Java / .NET objects:** use
+  HMAC-signed serialization with an explicit type allow-list.
+See `references/PLAYBOOK.md` for per-language migrations.
+## Examples
+### Worker-queue audit
+```bash
+python3 ${CLAUDE_PLUGIN_ROOT}/skills/detecting-insecure-deserialization/scripts/scan_deserialization.py \
+    /path/to/celery-workers --min-severity high
+```
+Celery versions before 4.0 default to the pickle serializer; this
+finds the remaining unsafe-default callers.
+### CI
+```yaml
+- name: Deserialization scan
+  run: |
+    python3 plugins/security/penetration-tester/skills/detecting-insecure-deserialization/scripts/scan_deserialization.py \
+        . --min-severity high
+```
+## Output
+JSON / JSONL / Markdown. Exit codes: 0 / 1 / 2.
+## Error Handling
+Pickle / Marshal usage on a private cache file written by the same
+application is safe only while an attacker cannot influence the
+file contents. The scanner still flags it as CRITICAL; verify by
+tracing where the input file originates and who can write to it.
+## Resources
+- `references/THEORY.md` — Why deserialization is RCE, gadget chains,
+  HMAC-signing pattern, schema-validation alternatives
+- `references/PLAYBOOK.md` — Per-language migrations (Python pickle
+  → JSON / msgpack, yaml.load → yaml.safe_load, Java ObjectInputStream
+  → JSON via Jackson with allow-list, PHP unserialize → JSON
+  alternatives, .NET BinaryFormatter → System.Text.Json)

package/skills/detecting-insecure-deserialization/references/PLAYBOOK.md ADDED

@@ -0,0 +1,333 @@
+# Insecure-Deserialization Remediation Playbook
+The universal migration: switch from a behavioral format (pickle,
+Java serialization, PHP unserialize, BinaryFormatter) to a
+schema-validated structural format (JSON + Pydantic / dataclasses /
+Jackson with allow-list / System.Text.Json).
+## Python — pickle → JSON / msgpack / pydantic
+### Before (Celery task queue with pickle)
+```python
+# Celery worker
+import pickle
+CELERY_TASK_SERIALIZER = 'pickle'  # unsafe
+result = pickle.loads(task_payload)  # runs callbacks chosen by whoever wrote the payload
+```
+### After
+```python
+# Celery configuration
+CELERY_TASK_SERIALIZER = 'json'
+CELERY_RESULT_SERIALIZER = 'json'
+CELERY_ACCEPT_CONTENT = ['json']  # reject pickle messages outright
+```
+### Cache layer migration
+```python
+# Before
+import pickle
+data = pickle.loads(redis_client.get(key))
+# After (using msgpack for binary efficiency)
+import msgpack
+data = msgpack.unpackb(redis_client.get(key), raw=False)
+# Or JSON if size isn't critical
+import json
+data = json.loads(redis_client.get(key))
+```
+### Application-data migration
+```python
+# Before
+class Order:
+    def __init__(self, id, items, total): ...
+    # Stored via pickle.dumps
+# After (Pydantic with strict validation)
+from pydantic import BaseModel
+class Order(BaseModel):
+    id: int
+    items: list[str]
+    total: float
+# Serialize
+payload = order.model_dump_json()
+# Deserialize with validation
+restored = Order.model_validate_json(payload)
+```
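Where adding Pydantic is not an option, the same idea can be sketched with only the standard library. The version below is an illustration under that assumption, not a drop-in replacement for Pydantic's validation; the `to_json` / `from_json` method names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Order:
    id: int
    items: list
    total: float

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "Order":
        raw = json.loads(payload)
        # Reject anything that is not exactly the expected object shape.
        if not isinstance(raw, dict) or set(raw) != {"id", "items", "total"}:
            raise ValueError("unexpected payload shape")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(raw["id"], bool) or not isinstance(raw["id"], int):
            raise ValueError("id must be an integer")
        if not isinstance(raw["items"], list) or not all(
            isinstance(item, str) for item in raw["items"]
        ):
            raise ValueError("items must be a list of strings")
        if isinstance(raw["total"], bool) or not isinstance(raw["total"], (int, float)):
            raise ValueError("total must be a number")
        return cls(id=raw["id"], items=raw["items"], total=float(raw["total"]))


order = Order(id=7, items=["widget", "gadget"], total=19.5)
restored = Order.from_json(order.to_json())
```

Parsing only ever builds dicts, lists, strings, and numbers; the `Order` instance is constructed explicitly after the checks pass, so no type named in the payload is ever instantiated.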
+### YAML migration
+```python
+# Before
+import yaml
+data = yaml.load(file_content)  # UNSAFE: no Loader= given (PyYAML 6+ raises TypeError for this call)
+# After
+data = yaml.safe_load(file_content)  # restricts to basic types
+```
+If you previously relied on `yaml.load` to instantiate Python
+classes from YAML, replace the YAML schema with explicit
+construction:
+```python
+# Before YAML
+# !MyClass
+# arg1: hello
+# arg2: 42
+# Before code
+import yaml
+obj = yaml.load(content)  # auto-constructs MyClass
+# After: data-only YAML + explicit constructor
+data = yaml.safe_load(content)  # returns plain dict
+obj = MyClass(arg1=data['arg1'], arg2=data['arg2'])
+```
+## Java — ObjectInputStream → Jackson with allow-list
+### Before
+```java
+ObjectInputStream ois = new ObjectInputStream(inputStream);
+MyObject obj = (MyObject) ois.readObject();
+```
+### After (Jackson, type-allow-listed)
+```java
+import com.fasterxml.jackson.databind.*;
+import com.fasterxml.jackson.databind.jsontype.*;
+ObjectMapper mapper = new ObjectMapper();
+mapper.activateDefaultTyping(
+    BasicPolymorphicTypeValidator.builder()
+        .allowIfSubType("com.example.")  // your package only
+        .build(),
+    ObjectMapper.DefaultTyping.NON_FINAL
+);
+MyObject obj = mapper.readValue(inputStream, MyObject.class);
+```
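+The `BasicPolymorphicTypeValidator` restricts which classes
+Jackson will instantiate when default typing is enabled. Default
+typing without such a validator lets the payload name the class to
+instantiate, which is roughly as exploitable as ObjectInputStream.
+If `MyObject` has no polymorphic fields, skip default typing
+entirely.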
+### Legacy ObjectInputStream — minimal safety (last resort)
+If you can't migrate from ObjectInputStream immediately:
+```java
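+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InvalidClassException;
+import java.io.ObjectInputStream;
+import java.io.ObjectStreamClass;
+import java.util.Set;
+public class AllowlistedObjectInputStream extends ObjectInputStream {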
+    private static final Set<String> ALLOWED = Set.of(
+        "com.example.User",
+        "com.example.Order",
+        "java.lang.String",
+        "java.util.ArrayList"
+    );
+    public AllowlistedObjectInputStream(InputStream is) throws IOException {
+        super(is);
+    }
+    @Override
+    protected Class<?> resolveClass(ObjectStreamClass desc)
+            throws IOException, ClassNotFoundException {
+        if (!ALLOWED.contains(desc.getName())) {
+            throw new InvalidClassException(
+                "Unauthorized deserialization: " + desc.getName());
+        }
+        return super.resolveClass(desc);
+    }
+}
+```
+Use this as transitional protection only; migrate to a schema-
+based format ASAP.
+## PHP — unserialize → json_decode
+### Before
+```php
+$obj = unserialize($_POST['data']);
+```
+### After
+```php
+$obj = json_decode($_POST['data'], true);
+// Validate the shape:
+if (!is_array($obj) || !isset($obj['id'], $obj['name'])) {
+    throw new InvalidArgumentException("Invalid payload");
+}
+```
+### If you must keep unserialize: restrict allowed classes
+```php
+$obj = unserialize($data, [
+    'allowed_classes' => ['User', 'Order', 'OrderItem']
+]);
+```
+In PHP 7.0+, passing `'allowed_classes' => false` instantiates no
+classes at all: every object in the payload becomes a
+`__PHP_Incomplete_Class`, which is safer still.
+## .NET — BinaryFormatter → System.Text.Json
+### Before
+```csharp
+var formatter = new BinaryFormatter();
+var obj = (MyClass)formatter.Deserialize(stream);
+```
+### After (System.Text.Json)
+```csharp
+using System.Text.Json;
+var options = new JsonSerializerOptions {
+    PropertyNameCaseInsensitive = true,
+};
+MyClass obj = JsonSerializer.Deserialize<MyClass>(jsonString, options);
+```
+### For polymorphic types — explicit type discriminator
+```csharp
+[JsonDerivedType(typeof(User), "user")]
+[JsonDerivedType(typeof(Order), "order")]
+public abstract class Entity { }
+// Requires .NET 7 or later. The JSON then looks like {"$type": "user", "id": 1, ...}
+// and only the registered subtypes can be instantiated.
+```
+### Avoid:
+```csharp
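+// NEVER resolve types from names supplied in the payload (for example
+// a custom resolver or converter calling Type.GetType on a JSON field).
+// That re-enables open-ended polymorphic deserialization.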
+new JsonSerializerOptions {
+    TypeInfoResolver = new DefaultJsonTypeInfoResolver { ... }
+};
+```
+## Ruby — Marshal → JSON / safe YAML
+### Before
+```ruby
+obj = Marshal.load(data)
+```
+### After
+```ruby
+require 'json'
+data = JSON.parse(json_str, create_additions: false)  # create_additions: false REQUIRED
+```
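+`create_additions: true` lets JSON instantiate arbitrary classes via
+the `json_create` hook. It was the `JSON.parse` default in old json
+gem releases (before the CVE-2013-0269 fix), and `JSON.load` has
+historically enabled it, so pass `create_additions: false`
+explicitly and avoid `JSON.load` on untrusted input.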
+### YAML — Psych 4.0+ defaults safe; older versions need explicit
+```ruby
+# Psych 4.0+ (default since Ruby 3.1)
+require 'yaml'
+require 'date'  # Date is listed in permitted_classes below
+data = YAML.load(content)  # safe by default
+# Pre-4.0 explicit safety
+data = YAML.safe_load(content, permitted_classes: [Symbol, Date])
+```
+## Node.js — node-serialize and friends
+### Before (using the known-vulnerable node-serialize package)
+```javascript
+const serialize = require('node-serialize');
+const obj = serialize.unserialize(data);  // RCE
+```
+### After
+```javascript
+const { z } = require('zod');
+const obj = JSON.parse(data);
+// Validate shape (zod shown here; ajv works too)
+const schema = z.object({ id: z.number(), name: z.string() });
+const validated = schema.parse(obj);
+```
+### Avoid JSON.parse with reviver containing eval
+```javascript
+// BAD
+const obj = JSON.parse(data, (key, value) => eval(value));
+// GOOD
+const obj = JSON.parse(data);
+```
+## HMAC-signed serialization (when migration isn't feasible)
+If you must keep pickle / Marshal / unserialize / BinaryFormatter
+because of legacy storage that can't be re-encoded, wrap with HMAC
+authentication.
+```python
+import hmac, hashlib, pickle, os
+KEY = os.environ["SERIALIZATION_HMAC_KEY"].encode()
+def serialize_signed(obj):
+    payload = pickle.dumps(obj)
+    sig = hmac.new(KEY, payload, hashlib.sha256).digest()
+    return sig + payload
+def deserialize_signed(data):
+    sig, payload = data[:32], data[32:]
+    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
+    if not hmac.compare_digest(sig, expected):
+        raise ValueError("HMAC verification failed")
+    return pickle.loads(payload)
+```
+The HMAC step proves the payload was created by code that holds
+the KEY. Any tampering invalidates the HMAC. `pickle.loads` still
+runs, but only on payloads that you yourself signed.
+This is transitional infrastructure for a migration, not a
+permanent solution. The KEY is now a high-value target: if it
+leaks, the deserialization is exploitable again.
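The tamper-detection property can be checked directly. The sketch below restates the two functions with the key passed as a parameter (an assumption made to keep the example self-contained; the playbook reads `KEY` from the environment) and adds a length check for truncated input. `SIG_LEN` and `is_rejected` are hypothetical helper names.

```python
import hashlib
import hmac
import pickle
import secrets

SIG_LEN = hashlib.sha256().digest_size  # 32 bytes


def serialize_signed(obj, key: bytes) -> bytes:
    payload = pickle.dumps(obj)
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def deserialize_signed(data: bytes, key: bytes):
    if len(data) < SIG_LEN:
        raise ValueError("payload too short to contain a signature")
    sig, payload = data[:SIG_LEN], data[SIG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("HMAC verification failed")
    return pickle.loads(payload)  # reached only for payloads signed with `key`


def is_rejected(data: bytes, key: bytes) -> bool:
    try:
        deserialize_signed(data, key)
    except ValueError:
        return True
    return False


key = secrets.token_bytes(32)  # demo key only
blob = serialize_signed({"user": "alice", "roles": ["admin"]}, key)

# Flip one bit in the pickled body to simulate tampering in storage.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])

print(deserialize_signed(blob, key))
print("tampered rejected:", is_rejected(tampered, key))
```

The signature is verified before `pickle.loads` is called, so a modified payload never reaches the unpickler.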
+## CI integration
+```yaml
+- name: Insecure-deserialization scan
+  run: |
+    python3 plugins/security/penetration-tester/skills/detecting-insecure-deserialization/scripts/scan_deserialization.py \
+        . --min-severity high --format json --output deser-scan.json
+- run: |
+    if jq 'length > 0' deser-scan.json | grep -q true; then
+      echo "::error::Insecure deserialization detected"
+      exit 1
+    fi
+```
+## Verification after remediation
+```bash
+python3 ${CLAUDE_PLUGIN_ROOT}/skills/detecting-insecure-deserialization/scripts/scan_deserialization.py \
+    /path/to/repo --min-severity high
+```
+Expected: exit 0, zero findings.

package/skills/detecting-insecure-deserialization/references/THEORY.md ADDED

@@ -0,0 +1,199 @@
+# Insecure-Deserialization Theory
+## Why deserialization is RCE
+Most serialization formats are structural (JSON: numbers, strings,
+lists, dicts; XML: a tagged tree). They describe data shape, not
+behavior. Deserializing them is a parse step: validate, build value
+trees, return.
+A small subset of formats is BEHAVIORAL. Python pickle, Java
+serialization, PHP unserialize, .NET BinaryFormatter all store
+not just object state but the type information AND any
+deserialization callbacks the type registered. Re-creating the
+object during deserialization invokes those callbacks.
+The attack: craft a serialized payload that describes an object
+graph using types whose deserialization callbacks execute
+arbitrary code. The application calls `pickle.loads(payload)` and
+the constructor / `__reduce__` / `readObject` chain runs the
+attacker's code IN the application's process.
+This is not theoretical. There are public gadget-chain libraries
+for every language with behavioral deserialization:
+- Java: `ysoserial` (Spring, Commons Collections, Hibernate gadgets)
+- .NET: `ysoserial.net` (TypeConfuseDelegate, ObjectDataProvider)
+- Python: pickle is trivially exploitable via `__reduce__` returning `(eval, ('...code...',))`
+- PHP: PHPGGC (PHP Generic Gadget Chains library)
+- Ruby: Marshal exploitation via `_load` callbacks
+If your application accepts any of these formats from any source
+not under your direct control, you have a deserialization RCE
+unless and until you switch formats.
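The mechanism can be seen without running anything dangerous. In this sketch the hypothetical `Gadget` class uses `__reduce__` to tell pickle to call `len` on load; `len` is a harmless stand-in for the `eval` or `os.system` a real payload would name.

```python
import pickle


class Gadget:
    """Stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments".
        # A real exploit would name os.system or eval here; len is a
        # harmless substitute that still proves the call happens.
        return (len, ("attacker-chosen argument",))


payload = pickle.dumps(Gadget())
result = pickle.loads(payload)  # invokes len(...) during loading

print(type(result).__name__, result)
```

`pickle.loads` returns whatever the named callable returned, not a `Gadget`. The receiving application does not need the attacker's class at all, only a callable that is importable in its own process.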
+## The "trusted source" trap
+A common defense: "we only deserialize from a trusted source —
+this database we control, this S3 bucket we own, etc."
+Two problems:
+1. **The "trusted source" is often less trusted than assumed.**
+   The database might be writable by other services. The S3 bucket
+   might be publicly writable by accident (see skill #9). An attacker
+   who reaches the storage layer (via SQL injection, IAM
+   misconfiguration, supply-chain compromise) can plant a payload
+   that the deserializer later picks up.
+2. **The format is brittle even without attackers.** Pickle and
+   Java serialization are tied to class definitions and library
+   versions; a refactor or upgrade can break deserialization of
+   stored data. Switching to schema-validated JSON / Protobuf
+   addresses both the security and stability concerns.
+The pragmatic posture: assume any deserialization input could
+become attacker-controlled through some future change. Use safe
+formats by default.
+## Safe alternatives by use case
+### Use case: "I need to store structured data and reload it"
+→ JSON (with Pydantic, dataclasses, or msgspec for schema
+validation). Or Protocol Buffers / Avro / MessagePack for binary
+efficiency.
+### Use case: "I need to round-trip arbitrary Python / Java / .NET objects"
+→ Define a schema. If the data is genuinely polymorphic, use a
+tagged-union pattern with explicit discriminator field. Validate
+the discriminator against an allow-list before instantiating any
+class.
+```python
+# JSON with explicit type discriminator + allow-list
+TYPE_REGISTRY = {
+    "User": User,
+    "Order": Order,
+    "Payment": Payment,
+}
+def deserialize(data: dict):
+    type_name = data["__type"]
+    if type_name not in TYPE_REGISTRY:
+        raise ValueError(f"Unknown type: {type_name}")
+    cls = TYPE_REGISTRY[type_name]
+    return cls(**{k: v for k, v in data.items() if k != "__type"})
+```
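A self-contained usage sketch of the registry pattern follows. The `User` and `Order` dataclasses are hypothetical stand-ins for application types, and the version here additionally converts constructor errors into `ValueError`. Dataclasses check field names only, not value types, so real code would add per-field validation.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


@dataclass
class Order:
    id: int
    total: float


TYPE_REGISTRY = {"User": User, "Order": Order}


def deserialize(data: dict):
    type_name = data.get("__type")
    # The discriminator must be a known string before any class is touched.
    if not isinstance(type_name, str) or type_name not in TYPE_REGISTRY:
        raise ValueError(f"Unknown type: {type_name!r}")
    cls = TYPE_REGISTRY[type_name]
    fields = {k: v for k, v in data.items() if k != "__type"}
    try:
        return cls(**fields)
    except TypeError as exc:
        # Missing or unexpected fields surface as TypeError from the constructor.
        raise ValueError(f"Bad fields for {type_name}: {exc}") from exc


user = deserialize({"__type": "User", "id": 1, "name": "alice"})
print(user)
```

A payload that names a type outside the registry, such as `{"__type": "subprocess.Popen", "args": "id"}`, is rejected before any constructor runs.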
+### Use case: "I have a binary cache file I trust"
+→ HMAC-sign the file when writing; verify HMAC before
+deserializing. The HMAC step proves the file wasn't tampered with
+since you wrote it. Then deserialize.
+```python
+import hmac, hashlib, pickle
+def write_signed(path, obj, key):
+    payload = pickle.dumps(obj)
+    sig = hmac.new(key, payload, hashlib.sha256).digest()
+    with open(path, "wb") as f:
+        f.write(sig + payload)
+def read_signed(path, key):
+    with open(path, "rb") as f:
+        data = f.read()
+    sig, payload = data[:32], data[32:]
+    expected = hmac.new(key, payload, hashlib.sha256).digest()
+    if not hmac.compare_digest(sig, expected):
+        raise ValueError("HMAC mismatch")
+    return pickle.loads(payload)  # safe because HMAC verified
+```
+This works only if the HMAC key is genuinely private. Once the
+key is compromised, the protection is gone.
+### Use case: "I need to deserialize YAML config"
+→ Use the safe-load mode of your YAML library. Python's PyYAML
+has `yaml.safe_load()`; Ruby Psych 4.0+ defaults to safe mode;
+SnakeYAML (Java) supports `SafeConstructor`. These restrict
+deserialization to basic types (strings, numbers, lists, dicts)
+and refuse to instantiate arbitrary classes.
+## Language-specific notes
+### Python pickle
+`pickle.loads(b)` on attacker-controlled bytes is roughly equivalent
+to `exec()` on attacker-controlled code. There is no "safer pickle"
+mode. Migrate to JSON / msgpack / pydantic-validated formats.
+If you absolutely must keep pickle for round-tripping:
+- HMAC-sign with a private key (see above)
+- Restrict to in-process / same-machine usage; never accept pickle
+  from a network input
+- Use `pickletools.genops()` (or `pickletools.dis()` for manual
+  review) to check that payloads contain only expected
+  opcodes (still not safe, just slightly harder to abuse)
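One further stopgap is the "restricting globals" pattern from the Python pickle documentation listed under Primary sources: subclass `pickle.Unpickler` and override `find_class` so that only allow-listed globals resolve. The sketch below follows that documented pattern; `SAFE_BUILTINS` and `restricted_loads` are illustrative names. It narrows what a payload can reach, but it is not a sandbox and does not change the advice to migrate.

```python
import builtins
import io
import pickle

# Illustrative allow-list: only these builtins may be named by a payload.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream refers to a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allow-listed builtins still load.
ok = restricted_loads(pickle.dumps([1, 2, range(15)]))

# A classic protocol-0 payload that names os.system is refused before
# anything is called.
malicious = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(malicious)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(ok, blocked)
```

Every class the application legitimately pickles has to be added to the allow-list, and each addition widens what a forged payload can construct.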
+### Java ObjectInputStream
+Use Jackson with type-allow-list. For legacy code that must keep
+`ObjectInputStream`:
+```java
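+import java.io.*;
+import java.util.Set;
+public class AllowlistedObjectInputStream extends ObjectInputStream {
+    private static final Set<String> ALLOWED = Set.of(
+        "com.example.User", "com.example.Order"
+    );
+    // An explicit constructor is required: the superclass constructor throws IOException.
+    public AllowlistedObjectInputStream(InputStream in) throws IOException {
+        super(in);
+    }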
+    @Override
+    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
+        if (!ALLOWED.contains(desc.getName())) {
+            throw new InvalidClassException("Not allowed: " + desc.getName());
+        }
+        return super.resolveClass(desc);
+    }
+}
+```
+### PHP unserialize
+`unserialize($input, ["allowed_classes" => false])` restricts to
+basic types (PHP 7+). For class-bearing payloads, use the array
+form: `["allowed_classes" => ["User", "Order"]]`.
+But the better answer is to switch to `json_decode` if the data
+shape allows.
+### .NET BinaryFormatter
+BinaryFormatter has been marked obsolete since .NET 5 and was
+removed from the runtime in .NET 9.
+Migrate to `System.Text.Json` for general-purpose serialization,
+or `MessagePack-CSharp` for performance. The migration is
+straightforward in most codebases; the only friction is
+round-tripping types BinaryFormatter handles implicitly that
+require explicit converter registration in System.Text.Json.
+## Gadget chain references
+These are educational, not for use on production systems you
+don't own:
+- `ysoserial` — Java
+- `ysoserial.net` — .NET
+- `PHPGGC` — PHP
+- Anthony Sotirov's pickle-exploit primers
+Knowing the gadget-chain shape is useful for understanding the
+risk; the operational answer is still "don't use unsafe
+deserialization formats."
+## Primary sources
+- [CWE-502 Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)
+- [OWASP A08:2021 Software and Data Integrity Failures](https://owasp.org/Top10/A08_2021-Software_and_Data_Integrity_Failures/)
+- [OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)
+- [Python pickle security docs](https://docs.python.org/3/library/pickle.html#restricting-globals)
+- [Microsoft BinaryFormatter deprecation](https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide)