aiwaf 0.1.0__tar.gz → 0.1.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

Files changed (29)
  1. aiwaf-0.1.3/PKG-INFO +181 -0
  2. aiwaf-0.1.3/README.md +170 -0
  3. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/middleware.py +69 -30
  4. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/models.py +8 -0
  5. aiwaf-0.1.3/aiwaf/trainer.py +175 -0
  6. aiwaf-0.1.3/aiwaf.egg-info/PKG-INFO +181 -0
  7. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf.egg-info/SOURCES.txt +1 -2
  8. aiwaf-0.1.3/pyproject.toml +9 -0
  9. {aiwaf-0.1.0 → aiwaf-0.1.3}/setup.py +7 -1
  10. aiwaf-0.1.0/PKG-INFO +0 -13
  11. aiwaf-0.1.0/README.md +0 -176
  12. aiwaf-0.1.0/aiwaf/trainer.py +0 -123
  13. aiwaf-0.1.0/aiwaf.egg-info/PKG-INFO +0 -13
  14. aiwaf-0.1.0/aiwaf.egg-info/entry_points.txt +0 -2
  15. aiwaf-0.1.0/aiwaf.egg-info/requires.txt +0 -5
  16. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/__init__.py +0 -0
  17. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/apps.py +0 -0
  18. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/blacklist_manager.py +0 -0
  19. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/management/__init__.py +0 -0
  20. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/management/commands/__init__.py +0 -0
  21. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/management/commands/detect_and_train.py +0 -0
  22. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/resources/model.pkl +0 -0
  23. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/storage.py +0 -0
  24. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/template_tags/__init__.py +0 -0
  25. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/template_tags/aiwaf_tags.py +0 -0
  26. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/utils.py +0 -0
  27. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf.egg-info/dependency_links.txt +0 -0
  28. {aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf.egg-info/top_level.txt +0 -0
  29. {aiwaf-0.1.0 → aiwaf-0.1.3}/setup.cfg +0 -0
aiwaf-0.1.3/PKG-INFO ADDED
@@ -0,0 +1,181 @@
1
+ Metadata-Version: 2.4
2
+ Name: aiwaf
3
+ Version: 0.1.3
4
+ Summary: AI-powered Web Application Firewall
5
+ Author: Aayush Gauba
6
+ Author-email: Aayush Gauba <gauba.aayush@gmail.com>
7
+ License: MIT
8
+ Requires-Python: >=3.8
9
+ Description-Content-Type: text/markdown
10
+ Dynamic: author
11
+
12
+ # AI‑WAF
13
+
14
+ > A self‑learning, Django‑friendly Web Application Firewall
15
+ > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, and daily retraining.
16
+
17
+ ---
18
+
19
+ ## Package Structure
20
+
21
+ ```
22
+ aiwaf/
23
+ ├── __init__.py
24
+ ├── blacklist_manager.py
25
+ ├── middleware.py
26
+ ├── trainer.py # exposes train()
27
+ ├── utils.py
28
+ ├── template_tags/
29
+ │ └── aiwaf_tags.py
30
+ ├── resources/
31
+ │ ├── model.pkl # pre‑trained base model
32
+ │ └── dynamic_keywords.json # evolves daily
33
+ ├── management/
34
+ │ └── commands/
35
+ │ └── detect_and_train.py # `python manage.py detect_and_train`
36
+ └── LICENSE
37
+ ```
38
+
39
+ ---
40
+
41
+ ## Features
42
+
43
+ - **IP Blocklist**
44
+ Instantly blocks suspicious IPs (supports CSV fallback or Django model).
45
+
46
+ - **Rate Limiting**
47
+ Sliding‑window blocks flooders (> `AIWAF_RATE_MAX` per `AIWAF_RATE_WINDOW`), then blacklists them.
48
+
49
+ - **AI Anomaly Detection**
50
+ IsolationForest on features:
51
+ - Path length
52
+ - Keyword hits (static + dynamic)
53
+ - Response time
54
+ - Status‑code index
55
+ - Burst count
56
+ - Total 404s
57
+
58
+ - **Dynamic Keyword Extraction**
59
+ Every retrain: top 10 most frequent “words” from 4xx/5xx paths are appended to your malicious keyword set.
60
+
61
+ - **File‑Extension Probing Detection**
62
+ Tracks repeated 404s on common web‑extensions (e.g. `.php`, `.asp`) and auto‑blocks after a burst.
63
+
64
+ - **Honeypot Field**
65
+ Hidden form field (via template tag) that bots fill → instant block.
66
+
67
+ - **UUID Tampering Protection**
68
+ Any `<uuid:…>` URL that doesn’t map to **any** model in its Django app gets blocked.
69
+
70
+ - **Daily Retraining**
71
+ Reads rotated/gzipped logs, auto‑blocks 404 floods (≥6), retrains the model, updates `model.pkl` + `dynamic_keywords.json`.
72
+
73
+ ---
74
+
75
+ ## Installation
76
+
77
+ ```bash
78
+ # From PyPI
79
+ pip install aiwaf
80
+
81
+ # Or for local development
82
+ git clone https://github.com/aayushgauba/aiwaf.git
83
+ cd aiwaf
84
+ pip install -e .
85
+ ```
86
+
87
+ ---
88
+
89
+ ## ⚙️ Configuration (`settings.py`)
90
+
91
+ ```python
92
+ INSTALLED_APPS += ["aiwaf"]
93
+
94
+ # Required
95
+ AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
96
+
97
+ # Optional (defaults shown)
98
+ AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
99
+ AIWAF_HONEYPOT_FIELD = "hp_field"
100
+ AIWAF_RATE_WINDOW = 10 # seconds
101
+ AIWAF_RATE_MAX = 20 # max reqs/window
102
+ AIWAF_RATE_FLOOD = 10 # flood threshold
103
+ AIWAF_WINDOW_SECONDS = 60 # anomaly window
104
+ AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"] # 404‑burst tracked extensions
105
+ ```
106
+
107
+ > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` in your settings — they’re built in and evolve dynamically.
108
+
109
+ ---
110
+
111
+ ## Middleware Setup
112
+
113
+ Add in **this** order to your `MIDDLEWARE` list:
114
+
115
+ ```python
116
+ MIDDLEWARE = [
117
+ "aiwaf.middleware.IPBlockMiddleware",
118
+ "aiwaf.middleware.RateLimitMiddleware",
119
+ "aiwaf.middleware.AIAnomalyMiddleware",
120
+ "aiwaf.middleware.HoneypotMiddleware",
121
+ "aiwaf.middleware.UUIDTamperMiddleware",
122
+ # ... other middleware ...
123
+ ]
124
+ ```
125
+
126
+ ---
127
+
128
+ ## Honeypot Field (in your template)
129
+
130
+ ```django
131
+ {% load aiwaf_tags %}
132
+
133
+ <form method="post">
134
+ {% csrf_token %}
135
+ {% honeypot_field %}
136
+ <!-- your real fields -->
137
+ </form>
138
+ ```
139
+
140
+ > Renders a hidden `<input name="hp_field" style="display:none">`.
141
+ > Any non‑empty submission → IP blacklisted.
142
+
143
+ ---
144
+
145
+ ## Running Detection & Training
146
+
147
+ ```bash
148
+ python manage.py detect_and_train
149
+ ```
150
+
151
+ **What happens:**
152
+ 1. Read access logs
153
+ 2. Auto‑block IPs with ≥ 6 total 404s
154
+ 3. Extract features & train IsolationForest
155
+ 4. Save `model.pkl`
156
+ 5. Extract top 10 dynamic keywords from 4xx/5xx
157
+
158
+ ---
159
+
160
+ ## How It Works
161
+
162
+ | Middleware | Purpose |
163
+ |--------------------------|------------------------------------------------------------------|
164
+ | IPBlockMiddleware | Blocks requests from known blacklisted IPs |
165
+ | RateLimitMiddleware | Enforces burst & flood thresholds |
166
+ | AIAnomalyMiddleware | ML‑driven behavior analysis + block on anomaly |
167
+ | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
168
+ | UUIDTamperMiddleware | Blocks guessed/nonexistent UUIDs across all models in an app |
169
+
170
+ ---
171
+
172
+ ## License
173
+
174
+ This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
175
+
176
+ ---
177
+
178
+ ## Credits
179
+
180
+ **AI‑WAF** by [Aayush Gauba](https://github.com/aayushgauba)
181
+ > “Let your firewall learn and evolve — keep your site a fortress.”
aiwaf-0.1.3/README.md ADDED
@@ -0,0 +1,170 @@
1
+ # AI‑WAF
2
+
3
+ > A self‑learning, Django‑friendly Web Application Firewall
4
+ > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, and daily retraining.
5
+
6
+ ---
7
+
8
+ ## Package Structure
9
+
10
+ ```
11
+ aiwaf/
12
+ ├── __init__.py
13
+ ├── blacklist_manager.py
14
+ ├── middleware.py
15
+ ├── trainer.py # exposes train()
16
+ ├── utils.py
17
+ ├── template_tags/
18
+ │ └── aiwaf_tags.py
19
+ ├── resources/
20
+ │ ├── model.pkl # pre‑trained base model
21
+ │ └── dynamic_keywords.json # evolves daily
22
+ ├── management/
23
+ │ └── commands/
24
+ │ └── detect_and_train.py # `python manage.py detect_and_train`
25
+ └── LICENSE
26
+ ```
27
+
28
+ ---
29
+
30
+ ## Features
31
+
32
+ - **IP Blocklist**
33
+ Instantly blocks suspicious IPs (supports CSV fallback or Django model).
34
+
35
+ - **Rate Limiting**
36
+ Sliding‑window blocks flooders (> `AIWAF_RATE_MAX` per `AIWAF_RATE_WINDOW`), then blacklists them.
37
+
38
+ - **AI Anomaly Detection**
39
+ IsolationForest on features:
40
+ - Path length
41
+ - Keyword hits (static + dynamic)
42
+ - Response time
43
+ - Status‑code index
44
+ - Burst count
45
+ - Total 404s
46
+
47
+ - **Dynamic Keyword Extraction**
48
+ Every retrain: top 10 most frequent “words” from 4xx/5xx paths are appended to your malicious keyword set.
49
+
50
+ - **File‑Extension Probing Detection**
51
+ Tracks repeated 404s on common web‑extensions (e.g. `.php`, `.asp`) and auto‑blocks after a burst.
52
+
53
+ - **Honeypot Field**
54
+ Hidden form field (via template tag) that bots fill → instant block.
55
+
56
+ - **UUID Tampering Protection**
57
+ Any `<uuid:…>` URL that doesn’t map to **any** model in its Django app gets blocked.
58
+
59
+ - **Daily Retraining**
60
+ Reads rotated/gzipped logs, auto‑blocks 404 floods (≥6), retrains the model, updates `model.pkl` + `dynamic_keywords.json`.
61
+
62
+ ---
63
+
64
+ ## Installation
65
+
66
+ ```bash
67
+ # From PyPI
68
+ pip install aiwaf
69
+
70
+ # Or for local development
71
+ git clone https://github.com/aayushgauba/aiwaf.git
72
+ cd aiwaf
73
+ pip install -e .
74
+ ```
75
+
76
+ ---
77
+
78
+ ## ⚙️ Configuration (`settings.py`)
79
+
80
+ ```python
81
+ INSTALLED_APPS += ["aiwaf"]
82
+
83
+ # Required
84
+ AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
85
+
86
+ # Optional (defaults shown)
87
+ AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
88
+ AIWAF_HONEYPOT_FIELD = "hp_field"
89
+ AIWAF_RATE_WINDOW = 10 # seconds
90
+ AIWAF_RATE_MAX = 20 # max reqs/window
91
+ AIWAF_RATE_FLOOD = 10 # flood threshold
92
+ AIWAF_WINDOW_SECONDS = 60 # anomaly window
93
+ AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"] # 404‑burst tracked extensions
94
+ ```
95
+
96
+ > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` in your settings — they’re built in and evolve dynamically.
97
+
98
+ ---
99
+
100
+ ## Middleware Setup
101
+
102
+ Add in **this** order to your `MIDDLEWARE` list:
103
+
104
+ ```python
105
+ MIDDLEWARE = [
106
+ "aiwaf.middleware.IPBlockMiddleware",
107
+ "aiwaf.middleware.RateLimitMiddleware",
108
+ "aiwaf.middleware.AIAnomalyMiddleware",
109
+ "aiwaf.middleware.HoneypotMiddleware",
110
+ "aiwaf.middleware.UUIDTamperMiddleware",
111
+ # ... other middleware ...
112
+ ]
113
+ ```
114
+
115
+ ---
116
+
117
+ ## Honeypot Field (in your template)
118
+
119
+ ```django
120
+ {% load aiwaf_tags %}
121
+
122
+ <form method="post">
123
+ {% csrf_token %}
124
+ {% honeypot_field %}
125
+ <!-- your real fields -->
126
+ </form>
127
+ ```
128
+
129
+ > Renders a hidden `<input name="hp_field" style="display:none">`.
130
+ > Any non‑empty submission → IP blacklisted.
131
+
132
+ ---
133
+
134
+ ## Running Detection & Training
135
+
136
+ ```bash
137
+ python manage.py detect_and_train
138
+ ```
139
+
140
+ **What happens:**
141
+ 1. Read access logs
142
+ 2. Auto‑block IPs with ≥ 6 total 404s
143
+ 3. Extract features & train IsolationForest
144
+ 4. Save `model.pkl`
145
+ 5. Extract top 10 dynamic keywords from 4xx/5xx
146
+
147
+ ---
148
+
149
+ ## How It Works
150
+
151
+ | Middleware | Purpose |
152
+ |--------------------------|------------------------------------------------------------------|
153
+ | IPBlockMiddleware | Blocks requests from known blacklisted IPs |
154
+ | RateLimitMiddleware | Enforces burst & flood thresholds |
155
+ | AIAnomalyMiddleware | ML‑driven behavior analysis + block on anomaly |
156
+ | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
157
+ | UUIDTamperMiddleware | Blocks guessed/nonexistent UUIDs across all models in an app |
158
+
159
+ ---
160
+
161
+ ## License
162
+
163
+ This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
164
+
165
+ ---
166
+
167
+ ## Credits
168
+
169
+ **AI‑WAF** by [Aayush Gauba](https://github.com/aayushgauba)
170
+ > “Let your firewall learn and evolve — keep your site a fortress.”
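The README's "File-Extension Probing Detection" feature is described only in prose; no hunk in this diff shows its implementation. The following is a minimal sketch of how such a tracker might work, not the package's code. `ExtensionProbeTracker`, `BURST_LIMIT`, and `WINDOW_SECONDS` are hypothetical names and values; only the extension list comes from the README's `AIWAF_FILE_EXTENSIONS` example.

```python
import time
from collections import defaultdict, deque

# Extension list taken from the README example; the two thresholds below are
# illustrative assumptions, not documented package defaults.
TRACKED_EXTENSIONS = (".php", ".asp", ".jsp")
BURST_LIMIT = 5          # hypothetical: block once more than 5 hits are in the window
WINDOW_SECONDS = 10.0    # hypothetical sliding-window length


class ExtensionProbeTracker:
    """Counts recent 404s on tracked extensions, per client IP."""

    def __init__(self):
        self._hits = defaultdict(deque)  # ip -> timestamps of matching 404s

    def record(self, ip, path, status, now=None):
        """Record one response; return True when the IP exceeds the burst limit."""
        now = time.time() if now is None else now
        if status != 404 or not path.lower().endswith(TRACKED_EXTENSIONS):
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > BURST_LIMIT
```

The check uses `str.endswith`, so a path carrying a query string would need to be stripped before it is passed in.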
{aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/middleware.py CHANGED
@@ -1,23 +1,40 @@
1
+ # aiwaf/middleware.py
2
+
1
3
  import time
4
+ import re
5
+ import os
2
6
  import numpy as np
3
7
  import joblib
8
+
4
9
  from collections import defaultdict
5
10
  from django.utils.deprecation import MiddlewareMixin
6
11
  from django.http import JsonResponse
7
12
  from django.conf import settings
8
13
  from django.core.cache import cache
9
- from django.urls import resolve
14
+ from django.db.models import F
10
15
  from django.apps import apps
11
- from .blacklist_manager import BlacklistManager
12
16
 
13
- try:
14
- MODEL_PATH = settings.AIWAF_MODEL_PATH
15
- except AttributeError:
16
- import importlib.resources
17
- MODEL_PATH = importlib.resources.files("aiwaf").joinpath("resources/model.pkl")
17
+ from .blacklist_manager import BlacklistManager
18
+ from .models import DynamicKeyword
18
19
 
20
+ # ─── Model loading with fallback ────────────────────────────────────────────
21
+ MODEL_PATH = getattr(
22
+ settings,
23
+ "AIWAF_MODEL_PATH",
24
+ os.path.join(os.path.dirname(__file__), "resources", "model.pkl")
25
+ )
19
26
  MODEL = joblib.load(MODEL_PATH)
20
27
 
28
+ # ─── Static keywords default ────────────────────────────────────────────────
29
+ STATIC_KW = getattr(
30
+ settings,
31
+ "AIWAF_MALICIOUS_KEYWORDS",
32
+ [
33
+ ".php", "xmlrpc", "wp-", ".env", ".git", ".bak",
34
+ "conflg", "shell", "filemanager"
35
+ ]
36
+ )
37
+
21
38
  def get_ip(request):
22
39
  xff = request.META.get("HTTP_X_FORWARDED_FOR")
23
40
  if xff:
@@ -37,18 +54,21 @@ class IPBlockMiddleware:
37
54
 
38
55
 
39
56
  class RateLimitMiddleware:
40
- WINDOW = getattr(settings, "AIWAF_RATE_WINDOW", 10)
41
- MAX = getattr(settings, "AIWAF_RATE_MAX", 20)
42
- FLOOD = getattr(settings, "AIWAF_RATE_FLOOD", 10)
57
+ WINDOW = 10
58
+ MAX = 20
59
+ FLOOD = 10
60
+
43
61
  def __init__(self, get_response):
44
62
  self.get_response = get_response
45
63
  self.logs = defaultdict(list)
64
+
46
65
  def __call__(self, request):
47
- ip = get_ip(request)
66
+ ip = get_ip(request)
48
67
  now = time.time()
49
68
  recs = [t for t in self.logs[ip] if now - t < self.WINDOW]
50
69
  recs.append(now)
51
70
  self.logs[ip] = recs
71
+
52
72
  if len(recs) > self.MAX:
53
73
  return JsonResponse({"error": "too_many_requests"}, status=429)
54
74
  if len(recs) > self.FLOOD:
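The sliding-window bookkeeping in `RateLimitMiddleware.__call__` can be exercised outside Django. The sketch below reproduces only the steps visible in this hunk (prune timestamps older than the window, append the current one, compare against the maximum); the `FLOOD` branch is cut off by the hunk boundary and is deliberately not reconstructed. `SlidingWindowLimiter` and `allow` are hypothetical names.

```python
import time
from collections import defaultdict


class SlidingWindowLimiter:
    """Per-IP sliding-window counter mirroring the visible middleware logic."""

    def __init__(self, window=10.0, max_requests=20):
        self.window = window              # seconds, like WINDOW in the hunk
        self.max_requests = max_requests  # like MAX in the hunk
        self._log = defaultdict(list)     # ip -> request timestamps

    def allow(self, ip, now=None):
        """Return False when the request would exceed the per-window maximum."""
        now = time.time() if now is None else now
        # Keep only timestamps still inside the window, then add this request.
        recent = [t for t in self._log[ip] if now - t < self.window]
        recent.append(now)
        self._log[ip] = recent
        return len(recent) <= self.max_requests
```

As in the middleware, the counts live in process memory, so each worker process would keep its own window for a given IP.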
@@ -59,40 +79,57 @@ class RateLimitMiddleware:
59
79
 
60
80
 
61
81
  class AIAnomalyMiddleware(MiddlewareMixin):
62
- WINDOW_SECONDS = getattr(settings, "AIWAF_WINDOW_SECONDS", 60)
82
+ WINDOW = getattr(settings, "AIWAF_WINDOW_SECONDS", 60)
83
+ TOP_N = getattr(settings, "AIWAF_DYNAMIC_TOP_N", 10)
84
+
63
85
  def process_request(self, request):
64
86
  ip = get_ip(request)
65
87
  if BlacklistManager.is_blocked(ip):
66
88
  return JsonResponse({"error": "blocked"}, status=403)
89
+
67
90
  now = time.time()
68
91
  key = f"aiwaf:{ip}"
69
92
  data = cache.get(key, [])
93
+ # TODO: you may want to capture real status & response_time in process_response
70
94
  data.append((now, request.path, 0, 0.0))
71
- data = [d for d in data if now - d[0] < self.WINDOW_SECONDS]
72
- cache.set(key, data, timeout=self.WINDOW_SECONDS)
95
+ data = [d for d in data if now - d[0] < self.WINDOW]
96
+ cache.set(key, data, timeout=self.WINDOW)
97
+
98
+ # update dynamic‐keyword counts
99
+ for seg in re.split(r"\W+", request.path.lower()):
100
+ if len(seg) > 3:
101
+ obj, _ = DynamicKeyword.objects.get_or_create(keyword=seg)
102
+ DynamicKeyword.objects.filter(pk=obj.pk).update(count=F("count") + 1)
103
+
73
104
  if len(data) < 5:
74
105
  return None
75
- total = len(data)
76
- ratio_404 = sum(1 for (_, _, st, _) in data if st == 404) / total
77
- hits = sum(
78
- any(k in path.lower() for k in settings.AIWAF_MALICIOUS_KEYWORDS)
79
- for (_, path, _, _) in data
106
+
107
+ # pull top‐N dynamic tokens
108
+ top_dynamic = list(
109
+ DynamicKeyword.objects
110
+ .order_by("-count")
111
+ .values_list("keyword", flat=True)[: self.TOP_N]
80
112
  )
81
- avg_rt = np.mean([rt for (_, _, _, rt) in data]) if data else 0.0
82
- intervals = [
83
- data[i][0] - data[i-1][0] for i in range(1, total)
84
- ]
85
- avg_iv = np.mean(intervals) if intervals else 0.0
86
- X = np.array([[total, ratio_404, hits, avg_rt, avg_iv]], dtype=float)
113
+ ALL_KW = set(STATIC_KW) | set(top_dynamic)
114
+
115
+ total = len(data)
116
+ ratio404 = sum(1 for (_, _, st, _) in data if st == 404) / total
117
+ hits = sum(any(kw in path.lower() for kw in ALL_KW) for (_, path, _, _) in data)
118
+ avg_rt = np.mean([rt for (_, _, _, rt) in data]) if data else 0.0
119
+ ivs = [data[i][0] - data[i - 1][0] for i in range(1, total)]
120
+ avg_iv = np.mean(ivs) if ivs else 0.0
121
+
122
+ X = np.array([[total, ratio404, hits, avg_rt, avg_iv]], dtype=float)
87
123
  if MODEL.predict(X)[0] == -1:
88
124
  BlacklistManager.block(ip, "AI anomaly")
89
125
  return JsonResponse({"error": "blocked"}, status=403)
126
+
90
127
  return None
91
128
 
92
129
 
93
130
  class HoneypotMiddleware(MiddlewareMixin):
94
131
  def process_view(self, request, view_func, view_args, view_kwargs):
95
- trap = request.POST.get(settings.AIWAF_HONEYPOT_FIELD, "")
132
+ trap = request.POST.get(getattr(settings, "AIWAF_HONEYPOT_FIELD", "hp_field"), "")
96
133
  if trap:
97
134
  ip = get_ip(request)
98
135
  BlacklistManager.block(ip, "HONEYPOT triggered")
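The five-element vector that `AIAnomalyMiddleware.process_request` passes to `MODEL.predict` can be computed with the standard library alone, which makes it easier to inspect. This is a sketch under that assumption; `window_features` is a hypothetical helper name, and `statistics.mean` stands in for `np.mean`.

```python
from statistics import mean

# Static keyword list copied from the hunk above.
STATIC_KW = [".php", "xmlrpc", "wp-", ".env", ".git", ".bak",
             "conflg", "shell", "filemanager"]


def window_features(data, keywords):
    """Build [total, ratio_404, hits, avg_rt, avg_gap] from request records.

    data: list of (timestamp, path, status, response_time) tuples, oldest first.
    """
    if not data:
        raise ValueError("need at least one record")
    total = len(data)
    ratio_404 = sum(1 for (_, _, st, _) in data if st == 404) / total
    # Number of requests whose path contains at least one keyword.
    hits = sum(any(kw in path.lower() for kw in keywords) for (_, path, _, _) in data)
    avg_rt = mean(rt for (_, _, _, rt) in data)
    gaps = [data[i][0] - data[i - 1][0] for i in range(1, total)]
    avg_gap = mean(gaps) if gaps else 0.0
    return [float(total), ratio_404, float(hits), avg_rt, avg_gap]
```

Two things stand out when this is compared with the rest of the diff. The middleware stores `0` and `0.0` as status and response time (see the TODO in the hunk), so `ratio404` and `avg_rt` are always zero at request time. The new `trainer.py` also fits on six columns (`path_len`, `kw_hits`, `resp_time`, `status_idx`, `burst_count`, `total_404`), which do not correspond to these five.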
@@ -105,11 +142,13 @@ class UUIDTamperMiddleware(MiddlewareMixin):
105
142
  uid = view_kwargs.get("uuid")
106
143
  if not uid:
107
144
  return None
145
+
108
146
  ip = get_ip(request)
109
- app_label = view_kwargs.get("app_label") or view_func.__module__.split('.')[0]
110
- app_config = apps.get_app_config(app_label)
111
- for Model in app_config.get_models():
147
+ app_label = view_func.__module__.split(".")[0]
148
+ app_cfg = apps.get_app_config(app_label)
149
+ for Model in app_cfg.get_models():
112
150
  if Model.objects.filter(pk=uid).exists():
113
151
  return None
152
+
114
153
  BlacklistManager.block(ip, "UUID tampering")
115
- return JsonResponse({"error": "blocked"}, status=403)
154
+ return JsonResponse({"error": "blocked"}, status=403)
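Several hunks in this file switch to `getattr(settings, NAME, default)` so that a missing setting falls back to a built-in value. A minimal sketch of that pattern follows. `SimpleNamespace` stands in for `django.conf.settings`, `resolve` is a hypothetical helper, and the defaults are the ones shown in the README's configuration block.

```python
from types import SimpleNamespace

# Defaults as documented in the README ("Optional (defaults shown)").
DEFAULTS = {
    "AIWAF_RATE_WINDOW": 10,
    "AIWAF_RATE_MAX": 20,
    "AIWAF_RATE_FLOOD": 10,
    "AIWAF_WINDOW_SECONDS": 60,
    "AIWAF_HONEYPOT_FIELD": "hp_field",
}


def resolve(settings, name):
    """Return the project's value for `name`, or the documented default."""
    return getattr(settings, name, DEFAULTS[name])


# Stand-in for django.conf.settings with a single override.
settings = SimpleNamespace(AIWAF_RATE_MAX=50)
```

By contrast, the `RateLimitMiddleware` hunk replaces its `getattr` lookups with the literals `10`, `20`, and `10`, so as far as this diff shows the `AIWAF_RATE_*` settings would no longer take effect in 0.1.3.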
{aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf/models.py CHANGED
@@ -26,3 +26,11 @@ class BlacklistEntry(models.Model):
26
26
 
27
27
  def __str__(self):
28
28
  return f"{self.ip_address} ({self.reason})"
29
+
30
+ class DynamicKeyword(models.Model):
31
+ keyword = models.CharField(max_length=100, unique=True)
32
+ count = models.PositiveIntegerField(default=0)
33
+ last_updated = models.DateTimeField(auto_now=True)
34
+
35
+ class Meta:
36
+ ordering = ['-count']
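The `DynamicKeyword` model stores one counter per path token, which the middleware increments on every request and later reads back as a top-N list. The sketch below imitates that flow in memory, with `collections.Counter` standing in for the database table; `KeywordCounts` is a hypothetical name, while the tokenisation rule (`re.split(r"\W+", ...)`, length greater than 3) is copied from the middleware hunk.

```python
import re
from collections import Counter

# Static keyword list copied from middleware.py in this diff.
STATIC_KW = [".php", "xmlrpc", "wp-", ".env", ".git", ".bak",
             "conflg", "shell", "filemanager"]


class KeywordCounts:
    """In-memory stand-in for the DynamicKeyword table."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, path):
        """Count every path segment longer than three characters."""
        for seg in re.split(r"\W+", path.lower()):
            if len(seg) > 3:
                self.counts[seg] += 1

    def top(self, n=10):
        """Return the n most frequently seen segments."""
        return [kw for kw, _ in self.counts.most_common(n)]


def all_keywords(counts, n=10):
    """Union of the static list and the current top-n dynamic tokens."""
    return set(STATIC_KW) | set(counts.top(n))
```

Because the middleware counts tokens from every request rather than only failing ones, frequently visited legitimate paths can also rise into the top-N set.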
aiwaf-0.1.3/aiwaf/trainer.py ADDED
@@ -0,0 +1,175 @@
1
+ import os
2
+ import glob
3
+ import gzip
4
+ import re
5
+ import json
6
+ import joblib
7
+
8
+ from datetime import datetime
9
+ from collections import defaultdict, Counter
10
+
11
+ import pandas as pd
12
+ from sklearn.ensemble import IsolationForest
13
+
14
+ from django.conf import settings
15
+ from django.apps import apps
16
+
17
+ # ─── CONFIG ────────────────────────────────────────────────────────────────
18
+
19
+ # Where to read your access logs (and rotated/.gz siblings)
20
+ LOG_PATH = settings.AIWAF_ACCESS_LOG
21
+
22
+ # Where we save our trained model
23
+ MODEL_PATH = os.path.join(
24
+ os.path.dirname(__file__),
25
+ "resources",
26
+ "model.pkl"
27
+ )
28
+
29
+ # Static “malicious” path keywords & file extensions
30
+ MALICIOUS_KEYWORDS = [
31
+ ".php", "xmlrpc", "wp-", ".env", ".git", ".bak",
32
+ "conflg", "shell", "filemanager"
33
+ ]
34
+ STATUS_CODES = ["200", "403", "404", "500"]
35
+
36
+ # Regex for combined log with response-time=…
37
+ _LOG_RX = re.compile(
38
+ r'(\d+\.\d+\.\d+\.\d+).*\[(.*?)\].*"(?:GET|POST) (.*?) HTTP/.*?" '
39
+ r'(\d{3}).*?"(.*?)" "(.*?)".*?response-time=(\d+\.\d+)'
40
+ )
41
+
42
+ # Your Django model for storing blocked IPs
43
+ BlacklistEntry = apps.get_model("aiwaf", "BlacklistEntry")
44
+
45
+
46
+ # ─── READ & PARSE LOG LINES ─────────────────────────────────────────────────
47
+
48
+ def _read_all_logs():
49
+ lines = []
50
+ if LOG_PATH and os.path.exists(LOG_PATH):
51
+ with open(LOG_PATH, "r", errors="ignore") as f:
52
+ lines += f.readlines()
53
+ for path in sorted(glob.glob(LOG_PATH + ".*")):
54
+ opener = gzip.open if path.endswith(".gz") else open
55
+ try:
56
+ with opener(path, "rt", errors="ignore") as f:
57
+ lines += f.readlines()
58
+ except OSError:
59
+ continue
60
+ return lines
61
+
62
+ def _parse(line):
63
+ m = _LOG_RX.search(line)
64
+ if not m:
65
+ return None
66
+ ip, ts_str, path, status, ref, ua, rt = m.groups()
67
+ try:
68
+ ts = datetime.strptime(ts_str.split()[0], "%d/%b/%Y:%H:%M:%S")
69
+ except ValueError:
70
+ return None
71
+ return {
72
+ "ip": ip,
73
+ "timestamp": ts,
74
+ "path": path,
75
+ "status": status,
76
+ "ua": ua,
77
+ "response_time": float(rt),
78
+ }
79
+
80
+
81
+ # ─── TRAIN ENTRYPOINT ───────────────────────────────────────────────────────
82
+
83
+ def train():
84
+ raw = _read_all_logs()
85
+ if not raw:
86
+ print("❌ No log lines found – check settings.AIWAF_ACCESS_LOG")
87
+ return
88
+
89
+ parsed = []
90
+ ip_404 = defaultdict(int)
91
+ ip_times = defaultdict(list)
92
+
93
+ # parse + accumulate timestamps & 404 counts
94
+ for ln in raw:
95
+ rec = _parse(ln)
96
+ if not rec:
97
+ continue
98
+ parsed.append(rec)
99
+ ip_times[rec["ip"]].append(rec["timestamp"])
100
+ if rec["status"] == "404":
101
+ ip_404[rec["ip"]] += 1
102
+
103
+ # auto-block IPs with >=6 total 404s
104
+ newly_blocked = []
105
+ for ip, cnt in ip_404.items():
106
+ if cnt >= 6:
107
+ obj, created = BlacklistEntry.objects.get_or_create(
108
+ ip_address=ip,
109
+ defaults={"reason": "Excessive 404s (≥6)"}
110
+ )
111
+ if created:
112
+ newly_blocked.append(ip)
113
+ if newly_blocked:
114
+ print(f"🔒 Blocked {len(newly_blocked)} IPs for 404 flood: {newly_blocked}")
115
+
116
+ # build feature vectors
117
+ rows = []
118
+ for r in parsed:
119
+ ip = r["ip"]
120
+ burst = sum(
121
+ 1 for t in ip_times[ip]
122
+ if (r["timestamp"] - t).total_seconds() <= 10
123
+ )
124
+ total404 = ip_404[ip]
125
+ kw_hits = sum(k in r["path"].lower() for k in MALICIOUS_KEYWORDS)
126
+ status_idx = STATUS_CODES.index(r["status"]) if r["status"] in STATUS_CODES else -1
127
+
128
+ rows.append([
129
+ len(r["path"]),
130
+ kw_hits,
131
+ r["response_time"],
132
+ status_idx,
133
+ burst,
134
+ total404
135
+ ])
136
+
137
+ if not rows:
138
+ print("⚠️ No entries to train on.")
139
+ return
140
+
141
+ df = pd.DataFrame(
142
+ rows,
143
+ columns=[
144
+ "path_len", "kw_hits", "resp_time",
145
+ "status_idx", "burst_count", "total_404"
146
+ ]
147
+ ).fillna(0).astype(float)
148
+
149
+ # train & save
150
+ clf = IsolationForest(contamination=0.01, random_state=42)
151
+ clf.fit(df.values)
152
+ os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
153
+ joblib.dump(clf, MODEL_PATH)
154
+ print(f"✅ Model trained on {len(df)} samples → {MODEL_PATH}")
155
+
156
+ # extract top‑10 dynamic keywords from 4xx/5xx paths
157
+ tokens = Counter()
158
+ for r in parsed:
159
+ if r["status"].startswith(("4", "5")):
160
+ segs = re.split(r"\W+", r["path"].lower())
161
+ for seg in segs:
162
+ if len(seg) > 3 and seg not in MALICIOUS_KEYWORDS:
163
+ tokens[seg] += 1
164
+
165
+ new_kw = [kw for kw, _ in tokens.most_common(10)]
166
+ DK_FILE = os.path.join(os.path.dirname(__file__), "resources", "dynamic_keywords.json")
167
+ try:
168
+ existing = set(json.load(open(DK_FILE)))
169
+ except FileNotFoundError:
170
+ existing = set()
171
+ updated = sorted(existing | set(new_kw))
172
+ with open(DK_FILE, "w") as f:
173
+ json.dump(updated, f, indent=2)
174
+
175
+ print(f"📝 Updated dynamic keywords: {new_kw}")
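To see which log lines `_parse` accepts, the regular expression from `trainer.py` can be run on sample input. The pattern below is copied from the diff; the two log lines are made-up examples and `parse` is a standalone copy of the parsing step, not the packaged function.

```python
import re
from datetime import datetime

# Pattern copied verbatim from trainer.py in this diff.
LOG_RX = re.compile(
    r'(\d+\.\d+\.\d+\.\d+).*\[(.*?)\].*"(?:GET|POST) (.*?) HTTP/.*?" '
    r'(\d{3}).*?"(.*?)" "(.*?)".*?response-time=(\d+\.\d+)'
)


def parse(line):
    """Return a dict of parsed fields, or None when the line does not match."""
    m = LOG_RX.search(line)
    if not m:
        return None
    ip, ts_str, path, status, _referer, ua, rt = m.groups()
    try:
        # Drop the timezone offset and parse the remaining timestamp.
        ts = datetime.strptime(ts_str.split()[0], "%d/%b/%Y:%H:%M:%S")
    except (ValueError, IndexError):
        return None
    return {"ip": ip, "timestamp": ts, "path": path, "status": status,
            "ua": ua, "response_time": float(rt)}


# Made-up sample lines (documentation IP range), for illustration only.
WITH_RT = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
           '"GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0" response-time=0.004')
WITHOUT_RT = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
              '"GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"')
HEAD_REQ = WITH_RT.replace('"GET ', '"HEAD ')
```

Only the first sample parses. A line without a trailing `response-time=` field, or with a method other than `GET` or `POST`, is skipped, so the access log format would need to include that field for training to see any rows.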
aiwaf-0.1.3/aiwaf.egg-info/PKG-INFO ADDED
@@ -0,0 +1,181 @@
1
+ Metadata-Version: 2.4
2
+ Name: aiwaf
3
+ Version: 0.1.3
4
+ Summary: AI-powered Web Application Firewall
5
+ Author: Aayush Gauba
6
+ Author-email: Aayush Gauba <gauba.aayush@gmail.com>
7
+ License: MIT
8
+ Requires-Python: >=3.8
9
+ Description-Content-Type: text/markdown
10
+ Dynamic: author
11
+
12
+ # AI‑WAF
13
+
14
+ > A self‑learning, Django‑friendly Web Application Firewall
15
+ > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, and daily retraining.
16
+
17
+ ---
18
+
19
+ ## Package Structure
20
+
21
+ ```
22
+ aiwaf/
23
+ ├── __init__.py
24
+ ├── blacklist_manager.py
25
+ ├── middleware.py
26
+ ├── trainer.py # exposes train()
27
+ ├── utils.py
28
+ ├── template_tags/
29
+ │ └── aiwaf_tags.py
30
+ ├── resources/
31
+ │ ├── model.pkl # pre‑trained base model
32
+ │ └── dynamic_keywords.json # evolves daily
33
+ ├── management/
34
+ │ └── commands/
35
+ │ └── detect_and_train.py # `python manage.py detect_and_train`
36
+ └── LICENSE
37
+ ```
38
+
39
+ ---
40
+
41
+ ## Features
42
+
43
+ - **IP Blocklist**
44
+ Instantly blocks suspicious IPs (supports CSV fallback or Django model).
45
+
46
+ - **Rate Limiting**
47
+ Sliding‑window blocks flooders (> `AIWAF_RATE_MAX` per `AIWAF_RATE_WINDOW`), then blacklists them.
48
+
49
+ - **AI Anomaly Detection**
50
+ IsolationForest on features:
51
+ - Path length
52
+ - Keyword hits (static + dynamic)
53
+ - Response time
54
+ - Status‑code index
55
+ - Burst count
56
+ - Total 404s
57
+
58
+ - **Dynamic Keyword Extraction**
59
+ Every retrain: top 10 most frequent “words” from 4xx/5xx paths are appended to your malicious keyword set.
60
+
61
+ - **File‑Extension Probing Detection**
62
+ Tracks repeated 404s on common web‑extensions (e.g. `.php`, `.asp`) and auto‑blocks after a burst.
63
+
64
+ - **Honeypot Field**
65
+ Hidden form field (via template tag) that bots fill → instant block.
66
+
67
+ - **UUID Tampering Protection**
68
+ Any `<uuid:…>` URL that doesn’t map to **any** model in its Django app gets blocked.
69
+
70
+ - **Daily Retraining**
71
+ Reads rotated/gzipped logs, auto‑blocks 404 floods (≥6), retrains the model, updates `model.pkl` + `dynamic_keywords.json`.
72
+
73
+ ---
74
+
75
+ ## Installation
76
+
77
+ ```bash
78
+ # From PyPI
79
+ pip install aiwaf
80
+
81
+ # Or for local development
82
+ git clone https://github.com/aayushgauba/aiwaf.git
83
+ cd aiwaf
84
+ pip install -e .
85
+ ```
86
+
87
+ ---
88
+
89
+ ## ⚙️ Configuration (`settings.py`)
90
+
91
+ ```python
92
+ INSTALLED_APPS += ["aiwaf"]
93
+
94
+ # Required
95
+ AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
96
+
97
+ # Optional (defaults shown)
98
+ AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
99
+ AIWAF_HONEYPOT_FIELD = "hp_field"
100
+ AIWAF_RATE_WINDOW = 10 # seconds
101
+ AIWAF_RATE_MAX = 20 # max reqs/window
102
+ AIWAF_RATE_FLOOD = 10 # flood threshold
103
+ AIWAF_WINDOW_SECONDS = 60 # anomaly window
104
+ AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"] # 404‑burst tracked extensions
105
+ ```
106
+
107
+ > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` in your settings — they’re built in and evolve dynamically.
108
+
109
+ ---
110
+
111
+ ## Middleware Setup
112
+
113
+ Add in **this** order to your `MIDDLEWARE` list:
114
+
115
+ ```python
116
+ MIDDLEWARE = [
117
+ "aiwaf.middleware.IPBlockMiddleware",
118
+ "aiwaf.middleware.RateLimitMiddleware",
119
+ "aiwaf.middleware.AIAnomalyMiddleware",
120
+ "aiwaf.middleware.HoneypotMiddleware",
121
+ "aiwaf.middleware.UUIDTamperMiddleware",
122
+ # ... other middleware ...
123
+ ]
124
+ ```
125
+
126
+ ---
127
+
128
+ ## Honeypot Field (in your template)
129
+
130
+ ```django
131
+ {% load aiwaf_tags %}
132
+
133
+ <form method="post">
134
+ {% csrf_token %}
135
+ {% honeypot_field %}
136
+ <!-- your real fields -->
137
+ </form>
138
+ ```
139
+
140
+ > Renders a hidden `<input name="hp_field" style="display:none">`.
141
+ > Any non‑empty submission → IP blacklisted.
142
+
143
+ ---
144
+
145
+ ## Running Detection & Training
146
+
147
+ ```bash
148
+ python manage.py detect_and_train
149
+ ```
150
+
151
+ **What happens:**
152
+ 1. Read access logs
153
+ 2. Auto‑block IPs with ≥ 6 total 404s
154
+ 3. Extract features & train IsolationForest
155
+ 4. Save `model.pkl`
156
+ 5. Extract top 10 dynamic keywords from 4xx/5xx
157
+
158
+ ---
159
+
160
+ ## How It Works
161
+
162
+ | Middleware | Purpose |
163
+ |--------------------------|------------------------------------------------------------------|
164
+ | IPBlockMiddleware | Blocks requests from known blacklisted IPs |
165
+ | RateLimitMiddleware | Enforces burst & flood thresholds |
166
+ | AIAnomalyMiddleware | ML‑driven behavior analysis + block on anomaly |
167
+ | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
168
+ | UUIDTamperMiddleware | Blocks guessed/nonexistent UUIDs across all models in an app |
169
+
170
+ ---
171
+
172
+ ## License
173
+
174
+ This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
175
+
176
+ ---
177
+
178
+ ## Credits
179
+
180
+ **AI‑WAF** by [Aayush Gauba](https://github.com/aayushgauba)
181
+ > “Let your firewall learn and evolve — keep your site a fortress.”
{aiwaf-0.1.0 → aiwaf-0.1.3}/aiwaf.egg-info/SOURCES.txt CHANGED
@@ -1,4 +1,5 @@
1
1
  README.md
2
+ pyproject.toml
2
3
  setup.py
3
4
  aiwaf/__init__.py
4
5
  aiwaf/apps.py
@@ -11,8 +12,6 @@ aiwaf/utils.py
11
12
  aiwaf.egg-info/PKG-INFO
12
13
  aiwaf.egg-info/SOURCES.txt
13
14
  aiwaf.egg-info/dependency_links.txt
14
- aiwaf.egg-info/entry_points.txt
15
- aiwaf.egg-info/requires.txt
16
15
  aiwaf.egg-info/top_level.txt
17
16
  aiwaf/management/__init__.py
18
17
  aiwaf/management/commands/__init__.py
aiwaf-0.1.3/pyproject.toml ADDED
@@ -0,0 +1,9 @@
1
+ [project]
2
+ name = "aiwaf"
3
+ version = "0.1.3"
4
+ description = "AI-powered Web Application Firewall"
5
+ readme = "README.md"
6
+ requires-python = ">=3.8"
7
+ license = {text = "MIT"}
8
+ authors = [{ name = "Aayush Gauba", email = "gauba.aayush@gmail.com" }]
9
+ dependencies = [ ]
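The new `pyproject.toml` declares `dependencies = [ ]`, and `requires.txt` is removed in this release. The 0.1.0 metadata listed `django>=3.0`, `scikit-learn`, `numpy`, `pandas`, and `joblib`, and the 0.1.3 code still imports all of them. Installing 0.1.3 into a clean environment may therefore leave these imports unsatisfied. The sketch below is one way a project could check for them up front; `missing_modules` is a hypothetical helper and not part of the package.

```python
from importlib.util import find_spec

# Top-level import names used by middleware.py and trainer.py in this diff.
RUNTIME_MODULES = ["django", "numpy", "joblib", "pandas", "sklearn"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(RUNTIME_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
```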
{aiwaf-0.1.0 → aiwaf-0.1.3}/setup.py CHANGED
@@ -1,9 +1,15 @@
1
1
  from setuptools import setup, find_packages
2
+ from pathlib import Path
3
+
4
+ this_directory = Path(__file__).parent
5
+ long_description = (this_directory / "README.md").read_text(encoding="utf-8")
2
6
 
3
7
  setup(
4
8
  name="aiwaf",
5
- version="0.1.0",
9
+ version="0.1.3",
6
10
  description="AI‑driven pluggable Web Application Firewall for Django (CSV or DB storage)",
11
+ long_description=long_description,
12
+ long_description_content_type="text/markdown", # <- required for markdown support
7
13
  author="Aayush Gauba",
8
14
  packages=find_packages(),
9
15
  package_data={
aiwaf-0.1.0/PKG-INFO DELETED
@@ -1,13 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: aiwaf
3
- Version: 0.1.0
4
- Summary: AI‑driven pluggable Web Application Firewall for Django (CSV or DB storage)
5
- Author: Aayush Gauba
6
- Requires-Dist: django>=3.0
7
- Requires-Dist: scikit-learn
8
- Requires-Dist: numpy
9
- Requires-Dist: pandas
10
- Requires-Dist: joblib
11
- Dynamic: author
12
- Dynamic: requires-dist
13
- Dynamic: summary
aiwaf-0.1.0/README.md DELETED
@@ -1,176 +0,0 @@
- # AI‑WAF
-
- > A self-learning, Django-friendly Web Application Firewall
- > with rate-limiting, anomaly detection, honeypots, UUID-tamper protection, and daily retraining.
-
- ---
-
- ## Package Structure
-
- ```
- aiwaf/
- ├── __init__.py
- ├── blacklist_manager.py
- ├── middleware.py
- ├── trainer.py # exposes detect_and_train()
- ├── utils.py
- ├── template_tags/
- │   └── aiwaf_tags.py
- ├── resources/
- │   └── model.pkl # pre-trained base model
- ├── management/
- │   └── commands/
- │       └── detect_and_train.py # python manage.py detect_and_train
- └── LICENSE
- ```
-
- ---
-
- ## Features
-
- - **IP Blocklist**
-   Automatically blocks suspicious IPs; optionally backed by CSV or Django model.
-
- - **Rate Limiting**
-   Sliding window logic blocks IPs exceeding a threshold of requests per second.
-
- - **AI Anomaly Detection**
-   IsolationForest trained on real logs with features like:
-   - Path length
-   - Keyword hits
-   - Response time
-   - Status code index
-   - Burst count
-   - Total 404s
-
- - **Honeypot Field**
-   Hidden form field that bots are likely to fill — if triggered, the IP is blocked.
-
- - **UUID Tampering Protection**
-   Detects if someone is probing by injecting random/nonexistent UUIDs into URLs.
-
- - **Daily Retraining**
-   A single command retrains your model every day based on your logs.
-
- ---
-
- ## Installation
-
- Install locally or from PyPI:
-
- ```bash
- pip install aiwaf
- ```
-
- Or for local dev:
-
- ```bash
- git clone https://github.com/aayushgauba/aiwaf.git
- cd aiwaf
- pip install -e .
- ```
-
- ---
-
- ## ⚙️ Configuration (`settings.py`)
-
- ```python
- INSTALLED_APPS += [
-     "aiwaf",
- ]
-
- # Required
- AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
-
- # Optional (defaults included)
- AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
- AIWAF_MALICIOUS_KEYWORDS = [".php", "xmlrpc", "wp-", ".env", ".git", ".bak", "conflg", "shell", "filemanager"]
- AIWAF_STATUS_CODES = ["200", "403", "404", "500"]
- AIWAF_HONEYPOT_FIELD = "hp_field"
- ```
-
- ---
-
- ## Middleware Setup
-
- Add to `MIDDLEWARE` in order:
-
- ```python
- MIDDLEWARE = [
-     "aiwaf.middleware.IPBlockMiddleware",
-     "aiwaf.middleware.RateLimitMiddleware",
-     "aiwaf.middleware.AIAnomalyMiddleware",
-     "aiwaf.middleware.HoneypotMiddleware",
-     "aiwaf.middleware.UUIDTamperMiddleware",
-     ...
- ]
- ```
-
- ---
-
- ## Honeypot Field (in template)
-
- ```html
- {% load aiwaf_tags %}
-
- <form method="post">
-   {% csrf_token %}
-   {% honeypot_field %}
-   <!-- other fields -->
- </form>
- ```
-
- The hidden field will be `<input type="hidden" name="hp_field">`.
- If it’s ever filled → IP gets blocked.
-
- ---
-
- ## Run Detection + Training
-
- ```bash
- python manage.py detect_and_train
- ```
-
- What it does:
-
- - Reads logs (supports `.gz` and rotated logs).
- - Detects excessive 404s (≥6) → instant block.
- - Builds feature vectors from logs.
- - Trains IsolationForest and saves `model.pkl`.
-
- Schedule it to run daily via `cron`, `Celery beat`, or systemd timer.
-
- ---
-
- ## How It Works (Simplified)
-
- | Middleware | Functionality |
- |------------------------|--------------------------------------------------------------|
- | IPBlockMiddleware | Blocks requests from known blacklisted IPs |
- | RateLimitMiddleware | Blocks flooders (>20/10s) and blacklists them (>10/10s) |
- | AIAnomalyMiddleware | Uses ML to detect suspicious behavior in request patterns |
- | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
- | UUIDTamperMiddleware | Detects guessing/probing by checking invalid UUID access |
-
- ---
-
- ## Development Roadmap
-
- - [ ] Add CSV blocklist fallback
- - [ ] Admin dashboard integration
- - [ ] Auto-pruning of old block entries
- - [ ] Real-time log streaming compatibility
- - [ ] Docker/Helm deployment guide
-
- ---
-
- ## License
-
- This project is licensed under the **MIT License** — see `LICENSE` for details.
-
- ---
-
- ## Credits
-
- **AIWAF** by [Aayush Gauba](https://github.com/aayushgauba)
- > "Let your firewall learn and evolve with your logs. Make your site a fortress."
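The removed README describes rate limiting only as "sliding window logic" and does not show it. The following is a minimal sketch of that idea, assuming a per-IP queue of request timestamps. The class name `SlidingWindowLimiter`, its parameters, and the default of 20 requests per 10 seconds are illustrative choices loosely based on the README's table; this is not the code in `aiwaf/middleware.py`, which this part of the diff does not include.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_requests` per IP per window."""

    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of request timestamps (in seconds) per client IP.
        self._hits = defaultdict(deque)

    def allow(self, ip, now):
        """Record a request at time `now`; return False once the IP is over the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) <= self.max_requests


# Usage with explicit timestamps so the behaviour is deterministic.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10)
decisions = [limiter.allow("198.51.100.1", t) for t in (0, 1, 2, 3)]
```

Passing the clock value in as `now` keeps the sketch testable; a middleware would more likely read `time.monotonic()` itself.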
aiwaf-0.1.0/aiwaf/trainer.py DELETED
@@ -1,123 +0,0 @@
- # aiwaf/trainer.py
-
- import os
- import glob
- import gzip
- import re
- import joblib
- from datetime import datetime
- from collections import defaultdict
- from .models import BlacklistEntry
- import pandas as pd
- from sklearn.ensemble import IsolationForest
- from django.conf import settings
- from django.apps import apps
-
- LOG_PATH = settings.AIWAF_ACCESS_LOG
- MODEL_PATH = os.path.join(
-     os.path.dirname(__file__),
-     "resources",
-     "model.pkl"
- )
- MALICIOUS_KEYWORDS = [".php", "xmlrpc", "wp-", ".env", ".git", ".bak", "conflg", "shell", "filemanager"]
- STATUS_CODES = ["200", "403", "404", "500"]
- _LOG_RX = re.compile(
-     r'(\d+\.\d+\.\d+\.\d+).*\[(.*?)\].*"(?:GET|POST) (.*?) HTTP/.*?" (\d{3}).*?"(.*?)" "(.*?)".*?response-time=(\d+\.\d+)'
- )
- BlacklistedIP = BlacklistEntry.objects.all()
- def _read_all_logs():
-     lines = []
-     if LOG_PATH and os.path.exists(LOG_PATH):
-         with open(LOG_PATH, "r", errors="ignore") as f:
-             lines += f.readlines()
-     for path in sorted(glob.glob(LOG_PATH + ".*")):
-         opener = gzip.open if path.endswith(".gz") else open
-         try:
-             with opener(path, "rt", errors="ignore") as f:
-                 lines += f.readlines()
-         except OSError:
-             continue
-     return lines
-
- def _parse(line):
-     m = _LOG_RX.search(line)
-     if not m:
-         return None
-     ip, ts_str, path, status, ref, ua, rt = m.groups()
-     try:
-         ts = datetime.strptime(ts_str.split()[0], "%d/%b/%Y:%H:%M:%S")
-     except ValueError:
-         return None
-     return {
-         "ip": ip,
-         "timestamp": ts,
-         "path": path,
-         "status": status,
-         "ua": ua,
-         "response_time": float(rt),
-     }
-
-
- def train():
-     raw = _read_all_logs()
-     if not raw:
-         print("No log lines found – check AIWAF_ACCESS_LOG")
-         return
-     parsed = []
-     ip_404 = defaultdict(int)
-     ip_times = defaultdict(list)
-     for ln in raw:
-         rec = _parse(ln)
-         if not rec:
-             continue
-         parsed.append(rec)
-         ip_times[rec["ip"]].append(rec["timestamp"])
-         if rec["status"] == "404":
-             ip_404[rec["ip"]] += 1
-     blocked = []
-     for ip, count in ip_404.items():
-         if count >= 6:
-             obj, created = BlacklistEntry.objects.get_or_create(
-                 ip_address=ip,
-                 defaults={"reason": "Excessive 404s (≥6)"}
-             )
-             if created:
-                 blocked.append(ip)
-     if blocked:
-         print(f"Auto‑blocked {len(blocked)} IPs for ≥6 404s: {', '.join(blocked)}")
-     rows = []
-     for r in parsed:
-         ip = r["ip"]
-         burst = sum(
-             1 for t in ip_times[ip]
-             if (r["timestamp"] - t).total_seconds() <= 10
-         )
-         total404 = ip_404[ip]
-         kw_hits = sum(k in r["path"].lower() for k in MALICIOUS_KEYWORDS)
-         status_idx = STATUS_CODES.index(r["status"]) if r["status"] in STATUS_CODES else -1
-         rows.append([
-             len(r["path"]),
-             kw_hits,
-             r["response_time"],
-             status_idx,
-             burst,
-             total404
-         ])
-
-     if not rows:
-         print("No entries to train on!")
-         return
-
-     df = pd.DataFrame(
-         rows,
-         columns=[
-             "path_len", "kw_hits", "resp_time",
-             "status_idx", "burst_count", "total_404"
-         ]
-     ).fillna(0).astype(float)
-     clf = IsolationForest(contamination=0.01, random_state=42)
-     clf.fit(df.values)
-     os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
-     joblib.dump(clf, MODEL_PATH)
-     print(f"Model trained on {len(df)} samples and saved to {MODEL_PATH}")
-
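The parsing and burst-count steps of the removed trainer can be exercised without Django or scikit-learn. The sketch below reuses the `_LOG_RX` pattern from the deleted file verbatim; the sample log line, the names `parse_line` and `burst_count`, and the documentation-range IP address are hypothetical. One deliberate difference, which is an assumption about intent: the removed code tests only `(r["timestamp"] - t).total_seconds() <= 10`, which is also true for requests later than the current one, so this sketch restricts the window to the preceding 10 seconds.

```python
import re
from datetime import datetime, timedelta

# Pattern copied from the removed aiwaf/trainer.py.
LOG_RX = re.compile(
    r'(\d+\.\d+\.\d+\.\d+).*\[(.*?)\].*"(?:GET|POST) (.*?) HTTP/.*?" (\d{3}).*?"(.*?)" "(.*?)".*?response-time=(\d+\.\d+)'
)

# Hypothetical access-log line in the shape the pattern expects.
SAMPLE = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0" response-time=0.004'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RX.search(line)
    if not m:
        return None
    ip, ts_str, path, status, _referer, ua, rt = m.groups()
    try:
        # Drop the timezone offset, as the removed code does.
        ts = datetime.strptime(ts_str.split()[0], "%d/%b/%Y:%H:%M:%S")
    except ValueError:
        return None
    return {"ip": ip, "timestamp": ts, "path": path,
            "status": status, "ua": ua, "response_time": float(rt)}


def burst_count(timestamps, now, window_seconds=10):
    """Count requests in the `window_seconds` up to and including `now`."""
    return sum(
        1 for t in timestamps
        if 0 <= (now - t).total_seconds() <= window_seconds
    )


record = parse_line(SAMPLE)
base = record["timestamp"]
times = [base, base + timedelta(seconds=5), base + timedelta(seconds=30)]
```

Here `record` holds the parsed fields of the sample line, and `burst_count(times, times[1])` counts the two requests that fall within the 10 seconds ending at the second timestamp.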
aiwaf-0.1.0/aiwaf.egg-info/PKG-INFO DELETED
@@ -1,13 +0,0 @@
- Metadata-Version: 2.4
- Name: aiwaf
- Version: 0.1.0
- Summary: AI‑driven pluggable Web Application Firewall for Django (CSV or DB storage)
- Author: Aayush Gauba
- Requires-Dist: django>=3.0
- Requires-Dist: scikit-learn
- Requires-Dist: numpy
- Requires-Dist: pandas
- Requires-Dist: joblib
- Dynamic: author
- Dynamic: requires-dist
- Dynamic: summary
aiwaf-0.1.0/aiwaf.egg-info/entry_points.txt DELETED
@@ -1,2 +0,0 @@
- [console_scripts]
- aiwaf-detect = aiwaf.trainer:detect_and_train
aiwaf-0.1.0/aiwaf.egg-info/requires.txt DELETED
@@ -1,5 +0,0 @@
- django>=3.0
- scikit-learn
- numpy
- pandas
- joblib
File without changes