aiwaf 0.1.7__tar.gz → 0.1.7.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of aiwaf might be problematic.

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: aiwaf
- Version: 0.1.7
+ Version: 0.1.7.2
  Summary: AI-powered Web Application Firewall
  Home-page: https://github.com/aayushgauba/aiwaf
  Author: Aayush Gauba
@@ -15,14 +15,14 @@ Dynamic: license-file
  Dynamic: requires-python
  
  
- # AI‑WAF
+ # AI‑WAF
  
  > A self‑learning, Django‑friendly Web Application Firewall
- > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, and daily retraining.
+ > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, exempt path awareness, and daily retraining.
  
  ---
  
- ## Package Structure
+ ## 📁 Package Structure
  
  ```
  aiwaf/
@@ -44,7 +44,7 @@ aiwaf/
  
  ---
  
- ## Features
+ ## 🚀 Features
  
  - **IP Blocklist**
  Instantly blocks suspicious IPs (supports CSV fallback or Django model).
@@ -53,7 +53,7 @@ aiwaf/
  Sliding‑window blocks flooders (> `AIWAF_RATE_MAX` per `AIWAF_RATE_WINDOW`), then blacklists them.
  
  - **AI Anomaly Detection**
- IsolationForest on features:
+ IsolationForest trained on:
  - Path length
  - Keyword hits (static + dynamic)
  - Response time
@@ -61,34 +61,28 @@ aiwaf/
  - Burst count
  - Total 404s
  
- - **Dynamic Keyword Extraction**
- Every retrain: top 10 most frequent “words” from 4xx/5xx paths are appended to your malicious keyword set.
+ - **Dynamic Keyword Extraction & Cleanup**
+ - Every retrain adds top 10 keyword segments from 4xx/5xx paths
+ - **If a path is added to `AIWAF_EXEMPT_PATHS`, its keywords are automatically removed from the database**
  
  - **File‑Extension Probing Detection**
- Tracks repeated 404s on common web‑extensions (e.g. `.php`, `.asp`) and auto‑blocks after a burst.
+ Tracks repeated 404s on common extensions (e.g. `.php`, `.asp`) and blocks IPs.
  
  - **Honeypot Field**
- Hidden form field (via template tag) that bots fill instant block.
+ Hidden field for bot detection IP blacklisted on fill.
  
  - **UUID Tampering Protection**
- Any `<uuid:…>` URL that doesn’t map to **any** model in its Django app gets blocked.
+ Blocks guessed or invalid UUIDs that don’t resolve to real models.
  
- - **Daily Retraining**
- Reads rotated/gzipped logs, auto‑blocks 404 floods (≥6), retrains the model, updates `model.pkl` + `dynamic_keywords.json`.
-
- ---
-
- ## Installation
-
- ```bash
- # From PyPI
- pip install aiwaf
+ - **Exempt Path Awareness**
+ Fully respects `AIWAF_EXEMPT_PATHS` across all modules exempt paths are:
+ - Skipped from keyword learning
+ - Immune to AI blocking
+ - Ignored in log training
+ - Cleaned from `DynamicKeyword` model automatically
  
- # Or for local development
- git clone https://github.com/aayushgauba/aiwaf.git
- cd aiwaf
- pip install -e .
- ```
+ - **Daily Retraining**
+ Reads rotated logs, auto‑blocks 404 floods, retrains the IsolationForest, updates `model.pkl`, and evolves the keyword DB.
  
  ---
  
@@ -96,33 +90,51 @@ pip install -e .
  
  ```python
  INSTALLED_APPS += ["aiwaf"]
+ ```
  
  ### Database Setup
  
- After adding `aiwaf` to your `INSTALLED_APPS`, create the necessary tables for the IP‐blacklist and dynamic‐keyword models:
+ After adding `aiwaf` to your `INSTALLED_APPS`, run the following to create the necessary tables:
  
  ```bash
  python manage.py makemigrations aiwaf
  python manage.py migrate
+ ```
  
- # Required
+ ---
+
+ ### Required
+
+ ```python
  AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
+ ```
+
+ ---
+
+ ### Optional (defaults shown)
  
- # Optional (defaults shown)
+ ```python
  AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
  AIWAF_HONEYPOT_FIELD = "hp_field"
  AIWAF_RATE_WINDOW = 10 # seconds
- AIWAF_RATE_MAX = 20 # max reqs/window
+ AIWAF_RATE_MAX = 20 # max requests per window
  AIWAF_RATE_FLOOD = 10 # flood threshold
- AIWAF_WINDOW_SECONDS = 60 # anomaly window
- AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"] # 404‑burst tracked extensions
+ AIWAF_WINDOW_SECONDS = 60 # anomaly detection window
+ AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"]
+ AIWAF_EXEMPT_PATHS = [ # optional but highly recommended
+     "/favicon.ico",
+     "/robots.txt",
+     "/static/",
+     "/media/",
+     "/health/",
+ ]
  ```
  
- > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` in your settings — they’re built in and evolve dynamically.
+ > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` — they evolve dynamically.
  
  ---
  
- ## Middleware Setup
+ ## 🧱 Middleware Setup
  
  Add in **this** order to your `MIDDLEWARE` list:
  
@@ -139,7 +151,7 @@ MIDDLEWARE = [
  
  ---
  
- ## Honeypot Field (in your template)
+ ## 🕵️ Honeypot Field (in your template)
  
  ```django
  {% load aiwaf_tags %}
@@ -156,22 +168,23 @@ MIDDLEWARE = [
  
  ---
  
- ## Running Detection & Training
+ ## 🔁 Running Detection & Training
  
  ```bash
  python manage.py detect_and_train
  ```
  
- **What happens:**
- 1. Read access logs
+ ### What happens:
+ 1. Read access logs (incl. rotated or gzipped)
  2. Auto‑block IPs with ≥ 6 total 404s
  3. Extract features & train IsolationForest
  4. Save `model.pkl`
  5. Extract top 10 dynamic keywords from 4xx/5xx
+ 6. Remove any keywords associated with newly exempt paths
  
  ---
  
- ## How It Works
+ ## 🧠 How It Works
  
  | Middleware | Purpose |
  |------------------------------------|-----------------------------------------------------------------|
@@ -180,15 +193,16 @@ python manage.py detect_and_train
  | AIAnomalyMiddleware | ML‑driven behavior analysis + block on anomaly |
  | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
  | UUIDTamperMiddleware | Blocks guessed/nonexistent UUIDs across all models in an app |
+
  ---
  
- ## License
+ ## 📄 License
  
  This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
  
  ---
  
- ## Credits
+ ## 👤 Credits
  
  **AI‑WAF** by [Aayush Gauba](https://github.com/aayushgauba)
  > “Let your firewall learn and evolve — keep your site a fortress.”
@@ -1,12 +1,12 @@
  
- # AI‑WAF
+ # AI‑WAF
  
  > A self‑learning, Django‑friendly Web Application Firewall
- > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, and daily retraining.
+ > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, exempt path awareness, and daily retraining.
  
  ---
  
- ## Package Structure
+ ## 📁 Package Structure
  
  ```
  aiwaf/
@@ -28,7 +28,7 @@ aiwaf/
  
  ---
  
- ## Features
+ ## 🚀 Features
  
  - **IP Blocklist**
  Instantly blocks suspicious IPs (supports CSV fallback or Django model).
@@ -37,7 +37,7 @@ aiwaf/
  Sliding‑window blocks flooders (> `AIWAF_RATE_MAX` per `AIWAF_RATE_WINDOW`), then blacklists them.
  
  - **AI Anomaly Detection**
- IsolationForest on features:
+ IsolationForest trained on:
  - Path length
  - Keyword hits (static + dynamic)
  - Response time
@@ -45,34 +45,28 @@ aiwaf/
  - Burst count
  - Total 404s
  
- - **Dynamic Keyword Extraction**
- Every retrain: top 10 most frequent “words” from 4xx/5xx paths are appended to your malicious keyword set.
+ - **Dynamic Keyword Extraction & Cleanup**
+ - Every retrain adds top 10 keyword segments from 4xx/5xx paths
+ - **If a path is added to `AIWAF_EXEMPT_PATHS`, its keywords are automatically removed from the database**
  
  - **File‑Extension Probing Detection**
- Tracks repeated 404s on common web‑extensions (e.g. `.php`, `.asp`) and auto‑blocks after a burst.
+ Tracks repeated 404s on common extensions (e.g. `.php`, `.asp`) and blocks IPs.
  
  - **Honeypot Field**
- Hidden form field (via template tag) that bots fill instant block.
+ Hidden field for bot detection IP blacklisted on fill.
  
  - **UUID Tampering Protection**
- Any `<uuid:…>` URL that doesn’t map to **any** model in its Django app gets blocked.
+ Blocks guessed or invalid UUIDs that don’t resolve to real models.
  
- - **Daily Retraining**
- Reads rotated/gzipped logs, auto‑blocks 404 floods (≥6), retrains the model, updates `model.pkl` + `dynamic_keywords.json`.
-
- ---
-
- ## Installation
-
- ```bash
- # From PyPI
- pip install aiwaf
+ - **Exempt Path Awareness**
+ Fully respects `AIWAF_EXEMPT_PATHS` across all modules exempt paths are:
+ - Skipped from keyword learning
+ - Immune to AI blocking
+ - Ignored in log training
+ - Cleaned from `DynamicKeyword` model automatically
  
- # Or for local development
- git clone https://github.com/aayushgauba/aiwaf.git
- cd aiwaf
- pip install -e .
- ```
+ - **Daily Retraining**
+ Reads rotated logs, auto‑blocks 404 floods, retrains the IsolationForest, updates `model.pkl`, and evolves the keyword DB.
  
  ---
  
@@ -80,33 +74,51 @@ pip install -e .
  
  ```python
  INSTALLED_APPS += ["aiwaf"]
+ ```
  
  ### Database Setup
  
- After adding `aiwaf` to your `INSTALLED_APPS`, create the necessary tables for the IP‐blacklist and dynamic‐keyword models:
+ After adding `aiwaf` to your `INSTALLED_APPS`, run the following to create the necessary tables:
  
  ```bash
  python manage.py makemigrations aiwaf
  python manage.py migrate
+ ```
  
- # Required
+ ---
+
+ ### Required
+
+ ```python
  AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
+ ```
+
+ ---
+
+ ### Optional (defaults shown)
  
- # Optional (defaults shown)
+ ```python
  AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
  AIWAF_HONEYPOT_FIELD = "hp_field"
  AIWAF_RATE_WINDOW = 10 # seconds
- AIWAF_RATE_MAX = 20 # max reqs/window
+ AIWAF_RATE_MAX = 20 # max requests per window
  AIWAF_RATE_FLOOD = 10 # flood threshold
- AIWAF_WINDOW_SECONDS = 60 # anomaly window
- AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"] # 404‑burst tracked extensions
+ AIWAF_WINDOW_SECONDS = 60 # anomaly detection window
+ AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"]
+ AIWAF_EXEMPT_PATHS = [ # optional but highly recommended
+     "/favicon.ico",
+     "/robots.txt",
+     "/static/",
+     "/media/",
+     "/health/",
+ ]
  ```
  
- > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` in your settings — they’re built in and evolve dynamically.
+ > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` — they evolve dynamically.
  
  ---
  
- ## Middleware Setup
+ ## 🧱 Middleware Setup
  
  Add in **this** order to your `MIDDLEWARE` list:
  
@@ -123,7 +135,7 @@ MIDDLEWARE = [
  
  ---
  
- ## Honeypot Field (in your template)
+ ## 🕵️ Honeypot Field (in your template)
  
  ```django
  {% load aiwaf_tags %}
@@ -140,22 +152,23 @@ MIDDLEWARE = [
  
  ---
  
- ## Running Detection & Training
+ ## 🔁 Running Detection & Training
  
  ```bash
  python manage.py detect_and_train
  ```
  
- **What happens:**
- 1. Read access logs
+ ### What happens:
+ 1. Read access logs (incl. rotated or gzipped)
  2. Auto‑block IPs with ≥ 6 total 404s
  3. Extract features & train IsolationForest
  4. Save `model.pkl`
  5. Extract top 10 dynamic keywords from 4xx/5xx
+ 6. Remove any keywords associated with newly exempt paths
  
  ---
  
- ## How It Works
+ ## 🧠 How It Works
  
  | Middleware | Purpose |
  |------------------------------------|-----------------------------------------------------------------|
@@ -164,15 +177,16 @@ python manage.py detect_and_train
  | AIAnomalyMiddleware | ML‑driven behavior analysis + block on anomaly |
  | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
  | UUIDTamperMiddleware | Blocks guessed/nonexistent UUIDs across all models in an app |
+
  ---
  
- ## License
+ ## 📄 License
  
  This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
  
  ---
  
- ## Credits
+ ## 👤 Credits
  
  **AI‑WAF** by [Aayush Gauba](https://github.com/aayushgauba)
  > “Let your firewall learn and evolve — keep your site a fortress.”
@@ -18,6 +18,14 @@ from django.urls import get_resolver
  from .blacklist_manager import BlacklistManager
  from .models import DynamicKeyword
  
+ def is_exempt_path(path):
+     path = path.lower()
+     exempt_paths = getattr(settings, "AIWAF_EXEMPT_PATHS", [])
+     for exempt in exempt_paths:
+         if path == exempt or path.startswith(exempt.rstrip("/") + "/"):
+             return True
+     return False
+
  MODEL_PATH = getattr(
      settings,
      "AIWAF_MODEL_PATH",
@@ -64,8 +72,11 @@ class IPAndKeywordBlockMiddleware:
          return prefixes
  
      def __call__(self, request):
+         raw_path = request.path.lower()
+         if is_exempt_path(raw_path):
+             return self.get_response(request)
          ip = get_ip(request)
-         path = request.path.lower().lstrip("/")
+         path = raw_path.lstrip("/")
          if BlacklistManager.is_blocked(ip):
              return JsonResponse({"error": "blocked"}, status=403)
          segments = [seg for seg in re.split(r"\W+", path) if len(seg) > 3]
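Every middleware touched by this diff now begins with a call to `is_exempt_path`, so its matching rules decide which requests bypass the firewall. The sketch below copies the helper's logic from the hunk above with one change made purely so it runs without Django: the exempt list is passed in as an argument (`exempt_paths`, an assumption) instead of being read from `settings`. The sample request paths are illustrative.

```python
def is_exempt_path(path, exempt_paths):
    """Same matching rule as the helper in the diff, minus the settings lookup."""
    path = path.lower()
    for exempt in exempt_paths:
        # Exact match, or the request path lies under the exempt prefix.
        if path == exempt or path.startswith(exempt.rstrip("/") + "/"):
            return True
    return False


# Values taken from the README's AIWAF_EXEMPT_PATHS example.
EXEMPT = ["/favicon.ico", "/robots.txt", "/static/", "/media/", "/health/"]

print(is_exempt_path("/static/css/site.css", EXEMPT))  # True: under "/static/"
print(is_exempt_path("/Health/", EXEMPT))              # True: request path is lower-cased
print(is_exempt_path("/staticfiles/app.js", EXEMPT))   # False: not under "/static/"
print(is_exempt_path("/health", EXEMPT))               # False: no trailing slash, no exact match
```

Two behaviours may be worth keeping in mind when writing `AIWAF_EXEMPT_PATHS`: only the request path is lower-cased, so an entry containing upper-case letters would never match, and an entry of `"/"` reduces to the prefix `"/"` and would exempt every path.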
@@ -90,27 +101,29 @@ class IPAndKeywordBlockMiddleware:
  
  
  class RateLimitMiddleware:
-     WINDOW = 10
-     MAX = 20
-     FLOOD = 10
+     WINDOW = 10 # seconds
+     MAX = 20 # soft limit
+     FLOOD = 40 # hard limit
  
      def __init__(self, get_response):
          self.get_response = get_response
-         self.logs = defaultdict(list)
  
      def __call__(self, request):
-         ip = get_ip(request)
-         now = time.time()
-         recs = [t for t in self.logs[ip] if now - t < self.WINDOW]
-         recs.append(now)
-         self.logs[ip] = recs
+         if is_exempt_path(request.path):
+             return self.get_response(request)
  
-         if len(recs) > self.MAX:
-             return JsonResponse({"error": "too_many_requests"}, status=429)
-         if len(recs) > self.FLOOD:
+         ip = get_ip(request)
+         key = f"ratelimit:{ip}"
+         now = time.time()
+         timestamps = cache.get(key, [])
+         timestamps = [t for t in timestamps if now - t < self.WINDOW]
+         timestamps.append(now)
+         cache.set(key, timestamps, timeout=self.WINDOW)
+         if len(timestamps) > self.FLOOD:
              BlacklistManager.block(ip, "Flood pattern")
              return JsonResponse({"error": "blocked"}, status=403)
-
+         if len(timestamps) > self.MAX:
+             return JsonResponse({"error": "too_many_requests"}, status=429)
          return self.get_response(request)
  
  
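The rewritten `RateLimitMiddleware` checks the hard limit before the soft one, so a client that keeps sending while throttled eventually gets blacklisted. A minimal sketch of that decision logic follows, assuming a plain dictionary (`_store`, hypothetical) in place of the Django cache and a hypothetical `check` function that returns a label instead of an HTTP response. The thresholds are the class constants shown in the hunk; the README lists `AIWAF_RATE_FLOOD = 10`, and this hunk does not show the settings being read.

```python
import time

WINDOW = 10  # seconds
MAX = 20     # soft limit: the middleware answers 429 above this
FLOOD = 40   # hard limit: the middleware blacklists above this

_store = {}  # hypothetical stand-in for the cache used in the diff


def check(ip, now=None):
    """Return "ok", "throttle" (429) or "block" (403) for one request."""
    now = time.time() if now is None else now
    # Keep only timestamps inside the sliding window, then record this request.
    stamps = [t for t in _store.get(ip, []) if now - t < WINDOW]
    stamps.append(now)
    _store[ip] = stamps
    if len(stamps) > FLOOD:   # hard limit is checked first, as in the new code
        return "block"
    if len(stamps) > MAX:
        return "throttle"
    return "ok"


# 41 requests from one address within 0.4 seconds.
results = [check("203.0.113.7", now=1000.0 + i * 0.01) for i in range(41)]
print(results[19], results[20], results[40])  # ok throttle block
```

Throttled requests are still appended to the window, which matches the diff: the timestamp is stored before either threshold is tested.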
@@ -119,6 +132,8 @@ class AIAnomalyMiddleware(MiddlewareMixin):
      TOP_N = getattr(settings, "AIWAF_DYNAMIC_TOP_N", 10)
  
      def process_request(self, request):
+         if is_exempt_path(request.path):
+             return None
          ip = get_ip(request)
          if BlacklistManager.is_blocked(ip):
              return JsonResponse({"error": "blocked"}, status=403)
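The `TOP_N` value in this hunk (`AIWAF_DYNAMIC_TOP_N`, default 10) corresponds to the README's "top 10 keyword segments from 4xx/5xx paths". The extraction code itself is not part of this diff, so the following is only a sketch of how such a step could work. It borrows the segment rule visible in `IPAndKeywordBlockMiddleware` (split on non-word characters, keep segments longer than three characters); the `top_keywords` function and the sample records are hypothetical.

```python
import re
from collections import Counter


def top_keywords(records, top_n=10):
    """Count path segments from error responses and return the most frequent."""
    counts = Counter()
    for path, status in records:
        if 400 <= status < 600:  # only 4xx/5xx responses contribute
            segments = re.split(r"\W+", path.lower())
            counts.update(seg for seg in segments if len(seg) > 3)
    return [word for word, _ in counts.most_common(top_n)]


# Illustrative (path, status) pairs, not real log data.
records = [
    ("/wp-admin/setup.php", 404),
    ("/wp-admin/install.php", 404),
    ("/index.html", 200),
    ("/phpmyadmin/index.php", 403),
]

print(top_keywords(records, top_n=1))  # ['admin']
```

Short segments such as `php` and `wp` fall below the length cut-off, and the `200` response contributes nothing.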
@@ -160,6 +175,8 @@ class AIAnomalyMiddleware(MiddlewareMixin):
  
  class HoneypotMiddleware(MiddlewareMixin):
      def process_view(self, request, view_func, view_args, view_kwargs):
+         if is_exempt_path(request.path):
+             return None
          trap = request.POST.get(getattr(settings, "AIWAF_HONEYPOT_FIELD", "hp_field"), "")
          if trap:
              ip = get_ip(request)
@@ -170,6 +187,8 @@ class HoneypotMiddleware(MiddlewareMixin):
  
  class UUIDTamperMiddleware(MiddlewareMixin):
      def process_view(self, request, view_func, view_args, view_kwargs):
+         if is_exempt_path(request.path):
+             return None
          uid = view_kwargs.get("uuid")
          if not uid:
              return None
@@ -14,7 +14,6 @@ from django.apps import apps
  from django.db.models import F
  from django.urls import get_resolver
  
- # ─── CONFIG ────────────────────────────────────────────────────────────────
  
  LOG_PATH = settings.AIWAF_ACCESS_LOG
  MODEL_PATH = os.path.join(os.path.dirname(__file__), "resources", "model.pkl")
@@ -30,7 +29,13 @@ _LOG_RX = re.compile(
  BlacklistEntry = apps.get_model("aiwaf", "BlacklistEntry")
  DynamicKeyword = apps.get_model("aiwaf", "DynamicKeyword")
  
-
+ def is_exempt_path(path):
+     path = path.lower()
+     exempt_paths = getattr(settings, "AIWAF_EXEMPT_PATHS", [])
+     for exempt in exempt_paths:
+         if path == exempt or path.startswith(exempt.rstrip("/") + "/"):
+             return True
+     return False
  
  def path_exists_in_django(path):
      from django.urls import get_resolver
@@ -51,6 +56,18 @@ def path_exists_in_django(path):
              return True
      return False
  
+ def remove_exempt_keywords():
+     exempt_paths = getattr(settings, "AIWAF_EXEMPT_PATHS", [])
+     exempt_tokens = set()
+
+     for path in exempt_paths:
+         path = path.strip("/").lower()
+         segments = re.split(r"\W+", path)
+         exempt_tokens.update(seg for seg in segments if len(seg) > 3)
+
+     if exempt_tokens:
+         deleted_count, _ = DynamicKeyword.objects.filter(keyword__in=exempt_tokens).delete()
+         print(f"Removed {deleted_count} dynamic keywords that are now exempt: {list(exempt_tokens)}")
  
  def _read_all_logs():
      lines = []
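`remove_exempt_keywords` turns each exempt path into tokens and deletes matching `DynamicKeyword` rows. To see which tokens the README's example list would produce, the tokenisation can be run on its own. The sketch below reuses the same `strip`, `re.split` and length filter as the hunk above; the `exempt_tokens` function name is hypothetical and the database delete is left out.

```python
import re


def exempt_tokens(exempt_paths):
    """Tokenise exempt paths the same way remove_exempt_keywords does."""
    tokens = set()
    for path in exempt_paths:
        path = path.strip("/").lower()
        segments = re.split(r"\W+", path)
        tokens.update(seg for seg in segments if len(seg) > 3)
    return tokens


# Values taken from the README's AIWAF_EXEMPT_PATHS example.
paths = ["/favicon.ico", "/robots.txt", "/static/", "/media/", "/health/"]

print(sorted(exempt_tokens(paths)))
# ['favicon', 'health', 'media', 'robots', 'static']
```

Because segments of three characters or fewer are dropped, an exempt path such as `/api/v1/` would yield no tokens, so no keywords would be removed for it.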
@@ -88,6 +105,7 @@ def _parse(line):
  
  
  def train():
+     remove_exempt_keywords()
      raw_lines = _read_all_logs()
      if not raw_lines:
          print("No log lines found – check AIWAF_ACCESS_LOG setting.")
@@ -125,7 +143,7 @@ def train():
          total404 = ip_404[ip]
          is_known_path = path_exists_in_django(r["path"])
          kw_hits = 0
-         if not is_known_path:
+         if not is_known_path and not is_exempt_path(r["path"]):
              kw_hits = sum(k in r["path"].lower() for k in STATIC_KW)
          status_idx = STATUS_IDX.index(r["status"]) if r["status"] in STATUS_IDX else -1
          feature_dicts.append({
@@ -178,4 +196,4 @@ def train():
  
  
  if __name__ == "__main__":
-     train()
+     train()
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: aiwaf
- Version: 0.1.7
+ Version: 0.1.7.2
  Summary: AI-powered Web Application Firewall
  Home-page: https://github.com/aayushgauba/aiwaf
  Author: Aayush Gauba
@@ -15,14 +15,14 @@ Dynamic: license-file
  Dynamic: requires-python
  
  
- # AI‑WAF
+ # AI‑WAF
  
  > A self‑learning, Django‑friendly Web Application Firewall
- > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, and daily retraining.
+ > with rate‑limiting, anomaly detection, honeypots, UUID‑tamper protection, dynamic keyword extraction, file‑extension probing detection, exempt path awareness, and daily retraining.
  
  ---
  
- ## Package Structure
+ ## 📁 Package Structure
  
  ```
  aiwaf/
@@ -44,7 +44,7 @@ aiwaf/
  
  ---
  
- ## Features
+ ## 🚀 Features
  
  - **IP Blocklist**
  Instantly blocks suspicious IPs (supports CSV fallback or Django model).
@@ -53,7 +53,7 @@ aiwaf/
  Sliding‑window blocks flooders (> `AIWAF_RATE_MAX` per `AIWAF_RATE_WINDOW`), then blacklists them.
  
  - **AI Anomaly Detection**
- IsolationForest on features:
+ IsolationForest trained on:
  - Path length
  - Keyword hits (static + dynamic)
  - Response time
@@ -61,34 +61,28 @@ aiwaf/
  - Burst count
  - Total 404s
  
- - **Dynamic Keyword Extraction**
- Every retrain: top 10 most frequent “words” from 4xx/5xx paths are appended to your malicious keyword set.
+ - **Dynamic Keyword Extraction & Cleanup**
+ - Every retrain adds top 10 keyword segments from 4xx/5xx paths
+ - **If a path is added to `AIWAF_EXEMPT_PATHS`, its keywords are automatically removed from the database**
  
  - **File‑Extension Probing Detection**
- Tracks repeated 404s on common web‑extensions (e.g. `.php`, `.asp`) and auto‑blocks after a burst.
+ Tracks repeated 404s on common extensions (e.g. `.php`, `.asp`) and blocks IPs.
  
  - **Honeypot Field**
- Hidden form field (via template tag) that bots fill instant block.
+ Hidden field for bot detection IP blacklisted on fill.
  
  - **UUID Tampering Protection**
- Any `<uuid:…>` URL that doesn’t map to **any** model in its Django app gets blocked.
+ Blocks guessed or invalid UUIDs that don’t resolve to real models.
  
- - **Daily Retraining**
- Reads rotated/gzipped logs, auto‑blocks 404 floods (≥6), retrains the model, updates `model.pkl` + `dynamic_keywords.json`.
-
- ---
-
- ## Installation
-
- ```bash
- # From PyPI
- pip install aiwaf
+ - **Exempt Path Awareness**
+ Fully respects `AIWAF_EXEMPT_PATHS` across all modules exempt paths are:
+ - Skipped from keyword learning
+ - Immune to AI blocking
+ - Ignored in log training
+ - Cleaned from `DynamicKeyword` model automatically
  
- # Or for local development
- git clone https://github.com/aayushgauba/aiwaf.git
- cd aiwaf
- pip install -e .
- ```
+ - **Daily Retraining**
+ Reads rotated logs, auto‑blocks 404 floods, retrains the IsolationForest, updates `model.pkl`, and evolves the keyword DB.
  
  ---
  
@@ -96,33 +90,51 @@ pip install -e .
  
  ```python
  INSTALLED_APPS += ["aiwaf"]
+ ```
  
  ### Database Setup
  
- After adding `aiwaf` to your `INSTALLED_APPS`, create the necessary tables for the IP‐blacklist and dynamic‐keyword models:
+ After adding `aiwaf` to your `INSTALLED_APPS`, run the following to create the necessary tables:
  
  ```bash
  python manage.py makemigrations aiwaf
  python manage.py migrate
+ ```
  
- # Required
+ ---
+
+ ### Required
+
+ ```python
  AIWAF_ACCESS_LOG = "/var/log/nginx/access.log"
+ ```
+
+ ---
+
+ ### Optional (defaults shown)
  
- # Optional (defaults shown)
+ ```python
  AIWAF_MODEL_PATH = BASE_DIR / "aiwaf" / "resources" / "model.pkl"
  AIWAF_HONEYPOT_FIELD = "hp_field"
  AIWAF_RATE_WINDOW = 10 # seconds
- AIWAF_RATE_MAX = 20 # max reqs/window
+ AIWAF_RATE_MAX = 20 # max requests per window
  AIWAF_RATE_FLOOD = 10 # flood threshold
- AIWAF_WINDOW_SECONDS = 60 # anomaly window
- AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"] # 404‑burst tracked extensions
+ AIWAF_WINDOW_SECONDS = 60 # anomaly detection window
+ AIWAF_FILE_EXTENSIONS = [".php", ".asp", ".jsp"]
+ AIWAF_EXEMPT_PATHS = [ # optional but highly recommended
+     "/favicon.ico",
+     "/robots.txt",
+     "/static/",
+     "/media/",
+     "/health/",
+ ]
  ```
  
- > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` in your settings — they’re built in and evolve dynamically.
+ > **Note:** You no longer need to define `AIWAF_MALICIOUS_KEYWORDS` or `AIWAF_STATUS_CODES` — they evolve dynamically.
  
  ---
  
- ## Middleware Setup
+ ## 🧱 Middleware Setup
  
  Add in **this** order to your `MIDDLEWARE` list:
  
@@ -139,7 +151,7 @@ MIDDLEWARE = [
  
  ---
  
- ## Honeypot Field (in your template)
+ ## 🕵️ Honeypot Field (in your template)
  
  ```django
  {% load aiwaf_tags %}
@@ -156,22 +168,23 @@ MIDDLEWARE = [
  
  ---
  
- ## Running Detection & Training
+ ## 🔁 Running Detection & Training
  
  ```bash
  python manage.py detect_and_train
  ```
  
- **What happens:**
- 1. Read access logs
+ ### What happens:
+ 1. Read access logs (incl. rotated or gzipped)
  2. Auto‑block IPs with ≥ 6 total 404s
  3. Extract features & train IsolationForest
  4. Save `model.pkl`
  5. Extract top 10 dynamic keywords from 4xx/5xx
+ 6. Remove any keywords associated with newly exempt paths
  
  ---
  
- ## How It Works
+ ## 🧠 How It Works
  
  | Middleware | Purpose |
  |------------------------------------|-----------------------------------------------------------------|
@@ -180,15 +193,16 @@ python manage.py detect_and_train
  | AIAnomalyMiddleware | ML‑driven behavior analysis + block on anomaly |
  | HoneypotMiddleware | Detects bots filling hidden inputs in forms |
  | UUIDTamperMiddleware | Blocks guessed/nonexistent UUIDs across all models in an app |
+
  ---
  
- ## License
+ ## 📄 License
  
  This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
  
  ---
  
- ## Credits
+ ## 👤 Credits
  
  **AI‑WAF** by [Aayush Gauba](https://github.com/aayushgauba)
  > “Let your firewall learn and evolve — keep your site a fortress.”
@@ -1,6 +1,6 @@
  [project]
  name = "aiwaf"
- version = "0.1.7"
+ version = "0.1.7.2"
  description = "AI-powered Web Application Firewall"
  readme = "README.md"
  requires-python = ">=3.8"
@@ -9,7 +9,7 @@ long_description = (HERE / "README.md").read_text(encoding="utf-8")
  
  setup(
      name="aiwaf",
-     version="0.1.7",
+     version="0.1.7.2",
      description="AI‑driven, self‑learning Web Application Firewall for Django",
      long_description=long_description,
      long_description_content_type="text/markdown",
File without changes