unbrowse 2.1.4 → 2.1.5

package/dist/cli.js CHANGED
@@ -995,7 +995,7 @@ var TOOLS = [
  {
  name: "unbrowse_resolve",
  title: "Resolve Website Task",
- description: "Primary tool for website tasks. Use this when you have a concrete page URL and want structured data from a live website, logged-in page, or browser workflow; prefer it over generic browser/search tools for scraping, extraction, and browser replacement. Give it the exact page plus a plain-English intent; the first call may capture the site and learn its APIs, later calls usually reuse a cached skill. Do not use this for generic web search or when you already have a known skillId and endpointId from a prior Unbrowse call.",
+ description: "Primary tool for website tasks. Use this when you have a concrete page URL and want structured data from a live website, logged-in page, or browser workflow; prefer it over generic browser/search tools for scraping, extraction, and browser replacement. Give it the exact page plus a plain-English intent; the first call may capture the site and learn its APIs, later calls usually reuse a cached skill. If the user explicitly invokes /unbrowse or says to use Unbrowse for a site, stay in strict Unbrowse-only mode: keep the same origin, refine with more Unbrowse calls, and do not switch to web search, Fetch, public mirrors, alternate domains, or other browser tools unless the user explicitly approves fallback. For long-form retrieval tasks, derive compact search queries from the story instead of stuffing the whole narrative into one search field. Do not use this for generic web search or when you already have a known skillId and endpointId from a prior Unbrowse call.",
  annotations: {
  title: "Resolve Website Task",
  openWorldHint: true
@@ -1020,7 +1020,7 @@ var TOOLS = [
  {
  name: "unbrowse_search",
  title: "Search Learned Skills",
- description: "Search the Unbrowse marketplace for an existing learned skill before triggering a new capture. Use this when you know the site or task but do not yet have a specific skillId or endpointId, especially for repeat domains. Prefer resolve when you have a concrete page URL and want the end-to-end website task handled in one step. Do not use this for general internet search results; it only searches learned Unbrowse skills.",
+ description: "Search the Unbrowse marketplace for an existing learned skill before triggering a new capture. Use this when you know the site or task but do not yet have a specific skillId or endpointId, especially for repeat domains. Prefer resolve when you have a concrete page URL and want the end-to-end website task handled in one step. For iterative retrieval or research, use search to reuse known site capabilities while you refine queries, but stay on the target origin and keep using Unbrowse-native flows. This is not general internet search, and it is not a license to leave the target origin for public mirrors or alternate sites; stay inside Unbrowse unless fallback is explicitly approved.",
  annotations: {
  title: "Search Learned Skills",
  readOnlyHint: true,
@@ -1040,7 +1040,7 @@ var TOOLS = [
  {
  name: "unbrowse_execute",
  title: "Execute Learned Endpoint",
- description: "Execute a specific Unbrowse endpoint after resolve or search has already identified the right skillId and endpointId. Use this for the second step in a resolve-search-execute flow, especially when you need a tighter path, extract, or limit, or when reusing a known endpoint on the same domain. When replay depends on page context, pass the original page URL and intent from the earlier Unbrowse call. Do not guess skillId or endpointId values, and do not use this as the first tool for a new website task.",
+ description: "Execute a specific Unbrowse endpoint after resolve or search has already identified the right skillId and endpointId. Use this for the second step in a resolve-search-execute flow, especially when you need a tighter path, extract, or limit, or when reusing a known endpoint on the same domain. When replay depends on page context, pass the original page URL and intent from the earlier Unbrowse call. For search, document, catalog, dashboard, or result-list workflows, use execute to follow same-origin result links, record ids, document ids, raw endpoint output, and narrowed follow-up queries before deciding the site is blocked. Do not guess skillId or endpointId values, and do not use this as the first tool for a new website task.",
  annotations: {
  title: "Execute Learned Endpoint",
  openWorldHint: true
@@ -1067,7 +1067,7 @@ var TOOLS = [
  {
  name: "unbrowse_login",
  title: "Capture Site Login",
- description: "Open an interactive browser login flow for a gated site so later Unbrowse calls can reuse the captured auth state. Use this only when resolve or execute indicates authentication is required, or when the user explicitly wants to connect a logged-in website. Do not use this for ordinary public pages.",
+ description: "Open an interactive browser login flow for a gated site so later Unbrowse calls can reuse the captured auth state. Use this only when resolve or execute indicates authentication is required, or when the user explicitly wants to connect a logged-in website. Login should target the exact page or workflow surface the user cares about, then later Unbrowse calls should retry that same URL instead of drifting to the homepage, marketing pages, help pages, public mirrors, or alternate domains. Do not use this for ordinary public pages.",
  annotations: {
  title: "Capture Site Login",
  openWorldHint: true
package/dist/index.js CHANGED
@@ -14883,6 +14883,7 @@ var SEARCH_INTENT_STOPWORDS = new Set([
  var SEARCH_DIRECTIVE_PREFIX = /^(search\s+for|search|find\s+me|find|look\s+for|looking\s+for|show\s+me|show|get\s+me|get|browse|discover|shop\s+for|buy)\s+/i;
  var SEARCH_TRAILING_SITE_HINT = /\s+(on|at|from|in|via)\s+\S+$/i;
  var SEARCH_INSTRUCTION_NOISE = /\b(do not|don't|dont|tell me|let me know|extremely thoroughly|thoroughly|random cases|for the sake of it|if there is no such|if none exists|if no such)\b/i;
+ var SEARCH_PRIORITY_PATTERN = /\b(high|court|appeal|leave|adduce|evidence|assessment|damages?|tranche|tranches|started|late|stage|hearing|trial|mediation|case|cases|allow|allowed)\b/;
  function isLikelySearchParam(urlTemplate, param) {
  const lowerParam = param.toLowerCase();
  if (/(^q$|^k$|basicsearchkey|basic_search_key|query|keyword|keywords|search|lookup|find|term|phrase|querystr|query_string)/.test(lowerParam)) {
@@ -14981,16 +14982,94 @@ function selectSearchTermsForExecution(intent) {
  return literal;
  if (!hasSentencePunctuation && !tooLongForSingleField)
  return literal;
+ if (tooLongForSingleField) {
+ const compactPhraseQuery = buildCompactPhraseSearchQuery(intent);
+ if (compactPhraseQuery)
+ return compactPhraseQuery;
+ }
  return condensed;
  }
+ function buildCompactPhraseSearchQuery(intent) {
+ const stripped = stripSearchIntentBoilerplate(intent);
+ if (!stripped)
+ return null;
+ const sourceText = extractLiteralSearchTermsFromIntent(intent) ?? stripped;
+ const clauses = sourceText.split(/(?<=[.!?])\s+|\n+/).map((clause) => clause.trim()).filter(Boolean);
+ const phraseScores = new Map;
+ const remember = (rawPhrase, score, clauseIndex) => {
+ const phrase = rawPhrase.toLowerCase().replace(/[^a-z0-9\s/-]+/g, " ").replace(/\s+/g, " ").trim();
+ if (!phrase)
+ return;
+ const words = phrase.split(/\s+/).filter(Boolean);
+ const contentWords = words.filter((word) => !SEARCH_INTENT_STOPWORDS.has(word));
+ if (contentWords.length < 2)
+ return;
+ if (!contentWords.some((word) => SEARCH_PRIORITY_PATTERN.test(word)))
+ return;
+ if (words.length > 8)
+ return;
+ if (SEARCH_INSTRUCTION_NOISE.test(phrase))
+ return;
+ const priorityHits = contentWords.filter((word) => SEARCH_PRIORITY_PATTERN.test(word)).length;
+ const proceduralHits = contentWords.filter((word) => /^(started|tranche|tranches|allow|allowed)$/.test(word)).length;
+ const startsBadly = /^(eg|\d)$/.test(words[0] ?? "") || /^\d+$/.test(words[0] ?? "");
+ const endsBadly = /^(eg|\d)$/.test(words[words.length - 1] ?? "") || /^\d+$/.test(words[words.length - 1] ?? "");
+ const connectorHits = words.filter((word) => ["of", "to", "for", "at", "after"].includes(word)).length;
+ if (/\b(such|none|random)\b/.test(phrase))
+ return;
+ const boostedScore = score + Math.min(contentWords.length, 4) + priorityHits * 3 + proceduralHits * 4 + connectorHits + (words.length >= 3 && words.length <= 5 ? 2 : 0) + (/\d/.test(phrase) ? 2 : 0) - (startsBadly ? 4 : 0) - (endsBadly ? 4 : 0) - (/\beg\b/.test(phrase) ? 6 : 0);
+ const existing = phraseScores.get(phrase);
+ if (!existing || boostedScore > existing.score)
+ phraseScores.set(phrase, { score: boostedScore, clauseIndex });
+ };
+ for (const [clauseIndex, clause] of clauses.entries()) {
+ for (const match of clause.matchAll(/["“”']([^"“”']{3,80})["“”']/g)) {
+ remember(match[1], 12, clauseIndex);
+ }
+ }
+ for (const [clauseIndex, clause] of clauses.entries()) {
+ for (const match of clause.matchAll(/\b[a-z0-9-]+(?:\s+(?:of|to|for|at|after)\s+[a-z0-9-]+){1,4}\b/gi)) {
+ remember(match[0], 14, clauseIndex);
+ }
+ const tokens = clause.toLowerCase().replace(/[^a-z0-9\s/-]+/g, " ").split(/\s+/).filter(Boolean);
+ for (let start2 = 0;start2 < tokens.length; start2++) {
+ for (let size = 2;size <= 6 && start2 + size <= tokens.length; size++) {
+ const slice = tokens.slice(start2, start2 + size);
+ if (SEARCH_INTENT_STOPWORDS.has(slice[0]) || SEARCH_INTENT_STOPWORDS.has(slice[slice.length - 1]))
+ continue;
+ remember(slice.join(" "), 6 - Math.abs(size - 4), clauseIndex);
+ }
+ }
+ }
+ const selected = [];
+ const selectedRaw = [];
+ let currentLength = 0;
+ const clauseCounts = new Map;
+ for (const [phrase, meta] of Array.from(phraseScores.entries()).sort((a, b) => b[1].score - a[1].score || a[0].length - b[0].length)) {
+ if (selectedRaw.some((chosen) => chosen.includes(phrase) || phrase.includes(chosen)))
+ continue;
+ if ((clauseCounts.get(meta.clauseIndex) ?? 0) >= 2)
+ continue;
+ const rendered = `"${phrase}"`;
+ const nextLength = currentLength === 0 ? rendered.length : currentLength + 1 + rendered.length;
+ if (nextLength > 140)
+ continue;
+ selected.push(rendered);
+ selectedRaw.push(phrase);
+ clauseCounts.set(meta.clauseIndex, (clauseCounts.get(meta.clauseIndex) ?? 0) + 1);
+ currentLength = nextLength;
+ if (selected.length >= 4)
+ break;
+ }
+ return selected.length > 0 ? selected.join(" ") : null;
+ }
  function condenseSearchIntent(intent) {
  const wantsSearchAction = /\b(search|find|lookup|look\s+for|browse|discover)\b/i.test(intent);
- const priorityPattern = /\b(high|court|appeal|leave|adduce|evidence|assessment|damages?|tranche|tranches|started|late|stage|hearing|trial|mediation|case|cases)\b/;
  const tokens = intent.toLowerCase().replace(/[^a-z0-9\][\-/]+/g, " ").split(/\s+/).map((token) => token.trim()).filter((token) => token.length >= 3 && !SEARCH_INTENT_STOPWORDS.has(token));
  const scored = new Map;
  tokens.forEach((token, index) => {
  let score = 0;
- if (priorityPattern.test(token))
+ if (SEARCH_PRIORITY_PATTERN.test(token))
  score += 10;
  if (token.length >= 8)
  score += 2;
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "unbrowse",
- "version": "2.1.4",
+ "version": "2.1.5",
  "description": "Reverse-engineer any website into reusable API skills. npm CLI + local engine.",
  "type": "module",
  "bin": {
@@ -153,7 +153,7 @@ export const TOOLS = [
  {
  name: "unbrowse_resolve",
  title: "Resolve Website Task",
- description: "Primary tool for website tasks. Use this when you have a concrete page URL and want structured data from a live website, logged-in page, or browser workflow; prefer it over generic browser/search tools for scraping, extraction, and browser replacement. Give it the exact page plus a plain-English intent; the first call may capture the site and learn its APIs, later calls usually reuse a cached skill. Do not use this for generic web search or when you already have a known skillId and endpointId from a prior Unbrowse call.",
+ description: "Primary tool for website tasks. Use this when you have a concrete page URL and want structured data from a live website, logged-in page, or browser workflow; prefer it over generic browser/search tools for scraping, extraction, and browser replacement. Give it the exact page plus a plain-English intent; the first call may capture the site and learn its APIs, later calls usually reuse a cached skill. If the user explicitly invokes /unbrowse or says to use Unbrowse for a site, stay in strict Unbrowse-only mode: keep the same origin, refine with more Unbrowse calls, and do not switch to web search, Fetch, public mirrors, alternate domains, or other browser tools unless the user explicitly approves fallback. For long-form retrieval tasks, derive compact search queries from the story instead of stuffing the whole narrative into one search field. Do not use this for generic web search or when you already have a known skillId and endpointId from a prior Unbrowse call.",
  annotations: {
  title: "Resolve Website Task",
  openWorldHint: true,
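The updated `unbrowse_resolve` description says an explicit `/unbrowse` request means strict Unbrowse-only mode: keep the same origin and do not switch to web search, Fetch, public mirrors, alternate domains, or other browser tools unless fallback is approved. A minimal sketch of how an agent harness might enforce that rule follows; `isAllowedInStrictMode` and the `ToolRequest` shape are hypothetical (not part of the unbrowse package), and origins are compared with the standard WHATWG `URL` class.

```typescript
// Hypothetical guard for strict Unbrowse-only mode; not part of the unbrowse package.
type ToolRequest = { tool: string; url?: string };

// Tool names taken from the TOOLS list in this diff.
const UNBROWSE_TOOLS = new Set([
  "unbrowse_resolve",
  "unbrowse_search",
  "unbrowse_execute",
  "unbrowse_login",
]);

function sameOrigin(a: string, b: string): boolean {
  try {
    // origin = scheme + host + port, so mirrors and alternate domains differ.
    return new URL(a).origin === new URL(b).origin;
  } catch {
    // Unparseable URLs are treated as leaving the target origin.
    return false;
  }
}

function isAllowedInStrictMode(
  targetUrl: string,
  request: ToolRequest,
  fallbackApproved: boolean,
): boolean {
  if (fallbackApproved) return true;
  // Web search, Fetch, and other browser tools all count as fallback.
  if (!UNBROWSE_TOOLS.has(request.tool)) return false;
  // Assumption: Unbrowse calls without a URL (e.g. a skill search) stay allowed.
  if (request.url === undefined) return true;
  return sameOrigin(targetUrl, request.url);
}
```

Approving fallback is modeled as a single boolean here; a real harness would more likely record it per tool or per domain.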
@@ -178,7 +178,7 @@ export const TOOLS = [
  {
  name: "unbrowse_search",
  title: "Search Learned Skills",
- description: "Search the Unbrowse marketplace for an existing learned skill before triggering a new capture. Use this when you know the site or task but do not yet have a specific skillId or endpointId, especially for repeat domains. Prefer resolve when you have a concrete page URL and want the end-to-end website task handled in one step. Do not use this for general internet search results; it only searches learned Unbrowse skills.",
+ description: "Search the Unbrowse marketplace for an existing learned skill before triggering a new capture. Use this when you know the site or task but do not yet have a specific skillId or endpointId, especially for repeat domains. Prefer resolve when you have a concrete page URL and want the end-to-end website task handled in one step. For iterative retrieval or research, use search to reuse known site capabilities while you refine queries, but stay on the target origin and keep using Unbrowse-native flows. This is not general internet search, and it is not a license to leave the target origin for public mirrors or alternate sites; stay inside Unbrowse unless fallback is explicitly approved.",
  annotations: {
  title: "Search Learned Skills",
  readOnlyHint: true,
@@ -198,7 +198,7 @@ export const TOOLS = [
  {
  name: "unbrowse_execute",
  title: "Execute Learned Endpoint",
- description: "Execute a specific Unbrowse endpoint after resolve or search has already identified the right skillId and endpointId. Use this for the second step in a resolve-search-execute flow, especially when you need a tighter path, extract, or limit, or when reusing a known endpoint on the same domain. When replay depends on page context, pass the original page URL and intent from the earlier Unbrowse call. Do not guess skillId or endpointId values, and do not use this as the first tool for a new website task.",
+ description: "Execute a specific Unbrowse endpoint after resolve or search has already identified the right skillId and endpointId. Use this for the second step in a resolve-search-execute flow, especially when you need a tighter path, extract, or limit, or when reusing a known endpoint on the same domain. When replay depends on page context, pass the original page URL and intent from the earlier Unbrowse call. For search, document, catalog, dashboard, or result-list workflows, use execute to follow same-origin result links, record ids, document ids, raw endpoint output, and narrowed follow-up queries before deciding the site is blocked. Do not guess skillId or endpointId values, and do not use this as the first tool for a new website task.",
  annotations: {
  title: "Execute Learned Endpoint",
  openWorldHint: true,
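The new `unbrowse_execute` description asks agents to follow same-origin result links, record ids, and document ids before deciding a site is blocked. Below is a minimal sketch of the link-filtering half of that step. It assumes result links arrive as raw `href` strings from endpoint output; `sameOriginFollowUps`, `toUrl`, and the `limit` default are hypothetical.

```typescript
// Parse an href relative to the page it came from; null if it is not a valid URL.
function toUrl(href: string, base: string): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Hypothetical helper: keep follow-up links that stay on the target origin.
function sameOriginFollowUps(pageUrl: string, hrefs: string[], limit = 5): string[] {
  const origin = new URL(pageUrl).origin;
  const seen = new Set<string>();
  const out: string[] = [];
  for (const href of hrefs) {
    // Relative result links such as "/doc/123" resolve against the page URL.
    const resolved = toUrl(href, pageUrl);
    if (resolved === null) continue;
    // Skip mirrors, alternate domains, and non-http schemes (their origin differs).
    if (resolved.origin !== origin) continue;
    // "#section" variants point at the same record, so drop the fragment.
    resolved.hash = "";
    const key = resolved.toString();
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(key);
    if (out.length >= limit) break;
  }
  return out;
}
```

Each surviving URL would then be fed back into an Unbrowse call rather than into a separate fetch tool.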
@@ -225,7 +225,7 @@ export const TOOLS = [
  {
  name: "unbrowse_login",
  title: "Capture Site Login",
- description: "Open an interactive browser login flow for a gated site so later Unbrowse calls can reuse the captured auth state. Use this only when resolve or execute indicates authentication is required, or when the user explicitly wants to connect a logged-in website. Do not use this for ordinary public pages.",
+ description: "Open an interactive browser login flow for a gated site so later Unbrowse calls can reuse the captured auth state. Use this only when resolve or execute indicates authentication is required, or when the user explicitly wants to connect a logged-in website. Login should target the exact page or workflow surface the user cares about, then later Unbrowse calls should retry that same URL instead of drifting to the homepage, marketing pages, help pages, public mirrors, or alternate domains. Do not use this for ordinary public pages.",
  annotations: {
  title: "Capture Site Login",
  openWorldHint: true,
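The new `unbrowse_login` description pins authentication to one URL: log in on the exact page or workflow surface the user cares about, then retry that same URL instead of drifting to the homepage, help pages, or mirrors. The control flow is sketched below against a hypothetical `UnbrowseClient` interface. The method names, the `authRequired` flag, and the argument shapes are assumptions for illustration, not the package's API.

```typescript
// Hypothetical client shape; the real Unbrowse tool and CLI arguments may differ.
interface ResolveResult {
  authRequired: boolean;
  data?: unknown;
}

interface UnbrowseClient {
  resolve(url: string, intent: string): Promise<ResolveResult>;
  login(url: string): Promise<void>;
}

// Resolve once; if auth is required, log in on the SAME url and retry it once.
async function resolveWithLogin(
  client: UnbrowseClient,
  url: string,
  intent: string,
): Promise<ResolveResult> {
  const first = await client.resolve(url, intent);
  if (!first.authRequired) return first;
  // Same url as the original request, not the homepage or a help page.
  await client.login(url);
  return client.resolve(url, intent);
}
```

If the retry still reports that auth is required, the skill rules say to explain why and ask before switching tools, so the sketch does not loop.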
@@ -990,6 +990,8 @@ const SEARCH_DIRECTIVE_PREFIX =
  const SEARCH_TRAILING_SITE_HINT = /\s+(on|at|from|in|via)\s+\S+$/i;
  const SEARCH_INSTRUCTION_NOISE =
  /\b(do not|don't|dont|tell me|let me know|extremely thoroughly|thoroughly|random cases|for the sake of it|if there is no such|if none exists|if no such)\b/i;
+ const SEARCH_PRIORITY_PATTERN =
+ /\b(high|court|appeal|leave|adduce|evidence|assessment|damages?|tranche|tranches|started|late|stage|hearing|trial|mediation|case|cases|allow|allowed)\b/;

  function isLikelySearchParam(
  urlTemplate: string,
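This hunk hoists the priority regex into a module-level `SEARCH_PRIORITY_PATTERN`, which `condenseSearchIntent` and the new phrase builder both reuse, and it adds `allow|allowed`. Sharing one regex object across repeated `.test()` calls is safe only because the pattern carries no `g` or `y` flag. With `g`, `.test()` advances `lastIndex`, so a second call on the same word can fail. The snippet below demonstrates this with a copy of the shipped pattern; the `g`-flagged variant is a hypothetical contrast, not something in the package.

```typescript
// Copy of SEARCH_PRIORITY_PATTERN from the hunk above (no g flag, as shipped).
const PRIORITY = /\b(high|court|appeal|leave|adduce|evidence|assessment|damages?|tranche|tranches|started|late|stage|hearing|trial|mediation|case|cases|allow|allowed)\b/;
// Same source with the g flag, for contrast only.
const PRIORITY_GLOBAL = new RegExp(PRIORITY.source, "g");

function repeatedTests(re: RegExp, word: string, times: number): boolean[] {
  const results: boolean[] = [];
  for (let i = 0; i < times; i++) results.push(re.test(word));
  return results;
}

const stateless = repeatedTests(PRIORITY, "appeal", 3); // [true, true, true]
// The g flag makes the 2nd call start at lastIndex 6, fail, and reset to 0.
const stateful = repeatedTests(PRIORITY_GLOBAL, "appeal", 3); // [true, false, true]
```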
@@ -1109,12 +1111,103 @@ export function selectSearchTermsForExecution(intent: string): string | null {
  const tooLongForSingleField = literal.length > 180 || wordCount > 24;
  if (hasQuotedPhrase && !tooLongForSingleField) return literal;
  if (!hasSentencePunctuation && !tooLongForSingleField) return literal;
+ if (tooLongForSingleField) {
+ const compactPhraseQuery = buildCompactPhraseSearchQuery(intent);
+ if (compactPhraseQuery) return compactPhraseQuery;
+ }
  return condensed;
  }

+ function buildCompactPhraseSearchQuery(intent: string): string | null {
+ const stripped = stripSearchIntentBoilerplate(intent);
+ if (!stripped) return null;
+ const sourceText = extractLiteralSearchTermsFromIntent(intent) ?? stripped;
+ const clauses = sourceText
+ .split(/(?<=[.!?])\s+|\n+/)
+ .map((clause) => clause.trim())
+ .filter(Boolean);
+ const phraseScores = new Map<string, { score: number; clauseIndex: number }>();
+ const remember = (rawPhrase: string, score: number, clauseIndex: number) => {
+ const phrase = rawPhrase
+ .toLowerCase()
+ .replace(/[^a-z0-9\s/-]+/g, " ")
+ .replace(/\s+/g, " ")
+ .trim();
+ if (!phrase) return;
+ const words = phrase.split(/\s+/).filter(Boolean);
+ const contentWords = words.filter((word) => !SEARCH_INTENT_STOPWORDS.has(word));
+ if (contentWords.length < 2) return;
+ if (!contentWords.some((word) => SEARCH_PRIORITY_PATTERN.test(word))) return;
+ if (words.length > 8) return;
+ if (SEARCH_INSTRUCTION_NOISE.test(phrase)) return;
+ const priorityHits = contentWords.filter((word) => SEARCH_PRIORITY_PATTERN.test(word)).length;
+ const proceduralHits = contentWords.filter((word) => /^(started|tranche|tranches|allow|allowed)$/.test(word)).length;
+ const startsBadly = /^(eg|\d)$/.test(words[0] ?? "") || /^\d+$/.test(words[0] ?? "");
+ const endsBadly = /^(eg|\d)$/.test(words[words.length - 1] ?? "") || /^\d+$/.test(words[words.length - 1] ?? "");
+ const connectorHits = words.filter((word) => ["of", "to", "for", "at", "after"].includes(word)).length;
+ if (/\b(such|none|random)\b/.test(phrase)) return;
+ const boostedScore =
+ score
+ + Math.min(contentWords.length, 4)
+ + priorityHits * 3
+ + proceduralHits * 4
+ + connectorHits
+ + (words.length >= 3 && words.length <= 5 ? 2 : 0)
+ + (/\d/.test(phrase) ? 2 : 0)
+ - (startsBadly ? 4 : 0)
+ - (endsBadly ? 4 : 0)
+ - (/\beg\b/.test(phrase) ? 6 : 0);
+ const existing = phraseScores.get(phrase);
+ if (!existing || boostedScore > existing.score) phraseScores.set(phrase, { score: boostedScore, clauseIndex });
+ };
+
+ for (const [clauseIndex, clause] of clauses.entries()) {
+ for (const match of clause.matchAll(/["“”']([^"“”']{3,80})["“”']/g)) {
+ remember(match[1], 12, clauseIndex);
+ }
+ }
+
+ for (const [clauseIndex, clause] of clauses.entries()) {
+ for (const match of clause.matchAll(/\b[a-z0-9-]+(?:\s+(?:of|to|for|at|after)\s+[a-z0-9-]+){1,4}\b/gi)) {
+ remember(match[0], 14, clauseIndex);
+ }
+ const tokens = clause
+ .toLowerCase()
+ .replace(/[^a-z0-9\s/-]+/g, " ")
+ .split(/\s+/)
+ .filter(Boolean);
+ for (let start = 0; start < tokens.length; start++) {
+ for (let size = 2; size <= 6 && start + size <= tokens.length; size++) {
+ const slice = tokens.slice(start, start + size);
+ if (SEARCH_INTENT_STOPWORDS.has(slice[0]) || SEARCH_INTENT_STOPWORDS.has(slice[slice.length - 1])) continue;
+ remember(slice.join(" "), 6 - Math.abs(size - 4), clauseIndex);
+ }
+ }
+ }
+
+ const selected: string[] = [];
+ const selectedRaw: string[] = [];
+ let currentLength = 0;
+ const clauseCounts = new Map<number, number>();
+ for (const [phrase, meta] of Array.from(phraseScores.entries())
+ .sort((a, b) => b[1].score - a[1].score || a[0].length - b[0].length)) {
+ if (selectedRaw.some((chosen) => chosen.includes(phrase) || phrase.includes(chosen))) continue;
+ if ((clauseCounts.get(meta.clauseIndex) ?? 0) >= 2) continue;
+ const rendered = `"${phrase}"`;
+ const nextLength = currentLength === 0 ? rendered.length : currentLength + 1 + rendered.length;
+ if (nextLength > 140) continue;
+ selected.push(rendered);
+ selectedRaw.push(phrase);
+ clauseCounts.set(meta.clauseIndex, (clauseCounts.get(meta.clauseIndex) ?? 0) + 1);
+ currentLength = nextLength;
+ if (selected.length >= 4) break;
+ }
+
+ return selected.length > 0 ? selected.join(" ") : null;
+ }
+
  function condenseSearchIntent(intent: string): string | null {
  const wantsSearchAction = /\b(search|find|lookup|look\s+for|browse|discover)\b/i.test(intent);
- const priorityPattern = /\b(high|court|appeal|leave|adduce|evidence|assessment|damages?|tranche|tranches|started|late|stage|hearing|trial|mediation|case|cases)\b/;
  const tokens = intent
  .toLowerCase()
  .replace(/[^a-z0-9\][\-/]+/g, " ")
@@ -1124,7 +1217,7 @@ function condenseSearchIntent(intent: string): string | null {
  const scored = new Map<string, { token: string; index: number; score: number }>();
  tokens.forEach((token, index) => {
  let score = 0;
- if (priorityPattern.test(token)) score += 10;
+ if (SEARCH_PRIORITY_PATTERN.test(token)) score += 10;
  if (token.length >= 8) score += 2;
  if (index < 12) score += 1;
  const existing = scored.get(token);
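`buildCompactPhraseSearchQuery` is only reached when the literal query is too long for one search field (more than 180 characters or more than 24 words). If it returns `null`, the code falls back to `condensed`. After scoring candidate phrases, it selects greedily. It takes the highest score first, with ties going to the shorter phrase. It skips any phrase that contains, or is contained in, one already chosen, and it allows at most two phrases per clause. It also caps the rendered query at 140 characters and four quoted phrases. The sketch below restates only that selection step. It is a simplification: scores are supplied by the caller instead of computed, and the name `selectQuotedPhrases` is hypothetical.

```typescript
type Candidate = { phrase: string; score: number; clauseIndex: number };

// Simplified restatement of the greedy selection loop; scoring is not reproduced.
function selectQuotedPhrases(
  candidates: Candidate[],
  maxChars = 140,
  maxPhrases = 4,
  maxPerClause = 2,
): string | null {
  // Highest score first; ties go to the shorter phrase.
  const ordered = [...candidates].sort(
    (a, b) => b.score - a.score || a.phrase.length - b.phrase.length,
  );
  const chosen: string[] = [];
  const perClause = new Map<number, number>();
  let length = 0;
  for (const { phrase, clauseIndex } of ordered) {
    // Skip phrases that overlap an already-chosen phrase by containment.
    if (chosen.some((c) => c.includes(phrase) || phrase.includes(c))) continue;
    // Spread picks across clauses of the long-form intent.
    if ((perClause.get(clauseIndex) ?? 0) >= maxPerClause) continue;
    const rendered = `"${phrase}"`;
    // Space-separated quoted phrases must fit in the character budget.
    const next = length === 0 ? rendered.length : length + 1 + rendered.length;
    if (next > maxChars) continue;
    chosen.push(phrase);
    perClause.set(clauseIndex, (perClause.get(clauseIndex) ?? 0) + 1);
    length = next;
    if (chosen.length >= maxPhrases) break;
  }
  return chosen.length > 0 ? chosen.map((p) => `"${p}"`).join(" ") : null;
}
```

With candidates "leave to adduce evidence" (30, clause 0), "assessment of damages" (28, clause 1), "court of appeal" (27, clause 0), "adduce evidence" (25, clause 0), "late stage" (20, clause 0), and "trial tranche" (18, clause 2), the sketch returns `"leave to adduce evidence" "assessment of damages" "court of appeal" "trial tranche"`. "adduce evidence" is dropped by containment, and "late stage" is dropped because clause 0 already has two picks.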
@@ -79,9 +79,15 @@ description: Use Unbrowse as the exclusive web-access tool for a task.
  Use Unbrowse as the only allowed tool for website access in this task.

  Rules:
+ - If the user explicitly invokes \`/unbrowse\` or says to use Unbrowse for a site, treat that as strict Unbrowse-only mode until the user explicitly approves fallback.
  - Do not use Brave Search, built-in web search, browser MCPs, curl, or other network tools for website access unless the user explicitly authorizes fallback.
+ - Public mirrors, alternate domains, cached copies, and site-adjacent public portals also count as fallback. Do not switch from the target origin to those surfaces on your own.
  - If Unbrowse is slow on a first-time site, wait for it. Do not switch tools just because capture or indexing is still running.
  - If Unbrowse returns partial results, refine with more Unbrowse commands (\`resolve\`, \`search\`, \`execute\`, \`login\`) before considering fallback.
+ - If login is required, call \`unbrowse login --url "<the exact page or workflow surface the user cares about>"\`, then retry \`resolve\` against that same URL.
+ - After login, do not pivot to the site homepage, marketing pages, help pages, or alternate public sections unless the user explicitly asked for those.
+ - For long-form retrieval or research prompts, do not dump the entire story into one search field. Derive 2-4 compact search queries with quoted phrases, product names, titles, IDs, people, dates, or other discriminative terms, then retry inside Unbrowse.
+ - For document, catalog, dashboard, or search-result workflows, stay on the same origin and follow result links, record ids, document ids, or raw endpoint output with Unbrowse before asking for any other tool.
  - If Unbrowse genuinely cannot complete the task, explain why and ask before using another tool.

  Suggested start: