vsegments 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
# vsegments (Node.js)

**Visual segmentation and bounding box detection using Google Gemini AI**

`vsegments` is a Node.js library and CLI tool that uses Google's Gemini models for visual segmentation and object detection on images. It provides a simple interface for detecting bounding boxes and generating segmentation masks.

[![npm version](https://img.shields.io/npm/v/vsegments.svg)](https://www.npmjs.com/package/vsegments)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- 🎯 **Bounding Box Detection**: Automatically detect and label objects in images
- 🎨 **Segmentation Masks**: Generate segmentation masks for identified objects
- 🖼️ **Visualization**: Customizable colors, fonts, and transparency
- 🛠️ **CLI Tool**: Command-line interface for batch processing
- 📦 **Library**: Clean JavaScript API for integration into your projects
- 🚀 **Multiple Models**: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ **Customizable**: Fine-tune prompts, system instructions, and output settings
- 📊 **JSON Export**: Export detection results in structured JSON format

## Installation

### From npm (Recommended)

```bash
npm install vsegments
```

### Global Installation (for CLI)

```bash
npm install -g vsegments
```

### From Source

```bash
git clone git@github.com:nxtphaseai/node_vsegments.git
cd node_vsegments
npm install
npm link
```

## Quick Start

### Prerequisites

You need a Google API key to use this library. Get one from [Google AI Studio](https://aistudio.google.com/app/apikey).

Set your API key as an environment variable:

```bash
export GOOGLE_API_KEY="your-api-key-here"
```

### CLI Usage

#### Basic Bounding Box Detection

```bash
vsegments -f image.jpg
```

#### Save Output Image

```bash
vsegments -f image.jpg -o output.jpg
```

#### Perform Segmentation

```bash
vsegments -f image.jpg --segment -o segmented.jpg
```

#### Custom Prompt

```bash
vsegments -f image.jpg -p "Find all people wearing red shirts"
```

#### Export JSON Results

```bash
vsegments -f image.jpg --json results.json
```

#### Compact Output

```bash
vsegments -f image.jpg --compact
```

### Library Usage

#### Basic Detection

```javascript
const VSegments = require('vsegments');

// Initialize
const vs = new VSegments({ apiKey: 'your-api-key' });

// CommonJS has no top-level await, so run inside an async function
(async () => {
  // Detect bounding boxes
  const result = await vs.detectBoxes('image.jpg');

  // Print results
  console.log(`Found ${result.boxes.length} objects`);
  result.boxes.forEach(box => {
    console.log(`  - ${box.label}`);
  });

  // Visualize
  await vs.visualize('image.jpg', result, { outputPath: 'output.jpg' });
})();
```

#### Advanced Detection

```javascript
const VSegments = require('vsegments');

// Initialize with custom settings
const vs = new VSegments({
  apiKey: 'your-api-key',
  model: 'gemini-2.5-pro',
  temperature: 0.7,
  maxObjects: 50
});

(async () => {
  // Detect with a custom prompt and instructions
  const result = await vs.detectBoxes('image.jpg', {
    prompt: 'Find all vehicles in the image',
    customInstructions: 'Focus on cars, trucks, and motorcycles. Ignore bicycles.'
  });

  // Access individual boxes
  result.boxes.forEach(box => {
    console.log(`${box.label}: [${box.x1}, ${box.y1}] -> [${box.x2}, ${box.y2}]`);
  });
})();
```

#### Segmentation

```javascript
const VSegments = require('vsegments');

const vs = new VSegments({ apiKey: 'your-api-key' });

(async () => {
  // Perform segmentation
  const result = await vs.segment('image.jpg');

  // Visualize with custom settings
  await vs.visualize('image.jpg', result, {
    outputPath: 'segmented.jpg',
    lineWidth: 6,
    fontSize: 18,
    alpha: 0.6
  });
})();
```

## CLI Reference

### Required Arguments

- `-f, --file <image>`: Path to input image file

### Mode Options

- `--segment`: Perform segmentation instead of bounding box detection

### API Options

- `--api-key <key>`: Google API key (default: `GOOGLE_API_KEY` env var)
- `-m, --model <model>`: Model name (default: `gemini-flash-latest`)
- `--temperature <temp>`: Sampling temperature 0.0-1.0 (default: 0.5)
- `--max-objects <n>`: Maximum objects to detect (default: 25)

### Prompt Options

- `-p, --prompt <text>`: Custom detection prompt
- `--instructions <text>`: Additional system instructions for grounding

### Output Options

- `-o, --output <file>`: Save visualized output to file
- `--json <file>`: Export results as JSON
- `--no-show`: Don't display the output image
- `--raw`: Print raw API response

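For reference, the file written by `--json` has the shape below (the values shown are made up for illustration). Note that `box_2d` uses `[y1, x1, y2, x2]` ordering, and a `masks` count field is added only when segmentation returned masks:

```json
{
  "boxes": [
    { "label": "cat", "box_2d": [120, 80, 540, 460] }
  ],
  "model": "gemini-flash-latest",
  "temperature": 0.5
}
```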
### Visualization Options

- `--line-width <n>`: Bounding box line width (default: 4)
- `--font-size <n>`: Label font size (default: 14)
- `--alpha <a>`: Mask transparency 0.0-1.0 (default: 0.7)
- `--max-size <n>`: Maximum image dimension for processing (default: 1024)

### Other Options

- `-V, --version`: Show version information
- `-q, --quiet`: Suppress informational output
- `--compact`: Compact output format
- `-h, --help`: Show help message
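The compact format prints one line per detection in the form `order. label [x1 y1 x2 y2]`, which is convenient to post-process. As an illustrative sketch (this parser is not part of the library), such lines can be turned back into objects:

```javascript
// Parse a vsegments --compact line such as "1. cat [120 80 540 460]"
// into a { label, x1, y1, x2, y2 } object; returns null if the line
// does not match the expected format.
function parseCompactLine(line) {
  const m = line.match(/^(\d+)\.\s+(.+?)\s+\[(\d+) (\d+) (\d+) (\d+)\]$/);
  if (!m) return null;
  const [, , label, x1, y1, x2, y2] = m;
  return { label, x1: Number(x1), y1: Number(y1), x2: Number(x2), y2: Number(y2) };
}

console.log(parseCompactLine('1. cat [120 80 540 460]'));
```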

## API Reference

### `VSegments` Class

#### Constructor

```javascript
new VSegments({
  apiKey: String,       // Optional (defaults to GOOGLE_API_KEY env var)
  model: String,        // Optional (default: 'gemini-flash-latest')
  temperature: Number,  // Optional (default: 0.5)
  maxObjects: Number    // Optional (default: 25)
})
```

#### Methods

##### `detectBoxes()`

Detect bounding boxes in an image.

```javascript
await vs.detectBoxes(imagePath, {
  prompt: String,              // Optional custom prompt
  customInstructions: String,  // Optional system instructions
  maxSize: Number              // Optional (default: 1024)
})
```

Returns: `Promise<SegmentationResult>`

##### `segment()`

Perform segmentation on an image.

```javascript
await vs.segment(imagePath, {
  prompt: String,   // Optional custom prompt
  maxSize: Number   // Optional (default: 1024)
})
```

Returns: `Promise<SegmentationResult>`

##### `visualize()`

Visualize detection/segmentation results.

```javascript
await vs.visualize(imagePath, result, {
  outputPath: String,  // Optional output file path
  lineWidth: Number,   // Optional (default: 4)
  fontSize: Number,    // Optional (default: 14)
  alpha: Number        // Optional (default: 0.7)
})
```

Returns: `Promise<Canvas>`

### Data Models

#### `BoundingBox`

```javascript
{
  label: String,
  y1: Number,  // Normalized 0-1000
  x1: Number,
  y2: Number,
  x2: Number,

  toAbsolute(imgWidth, imgHeight)  // Returns [absX1, absY1, absX2, absY2]
}
```
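Since coordinates are normalized to a 0-1000 range, the conversion `toAbsolute` performs is a linear rescale against the image dimensions. A standalone sketch of that math (this helper is illustrative; the library's own method may round differently):

```javascript
// Rescale 0-1000 normalized box coordinates to absolute pixels,
// returning [absX1, absY1, absX2, absY2] as toAbsolute is documented to.
function toAbsolutePixels(box, imgWidth, imgHeight) {
  return [
    Math.round((box.x1 / 1000) * imgWidth),
    Math.round((box.y1 / 1000) * imgHeight),
    Math.round((box.x2 / 1000) * imgWidth),
    Math.round((box.y2 / 1000) * imgHeight)
  ];
}

// A centered box in a 2000x1000 image:
console.log(toAbsolutePixels({ x1: 250, y1: 100, x2: 750, y2: 900 }, 2000, 1000));
// → [ 500, 100, 1500, 900 ]
```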

#### `SegmentationResult`

```javascript
{
  boxes: BoundingBox[],
  masks: SegmentationMask[] | null,
  rawResponse: String | null,
  length: Number  // Number of detected objects
}
```

## Examples

See the `examples/` directory for complete working examples:

- `basic.js` - Basic object detection
- `segmentation.js` - Image segmentation with masks

Run examples:

```bash
cd examples
node basic.js path/to/image.jpg
node segmentation.js path/to/image.jpg
```

## Supported Models

- `gemini-flash-latest` (default, fastest)
- `gemini-2.0-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash`
- `gemini-2.5-pro` (best quality, slower)

**Note**: Segmentation features require 2.5 models or later.

## Requirements

- Node.js 16.0.0 or higher
- Dependencies:
  - `@google/generative-ai` ^0.21.0
  - `canvas` ^2.11.2
  - `commander` ^12.0.0

## Publishing to npm

### 1. Build and Test

```bash
npm install
npm test
```

### 2. Update Version

Edit `package.json` and update the version number.

### 3. Login to npm

```bash
npm login
```

### 4. Publish

```bash
npm publish
```

### 5. Verify

```bash
npm info vsegments
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built using [Google Gemini AI](https://ai.google.dev/)
- Inspired by the [Google AI Cookbook](https://github.com/google-gemini/cookbook)

## Support

- **Issues**: [GitHub Issues](https://github.com/nxtphaseai/vsegments/issues)
- **Documentation**: [GitHub README](https://github.com/nxtphaseai/vsegments#readme)

---

Made with ❤️ by Marco Kotrotsos
package/bin/cli.js ADDED

#!/usr/bin/env node

/**
 * Command-line interface for vsegments
 */

const { Command } = require('commander');
const fs = require('fs').promises;
const VSegments = require('../src/index');

const program = new Command();

// Coerce numeric option values explicitly. Commander calls the coercion
// function with (value, previous), so a bare parseInt would treat the
// previous value as a radix.
const toInt = (value) => parseInt(value, 10);
const toFloat = (value) => parseFloat(value);

program
  .name('vsegments')
  .description('Visual segmentation and bounding box detection using Google Gemini AI')
  .version('0.1.0')
  .requiredOption('-f, --file <image>', 'Path to input image file')
  .option('--segment', 'Perform segmentation instead of bounding box detection')
  .option('--api-key <key>', 'Google API key (default: GOOGLE_API_KEY env var)')
  .option('-m, --model <model>', 'Model name to use', 'gemini-flash-latest')
  .option('--temperature <temp>', 'Sampling temperature 0.0-1.0', toFloat, 0.5)
  .option('--max-objects <n>', 'Maximum number of objects to detect', toInt, 25)
  .option('-p, --prompt <text>', 'Custom detection prompt')
  .option('--instructions <text>', 'Additional system instructions for grounding')
  .option('-o, --output <file>', 'Save visualized output to file')
  .option('--json <file>', 'Export results as JSON')
  .option('--no-show', 'Don\'t display the output image')
  .option('--raw', 'Print raw API response')
  .option('--line-width <n>', 'Bounding box line width', toInt, 4)
  .option('--font-size <n>', 'Label font size', toInt, 14)
  .option('--alpha <a>', 'Mask transparency 0.0-1.0', toFloat, 0.7)
  .option('--max-size <n>', 'Maximum image dimension for processing', toInt, 1024)
  .option('-q, --quiet', 'Suppress informational output')
  .option('--compact', 'Compact output: order. subject [x y xx yy]');

program.parse(process.argv);

const options = program.opts();

async function main() {
  try {
    // Validate that the input file exists
    try {
      await fs.access(options.file);
    } catch (err) {
      console.error(`Error: Image file not found: ${options.file}`);
      process.exit(1);
    }

    // Initialize VSegments
    const vs = new VSegments({
      apiKey: options.apiKey,
      model: options.model,
      temperature: options.temperature,
      maxObjects: options.maxObjects
    });

    // Perform detection or segmentation
    let result;

    if (options.segment) {
      if (!options.quiet && !options.compact) {
        console.log(`Performing segmentation on: ${options.file}`);
      }

      result = await vs.segment(options.file, {
        prompt: options.prompt,
        maxSize: options.maxSize
      });
    } else {
      if (!options.quiet && !options.compact) {
        console.log(`Detecting bounding boxes in: ${options.file}`);
      }

      result = await vs.detectBoxes(options.file, {
        prompt: options.prompt,
        customInstructions: options.instructions,
        maxSize: options.maxSize
      });
    }

    // Print results
    if (options.compact) {
      // Compact output: order. subject [x y xx yy]
      result.boxes.forEach((box, i) => {
        console.log(`${i + 1}. ${box.label} [${box.x1} ${box.y1} ${box.x2} ${box.y2}]`);
      });
    } else if (!options.quiet) {
      console.log(`\nDetected ${result.boxes.length} object(s):`);
      result.boxes.forEach((box, i) => {
        console.log(`  ${i + 1}. ${box.label}`);
      });
    }

    // Print raw response if requested
    if (options.raw && !options.compact) {
      console.log('\nRaw API Response:');
      console.log(result.rawResponse);
    }

    // Export JSON if requested
    if (options.json) {
      const jsonData = {
        boxes: result.boxes.map(box => ({
          label: box.label,
          box_2d: [box.y1, box.x1, box.y2, box.x2]
        })),
        model: options.model,
        temperature: options.temperature
      };

      if (result.masks) {
        jsonData.masks = result.masks.length;
      }

      await fs.writeFile(
        options.json,
        JSON.stringify(jsonData, null, 2)
      );

      if (!options.quiet && !options.compact) {
        console.log(`\nJSON results saved to: ${options.json}`);
      }
    }

    // Visualize and save if requested
    if (options.output && !options.compact) {
      if (!options.quiet) {
        console.log(`\nCreating visualization...`);
      }

      await vs.visualize(options.file, result, {
        outputPath: options.output,
        lineWidth: options.lineWidth,
        fontSize: options.fontSize,
        alpha: options.alpha
      });

      if (!options.quiet) {
        console.log(`Output saved to: ${options.output}`);
      }
    }

    // Success
    if (!options.quiet && !options.compact) {
      console.log('\n✓ Complete!');
    }

  } catch (err) {
    console.error(`Error: ${err.message}`);
    if (!options.quiet) {
      console.error(err.stack);
    }
    process.exit(1);
  }
}

main();
package/package.json ADDED

{
  "name": "vsegments",
  "version": "0.1.0",
  "description": "Visual segmentation and bounding box detection using Google Gemini AI",
  "main": "src/index.js",
  "types": "src/index.d.ts",
  "bin": {
    "vsegments": "./bin/cli.js"
  },
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "lint": "eslint src/**/*.js",
    "format": "prettier --write \"src/**/*.js\""
  },
  "keywords": [
    "image",
    "segmentation",
    "bounding-box",
    "object-detection",
    "google",
    "gemini",
    "ai",
    "computer-vision",
    "machine-learning"
  ],
  "author": "Marco Kotrotsos",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "git@github.com:nxtphaseai/node_vsegments.git"
  },
  "homepage": "https://github.com/nxtphaseai/node_vsegments#readme",
  "bugs": {
    "url": "https://github.com/nxtphaseai/node_vsegments/issues"
  },
  "engines": {
    "node": ">=16.0.0"
  },
  "dependencies": {
    "@google/generative-ai": "^0.21.0",
    "canvas": "^2.11.2",
    "commander": "^12.0.0"
  },
  "devDependencies": {
    "eslint": "^8.57.0",
    "prettier": "^3.2.5"
  }
}