mime-bytes 0.0.2

package/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 Dan Lynch <pyramation@gmail.com>
4
+ Copyright (c) 2025 Hyperweb <developers@hyperweb.io>
5
+ Copyright (c) 2020-present, Interweb, Inc.
6
+
7
+ Permission is hereby granted, free of charge, to any person obtaining a copy
8
+ of this software and associated documentation files (the "Software"), to deal
9
+ in the Software without restriction, including without limitation the rights
10
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
11
+ copies of the Software, and to permit persons to whom the Software is
12
+ furnished to do so, subject to the following conditions:
13
+
14
+ The above copyright notice and this permission notice shall be included in all
15
+ copies or substantial portions of the Software.
16
+
17
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
22
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
23
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,398 @@
1
+ # mime-bytes
2
+
3
+ <p align="center" width="100%">
4
+ <img height="250" src="https://raw.githubusercontent.com/launchql/launchql/refs/heads/main/assets/outline-logo.svg" />
5
+ </p>
6
+
7
+ <p align="center" width="100%">
8
+ <a href="https://github.com/launchql/launchql/actions/workflows/run-tests.yaml">
9
+ <img height="20" src="https://github.com/launchql/launchql/actions/workflows/run-tests.yaml/badge.svg" />
10
+ </a>
11
+ <a href="https://github.com/launchql/launchql/blob/main/LICENSE"><img height="20" src="https://img.shields.io/badge/license-MIT-blue.svg"/></a>
12
+ <a href="https://www.npmjs.com/package/mime-bytes"><img height="20" src="https://img.shields.io/github/package-json/v/launchql/launchql?filename=packages%2Fmime-bytes%2Fpackage.json"/></a>
13
+ </p>
14
+
15
+ **Lightning-fast file type detection using magic bytes (file signatures) with a focus on stream processing and minimal memory usage.**
16
+
17
+ [Features](#-features) • [Installation](#-installation) • [Quick Start](#-quick-start) • [API](#-api) • [File Types](#-supported-file-types) • [Performance](#-performance)
18
+
19
+
20
+
21
+ ---
22
+
23
+ ## ✨ Features
24
+
25
+ - 🚀 **Stream-based detection** - Process files of any size without loading them into memory
26
+ - 📦 **100+ file types** - Comprehensive coverage of common and specialized formats
27
+ - 🎯 **High accuracy** - Magic byte detection with fallback to extension-based identification
28
+ - 💾 **Minimal memory usage** - Only reads the first 16-32 bytes needed for detection
29
+ - 🔧 **TypeScript support** - Full type safety and IntelliSense
30
+ - ⚡ **Performance optimized** - Built-in caching for repeated operations
31
+ - 🎨 **Content type disambiguation** - Smart MIME type resolution for ambiguous formats
32
+ - 🔌 **Extensible** - Add custom file types at runtime
33
+ - 🌐 **Charset detection** - Automatic encoding detection for text files
34
+ - 🛡️ **Robust error handling** - Graceful degradation for unknown formats
35
+
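The magic-byte approach named in the feature list can be illustrated with a minimal sketch. The `SIGNATURES` table and the `sniff` helper are hypothetical names used for illustration only, not the package's registry or API; the PNG, PDF, and ZIP signatures themselves are well-known constants.

```typescript
// Hypothetical signature table for illustration; mime-bytes ships its own registry.
interface Signature {
  name: string;
  mimeType: string;
  bytes: number[]; // expected bytes
  offset: number;  // position in the file where they must appear
}

const SIGNATURES: Signature[] = [
  { name: 'png', mimeType: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], offset: 0 },
  { name: 'pdf', mimeType: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46], offset: 0 }, // "%PDF"
  { name: 'zip', mimeType: 'application/zip', bytes: [0x50, 0x4b, 0x03, 0x04], offset: 0 }, // "PK\x03\x04"
];

// True when every signature byte is present at the expected position.
function matchesAt(data: Uint8Array, sig: Signature): boolean {
  if (data.length < sig.offset + sig.bytes.length) return false;
  return sig.bytes.every((byte, i) => data[sig.offset + i] === byte);
}

// Return the first signature that matches the leading bytes, or null.
function sniff(data: Uint8Array): Signature | null {
  return SIGNATURES.find((sig) => matchesAt(data, sig)) ?? null;
}

console.log(sniff(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]))?.name); // "pdf"
```

Only the first few bytes are ever inspected, which is why this style of detection needs so little memory.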
36
+ ## 📦 Installation
37
+
38
+ ```bash
39
+ npm install mime-bytes
40
+ ```
41
+
42
+ ## 🚀 Quick Start
43
+
44
+ ```typescript
45
+ import { FileTypeDetector } from 'mime-bytes';
46
+ import { createReadStream } from 'fs';
47
+
48
+ const detector = new FileTypeDetector();
49
+
50
+ // Stream-based detection (recommended)
51
+ const stream = createReadStream('document.pdf');
52
+ const fileType = await detector.detectFromStream(stream);
53
+
54
+ console.log(fileType);
55
+ // {
56
+ // name: "pdf",
57
+ // mimeType: "application/pdf",
58
+ // extensions: ["pdf"],
59
+ // description: "Portable Document Format",
60
+ // charset: "binary",
61
+ // contentType: "application/pdf",
62
+ // confidence: 1.0
63
+ // }
64
+ ```
65
+
66
+ ## 📖 API
67
+
68
+ ### FileTypeDetector
69
+
70
+ The main class for file type detection.
71
+
72
+ #### Constructor Options
73
+
74
+ ```typescript
75
+ const detector = new FileTypeDetector({
76
+ peekBytes: 32, // Number of bytes to peek (default: 32)
77
+ checkMultipleOffsets: true, // Check offsets 0, 4, 8, 12 (default: true)
78
+ maxOffset: 12 // Maximum offset to check (default: 12)
79
+ });
80
+ ```
81
+
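How these options might interact can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the step of 4 is inferred from the `0, 4, 8, 12` comment above, and `offsetsToCheck` and `findSignature` are hypothetical helpers. The example uses the `ftyp` marker that MP4 files carry at offset 4.

```typescript
// Hypothetical helper: list the offsets a detector would probe.
// Assumes a fixed step of 4, inferred from the "0, 4, 8, 12" comment.
function offsetsToCheck(checkMultipleOffsets: boolean, maxOffset: number): number[] {
  if (!checkMultipleOffsets) return [0];
  const offsets: number[] = [];
  for (let offset = 0; offset <= maxOffset; offset += 4) offsets.push(offset);
  return offsets;
}

// Return the first probed offset at which `pattern` occurs, or -1 if none matches.
function findSignature(data: Uint8Array, pattern: number[], offsets: number[]): number {
  for (const offset of offsets) {
    if (offset + pattern.length > data.length) continue; // pattern would run past the data
    if (pattern.every((byte, i) => data[offset + i] === byte)) return offset;
  }
  return -1;
}

// An MP4 file starts with a 4-byte box size followed by the ASCII marker "ftyp".
const FTYP = [0x66, 0x74, 0x79, 0x70];
const mp4Head = new Uint8Array([0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70, 0x69, 0x73, 0x6f, 0x6d]);

console.log(findSignature(mp4Head, FTYP, offsetsToCheck(true, 12)));  // 4
console.log(findSignature(mp4Head, FTYP, offsetsToCheck(false, 12))); // -1
```

With offset probing disabled, a format whose marker does not sit at byte 0 would go undetected, which is presumably why it is enabled by default.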
82
+ ### Core Methods
83
+
84
+ #### `detectFromStream(stream: Readable): Promise<FileTypeResult | null>`
85
+
86
+ Detect file type from a readable stream. **This is the primary and recommended method.**
87
+
88
+ ```typescript
89
+ const stream = createReadStream('video.mp4');
90
+ const result = await detector.detectFromStream(stream);
91
+ // Stream can still be used after detection!
92
+ ```
93
+
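One way a detector can inspect the start of a stream and still leave it usable is to read the leading bytes and push them back with Node's `readable.unshift()`. The `peek` helper below is a hypothetical sketch of that technique, not necessarily how `detectFromStream` is implemented. It assumes a byte stream with no encoding set and not in object mode.

```typescript
import { Readable } from 'node:stream';

// Hypothetical helper: resolve with up to `size` leading bytes and put them
// back on the stream so later consumers still see the file from byte 0.
function peek(stream: Readable, size: number): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const cleanup = (): void => {
      stream.off('readable', onReadable);
      stream.off('end', onEnd);
      stream.off('error', onError);
    };
    const onReadable = (): void => {
      // read(size) returns null until `size` bytes are buffered, unless the
      // stream has ended, in which case it returns whatever is left.
      const chunk = stream.read(size) as Buffer | null;
      if (chunk === null) return; // not enough data yet; wait for the next event
      cleanup();
      stream.unshift(chunk); // push the bytes back to the front of the buffer
      resolve(chunk);
    };
    const onEnd = (): void => {
      cleanup();
      resolve(Buffer.alloc(0)); // empty stream: nothing to peek
    };
    const onError = (err: Error): void => {
      cleanup();
      reject(err);
    };
    stream.on('readable', onReadable);
    stream.on('end', onEnd);
    stream.on('error', onError);
  });
}
```

After `await peek(stream, 32)` resolves, the same `stream` can be piped or iterated and will yield the content from its first byte.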
94
+ #### `detectFromBuffer(buffer: Buffer): Promise<FileTypeResult | null>`
95
+
96
+ Detect file type from a buffer (for already-loaded data).
97
+
98
+ ```typescript
99
+ const buffer = await fs.readFile('image.png'); // assumes: import { promises as fs } from 'fs'
100
+ const result = await detector.detectFromBuffer(buffer);
101
+ ```
102
+
103
+ #### `detectWithFallback(input: Readable | Buffer, filename?: string): Promise<FileTypeResult | null>`
104
+
105
+ Detect with automatic fallback to extension-based detection.
106
+
107
+ ```typescript
108
+ const stream = createReadStream('document.docx');
109
+ const result = await detector.detectWithFallback(stream, 'document.docx');
110
+ // Will use magic bytes first, then fall back to extension if needed
111
+ ```
112
+
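The fallback order described above can be sketched with two small lookup tables. Everything here is hypothetical and for illustration only: the tables, the `detectWithFallbackSketch` name, and the `0.5` confidence assigned to extension-only matches are assumptions, not values taken from the package.

```typescript
interface Guess {
  name: string;
  mimeType: string;
  confidence: number;
}

// Hypothetical tables; the real package has its own registry.
const MAGIC_TABLE = [
  { name: 'pdf', mimeType: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { name: 'gif', mimeType: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },       // "GIF8"
];
const EXTENSION_TABLE = new Map<string, { name: string; mimeType: string }>([
  ['csv', { name: 'csv', mimeType: 'text/csv' }],
  ['md', { name: 'markdown', mimeType: 'text/markdown' }],
]);

// Lower-cased text after the last dot, or '' when there is no dot.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf('.');
  return dot === -1 ? '' : filename.slice(dot + 1).toLowerCase();
}

function detectWithFallbackSketch(head: Uint8Array, filename?: string): Guess | null {
  // 1. Content wins: trust magic bytes whenever they match.
  const byMagic = MAGIC_TABLE.find(
    (entry) => head.length >= entry.bytes.length && entry.bytes.every((b, i) => head[i] === b),
  );
  if (byMagic) return { name: byMagic.name, mimeType: byMagic.mimeType, confidence: 1.0 };

  // 2. Otherwise fall back to the extension, with an (arbitrary) lower score.
  const byExtension = filename ? EXTENSION_TABLE.get(extensionOf(filename)) : undefined;
  return byExtension ? { ...byExtension, confidence: 0.5 } : null;
}
```

Checking content first means a PDF that has been renamed to `report.csv` is still reported as a PDF.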
113
+ #### `detectFromExtension(extension: string): FileTypeResult[]`
114
+
115
+ Get possible file types based on extension alone.
116
+
117
+ ```typescript
118
+ const results = detector.detectFromExtension('.jpg');
119
+ // Returns array of possible types with lower confidence scores
120
+ ```
121
+
122
+ ### File Type Management
123
+
124
+ #### `addFileType(fileType: FileTypeDefinition): void`
125
+
126
+ Add a custom file type definition.
127
+
128
+ ```typescript
129
+ detector.addFileType({
130
+ name: "myformat",
131
+ magicBytes: ["0x4D", "0x59", "0x46", "0x4D"],
132
+ mimeType: "application/x-myformat",
133
+ extensions: ["myf", "myfmt"],
134
+ description: "My Custom Format",
135
+ category: "application"
136
+ });
137
+ ```
138
+
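The `magicBytes` field above is written as hex strings. A minimal sketch of how such a definition could be turned into bytes and compared against the start of a file follows; `parseMagicBytes` and `startsWithSignature` are hypothetical helpers, not part of the documented API.

```typescript
// Convert hex tokens such as "0x4D" or "4d" into a byte array.
function parseMagicBytes(hex: string[]): Uint8Array {
  return Uint8Array.from(
    hex.map((token) => {
      if (!/^(0x)?[0-9a-fA-F]{1,2}$/.test(token)) {
        throw new RangeError(`Not a hex byte: ${token}`);
      }
      return Number.parseInt(token, 16); // radix 16 accepts an optional "0x" prefix
    }),
  );
}

// True when `data` begins with every byte of `signature`.
function startsWithSignature(data: Uint8Array, signature: Uint8Array): boolean {
  if (data.length < signature.length) return false;
  return signature.every((byte, i) => data[i] === byte);
}

const myFormatSignature = parseMagicBytes(['0x4D', '0x59', '0x46', '0x4D']); // ASCII "MYFM"
const myFormatSample = new Uint8Array([0x4d, 0x59, 0x46, 0x4d, 0x01, 0x02]);

console.log(startsWithSignature(myFormatSample, myFormatSignature)); // true
```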
139
+ #### `removeFileType(name: string): boolean`
140
+
141
+ Remove a file type by name.
142
+
143
+ ```typescript
144
+ detector.removeFileType('myformat'); // Returns true if removed
145
+ ```
146
+
147
+ #### `getByCategory(category: string): FileTypeDefinition[]`
148
+
149
+ Get all file types in a specific category.
150
+
151
+ ```typescript
152
+ const imageTypes = detector.getByCategory('image');
153
+ const audioTypes = detector.getByCategory('audio');
154
+ ```
155
+
156
+ ### Utility Methods
157
+
158
+ #### `isFileType(input: Buffer | FileTypeResult, typeName: string): boolean`
159
+
160
+ Check if a buffer or result matches a specific file type.
161
+
162
+ ```typescript
163
+ const buffer = await fs.readFile('image.png'); // assumes: import { promises as fs } from 'fs'
164
+ if (detector.isFileType(buffer, 'png')) {
165
+ console.log('This is a PNG file!');
166
+ }
167
+ ```
168
+
169
+ #### `getStats(): FileTypeStats`
170
+
171
+ Get detection statistics.
172
+
173
+ ```typescript
174
+ const stats = detector.getStats();
175
+ console.log(`Total detections: ${stats.totalDetections}`);
176
+ console.log(`Cache hit rate: ${stats.cacheHitRate}%`);
177
+ ```
178
+
179
+ #### `clearCache(): void`
180
+
181
+ Clear the internal cache (useful for testing or memory management).
182
+
183
+ ```typescript
184
+ detector.clearCache();
185
+ ```
186
+
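To make the relationship between caching, `getStats()`, and `clearCache()` concrete, here is a minimal sketch of a cache keyed by the peeked bytes. The `SniffCache` class and its internals are hypothetical; only the `totalDetections` and `cacheHitRate` field names come from the example above, and the package's real caching strategy may differ.

```typescript
// Hypothetical cache keyed by the hex form of the leading bytes.
class SniffCache<T> {
  private readonly entries = new Map<string, T | null>();
  private hits = 0;
  private total = 0;

  // Return the cached result for these bytes, or compute and remember it.
  lookup(head: Uint8Array, detect: (head: Uint8Array) => T | null): T | null {
    const key = Array.from(head, (b) => b.toString(16).padStart(2, '0')).join('');
    this.total += 1;
    if (this.entries.has(key)) {
      this.hits += 1;
      return this.entries.get(key) as T | null;
    }
    const result = detect(head);
    this.entries.set(key, result); // unknown types (null) are cached too
    return result;
  }

  // Hit rate is reported as a percentage, matching the `%` in the log line above.
  getStats(): { totalDetections: number; cacheHitRate: number } {
    const cacheHitRate = this.total === 0 ? 0 : (this.hits / this.total) * 100;
    return { totalDetections: this.total, cacheHitRate };
  }

  // Drop cached results; counters are kept in this sketch.
  clearCache(): void {
    this.entries.clear();
  }
}
```

A cache like this only helps when many inputs share the same leading bytes, which is the common case for batches of same-format files.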
187
+ ## 📊 File Type Result
188
+
189
+ Detection methods return a `FileTypeResult` object, or `null` when no type matches; `detectFromExtension` returns an array of them:
190
+
191
+ ```typescript
192
+ interface FileTypeResult {
193
+ name: string; // Short identifier (e.g., "pdf")
194
+ mimeType: string; // Standard MIME type
195
+ extensions: string[]; // Common file extensions
196
+ description?: string; // Human-readable description
197
+ charset?: string; // Character encoding (for text files)
198
+ contentType?: string; // Full content type
199
+ confidence: number; // Detection confidence (0-1)
200
+ }
201
+ ```
202
+
203
+ ## 🗂️ Supported File Types
204
+
205
+ ### Images (30+ formats)
206
+ - **Common**: PNG, JPEG, GIF, WebP, SVG, ICO
207
+ - **Modern**: AVIF, HEIC/HEIF, JXL, QOI
208
+ - **Professional**: PSD, TIFF, BMP, DNG
209
+ - **Raw**: CR2, CR3, NEF, ARW, RAF
210
+ - **Legacy**: PCX, TGA, PICT
211
+
212
+ ### Archives (20+ formats)
213
+ - **Common**: ZIP, RAR, 7Z, TAR, GZIP
214
+ - **Unix**: BZIP2, XZ, LZ, CPIO
215
+ - **Windows**: CAB, MSI
216
+ - **Package**: DEB, RPM, APK, JAR
217
+
218
+ ### Documents (15+ formats)
219
+ - **Office**: DOCX, XLSX, PPTX, ODT, ODS
220
+ - **Portable**: PDF, RTF, EPUB
221
+ - **Legacy**: DOC, XLS, PPT
222
+
223
+ ### Media (25+ formats)
224
+ - **Video**: MP4, AVI, MKV, MOV, WebM, FLV
225
+ - **Audio**: MP3, WAV, FLAC, OGG, M4A, AAC
226
+ - **Streaming**: MPEG-TS, M3U8
227
+
228
+ ### Programming (20+ formats)
229
+ - **Source**: JS, TS, JSX, TSX, Python, Java
230
+ - **Data**: JSON, XML, YAML, TOML
231
+ - **Web**: HTML, CSS, LESS, SCSS
232
+ - **Scripts**: SH, BAT, PS1
233
+
234
+ ### Executables (10+ formats)
235
+ - **Windows**: EXE, DLL, MSI
236
+ - **Unix**: ELF, Mach-O
237
+ - **Cross-platform**: JAR, WASM
238
+
239
+ ### Specialized
240
+ - **CAD**: DWG, DXF, STL
241
+ - **Fonts**: TTF, OTF, WOFF, WOFF2
242
+ - **Database**: SQLite
243
+ - **Disk Images**: ISO, DMG
244
+
245
+ ## ⚡ Performance
246
+
247
+ mime-bytes is designed for speed and efficiency:
248
+
249
+ - **Memory Usage**: O(1) - Only peeks necessary bytes
250
+ - **Time Complexity**: O(n), where n is the number of registered types
251
+ - **Caching**: ~40% performance improvement on repeated operations
252
+ - **Average Detection Time**: <10ms per file
253
+
254
+ ### Benchmarks
255
+
256
+ ```typescript
257
+ // First detection: ~13ms
258
+ // Cached detection: ~8ms (38% faster)
259
+ // Concurrent processing: Handles 1000+ files/second
260
+ ```
261
+
262
+ ## 🔧 Advanced Usage
263
+
264
+ ### Custom Peek Size
265
+
266
+ For files with magic bytes at unusual offsets:
267
+
268
+ ```typescript
269
+ const detector = new FileTypeDetector({
270
+ peekBytes: 64, // Read more bytes
271
+ maxOffset: 32 // Check deeper offsets
272
+ });
273
+ ```
274
+
275
+ ### Stream Processing Large Files
276
+
277
+ ```typescript
278
+ import { pipeline } from 'stream/promises';
279
+
280
+ async function processLargeFile(filepath: string) {
281
+ const readStream = createReadStream(filepath);
282
+
283
+ // Detect type without consuming the stream
284
+ const fileType = await detector.detectFromStream(readStream);
285
+
286
+ if (fileType?.name === 'zip') {
287
+ // Continue processing the same stream
288
+ await pipeline(
289
+ readStream,
290
+ createUnzipStream(), // placeholder: supply your own unzip Transform stream
291
+ createWriteStream('output')
292
+ );
293
+ }
294
+ }
295
+ ```
296
+
297
+ ### Handling Ambiguous Types
298
+
299
+ ```typescript
300
+ // A .ts file can be an MPEG transport stream (video/mp2t) or TypeScript source (text/x-typescript)
301
+ const result = await detector.detectWithFallback(stream, 'file.ts');
302
+
303
+ if (result?.charset === 'utf-8') {
304
+ console.log('TypeScript source file');
305
+ } else if (result?.charset === 'binary') {
306
+ console.log('MPEG Transport Stream');
307
+ }
308
+ ```
309
+
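The two meanings of `.ts` can often be told apart from content alone. The sketch below is a hypothetical heuristic, not the package's disambiguation logic: MPEG transport streams consist of 188-byte packets that each begin with the sync byte `0x47`, while source code is text and should not contain NUL bytes. Note that it inspects more data than the default 32-byte peek.

```typescript
const TS_PACKET_SIZE = 188; // MPEG-TS packets are 188 bytes long
const TS_SYNC_BYTE = 0x47;  // and each one starts with this sync byte

// Hypothetical heuristic for classifying the content of a ".ts" file.
function classifyTsFile(data: Uint8Array): 'video/mp2t' | 'text/x-typescript' | null {
  // Two consecutive packet boundaries carrying the sync byte suggest MPEG-TS.
  const looksLikeTransportStream =
    data.length > TS_PACKET_SIZE &&
    data[0] === TS_SYNC_BYTE &&
    data[TS_PACKET_SIZE] === TS_SYNC_BYTE;
  if (looksLikeTransportStream) return 'video/mp2t';

  // Crude text check: non-empty and free of NUL bytes.
  const looksLikeText = data.length > 0 && !data.includes(0x00);
  return looksLikeText ? 'text/x-typescript' : null;
}
```

Checking two sync bytes rather than one matters because `0x47` is also the letter `G`, so a source file may legitimately start with it.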
310
+ ### Batch Processing
311
+
312
+ ```typescript
313
+ async function detectMultipleFiles(files: string[]) {
314
+ const results = await Promise.all(
315
+ files.map(async (file) => {
316
+ const stream = createReadStream(file);
317
+ const type = await detector.detectFromStream(stream);
318
+ return { file, type };
319
+ })
320
+ );
321
+
322
+ return results;
323
+ }
324
+ ```
325
+
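`Promise.all` over every file opens all streams at once, which can exhaust file descriptors on large batches. A common remedy is to cap concurrency. `mapWithLimit` below is a hypothetical generic helper, not part of mime-bytes; the per-file worker would be the `createReadStream` plus `detectFromStream` pair shown above.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results keep the order of `items`.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

A limit somewhere in the tens is a reasonable starting point for local files, though the right value depends on the platform's open-file limit.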
326
+ ## 🛡️ Error Handling
327
+
328
+ ```typescript
329
+ try {
330
+ const result = await detector.detectFromStream(stream);
331
+
332
+ if (!result) {
333
+ console.log('Unknown file type');
334
+ // Handle unknown format
335
+ } else {
336
+ console.log(`Detected: ${result.name}`);
337
+ }
338
+ } catch (error) {
339
+ console.error('Detection failed:', error instanceof Error ? error.message : error);
340
+ // Handle stream errors, permission issues, etc.
341
+ }
342
+ ```
343
+
344
+ ## 🤝 Contributing
345
+
346
+ Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
347
+
348
+ ### Adding New File Types
349
+
350
+ 1. Add the file type definition to `src/file-types-registry.ts`
351
+ 2. Include magic bytes, MIME type, and extensions
352
+ 3. Add tests for the new file type
353
+ 4. Submit a PR with a description of the format
354
+
355
+ ## Related LaunchQL Tooling
356
+
357
+ ### 🧪 Testing
358
+
359
+ * [launchql/pgsql-test](https://github.com/launchql/launchql/tree/main/packages/pgsql-test): **📊 Isolated testing environments** with per-test transaction rollbacks—ideal for integration tests, complex migrations, and RLS simulation.
360
+ * [launchql/graphile-test](https://github.com/launchql/launchql/tree/main/packages/graphile-test): **🔐 Authentication mocking** with Graphile-focused test helpers for emulating row-level security contexts.
361
+ * [launchql/pg-query-context](https://github.com/launchql/launchql/tree/main/packages/pg-query-context): **🔒 Session context injection** to add session-local context (e.g., `SET LOCAL`) into queries—ideal for setting `role`, `jwt.claims`, and other session settings.
362
+
363
+ ### 🧠 Parsing & AST
364
+
365
+ * [launchql/pgsql-parser](https://github.com/launchql/pgsql-parser): **🔄 SQL conversion engine** that interprets and converts PostgreSQL syntax.
366
+ * [launchql/libpg-query-node](https://github.com/launchql/libpg-query-node): **🌉 Node.js bindings** for `libpg_query`, converting SQL into parse trees.
367
+ * [launchql/pg-proto-parser](https://github.com/launchql/pg-proto-parser): **📦 Protobuf parser** for parsing PostgreSQL Protocol Buffers definitions to generate TypeScript interfaces, utility functions, and JSON mappings for enums.
368
+ * [@pgsql/enums](https://github.com/launchql/pgsql-parser/tree/main/packages/enums): **🏷️ TypeScript enums** for PostgreSQL AST for safe and ergonomic parsing logic.
369
+ * [@pgsql/types](https://github.com/launchql/pgsql-parser/tree/main/packages/types): **📝 Type definitions** for PostgreSQL AST nodes in TypeScript.
370
+ * [@pgsql/utils](https://github.com/launchql/pgsql-parser/tree/main/packages/utils): **🛠️ AST utilities** for constructing and transforming PostgreSQL syntax trees.
371
+ * [launchql/pg-ast](https://github.com/launchql/launchql/tree/main/packages/pg-ast): **🔍 Low-level AST tools** and transformations for Postgres query structures.
372
+
373
+ ### 🚀 API & Dev Tools
374
+
375
+ * [launchql/server](https://github.com/launchql/launchql/tree/main/packages/server): **⚡ Express-based API server** powered by PostGraphile to expose a secure, scalable GraphQL API over your Postgres database.
376
+ * [launchql/explorer](https://github.com/launchql/launchql/tree/main/packages/explorer): **🔎 Visual API explorer** with GraphiQL for browsing across all databases and schemas—useful for debugging, documentation, and API prototyping.
377
+
378
+ ### 🔁 Streaming & Uploads
379
+
380
+ * [launchql/s3-streamer](https://github.com/launchql/launchql/tree/main/packages/s3-streamer): **📤 Direct S3 streaming** for large files with support for metadata injection and content validation.
381
+ * [launchql/etag-hash](https://github.com/launchql/launchql/tree/main/packages/etag-hash): **🏷️ S3-compatible ETags** created by streaming and hashing file uploads in chunks.
382
+ * [launchql/etag-stream](https://github.com/launchql/launchql/tree/main/packages/etag-stream): **🔄 ETag computation** via Node stream transformer during upload or transfer.
383
+ * [launchql/uuid-hash](https://github.com/launchql/launchql/tree/main/packages/uuid-hash): **🆔 Deterministic UUIDs** generated from hashed content, great for deduplication and asset referencing.
384
+ * [launchql/uuid-stream](https://github.com/launchql/launchql/tree/main/packages/uuid-stream): **🌊 Streaming UUID generation** based on piped file content—ideal for upload pipelines.
385
+ * [launchql/upload-names](https://github.com/launchql/launchql/tree/main/packages/upload-names): **📂 Collision-resistant filenames** utility for structured and unique file names for uploads.
386
+
387
+ ### 🧰 CLI & Codegen
388
+
389
+ * [@launchql/cli](https://github.com/launchql/launchql/tree/main/packages/cli): **🖥️ Command-line toolkit** for managing LaunchQL projects—supports database scaffolding, migrations, seeding, code generation, and automation.
390
+ * [launchql/launchql-gen](https://github.com/launchql/launchql/tree/main/packages/launchql-gen): **✨ Auto-generated GraphQL** mutations and queries dynamically built from introspected schema data.
391
+ * [@launchql/query-builder](https://github.com/launchql/launchql/tree/main/packages/query-builder): **🏗️ SQL constructor** providing a robust TypeScript-based query builder for dynamic generation of `SELECT`, `INSERT`, `UPDATE`, `DELETE`, and stored procedure calls—supports advanced SQL features like `JOIN`, `GROUP BY`, and schema-qualified queries.
392
+ * [@launchql/query](https://github.com/launchql/launchql/tree/main/packages/query): **🧩 Fluent GraphQL builder** for PostGraphile schemas. ⚡ Schema-aware via introspection, 🧩 composable and ergonomic for building deeply nested queries.
393
+
394
+ ## Disclaimer
395
+
396
+ AS DESCRIBED IN THE LICENSES, THE SOFTWARE IS PROVIDED "AS IS", AT YOUR OWN RISK, AND WITHOUT WARRANTIES OF ANY KIND.
397
+
398
+ No developer or entity involved in creating this software will be liable for any claims or damages whatsoever associated with your use, inability to use, or your interaction with other users of the code, including any direct, indirect, incidental, special, exemplary, punitive or consequential damages, or loss of profits, cryptocurrencies, tokens, or anything else of value.