@solana/codecs-strings 6.3.1 → 6.3.2-canary-20260313143218
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/index.browser.cjs +2 -2
- package/dist/index.browser.cjs.map +1 -1
- package/dist/index.browser.mjs +2 -2
- package/dist/index.browser.mjs.map +1 -1
- package/dist/index.native.mjs +2 -2
- package/dist/index.native.mjs.map +1 -1
- package/dist/index.node.cjs +2 -2
- package/dist/index.node.cjs.map +1 -1
- package/dist/index.node.mjs +2 -2
- package/dist/index.node.mjs.map +1 -1
- package/package.json +6 -5
- package/src/assertions.ts +31 -0
- package/src/base10.ts +87 -0
- package/src/base16.ts +156 -0
- package/src/base58.ts +87 -0
- package/src/base64.ts +166 -0
- package/src/baseX-reslice.ts +147 -0
- package/src/baseX.ts +189 -0
- package/src/index.ts +210 -0
- package/src/null-characters.ts +36 -0
- package/src/utf8.ts +114 -0
package/src/index.ts
ADDED
@@ -0,0 +1,210 @@
/**
 * This package contains codecs for strings of different sizes and encodings.
 * It can be used standalone, but it is also exported as part of Kit
 * [`@solana/kit`](https://github.com/anza-xyz/kit/tree/main/packages/kit).
 *
 * This package is also part of the [`@solana/codecs` package](https://github.com/anza-xyz/kit/tree/main/packages/codecs)
 * which acts as an entry point for all codec packages as well as for their documentation.
 *
 * ## Sizing string codecs
 *
 * The `@solana/codecs-strings` package offers a variety of string codecs, such as `utf8`, `base58` and `base64`,
 * which we discuss in more detail below. However, before digging into the available string codecs,
 * it's important to understand the different sizing strategies available for string codecs.
 *
 * By default, all available string codecs return a `VariableSizeCodec<string>`, meaning that:
 *
 * - When encoding a string, all bytes necessary to encode the string will be used.
 * - When decoding a byte array at a given offset, all bytes starting from that offset will be decoded as a string.
 *
 * For instance, here's how you can encode/decode `utf8` strings without any size boundary:
 *
 * ```ts
 * const codec = getUtf8Codec();
 *
 * codec.encode('hello');
 * // 0x68656c6c6f
 * //   └-- Any bytes necessary to encode our content.
 *
 * codec.decode(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]));
 * // 'hello'
 * ```
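A minimal standalone sketch of the variable-size behavior described above, using only the built-in `TextEncoder`/`TextDecoder` rather than this package. The helpers `encodeVariable` and `decodeVariable` are hypothetical; `decodeVariable` returns the decoded value together with the offset where reading stopped, which for a variable-size string is always the end of the array.

```ts
// Hypothetical helpers illustrating variable-size string coding (not this package's API).
const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder();

// Encoding uses exactly as many bytes as the UTF-8 representation needs.
function encodeVariable(value: string): Uint8Array {
  return textEncoder.encode(value);
}

// Decoding consumes every byte from `offset` to the end of the array, so the
// returned "next offset" is always `bytes.length`.
function decodeVariable(bytes: Uint8Array, offset = 0): [string, number] {
  return [textDecoder.decode(bytes.slice(offset)), bytes.length];
}

// A 1-byte header followed by a trailing variable-size string.
const body = encodeVariable('hello');
const message = new Uint8Array(1 + body.length);
message[0] = 0x01;
message.set(body, 1);

const [text, nextOffset] = decodeVariable(message, 1); // ['hello', 6]
```

This is why an unbounded string is only safe as the last field of a structure: anything placed after it would be swallowed by the decoder.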
 *
 * This might be what you want (e.g. when the string is the last field of a data structure), but in many cases,
 * you might want to have a size boundary for your string. You may achieve this by composing your string codec
 * with the [`fixCodecSize`](https://github.com/anza-xyz/kit/tree/main/packages/codecs-core#fixing-the-size-of-codecs)
 * or [`addCodecSizePrefix`](https://github.com/anza-xyz/kit/tree/main/packages/codecs-core#prefixing-the-size-of-codecs) functions.
 *
 * The `fixCodecSize` function accepts a codec and a fixed byte length and returns a `FixedSizeCodec<string>` that
 * always uses that number of bytes to encode and decode a string. Any string longer or shorter than that size will
 * be truncated or padded, respectively. Here's how you can use it with a `utf8` codec:
 *
 * ```ts
 * const codec = fixCodecSize(getUtf8Codec(), 5);
 *
 * codec.encode('hello');
 * // 0x68656c6c6f
 * //   └-- The exact 5 bytes of content.
 *
 * codec.encode('hello world');
 * // 0x68656c6c6f
 * //   └-- The truncated 5 bytes of content.
 *
 * codec.encode('hell');
 * // 0x68656c6c00
 * //   └-- The padded 5 bytes of content.
 *
 * codec.decode(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xff, 0xff, 0xff, 0xff]));
 * // 'hello'
 * ```
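The truncate-or-pad rule can be sketched without the package, assuming zero-byte padding on encode (as in the `0x68656c6c00` example above) and null-character stripping on decode (as the UTF-8 decoder in this package does via `removeNullCharacters`). `encodeFixed` and `decodeFixed` are hypothetical names; this is not the `fixCodecSize` implementation from `@solana/codecs-core`.

```ts
// Hypothetical sketch of fixed-size UTF-8 string coding (not fixCodecSize itself).
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder();

// Always produce exactly `size` bytes: truncate longer input, zero-pad shorter input.
function encodeFixed(value: string, size: number): Uint8Array {
  const content = utf8Encoder.encode(value);
  const output = new Uint8Array(size); // zero-filled on allocation
  output.set(content.subarray(0, size));
  return output;
}

// Read exactly `size` bytes starting at `offset`, then drop the null (0x00) padding.
function decodeFixed(bytes: Uint8Array, size: number, offset = 0): [string, number] {
  const slice = bytes.slice(offset, offset + size);
  const value = utf8Decoder.decode(slice).replace(/\u0000/g, '');
  return [value, offset + size];
}

encodeFixed('hell', 5); // bytes 68 65 6c 6c 00
decodeFixed(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xff, 0xff, 0xff, 0xff]), 5); // ['hello', 5]
```

Because truncation happens on bytes, a cut can land in the middle of a multi-byte character; a non-fatal `TextDecoder` then substitutes U+FFFD for the incomplete sequence.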
 *
 * The `addCodecSizePrefix` function accepts an additional number codec that will be used to encode and
 * decode a size prefix for the string. This prefix allows us to know when to stop reading the string when
 * decoding a given byte array. Here's how you can use it with a `utf8` codec:
 *
 * ```ts
 * const codec = addCodecSizePrefix(getUtf8Codec(), getU32Codec());
 *
 * codec.encode('hello');
 * // 0x0500000068656c6c6f
 * //   |       └-- The 5 bytes of content.
 * //   └-- 4-byte prefix telling us to read 5 bytes.
 *
 * codec.decode(new Uint8Array([0x05, 0x00, 0x00, 0x00, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xff, 0xff, 0xff, 0xff]));
 * // "hello"
 * ```
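The prefix mechanism can be reproduced with a `DataView`, assuming a little-endian u32 length as the `0x05000000` prefix above indicates. `encodePrefixed` and `decodePrefixed` are hypothetical helpers, not the `addCodecSizePrefix` implementation.

```ts
// Hypothetical sketch of a u32 little-endian length prefix (not addCodecSizePrefix itself).
const prefixEncoder = new TextEncoder();
const prefixDecoder = new TextDecoder();

function encodePrefixed(value: string): Uint8Array {
  const content = prefixEncoder.encode(value);
  const output = new Uint8Array(4 + content.length);
  // Store the UTF-8 byte length as an unsigned 32-bit little-endian integer.
  new DataView(output.buffer).setUint32(0, content.length, true);
  output.set(content, 4);
  return output;
}

function decodePrefixed(bytes: Uint8Array, offset = 0): [string, number] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const length = view.getUint32(offset, true);
  const start = offset + 4;
  const value = prefixDecoder.decode(bytes.slice(start, start + length));
  // The next offset points just past the string, so trailing bytes are left for the next field.
  return [value, start + length];
}

encodePrefixed('hello'); // bytes 05 00 00 00 68 65 6c 6c 6f
```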
 *
 * Now, let's take a look at the available string encodings. Just remember that you can use
 * the `fixCodecSize` or `addCodecSizePrefix` functions on any of these encodings to add a size boundary to them.
 *
 * ## Utf8 codec
 *
 * The `getUtf8Codec` function encodes and decodes a UTF-8 string to and from a byte array.
 *
 * ```ts
 * const bytes = getUtf8Codec().encode('hello'); // 0x68656c6c6f
 * const value = getUtf8Codec().decode(bytes); // "hello"
 * ```
 *
 * As usual, separate `getUtf8Encoder` and `getUtf8Decoder` functions are also available.
 *
 * ```ts
 * const bytes = getUtf8Encoder().encode('hello'); // 0x68656c6c6f
 * const value = getUtf8Decoder().decode(bytes); // "hello"
 * ```
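UTF-8 is variable-width, so the number of bytes this codec produces is generally not `string.length` (which counts UTF-16 code units). That difference matters when picking a byte size for `fixCodecSize`. A small standalone check with the built-in `TextEncoder`; `utf8ByteLength` is a hypothetical helper:

```ts
// Compare JavaScript string length (UTF-16 code units) with UTF-8 byte length.
const encoder = new TextEncoder();

function utf8ByteLength(value: string): number {
  return encoder.encode(value).length;
}

const samples = ['hello', 'héllo', '日本', '👋'];
const report = samples.map(s => ({ value: s, codeUnits: s.length, utf8Bytes: utf8ByteLength(s) }));
// hello: 5 / 5, héllo: 5 / 6, 日本: 2 / 6, 👋: 2 / 4
```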
 *
 * ## Base 64 codec
 *
 * The `getBase64Codec` function encodes and decodes a base-64 string to and from a byte array.
 *
 * ```ts
 * const bytes = getBase64Codec().encode('aGVsbG8='); // 0x68656c6c6f
 * const value = getBase64Codec().decode(bytes); // "aGVsbG8="
 * ```
 *
 * As usual, separate `getBase64Encoder` and `getBase64Decoder` functions are also available.
 *
 * ```ts
 * const bytes = getBase64Encoder().encode('aGVsbG8='); // 0x68656c6c6f
 * const value = getBase64Decoder().decode(bytes); // "aGVsbG8="
 * ```
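A standalone RFC 4648 base-64 sketch (standard alphabet, `=` padding) helps explain which strings round-trip. Decoding packs 6 bits per character and discards any leftover bits that do not fill a byte, so a string whose last character carries non-zero unused bits is not reproduced exactly. For example, `'hello+world'` becomes the bytes `0x85e965a3ec28ae57`, which re-encode as `'hello+worlc='`. The helper names are hypothetical and this is not this package's implementation.

```ts
// Hypothetical standalone base-64 sketch (RFC 4648 alphabet with '=' padding).
const BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Bytes -> base-64 string: every 3 bytes (24 bits) become 4 characters.
function bytesToBase64(bytes: Uint8Array): string {
  let output = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (bytes[i] << 16) | (b1 << 8) | b2;
    output += BASE64_ALPHABET[(triple >> 18) & 63];
    output += BASE64_ALPHABET[(triple >> 12) & 63];
    output += i + 1 < bytes.length ? BASE64_ALPHABET[(triple >> 6) & 63] : '=';
    output += i + 2 < bytes.length ? BASE64_ALPHABET[triple & 63] : '=';
  }
  return output;
}

// Base-64 string -> bytes: accumulate 6 bits per character and emit each full byte.
function base64ToBytes(value: string): Uint8Array {
  const input = value.replace(/=+$/, ''); // padding carries no data
  const output: number[] = [];
  let buffer = 0;
  let bufferedBits = 0;
  for (let i = 0; i < input.length; i++) {
    const index = BASE64_ALPHABET.indexOf(input[i]);
    if (index === -1) throw new Error(`Invalid base-64 character: ${input[i]}`);
    buffer = ((buffer << 6) | index) & 0xffff; // only the low bits are still needed
    bufferedBits += 6;
    if (bufferedBits >= 8) {
      bufferedBits -= 8;
      output.push((buffer >> bufferedBits) & 0xff);
    }
  }
  // Fewer than 8 remaining bits cannot form a byte and are discarded.
  return new Uint8Array(output);
}
```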
 *
 * ## Base 58 codec
 *
 * The `getBase58Codec` function encodes and decodes a base-58 string to and from a byte array.
 *
 * ```ts
 * const bytes = getBase58Codec().encode('heLLo'); // 0x1b6a3070
 * const value = getBase58Codec().decode(bytes); // "heLLo"
 * ```
 *
 * As usual, separate `getBase58Encoder` and `getBase58Decoder` functions are also available.
 *
 * ```ts
 * const bytes = getBase58Encoder().encode('heLLo'); // 0x1b6a3070
 * const value = getBase58Decoder().decode(bytes); // "heLLo"
 * ```
 *
 * ## Base 16 codec
 *
 * The `getBase16Codec` function encodes and decodes a base-16 string to and from a byte array.
 *
 * ```ts
 * const bytes = getBase16Codec().encode('deadface'); // 0xdeadface
 * const value = getBase16Codec().decode(bytes); // "deadface"
 * ```
 *
 * As usual, separate `getBase16Encoder` and `getBase16Decoder` functions are also available.
 *
 * ```ts
 * const bytes = getBase16Encoder().encode('deadface'); // 0xdeadface
 * const value = getBase16Decoder().decode(bytes); // "deadface"
 * ```
 *
 * ## Base 10 codec
 *
 * The `getBase10Codec` function encodes and decodes a base-10 string to and from a byte array.
 *
 * ```ts
 * const bytes = getBase10Codec().encode('1024'); // 0x0400
 * const value = getBase10Codec().decode(bytes); // "1024"
 * ```
 *
 * As usual, separate `getBase10Encoder` and `getBase10Decoder` functions are also available.
 *
 * ```ts
 * const bytes = getBase10Encoder().encode('1024'); // 0x0400
 * const value = getBase10Decoder().decode(bytes); // "1024"
 * ```
 *
 * ## Base X codec
 *
 * The `getBaseXCodec` function accepts a custom `alphabet` of `X` characters and creates a base-X codec using that alphabet.
 * It does so by iteratively dividing by `X` and handling leading zeros.
 *
 * The base-10 and base-58 codecs use this base-X codec under the hood.
 *
 * ```ts
 * const alphabet = '0ehlo';
 * const bytes = getBaseXCodec(alphabet).encode('hello'); // 0x05bd
 * const value = getBaseXCodec(alphabet).decode(bytes); // "hello"
 * ```
 *
 * As usual, separate `getBaseXEncoder` and `getBaseXDecoder` functions are also available.
 *
 * ```ts
 * const bytes = getBaseXEncoder(alphabet).encode('hello'); // 0x05bd
 * const value = getBaseXDecoder(alphabet).decode(bytes); // "hello"
 * ```
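The "divide by `X` and handle leading zeros" strategy can be sketched with schoolbook arithmetic on digit arrays (no `BigInt`). This is a hypothetical standalone version, not this package's code. It treats `alphabet[0]` as the zero digit, maps each leading zero character to a leading `0x00` byte, and reproduces the base-10, base-58 and `'0ehlo'` examples above.

```ts
// Hypothetical standalone base-X sketch; not this package's implementation.

// String -> bytes: multiply-and-add each digit into little-endian base-256 digits.
function baseXStringToBytes(alphabet: string, value: string): Uint8Array {
  const base = alphabet.length;
  // Each leading "zero" character (alphabet[0]) becomes a leading 0x00 byte.
  let leadingZeros = 0;
  while (leadingZeros < value.length && value[leadingZeros] === alphabet[0]) leadingZeros++;

  const digits: number[] = []; // base-256 digits, least significant first
  for (let i = leadingZeros; i < value.length; i++) {
    let carry = alphabet.indexOf(value[i]);
    if (carry === -1) throw new Error(`Invalid character: ${value[i]}`);
    for (let j = 0; j < digits.length; j++) {
      carry += digits[j] * base;
      digits[j] = carry & 0xff;
      carry >>= 8;
    }
    while (carry > 0) {
      digits.push(carry & 0xff);
      carry >>= 8;
    }
  }
  const output = new Uint8Array(leadingZeros + digits.length);
  digits.forEach((digit, k) => {
    output[output.length - 1 - k] = digit; // reverse into big-endian order
  });
  return output;
}

// Bytes -> string: repeatedly divide the big-endian number by X, prepending each remainder's character.
function bytesToBaseXString(alphabet: string, bytes: Uint8Array): string {
  const base = alphabet.length;
  let leadingZeros = 0;
  while (leadingZeros < bytes.length && bytes[leadingZeros] === 0) leadingZeros++;

  let digits = Array.from(bytes.subarray(leadingZeros));
  let output = '';
  while (digits.length > 0) {
    const quotient: number[] = [];
    let remainder = 0;
    for (const digit of digits) {
      const accumulator = remainder * 256 + digit;
      const q = Math.floor(accumulator / base);
      remainder = accumulator % base;
      if (quotient.length > 0 || q > 0) quotient.push(q); // skip leading zero digits
    }
    output = alphabet[remainder] + output;
    digits = quotient;
  }
  return alphabet[0].repeat(leadingZeros) + output;
}

const BASE10 = '0123456789';
const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
```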
 *
 * ## Re-slicing base X codec
 *
 * The `getBaseXResliceCodec` function also creates a base-X codec, but it uses a different strategy:
 * it re-slices bytes into custom chunks of bits that are then mapped to a provided `alphabet`.
 * The number of bits per chunk is also provided and should typically be set to `log2(alphabet.length)`.
 *
 * This is typically used to create codecs whose alphabet’s length is a power of 2, such as base-16 or base-64.
 *
 * ```ts
 * const bytes = getBaseXResliceCodec('ehlo', 2).encode('hellolol'); // 0x4aee
 * const value = getBaseXResliceCodec('ehlo', 2).decode(bytes); // "hellolol"
 * ```
 *
 * As usual, separate `getBaseXResliceEncoder` and `getBaseXResliceDecoder` functions are also available.
 *
 * ```ts
 * const bytes = getBaseXResliceEncoder('ehlo', 2).encode('hellolol'); // 0x4aee
 * const value = getBaseXResliceDecoder('ehlo', 2).decode(bytes); // "hellolol"
 * ```
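A standalone sketch of re-slicing, assuming most-significant-bit-first packing and a zero-padded final chunk when bytes do not divide evenly into `bits`-wide chunks. The function names are hypothetical and this is not this package's implementation. With the alphabet `'ehlo'` (e=0, h=1, l=2, o=3), `'hellolol'` packs to `0x4aee`. Alphabet order matters: `'elho'` would give `0x85dd`.

```ts
// Hypothetical standalone re-slicing sketch; not this package's implementation.
// Repacks `fromBits`-wide values into `toBits`-wide values, most significant bits first.
function reslice(input: number[], fromBits: number, toBits: number, keepRemainder: boolean): number[] {
  const output: number[] = [];
  const outputMask = (1 << toBits) - 1;
  const accumulatorMask = (1 << (fromBits + toBits)) - 1; // enough room for pending bits
  let accumulator = 0;
  let accumulatedBits = 0;
  for (const value of input) {
    accumulator = ((accumulator << fromBits) | value) & accumulatorMask;
    accumulatedBits += fromBits;
    while (accumulatedBits >= toBits) {
      accumulatedBits -= toBits;
      output.push((accumulator >> accumulatedBits) & outputMask);
    }
  }
  // Optionally flush leftover bits, left-aligned and zero-padded.
  if (keepRemainder && accumulatedBits > 0) {
    output.push((accumulator << (toBits - accumulatedBits)) & outputMask);
  }
  return output;
}

// String -> bytes: map characters to alphabet indices, then pack them into bytes.
function resliceStringToBytes(alphabet: string, bits: number, value: string): Uint8Array {
  const indices: number[] = [];
  for (let i = 0; i < value.length; i++) {
    const index = alphabet.indexOf(value[i]);
    if (index === -1) throw new Error(`Invalid character: ${value[i]}`);
    indices.push(index);
  }
  return new Uint8Array(reslice(indices, bits, 8, false));
}

// Bytes -> string: unpack bytes into `bits`-wide chunks and map each chunk to a character.
function resliceBytesToString(alphabet: string, bits: number, bytes: Uint8Array): string {
  return reslice(Array.from(bytes), 8, bits, true).map(index => alphabet[index]).join('');
}
```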
 *
 * @packageDocumentation
 */
export * from './assertions';
export * from './base10';
export * from './base16';
export * from './base58';
export * from './base64';
export * from './baseX';
export * from './baseX-reslice';
export * from './null-characters';
export * from './utf8';
package/src/null-characters.ts
ADDED
@@ -0,0 +1,36 @@
/**
 * Removes all null characters (`\u0000`) from a string.
 *
 * This function cleans a string by stripping out any null characters,
 * which are often used as padding in fixed-size string encodings.
 *
 * @param value - The string to process.
 * @returns The input string with all null characters removed.
 *
 * @example
 * Removing null characters from a string.
 * ```ts
 * removeNullCharacters('hello\u0000\u0000'); // "hello"
 * ```
 */
export const removeNullCharacters = (value: string) =>
    // eslint-disable-next-line no-control-regex
    value.replace(/\u0000/g, '');

/**
 * Pads a string with null characters (`\u0000`) at the end to reach a fixed length.
 *
 * If the input string is shorter than the specified length, it is padded with null characters
 * until it reaches the desired size. If it is already long enough, it remains unchanged.
 *
 * @param value - The string to pad.
 * @param chars - The total length of the resulting string, including padding.
 * @returns The input string padded with null characters up to the specified length.
 *
 * @example
 * Padding a string with null characters.
 * ```ts
 * padNullCharacters('hello', 8); // "hello\u0000\u0000\u0000"
 * ```
 */
export const padNullCharacters = (value: string, chars: number) => value.padEnd(chars, '\u0000');
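`padNullCharacters` counts `chars` the way `String.prototype.padEnd` does, in UTF-16 code units rather than encoded bytes. A padded string containing multi-byte characters can therefore exceed a byte budget once it is UTF-8 encoded. The check below re-declares both helpers locally under the hypothetical names `removeNulls`/`padNulls` so that it runs without the package.

```ts
// Local copies of the two helpers above, under hypothetical names.
const removeNulls = (value: string) => value.replace(/\u0000/g, '');
const padNulls = (value: string, chars: number) => value.padEnd(chars, '\u0000');

const utf8 = new TextEncoder();

const padded = padNulls('héllo', 8);
// padded.length === 8 (code units), but utf8.encode(padded).length === 9 bytes,
// because 'é' takes two bytes in UTF-8.

// Strings already at or beyond the target length are returned unchanged,
// and removal strips every null character, not only trailing ones.
padNulls('hello world', 5); // 'hello world'
removeNulls('a\u0000b'); // 'ab'
```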
package/src/utf8.ts
ADDED
@@ -0,0 +1,114 @@
import {
    combineCodec,
    createDecoder,
    createEncoder,
    VariableSizeCodec,
    VariableSizeDecoder,
    VariableSizeEncoder,
} from '@solana/codecs-core';
import { TextDecoder, TextEncoder } from '@solana/text-encoding-impl';

import { removeNullCharacters } from './null-characters';

/**
 * Returns an encoder for UTF-8 strings.
 *
 * This encoder serializes strings using UTF-8 encoding.
 * The encoded output contains as many bytes as needed to represent the string.
 *
 * For more details, see {@link getUtf8Codec}.
 *
 * @returns A `VariableSizeEncoder<string>` for encoding UTF-8 strings.
 *
 * @example
 * Encoding a UTF-8 string.
 * ```ts
 * const encoder = getUtf8Encoder();
 * const bytes = encoder.encode('hello'); // 0x68656c6c6f
 * ```
 *
 * @see {@link getUtf8Codec}
 */
export const getUtf8Encoder = (): VariableSizeEncoder<string> => {
    let textEncoder: TextEncoder;
    return createEncoder({
        getSizeFromValue: value => (textEncoder ||= new TextEncoder()).encode(value).length,
        write: (value: string, bytes, offset) => {
            const bytesToAdd = (textEncoder ||= new TextEncoder()).encode(value);
            bytes.set(bytesToAdd, offset);
            return offset + bytesToAdd.length;
        },
    });
};

/**
 * Returns a decoder for UTF-8 strings.
 *
 * This decoder deserializes UTF-8 encoded strings from a byte array.
 * It reads all available bytes starting from the given offset and removes
 * any null characters (`\u0000`) from the decoded string.
 *
 * For more details, see {@link getUtf8Codec}.
 *
 * @returns A `VariableSizeDecoder<string>` for decoding UTF-8 strings.
 *
 * @example
 * Decoding a UTF-8 string.
 * ```ts
 * const decoder = getUtf8Decoder();
 * const value = decoder.decode(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f])); // "hello"
 * ```
 *
 * @see {@link getUtf8Codec}
 */
export const getUtf8Decoder = (): VariableSizeDecoder<string> => {
    let textDecoder: TextDecoder;
    return createDecoder({
        read(bytes, offset) {
            const value = (textDecoder ||= new TextDecoder()).decode(bytes.slice(offset));
            return [removeNullCharacters(value), bytes.length];
        },
    });
};

/**
 * Returns a codec for encoding and decoding UTF-8 strings.
 *
 * This codec serializes strings using UTF-8 encoding.
 * The encoded output contains as many bytes as needed to represent the string.
 *
 * @returns A `VariableSizeCodec<string>` for encoding and decoding UTF-8 strings.
 *
 * @example
 * Encoding and decoding a UTF-8 string.
 * ```ts
 * const codec = getUtf8Codec();
 * const bytes = codec.encode('hello'); // 0x68656c6c6f
 * const value = codec.decode(bytes); // "hello"
 * ```
 *
 * @remarks
 * This codec does not enforce a size boundary. It will encode and decode all bytes necessary to represent the string.
 *
 * If you need a fixed-size UTF-8 codec, consider using {@link fixCodecSize}.
 *
 * ```ts
 * const codec = fixCodecSize(getUtf8Codec(), 5);
 * ```
 *
 * If you need a size-prefixed UTF-8 codec, consider using {@link addCodecSizePrefix}.
 *
 * ```ts
 * const codec = addCodecSizePrefix(getUtf8Codec(), getU32Codec());
 * ```
 *
 * Separate {@link getUtf8Encoder} and {@link getUtf8Decoder} functions are available.
 *
 * ```ts
 * const bytes = getUtf8Encoder().encode('hello');
 * const value = getUtf8Decoder().decode(bytes);
 * ```
 *
 * @see {@link getUtf8Encoder}
 * @see {@link getUtf8Decoder}
 */
export const getUtf8Codec = (): VariableSizeCodec<string> => combineCodec(getUtf8Encoder(), getUtf8Decoder());
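The UTF-8 encoder is assembled from two callbacks. `getSizeFromValue` reports the byte length, presumably so that a caller can allocate a buffer of the right size before writing. `write` copies the bytes in and returns the offset just past them. The decoder's `read` returns the value together with the next offset. The sketch below mimics that shape with hypothetical `SketchEncoder`/`SketchDecoder` interfaces so it runs without `@solana/codecs-core`; the real interfaces have more members. It also copies the lazy, first-use construction of `TextEncoder`/`TextDecoder` that the `||=` expressions perform.

```ts
// Hypothetical, simplified shapes for illustration only; not the @solana/codecs-core types.
interface SketchEncoder<T> {
  getSizeFromValue(value: T): number;
  write(value: T, bytes: Uint8Array, offset: number): number; // returns the next offset
}
interface SketchDecoder<T> {
  read(bytes: Uint8Array, offset: number): [T, number]; // value and next offset
}

function makeUtf8Encoder(): SketchEncoder<string> {
  let textEncoder: TextEncoder | undefined; // constructed on first use
  const getTextEncoder = (): TextEncoder => {
    if (!textEncoder) textEncoder = new TextEncoder();
    return textEncoder;
  };
  return {
    getSizeFromValue: value => getTextEncoder().encode(value).length,
    write: (value, bytes, offset) => {
      const encoded = getTextEncoder().encode(value);
      bytes.set(encoded, offset);
      return offset + encoded.length;
    },
  };
}

function makeUtf8Decoder(): SketchDecoder<string> {
  let textDecoder: TextDecoder | undefined; // constructed on first use
  return {
    read: (bytes, offset) => {
      if (!textDecoder) textDecoder = new TextDecoder();
      // Like getUtf8Decoder: consume everything after `offset` and strip null characters.
      const value = textDecoder.decode(bytes.slice(offset)).replace(/\u0000/g, '');
      return [value, bytes.length];
    },
  };
}

// Size first, allocate, write, then read back.
const sketchEncoder = makeUtf8Encoder();
const buffer = new Uint8Array(sketchEncoder.getSizeFromValue('hello'));
const end = sketchEncoder.write('hello', buffer, 0); // 5
const [decoded, next] = makeUtf8Decoder().read(buffer, 0); // ['hello', 5]
```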