@huggingface/transformers 3.0.0-alpha.12 → 3.0.0-alpha.14

package/README.md CHANGED
@@ -101,7 +101,7 @@ npm i @huggingface/transformers
  Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
  ```html
  <script type="module">
- import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0-alpha.12';
+ import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0-alpha.14';
  </script>
  ```
 
@@ -134,7 +134,7 @@ Check out the Transformers.js [template](https://huggingface.co/new-space?templa
 
 
 
- By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0-alpha.12/dist/), which should work out-of-the-box. You can customize this as follows:
+ By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0-alpha.14/dist/), which should work out-of-the-box. You can customize this as follows:
 
  ### Settings
 
@@ -310,6 +310,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
  1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
  1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
  1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
+ 1. **[Hiera](https://huggingface.co/docs/transformers/model_doc/hiera)** (from Meta) released with the paper [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/pdf/2306.00989) by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer.
  1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
  1. **JAIS** (from Core42) released with the paper [Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models](https://arxiv.org/pdf/2308.16149) by Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing.
  1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
@@ -4437,7 +4437,7 @@ __webpack_require__.r(__webpack_exports__);
 
 
 
- const VERSION = '3.0.0-alpha.12';
+ const VERSION = '3.0.0-alpha.14';
 
  // Check if various APIs are available (depends on environment)
  const IS_BROWSER_ENV = typeof self !== 'undefined';
@@ -5533,18 +5533,18 @@ class NoBadWordsLogitsProcessor extends LogitsProcessor {
  _call(input_ids, logits) {
  for (let i = 0; i < input_ids.length; ++i) {
  const batch_logits_data = /** @type {Float32Array} */(logits[i].data);
-
+ const ids = input_ids[i];
  for (const bad_word_ids of this.bad_words_ids) {
  // Whether to modify the logits of the last token in the bad word id sequence
  let mark = true;
 
  // For each bad word in the list, if the current sequence of input ids ends with this sequence (excluding the last),
  // then we set the logits of the last bad word id to -Infinity.
- for (let i = 1; i <= bad_word_ids.length - 1 && bad_word_ids.length < input_ids[i].length; ++i) {
+ for (let j = 1; j <= bad_word_ids.length - 1 && bad_word_ids.length < ids.length; ++j) {
 
  // NOTE: We use != instead of !== to compare bigint and number
  // @ts-ignore
- if (bad_word_ids.at(-i - 1) != input_ids[i].at(-i)) {
+ if (bad_word_ids.at(-j - 1) != ids.at(-j)) {
  // We have found a mismatch
  mark = false;
  break;
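The `NoBadWordsLogitsProcessor` change above fixes a shadowed loop variable: the inner `for (let i = 1; ...)` redeclared `i`, so `input_ids[i]` was indexed by the inner counter rather than by the batch index. The following is a minimal standalone sketch of the corrected check, assuming plain number arrays in place of Tensors. `banBadWords` is a hypothetical name, and the final `-Infinity` assignment follows the comment in the hunk rather than code shown in this diff.

```javascript
// Hypothetical sketch, not the library's implementation: plain number arrays
// stand in for Transformers.js Tensors.

/**
 * For each sequence in the batch, set the logit of a bad word's last token to
 * -Infinity when the sequence currently ends with all of that word's earlier tokens.
 * @param {number[][]} inputIds    Token ids generated so far, one array per sequence.
 * @param {number[][]} logits      Next-token logits, one array per sequence (mutated in place).
 * @param {number[][]} badWordsIds Token-id sequences that must not be generated.
 * @returns {number[][]} The same `logits` array, for convenience.
 */
function banBadWords(inputIds, logits, badWordsIds) {
  for (let i = 0; i < inputIds.length; ++i) {
    const ids = inputIds[i]; // bind the batch row once, as the fix does
    const batchLogits = logits[i];
    for (const badWordIds of badWordsIds) {
      let mark = true;
      // Compare the bad word's prefix (all but its last token) with the tail
      // of `ids`. The inner counter is `j`, so it cannot shadow `i`.
      // Like the hunk, the check is skipped (mark stays true) when the
      // sequence is not longer than the bad word.
      for (let j = 1; j <= badWordIds.length - 1 && badWordIds.length < ids.length; ++j) {
        // The library uses != to allow bigint/number comparison; plain numbers here.
        if (badWordIds.at(-j - 1) !== ids.at(-j)) {
          mark = false;
          break;
        }
      }
      if (mark) {
        batchLogits[badWordIds.at(-1)] = -Infinity;
      }
    }
  }
  return logits;
}
```

With `inputIds = [[5, 1, 2], [5, 3, 4]]` and the bad word `[2, 7]`, only row 0 has token 7 banned. Tracing the old code on the same input, row 0 would have been compared against row 1's tail (`4`) once the inner `i` reached 1, and left unbanned.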
@@ -6372,6 +6372,7 @@ __webpack_require__.r(__webpack_exports__);
  /* harmony export */ AutoModelForImageToImage: () => (/* binding */ AutoModelForImageToImage),
  /* harmony export */ AutoModelForMaskGeneration: () => (/* binding */ AutoModelForMaskGeneration),
  /* harmony export */ AutoModelForMaskedLM: () => (/* binding */ AutoModelForMaskedLM),
+ /* harmony export */ AutoModelForNormalEstimation: () => (/* binding */ AutoModelForNormalEstimation),
  /* harmony export */ AutoModelForObjectDetection: () => (/* binding */ AutoModelForObjectDetection),
  /* harmony export */ AutoModelForQuestionAnswering: () => (/* binding */ AutoModelForQuestionAnswering),
  /* harmony export */ AutoModelForSemanticSegmentation: () => (/* binding */ AutoModelForSemanticSegmentation),
@@ -6529,6 +6530,9 @@ __webpack_require__.r(__webpack_exports__);
  /* harmony export */ GemmaForCausalLM: () => (/* binding */ GemmaForCausalLM),
  /* harmony export */ GemmaModel: () => (/* binding */ GemmaModel),
  /* harmony export */ GemmaPreTrainedModel: () => (/* binding */ GemmaPreTrainedModel),
+ /* harmony export */ HieraForImageClassification: () => (/* binding */ HieraForImageClassification),
+ /* harmony export */ HieraModel: () => (/* binding */ HieraModel),
+ /* harmony export */ HieraPreTrainedModel: () => (/* binding */ HieraPreTrainedModel),
  /* harmony export */ HubertForCTC: () => (/* binding */ HubertForCTC),
  /* harmony export */ HubertForSequenceClassification: () => (/* binding */ HubertForSequenceClassification),
  /* harmony export */ HubertModel: () => (/* binding */ HubertModel),
@@ -6957,6 +6961,7 @@ async function getSession(pretrained_model_name_or_path, fileName, options) {
  });
  if (Object.keys(shapes).length > 0 && !(0,_backends_onnx_js__WEBPACK_IMPORTED_MODULE_1__.isONNXProxy)()) {
  // Only set preferredOutputLocation if shapes are present and we aren't proxying ONNX
+ /** @type {Record<string, import('onnxruntime-common').Tensor.DataLocation>} */
  const preferredOutputLocation = {};
  for (const key in shapes) {
  preferredOutputLocation[key] = 'gpu-buffer';
@@ -11177,6 +11182,19 @@ class DeiTForImageClassification extends DeiTPreTrainedModel {
  }
  //////////////////////////////////////////////////
 
+ //////////////////////////////////////////////////
+ class HieraPreTrainedModel extends PreTrainedModel { }
+ class HieraModel extends HieraPreTrainedModel { }
+ class HieraForImageClassification extends HieraPreTrainedModel {
+ /**
+ * @param {any} model_inputs
+ */
+ async _call(model_inputs) {
+ return new SequenceClassifierOutput(await super._call(model_inputs));
+ }
+ }
+ //////////////////////////////////////////////////
+
 
  //////////////////////////////////////////////////
  /**
@@ -13057,6 +13075,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
  ['owlv2', ['Owlv2Model', Owlv2Model]],
  ['beit', ['BeitModel', BeitModel]],
  ['deit', ['DeiTModel', DeiTModel]],
+ ['hiera', ['HieraModel', HieraModel]],
  ['convnext', ['ConvNextModel', ConvNextModel]],
  ['convnextv2', ['ConvNextV2Model', ConvNextV2Model]],
  ['dinov2', ['Dinov2Model', Dinov2Model]],
@@ -13264,6 +13283,7 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
  ['mobilevitv2', ['MobileViTV2ForImageClassification', MobileViTV2ForImageClassification]],
  ['beit', ['BeitForImageClassification', BeitForImageClassification]],
  ['deit', ['DeiTForImageClassification', DeiTForImageClassification]],
+ ['hiera', ['HieraForImageClassification', HieraForImageClassification]],
  ['convnext', ['ConvNextForImageClassification', ConvNextForImageClassification]],
  ['convnextv2', ['ConvNextV2ForImageClassification', ConvNextV2ForImageClassification]],
  ['dinov2', ['Dinov2ForImageClassification', Dinov2ForImageClassification]],
@@ -13348,6 +13368,10 @@ const MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES = new Map([
  ['sapiens', ['SapiensForDepthEstimation', SapiensForDepthEstimation]],
  ])
 
+ const MODEL_FOR_NORMAL_ESTIMATION_MAPPING_NAMES = new Map([
+ ['sapiens', ['SapiensForNormalEstimation', SapiensForNormalEstimation]],
+ ])
+
  // NOTE: This is custom to Transformers.js, and is necessary because certain models
  // (e.g., CLIP) are split into vision and text components
  const MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES = new Map([
@@ -13374,6 +13398,7 @@ const MODEL_CLASS_TYPE_MAPPING = [
  [MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES, MODEL_TYPES.EncoderOnly],
  [MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES, MODEL_TYPES.EncoderOnly],
  [MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES, MODEL_TYPES.EncoderOnly],
+ [MODEL_FOR_NORMAL_ESTIMATION_MAPPING_NAMES, MODEL_TYPES.EncoderOnly],
  [MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES, MODEL_TYPES.EncoderOnly],
  [MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES, MODEL_TYPES.EncoderOnly],
  [MODEL_FOR_MASK_GENERATION_MAPPING_NAMES, MODEL_TYPES.MaskGeneration],
@@ -13630,6 +13655,10 @@ class AutoModelForDepthEstimation extends PretrainedMixin {
  static MODEL_CLASS_MAPPINGS = [MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES];
  }
 
+ class AutoModelForNormalEstimation extends PretrainedMixin {
+ static MODEL_CLASS_MAPPINGS = [MODEL_FOR_NORMAL_ESTIMATION_MAPPING_NAMES];
+ }
+
  class AutoModelForImageFeatureExtraction extends PretrainedMixin {
  static MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES];
  }
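The hunks above add a `sapiens` entry to a new `MODEL_FOR_NORMAL_ESTIMATION_MAPPING_NAMES` map and an `AutoModelForNormalEstimation` class whose only body is `static MODEL_CLASS_MAPPINGS`. How `PretrainedMixin` consumes that field is not part of this diff, so the following is a hypothetical sketch of mapping-driven class resolution keyed on `config.model_type`. `AutoMixin`, `fromConfig`, `AutoNormalEstimation`, and the stub model class are illustrative names, not the library API.

```javascript
// Hypothetical sketch of an "auto" class driven by a static mapping list.
// None of these names are the Transformers.js API.

class SapiensForNormalEstimation {
  constructor(config) {
    this.config = config;
  }
}

// Same shape as the diff's maps: model_type -> [className, class].
const NORMAL_ESTIMATION_NAMES = new Map([
  ['sapiens', ['SapiensForNormalEstimation', SapiensForNormalEstimation]],
]);

class AutoMixin {
  /** @type {Map<string, [string, Function]>[]} */
  static MODEL_CLASS_MAPPINGS = [];

  // `this` is the subclass the method was called on, so each auto class
  // searches its own MODEL_CLASS_MAPPINGS.
  static fromConfig(config) {
    for (const mapping of this.MODEL_CLASS_MAPPINGS) {
      const entry = mapping.get(config.model_type);
      if (entry) {
        const [, cls] = entry;
        return new cls(config);
      }
    }
    throw new Error(`Unsupported model type: ${config.model_type}`);
  }
}

class AutoNormalEstimation extends AutoMixin {
  static MODEL_CLASS_MAPPINGS = [NORMAL_ESTIMATION_NAMES];
}
```

Under this assumption, adding a task means adding one map and one subclass, which matches the shape of the change above; the real resolution in `PretrainedMixin` may differ.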
@@ -14082,20 +14111,31 @@ __webpack_require__.r(__webpack_exports__);
 
 
 
+ /**
+ * Asynchronously creates a wrapper function for running an ONNX inference session.
+ *
+ * @param {number[]} session_bytes The session data in bytes.
+ * @param {import('onnxruntime-common').InferenceSession.SessionOptions} session_options The options for the ONNX session.
+ * @template {string | [string] | string[]} T
+ * @param {T} names The name(s) of the output tensor(s).
+ *
+ * @returns {Promise<function(Record<string, Tensor>): Promise<T extends string ? Tensor : T extends string[] ? { [K in keyof T]: Tensor } : never>>}
+ * The wrapper function for running the ONNX inference session.
+ */
  const wrap = async (session_bytes, session_options, names) => {
  const session = await (0,_backends_onnx_js__WEBPACK_IMPORTED_MODULE_0__.createInferenceSession)(
  new Uint8Array(session_bytes), session_options,
  );
- return async (inputs) => {
+ return /** @type {any} */(async (/** @type {Record<string, Tensor>} */ inputs) => {
  const ortFeed = Object.fromEntries(Object.entries(inputs).map(([k, v]) => [k, v.ort_tensor]));
  const outputs = await session.run(ortFeed);
 
  if (Array.isArray(names)) {
  return names.map((n) => new _utils_tensor_js__WEBPACK_IMPORTED_MODULE_1__.Tensor(outputs[n]));
  } else {
- return new _utils_tensor_js__WEBPACK_IMPORTED_MODULE_1__.Tensor(outputs[names]);
+ return new _utils_tensor_js__WEBPACK_IMPORTED_MODULE_1__.Tensor(outputs[/** @type {string} */(names)]);
  }
- }
+ })
  }
 
  // In-memory registry of initialized ONNX operators
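The new JSDoc on `wrap` types its result by the shape of `names`: a single string yields a function that resolves to one `Tensor`, and an array yields one that resolves to an array of `Tensor`s. The sketch below reproduces only that runtime branching, assuming an already-created session object with an async `run(feed)` method (the real `wrap` builds its session from bytes via `createInferenceSession`). `wrapSession`, `Box`, and `fakeSession` are hypothetical stand-ins for `wrap`, `Tensor`, and an ONNX Runtime session.

```javascript
// Hypothetical stand-in for the library's Tensor: it only holds the raw value
// under `ort_tensor`, the property the wrapper reads from its inputs.
class Box {
  constructor(ortTensor) {
    this.ort_tensor = ortTensor;
  }
}

/**
 * Wrap a session so callers pass and receive Box objects.
 * @param {{ run(feed: Record<string, any>): Promise<Record<string, any>> }} session
 * @param {string | string[]} names Output name, or list of output names.
 */
function wrapSession(session, names) {
  return async (inputs) => {
    // Unwrap each input to the raw value the session expects.
    const feed = Object.fromEntries(
      Object.entries(inputs).map(([k, v]) => [k, v.ort_tensor]),
    );
    const outputs = await session.run(feed);
    // An array of names yields an array of outputs; a single name yields one.
    return Array.isArray(names)
      ? names.map((n) => new Box(outputs[n]))
      : new Box(outputs[names]);
  };
}

// Hypothetical session that "computes" two named outputs from numbers.
const fakeSession = {
  async run(feed) {
    return { sum: feed.a + feed.b, diff: feed.a - feed.b };
  },
};
```

For example, `await wrapSession(fakeSession, ['sum', 'diff'])({ a: new Box(5), b: new Box(3) })` resolves to two `Box` objects holding `8` and `2`. The conditional return type in the diff's JSDoc lets a type checker see the same distinction statically.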
@@ -17773,9 +17813,8 @@ function post_process_semantic_segmentation(outputs, target_sizes = null) {
  // Store which objects have labels
  // This is much more efficient that creating a set of the final values
  const hasLabel = new Array(data.dims[0]);
- const out = segmentation.data;
- for (let j = 0; j < out.length; ++j) {
- const index = out[j];
+ for (let j = 0; j < segmentation_data.length; ++j) {
+ const index = segmentation_data[j];
  hasLabel[index] = index;
  }
  /** @type {number[]} The unique list of labels that were detected */
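In `post_process_semantic_segmentation`, the loop now reads `segmentation_data` directly instead of through an `out` alias (where `segmentation_data` is defined lies outside this hunk). The surrounding technique, marking a sparse array indexed by label instead of building a `Set`, can be sketched on its own as follows. `uniqueLabels` is a hypothetical name, and how the library turns `hasLabel` into its final list is not shown in this diff.

```javascript
/**
 * Collect the distinct labels present in a flattened segmentation map.
 * Hypothetical helper illustrating the sparse-array technique.
 * @param {ArrayLike<number>} segmentationData Per-pixel class indices.
 * @param {number} numLabels Number of possible classes (array length).
 * @returns {number[]} Labels that occur at least once, in ascending order.
 */
function uniqueLabels(segmentationData, numLabels) {
  // Slots start as holes; each observed label marks its own slot.
  const hasLabel = new Array(numLabels);
  for (let j = 0; j < segmentationData.length; ++j) {
    const index = segmentationData[j];
    hasLabel[index] = index;
  }
  // filter() skips holes; the explicit undefined check (rather than a
  // truthiness test) keeps label 0.
  return hasLabel.filter((x) => x !== undefined);
}
```

For instance, `uniqueLabels(new Int32Array([2, 0, 2, 3, 0]), 5)` returns `[0, 2, 3]`: each pixel costs one array write, and the result comes out sorted without a separate sort step.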
@@ -27891,7 +27930,7 @@ function magnitude(arr) {
  /**
  * Returns the value and index of the minimum element in an array.
  * @param {number[]|TypedArray} arr array of numbers.
- * @returns {number[]} the value and index of the minimum element, of the form: [valueOfMin, indexOfMin]
+ * @returns {[number, number]} the value and index of the minimum element, of the form: [valueOfMin, indexOfMin]
  * @throws {Error} If array is empty.
  */
  function min(arr) {
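The `min` JSDoc now declares a `[number, number]` tuple instead of `number[]`, so a type checker treats each element of `const [value, index] = min(arr)` as a `number` rather than `number | undefined` under strict index checks. The function body is outside this hunk; the following is a sketch that satisfies the documented contract (value first, index second, throwing on an empty array), not the library's code.

```javascript
/**
 * Sketch of the documented contract for `min`.
 * @param {ArrayLike<number>} arr Array (or typed array) of numbers.
 * @returns {[number, number]} [valueOfMin, indexOfMin]
 * @throws {Error} If the array is empty.
 */
function min(arr) {
  if (arr.length === 0) throw new Error('Array must not be empty');
  let minValue = arr[0];
  let indexOfMin = 0;
  for (let i = 1; i < arr.length; ++i) {
    // Strict < keeps the first occurrence when the minimum repeats.
    if (arr[i] < minValue) {
      minValue = arr[i];
      indexOfMin = i;
    }
  }
  return [minValue, indexOfMin];
}
```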
@@ -29027,6 +29066,7 @@ class Tensor {
  }
  return this;
  }
+
  /**
  * Return a new Tensor with every element added by a constant.
  * @param {number} val The value to add by.
@@ -29049,6 +29089,28 @@ class Tensor {
  return this;
  }
 
+ /**
+ * Return a new Tensor with every element subtracted by a constant.
+ * @param {number} val The value to subtract by.
+ * @returns {Tensor} The new tensor.
+ */
+ sub(val) {
+ return this.clone().sub_(val);
+ }
+
+ /**
+ * Subtract the tensor by a constant in place.
+ * @param {number} val The value to subtract by.
+ * @returns {Tensor} Returns `this`.
+ */
+ sub_(val) {
+ const this_data = this.data;
+ for (let i = 0; i < this_data.length; ++i) {
+ this_data[i] -= val;
+ }
+ return this;
+ }
+
  clone() {
  return new Tensor(this.type, this.data.slice(), this.dims.slice());
  }
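The new `sub` and `sub_` methods follow the same convention as the existing `add`/`add_` pair: the underscore variant mutates `data` in place and returns `this`, while the plain variant clones first and delegates. A self-contained sketch of that pattern, using a hypothetical `MiniTensor` in place of the library's `Tensor`:

```javascript
// Hypothetical minimal stand-in for Tensor, keeping only what the
// clone-then-mutate pattern needs.
class MiniTensor {
  constructor(type, data, dims) {
    this.type = type;
    this.data = data;
    this.dims = dims;
  }

  // Copies data and dims so the clone can be mutated independently.
  clone() {
    return new MiniTensor(this.type, this.data.slice(), this.dims.slice());
  }

  // In place: mutates this.data and returns `this` for chaining.
  sub_(val) {
    const data = this.data;
    for (let i = 0; i < data.length; ++i) {
      data[i] -= val;
    }
    return this;
  }

  // Out of place: clone first, then reuse the in-place version.
  sub(val) {
    return this.clone().sub_(val);
  }
}
```

Implementing the out-of-place method as `clone()` plus the in-place one keeps the arithmetic in a single loop, at the cost of one full copy per call.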
@@ -30257,6 +30319,7 @@ __webpack_require__.r(__webpack_exports__);
  /* harmony export */ AutoModelForImageToImage: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForImageToImage),
  /* harmony export */ AutoModelForMaskGeneration: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForMaskGeneration),
  /* harmony export */ AutoModelForMaskedLM: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForMaskedLM),
+ /* harmony export */ AutoModelForNormalEstimation: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForNormalEstimation),
  /* harmony export */ AutoModelForObjectDetection: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForObjectDetection),
  /* harmony export */ AutoModelForQuestionAnswering: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForQuestionAnswering),
  /* harmony export */ AutoModelForSemanticSegmentation: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.AutoModelForSemanticSegmentation),
@@ -30463,6 +30526,9 @@ __webpack_require__.r(__webpack_exports__);
  /* harmony export */ GemmaTokenizer: () => (/* reexport safe */ _tokenizers_js__WEBPACK_IMPORTED_MODULE_3__.GemmaTokenizer),
  /* harmony export */ Grok1Tokenizer: () => (/* reexport safe */ _tokenizers_js__WEBPACK_IMPORTED_MODULE_3__.Grok1Tokenizer),
  /* harmony export */ HerbertTokenizer: () => (/* reexport safe */ _tokenizers_js__WEBPACK_IMPORTED_MODULE_3__.HerbertTokenizer),
+ /* harmony export */ HieraForImageClassification: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.HieraForImageClassification),
+ /* harmony export */ HieraModel: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.HieraModel),
+ /* harmony export */ HieraPreTrainedModel: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.HieraPreTrainedModel),
  /* harmony export */ HubertForCTC: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.HubertForCTC),
  /* harmony export */ HubertForSequenceClassification: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.HubertForSequenceClassification),
  /* harmony export */ HubertModel: () => (/* reexport safe */ _models_js__WEBPACK_IMPORTED_MODULE_2__.HubertModel),