fasttext 0.1.0

Files changed (510)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +3 -0
  3. data/LICENSE.txt +22 -0
  4. data/README.md +251 -0
  5. data/ext/fasttext/ext.cpp +291 -0
  6. data/ext/fasttext/extconf.rb +15 -0
  7. data/lib/fasttext.rb +41 -0
  8. data/lib/fasttext/classifier.rb +92 -0
  9. data/lib/fasttext/ext.bundle +0 -0
  10. data/lib/fasttext/model.rb +60 -0
  11. data/lib/fasttext/vectorizer.rb +58 -0
  12. data/lib/fasttext/version.rb +3 -0
  13. data/vendor/fastText/CMakeLists.txt +68 -0
  14. data/vendor/fastText/CODE_OF_CONDUCT.md +2 -0
  15. data/vendor/fastText/CONTRIBUTING.md +32 -0
  16. data/vendor/fastText/LICENSE +21 -0
  17. data/vendor/fastText/MANIFEST.in +5 -0
  18. data/vendor/fastText/Makefile +63 -0
  19. data/vendor/fastText/README.md +339 -0
  20. data/vendor/fastText/alignment/README.md +53 -0
  21. data/vendor/fastText/alignment/align.py +145 -0
  22. data/vendor/fastText/alignment/eval.py +60 -0
  23. data/vendor/fastText/alignment/example.sh +51 -0
  24. data/vendor/fastText/alignment/unsup_align.py +109 -0
  25. data/vendor/fastText/alignment/utils.py +154 -0
  26. data/vendor/fastText/classification-example.sh +41 -0
  27. data/vendor/fastText/classification-results.sh +94 -0
  28. data/vendor/fastText/crawl/README.md +26 -0
  29. data/vendor/fastText/crawl/dedup.cc +51 -0
  30. data/vendor/fastText/crawl/download_crawl.sh +57 -0
  31. data/vendor/fastText/crawl/filter_dedup.sh +13 -0
  32. data/vendor/fastText/crawl/filter_utf8.cc +105 -0
  33. data/vendor/fastText/crawl/process_wet_file.sh +30 -0
  34. data/vendor/fastText/docs/aligned-vectors.md +64 -0
  35. data/vendor/fastText/docs/api.md +6 -0
  36. data/vendor/fastText/docs/cheatsheet.md +66 -0
  37. data/vendor/fastText/docs/crawl-vectors.md +125 -0
  38. data/vendor/fastText/docs/dataset.md +6 -0
  39. data/vendor/fastText/docs/english-vectors.md +53 -0
  40. data/vendor/fastText/docs/faqs.md +63 -0
  41. data/vendor/fastText/docs/language-identification.md +47 -0
  42. data/vendor/fastText/docs/options.md +50 -0
  43. data/vendor/fastText/docs/pretrained-vectors.md +142 -0
  44. data/vendor/fastText/docs/python-module.md +314 -0
  45. data/vendor/fastText/docs/references.md +41 -0
  46. data/vendor/fastText/docs/supervised-models.md +54 -0
  47. data/vendor/fastText/docs/supervised-tutorial.md +349 -0
  48. data/vendor/fastText/docs/support.md +58 -0
  49. data/vendor/fastText/docs/unsupervised-tutorials.md +309 -0
  50. data/vendor/fastText/eval.py +95 -0
  51. data/vendor/fastText/get-wikimedia.sh +79 -0
  52. data/vendor/fastText/python/README.md +322 -0
  53. data/vendor/fastText/python/README.rst +406 -0
  54. data/vendor/fastText/python/benchmarks/README.rst +3 -0
  55. data/vendor/fastText/python/benchmarks/get_word_vector.py +49 -0
  56. data/vendor/fastText/python/doc/examples/FastTextEmbeddingBag.py +81 -0
  57. data/vendor/fastText/python/doc/examples/bin_to_vec.py +41 -0
  58. data/vendor/fastText/python/doc/examples/compute_accuracy.py +163 -0
  59. data/vendor/fastText/python/doc/examples/get_vocab.py +48 -0
  60. data/vendor/fastText/python/doc/examples/train_supervised.py +42 -0
  61. data/vendor/fastText/python/doc/examples/train_unsupervised.py +56 -0
  62. data/vendor/fastText/python/fasttext_module/fasttext/FastText.py +468 -0
  63. data/vendor/fastText/python/fasttext_module/fasttext/__init__.py +22 -0
  64. data/vendor/fastText/python/fasttext_module/fasttext/pybind/fasttext_pybind.cc +388 -0
  65. data/vendor/fastText/python/fasttext_module/fasttext/tests/__init__.py +14 -0
  66. data/vendor/fastText/python/fasttext_module/fasttext/tests/test_configurations.py +239 -0
  67. data/vendor/fastText/python/fasttext_module/fasttext/tests/test_script.py +629 -0
  68. data/vendor/fastText/python/fasttext_module/fasttext/util/__init__.py +13 -0
  69. data/vendor/fastText/python/fasttext_module/fasttext/util/util.py +60 -0
  70. data/vendor/fastText/quantization-example.sh +40 -0
  71. data/vendor/fastText/runtests.py +60 -0
  72. data/vendor/fastText/scripts/kbcompletion/README.md +19 -0
  73. data/vendor/fastText/scripts/kbcompletion/data.sh +69 -0
  74. data/vendor/fastText/scripts/kbcompletion/eval.cpp +108 -0
  75. data/vendor/fastText/scripts/kbcompletion/fb15k.sh +49 -0
  76. data/vendor/fastText/scripts/kbcompletion/fb15k237.sh +45 -0
  77. data/vendor/fastText/scripts/kbcompletion/svo.sh +38 -0
  78. data/vendor/fastText/scripts/kbcompletion/wn18.sh +49 -0
  79. data/vendor/fastText/scripts/quantization/quantization-results.sh +43 -0
  80. data/vendor/fastText/setup.cfg +2 -0
  81. data/vendor/fastText/setup.py +203 -0
  82. data/vendor/fastText/src/args.cc +320 -0
  83. data/vendor/fastText/src/args.h +68 -0
  84. data/vendor/fastText/src/densematrix.cc +155 -0
  85. data/vendor/fastText/src/densematrix.h +75 -0
  86. data/vendor/fastText/src/dictionary.cc +540 -0
  87. data/vendor/fastText/src/dictionary.h +111 -0
  88. data/vendor/fastText/src/fasttext.cc +821 -0
  89. data/vendor/fastText/src/fasttext.h +191 -0
  90. data/vendor/fastText/src/loss.cc +346 -0
  91. data/vendor/fastText/src/loss.h +163 -0
  92. data/vendor/fastText/src/main.cc +435 -0
  93. data/vendor/fastText/src/matrix.cc +25 -0
  94. data/vendor/fastText/src/matrix.h +44 -0
  95. data/vendor/fastText/src/meter.cc +68 -0
  96. data/vendor/fastText/src/meter.h +69 -0
  97. data/vendor/fastText/src/model.cc +98 -0
  98. data/vendor/fastText/src/model.h +79 -0
  99. data/vendor/fastText/src/productquantizer.cc +251 -0
  100. data/vendor/fastText/src/productquantizer.h +63 -0
  101. data/vendor/fastText/src/quantmatrix.cc +117 -0
  102. data/vendor/fastText/src/quantmatrix.h +60 -0
  103. data/vendor/fastText/src/real.h +15 -0
  104. data/vendor/fastText/src/utils.cc +28 -0
  105. data/vendor/fastText/src/utils.h +43 -0
  106. data/vendor/fastText/src/vector.cc +97 -0
  107. data/vendor/fastText/src/vector.h +61 -0
  108. data/vendor/fastText/tests/fetch_test_data.sh +202 -0
  109. data/vendor/fastText/website/README.md +6 -0
  110. data/vendor/fastText/website/blog/2016-08-18-blog-post.md +42 -0
  111. data/vendor/fastText/website/blog/2017-05-02-blog-post.md +60 -0
  112. data/vendor/fastText/website/blog/2017-10-02-blog-post.md +90 -0
  113. data/vendor/fastText/website/blog/2019-06-25-blog-post.md +168 -0
  114. data/vendor/fastText/website/core/Footer.js +127 -0
  115. data/vendor/fastText/website/package.json +12 -0
  116. data/vendor/fastText/website/pages/en/index.js +286 -0
  117. data/vendor/fastText/website/sidebars.json +18 -0
  118. data/vendor/fastText/website/siteConfig.js +102 -0
  119. data/vendor/fastText/website/static/docs/en/html/annotated.html +115 -0
  120. data/vendor/fastText/website/static/docs/en/html/annotated_dup.js +4 -0
  121. data/vendor/fastText/website/static/docs/en/html/args_8cc.html +113 -0
  122. data/vendor/fastText/website/static/docs/en/html/args_8h.html +134 -0
  123. data/vendor/fastText/website/static/docs/en/html/args_8h.js +14 -0
  124. data/vendor/fastText/website/static/docs/en/html/args_8h_source.html +139 -0
  125. data/vendor/fastText/website/static/docs/en/html/bc_s.png +0 -0
  126. data/vendor/fastText/website/static/docs/en/html/bdwn.png +0 -0
  127. data/vendor/fastText/website/static/docs/en/html/classes.html +121 -0
  128. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Args-members.html +140 -0
  129. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Args.html +753 -0
  130. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Args.js +40 -0
  131. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Dictionary-members.html +148 -0
  132. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Dictionary.html +1266 -0
  133. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Dictionary.js +43 -0
  134. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1FastText-members.html +145 -0
  135. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1FastText.html +1149 -0
  136. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1FastText.js +45 -0
  137. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Matrix-members.html +123 -0
  138. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Matrix.html +610 -0
  139. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Matrix.js +23 -0
  140. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Model-members.html +150 -0
  141. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Model.html +1400 -0
  142. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Model.js +48 -0
  143. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1ProductQuantizer-members.html +131 -0
  144. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1ProductQuantizer.html +950 -0
  145. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1ProductQuantizer.js +31 -0
  146. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1QMatrix-members.html +122 -0
  147. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1QMatrix.html +565 -0
  148. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1QMatrix.js +22 -0
  149. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Vector-members.html +121 -0
  150. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Vector.html +542 -0
  151. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Vector.js +21 -0
  152. data/vendor/fastText/website/static/docs/en/html/closed.png +0 -0
  153. data/vendor/fastText/website/static/docs/en/html/dictionary_8cc.html +116 -0
  154. data/vendor/fastText/website/static/docs/en/html/dictionary_8h.html +142 -0
  155. data/vendor/fastText/website/static/docs/en/html/dictionary_8h.js +10 -0
  156. data/vendor/fastText/website/static/docs/en/html/dictionary_8h_source.html +127 -0
  157. data/vendor/fastText/website/static/docs/en/html/dir_68267d1309a1af8e8297ef4c3efbcdba.html +145 -0
  158. data/vendor/fastText/website/static/docs/en/html/dir_68267d1309a1af8e8297ef4c3efbcdba.js +29 -0
  159. data/vendor/fastText/website/static/docs/en/html/doc.png +0 -0
  160. data/vendor/fastText/website/static/docs/en/html/doxygen.css +1596 -0
  161. data/vendor/fastText/website/static/docs/en/html/doxygen.png +0 -0
  162. data/vendor/fastText/website/static/docs/en/html/dynsections.js +97 -0
  163. data/vendor/fastText/website/static/docs/en/html/fasttext_8cc.html +119 -0
  164. data/vendor/fastText/website/static/docs/en/html/fasttext_8h.html +168 -0
  165. data/vendor/fastText/website/static/docs/en/html/fasttext_8h.js +6 -0
  166. data/vendor/fastText/website/static/docs/en/html/fasttext_8h_source.html +155 -0
  167. data/vendor/fastText/website/static/docs/en/html/favicon.png +0 -0
  168. data/vendor/fastText/website/static/docs/en/html/files.html +125 -0
  169. data/vendor/fastText/website/static/docs/en/html/files.js +4 -0
  170. data/vendor/fastText/website/static/docs/en/html/folderclosed.png +0 -0
  171. data/vendor/fastText/website/static/docs/en/html/folderopen.png +0 -0
  172. data/vendor/fastText/website/static/docs/en/html/functions.html +139 -0
  173. data/vendor/fastText/website/static/docs/en/html/functions_0x7e.html +112 -0
  174. data/vendor/fastText/website/static/docs/en/html/functions_b.html +115 -0
  175. data/vendor/fastText/website/static/docs/en/html/functions_c.html +143 -0
  176. data/vendor/fastText/website/static/docs/en/html/functions_d.html +135 -0
  177. data/vendor/fastText/website/static/docs/en/html/functions_dup.js +27 -0
  178. data/vendor/fastText/website/static/docs/en/html/functions_e.html +115 -0
  179. data/vendor/fastText/website/static/docs/en/html/functions_f.html +112 -0
  180. data/vendor/fastText/website/static/docs/en/html/functions_func.html +563 -0
  181. data/vendor/fastText/website/static/docs/en/html/functions_g.html +145 -0
  182. data/vendor/fastText/website/static/docs/en/html/functions_h.html +112 -0
  183. data/vendor/fastText/website/static/docs/en/html/functions_i.html +121 -0
  184. data/vendor/fastText/website/static/docs/en/html/functions_k.html +106 -0
  185. data/vendor/fastText/website/static/docs/en/html/functions_l.html +140 -0
  186. data/vendor/fastText/website/static/docs/en/html/functions_m.html +153 -0
  187. data/vendor/fastText/website/static/docs/en/html/functions_n.html +164 -0
  188. data/vendor/fastText/website/static/docs/en/html/functions_o.html +116 -0
  189. data/vendor/fastText/website/static/docs/en/html/functions_p.html +161 -0
  190. data/vendor/fastText/website/static/docs/en/html/functions_q.html +135 -0
  191. data/vendor/fastText/website/static/docs/en/html/functions_r.html +116 -0
  192. data/vendor/fastText/website/static/docs/en/html/functions_s.html +159 -0
  193. data/vendor/fastText/website/static/docs/en/html/functions_t.html +138 -0
  194. data/vendor/fastText/website/static/docs/en/html/functions_u.html +106 -0
  195. data/vendor/fastText/website/static/docs/en/html/functions_v.html +106 -0
  196. data/vendor/fastText/website/static/docs/en/html/functions_vars.html +486 -0
  197. data/vendor/fastText/website/static/docs/en/html/functions_w.html +124 -0
  198. data/vendor/fastText/website/static/docs/en/html/functions_z.html +104 -0
  199. data/vendor/fastText/website/static/docs/en/html/globals.html +170 -0
  200. data/vendor/fastText/website/static/docs/en/html/globals_defs.html +113 -0
  201. data/vendor/fastText/website/static/docs/en/html/globals_func.html +155 -0
  202. data/vendor/fastText/website/static/docs/en/html/index.html +100 -0
  203. data/vendor/fastText/website/static/docs/en/html/jquery.js +87 -0
  204. data/vendor/fastText/website/static/docs/en/html/main_8cc.html +582 -0
  205. data/vendor/fastText/website/static/docs/en/html/main_8cc.js +22 -0
  206. data/vendor/fastText/website/static/docs/en/html/matrix_8cc.html +114 -0
  207. data/vendor/fastText/website/static/docs/en/html/matrix_8h.html +121 -0
  208. data/vendor/fastText/website/static/docs/en/html/matrix_8h_source.html +123 -0
  209. data/vendor/fastText/website/static/docs/en/html/menu.js +26 -0
  210. data/vendor/fastText/website/static/docs/en/html/menudata.js +90 -0
  211. data/vendor/fastText/website/static/docs/en/html/model_8cc.html +113 -0
  212. data/vendor/fastText/website/static/docs/en/html/model_8h.html +183 -0
  213. data/vendor/fastText/website/static/docs/en/html/model_8h.js +8 -0
  214. data/vendor/fastText/website/static/docs/en/html/model_8h_source.html +139 -0
  215. data/vendor/fastText/website/static/docs/en/html/namespacefasttext.html +343 -0
  216. data/vendor/fastText/website/static/docs/en/html/namespacefasttext.js +13 -0
  217. data/vendor/fastText/website/static/docs/en/html/namespacefasttext_1_1utils.html +158 -0
  218. data/vendor/fastText/website/static/docs/en/html/namespacemembers.html +125 -0
  219. data/vendor/fastText/website/static/docs/en/html/namespacemembers_enum.html +107 -0
  220. data/vendor/fastText/website/static/docs/en/html/namespacemembers_func.html +110 -0
  221. data/vendor/fastText/website/static/docs/en/html/namespacemembers_type.html +104 -0
  222. data/vendor/fastText/website/static/docs/en/html/namespaces.html +106 -0
  223. data/vendor/fastText/website/static/docs/en/html/namespaces.js +4 -0
  224. data/vendor/fastText/website/static/docs/en/html/nav_f.png +0 -0
  225. data/vendor/fastText/website/static/docs/en/html/nav_g.png +0 -0
  226. data/vendor/fastText/website/static/docs/en/html/nav_h.png +0 -0
  227. data/vendor/fastText/website/static/docs/en/html/navtree.css +146 -0
  228. data/vendor/fastText/website/static/docs/en/html/navtree.js +517 -0
  229. data/vendor/fastText/website/static/docs/en/html/navtreedata.js +40 -0
  230. data/vendor/fastText/website/static/docs/en/html/navtreeindex0.js +253 -0
  231. data/vendor/fastText/website/static/docs/en/html/navtreeindex1.js +139 -0
  232. data/vendor/fastText/website/static/docs/en/html/open.png +0 -0
  233. data/vendor/fastText/website/static/docs/en/html/productquantizer_8cc.html +118 -0
  234. data/vendor/fastText/website/static/docs/en/html/productquantizer_8cc.js +4 -0
  235. data/vendor/fastText/website/static/docs/en/html/productquantizer_8h.html +124 -0
  236. data/vendor/fastText/website/static/docs/en/html/productquantizer_8h_source.html +133 -0
  237. data/vendor/fastText/website/static/docs/en/html/qmatrix_8cc.html +112 -0
  238. data/vendor/fastText/website/static/docs/en/html/qmatrix_8h.html +126 -0
  239. data/vendor/fastText/website/static/docs/en/html/qmatrix_8h_source.html +128 -0
  240. data/vendor/fastText/website/static/docs/en/html/real_8h.html +117 -0
  241. data/vendor/fastText/website/static/docs/en/html/real_8h.js +4 -0
  242. data/vendor/fastText/website/static/docs/en/html/real_8h_source.html +103 -0
  243. data/vendor/fastText/website/static/docs/en/html/resize.js +114 -0
  244. data/vendor/fastText/website/static/docs/en/html/search/all_0.html +26 -0
  245. data/vendor/fastText/website/static/docs/en/html/search/all_0.js +17 -0
  246. data/vendor/fastText/website/static/docs/en/html/search/all_1.html +26 -0
  247. data/vendor/fastText/website/static/docs/en/html/search/all_1.js +8 -0
  248. data/vendor/fastText/website/static/docs/en/html/search/all_10.html +26 -0
  249. data/vendor/fastText/website/static/docs/en/html/search/all_10.js +10 -0
  250. data/vendor/fastText/website/static/docs/en/html/search/all_11.html +26 -0
  251. data/vendor/fastText/website/static/docs/en/html/search/all_11.js +25 -0
  252. data/vendor/fastText/website/static/docs/en/html/search/all_12.html +26 -0
  253. data/vendor/fastText/website/static/docs/en/html/search/all_12.js +15 -0
  254. data/vendor/fastText/website/static/docs/en/html/search/all_13.html +26 -0
  255. data/vendor/fastText/website/static/docs/en/html/search/all_13.js +7 -0
  256. data/vendor/fastText/website/static/docs/en/html/search/all_14.html +26 -0
  257. data/vendor/fastText/website/static/docs/en/html/search/all_14.js +7 -0
  258. data/vendor/fastText/website/static/docs/en/html/search/all_15.html +26 -0
  259. data/vendor/fastText/website/static/docs/en/html/search/all_15.js +11 -0
  260. data/vendor/fastText/website/static/docs/en/html/search/all_16.html +26 -0
  261. data/vendor/fastText/website/static/docs/en/html/search/all_16.js +4 -0
  262. data/vendor/fastText/website/static/docs/en/html/search/all_17.html +26 -0
  263. data/vendor/fastText/website/static/docs/en/html/search/all_17.js +7 -0
  264. data/vendor/fastText/website/static/docs/en/html/search/all_2.html +26 -0
  265. data/vendor/fastText/website/static/docs/en/html/search/all_2.js +17 -0
  266. data/vendor/fastText/website/static/docs/en/html/search/all_3.html +26 -0
  267. data/vendor/fastText/website/static/docs/en/html/search/all_3.js +17 -0
  268. data/vendor/fastText/website/static/docs/en/html/search/all_4.html +26 -0
  269. data/vendor/fastText/website/static/docs/en/html/search/all_4.js +10 -0
  270. data/vendor/fastText/website/static/docs/en/html/search/all_5.html +26 -0
  271. data/vendor/fastText/website/static/docs/en/html/search/all_5.js +12 -0
  272. data/vendor/fastText/website/static/docs/en/html/search/all_6.html +26 -0
  273. data/vendor/fastText/website/static/docs/en/html/search/all_6.js +18 -0
  274. data/vendor/fastText/website/static/docs/en/html/search/all_7.html +26 -0
  275. data/vendor/fastText/website/static/docs/en/html/search/all_7.js +8 -0
  276. data/vendor/fastText/website/static/docs/en/html/search/all_8.html +26 -0
  277. data/vendor/fastText/website/static/docs/en/html/search/all_8.js +11 -0
  278. data/vendor/fastText/website/static/docs/en/html/search/all_9.html +26 -0
  279. data/vendor/fastText/website/static/docs/en/html/search/all_9.js +5 -0
  280. data/vendor/fastText/website/static/docs/en/html/search/all_a.html +26 -0
  281. data/vendor/fastText/website/static/docs/en/html/search/all_a.js +17 -0
  282. data/vendor/fastText/website/static/docs/en/html/search/all_b.html +26 -0
  283. data/vendor/fastText/website/static/docs/en/html/search/all_b.js +27 -0
  284. data/vendor/fastText/website/static/docs/en/html/search/all_c.html +26 -0
  285. data/vendor/fastText/website/static/docs/en/html/search/all_c.js +26 -0
  286. data/vendor/fastText/website/static/docs/en/html/search/all_d.html +26 -0
  287. data/vendor/fastText/website/static/docs/en/html/search/all_d.js +9 -0
  288. data/vendor/fastText/website/static/docs/en/html/search/all_e.html +26 -0
  289. data/vendor/fastText/website/static/docs/en/html/search/all_e.js +35 -0
  290. data/vendor/fastText/website/static/docs/en/html/search/all_f.html +26 -0
  291. data/vendor/fastText/website/static/docs/en/html/search/all_f.js +16 -0
  292. data/vendor/fastText/website/static/docs/en/html/search/classes_0.html +26 -0
  293. data/vendor/fastText/website/static/docs/en/html/search/classes_0.js +4 -0
  294. data/vendor/fastText/website/static/docs/en/html/search/classes_1.html +26 -0
  295. data/vendor/fastText/website/static/docs/en/html/search/classes_1.js +4 -0
  296. data/vendor/fastText/website/static/docs/en/html/search/classes_2.html +26 -0
  297. data/vendor/fastText/website/static/docs/en/html/search/classes_2.js +4 -0
  298. data/vendor/fastText/website/static/docs/en/html/search/classes_3.html +26 -0
  299. data/vendor/fastText/website/static/docs/en/html/search/classes_3.js +4 -0
  300. data/vendor/fastText/website/static/docs/en/html/search/classes_4.html +26 -0
  301. data/vendor/fastText/website/static/docs/en/html/search/classes_4.js +5 -0
  302. data/vendor/fastText/website/static/docs/en/html/search/classes_5.html +26 -0
  303. data/vendor/fastText/website/static/docs/en/html/search/classes_5.js +4 -0
  304. data/vendor/fastText/website/static/docs/en/html/search/classes_6.html +26 -0
  305. data/vendor/fastText/website/static/docs/en/html/search/classes_6.js +4 -0
  306. data/vendor/fastText/website/static/docs/en/html/search/classes_7.html +26 -0
  307. data/vendor/fastText/website/static/docs/en/html/search/classes_7.js +4 -0
  308. data/vendor/fastText/website/static/docs/en/html/search/classes_8.html +26 -0
  309. data/vendor/fastText/website/static/docs/en/html/search/classes_8.js +4 -0
  310. data/vendor/fastText/website/static/docs/en/html/search/close.png +0 -0
  311. data/vendor/fastText/website/static/docs/en/html/search/defines_0.html +26 -0
  312. data/vendor/fastText/website/static/docs/en/html/search/defines_0.js +5 -0
  313. data/vendor/fastText/website/static/docs/en/html/search/defines_1.html +26 -0
  314. data/vendor/fastText/website/static/docs/en/html/search/defines_1.js +4 -0
  315. data/vendor/fastText/website/static/docs/en/html/search/defines_2.html +26 -0
  316. data/vendor/fastText/website/static/docs/en/html/search/defines_2.js +4 -0
  317. data/vendor/fastText/website/static/docs/en/html/search/defines_3.html +26 -0
  318. data/vendor/fastText/website/static/docs/en/html/search/defines_3.js +4 -0
  319. data/vendor/fastText/website/static/docs/en/html/search/enums_0.html +26 -0
  320. data/vendor/fastText/website/static/docs/en/html/search/enums_0.js +4 -0
  321. data/vendor/fastText/website/static/docs/en/html/search/enums_1.html +26 -0
  322. data/vendor/fastText/website/static/docs/en/html/search/enums_1.js +4 -0
  323. data/vendor/fastText/website/static/docs/en/html/search/enums_2.html +26 -0
  324. data/vendor/fastText/website/static/docs/en/html/search/enums_2.js +4 -0
  325. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_0.html +26 -0
  326. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_0.js +4 -0
  327. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_1.html +26 -0
  328. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_1.js +4 -0
  329. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_2.html +26 -0
  330. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_2.js +4 -0
  331. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_3.html +26 -0
  332. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_3.js +4 -0
  333. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_4.html +26 -0
  334. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_4.js +6 -0
  335. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_5.html +26 -0
  336. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_5.js +4 -0
  337. data/vendor/fastText/website/static/docs/en/html/search/files_0.html +26 -0
  338. data/vendor/fastText/website/static/docs/en/html/search/files_0.js +5 -0
  339. data/vendor/fastText/website/static/docs/en/html/search/files_1.html +26 -0
  340. data/vendor/fastText/website/static/docs/en/html/search/files_1.js +5 -0
  341. data/vendor/fastText/website/static/docs/en/html/search/files_2.html +26 -0
  342. data/vendor/fastText/website/static/docs/en/html/search/files_2.js +5 -0
  343. data/vendor/fastText/website/static/docs/en/html/search/files_3.html +26 -0
  344. data/vendor/fastText/website/static/docs/en/html/search/files_3.js +8 -0
  345. data/vendor/fastText/website/static/docs/en/html/search/files_4.html +26 -0
  346. data/vendor/fastText/website/static/docs/en/html/search/files_4.js +5 -0
  347. data/vendor/fastText/website/static/docs/en/html/search/files_5.html +26 -0
  348. data/vendor/fastText/website/static/docs/en/html/search/files_5.js +5 -0
  349. data/vendor/fastText/website/static/docs/en/html/search/files_6.html +26 -0
  350. data/vendor/fastText/website/static/docs/en/html/search/files_6.js +4 -0
  351. data/vendor/fastText/website/static/docs/en/html/search/files_7.html +26 -0
  352. data/vendor/fastText/website/static/docs/en/html/search/files_7.js +5 -0
  353. data/vendor/fastText/website/static/docs/en/html/search/files_8.html +26 -0
  354. data/vendor/fastText/website/static/docs/en/html/search/files_8.js +5 -0
  355. data/vendor/fastText/website/static/docs/en/html/search/functions_0.html +26 -0
  356. data/vendor/fastText/website/static/docs/en/html/search/functions_0.js +14 -0
  357. data/vendor/fastText/website/static/docs/en/html/search/functions_1.html +26 -0
  358. data/vendor/fastText/website/static/docs/en/html/search/functions_1.js +5 -0
  359. data/vendor/fastText/website/static/docs/en/html/search/functions_10.html +26 -0
  360. data/vendor/fastText/website/static/docs/en/html/search/functions_10.js +5 -0
  361. data/vendor/fastText/website/static/docs/en/html/search/functions_11.html +26 -0
  362. data/vendor/fastText/website/static/docs/en/html/search/functions_11.js +18 -0
  363. data/vendor/fastText/website/static/docs/en/html/search/functions_12.html +26 -0
  364. data/vendor/fastText/website/static/docs/en/html/search/functions_12.js +8 -0
  365. data/vendor/fastText/website/static/docs/en/html/search/functions_13.html +26 -0
  366. data/vendor/fastText/website/static/docs/en/html/search/functions_13.js +5 -0
  367. data/vendor/fastText/website/static/docs/en/html/search/functions_14.html +26 -0
  368. data/vendor/fastText/website/static/docs/en/html/search/functions_14.js +4 -0
  369. data/vendor/fastText/website/static/docs/en/html/search/functions_15.html +26 -0
  370. data/vendor/fastText/website/static/docs/en/html/search/functions_15.js +4 -0
  371. data/vendor/fastText/website/static/docs/en/html/search/functions_16.html +26 -0
  372. data/vendor/fastText/website/static/docs/en/html/search/functions_16.js +4 -0
  373. data/vendor/fastText/website/static/docs/en/html/search/functions_17.html +26 -0
  374. data/vendor/fastText/website/static/docs/en/html/search/functions_17.js +7 -0
  375. data/vendor/fastText/website/static/docs/en/html/search/functions_2.html +26 -0
  376. data/vendor/fastText/website/static/docs/en/html/search/functions_2.js +11 -0
  377. data/vendor/fastText/website/static/docs/en/html/search/functions_3.html +26 -0
  378. data/vendor/fastText/website/static/docs/en/html/search/functions_3.js +9 -0
  379. data/vendor/fastText/website/static/docs/en/html/search/functions_4.html +26 -0
  380. data/vendor/fastText/website/static/docs/en/html/search/functions_4.js +4 -0
  381. data/vendor/fastText/website/static/docs/en/html/search/functions_5.html +26 -0
  382. data/vendor/fastText/website/static/docs/en/html/search/functions_5.js +7 -0
  383. data/vendor/fastText/website/static/docs/en/html/search/functions_6.html +26 -0
  384. data/vendor/fastText/website/static/docs/en/html/search/functions_6.js +17 -0
  385. data/vendor/fastText/website/static/docs/en/html/search/functions_7.html +26 -0
  386. data/vendor/fastText/website/static/docs/en/html/search/functions_7.js +5 -0
  387. data/vendor/fastText/website/static/docs/en/html/search/functions_8.html +26 -0
  388. data/vendor/fastText/website/static/docs/en/html/search/functions_8.js +8 -0
  389. data/vendor/fastText/website/static/docs/en/html/search/functions_9.html +26 -0
  390. data/vendor/fastText/website/static/docs/en/html/search/functions_9.js +4 -0
  391. data/vendor/fastText/website/static/docs/en/html/search/functions_a.html +26 -0
  392. data/vendor/fastText/website/static/docs/en/html/search/functions_a.js +8 -0
  393. data/vendor/fastText/website/static/docs/en/html/search/functions_b.html +26 -0
  394. data/vendor/fastText/website/static/docs/en/html/search/functions_b.js +10 -0
  395. data/vendor/fastText/website/static/docs/en/html/search/functions_c.html +26 -0
  396. data/vendor/fastText/website/static/docs/en/html/search/functions_c.js +10 -0
  397. data/vendor/fastText/website/static/docs/en/html/search/functions_d.html +26 -0
  398. data/vendor/fastText/website/static/docs/en/html/search/functions_d.js +6 -0
  399. data/vendor/fastText/website/static/docs/en/html/search/functions_e.html +26 -0
  400. data/vendor/fastText/website/static/docs/en/html/search/functions_e.js +26 -0
  401. data/vendor/fastText/website/static/docs/en/html/search/functions_f.html +26 -0
  402. data/vendor/fastText/website/static/docs/en/html/search/functions_f.js +6 -0
  403. data/vendor/fastText/website/static/docs/en/html/search/mag_sel.png +0 -0
  404. data/vendor/fastText/website/static/docs/en/html/search/namespaces_0.html +26 -0
  405. data/vendor/fastText/website/static/docs/en/html/search/namespaces_0.js +5 -0
  406. data/vendor/fastText/website/static/docs/en/html/search/nomatches.html +12 -0
  407. data/vendor/fastText/website/static/docs/en/html/search/search.css +271 -0
  408. data/vendor/fastText/website/static/docs/en/html/search/search.js +791 -0
  409. data/vendor/fastText/website/static/docs/en/html/search/search_l.png +0 -0
  410. data/vendor/fastText/website/static/docs/en/html/search/search_m.png +0 -0
  411. data/vendor/fastText/website/static/docs/en/html/search/search_r.png +0 -0
  412. data/vendor/fastText/website/static/docs/en/html/search/searchdata.js +42 -0
  413. data/vendor/fastText/website/static/docs/en/html/search/typedefs_0.html +26 -0
  414. data/vendor/fastText/website/static/docs/en/html/search/typedefs_0.js +4 -0
  415. data/vendor/fastText/website/static/docs/en/html/search/typedefs_1.html +26 -0
  416. data/vendor/fastText/website/static/docs/en/html/search/typedefs_1.js +4 -0
  417. data/vendor/fastText/website/static/docs/en/html/search/variables_0.html +26 -0
  418. data/vendor/fastText/website/static/docs/en/html/search/variables_0.js +4 -0
  419. data/vendor/fastText/website/static/docs/en/html/search/variables_1.html +26 -0
  420. data/vendor/fastText/website/static/docs/en/html/search/variables_1.js +6 -0
  421. data/vendor/fastText/website/static/docs/en/html/search/variables_10.html +26 -0
  422. data/vendor/fastText/website/static/docs/en/html/search/variables_10.js +8 -0
  423. data/vendor/fastText/website/static/docs/en/html/search/variables_11.html +26 -0
  424. data/vendor/fastText/website/static/docs/en/html/search/variables_11.js +11 -0
  425. data/vendor/fastText/website/static/docs/en/html/search/variables_12.html +26 -0
  426. data/vendor/fastText/website/static/docs/en/html/search/variables_12.js +4 -0
  427. data/vendor/fastText/website/static/docs/en/html/search/variables_13.html +26 -0
  428. data/vendor/fastText/website/static/docs/en/html/search/variables_13.js +10 -0
  429. data/vendor/fastText/website/static/docs/en/html/search/variables_2.html +26 -0
  430. data/vendor/fastText/website/static/docs/en/html/search/variables_2.js +9 -0
  431. data/vendor/fastText/website/static/docs/en/html/search/variables_3.html +26 -0
  432. data/vendor/fastText/website/static/docs/en/html/search/variables_3.js +9 -0
  433. data/vendor/fastText/website/static/docs/en/html/search/variables_4.html +26 -0
  434. data/vendor/fastText/website/static/docs/en/html/search/variables_4.js +7 -0
  435. data/vendor/fastText/website/static/docs/en/html/search/variables_5.html +26 -0
  436. data/vendor/fastText/website/static/docs/en/html/search/variables_5.js +4 -0
  437. data/vendor/fastText/website/static/docs/en/html/search/variables_6.html +26 -0
  438. data/vendor/fastText/website/static/docs/en/html/search/variables_6.js +5 -0
  439. data/vendor/fastText/website/static/docs/en/html/search/variables_7.html +26 -0
  440. data/vendor/fastText/website/static/docs/en/html/search/variables_7.js +5 -0
  441. data/vendor/fastText/website/static/docs/en/html/search/variables_8.html +26 -0
  442. data/vendor/fastText/website/static/docs/en/html/search/variables_8.js +4 -0
  443. data/vendor/fastText/website/static/docs/en/html/search/variables_9.html +26 -0
  444. data/vendor/fastText/website/static/docs/en/html/search/variables_9.js +10 -0
  445. data/vendor/fastText/website/static/docs/en/html/search/variables_a.html +26 -0
  446. data/vendor/fastText/website/static/docs/en/html/search/variables_a.js +14 -0
  447. data/vendor/fastText/website/static/docs/en/html/search/variables_b.html +26 -0
  448. data/vendor/fastText/website/static/docs/en/html/search/variables_b.js +17 -0
  449. data/vendor/fastText/website/static/docs/en/html/search/variables_c.html +26 -0
  450. data/vendor/fastText/website/static/docs/en/html/search/variables_c.js +6 -0
  451. data/vendor/fastText/website/static/docs/en/html/search/variables_d.html +26 -0
  452. data/vendor/fastText/website/static/docs/en/html/search/variables_d.js +10 -0
  453. data/vendor/fastText/website/static/docs/en/html/search/variables_e.html +26 -0
  454. data/vendor/fastText/website/static/docs/en/html/search/variables_e.js +11 -0
  455. data/vendor/fastText/website/static/docs/en/html/search/variables_f.html +26 -0
  456. data/vendor/fastText/website/static/docs/en/html/search/variables_f.js +6 -0
  457. data/vendor/fastText/website/static/docs/en/html/splitbar.png +0 -0
  458. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1Node-members.html +108 -0
  459. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1Node.html +194 -0
  460. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1Node.js +8 -0
  461. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1entry-members.html +107 -0
  462. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1entry.html +178 -0
  463. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1entry.js +7 -0
  464. data/vendor/fastText/website/static/docs/en/html/sync_off.png +0 -0
  465. data/vendor/fastText/website/static/docs/en/html/sync_on.png +0 -0
  466. data/vendor/fastText/website/static/docs/en/html/tab_a.png +0 -0
  467. data/vendor/fastText/website/static/docs/en/html/tab_b.png +0 -0
  468. data/vendor/fastText/website/static/docs/en/html/tab_h.png +0 -0
  469. data/vendor/fastText/website/static/docs/en/html/tab_s.png +0 -0
  470. data/vendor/fastText/website/static/docs/en/html/tabs.css +1 -0
  471. data/vendor/fastText/website/static/docs/en/html/utils_8cc.html +121 -0
  472. data/vendor/fastText/website/static/docs/en/html/utils_8cc.js +5 -0
  473. data/vendor/fastText/website/static/docs/en/html/utils_8h.html +122 -0
  474. data/vendor/fastText/website/static/docs/en/html/utils_8h.js +5 -0
  475. data/vendor/fastText/website/static/docs/en/html/utils_8h_source.html +104 -0
  476. data/vendor/fastText/website/static/docs/en/html/vector_8cc.html +121 -0
  477. data/vendor/fastText/website/static/docs/en/html/vector_8cc.js +4 -0
  478. data/vendor/fastText/website/static/docs/en/html/vector_8h.html +126 -0
  479. data/vendor/fastText/website/static/docs/en/html/vector_8h.js +5 -0
  480. data/vendor/fastText/website/static/docs/en/html/vector_8h_source.html +120 -0
  481. data/vendor/fastText/website/static/fasttext.css +48 -0
  482. data/vendor/fastText/website/static/img/authors/armand_joulin.jpg +0 -0
  483. data/vendor/fastText/website/static/img/authors/christian_puhrsch.png +0 -0
  484. data/vendor/fastText/website/static/img/authors/edouard_grave.jpeg +0 -0
  485. data/vendor/fastText/website/static/img/authors/piotr_bojanowski.jpg +0 -0
  486. data/vendor/fastText/website/static/img/authors/tomas_mikolov.jpg +0 -0
  487. data/vendor/fastText/website/static/img/blog/2016-08-18-blog-post-img1.png +0 -0
  488. data/vendor/fastText/website/static/img/blog/2016-08-18-blog-post-img2.png +0 -0
  489. data/vendor/fastText/website/static/img/blog/2017-05-02-blog-post-img1.jpg +0 -0
  490. data/vendor/fastText/website/static/img/blog/2017-05-02-blog-post-img2.jpg +0 -0
  491. data/vendor/fastText/website/static/img/blog/2017-10-02-blog-post-img1.png +0 -0
  492. data/vendor/fastText/website/static/img/cbo_vs_skipgram.png +0 -0
  493. data/vendor/fastText/website/static/img/fasttext-icon-api.png +0 -0
  494. data/vendor/fastText/website/static/img/fasttext-icon-bg-web.png +0 -0
  495. data/vendor/fastText/website/static/img/fasttext-icon-color-square.png +0 -0
  496. data/vendor/fastText/website/static/img/fasttext-icon-color-web.png +0 -0
  497. data/vendor/fastText/website/static/img/fasttext-icon-faq.png +0 -0
  498. data/vendor/fastText/website/static/img/fasttext-icon-tutorial.png +0 -0
  499. data/vendor/fastText/website/static/img/fasttext-icon-white-web.png +0 -0
  500. data/vendor/fastText/website/static/img/fasttext-logo-color-web.png +0 -0
  501. data/vendor/fastText/website/static/img/fasttext-logo-white-web.png +0 -0
  502. data/vendor/fastText/website/static/img/logo-color.png +0 -0
  503. data/vendor/fastText/website/static/img/model-black.png +0 -0
  504. data/vendor/fastText/website/static/img/model-blue.png +0 -0
  505. data/vendor/fastText/website/static/img/model-red.png +0 -0
  506. data/vendor/fastText/website/static/img/ogimage.png +0 -0
  507. data/vendor/fastText/website/static/img/oss_logo.png +0 -0
  508. data/vendor/fastText/wikifil.pl +57 -0
  509. data/vendor/fastText/word-vector-example.sh +39 -0
  510. metadata +621 -0
@@ -0,0 +1,6 @@
1
+ ---
2
+ id: dataset
3
+ title: Datasets
4
+ ---
5
+
6
+ [Download YFCC100M Dataset](https://fb-public.box.com/s/htfdbrvycvroebv9ecaezaztocbcnsdn)
@@ -0,0 +1,53 @@
1
+ ---
2
+ id: english-vectors
3
+ title: English word vectors
4
+ ---
5
+
6
+ This page gathers several pre-trained word vectors trained using fastText.
7
+
8
+ ### Download pre-trained word vectors
9
+
10
+ Pre-trained word vectors learned on different sources can be downloaded below:
11
+
12
+ 1. [wiki-news-300d-1M.vec.zip](https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip): 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
13
+ 2. [wiki-news-300d-1M-subword.vec.zip](https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip): 1 million word vectors trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
14
+ 3. [crawl-300d-2M.vec.zip](https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip): 2 million word vectors trained on Common Crawl (600B tokens).
15
+ 4. [crawl-300d-2M-subword.zip](https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip): 2 million word vectors trained with subword information on Common Crawl (600B tokens).
16
+
17
+ ### Format
18
+
19
+ The first line of the file contains the number of words in the vocabulary and the size of the vectors.
20
+ Each line contains a word followed by its vector, as in the default fastText text format.
21
+ Values are space-separated. Words are ordered by descending frequency.
22
+ These text models can easily be loaded in Python using the following code:
23
+ ```python
24
+ import io
25
+
26
+ def load_vectors(fname):
27
+     with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
28
+         n, d = map(int, fin.readline().split())  # header: vocabulary size, vector dimension
29
+         data = {}
30
+         for line in fin:
31
+             tokens = line.rstrip().split(' ')
32
+             data[tokens[0]] = list(map(float, tokens[1:]))  # list(): map() is lazy in Python 3
33
+     return data
34
+ ```
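Because words are ordered by descending frequency, reading only the first N rows after the header gives the N most frequent words, which keeps memory use down for the 1M- and 2M-word files. A minimal sketch, assuming the text format described above; `open_vec` and `load_top_vectors` are hypothetical helpers, not part of fastText:

```python
import io
from itertools import islice


def open_vec(fname):
    """Open a .vec file with the same settings as load_vectors."""
    return io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')


def load_top_vectors(fin, limit):
    """Read the `limit` most frequent words from an open .vec text stream."""
    n, d = map(int, fin.readline().split())  # header: vocabulary size, vector dimension
    data = {}
    for line in islice(fin, limit):  # rows are sorted by descending frequency
        tokens = line.rstrip().split(' ')
        vector = [float(x) for x in tokens[1:]]
        if len(vector) == d:  # skip malformed rows rather than failing
            data[tokens[0]] = vector
    return data
```

Usage, with the archive unzipped locally: `with open_vec('wiki-news-300d-1M.vec') as fin: top = load_top_vectors(fin, 50000)`. Taking a stream instead of a path makes the function easy to test with `io.StringIO`.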
35
+
36
+ ### License
37
+
38
+ These word vectors are distributed under the [*Creative Commons Attribution-Share-Alike License 3.0*](https://creativecommons.org/licenses/by-sa/3.0/).
39
+
40
+ ### References
41
+
42
+ If you use these word vectors, please cite the following paper:
43
+
44
+ T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin. [*Advances in Pre-Training Distributed Word Representations*](https://arxiv.org/abs/1712.09405)
45
+
46
+ ```markup
47
+ @inproceedings{mikolov2018advances,
48
+ title={Advances in Pre-Training Distributed Word Representations},
49
+ author={Mikolov, Tomas and Grave, Edouard and Bojanowski, Piotr and Puhrsch, Christian and Joulin, Armand},
50
+ booktitle={Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)},
51
+ year={2018}
52
+ }
53
+ ```
@@ -0,0 +1,63 @@
1
+ ---
2
+ id: faqs
3
+ title: FAQ
4
+ ---
5
+
6
+ ## What is fastText? Are there tutorials?
7
+
8
+ FastText is a library for text classification and representation. It transforms text into continuous vectors that can later be used in any language-related task. A few tutorials are available.
9
+
10
+ ## How can I reduce the size of my fastText models?
11
+
12
+ fastText uses a hashtable for either word or character ngrams. The size of the hashtable directly impacts the size of a model. To reduce the size of the model, it is possible to reduce the size of this table with the option `-bucket`. For example, 20000 is a good value. Another option that greatly impacts the size of a model is the size of the vectors (`-dim`). This dimension can be reduced to save space, but doing so can significantly hurt performance. If that still produces a model that is too big, you can further reduce the size of a trained model with the quantization option:
13
+ ```bash
14
+ ./fasttext quantize -output model
15
+ ```
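As a rough illustration of why the hashtable size (the `-bucket` option in the list of options), `-dim` and quantization matter, the sketch below estimates the size of the input embedding matrix. It assumes, as our reading of fastText rather than a documented formula, a dense float32 matrix with one row per vocabulary word plus one per bucket, and for the quantized case 8-bit product-quantization codes (256 centroids per sub-vector of `-dsub` dimensions). Headers, the output matrix and version-specific behavior are ignored, so real file sizes will differ; the function names and the vocabulary size are hypothetical.

```python
import math

FLOAT32_BYTES = 4


def dense_input_bytes(nwords, bucket, dim):
    """Approximate dense input matrix size: (nwords + bucket) rows of dim float32s."""
    return (nwords + bucket) * dim * FLOAT32_BYTES


def pq_input_bytes(rows, dim, dsub=2, ksub=256):
    """Approximate size after product quantization with 1-byte codes.

    Each row stores one code byte per sub-vector; the codebooks store
    ksub centroids of dsub floats for each of the ceil(dim / dsub) sub-vectors.
    """
    nsubq = math.ceil(dim / dsub)
    codes = rows * nsubq
    codebooks = nsubq * ksub * dsub * FLOAT32_BYTES
    return codes + codebooks


# Hypothetical vocabulary of 100,000 words with 100-dimensional vectors.
print(dense_input_bytes(100_000, 2_000_000, 100))  # 840000000 (~840 MB, default bucket count)
print(dense_input_bytes(100_000, 20_000, 100))     # 48000000 (~48 MB with 20000 buckets)
print(pq_input_bytes(120_000, 100, dsub=2))        # 6102400 (~6.1 MB once quantized)
```

Under these assumptions the bucket table dominates the default model, which is why shrinking it is the first suggestion.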
16
+
17
+ ## What would be the best way to represent word phrases rather than words?
18
+
19
+ Currently, the best approach to represent word phrases or sentences is to combine their word vectors as a bag of words, for example by averaging them. Additionally, for phrases like “New York”, preprocessing the data so that they become a single token “New_York” can greatly help.
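A minimal sketch of both ideas, assuming a hypothetical phrase list and a plain dict mapping words to lists of floats. The sentence vector here is a plain average; fastText's own sentence-vector routine may differ (for example, by normalizing each word vector first).

```python
def merge_phrases(tokens, phrases):
    """Join known multi-word phrases into single underscore-separated tokens.

    `phrases` is a set of token tuples, e.g. {("New", "York")}; the longest match wins.
    """
    max_len = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no phrase starts here: keep the single token
            out.append(tokens[i])
            i += 1
    return out


def average_vector(tokens, vectors, dim):
    """Bag-of-words sentence vector: the mean of the known tokens' vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical 2-dimensional vectors, including one for the merged token.
vectors = {"I": [0.0, 1.0], "love": [1.0, 0.0], "New_York": [1.0, 1.0]}
tokens = merge_phrases("I love New York".split(), {("New", "York")})
print(tokens)                              # ['I', 'love', 'New_York']
print(average_vector(tokens, vectors, 2))  # [0.6666666666666666, 0.6666666666666666]
```

The phrase merge must be applied to the training data too, so that “New_York” gets its own vector.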
20
+
21
+ ## Why does fastText produce vectors even for unknown words?
22
+
23
+ One of the key features of fastText word representation is its ability to produce vectors for any word, even made-up ones.
24
+ Indeed, fastText word vectors are built from the vectors of the character substrings (character n-grams) that each word contains.
25
+ This makes it possible to build vectors even for misspelled words or concatenations of words.
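To make this concrete, the sketch below extracts character n-grams as the fastText paper describes them (the word is wrapped in `<` and `>` boundary markers, with n ranging from `-minn` to `-maxn`), hashes each n-gram into one of a fixed number of buckets, and averages the bucket vectors to get a vector for an out-of-vocabulary word. The hash is 32-bit FNV-1a, which is our understanding of what fastText's dictionary uses; the random table and function names are hypothetical, and the real library also adds the word's own vector when the word is in the vocabulary.

```python
import random


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word` wrapped in '<' and '>', for n in [minn, maxn]."""
    w = "<" + word + ">"
    grams = []
    for i in range(len(w)):
        for n in range(minn, maxn + 1):
            if i + n <= len(w):
                grams.append(w[i:i + n])
    return grams


def fnv1a_32(text):
    """32-bit FNV-1a over the UTF-8 bytes of `text`.

    Bytes >= 0x80 are sign-extended first, mimicking a uint32_t(int8_t(c)) cast.
    """
    h = 2166136261
    for b in text.encode("utf-8"):
        if b >= 0x80:
            b |= 0xFFFFFF00  # sign extension of a negative int8 into 32 bits
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF  # keep 32-bit overflow semantics
    return h


def oov_vector(word, table, minn=3, maxn=6):
    """Average the bucket rows of the word's n-grams (toy stand-in for fastText)."""
    grams = char_ngrams(word, minn, maxn)
    dim = len(table[0])
    vec = [0.0] * dim
    for g in grams:
        row = table[fnv1a_32(g) % len(table)]  # bucket index for this n-gram
        for k in range(dim):
            vec[k] += row[k]
    return [x / len(grams) for x in vec] if grams else vec


# Hypothetical table: 1,000 buckets of 4-dimensional random vectors.
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(1000)]
print(char_ngrams("where", 3, 3))       # ['<wh', 'whe', 'her', 'ere', 're>']
print(len(oov_vector("wherre", table)))  # 4: a vector even for a misspelling
```

A misspelling like “wherre” shares most of its n-grams with “where”, so their vectors end up close.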
26
+
27
+ ## Why is the hierarchical softmax slightly worse in performance than the full softmax?
28
+
29
+ The hierarchical softmax is an approximation of the full softmax loss that makes it possible to train efficiently on a large number of classes. This often comes at the cost of a few percent of accuracy.
30
+ Note also that this loss is designed for unbalanced classes, that is, when some classes are more frequent than others. If your dataset has a balanced number of examples per class, it is worth trying the negative sampling loss (`-loss ns -neg 100`).
31
+ However, negative sampling will still be very slow at test time, since the full softmax will be computed.
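The link between the hierarchical softmax and unbalanced classes comes from its tree: as we understand it, fastText builds a Huffman tree over label frequencies, so frequent labels sit close to the root and need few binary decisions, while the full softmax scores every class. A minimal sketch with hypothetical label counts; the depths it computes are standard Huffman code lengths, not values read from fastText.

```python
import heapq
from itertools import count


def huffman_depths(freqs):
    """Depth of each label in a Huffman tree built from label frequencies."""
    if len(freqs) == 1:
        return {label: 1 for label in freqs}
    tie = count()  # tie-breaker so the heap never compares lists
    heap = [(f, next(tie), [label]) for label, f in freqs.items()]
    heapq.heapify(heap)
    depth = {label: 0 for label in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        for label in left + right:  # merging pushes every leaf one level deeper
            depth[label] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), left + right))
    return depth


def expected_decisions(freqs):
    """Average number of binary decisions per example under hierarchical softmax."""
    depth = huffman_depths(freqs)
    total = sum(freqs.values())
    return sum(freqs[label] * depth[label] for label in freqs) / total


unbalanced = {"sports": 50, "politics": 25, "science": 25}
print(huffman_depths(unbalanced))      # {'sports': 1, 'politics': 2, 'science': 2}
print(expected_decisions(unbalanced))  # 1.5, versus 3 scores for the full softmax
```

With balanced classes every label ends up at roughly the same depth, so the tree no longer favors any class, which is one reason to try negative sampling in that case.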
32
+
33
+ ## Can we run fastText on a GPU?
34
+
35
+ As of now, fastText only works on CPU.
36
+ Please note that one of the goals of fastText is to be an efficient CPU tool, making it possible to train models without requiring a GPU.
37
+
38
+ ## Can I use fastText with Python? Or other languages?
39
+
40
+ [Python is officially supported](/docs/en/support.html#building-fasttext-python-module).
41
+ There are a few unofficial wrappers for JavaScript, Lua, and other languages available on GitHub.
42
+
43
+ ## Can I use fastText with continuous data?
44
+
45
+ FastText works on discrete tokens and thus cannot be used directly on continuous data. However, you can discretize continuous values to use fastText on them, for example by rounding values to a specific digit ("12.3" becomes "12").
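A minimal sketch of that preprocessing, assuming hypothetical feature names and labels: each numeric value is rounded and turned into a token such as `temp_31`, so fastText sees a discrete vocabulary. Note that Python's `round` uses round-half-to-even (`12.5` becomes `12`); pick whatever rounding or binning rule suits your data.

```python
def discretize(value, ndigits=0):
    """Round `value` to `ndigits` decimal places and return it as a token string."""
    r = round(value, ndigits)
    if ndigits <= 0:
        return str(int(r))  # 12.3 -> "12"; 1234.5 with ndigits=-2 -> "1200"
    return f"{r:.{ndigits}f}"  # 12.34 with ndigits=1 -> "12.3"


def to_fasttext_line(label, features, ndigits=0):
    """Build one training line: a __label__ token followed by name_value tokens."""
    tokens = [f"__label__{label}"]
    for name, value in features.items():
        tokens.append(f"{name}_{discretize(value, ndigits)}")
    return " ".join(tokens)


print(to_fasttext_line("hot", {"temp": 30.7, "humidity": 12.3}))
# __label__hot temp_31 humidity_12
```

Prefixing each value with its feature name keeps, say, a temperature of 12 and a humidity of 12 as distinct tokens.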
46
+
47
+ ## There are misspellings in the dictionary. Should we improve text normalization?
48
+
49
+ If the words are infrequent, there is no need to worry.
50
+
51
+ ## I'm encountering a NaN, why could this be?
52
+
53
+ You'll likely see this behavior because your learning rate is too high. Try reducing it until you don't see this error anymore.
54
+
55
+ ## My compiler / architecture can't build fastText. What should I do?
56
+ Try a newer version of your compiler. We try to maintain compatibility with older versions of gcc and many platforms; however, sometimes maintaining backward compatibility becomes very hard. In general, compilers and toolchains that ship with LTS versions of major Linux distributions should be fair game. In any case, create an issue with your compiler version and architecture and we'll try to implement compatibility.
57
+
58
+ ## How do I run fastText in a fully reproducible way? Each time I run it I get different results.
59
+ If you run fastText multiple times, you'll obtain slightly different results each time due to the optimization algorithm (asynchronous stochastic gradient descent, or Hogwild). If you need to get the same results (e.g. to compare different sets of input parameters), you have to set the `-thread` parameter to 1. This way, you'll get exactly the same results on every run (with the same input parameters).
60
+
61
+
62
+ ## Why do I get a probability of 1.00001?
63
+ This is a known rounding issue. You can consider it as 1.0.
@@ -0,0 +1,47 @@
1
+ ---
2
+ id: language-identification
3
+ title: Language identification
4
+ ---
5
+
6
+ ### Description
7
+
8
+ We distribute two models for language identification, which can recognize 176 languages (see the list of ISO codes below). These models were trained on data from [Wikipedia](https://www.wikipedia.org/), [Tatoeba](https://tatoeba.org/eng/) and [SETimes](http://nlp.ffzg.hr/resources/corpora/setimes/), used under [CC-BY-SA](http://creativecommons.org/licenses/by-sa/3.0/).
9
+
10
+ We distribute two versions of the models:
11
+
12
+ * [lid.176.bin](https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin), which is faster and slightly more accurate, but has a file size of 126MB;
13
+ * [lid.176.ftz](https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz), which is the compressed version of the model, with a file size of 917kB.
14
+
15
+ These models were trained on UTF-8 data, and therefore expect UTF-8 as input.
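A minimal preprocessing sketch, assuming input bytes in a known source encoding: it decodes them, collapses the text to a single line (a prediction treats one line as one example), and strips the default `__label__` prefix from a predicted label to recover the ISO code. The function names are hypothetical. With the official Python module, prediction would look roughly like `fasttext.load_model('lid.176.ftz').predict(line)`; that call is not run here.

```python
def to_utf8_line(raw, encoding="utf-8"):
    """Decode `raw` bytes from `encoding` and collapse all whitespace to single spaces.

    The result is a one-line str; writing it out with encoding="utf-8"
    gives the UTF-8 input the models expect.
    """
    text = raw.decode(encoding)  # raises UnicodeDecodeError on invalid input
    return " ".join(text.split())


def label_to_iso(label, prefix="__label__"):
    """Turn a predicted label such as '__label__en' into its ISO code 'en'."""
    return label[len(prefix):] if label.startswith(prefix) else label


line = to_utf8_line("Ceci est un café.\nVraiment.".encode("latin-1"), "latin-1")
print(line)                         # Ceci est un café. Vraiment.
print(label_to_iso("__label__fr"))  # fr
```

Decoding strictly, rather than ignoring errors, surfaces a wrong source encoding early instead of feeding mangled text to the model.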
16
+
17
+ ### License
18
+
19
+ The models are distributed under the [*Creative Commons Attribution-Share-Alike License 3.0*](https://creativecommons.org/licenses/by-sa/3.0/).
20
+
21
+ ### List of supported languages
22
+ ```
23
+ af als am an ar arz as ast av az azb ba bar bcl be bg bh bn bo bpy br bs bxr ca cbk ce ceb ckb co cs cv cy da de diq dsb dty dv el eml en eo es et eu fa fi fr frr fy ga gd gl gn gom gu gv he hi hif hr hsb ht hu hy ia id ie ilo io is it ja jbo jv ka kk km kn ko krc ku kv kw ky la lb lez li lmo lo lrc lt lv mai mg mhr min mk ml mn mr mrj ms mt mwl my myv mzn nah nap nds ne new nl nn no oc or os pa pam pfl pl pms pnb ps pt qu rm ro ru rue sa sah sc scn sco sd sh si sk sl so sq sr su sv sw ta te tg th tk tl tr tt tyv ug uk ur uz vec vep vi vls vo wa war wuu xal xmf yi yo yue zh
24
+ ```
25
+
26
+ ### References
27
+
28
+ If you use these models, please cite the following papers:
29
+
30
+ [1] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)
31
+ ```
32
+ @article{joulin2016bag,
33
+ title={Bag of Tricks for Efficient Text Classification},
34
+ author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
35
+ journal={arXiv preprint arXiv:1607.01759},
36
+ year={2016}
37
+ }
38
+ ```
39
+ [2] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, [*FastText.zip: Compressing text classification models* ](https://arxiv.org/abs/1612.03651)
40
+ ```
41
+ @article{joulin2016fasttext,
42
+ title={FastText.zip: Compressing text classification models},
43
+ author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, Herv{\'e} and Mikolov, Tomas},
44
+ journal={arXiv preprint arXiv:1612.03651},
45
+ year={2016}
46
+ }
47
+ ```
@@ -0,0 +1,50 @@
1
+ ---
2
+ id: options
3
+ title: List of options
4
+ ---
5
+
6
+ Invoke a command without arguments to list available arguments and their default values:
7
+
8
+ ```bash
9
+ $ ./fasttext supervised
10
+ Empty input or output path.
11
+
12
+ The following arguments are mandatory:
13
+ -input training file path
14
+ -output output file path
15
+
16
+ The following arguments are optional:
17
+ -verbose verbosity level [2]
18
+
19
+ The following arguments for the dictionary are optional:
20
+ -minCount minimal number of word occurrences [1]
21
+ -minCountLabel minimal number of label occurrences [0]
22
+ -wordNgrams max length of word ngram [1]
23
+ -bucket number of buckets [2000000]
24
+ -minn min length of char ngram [0]
25
+ -maxn max length of char ngram [0]
26
+ -t sampling threshold [0.0001]
27
+ -label labels prefix [__label__]
28
+
29
+ The following arguments for training are optional:
30
+ -lr learning rate [0.1]
31
+ -lrUpdateRate change the rate of updates for the learning rate [100]
32
+ -dim size of word vectors [100]
33
+ -ws size of the context window [5]
34
+ -epoch number of epochs [5]
35
+ -neg number of negatives sampled [5]
36
+ -loss loss function {ns, hs, softmax} [softmax]
37
+ -thread number of threads [12]
38
+ -pretrainedVectors pretrained word vectors for supervised learning []
39
+ -saveOutput whether output params should be saved [0]
40
+
41
+ The following arguments for quantization are optional:
42
+ -cutoff number of words and ngrams to retain [0]
43
+ -retrain finetune embeddings if a cutoff is applied [0]
44
+ -qnorm quantizing the norm separately [0]
45
+ -qout quantizing the classifier [0]
46
+ -dsub size of each sub-vector [2]
47
+ ```
48
+
49
+ Defaults may vary by mode. (Word-representation modes `skipgram` and `cbow` use a default `-minCount` of 5.)
50
+
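The defaults in square brackets are easy to extract programmatically. A minimal sketch, assuming a `fasttext` binary in the current directory whose usage text looks like the listing above. fastText may print this text to stderr rather than stdout, so both streams are read. `parse_defaults` and `fasttext_defaults` are hypothetical helpers, and the sample text is an excerpt of the listing.

```python
import re
import subprocess

# One optional argument per line: "-name description [default]".
OPTION_RE = re.compile(r"^\s*-(\w+)\s+.*?\[([^\]]*)\]\s*$", re.MULTILINE)


def parse_defaults(usage_text):
    """Map each option name to the default shown in square brackets."""
    return {m.group(1): m.group(2) for m in OPTION_RE.finditer(usage_text)}


def fasttext_defaults(command, binary="./fasttext"):
    """Run `binary command` with no arguments and parse the printed defaults."""
    proc = subprocess.run([binary, command], capture_output=True, text=True)
    return parse_defaults(proc.stdout + proc.stderr)


sample = """The following arguments are mandatory:
  -input              training file path
The following arguments for training are optional:
  -dim                size of word vectors [100]
  -epoch              number of epochs [5]
  -pretrainedVectors  pretrained word vectors for supervised learning []
"""
print(parse_defaults(sample))
# {'dim': '100', 'epoch': '5', 'pretrainedVectors': ''}
```

Mandatory arguments such as `-input` have no bracketed default and are skipped. Calling `fasttext_defaults("skipgram")` and `fasttext_defaults("supervised")` is one way to see how defaults vary by mode.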
@@ -0,0 +1,142 @@
1
+ ---
2
+ id: pretrained-vectors
3
+ title: Wiki word vectors
4
+ ---
5
+
6
+ We are publishing pre-trained word vectors for 294 languages, trained on [*Wikipedia*](https://www.wikipedia.org) using fastText.
7
+ These vectors in dimension 300 were obtained using the skip-gram model described in [*Bojanowski et al. (2016)*](https://arxiv.org/abs/1607.04606) with default parameters.
8
+
9
+ Please note that a newer version of multi-lingual word vectors is available at: [Word vectors for 157 languages](https://fasttext.cc/docs/en/crawl-vectors.html).
10
+
11
+ ### Models
12
+
13
+ The models can be downloaded from:
14
+
15
+ ||||
16
+ |-|-|-|
17
+ | Abkhazian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ab.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ab.vec) | Acehnese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ace.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ace.vec) | Adyghe: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ady.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ady.vec) |
18
+ | Afar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.aa.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.aa.vec) | Afrikaans: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.af.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.af.vec) | Akan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ak.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ak.vec) |
19
+ | Albanian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sq.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sq.vec) | Alemannic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.als.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.als.vec) | Amharic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.am.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.am.vec) |
20
+ | Anglo_Saxon: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ang.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ang.vec) | Arabic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ar.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ar.vec) | Aragonese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.an.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.an.vec) |
21
+ | Aramaic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.arc.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.arc.vec) | Armenian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hy.vec) | Aromanian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.roa_rup.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.roa_rup.vec) |
22
+ | Assamese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.as.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.as.vec) | Asturian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ast.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ast.vec) | Avar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.av.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.av.vec) |
23
+ | Aymara: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ay.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ay.vec) | Azerbaijani: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.az.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.az.vec) | Bambara: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bm.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bm.vec) |
24
+ | Banjar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bjn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bjn.vec) | Banyumasan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.map_bms.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.map_bms.vec) | Bashkir: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ba.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ba.vec) |
25
+ | Basque: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.eu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.eu.vec) | Bavarian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bar.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bar.vec) | Belarusian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.be.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.be.vec) |
26
+ | Bengali: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bn.vec) | Bihari: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bh.vec) | Bishnupriya Manipuri: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bpy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bpy.vec) |
27
+ | Bislama: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bi.vec) | Bosnian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bs.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bs.vec) | Breton: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.br.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.br.vec) |
28
+ | Buginese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bug.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bug.vec) | Bulgarian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bg.vec) | Burmese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.my.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.my.vec) |
29
+ | Buryat: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bxr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bxr.vec) | Cantonese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_yue.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_yue.vec) | Catalan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ca.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ca.vec) |
30
+ | Cebuano: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ceb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ceb.vec) | Central Bicolano: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bcl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bcl.vec) | Chamorro: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ch.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ch.vec) |
31
+ | Chavacano: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cbk_zam.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cbk_zam.vec) | Chechen: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ce.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ce.vec) | Cherokee: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.chr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.chr.vec) |
32
+ | Cheyenne: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.chy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.chy.vec) | Chichewa: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ny.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ny.vec) | Chinese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh.vec) |
33
+ | Choctaw: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cho.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cho.vec) | Chuvash: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cv.vec) | Classical Chinese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_classical.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_classical.vec) |
34
+ | Cornish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kw.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kw.vec) | Corsican: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.co.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.co.vec) | Cree: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cr.vec) |
35
+ | Crimean Tatar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.crh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.crh.vec) | Croatian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hr.vec) | Czech: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cs.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cs.vec) |
36
+ | Danish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.da.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.da.vec) | Divehi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.dv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.dv.vec) | Dutch: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nl.vec) |
37
+ | Dutch Low Saxon: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nds_nl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nds_nl.vec) | Dzongkha: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.dz.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.dz.vec) | Eastern Punjabi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pa.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pa.vec) |
38
+ | Egyptian Arabic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.arz.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.arz.vec) | Emilian_Romagnol: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.eml.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.eml.vec) | English: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.vec) |
39
+ | Erzya: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.myv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.myv.vec) | Esperanto: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.eo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.eo.vec) | Estonian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.et.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.et.vec) |
40
+ | Ewe: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ee.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ee.vec) | Extremaduran: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ext.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ext.vec) | Faroese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fo.vec) |
+ | Fiji Hindi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hif.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hif.vec) | Fijian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fj.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fj.vec) | Finnish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fi.vec) |
+ | Franco_Provençal: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.frp.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.frp.vec) | French: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fr.vec) | Friulian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fur.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fur.vec) |
+ | Fula: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ff.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ff.vec) | Gagauz: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gag.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gag.vec) | Galician: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gl.vec) |
+ | Gan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gan.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gan.vec) | Georgian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ka.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ka.vec) | German: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.de.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.de.vec) |
+ | Gilaki: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.glk.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.glk.vec) | Goan Konkani: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gom.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gom.vec) | Gothic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.got.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.got.vec) |
+ | Greek: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.el.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.el.vec) | Greenlandic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kl.vec) | Guarani: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gn.vec) |
+ | Gujarati: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gu.vec) | Haitian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ht.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ht.vec) | Hakka: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hak.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hak.vec) |
+ | Hausa: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ha.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ha.vec) | Hawaiian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.haw.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.haw.vec) | Hebrew: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.he.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.he.vec) |
+ | Herero: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hz.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hz.vec) | Hill Mari: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mrj.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mrj.vec) | Hindi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hi.vec) |
+ | Hiri Motu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ho.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ho.vec) | Hungarian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hu.vec) | Icelandic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.is.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.is.vec) |
+ | Ido: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.io.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.io.vec) | Igbo: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ig.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ig.vec) | Ilokano: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ilo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ilo.vec) |
+ | Indonesian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.id.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.id.vec) | Interlingua: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ia.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ia.vec) | Interlingue: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ie.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ie.vec) |
+ | Inuktitut: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.iu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.iu.vec) | Inupiak: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ik.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ik.vec) | Irish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ga.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ga.vec) |
+ | Italian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.it.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.it.vec) | Jamaican Patois: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.jam.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.jam.vec) | Japanese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ja.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ja.vec) |
+ | Javanese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.jv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.jv.vec) | Kabardian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kbd.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kbd.vec) | Kabyle: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kab.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kab.vec) |
+ | Kalmyk: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.xal.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.xal.vec) | Kannada: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kn.vec) | Kanuri: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kr.vec) |
+ | Kapampangan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pam.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pam.vec) | Karachay_Balkar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.krc.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.krc.vec) | Karakalpak: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kaa.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kaa.vec) |
+ | Kashmiri: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ks.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ks.vec) | Kashubian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.csb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.csb.vec) | Kazakh: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kk.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kk.vec) |
+ | Khmer: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.km.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.km.vec) | Kikuyu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ki.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ki.vec) | Kinyarwanda: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rw.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rw.vec) |
+ | Kirghiz: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ky.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ky.vec) | Kirundi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rn.vec) | Komi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kv.vec) |
+ | Komi_Permyak: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.koi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.koi.vec) | Kongo: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kg.vec) | Korean: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ko.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ko.vec) |
+ | Kuanyama: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kj.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.kj.vec) | Kurdish (Kurmanji): [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ku.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ku.vec) | Kurdish (Sorani): [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ckb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ckb.vec) |
+ | Ladino: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lad.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lad.vec) | Lak: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lbe.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lbe.vec) | Lao: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lo.vec) |
+ | Latgalian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ltg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ltg.vec) | Latin: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.la.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.la.vec) | Latvian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lv.vec) |
+ | Lezgian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lez.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lez.vec) | Ligurian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lij.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lij.vec) | Limburgish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.li.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.li.vec) |
+ | Lingala: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ln.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ln.vec) | Lithuanian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lt.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lt.vec) | Livvi_Karelian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.olo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.olo.vec) |
+ | Lojban: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.jbo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.jbo.vec) | Lombard: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lmo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lmo.vec) | Low Saxon: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nds.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nds.vec) |
+ | Lower Sorbian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.dsb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.dsb.vec) | Luganda: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lg.vec) | Luxembourgish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lb.vec) |
+ | Macedonian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mk.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mk.vec) | Maithili: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mai.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mai.vec) | Malagasy: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mg.vec) |
+ | Malay: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ms.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ms.vec) | Malayalam: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ml.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ml.vec) | Maltese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mt.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mt.vec) |
+ | Manx: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gv.vec) | Maori: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mi.vec) | Marathi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mr.vec) |
+ | Marshallese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mh.vec) | Mazandarani: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mzn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mzn.vec) | Meadow Mari: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mhr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mhr.vec) |
+ | Min Dong: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cdo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cdo.vec) | Min Nan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_min_nan.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_min_nan.vec) | Minangkabau: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.min.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.min.vec) |
+ | Mingrelian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.xmf.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.xmf.vec) | Mirandese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mwl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mwl.vec) | Moksha: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mdf.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mdf.vec) |
+ | Moldovan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mo.vec) | Mongolian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mn.vec) | Muscogee: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mus.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.mus.vec) |
+ | Nahuatl: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nah.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nah.vec) | Nauruan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.na.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.na.vec) | Navajo: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nv.vec) |
+ | Ndonga: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ng.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ng.vec) | Neapolitan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nap.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nap.vec) | Nepali: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ne.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ne.vec) |
+ | Newar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.new.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.new.vec) | Norfolk: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pih.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pih.vec) | Norman: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nrm.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nrm.vec) |
+ | North Frisian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.frr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.frr.vec) | Northern Luri: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lrc.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lrc.vec) | Northern Sami: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.se.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.se.vec) |
+ | Northern Sotho: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nso.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nso.vec) | Norwegian (Bokmål): [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.no.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.no.vec) | Norwegian (Nynorsk): [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nn.vec) |
+ | Novial: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nov.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.nov.vec) | Nuosu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ii.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ii.vec) | Occitan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.oc.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.oc.vec) |
+ | Old Church Slavonic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cu.vec) | Oriya: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.or.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.or.vec) | Oromo: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.om.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.om.vec) |
+ | Ossetian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.os.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.os.vec) | Palatinate German: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pfl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pfl.vec) | Pali: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pi.vec) |
+ | Pangasinan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pag.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pag.vec) | Papiamentu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pap.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pap.vec) | Pashto: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ps.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ps.vec) |
+ | Pennsylvania German: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pdc.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pdc.vec) | Persian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fa.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fa.vec) | Picard: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pcd.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pcd.vec) |
+ | Piedmontese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pms.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pms.vec) | Polish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pl.vec) | Pontic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pnt.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pnt.vec) |
+ | Portuguese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pt.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pt.vec) | Quechua: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.qu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.qu.vec) | Ripuarian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ksh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ksh.vec) |
+ | Romani: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rmy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rmy.vec) | Romanian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ro.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ro.vec) | Romansh: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rm.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rm.vec) |
+ | Russian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ru.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ru.vec) | Rusyn: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rue.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.rue.vec) | Sakha: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sah.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sah.vec) |
+ | Samoan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sm.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sm.vec) | Samogitian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bat_smg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bat_smg.vec) | Sango: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sg.vec) |
+ | Sanskrit: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sa.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sa.vec) | Sardinian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sc.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sc.vec) | Saterland Frisian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.stq.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.stq.vec) |
+ | Scots: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sco.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sco.vec) | Scottish Gaelic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gd.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.gd.vec) | Serbian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sr.vec) |
+ | Serbo_Croatian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sh.vec) | Sesotho: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.st.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.st.vec) | Shona: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sn.vec) |
+ | Sicilian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.scn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.scn.vec) | Silesian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.szl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.szl.vec) | Simple English: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.simple.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.simple.vec) |
+ | Sindhi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sd.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sd.vec) | Sinhalese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.si.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.si.vec) | Slovak: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sk.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sk.vec) |
+ | Slovenian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sl.vec) | Somali: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.so.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.so.vec) | Southern Azerbaijani: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.azb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.azb.vec) |
+ | Spanish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.es.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.es.vec) | Sranan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.srn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.srn.vec) | Sundanese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.su.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.su.vec) |
+ | Swahili: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sw.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sw.vec) | Swati: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ss.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ss.vec) | Swedish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.sv.vec) |
+ | Tagalog: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tl.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tl.vec) | Tahitian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ty.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ty.vec) | Tajik: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tg.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tg.vec) |
+ | Tamil: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ta.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ta.vec) | Tarantino: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.roa_tara.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.roa_tara.vec) | Tatar: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tt.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tt.vec) |
+ | Telugu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.te.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.te.vec) | Tetum: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tet.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tet.vec) | Thai: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.th.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.th.vec) |
+ | Tibetan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.bo.vec) | Tigrinya: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ti.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ti.vec) | Tok Pisin: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tpi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tpi.vec) |
+ | Tongan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.to.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.to.vec) | Tsonga: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ts.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ts.vec) | Tswana: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tn.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tn.vec) |
104
+ | Tulu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tcy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tcy.vec) | Tumbuka: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tum.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tum.vec) | Turkish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tr.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tr.vec) |
105
+ | Turkmen: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tk.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tk.vec) | Tuvan: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tyv.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tyv.vec) | Twi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tw.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.tw.vec) |
106
+ | Udmurt: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.udm.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.udm.vec) | Ukrainian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.uk.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.uk.vec) | Upper Sorbian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hsb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hsb.vec) |
107
+ | Urdu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ur.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ur.vec) | Uyghur: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ug.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ug.vec) | Uzbek: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.uz.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.uz.vec) |
108
+ | Venda: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ve.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ve.vec) | Venetian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vec.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vec.vec) | Vepsian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vep.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vep.vec) |
109
+ | Vietnamese: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vi.vec) | Volapük: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vo.vec) | Võro: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fiu_vro.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fiu_vro.vec) |
110
+ | Walloon: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.wa.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.wa.vec) | Waray: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.war.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.war.vec) | Welsh: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.cy.vec) |
111
+ | West Flemish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vls.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.vls.vec) | West Frisian: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fy.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fy.vec) | Western Punjabi: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pnb.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pnb.vec) |
112
+ | Wolof: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.wo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.wo.vec) | Wu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.wuu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.wuu.vec) | Xhosa: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.xh.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.xh.vec) |
113
+ | Yiddish: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.yi.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.yi.vec) | Yoruba: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.yo.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.yo.vec) | Zazaki: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.diq.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.diq.vec) |
114
+ | Zeelandic: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zea.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zea.vec) | Zhuang: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.za.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.za.vec) | Zulu: [*bin+text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zu.zip), [*text*](https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zu.vec) |

### Format

The word vectors are available in both of fastText's default formats, binary and text.
In the text format, each line contains a word followed by its vector, with all values separated by spaces.
Words are ordered by frequency, in descending order.
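
To illustrate the text format, the sketch below parses a `.vec` file using only the Python standard library. Beyond what is stated above, it assumes that the first line is a header holding the vocabulary size and the vector dimension, as in the `.vec` files written by fastText. `load_vec` is a hypothetical helper, not part of any fastText API.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_vec(stream: TextIO) -> Tuple[Dict[str, List[float]], int]:
    """Parse the fastText text (.vec) format from an open text stream.

    Assumption: the first line is a header "<vocab_size> <dim>"; every
    following line is a word followed by `dim` space-separated floats.
    The dict keeps file order, i.e. descending word frequency.
    """
    header = stream.readline().split()
    n_words, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip(" \r\n").split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or truncated lines
        vectors[parts[0]] = [float(v) for v in parts[1 : dim + 1]]
    if len(vectors) != n_words:
        raise ValueError(f"header announces {n_words} words, found {len(vectors)}")
    return vectors, dim


# Tiny in-memory example; for a real file use open(path, encoding="utf-8").
sample = io.StringIO("2 3\n</s> 0.1 0.2 0.3 \nthe -0.5 0.0 1.5 \n")
vecs, dim = load_vec(sample)
print(dim, list(vecs))  # 3 ['</s>', 'the']
```

Loading an entire pretrained `.vec` file this way keeps every vector in memory as Python floats, so for large vocabularies you may prefer to stop after the first N lines. Because of the frequency ordering, those lines are also the most frequent words.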

### License

The word vectors are distributed under the [*Creative Commons Attribution-Share-Alike License 3.0*](https://creativecommons.org/licenses/by-sa/3.0/).

### References

If you use these word vectors, please cite the following paper:

P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)

```markup
@article{bojanowski2017enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={Transactions of the Association for Computational Linguistics},
  volume={5},
  year={2017},
  issn={2307-387X},
  pages={135--146}
}
```
@@ -0,0 +1,314 @@
---
id: python-module
title: Python module
---

This document shows how to use fastText from Python.

## Table of contents

* [Requirements](#requirements)
* [Installation](#installation)
* [Usage overview](#usage-overview)
    * [Word representation model](#word-representation-model)
    * [Text classification model](#text-classification-model)
    * [IMPORTANT: Preprocessing data / encoding conventions](#important-preprocessing-data-encoding-conventions)
    * [More examples](#more-examples)
* [API](#api)
    * [`train_unsupervised` parameters](#train_unsupervised-parameters)
    * [`train_supervised` parameters](#train_supervised-parameters)
    * [`model` object](#model-object)

# Requirements

[fastText](https://fasttext.cc/) builds on modern Mac OS and Linux distributions.
Since it uses C++11 features, it requires a compiler with good C++11 support. You will need [Python](https://www.python.org/) (version 2.7 or ≥ 3.4), [NumPy](http://www.numpy.org/) & [SciPy](https://www.scipy.org/), and [pybind11](https://github.com/pybind/pybind11).

# Installation

To install the latest release, run:
```bash
$ pip install fasttext
```

To get the latest development version of fastText instead, install it from our GitHub repository:
```bash
$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ sudo pip install .
$ # or:
$ sudo python setup.py install
```

# Usage overview

## Word representation model

To learn word vectors, as [described here](/docs/en/references.html#enriching-word-vectors-with-subword-information), use the `fasttext.train_unsupervised` function:

```py
import fasttext

# Skipgram model:
model = fasttext.train_unsupervised('data.txt', model='skipgram')

# or, cbow model:
model = fasttext.train_unsupervised('data.txt', model='cbow')
```

where `data.txt` is a training file containing UTF-8 encoded text.
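
To make the difference between the two `model` values concrete, here is a simplified sketch of the (input, target) examples each objective trains on for one line of text, using a fixed window of `ws` words on each side. It is an illustration under assumptions, not fastText's code. The real implementation samples the effective window size for each position between 1 and `ws`, represents input words through their subwords, and optimizes the loss chosen by `loss`. `skipgram_pairs` and `cbow_examples` are hypothetical names.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], ws: int) -> List[Tuple[str, str]]:
    """Skipgram: each word is used to predict every neighbour within ws positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - ws), min(len(tokens), i + ws + 1)):
            if j != i:
                pairs.append((center, tokens[j]))  # (input word, word to predict)
    return pairs


def cbow_examples(tokens: List[str], ws: int) -> List[Tuple[List[str], str]]:
    """CBOW: the bag of neighbours within ws positions predicts the center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in range(max(0, i - ws), min(len(tokens), i + ws + 1)) if j != i]
        if context:  # nothing to learn from an empty bag
            examples.append((context, center))  # (input bag, word to predict)
    return examples


tokens = "the cat sat".split()
print(skipgram_pairs(tokens, ws=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_examples(tokens, ws=1))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

Skipgram produces one example per (word, neighbour) pair, while CBOW produces one example per position. This is one reason the two models can behave differently on rare words and differ in training speed.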

The returned `model` object represents your learned model. You can use it to retrieve information:

```py
print(model.words)    # list of words in the dictionary
print(model['king'])  # the vector of the word 'king'
```

### Saving and loading a model object

You can save a trained model object by calling its `save_model` method:
```py
model.save_model("model_filename.bin")
```

and load it later with the `load_model` function:
```py
model = fasttext.load_model("model_filename.bin")
```

For more information about using fastText for word representations, refer to our [word representations tutorial](/docs/en/unsupervised-tutorial.html).

## Text classification model

To train a text classifier using the method [described here](/docs/en/references.html#bag-of-tricks-for-efficient-text-classification), use the `fasttext.train_supervised` function:

```py
import fasttext

model = fasttext.train_supervised('data.train.txt')
```

where `data.train.txt` is a text file containing one training sentence per line, along with its labels. By default, labels are assumed to be words prefixed by the string `__label__`.
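
The sketch below shows what such a training line looks like and how the `__label__` prefix separates labels from ordinary words. It is a simplified stand-in for the model's own `get_line` method listed in the API section. It assumes any token carrying the prefix is a label, and it uses Python's `str.split`, which, unlike fastText, also splits on Unicode whitespace. `format_example` and `split_labels` are hypothetical helpers.

```python
from typing import List, Tuple

LABEL_PREFIX = "__label__"  # default value of the `label` training argument


def format_example(labels: List[str], text: str) -> str:
    """Build one training line: prefixed labels first, then the text."""
    return " ".join(LABEL_PREFIX + label for label in labels) + " " + text


def split_labels(line: str, prefix: str = LABEL_PREFIX) -> Tuple[List[str], List[str]]:
    """Return (words, labels) for one line; tokens starting with prefix are labels."""
    words: List[str] = []
    labels: List[str] = []
    for token in line.split():
        (labels if token.startswith(prefix) else words).append(token)
    return words, labels


line = format_example(["baking", "equipment"], "Which baking dish is best to bake a banana bread ?")
print(line)
# __label__baking __label__equipment Which baking dish is best to bake a banana bread ?
print(split_labels(line)[1])
# ['__label__baking', '__label__equipment']
```

Writing one such line per example, UTF-8 encoded and newline-terminated, gives a file in the shape `train_supervised` expects.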

Once the model is trained, we can retrieve the list of words and labels:

```py
print(model.words)
print(model.labels)
```

To evaluate the model on a test set, use the `test` function. It computes the precision at one (P@1) and the recall at one (R@1), and returns them together with the number of examples `N`:

```py
def print_results(N, p, r):
    print("N\t" + str(N))
    print("P@{}\t{:.3f}".format(1, p))
    print("R@{}\t{:.3f}".format(1, r))

print_results(*model.test('test.txt'))
```
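
For intuition about these two numbers, here is a minimal sketch of precision and recall at k computed from ranked predictions and gold labels. It assumes the usual definitions: correct predictions divided by the number of predictions made (precision), or by the number of gold labels (recall). It is not taken from the fastText source, and `precision_recall_at_k` is a hypothetical name. Its return value has the same `(N, p, r)` shape as the tuple unpacked into `print_results` above.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_at_k(
    predictions: Sequence[List[str]], gold: Sequence[Set[str]], k: int = 1
) -> Tuple[int, float, float]:
    """Return (N, precision@k, recall@k) over N examples.

    predictions[i] lists labels by decreasing probability;
    gold[i] is the set of true labels of example i.
    """
    n_correct = n_predicted = n_gold = 0
    for predicted, true_labels in zip(predictions, gold):
        top_k = predicted[:k]
        n_correct += sum(1 for label in top_k if label in true_labels)
        n_predicted += len(top_k)
        n_gold += len(true_labels)
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return len(predictions), precision, recall


preds = [["__label__a", "__label__b"], ["__label__b", "__label__a"]]
gold = [{"__label__a"}, {"__label__a", "__label__c"}]
print(precision_recall_at_k(preds, gold, k=1))  # (2, 0.5, 0.3333333333333333)
```

With multi-label data, recall at one cannot reach 1.0: the second example has two gold labels but only one prediction is counted.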

We can also predict labels for a specific text:

```py
model.predict("Which baking dish is best to bake a banana bread ?")
```

By default, `predict` returns only one label: the one with the highest probability. You can also predict more than one label by specifying the parameter `k`:
```py
model.predict("Which baking dish is best to bake a banana bread ?", k=3)
```

To predict labels for more than one sentence, pass a list of strings:

```py
model.predict(["Which baking dish is best to bake a banana bread ?", "Why not put knives in the dishwasher?"], k=3)
```
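
Conceptually, `k` selects the `k` most probable labels. The sketch below illustrates this with a softmax over per-label scores followed by a top-k selection, matching the default `softmax` loss of `train_supervised`. The scores are made-up numbers, and `top_k_labels` is a hypothetical helper. It only mirrors the shape of `predict`'s result (labels paired with probabilities), not its implementation.

```python
import heapq
import math
from typing import Dict, List, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-label scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_k_labels(scores: Dict[str, float], k: int = 1) -> Tuple[List[str], List[float]]:
    """Return the k most probable labels and their probabilities, best first."""
    probs = softmax(scores)
    best = heapq.nlargest(k, probs.items(), key=lambda item: item[1])
    return [label for label, _ in best], [p for _, p in best]


scores = {"__label__baking": 2.0, "__label__equipment": 1.0, "__label__knives": -1.0}
labels, probs = top_k_labels(scores, k=2)
print(labels)  # ['__label__baking', '__label__equipment']
```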

You can also save a classification model to a file and load it back, [just as with word representation models](#saving-and-loading-a-model-object).

For more information about text classification with fastText, refer to our [text classification tutorial](/docs/en/supervised-tutorial.html).

### Compress model files with quantization

When you save a supervised model, fastText can compress it into a much smaller file at the cost of a small loss in accuracy.

```py
# with the previously trained `model` object, call:
model.quantize(input='data.train.txt', retrain=True)

# then display results and save the new model
valid_data = 'data.valid.txt'  # hypothetical path to a held-out validation file
print_results(*model.test(valid_data))
model.save_model("model_filename.ftz")
```

`model_filename.ftz` will be much smaller than `model_filename.bin`.
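
fastText's model compression relies on product quantization. The toy sketch below shows the core idea under simplifying assumptions. Each vector is cut into sub-vectors of `dsub` values, a small k-means codebook is learned for each sub-vector position, and every sub-vector is then stored as the index of its nearest centroid. It is pure Python for readability and omits everything else the real `quantize` does. All names are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Vector, centroids: List[Vector]) -> int:
    return min(range(len(centroids)), key=lambda c: _sq_dist(point, centroids[c]))


def _kmeans(points: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[Vector]:
    """Tiny Lloyd's k-means, initialised with k randomly chosen points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, min(k, len(points)))]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in centroids]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def pq_train(vectors: List[Vector], dsub: int, k: int) -> List[List[Vector]]:
    """Learn one codebook of k centroids per block of dsub dimensions."""
    dim = len(vectors[0])
    assert dim % dsub == 0, "this sketch needs dim to be a multiple of dsub"
    return [_kmeans([v[s:s + dsub] for v in vectors], k, seed=s) for s in range(0, dim, dsub)]


def pq_encode(vector: Vector, codebooks: List[List[Vector]], dsub: int) -> List[int]:
    """Replace each sub-vector by the index of its nearest centroid."""
    return [_nearest(vector[m * dsub:(m + 1) * dsub], cb) for m, cb in enumerate(codebooks)]


def pq_decode(codes: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Approximate reconstruction: concatenate the chosen centroids."""
    out: Vector = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out


rng = random.Random(42)
vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(200)]
books = pq_train(vectors, dsub=2, k=16)
codes = pq_encode(vectors[0], books, dsub=2)
print(codes)                         # 4 small integers, one per 2-dimensional block
print(pq_decode(codes, books)[:2])   # approximation of vectors[0][:2]
```

With at most 256 centroids per codebook, each code fits in one byte. In this sketch, a 100-dimensional float32 row (400 bytes) would shrink to 50 one-byte codes with `dsub=2`, plus the shared codebooks. This is where the size reduction comes from, and the approximation error is where the small accuracy loss comes from.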

For further reading on quantization, you can refer to [this paragraph from our blog post](/blog/2017/10/02/blog-post.html#model-compression).

## IMPORTANT: Preprocessing data / encoding conventions

In general, it is important to preprocess your data properly. The example scripts in the [root folder](https://github.com/facebookresearch/fastText), for instance, do this.

fastText assumes UTF-8 encoded text. All text must be [unicode for Python2](https://docs.python.org/2/library/functions.html#unicode) and [str for Python3](https://docs.python.org/3.5/library/stdtypes.html#textseq). The text you pass is [encoded as UTF-8 by pybind11](https://pybind11.readthedocs.io/en/master/advanced/cast/strings.html?highlight=utf-8#strings-bytes-and-unicode-conversions) before being handed to the fastText C++ library. This means it is important to use UTF-8 encoded text when building a model. On Unix-like systems you can convert text using [iconv](https://en.wikipedia.org/wiki/Iconv).

fastText tokenizes text (splits it into pieces) based on the following ASCII characters (bytes). In particular, it is not aware of UTF-8 whitespace. We advise users to convert UTF-8 whitespace and word boundaries into one of the following symbols, as appropriate:

* space
* tab
* vertical tab
* carriage return
* formfeed
* the null character
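
One way to follow this advice in Python is sketched below. Every non-ASCII whitespace character (for example U+00A0 NO-BREAK SPACE or U+3000 IDEOGRAPHIC SPACE) is replaced by an ASCII space before the text is written to a training file or passed to the model. `normalize_whitespace` is a hypothetical helper, and mapping everything to a plain space is only one reasonable choice. ASCII characters, including the newline that delimits lines, are left untouched.

```python
def normalize_whitespace(text: str) -> str:
    """Replace every non-ASCII whitespace character with an ASCII space.

    ASCII whitespace (space, tab, newline, ...) is kept as it is.
    Requires Python 3.7+ for str.isascii().
    """
    return "".join(" " if ch.isspace() and not ch.isascii() else ch for ch in text)


raw = "banana\u00a0bread\u3000recipe\nnext line"
print(repr(normalize_whitespace(raw)))  # 'banana bread recipe\nnext line'
```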

The newline character is used to delimit lines of text. In particular, the EOS token is appended to a line of text when a newline character is encountered. The only exception is when the number of tokens exceeds the MAX\_LINE\_SIZE constant defined in the [Dictionary header](https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h). This means that text not separated by newlines, such as the [fil9 dataset](http://mattmahoney.net/dc/textdata), will be broken into chunks of MAX\_LINE\_SIZE tokens, and the EOS token is not appended.
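
The sketch below mimics this behaviour on a Python string. It splits on the ASCII separators listed above, emits the EOS token at each newline, and cuts runs without newlines into chunks. It assumes EOS is the string `</s>` and MAX\_LINE\_SIZE is 1024, as defined in `dictionary.h` at the time of writing, so check the header of your version. `iter_tokens` and `read_lines` are hypothetical helpers. They ignore details such as vocabulary lookups, subsampling and the exact cut-off boundary.

```python
from typing import Iterator, List

EOS = "</s>"          # assumed spelling of the EOS token
MAX_LINE_SIZE = 1024  # assumed value of the constant in dictionary.h
SEPARATORS = " \n\t\v\r\f\x00"  # space, newline, tab, vertical tab, CR, formfeed, NUL


def iter_tokens(text: str) -> Iterator[str]:
    """Split on the ASCII separators only; each newline also yields EOS."""
    word = ""
    for ch in text:
        if ch in SEPARATORS:
            if word:
                yield word
                word = ""
            if ch == "\n":
                yield EOS
        else:
            word += ch
    if word:
        yield word


def read_lines(text: str, max_line_size: int = MAX_LINE_SIZE) -> Iterator[List[str]]:
    """Group tokens into lines: end at EOS, or cut after max_line_size tokens."""
    line: List[str] = []
    for token in iter_tokens(text):
        line.append(token)
        if token == EOS or len(line) >= max_line_size:
            yield line
            line = []
    if line:
        yield line


print(list(read_lines("a b\nc d e f", max_line_size=3)))
# [['a', 'b', '</s>'], ['c', 'd', 'e'], ['f']]
```

Note that `"x\u00a0y"` comes out of `iter_tokens` as a single token. This is the "not aware of UTF-8 whitespace" behaviour described above.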

The length of a token is its number of UTF-8 characters. fastText counts them by using the [leading two bits of a byte](https://en.wikipedia.org/wiki/UTF-8#Description) to identify [subsequent bytes of a multi-byte sequence](https://github.com/facebookresearch/fastText/blob/master/src/dictionary.cc). Knowing this is especially important when choosing the minimum and maximum length of subwords. Further, the EOS token (as specified in the [Dictionary header](https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h)) is considered a single character and will not be broken into subwords.
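
The sketch below illustrates both points. It counts characters by skipping continuation bytes (those whose two leading bits are `10`). It extracts the character n-grams of lengths `minn` to `maxn` from the word wrapped in the boundary symbols `<` and `>`, following the subword scheme of *Enriching Word Vectors with Subword Information*. It is modeled on our reading of `computeSubwords` in `dictionary.cc`, but it returns the n-gram strings rather than their hashed indices, and the helper names are hypothetical. You can compare its output with the model's `get_subwords` method.

```python
from typing import List

BOW, EOW, EOS = "<", ">", "</s>"  # boundary symbols and EOS token (assumed values)


def utf8_length(word: str) -> int:
    """Count characters by ignoring continuation bytes 0b10xxxxxx."""
    return sum(1 for b in word.encode("utf-8") if (b & 0xC0) != 0x80)


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams (minn <= n <= maxn, in UTF-8 characters) of <word>."""
    if word == EOS:
        return []  # the EOS token is never broken into subwords
    data = (BOW + word + EOW).encode("utf-8")
    ngrams = []
    for i in range(len(data)):
        if (data[i] & 0xC0) == 0x80:
            continue  # never start an n-gram inside a multi-byte character
        j, n = i, 1
        while j < len(data) and n <= maxn:
            j += 1
            while j < len(data) and (data[j] & 0xC0) == 0x80:
                j += 1  # pull in the continuation bytes of this character
            # with minn == 1, skip the lone boundary symbols "<" and ">"
            if n >= minn and not (n == 1 and (i == 0 or j == len(data))):
                ngrams.append(data[i:j].decode("utf-8"))
            n += 1
    return ngrams


print(char_ngrams("where", minn=3, maxn=3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(utf8_length("café"), len("café".encode("utf-8")))  # 4 5
```

The second line shows why the distinction matters. "café" has 4 characters but 5 bytes, so `minn` and `maxn` count characters, not bytes.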

## More examples

To learn more about fastText models, please see the main [README](https://github.com/facebookresearch/fastText/blob/master/README.md) and, in particular, [the tutorials on our website](https://fasttext.cc/docs/en/supervised-tutorial.html).

You can find further Python examples in [the doc folder](https://github.com/facebookresearch/fastText/tree/master/python/doc/examples).

As with any package, you can get help on any Python function using the `help` function.

For example:

```
>>> import fasttext
>>> help(fasttext.FastText)

Help on module fasttext.FastText in fasttext:

NAME
    fasttext.FastText

DESCRIPTION
    # Copyright (c) 2017-present, Facebook, Inc.
    # All rights reserved.
    #
    # This source code is licensed under the MIT license found in the
    # LICENSE file in the root directory of this source tree.

FUNCTIONS
    load_model(path)
        Load a model given a filepath and return a model object.

    tokenize(text)
        Given a string of text, tokenize it and return a list of tokens
[...]
```

# API

## `train_unsupervised` parameters

```python
input             # training file path (required)
model             # unsupervised fasttext model {cbow, skipgram} [skipgram]
lr                # learning rate [0.05]
dim               # size of word vectors [100]
ws                # size of the context window [5]
epoch             # number of epochs [5]
minCount          # minimal number of word occurrences [5]
minn              # min length of char ngram [3]
maxn              # max length of char ngram [6]
neg               # number of negatives sampled [5]
wordNgrams        # max length of word ngram [1]
loss              # loss function {ns, hs, softmax, ova} [ns]
bucket            # number of buckets [2000000]
thread            # number of threads [number of cpus]
lrUpdateRate      # change the rate of updates for the learning rate [100]
t                 # sampling threshold [0.0001]
verbose           # verbose [2]
```
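
The sampling threshold `t` controls how aggressively frequent words are subsampled during unsupervised training. The sketch below assumes the rule we read in fastText's `dictionary.cc`: a word with relative frequency `f` is kept with probability `sqrt(t / f) + t / f`, clipped at 1 here for readability. Supervised training does not subsample. Treat this as an illustration and check the source of your version. `keep_probability` and `subsample` are hypothetical names.

```python
import math
import random
from typing import Dict, List


def keep_probability(count: int, ntokens: int, t: float = 1e-4) -> float:
    """Chance of keeping one occurrence of a word seen count times in ntokens tokens."""
    f = count / ntokens  # relative frequency of the word
    return min(1.0, math.sqrt(t / f) + t / f)


def subsample(tokens: List[str], counts: Dict[str, int], t: float = 1e-4, seed: int = 0) -> List[str]:
    """Randomly drop frequent tokens according to keep_probability."""
    rng = random.Random(seed)
    ntokens = sum(counts.values())
    return [w for w in tokens if rng.random() <= keep_probability(counts[w], ntokens, t)]


# A word making up 1% of the corpus is kept about 11% of the time with t = 1e-4:
print(round(keep_probability(count=100, ntokens=10_000), 2))  # 0.11
# Words no more frequent than t are always kept:
print(keep_probability(count=1, ntokens=10_000))  # 1.0
```

Raising `t` keeps more occurrences of frequent words, while lowering it discards them more aggressively.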

## `train_supervised` parameters

```python
input             # training file path (required)
lr                # learning rate [0.1]
dim               # size of word vectors [100]
ws                # size of the context window [5]
epoch             # number of epochs [5]
minCount          # minimal number of word occurrences [1]
minCountLabel     # minimal number of label occurrences [1]
minn              # min length of char ngram [0]
maxn              # max length of char ngram [0]
neg               # number of negatives sampled [5]
wordNgrams        # max length of word ngram [1]
loss              # loss function {ns, hs, softmax, ova} [softmax]
bucket            # number of buckets [2000000]
thread            # number of threads [number of cpus]
lrUpdateRate      # change the rate of updates for the learning rate [100]
t                 # sampling threshold [0.0001]
label             # label prefix ['__label__']
verbose           # verbose [2]
pretrainedVectors # pretrained word vectors (.vec file) for supervised learning []
```
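
With `wordNgrams` greater than 1, a supervised model also uses runs of consecutive words as features. The sketch below only enumerates those n-grams for one line, which is what `wordNgrams` controls. fastText itself does not store the strings: it hashes each n-gram into one of `bucket` rows of the input matrix, a detail omitted here. `word_ngrams` is a hypothetical helper.

```python
from typing import List


def word_ngrams(words: List[str], n: int) -> List[str]:
    """All runs of 2..n consecutive words (unigrams are the words themselves)."""
    grams = []
    for i in range(len(words)):
        for j in range(i + 2, min(len(words), i + n) + 1):
            grams.append(" ".join(words[i:j]))
    return grams


print(word_ngrams("not very good".split(), n=2))  # ['not very', 'very good']
print(word_ngrams("not very good".split(), n=3))  # ['not very', 'not very good', 'very good']
```

Bigrams such as "not very" let the classifier capture some local word order, which a pure bag of words ignores.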

## `model` object

The `train_supervised`, `train_unsupervised` and `load_model` functions return an instance of the `_FastText` class, which we generally call the `model` object.

This object exposes the training arguments as properties: `lr`, `dim`, `ws`, `epoch`, `minCount`, `minCountLabel`, `minn`, `maxn`, `neg`, `wordNgrams`, `loss`, `bucket`, `thread`, `lrUpdateRate`, `t`, `label`, `verbose`, `pretrainedVectors`. For example, `model.wordNgrams` gives the max length of word ngram used to train this model.

In addition, the object exposes several methods:

```python
get_dimension           # Get the dimension (size) of a lookup vector (hidden layer).
                        # This is equivalent to `dim` property.
get_input_vector        # Given an index, get the corresponding vector of the Input Matrix.
get_input_matrix        # Get a copy of the full input matrix of a Model.
get_labels              # Get the entire list of labels of the dictionary.
                        # This is equivalent to `labels` property.
get_line                # Split a line of text into words and labels.
get_output_matrix       # Get a copy of the full output matrix of a Model.
get_sentence_vector     # Given a string, get a single vector representation. This function
                        # expects a single line of text. We split words on
                        # whitespace (space, newline, tab, vertical tab) and the control
                        # characters carriage return, formfeed and the null character.
get_subword_id          # Given a subword, return the index (within input matrix) it hashes to.
get_subwords            # Given a word, get the subwords and their indices.
get_word_id             # Given a word, get the word id within the dictionary.
get_word_vector         # Get the vector representation of word.
get_words               # Get the entire list of words of the dictionary.
                        # This is equivalent to `words` property.
is_quantized            # Whether the model has been quantized.
predict                 # Given a string, get a list of labels and a list of corresponding probabilities.
quantize                # Quantize the model, reducing its size and memory footprint.
save_model              # Save the model to the given path.
test                    # Evaluate a supervised model using the file given by path.
test_label              # Return the precision and recall score for each label.
```
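
`get_subword_id` maps a subword to a row of the input matrix by hashing it. The sketch below reproduces the scheme as we understand it from `dictionary.cc`. It computes a 32-bit FNV-1a hash over the UTF-8 bytes, where each byte is sign-extended (through an `int8_t` cast) before the XOR. The result, taken modulo `bucket`, is offset by the number of words, so subword rows come after word rows. Treat this as an assumption to verify against `model.get_subword_id` on a real model. `fasttext_hash` and `subword_row` are hypothetical names, and `nwords` stands for the dictionary's word count.

```python
def fasttext_hash(s: str) -> int:
    """32-bit FNV-1a over UTF-8 bytes, each byte sign-extended first."""
    h = 2166136261  # FNV offset basis
    for b in s.encode("utf-8"):
        signed = b - 256 if b >= 128 else b  # emulate the int8_t cast
        h ^= signed & 0xFFFFFFFF  # widen back to an unsigned 32-bit value
        h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, wrapped to 32 bits
    return h


def subword_row(subword: str, nwords: int, bucket: int = 2_000_000) -> int:
    """Row of the input matrix used for subword: words first, then buckets."""
    return nwords + fasttext_hash(subword) % bucket


print(hex(fasttext_hash("a")))      # 0xe40c292c
print(subword_row("a", nwords=10))  # 2230
```

Because many n-grams share the `bucket` rows, unrelated subwords can collide. A smaller `bucket` saves memory at the cost of more collisions.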

The `words` and `labels` properties return the words and labels of the dictionary:
```py
model.words         # equivalent to model.get_words()
model.labels        # equivalent to model.get_labels()
```

The object overrides the `__getitem__` and `__contains__` methods to return the representation of a word and to check whether a word is in the vocabulary.

```py
model['king']       # equivalent to model.get_word_vector('king')
'king' in model     # equivalent to `'king' in model.get_words()`
```
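
The sketch below shows the last step of building `model['king']`, as we understand it from the fastText sources. The word vector is the average of several input-matrix rows: the word's own row (when the word is in the dictionary) and the rows of its character n-grams. A word with no rows at all gets the zero vector. When the model was trained with subwords (`minn` > 0), this is also why out-of-vocabulary words still get a vector. The toy matrix and row ids are made up; in a real model they would come from `get_input_matrix` and `get_subwords`. `average_rows` is a hypothetical helper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def average_rows(matrix: Matrix, row_ids: Sequence[int]) -> List[float]:
    """Average the given rows of matrix; return zeros if there are none."""
    dim = len(matrix[0])
    total = [0.0] * dim
    for r in row_ids:
        for d in range(dim):
            total[d] += matrix[r][d]
    if row_ids:
        total = [x / len(row_ids) for x in total]
    return total


# Toy input matrix: rows 0-1 stand for words, rows 2-4 for hashed n-gram buckets.
input_matrix = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 4.0]]
print(average_rows(input_matrix, [0, 2, 3]))  # in-vocabulary word: own row + 2 n-grams
print(average_rows(input_matrix, [3, 4]))     # out-of-vocabulary word: n-grams only
print(average_rows(input_matrix, []))         # no rows at all: zero vector
```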