fasttext 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (498)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +26 -1
  3. data/LICENSE.txt +18 -18
  4. data/README.md +39 -12
  5. data/ext/fasttext/ext.cpp +108 -101
  6. data/ext/fasttext/extconf.rb +7 -9
  7. data/lib/fasttext.rb +3 -0
  8. data/lib/fasttext/classifier.rb +25 -7
  9. data/lib/fasttext/vectorizer.rb +7 -2
  10. data/lib/fasttext/version.rb +1 -1
  11. data/vendor/fastText/README.md +3 -3
  12. data/vendor/fastText/src/args.cc +179 -6
  13. data/vendor/fastText/src/args.h +29 -1
  14. data/vendor/fastText/src/autotune.cc +477 -0
  15. data/vendor/fastText/src/autotune.h +89 -0
  16. data/vendor/fastText/src/densematrix.cc +27 -7
  17. data/vendor/fastText/src/densematrix.h +10 -2
  18. data/vendor/fastText/src/fasttext.cc +125 -114
  19. data/vendor/fastText/src/fasttext.h +31 -52
  20. data/vendor/fastText/src/main.cc +32 -13
  21. data/vendor/fastText/src/meter.cc +148 -2
  22. data/vendor/fastText/src/meter.h +24 -2
  23. data/vendor/fastText/src/model.cc +0 -1
  24. data/vendor/fastText/src/real.h +0 -1
  25. data/vendor/fastText/src/utils.cc +25 -0
  26. data/vendor/fastText/src/utils.h +29 -0
  27. data/vendor/fastText/src/vector.cc +0 -1
  28. metadata +16 -539
  29. data/lib/fasttext/ext.bundle +0 -0
  30. data/vendor/fastText/CMakeLists.txt +0 -68
  31. data/vendor/fastText/CODE_OF_CONDUCT.md +0 -2
  32. data/vendor/fastText/CONTRIBUTING.md +0 -32
  33. data/vendor/fastText/MANIFEST.in +0 -5
  34. data/vendor/fastText/Makefile +0 -63
  35. data/vendor/fastText/alignment/README.md +0 -53
  36. data/vendor/fastText/alignment/align.py +0 -145
  37. data/vendor/fastText/alignment/eval.py +0 -60
  38. data/vendor/fastText/alignment/example.sh +0 -51
  39. data/vendor/fastText/alignment/unsup_align.py +0 -109
  40. data/vendor/fastText/alignment/utils.py +0 -154
  41. data/vendor/fastText/classification-example.sh +0 -41
  42. data/vendor/fastText/classification-results.sh +0 -94
  43. data/vendor/fastText/crawl/README.md +0 -26
  44. data/vendor/fastText/crawl/dedup.cc +0 -51
  45. data/vendor/fastText/crawl/download_crawl.sh +0 -57
  46. data/vendor/fastText/crawl/filter_dedup.sh +0 -13
  47. data/vendor/fastText/crawl/filter_utf8.cc +0 -105
  48. data/vendor/fastText/crawl/process_wet_file.sh +0 -30
  49. data/vendor/fastText/docs/aligned-vectors.md +0 -64
  50. data/vendor/fastText/docs/api.md +0 -6
  51. data/vendor/fastText/docs/cheatsheet.md +0 -66
  52. data/vendor/fastText/docs/crawl-vectors.md +0 -125
  53. data/vendor/fastText/docs/dataset.md +0 -6
  54. data/vendor/fastText/docs/english-vectors.md +0 -53
  55. data/vendor/fastText/docs/faqs.md +0 -63
  56. data/vendor/fastText/docs/language-identification.md +0 -47
  57. data/vendor/fastText/docs/options.md +0 -50
  58. data/vendor/fastText/docs/pretrained-vectors.md +0 -142
  59. data/vendor/fastText/docs/python-module.md +0 -314
  60. data/vendor/fastText/docs/references.md +0 -41
  61. data/vendor/fastText/docs/supervised-models.md +0 -54
  62. data/vendor/fastText/docs/supervised-tutorial.md +0 -349
  63. data/vendor/fastText/docs/support.md +0 -58
  64. data/vendor/fastText/docs/unsupervised-tutorials.md +0 -309
  65. data/vendor/fastText/eval.py +0 -95
  66. data/vendor/fastText/get-wikimedia.sh +0 -79
  67. data/vendor/fastText/python/README.md +0 -322
  68. data/vendor/fastText/python/README.rst +0 -406
  69. data/vendor/fastText/python/benchmarks/README.rst +0 -3
  70. data/vendor/fastText/python/benchmarks/get_word_vector.py +0 -49
  71. data/vendor/fastText/python/doc/examples/FastTextEmbeddingBag.py +0 -81
  72. data/vendor/fastText/python/doc/examples/bin_to_vec.py +0 -41
  73. data/vendor/fastText/python/doc/examples/compute_accuracy.py +0 -163
  74. data/vendor/fastText/python/doc/examples/get_vocab.py +0 -48
  75. data/vendor/fastText/python/doc/examples/train_supervised.py +0 -42
  76. data/vendor/fastText/python/doc/examples/train_unsupervised.py +0 -56
  77. data/vendor/fastText/python/fasttext_module/fasttext/FastText.py +0 -468
  78. data/vendor/fastText/python/fasttext_module/fasttext/__init__.py +0 -22
  79. data/vendor/fastText/python/fasttext_module/fasttext/pybind/fasttext_pybind.cc +0 -388
  80. data/vendor/fastText/python/fasttext_module/fasttext/tests/__init__.py +0 -14
  81. data/vendor/fastText/python/fasttext_module/fasttext/tests/test_configurations.py +0 -239
  82. data/vendor/fastText/python/fasttext_module/fasttext/tests/test_script.py +0 -629
  83. data/vendor/fastText/python/fasttext_module/fasttext/util/__init__.py +0 -13
  84. data/vendor/fastText/python/fasttext_module/fasttext/util/util.py +0 -60
  85. data/vendor/fastText/quantization-example.sh +0 -40
  86. data/vendor/fastText/runtests.py +0 -60
  87. data/vendor/fastText/scripts/kbcompletion/README.md +0 -19
  88. data/vendor/fastText/scripts/kbcompletion/data.sh +0 -69
  89. data/vendor/fastText/scripts/kbcompletion/eval.cpp +0 -108
  90. data/vendor/fastText/scripts/kbcompletion/fb15k.sh +0 -49
  91. data/vendor/fastText/scripts/kbcompletion/fb15k237.sh +0 -45
  92. data/vendor/fastText/scripts/kbcompletion/svo.sh +0 -38
  93. data/vendor/fastText/scripts/kbcompletion/wn18.sh +0 -49
  94. data/vendor/fastText/scripts/quantization/quantization-results.sh +0 -43
  95. data/vendor/fastText/setup.cfg +0 -2
  96. data/vendor/fastText/setup.py +0 -203
  97. data/vendor/fastText/tests/fetch_test_data.sh +0 -202
  98. data/vendor/fastText/website/README.md +0 -6
  99. data/vendor/fastText/website/blog/2016-08-18-blog-post.md +0 -42
  100. data/vendor/fastText/website/blog/2017-05-02-blog-post.md +0 -60
  101. data/vendor/fastText/website/blog/2017-10-02-blog-post.md +0 -90
  102. data/vendor/fastText/website/blog/2019-06-25-blog-post.md +0 -168
  103. data/vendor/fastText/website/core/Footer.js +0 -127
  104. data/vendor/fastText/website/package.json +0 -12
  105. data/vendor/fastText/website/pages/en/index.js +0 -286
  106. data/vendor/fastText/website/sidebars.json +0 -18
  107. data/vendor/fastText/website/siteConfig.js +0 -102
  108. data/vendor/fastText/website/static/docs/en/html/annotated.html +0 -115
  109. data/vendor/fastText/website/static/docs/en/html/annotated_dup.js +0 -4
  110. data/vendor/fastText/website/static/docs/en/html/args_8cc.html +0 -113
  111. data/vendor/fastText/website/static/docs/en/html/args_8h.html +0 -134
  112. data/vendor/fastText/website/static/docs/en/html/args_8h.js +0 -14
  113. data/vendor/fastText/website/static/docs/en/html/args_8h_source.html +0 -139
  114. data/vendor/fastText/website/static/docs/en/html/bc_s.png +0 -0
  115. data/vendor/fastText/website/static/docs/en/html/bdwn.png +0 -0
  116. data/vendor/fastText/website/static/docs/en/html/classes.html +0 -121
  117. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Args-members.html +0 -140
  118. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Args.html +0 -753
  119. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Args.js +0 -40
  120. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Dictionary-members.html +0 -148
  121. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Dictionary.html +0 -1266
  122. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Dictionary.js +0 -43
  123. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1FastText-members.html +0 -145
  124. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1FastText.html +0 -1149
  125. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1FastText.js +0 -45
  126. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Matrix-members.html +0 -123
  127. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Matrix.html +0 -610
  128. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Matrix.js +0 -23
  129. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Model-members.html +0 -150
  130. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Model.html +0 -1400
  131. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Model.js +0 -48
  132. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1ProductQuantizer-members.html +0 -131
  133. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1ProductQuantizer.html +0 -950
  134. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1ProductQuantizer.js +0 -31
  135. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1QMatrix-members.html +0 -122
  136. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1QMatrix.html +0 -565
  137. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1QMatrix.js +0 -22
  138. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Vector-members.html +0 -121
  139. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Vector.html +0 -542
  140. data/vendor/fastText/website/static/docs/en/html/classfasttext_1_1Vector.js +0 -21
  141. data/vendor/fastText/website/static/docs/en/html/closed.png +0 -0
  142. data/vendor/fastText/website/static/docs/en/html/dictionary_8cc.html +0 -116
  143. data/vendor/fastText/website/static/docs/en/html/dictionary_8h.html +0 -142
  144. data/vendor/fastText/website/static/docs/en/html/dictionary_8h.js +0 -10
  145. data/vendor/fastText/website/static/docs/en/html/dictionary_8h_source.html +0 -127
  146. data/vendor/fastText/website/static/docs/en/html/dir_68267d1309a1af8e8297ef4c3efbcdba.html +0 -145
  147. data/vendor/fastText/website/static/docs/en/html/dir_68267d1309a1af8e8297ef4c3efbcdba.js +0 -29
  148. data/vendor/fastText/website/static/docs/en/html/doc.png +0 -0
  149. data/vendor/fastText/website/static/docs/en/html/doxygen.css +0 -1596
  150. data/vendor/fastText/website/static/docs/en/html/doxygen.png +0 -0
  151. data/vendor/fastText/website/static/docs/en/html/dynsections.js +0 -97
  152. data/vendor/fastText/website/static/docs/en/html/fasttext_8cc.html +0 -119
  153. data/vendor/fastText/website/static/docs/en/html/fasttext_8h.html +0 -168
  154. data/vendor/fastText/website/static/docs/en/html/fasttext_8h.js +0 -6
  155. data/vendor/fastText/website/static/docs/en/html/fasttext_8h_source.html +0 -155
  156. data/vendor/fastText/website/static/docs/en/html/favicon.png +0 -0
  157. data/vendor/fastText/website/static/docs/en/html/files.html +0 -125
  158. data/vendor/fastText/website/static/docs/en/html/files.js +0 -4
  159. data/vendor/fastText/website/static/docs/en/html/folderclosed.png +0 -0
  160. data/vendor/fastText/website/static/docs/en/html/folderopen.png +0 -0
  161. data/vendor/fastText/website/static/docs/en/html/functions.html +0 -139
  162. data/vendor/fastText/website/static/docs/en/html/functions_0x7e.html +0 -112
  163. data/vendor/fastText/website/static/docs/en/html/functions_b.html +0 -115
  164. data/vendor/fastText/website/static/docs/en/html/functions_c.html +0 -143
  165. data/vendor/fastText/website/static/docs/en/html/functions_d.html +0 -135
  166. data/vendor/fastText/website/static/docs/en/html/functions_dup.js +0 -27
  167. data/vendor/fastText/website/static/docs/en/html/functions_e.html +0 -115
  168. data/vendor/fastText/website/static/docs/en/html/functions_f.html +0 -112
  169. data/vendor/fastText/website/static/docs/en/html/functions_func.html +0 -563
  170. data/vendor/fastText/website/static/docs/en/html/functions_g.html +0 -145
  171. data/vendor/fastText/website/static/docs/en/html/functions_h.html +0 -112
  172. data/vendor/fastText/website/static/docs/en/html/functions_i.html +0 -121
  173. data/vendor/fastText/website/static/docs/en/html/functions_k.html +0 -106
  174. data/vendor/fastText/website/static/docs/en/html/functions_l.html +0 -140
  175. data/vendor/fastText/website/static/docs/en/html/functions_m.html +0 -153
  176. data/vendor/fastText/website/static/docs/en/html/functions_n.html +0 -164
  177. data/vendor/fastText/website/static/docs/en/html/functions_o.html +0 -116
  178. data/vendor/fastText/website/static/docs/en/html/functions_p.html +0 -161
  179. data/vendor/fastText/website/static/docs/en/html/functions_q.html +0 -135
  180. data/vendor/fastText/website/static/docs/en/html/functions_r.html +0 -116
  181. data/vendor/fastText/website/static/docs/en/html/functions_s.html +0 -159
  182. data/vendor/fastText/website/static/docs/en/html/functions_t.html +0 -138
  183. data/vendor/fastText/website/static/docs/en/html/functions_u.html +0 -106
  184. data/vendor/fastText/website/static/docs/en/html/functions_v.html +0 -106
  185. data/vendor/fastText/website/static/docs/en/html/functions_vars.html +0 -486
  186. data/vendor/fastText/website/static/docs/en/html/functions_w.html +0 -124
  187. data/vendor/fastText/website/static/docs/en/html/functions_z.html +0 -104
  188. data/vendor/fastText/website/static/docs/en/html/globals.html +0 -170
  189. data/vendor/fastText/website/static/docs/en/html/globals_defs.html +0 -113
  190. data/vendor/fastText/website/static/docs/en/html/globals_func.html +0 -155
  191. data/vendor/fastText/website/static/docs/en/html/index.html +0 -100
  192. data/vendor/fastText/website/static/docs/en/html/jquery.js +0 -87
  193. data/vendor/fastText/website/static/docs/en/html/main_8cc.html +0 -582
  194. data/vendor/fastText/website/static/docs/en/html/main_8cc.js +0 -22
  195. data/vendor/fastText/website/static/docs/en/html/matrix_8cc.html +0 -114
  196. data/vendor/fastText/website/static/docs/en/html/matrix_8h.html +0 -121
  197. data/vendor/fastText/website/static/docs/en/html/matrix_8h_source.html +0 -123
  198. data/vendor/fastText/website/static/docs/en/html/menu.js +0 -26
  199. data/vendor/fastText/website/static/docs/en/html/menudata.js +0 -90
  200. data/vendor/fastText/website/static/docs/en/html/model_8cc.html +0 -113
  201. data/vendor/fastText/website/static/docs/en/html/model_8h.html +0 -183
  202. data/vendor/fastText/website/static/docs/en/html/model_8h.js +0 -8
  203. data/vendor/fastText/website/static/docs/en/html/model_8h_source.html +0 -139
  204. data/vendor/fastText/website/static/docs/en/html/namespacefasttext.html +0 -343
  205. data/vendor/fastText/website/static/docs/en/html/namespacefasttext.js +0 -13
  206. data/vendor/fastText/website/static/docs/en/html/namespacefasttext_1_1utils.html +0 -158
  207. data/vendor/fastText/website/static/docs/en/html/namespacemembers.html +0 -125
  208. data/vendor/fastText/website/static/docs/en/html/namespacemembers_enum.html +0 -107
  209. data/vendor/fastText/website/static/docs/en/html/namespacemembers_func.html +0 -110
  210. data/vendor/fastText/website/static/docs/en/html/namespacemembers_type.html +0 -104
  211. data/vendor/fastText/website/static/docs/en/html/namespaces.html +0 -106
  212. data/vendor/fastText/website/static/docs/en/html/namespaces.js +0 -4
  213. data/vendor/fastText/website/static/docs/en/html/nav_f.png +0 -0
  214. data/vendor/fastText/website/static/docs/en/html/nav_g.png +0 -0
  215. data/vendor/fastText/website/static/docs/en/html/nav_h.png +0 -0
  216. data/vendor/fastText/website/static/docs/en/html/navtree.css +0 -146
  217. data/vendor/fastText/website/static/docs/en/html/navtree.js +0 -517
  218. data/vendor/fastText/website/static/docs/en/html/navtreedata.js +0 -40
  219. data/vendor/fastText/website/static/docs/en/html/navtreeindex0.js +0 -253
  220. data/vendor/fastText/website/static/docs/en/html/navtreeindex1.js +0 -139
  221. data/vendor/fastText/website/static/docs/en/html/open.png +0 -0
  222. data/vendor/fastText/website/static/docs/en/html/productquantizer_8cc.html +0 -118
  223. data/vendor/fastText/website/static/docs/en/html/productquantizer_8cc.js +0 -4
  224. data/vendor/fastText/website/static/docs/en/html/productquantizer_8h.html +0 -124
  225. data/vendor/fastText/website/static/docs/en/html/productquantizer_8h_source.html +0 -133
  226. data/vendor/fastText/website/static/docs/en/html/qmatrix_8cc.html +0 -112
  227. data/vendor/fastText/website/static/docs/en/html/qmatrix_8h.html +0 -126
  228. data/vendor/fastText/website/static/docs/en/html/qmatrix_8h_source.html +0 -128
  229. data/vendor/fastText/website/static/docs/en/html/real_8h.html +0 -117
  230. data/vendor/fastText/website/static/docs/en/html/real_8h.js +0 -4
  231. data/vendor/fastText/website/static/docs/en/html/real_8h_source.html +0 -103
  232. data/vendor/fastText/website/static/docs/en/html/resize.js +0 -114
  233. data/vendor/fastText/website/static/docs/en/html/search/all_0.html +0 -26
  234. data/vendor/fastText/website/static/docs/en/html/search/all_0.js +0 -17
  235. data/vendor/fastText/website/static/docs/en/html/search/all_1.html +0 -26
  236. data/vendor/fastText/website/static/docs/en/html/search/all_1.js +0 -8
  237. data/vendor/fastText/website/static/docs/en/html/search/all_10.html +0 -26
  238. data/vendor/fastText/website/static/docs/en/html/search/all_10.js +0 -10
  239. data/vendor/fastText/website/static/docs/en/html/search/all_11.html +0 -26
  240. data/vendor/fastText/website/static/docs/en/html/search/all_11.js +0 -25
  241. data/vendor/fastText/website/static/docs/en/html/search/all_12.html +0 -26
  242. data/vendor/fastText/website/static/docs/en/html/search/all_12.js +0 -15
  243. data/vendor/fastText/website/static/docs/en/html/search/all_13.html +0 -26
  244. data/vendor/fastText/website/static/docs/en/html/search/all_13.js +0 -7
  245. data/vendor/fastText/website/static/docs/en/html/search/all_14.html +0 -26
  246. data/vendor/fastText/website/static/docs/en/html/search/all_14.js +0 -7
  247. data/vendor/fastText/website/static/docs/en/html/search/all_15.html +0 -26
  248. data/vendor/fastText/website/static/docs/en/html/search/all_15.js +0 -11
  249. data/vendor/fastText/website/static/docs/en/html/search/all_16.html +0 -26
  250. data/vendor/fastText/website/static/docs/en/html/search/all_16.js +0 -4
  251. data/vendor/fastText/website/static/docs/en/html/search/all_17.html +0 -26
  252. data/vendor/fastText/website/static/docs/en/html/search/all_17.js +0 -7
  253. data/vendor/fastText/website/static/docs/en/html/search/all_2.html +0 -26
  254. data/vendor/fastText/website/static/docs/en/html/search/all_2.js +0 -17
  255. data/vendor/fastText/website/static/docs/en/html/search/all_3.html +0 -26
  256. data/vendor/fastText/website/static/docs/en/html/search/all_3.js +0 -17
  257. data/vendor/fastText/website/static/docs/en/html/search/all_4.html +0 -26
  258. data/vendor/fastText/website/static/docs/en/html/search/all_4.js +0 -10
  259. data/vendor/fastText/website/static/docs/en/html/search/all_5.html +0 -26
  260. data/vendor/fastText/website/static/docs/en/html/search/all_5.js +0 -12
  261. data/vendor/fastText/website/static/docs/en/html/search/all_6.html +0 -26
  262. data/vendor/fastText/website/static/docs/en/html/search/all_6.js +0 -18
  263. data/vendor/fastText/website/static/docs/en/html/search/all_7.html +0 -26
  264. data/vendor/fastText/website/static/docs/en/html/search/all_7.js +0 -8
  265. data/vendor/fastText/website/static/docs/en/html/search/all_8.html +0 -26
  266. data/vendor/fastText/website/static/docs/en/html/search/all_8.js +0 -11
  267. data/vendor/fastText/website/static/docs/en/html/search/all_9.html +0 -26
  268. data/vendor/fastText/website/static/docs/en/html/search/all_9.js +0 -5
  269. data/vendor/fastText/website/static/docs/en/html/search/all_a.html +0 -26
  270. data/vendor/fastText/website/static/docs/en/html/search/all_a.js +0 -17
  271. data/vendor/fastText/website/static/docs/en/html/search/all_b.html +0 -26
  272. data/vendor/fastText/website/static/docs/en/html/search/all_b.js +0 -27
  273. data/vendor/fastText/website/static/docs/en/html/search/all_c.html +0 -26
  274. data/vendor/fastText/website/static/docs/en/html/search/all_c.js +0 -26
  275. data/vendor/fastText/website/static/docs/en/html/search/all_d.html +0 -26
  276. data/vendor/fastText/website/static/docs/en/html/search/all_d.js +0 -9
  277. data/vendor/fastText/website/static/docs/en/html/search/all_e.html +0 -26
  278. data/vendor/fastText/website/static/docs/en/html/search/all_e.js +0 -35
  279. data/vendor/fastText/website/static/docs/en/html/search/all_f.html +0 -26
  280. data/vendor/fastText/website/static/docs/en/html/search/all_f.js +0 -16
  281. data/vendor/fastText/website/static/docs/en/html/search/classes_0.html +0 -26
  282. data/vendor/fastText/website/static/docs/en/html/search/classes_0.js +0 -4
  283. data/vendor/fastText/website/static/docs/en/html/search/classes_1.html +0 -26
  284. data/vendor/fastText/website/static/docs/en/html/search/classes_1.js +0 -4
  285. data/vendor/fastText/website/static/docs/en/html/search/classes_2.html +0 -26
  286. data/vendor/fastText/website/static/docs/en/html/search/classes_2.js +0 -4
  287. data/vendor/fastText/website/static/docs/en/html/search/classes_3.html +0 -26
  288. data/vendor/fastText/website/static/docs/en/html/search/classes_3.js +0 -4
  289. data/vendor/fastText/website/static/docs/en/html/search/classes_4.html +0 -26
  290. data/vendor/fastText/website/static/docs/en/html/search/classes_4.js +0 -5
  291. data/vendor/fastText/website/static/docs/en/html/search/classes_5.html +0 -26
  292. data/vendor/fastText/website/static/docs/en/html/search/classes_5.js +0 -4
  293. data/vendor/fastText/website/static/docs/en/html/search/classes_6.html +0 -26
  294. data/vendor/fastText/website/static/docs/en/html/search/classes_6.js +0 -4
  295. data/vendor/fastText/website/static/docs/en/html/search/classes_7.html +0 -26
  296. data/vendor/fastText/website/static/docs/en/html/search/classes_7.js +0 -4
  297. data/vendor/fastText/website/static/docs/en/html/search/classes_8.html +0 -26
  298. data/vendor/fastText/website/static/docs/en/html/search/classes_8.js +0 -4
  299. data/vendor/fastText/website/static/docs/en/html/search/close.png +0 -0
  300. data/vendor/fastText/website/static/docs/en/html/search/defines_0.html +0 -26
  301. data/vendor/fastText/website/static/docs/en/html/search/defines_0.js +0 -5
  302. data/vendor/fastText/website/static/docs/en/html/search/defines_1.html +0 -26
  303. data/vendor/fastText/website/static/docs/en/html/search/defines_1.js +0 -4
  304. data/vendor/fastText/website/static/docs/en/html/search/defines_2.html +0 -26
  305. data/vendor/fastText/website/static/docs/en/html/search/defines_2.js +0 -4
  306. data/vendor/fastText/website/static/docs/en/html/search/defines_3.html +0 -26
  307. data/vendor/fastText/website/static/docs/en/html/search/defines_3.js +0 -4
  308. data/vendor/fastText/website/static/docs/en/html/search/enums_0.html +0 -26
  309. data/vendor/fastText/website/static/docs/en/html/search/enums_0.js +0 -4
  310. data/vendor/fastText/website/static/docs/en/html/search/enums_1.html +0 -26
  311. data/vendor/fastText/website/static/docs/en/html/search/enums_1.js +0 -4
  312. data/vendor/fastText/website/static/docs/en/html/search/enums_2.html +0 -26
  313. data/vendor/fastText/website/static/docs/en/html/search/enums_2.js +0 -4
  314. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_0.html +0 -26
  315. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_0.js +0 -4
  316. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_1.html +0 -26
  317. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_1.js +0 -4
  318. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_2.html +0 -26
  319. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_2.js +0 -4
  320. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_3.html +0 -26
  321. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_3.js +0 -4
  322. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_4.html +0 -26
  323. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_4.js +0 -6
  324. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_5.html +0 -26
  325. data/vendor/fastText/website/static/docs/en/html/search/enumvalues_5.js +0 -4
  326. data/vendor/fastText/website/static/docs/en/html/search/files_0.html +0 -26
  327. data/vendor/fastText/website/static/docs/en/html/search/files_0.js +0 -5
  328. data/vendor/fastText/website/static/docs/en/html/search/files_1.html +0 -26
  329. data/vendor/fastText/website/static/docs/en/html/search/files_1.js +0 -5
  330. data/vendor/fastText/website/static/docs/en/html/search/files_2.html +0 -26
  331. data/vendor/fastText/website/static/docs/en/html/search/files_2.js +0 -5
  332. data/vendor/fastText/website/static/docs/en/html/search/files_3.html +0 -26
  333. data/vendor/fastText/website/static/docs/en/html/search/files_3.js +0 -8
  334. data/vendor/fastText/website/static/docs/en/html/search/files_4.html +0 -26
  335. data/vendor/fastText/website/static/docs/en/html/search/files_4.js +0 -5
  336. data/vendor/fastText/website/static/docs/en/html/search/files_5.html +0 -26
  337. data/vendor/fastText/website/static/docs/en/html/search/files_5.js +0 -5
  338. data/vendor/fastText/website/static/docs/en/html/search/files_6.html +0 -26
  339. data/vendor/fastText/website/static/docs/en/html/search/files_6.js +0 -4
  340. data/vendor/fastText/website/static/docs/en/html/search/files_7.html +0 -26
  341. data/vendor/fastText/website/static/docs/en/html/search/files_7.js +0 -5
  342. data/vendor/fastText/website/static/docs/en/html/search/files_8.html +0 -26
  343. data/vendor/fastText/website/static/docs/en/html/search/files_8.js +0 -5
  344. data/vendor/fastText/website/static/docs/en/html/search/functions_0.html +0 -26
  345. data/vendor/fastText/website/static/docs/en/html/search/functions_0.js +0 -14
  346. data/vendor/fastText/website/static/docs/en/html/search/functions_1.html +0 -26
  347. data/vendor/fastText/website/static/docs/en/html/search/functions_1.js +0 -5
  348. data/vendor/fastText/website/static/docs/en/html/search/functions_10.html +0 -26
  349. data/vendor/fastText/website/static/docs/en/html/search/functions_10.js +0 -5
  350. data/vendor/fastText/website/static/docs/en/html/search/functions_11.html +0 -26
  351. data/vendor/fastText/website/static/docs/en/html/search/functions_11.js +0 -18
  352. data/vendor/fastText/website/static/docs/en/html/search/functions_12.html +0 -26
  353. data/vendor/fastText/website/static/docs/en/html/search/functions_12.js +0 -8
  354. data/vendor/fastText/website/static/docs/en/html/search/functions_13.html +0 -26
  355. data/vendor/fastText/website/static/docs/en/html/search/functions_13.js +0 -5
  356. data/vendor/fastText/website/static/docs/en/html/search/functions_14.html +0 -26
  357. data/vendor/fastText/website/static/docs/en/html/search/functions_14.js +0 -4
  358. data/vendor/fastText/website/static/docs/en/html/search/functions_15.html +0 -26
  359. data/vendor/fastText/website/static/docs/en/html/search/functions_15.js +0 -4
  360. data/vendor/fastText/website/static/docs/en/html/search/functions_16.html +0 -26
  361. data/vendor/fastText/website/static/docs/en/html/search/functions_16.js +0 -4
  362. data/vendor/fastText/website/static/docs/en/html/search/functions_17.html +0 -26
  363. data/vendor/fastText/website/static/docs/en/html/search/functions_17.js +0 -7
  364. data/vendor/fastText/website/static/docs/en/html/search/functions_2.html +0 -26
  365. data/vendor/fastText/website/static/docs/en/html/search/functions_2.js +0 -11
  366. data/vendor/fastText/website/static/docs/en/html/search/functions_3.html +0 -26
  367. data/vendor/fastText/website/static/docs/en/html/search/functions_3.js +0 -9
  368. data/vendor/fastText/website/static/docs/en/html/search/functions_4.html +0 -26
  369. data/vendor/fastText/website/static/docs/en/html/search/functions_4.js +0 -4
  370. data/vendor/fastText/website/static/docs/en/html/search/functions_5.html +0 -26
  371. data/vendor/fastText/website/static/docs/en/html/search/functions_5.js +0 -7
  372. data/vendor/fastText/website/static/docs/en/html/search/functions_6.html +0 -26
  373. data/vendor/fastText/website/static/docs/en/html/search/functions_6.js +0 -17
  374. data/vendor/fastText/website/static/docs/en/html/search/functions_7.html +0 -26
  375. data/vendor/fastText/website/static/docs/en/html/search/functions_7.js +0 -5
  376. data/vendor/fastText/website/static/docs/en/html/search/functions_8.html +0 -26
  377. data/vendor/fastText/website/static/docs/en/html/search/functions_8.js +0 -8
  378. data/vendor/fastText/website/static/docs/en/html/search/functions_9.html +0 -26
  379. data/vendor/fastText/website/static/docs/en/html/search/functions_9.js +0 -4
  380. data/vendor/fastText/website/static/docs/en/html/search/functions_a.html +0 -26
  381. data/vendor/fastText/website/static/docs/en/html/search/functions_a.js +0 -8
  382. data/vendor/fastText/website/static/docs/en/html/search/functions_b.html +0 -26
  383. data/vendor/fastText/website/static/docs/en/html/search/functions_b.js +0 -10
  384. data/vendor/fastText/website/static/docs/en/html/search/functions_c.html +0 -26
  385. data/vendor/fastText/website/static/docs/en/html/search/functions_c.js +0 -10
  386. data/vendor/fastText/website/static/docs/en/html/search/functions_d.html +0 -26
  387. data/vendor/fastText/website/static/docs/en/html/search/functions_d.js +0 -6
  388. data/vendor/fastText/website/static/docs/en/html/search/functions_e.html +0 -26
  389. data/vendor/fastText/website/static/docs/en/html/search/functions_e.js +0 -26
  390. data/vendor/fastText/website/static/docs/en/html/search/functions_f.html +0 -26
  391. data/vendor/fastText/website/static/docs/en/html/search/functions_f.js +0 -6
  392. data/vendor/fastText/website/static/docs/en/html/search/mag_sel.png +0 -0
  393. data/vendor/fastText/website/static/docs/en/html/search/namespaces_0.html +0 -26
  394. data/vendor/fastText/website/static/docs/en/html/search/namespaces_0.js +0 -5
  395. data/vendor/fastText/website/static/docs/en/html/search/nomatches.html +0 -12
  396. data/vendor/fastText/website/static/docs/en/html/search/search.css +0 -271
  397. data/vendor/fastText/website/static/docs/en/html/search/search.js +0 -791
  398. data/vendor/fastText/website/static/docs/en/html/search/search_l.png +0 -0
  399. data/vendor/fastText/website/static/docs/en/html/search/search_m.png +0 -0
  400. data/vendor/fastText/website/static/docs/en/html/search/search_r.png +0 -0
  401. data/vendor/fastText/website/static/docs/en/html/search/searchdata.js +0 -42
  402. data/vendor/fastText/website/static/docs/en/html/search/typedefs_0.html +0 -26
  403. data/vendor/fastText/website/static/docs/en/html/search/typedefs_0.js +0 -4
  404. data/vendor/fastText/website/static/docs/en/html/search/typedefs_1.html +0 -26
  405. data/vendor/fastText/website/static/docs/en/html/search/typedefs_1.js +0 -4
  406. data/vendor/fastText/website/static/docs/en/html/search/variables_0.html +0 -26
  407. data/vendor/fastText/website/static/docs/en/html/search/variables_0.js +0 -4
  408. data/vendor/fastText/website/static/docs/en/html/search/variables_1.html +0 -26
  409. data/vendor/fastText/website/static/docs/en/html/search/variables_1.js +0 -6
  410. data/vendor/fastText/website/static/docs/en/html/search/variables_10.html +0 -26
  411. data/vendor/fastText/website/static/docs/en/html/search/variables_10.js +0 -8
  412. data/vendor/fastText/website/static/docs/en/html/search/variables_11.html +0 -26
  413. data/vendor/fastText/website/static/docs/en/html/search/variables_11.js +0 -11
  414. data/vendor/fastText/website/static/docs/en/html/search/variables_12.html +0 -26
  415. data/vendor/fastText/website/static/docs/en/html/search/variables_12.js +0 -4
  416. data/vendor/fastText/website/static/docs/en/html/search/variables_13.html +0 -26
  417. data/vendor/fastText/website/static/docs/en/html/search/variables_13.js +0 -10
  418. data/vendor/fastText/website/static/docs/en/html/search/variables_2.html +0 -26
  419. data/vendor/fastText/website/static/docs/en/html/search/variables_2.js +0 -9
  420. data/vendor/fastText/website/static/docs/en/html/search/variables_3.html +0 -26
  421. data/vendor/fastText/website/static/docs/en/html/search/variables_3.js +0 -9
  422. data/vendor/fastText/website/static/docs/en/html/search/variables_4.html +0 -26
  423. data/vendor/fastText/website/static/docs/en/html/search/variables_4.js +0 -7
  424. data/vendor/fastText/website/static/docs/en/html/search/variables_5.html +0 -26
  425. data/vendor/fastText/website/static/docs/en/html/search/variables_5.js +0 -4
  426. data/vendor/fastText/website/static/docs/en/html/search/variables_6.html +0 -26
  427. data/vendor/fastText/website/static/docs/en/html/search/variables_6.js +0 -5
  428. data/vendor/fastText/website/static/docs/en/html/search/variables_7.html +0 -26
  429. data/vendor/fastText/website/static/docs/en/html/search/variables_7.js +0 -5
  430. data/vendor/fastText/website/static/docs/en/html/search/variables_8.html +0 -26
  431. data/vendor/fastText/website/static/docs/en/html/search/variables_8.js +0 -4
  432. data/vendor/fastText/website/static/docs/en/html/search/variables_9.html +0 -26
  433. data/vendor/fastText/website/static/docs/en/html/search/variables_9.js +0 -10
  434. data/vendor/fastText/website/static/docs/en/html/search/variables_a.html +0 -26
  435. data/vendor/fastText/website/static/docs/en/html/search/variables_a.js +0 -14
  436. data/vendor/fastText/website/static/docs/en/html/search/variables_b.html +0 -26
  437. data/vendor/fastText/website/static/docs/en/html/search/variables_b.js +0 -17
  438. data/vendor/fastText/website/static/docs/en/html/search/variables_c.html +0 -26
  439. data/vendor/fastText/website/static/docs/en/html/search/variables_c.js +0 -6
  440. data/vendor/fastText/website/static/docs/en/html/search/variables_d.html +0 -26
  441. data/vendor/fastText/website/static/docs/en/html/search/variables_d.js +0 -10
  442. data/vendor/fastText/website/static/docs/en/html/search/variables_e.html +0 -26
  443. data/vendor/fastText/website/static/docs/en/html/search/variables_e.js +0 -11
  444. data/vendor/fastText/website/static/docs/en/html/search/variables_f.html +0 -26
  445. data/vendor/fastText/website/static/docs/en/html/search/variables_f.js +0 -6
  446. data/vendor/fastText/website/static/docs/en/html/splitbar.png +0 -0
  447. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1Node-members.html +0 -108
  448. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1Node.html +0 -194
  449. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1Node.js +0 -8
  450. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1entry-members.html +0 -107
  451. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1entry.html +0 -178
  452. data/vendor/fastText/website/static/docs/en/html/structfasttext_1_1entry.js +0 -7
  453. data/vendor/fastText/website/static/docs/en/html/sync_off.png +0 -0
  454. data/vendor/fastText/website/static/docs/en/html/sync_on.png +0 -0
  455. data/vendor/fastText/website/static/docs/en/html/tab_a.png +0 -0
  456. data/vendor/fastText/website/static/docs/en/html/tab_b.png +0 -0
  457. data/vendor/fastText/website/static/docs/en/html/tab_h.png +0 -0
  458. data/vendor/fastText/website/static/docs/en/html/tab_s.png +0 -0
  459. data/vendor/fastText/website/static/docs/en/html/tabs.css +0 -1
  460. data/vendor/fastText/website/static/docs/en/html/utils_8cc.html +0 -121
  461. data/vendor/fastText/website/static/docs/en/html/utils_8cc.js +0 -5
  462. data/vendor/fastText/website/static/docs/en/html/utils_8h.html +0 -122
  463. data/vendor/fastText/website/static/docs/en/html/utils_8h.js +0 -5
  464. data/vendor/fastText/website/static/docs/en/html/utils_8h_source.html +0 -104
  465. data/vendor/fastText/website/static/docs/en/html/vector_8cc.html +0 -121
  466. data/vendor/fastText/website/static/docs/en/html/vector_8cc.js +0 -4
  467. data/vendor/fastText/website/static/docs/en/html/vector_8h.html +0 -126
  468. data/vendor/fastText/website/static/docs/en/html/vector_8h.js +0 -5
  469. data/vendor/fastText/website/static/docs/en/html/vector_8h_source.html +0 -120
  470. data/vendor/fastText/website/static/fasttext.css +0 -48
  471. data/vendor/fastText/website/static/img/authors/armand_joulin.jpg +0 -0
  472. data/vendor/fastText/website/static/img/authors/christian_puhrsch.png +0 -0
  473. data/vendor/fastText/website/static/img/authors/edouard_grave.jpeg +0 -0
  474. data/vendor/fastText/website/static/img/authors/piotr_bojanowski.jpg +0 -0
  475. data/vendor/fastText/website/static/img/authors/tomas_mikolov.jpg +0 -0
  476. data/vendor/fastText/website/static/img/blog/2016-08-18-blog-post-img1.png +0 -0
  477. data/vendor/fastText/website/static/img/blog/2016-08-18-blog-post-img2.png +0 -0
  478. data/vendor/fastText/website/static/img/blog/2017-05-02-blog-post-img1.jpg +0 -0
  479. data/vendor/fastText/website/static/img/blog/2017-05-02-blog-post-img2.jpg +0 -0
  480. data/vendor/fastText/website/static/img/blog/2017-10-02-blog-post-img1.png +0 -0
  481. data/vendor/fastText/website/static/img/cbo_vs_skipgram.png +0 -0
  482. data/vendor/fastText/website/static/img/fasttext-icon-api.png +0 -0
  483. data/vendor/fastText/website/static/img/fasttext-icon-bg-web.png +0 -0
  484. data/vendor/fastText/website/static/img/fasttext-icon-color-square.png +0 -0
  485. data/vendor/fastText/website/static/img/fasttext-icon-color-web.png +0 -0
  486. data/vendor/fastText/website/static/img/fasttext-icon-faq.png +0 -0
  487. data/vendor/fastText/website/static/img/fasttext-icon-tutorial.png +0 -0
  488. data/vendor/fastText/website/static/img/fasttext-icon-white-web.png +0 -0
  489. data/vendor/fastText/website/static/img/fasttext-logo-color-web.png +0 -0
  490. data/vendor/fastText/website/static/img/fasttext-logo-white-web.png +0 -0
  491. data/vendor/fastText/website/static/img/logo-color.png +0 -0
  492. data/vendor/fastText/website/static/img/model-black.png +0 -0
  493. data/vendor/fastText/website/static/img/model-blue.png +0 -0
  494. data/vendor/fastText/website/static/img/model-red.png +0 -0
  495. data/vendor/fastText/website/static/img/ogimage.png +0 -0
  496. data/vendor/fastText/website/static/img/oss_logo.png +0 -0
  497. data/vendor/fastText/wikifil.pl +0 -57
  498. data/vendor/fastText/word-vector-example.sh +0 -39
@@ -1,322 +0,0 @@
1
- # fastText [![CircleCI](https://circleci.com/gh/facebookresearch/fastText/tree/master.svg?style=svg)](https://circleci.com/gh/facebookresearch/fastText/tree/master)
2
-
3
- [fastText](https://fasttext.cc/) is a library for efficient learning of word representations and sentence classification.
4
-
5
- In this document we present how to use fastText in python.
6
-
7
- ## Table of contents
8
-
9
- * [Requirements](#requirements)
10
- * [Installation](#installation)
11
- * [Usage overview](#usage-overview)
12
- * [Word representation model](#word-representation-model)
13
- * [Text classification model](#text-classification-model)
14
- * [IMPORTANT: Preprocessing data / encoding conventions](#important-preprocessing-data-encoding-conventions)
15
- * [More examples](#more-examples)
16
- * [API](#api)
17
- * [`train_unsupervised` parameters](#train_unsupervised-parameters)
18
- * [`train_supervised` parameters](#train_supervised-parameters)
19
- * [`model` object](#model-object)
20
-
21
-
22
- # Requirements
23
-
24
- [fastText](https://fasttext.cc/) builds on modern Mac OS and Linux distributions.
25
- Since it uses C\++11 features, it requires a compiler with good C++11 support. You will need [Python](https://www.python.org/) (version 2.7 or ≥ 3.4), [NumPy](http://www.numpy.org/) & [SciPy](https://www.scipy.org/) and [pybind11](https://github.com/pybind/pybind11).
26
-
27
-
28
- # Installation
29
-
30
- To install the latest release, run:
31
- ```bash
32
- $ pip install fasttext
33
- ```
34
-
35
- or, to get the latest development version of fasttext, install it from our GitHub repository:
36
- ```bash
37
- $ git clone https://github.com/facebookresearch/fastText.git
38
- $ cd fastText
39
- $ sudo pip install .
40
- $ # or :
41
- $ sudo python setup.py install
42
- ```
43
-
44
- # Usage overview
45
-
46
-
47
- ## Word representation model
48
-
49
- To learn word vectors, as [described here](https://fasttext.cc/docs/en/references.html#enriching-word-vectors-with-subword-information), we can use the `fasttext.train_unsupervised` function like this:
50
-
51
-
52
- ```py
53
- import fasttext
54
-
55
- # Skipgram model :
56
- model = fasttext.train_unsupervised('data.txt', model='skipgram')
57
-
58
- # or, cbow model :
59
- model = fasttext.train_unsupervised('data.txt', model='cbow')
60
-
61
- ```
62
-
63
- where `data.txt` is a training file containing UTF-8 encoded text.
64
-
65
-
66
- The returned `model` object represents your learned model, and you can use it to retrieve information.
67
-
68
- ```py
69
- print(model.words) # list of words in dictionary
70
- print(model['king']) # get the vector of the word 'king'
71
- ```
72
-
73
-
74
- ### Saving and loading a model object
75
-
76
- You can save your trained model object by calling the function `save_model`.
77
- ```py
78
- model.save_model("model_filename.bin")
79
- ```
80
-
81
- and load it again later with the `load_model` function:
82
- ```py
83
- model = fasttext.load_model("model_filename.bin")
84
- ```
85
-
86
- For more information about word representation usage of fasttext, you can refer to our [word representations tutorial](https://fasttext.cc/docs/en/unsupervised-tutorial.html).
87
-
88
-
89
- ## Text classification model
90
-
91
- To train a text classifier using the method [described here](https://fasttext.cc/docs/en/references.html#bag-of-tricks-for-efficient-text-classification), we can use the `fasttext.train_supervised` function like this:
92
-
93
-
94
- ```py
95
- import fasttext
96
-
97
- model = fasttext.train_supervised('data.train.txt')
98
- ```
99
-
100
- where `data.train.txt` is a text file containing one training sentence per line along with its labels. By default, labels are assumed to be words prefixed by the string `__label__`.
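A training line therefore mixes label tokens with ordinary words. The minimal sketch below separates the two using the default `__label__` prefix. The helper `split_labels` is hypothetical (it is not part of the fasttext package), and `str.split()` is only a simplification of fastText's byte-level tokenizer.

```python
# Hypothetical helper: separates label tokens from words in one training line.
# Assumes the default label prefix "__label__"; str.split() is a simplification
# of fastText's own tokenizer.
def split_labels(line, prefix="__label__"):
    words, labels = [], []
    for token in line.split():
        (labels if token.startswith(prefix) else words).append(token)
    return words, labels


sample = "__label__baking __label__bread Which baking dish is best to bake a banana bread ?"
words, labels = split_labels(sample)
print(labels)     # ['__label__baking', '__label__bread']
print(words[:3])  # ['Which', 'baking', 'dish']
```

A line may carry several labels; every token starting with the prefix counts as one, wherever it appears.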
101
-
102
- Once the model is trained, we can retrieve the list of words and labels:
103
-
104
- ```py
105
- print(model.words)
106
- print(model.labels)
107
- ```
108
-
109
- To evaluate our model by computing the precision at 1 (P@1) and the recall on a test set, we use the `test` function:
110
-
111
- ```py
112
- def print_results(N, p, r):
113
- print("N\t" + str(N))
114
- print("P@{}\t{:.3f}".format(1, p))
115
- print("R@{}\t{:.3f}".format(1, r))
116
-
117
- print_results(*model.test('test.txt'))
118
- ```
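The `(N, p, r)` triple printed above can be reproduced by hand. The plain-Python sketch below assumes that precision is the fraction of predicted labels that are correct and recall is the fraction of gold labels that were predicted. `precision_recall_at_k` and the toy data are hypothetical and only illustrate the metric, not fastText's internal meter.

```python
def precision_recall_at_k(predictions, gold, k=1):
    """Return (N, precision@k, recall@k) for ranked predictions vs. gold label sets."""
    correct = predicted = relevant = 0
    for ranked, labels in zip(predictions, gold):
        top = ranked[:k]  # keep only the k best-ranked labels
        correct += sum(1 for label in top if label in labels)
        predicted += len(top)
        relevant += len(labels)
    precision = correct / predicted if predicted else 0.0
    recall = correct / relevant if relevant else 0.0
    return len(gold), precision, recall


# Toy data: ranked predictions per example and the set of true labels per example.
predictions = [["__label__a", "__label__b"], ["__label__c", "__label__a"]]
gold = [{"__label__a"}, {"__label__b"}]
N, p, r = precision_recall_at_k(predictions, gold, k=1)
```

Raising `k` usually lowers precision (more guesses) while recall can only stay the same or grow.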
119
-
120
- We can also predict labels for a specific text:
121
-
122
- ```py
123
- model.predict("Which baking dish is best to bake a banana bread ?")
124
- ```
125
-
126
- By default, `predict` returns only one label: the one with the highest probability. You can also predict more than one label by specifying the parameter `k`:
127
- ```py
128
- model.predict("Which baking dish is best to bake a banana bread ?", k=3)
129
- ```
130
-
131
- If you want to predict labels for more than one sentence, you can pass a list of strings:
132
-
133
- ```py
134
- model.predict(["Which baking dish is best to bake a banana bread ?", "Why not put knives in the dishwasher?"], k=3)
135
- ```
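For a model trained with the default `softmax` loss, predicting the top `k` labels amounts to normalizing one score per label into probabilities and keeping the `k` largest. The sketch below shows that idea in plain Python with made-up scores. `predict_top_k` is hypothetical, does not reproduce fastText's internals, and other losses such as `hs` or `ova` compute probabilities differently. It returns a `(labels, probabilities)` pair, mirroring what `predict` is documented to return.

```python
import math


def predict_top_k(scores, k=1):
    """Softmax over per-label scores, then return the k most probable labels."""
    shift = max(scores.values())  # subtract the max score for numerical stability
    exps = {label: math.exp(s - shift) for label, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)[:k]
    labels = tuple(label for label, _ in ranked)
    probs = [e / total for _, e in ranked]
    return labels, probs


# Made-up raw scores for three labels.
scores = {"__label__baking": 2.0, "__label__equipment": 1.0, "__label__knives": -1.0}
labels, probs = predict_top_k(scores, k=2)
```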
136
-
137
-
138
- Of course, you can also save and load a model to/from a file as [in the word representation usage](#saving-and-loading-a-model-object).
139
-
140
- For more information about text classification usage of fasttext, you can refer to our [text classification tutorial](https://fasttext.cc/docs/en/supervised-tutorial.html).
141
-
142
-
143
-
144
-
145
- ### Compress model files with quantization
146
-
147
- When you want to save a supervised model file, fastText can compress it into a much smaller model file at the cost of only a small loss in performance.
148
-
149
- ```py
150
- # with the previously trained `model` object, call :
151
- model.quantize(input='data.train.txt', retrain=True)
152
-
153
- # then display results and save the new model :
154
- print_results(*model.test('test.txt'))
155
- model.save_model("model_filename.ftz")
156
- ```
157
-
158
- `model_filename.ftz` will have a much smaller size than `model_filename.bin`.
159
-
160
- For further reading on quantization, you can refer to [this paragraph from our blog post](https://fasttext.cc/blog/2017/10/02/blog-post.html#model-compression).
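fastText's own compression relies on product quantization, as the blog post describes, which is too involved to show briefly. As a much simpler stand-in that still illustrates the size-versus-accuracy trade-off, the sketch below applies uniform 8-bit scalar quantization to one vector: each float becomes a single byte plus a shared offset and scale. This is not fastText's algorithm, and `quantize_uint8` / `dequantize_uint8` are hypothetical names.

```python
def quantize_uint8(values):
    """Map floats onto 256 evenly spaced levels; return (codes, offset, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant input
    codes = bytes(round((v - lo) / scale) for v in values)
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Approximate the original floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


vector = [0.12, -0.5, 0.33, 0.9, -0.07]
codes, lo, scale = quantize_uint8(vector)
restored = dequantize_uint8(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

Each value now takes 1 byte instead of the 4 bytes of a 32-bit float, and the reconstruction error per value is at most about half of `scale`.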
161
-
162
-
163
- ## IMPORTANT: Preprocessing data / encoding conventions
164
-
165
- In general, it is important to preprocess your data properly. In particular, our example scripts in the [root folder](https://github.com/facebookresearch/fastText) do this.
166
-
167
- fastText assumes UTF-8 encoded text. All text must be [unicode for Python2](https://docs.python.org/2/library/functions.html#unicode) and [str for Python3](https://docs.python.org/3.5/library/stdtypes.html#textseq). The passed text will be [encoded as UTF-8 by pybind11](https://pybind11.readthedocs.io/en/master/advanced/cast/strings.html?highlight=utf-8#strings-bytes-and-unicode-conversions) before being passed to the fastText C++ library, so it is important to use UTF-8 encoded text when building a model. On Unix-like systems, you can convert text using [iconv](https://en.wikipedia.org/wiki/Iconv).
168
-
169
- fastText tokenizes (splits text into pieces) based on the following ASCII characters (bytes). In particular, it is not aware of UTF-8 whitespace. We advise users to convert UTF-8 whitespace and word boundaries into one of the following symbols as appropriate.
170
-
171
- * space
172
- * tab
173
- * vertical tab
174
- * carriage return
175
- * formfeed
176
- * the null character
177
-
178
- The newline character is used to delimit lines of text. In particular, the EOS token is appended to a line of text when a newline character is encountered. The only exception is if the number of tokens exceeds the MAX\_LINE\_SIZE constant as defined in the [Dictionary header](https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h). This means that if you have text that is not separated by newlines, such as the [fil9 dataset](http://mattmahoney.net/dc/textdata), it will be broken into chunks of MAX\_LINE\_SIZE tokens and the EOS token will not be appended.
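To make these rules concrete, here is a simplified Python re-implementation of the splitting behavior described above; it is not fastText's C++ code and ignores MAX\_LINE\_SIZE. `tokenize_line` and `normalize_whitespace` are hypothetical helpers, and `</s>` is assumed to be the EOS token defined in the Dictionary header. Note that a non-breaking space (U+00A0) is not a separator until it is normalized to an ASCII space.

```python
import re

SEPARATORS = {" ", "\t", "\v", "\r", "\f", "\0"}  # ASCII bytes treated as separators
EOS = "</s>"  # assumed end-of-sentence token


def tokenize_line(text):
    """Split on the ASCII separators; emit EOS whenever a newline is seen."""
    tokens, current = [], []
    for ch in text:
        if ch in SEPARATORS or ch == "\n":
            if current:
                tokens.append("".join(current))
                current = []
            if ch == "\n":
                tokens.append(EOS)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def normalize_whitespace(text):
    """Replace every Unicode whitespace character except newline with an ASCII space."""
    return re.sub(r"[^\S\n]", " ", text)


raw = "banana\u00a0bread is\tgreat\n"
print(tokenize_line(raw))                        # ['banana\xa0bread', 'is', 'great', '</s>']
print(tokenize_line(normalize_whitespace(raw)))  # ['banana', 'bread', 'is', 'great', '</s>']
```

Running a normalization step like this before training is one way to follow the advice above.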
179
-
180
- The length of a token is its number of UTF-8 characters, counted by using the [leading two bits of a byte](https://en.wikipedia.org/wiki/UTF-8#Description) to identify [subsequent bytes of a multi-byte sequence](https://github.com/facebookresearch/fastText/blob/master/src/dictionary.cc). This is especially important to know when choosing the minimum and maximum length of subwords. Further, the EOS token (as specified in the [Dictionary header](https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h)) is considered a character and will not be broken into subwords.
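The sketch below illustrates both points. `utf8_char_count` skips continuation bytes (those whose two leading bits are `10`), and `char_ngrams` lists the character n-grams of lengths `minn` to `maxn` for a word wrapped in `<` and `>` boundary markers. Both functions are hypothetical and written to follow our reading of `dictionary.cc`; fastText itself also keeps the whole word as a separate entry and stores n-grams as hashed bucket indices rather than strings.

```python
def utf8_char_count(token):
    """Count characters by skipping UTF-8 continuation bytes (0b10xxxxxx)."""
    return sum(1 for byte in token.encode("utf-8") if (byte & 0xC0) != 0x80)


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of length minn..maxn over '<' + word + '>'."""
    wrapped = "<" + word + ">"  # Python str indices are code points, i.e. UTF-8 characters
    grams = []
    for start in range(len(wrapped)):
        for n in range(max(minn, 1), maxn + 1):  # maxn = 0 disables subwords
            end = start + n
            if end > len(wrapped):
                break
            # a lone boundary marker is not kept as a subword
            if n == 1 and (start == 0 or end == len(wrapped)):
                continue
            grams.append(wrapped[start:end])
    return grams


print(utf8_char_count("caf\u00e9"), len("caf\u00e9".encode("utf-8")))  # 4 5
print(char_ngrams("where", minn=3, maxn=3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because `minn`/`maxn` count characters rather than bytes, a word in a script with multi-byte characters yields the same number of n-grams as an ASCII word of equal character length.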
181
-
182
- ## More examples
183
-
184
- To learn more about fastText models, please see the main [README](https://github.com/facebookresearch/fastText/blob/master/README.md) and in particular [the tutorials on our website](https://fasttext.cc/docs/en/supervised-tutorial.html).
185
-
186
- You can find further Python examples in [the doc folder](https://github.com/facebookresearch/fastText/tree/master/python/doc/examples).
187
-
188
- As with any package, you can get help on any Python function using the built-in `help` function.
189
-
190
- For example:
191
-
192
- ```
193
- >>> import fasttext
194
- >>> help(fasttext.FastText)
195
-
196
- Help on module fasttext.FastText in fasttext:
197
-
198
- NAME
199
- fasttext.FastText
200
-
201
- DESCRIPTION
202
- # Copyright (c) 2017-present, Facebook, Inc.
203
- # All rights reserved.
204
- #
205
- # This source code is licensed under the MIT license found in the
206
- # LICENSE file in the root directory of this source tree.
207
-
208
- FUNCTIONS
209
- load_model(path)
210
- Load a model given a filepath and return a model object.
211
-
212
- tokenize(text)
213
- Given a string of text, tokenize it and return a list of tokens
214
- [...]
215
- ```
216
-
217
-
218
- # API
219
-
220
-
221
- ## `train_unsupervised` parameters
222
-
223
- ```python
224
- input # training file path (required)
225
- model # unsupervised fasttext model {cbow, skipgram} [skipgram]
226
- lr # learning rate [0.05]
227
- dim # size of word vectors [100]
228
- ws # size of the context window [5]
229
- epoch # number of epochs [5]
230
- minCount # minimal number of word occurrences [5]
231
- minn # min length of char ngram [3]
232
- maxn # max length of char ngram [6]
233
- neg # number of negatives sampled [5]
234
- wordNgrams # max length of word ngram [1]
235
- loss # loss function {ns, hs, softmax, ova} [ns]
236
- bucket # number of buckets [2000000]
237
- thread # number of threads [number of cpus]
238
- lrUpdateRate # change the rate of updates for the learning rate [100]
239
- t # sampling threshold [0.0001]
240
- verbose # verbose [2]
241
- ```
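The `bucket` parameter bounds how many rows are reserved for n-grams: rather than storing every n-gram, fastText hashes each one into a fixed number of buckets, so different n-grams may share a row. The sketch below uses the 32-bit FNV-1a hash, which is our understanding of what the vendored `dictionary.cc` uses; for non-ASCII bytes the C++ code may differ slightly because of how it casts `char`. `subword_row` and its `nwords` offset are hypothetical illustrations of an n-gram landing in a row placed after the word rows of the input matrix.

```python
FNV_OFFSET = 2166136261
FNV_PRIME = 16777619


def fnv1a_32(text):
    """32-bit FNV-1a hash over the UTF-8 bytes of text."""
    h = FNV_OFFSET
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep 32-bit unsigned arithmetic
    return h


def subword_row(ngram, nwords, bucket=2000000):
    """Hypothetical: input-matrix row for an n-gram, placed after the nwords word rows."""
    return nwords + fnv1a_32(ngram) % bucket


print(hex(fnv1a_32("a")))  # 0xe40c292c
print(subword_row("<wh", nwords=10000, bucket=100))
```

A smaller `bucket` shrinks the input matrix but increases the chance that unrelated n-grams collide in the same row.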
242
-
243
- ## `train_supervised` parameters
244
-
245
- ```python
246
- input # training file path (required)
247
- lr # learning rate [0.1]
248
- dim # size of word vectors [100]
249
- ws # size of the context window [5]
250
- epoch # number of epochs [5]
251
- minCount # minimal number of word occurrences [1]
252
- minCountLabel # minimal number of label occurrences [1]
253
- minn # min length of char ngram [0]
254
- maxn # max length of char ngram [0]
255
- neg # number of negatives sampled [5]
256
- wordNgrams # max length of word ngram [1]
257
- loss # loss function {ns, hs, softmax, ova} [softmax]
258
- bucket # number of buckets [2000000]
259
- thread # number of threads [number of cpus]
260
- lrUpdateRate # change the rate of updates for the learning rate [100]
261
- t # sampling threshold [0.0001]
262
- label # label prefix ['__label__']
263
- verbose # verbose [2]
264
- pretrainedVectors # pretrained word vectors (.vec file) for supervised learning []
265
- ```
266
-
267
- ## `model` object
268
-
269
- The `train_supervised`, `train_unsupervised` and `load_model` functions return an instance of the `_FastText` class, which we generally refer to as the `model` object.
270
-
271
- This object exposes those training arguments as properties: `lr`, `dim`, `ws`, `epoch`, `minCount`, `minCountLabel`, `minn`, `maxn`, `neg`, `wordNgrams`, `loss`, `bucket`, `thread`, `lrUpdateRate`, `t`, `label`, `verbose`, `pretrainedVectors`. For example, `model.wordNgrams` gives the maximum length of word n-grams used to train this model.
272
-
273
- In addition, the object exposes several methods:
274
-
275
- ```python
276
- get_dimension # Get the dimension (size) of a lookup vector (hidden layer).
277
- # This is equivalent to `dim` property.
278
- get_input_vector # Given an index, get the corresponding vector of the Input Matrix.
279
- get_input_matrix # Get a copy of the full input matrix of a Model.
280
- get_labels # Get the entire list of labels of the dictionary
281
- # This is equivalent to `labels` property.
282
- get_line # Split a line of text into words and labels.
283
- get_output_matrix # Get a copy of the full output matrix of a Model.
284
- get_sentence_vector # Given a string, get a single vector representation. This function
285
- # expects a single line of text. We split words on
286
- # whitespace (space, newline, tab, vertical tab) and the control
287
- # characters carriage return, formfeed and the null character.
288
- get_subword_id # Given a subword, return the index (within input matrix) it hashes to.
289
- get_subwords # Given a word, get the subwords and their indices.
290
- get_word_id # Given a word, get the word id within the dictionary.
291
- get_word_vector # Get the vector representation of word.
292
- get_words # Get the entire list of words of the dictionary
293
- # This is equivalent to `words` property.
294
- is_quantized # whether the model has been quantized
295
- predict # Given a string, get a list of labels and a list of corresponding probabilities.
296
- quantize # Quantize the model, reducing its size and memory footprint.
297
- save_model # Save the model to the given path
298
- test # Evaluate supervised model using file given by path
299
- test_label # Return the precision and recall score for each label.
300
- ```
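For an unsupervised model, our understanding is that `get_sentence_vector` averages the vectors of the words in the line after scaling each one to unit L2 norm, while supervised models average the raw input rows instead. The sketch below reproduces the normalized averaging on a toy lookup table. `sentence_vector` and `toy_vectors` are hypothetical; the real method obtains word vectors from the model, where even unknown words get a vector built from their subwords.

```python
import math


def sentence_vector(words, vectors, dim):
    """Average of the unit-normalized vectors of the words (zero vectors skipped)."""
    total = [0.0] * dim
    count = 0
    for word in words:
        # Hypothetical lookup table; unknown words fall back to a zero vector here.
        vec = vectors.get(word, [0.0] * dim)
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            total = [t + x / norm for t, x in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


toy_vectors = {"banana": [3.0, 4.0], "bread": [0.0, 2.0]}
print(sentence_vector(["banana", "bread", "unknown"], toy_vectors, dim=2))
```

Normalizing first keeps a single long vector from dominating the average.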
301
-
302
- The `words` and `labels` properties return the words and labels from the dictionary:
303
- ```py
304
- model.words # equivalent to model.get_words()
305
- model.labels # equivalent to model.get_labels()
306
- ```
307
-
308
- The object overrides the `__getitem__` and `__contains__` methods to return the vector representation of a word and to check whether a word is in the vocabulary.
309
-
310
- ```py
311
- model['king'] # equivalent to model.get_word_vector('king')
312
- 'king' in model # equivalent to `'king' in model.get_words()`
313
- ```
314
-
315
-
316
- Join the fastText community
317
- ---------------------------
318
-
319
- - [Facebook page](https://www.facebook.com/groups/1174547215919768)
320
- - [Stack overflow](https://stackoverflow.com/questions/tagged/fasttext)
321
- - [Google group](https://groups.google.com/forum/#!forum/fasttext-library)
322
- - [GitHub](https://github.com/facebookresearch/fastText)
@@ -1,406 +0,0 @@
1
- fastText |CircleCI|
2
- ===================
3
-
4
- `fastText <https://fasttext.cc/>`__ is a library for efficient learning
5
- of word representations and sentence classification.
6
-
7
- In this document we present how to use fastText in python.
8
-
9
- Table of contents
10
- -----------------
11
-
12
- - `Requirements <#requirements>`__
13
- - `Installation <#installation>`__
14
- - `Usage overview <#usage-overview>`__
15
- - `Word representation model <#word-representation-model>`__
16
- - `Text classification model <#text-classification-model>`__
17
- - `IMPORTANT: Preprocessing data / encoding
18
- conventions <#important-preprocessing-data-encoding-conventions>`__
19
- - `More examples <#more-examples>`__
20
- - `API <#api>`__
21
- - `train_unsupervised parameters <#train_unsupervised-parameters>`__
22
- - `train_supervised parameters <#train_supervised-parameters>`__
23
- - `model object <#model-object>`__
24
-
25
- Requirements
26
- ============
27
-
28
- `fastText <https://fasttext.cc/>`__ builds on modern Mac OS and Linux
29
- distributions. Since it uses C++11 features, it requires a compiler with
30
- good C++11 support. You will need `Python <https://www.python.org/>`__
31
- (version 2.7 or ≥ 3.4), `NumPy <http://www.numpy.org/>`__ &
32
- `SciPy <https://www.scipy.org/>`__ and
33
- `pybind11 <https://github.com/pybind/pybind11>`__.
34
-
35
- Installation
36
- ============
37
-
38
- To install the latest release, run:
39
-
40
- .. code:: bash
41
-
42
- $ pip install fasttext
43
-
44
- or, to get the latest development version of fasttext, install it
45
- from our GitHub repository:
46
-
47
- .. code:: bash
48
-
49
- $ git clone https://github.com/facebookresearch/fastText.git
50
- $ cd fastText
51
- $ sudo pip install .
52
- $ # or :
53
- $ sudo python setup.py install
54
-
55
- Usage overview
56
- ==============
57
-
58
- Word representation model
59
- -------------------------
60
-
61
- To learn word vectors, as `described
62
- here <https://fasttext.cc/docs/en/references.html#enriching-word-vectors-with-subword-information>`__,
63
- we can use the ``fasttext.train_unsupervised`` function like this:
64
-
65
- .. code:: py
66
-
67
- import fasttext
68
-
69
- # Skipgram model :
70
- model = fasttext.train_unsupervised('data.txt', model='skipgram')
71
-
72
- # or, cbow model :
73
- model = fasttext.train_unsupervised('data.txt', model='cbow')
74
-
75
- where ``data.txt`` is a training file containing UTF-8 encoded text.
76
-
77
- The returned ``model`` object represents your learned model, and you can
78
- use it to retrieve information.
79
-
80
- .. code:: py
81
-
82
- print(model.words) # list of words in dictionary
83
- print(model['king']) # get the vector of the word 'king'
84
-
85
- Saving and loading a model object
86
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
87
-
88
- You can save your trained model object by calling the function
89
- ``save_model``.
90
-
91
- .. code:: py
92
-
93
- model.save_model("model_filename.bin")
94
-
95
- and load it again later with the ``load_model`` function:
96
-
97
- .. code:: py
98
-
99
- model = fasttext.load_model("model_filename.bin")
100
-
101
- For more information about word representation usage of fasttext, you
102
- can refer to our `word representations
103
- tutorial <https://fasttext.cc/docs/en/unsupervised-tutorial.html>`__.
104
-
105
- Text classification model
106
- -------------------------
107
-
108
- To train a text classifier using the method `described
109
- here <https://fasttext.cc/docs/en/references.html#bag-of-tricks-for-efficient-text-classification>`__,
110
- we can use the ``fasttext.train_supervised`` function like this:
111
-
112
- .. code:: py
113
-
114
- import fasttext
115
-
116
- model = fasttext.train_supervised('data.train.txt')
117
-
118
- where ``data.train.txt`` is a text file containing one training sentence
119
- per line along with its labels. By default, labels are assumed to be
120
- words prefixed by the string ``__label__``.
121
-
122
- Once the model is trained, we can retrieve the list of words and labels:
123
-
124
- .. code:: py
125
-
126
- print(model.words)
127
- print(model.labels)
128
-
129
- To evaluate our model by computing the precision at 1 (P@1) and the
130
- recall on a test set, we use the ``test`` function:
131
-
132
- .. code:: py
133
-
134
- def print_results(N, p, r):
135
- print("N\t" + str(N))
136
- print("P@{}\t{:.3f}".format(1, p))
137
- print("R@{}\t{:.3f}".format(1, r))
138
-
139
- print_results(*model.test('test.txt'))
140
-
141
- We can also predict labels for a specific text:
142
-
143
- .. code:: py
144
-
145
- model.predict("Which baking dish is best to bake a banana bread ?")
146
-
147
- By default, ``predict`` returns only one label: the one with the
148
- highest probability. You can also predict more than one label by
149
- specifying the parameter ``k``:
150
-
151
- .. code:: py
152
-
153
- model.predict("Which baking dish is best to bake a banana bread ?", k=3)
154
-
155
- If you want to predict labels for more than one sentence, you can pass a
156
- list of strings:
157
-
158
- .. code:: py
159
-
160
- model.predict(["Which baking dish is best to bake a banana bread ?", "Why not put knives in the dishwasher?"], k=3)
161
-
162
- Of course, you can also save and load a model to/from a file as `in the
163
- word representation usage <#saving-and-loading-a-model-object>`__.
164
-
165
- For more information about text classification usage of fasttext, you
166
- can refer to our `text classification
167
- tutorial <https://fasttext.cc/docs/en/supervised-tutorial.html>`__.
168
-
169
- Compress model files with quantization
170
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
171
-
172
- When you want to save a supervised model file, fastText can compress it
173
- into a much smaller model file at the cost of only a small loss in
174
- performance.
175
-
176
- .. code:: py
177
-
178
- # with the previously trained `model` object, call :
179
- model.quantize(input='data.train.txt', retrain=True)
180
-
181
- # then display results and save the new model :
182
- print_results(*model.test('test.txt'))
183
- model.save_model("model_filename.ftz")
184
-
185
- ``model_filename.ftz`` will have a much smaller size than
186
- ``model_filename.bin``.
187
-
188
- For further reading on quantization, you can refer to `this paragraph
189
- from our blog
190
- post <https://fasttext.cc/blog/2017/10/02/blog-post.html#model-compression>`__.
191
-
192
- IMPORTANT: Preprocessing data / encoding conventions
193
- ----------------------------------------------------
194
-
195
- In general, it is important to preprocess your data properly. In
196
- particular, our example scripts in the `root
197
- folder <https://github.com/facebookresearch/fastText>`__ do this.
198
-
199
- fastText assumes UTF-8 encoded text. All text must be `unicode for
200
- Python2 <https://docs.python.org/2/library/functions.html#unicode>`__
201
- and `str for
202
- Python3 <https://docs.python.org/3.5/library/stdtypes.html#textseq>`__.
203
- The passed text will be `encoded as UTF-8 by
204
- pybind11 <https://pybind11.readthedocs.io/en/master/advanced/cast/strings.html?highlight=utf-8#strings-bytes-and-unicode-conversions>`__
205
- before being passed to the fastText C++ library. This means it is important to
206
- use UTF-8 encoded text when building a model. On Unix-like systems you
207
- can convert text using `iconv <https://en.wikipedia.org/wiki/Iconv>`__.
208
-
209
- fastText tokenizes (splits text into pieces) based on the following
210
- ASCII characters (bytes). In particular, it is not aware of UTF-8
211
- whitespace. We advise users to convert UTF-8 whitespace and word
212
- boundaries into one of the following symbols as appropriate.
213
-
214
- - space
215
- - tab
216
- - vertical tab
217
- - carriage return
218
- - formfeed
219
- - the null character
220
-
221
- The newline character is used to delimit lines of text. In particular,
222
- the EOS token is appended to a line of text if a newline character is
223
- encountered. The only exception is if the number of tokens exceeds the
224
- MAX\_LINE\_SIZE constant as defined in the `Dictionary
225
- header <https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h>`__.
226
- This means that if you have text that is not separated by newlines, such as
227
- the `fil9 dataset <http://mattmahoney.net/dc/textdata>`__, it will be
228
- broken into chunks of MAX\_LINE\_SIZE tokens and the EOS token is
229
- not appended.
230
-
231
- The length of a token is its number of UTF-8 characters, counted by using
232
- the `leading two bits of a
233
- byte <https://en.wikipedia.org/wiki/UTF-8#Description>`__ to identify
234
- `subsequent bytes of a multi-byte
235
- sequence <https://github.com/facebookresearch/fastText/blob/master/src/dictionary.cc>`__.
236
- Knowing this is especially important when choosing the minimum and
237
- maximum length of subwords. Further, the EOS token (as specified in the
238
- `Dictionary
239
- header <https://github.com/facebookresearch/fastText/blob/master/src/dictionary.h>`__)
240
- is considered a character and will not be broken into subwords.
241
-
242
- More examples
243
- -------------
244
-
245
- To learn more about fastText models, please see
246
- the main
247
- `README <https://github.com/facebookresearch/fastText/blob/master/README.md>`__
248
- and in particular `the tutorials on our
249
- website <https://fasttext.cc/docs/en/supervised-tutorial.html>`__.
250
-
251
- You can find further Python examples in `the doc
252
- folder <https://github.com/facebookresearch/fastText/tree/master/python/doc/examples>`__.
253
-
254
- As with any package, you can get help on any Python function using the
255
- built-in ``help`` function.
256
-
257
- For example:
258
-
259
- ::
260
-
261
- >>> import fasttext
262
- >>> help(fasttext.FastText)
263
-
264
- Help on module fasttext.FastText in fasttext:
265
-
266
- NAME
267
- fasttext.FastText
268
-
269
- DESCRIPTION
270
- # Copyright (c) 2017-present, Facebook, Inc.
271
- # All rights reserved.
272
- #
273
- # This source code is licensed under the MIT license found in the
274
- # LICENSE file in the root directory of this source tree.
275
-
276
- FUNCTIONS
277
- load_model(path)
278
- Load a model given a filepath and return a model object.
279
-
280
- tokenize(text)
281
- Given a string of text, tokenize it and return a list of tokens
282
- [...]
-
- API
- ===
-
- ``train_unsupervised`` parameters
- ---------------------------------
-
- .. code:: python
-
-     input             # training file path (required)
-     model             # unsupervised fasttext model {cbow, skipgram} [skipgram]
-     lr                # learning rate [0.05]
-     dim               # size of word vectors [100]
-     ws                # size of the context window [5]
-     epoch             # number of epochs [5]
-     minCount          # minimal number of word occurrences [5]
-     minn              # min length of char ngram [3]
-     maxn              # max length of char ngram [6]
-     neg               # number of negatives sampled [5]
-     wordNgrams        # max length of word ngram [1]
-     loss              # loss function {ns, hs, softmax, ova} [ns]
-     bucket            # number of buckets [2000000]
-     thread            # number of threads [number of cpus]
-     lrUpdateRate      # change the rate of updates for the learning rate [100]
-     t                 # sampling threshold [0.0001]
-     verbose           # verbosity level [2]
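To illustrate how ``minn``, ``maxn`` and ``bucket`` interact, here is a simplified sketch. It assumes, based on our reading of ``dictionary.cc``, that a word is wrapped in ``<`` and ``>`` before character n-grams are extracted, and that each n-gram is hashed with 32-bit FNV-1a (each byte sign-extended first, as the C++ code casts it to ``int8_t``) and mapped into ``bucket`` rows placed after the vocabulary. Unlike fastText it works on Python characters rather than raw bytes, and ``char_ngrams``, ``fnv1a_32``, ``subword_index`` and ``nwords`` are hypothetical names.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of '<word>' with lengths minn..maxn (simplified)."""
    wrapped = "<" + word + ">"
    ngrams = []
    for i in range(len(wrapped)):
        # maxn = 0 (the supervised default) yields no n-grams at all.
        for n in range(max(minn, 1), maxn + 1):
            if i + n > len(wrapped):
                break
            # Skip single-character n-grams consisting only of '<' or '>'.
            if n == 1 and (i == 0 or i == len(wrapped) - 1):
                continue
            ngrams.append(wrapped[i:i + n])
    return ngrams


def fnv1a_32(text):
    """32-bit FNV-1a over UTF-8 bytes, sign-extending each byte like int8_t."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        signed = byte - 256 if byte > 127 else byte
        h ^= signed & 0xFFFFFFFF
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def subword_index(ngram, nwords, bucket=2000000):
    """Input-matrix row an n-gram hashes to (hypothetical helper)."""
    return nwords + fnv1a_32(ngram) % bucket


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Distinct n-grams can collide in the same bucket; a smaller ``bucket`` saves memory at the cost of more collisions. fastText also keeps the word's own entry alongside these n-grams.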
-
- ``train_supervised`` parameters
- -------------------------------
-
- .. code:: python
-
-     input             # training file path (required)
-     lr                # learning rate [0.1]
-     dim               # size of word vectors [100]
-     ws                # size of the context window [5]
-     epoch             # number of epochs [5]
-     minCount          # minimal number of word occurrences [1]
-     minCountLabel     # minimal number of label occurrences [1]
-     minn              # min length of char ngram [0]
-     maxn              # max length of char ngram [0]
-     neg               # number of negatives sampled [5]
-     wordNgrams        # max length of word ngram [1]
-     loss              # loss function {ns, hs, softmax, ova} [softmax]
-     bucket            # number of buckets [2000000]
-     thread            # number of threads [number of cpus]
-     lrUpdateRate      # change the rate of updates for the learning rate [100]
-     t                 # sampling threshold [0.0001]
-     label             # label prefix ['__label__']
-     verbose           # verbosity level [2]
-     pretrainedVectors # pretrained word vectors (.vec file) for supervised learning []
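In supervised training data, labels are ordinary tokens that start with the ``label`` prefix (``__label__`` by default). The hypothetical helper below sketches how a line can be separated into words and labels; it only approximates what ``get_line`` does and is not its implementation.

```python
def split_labels(line, label_prefix="__label__"):
    """Split a whitespace-tokenized line into (words, labels)."""
    words, labels = [], []
    for token in line.split():
        if token.startswith(label_prefix):
            labels.append(token)
        else:
            words.append(token)
    return words, labels


words, labels = split_labels(
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
)
print(labels)  # ['__label__sauce', '__label__cheese']
```

A custom ``label`` value simply changes the prefix that is matched.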
-
- ``model`` object
- ----------------
-
- The ``train_supervised``, ``train_unsupervised`` and ``load_model``
- functions return an instance of the ``_FastText`` class, which we
- generally refer to as the ``model`` object.
-
- This object exposes the training arguments as properties: ``lr``,
- ``dim``, ``ws``, ``epoch``, ``minCount``, ``minCountLabel``, ``minn``,
- ``maxn``, ``neg``, ``wordNgrams``, ``loss``, ``bucket``, ``thread``,
- ``lrUpdateRate``, ``t``, ``label``, ``verbose``, ``pretrainedVectors``.
- For example, ``model.wordNgrams`` gives you the max length of word ngram
- used to train this model.
-
- In addition, the object exposes several functions:
-
- .. code:: python
-
-     get_dimension           # Get the dimension (size) of a lookup vector (hidden layer).
-                             # This is equivalent to the `dim` property.
-     get_input_vector        # Given an index, get the corresponding vector of the Input Matrix.
-     get_input_matrix        # Get a copy of the full input matrix of a Model.
-     get_labels              # Get the entire list of labels of the dictionary.
-                             # This is equivalent to the `labels` property.
-     get_line                # Split a line of text into words and labels.
-     get_output_matrix       # Get a copy of the full output matrix of a Model.
-     get_sentence_vector     # Given a string, get a single vector representation. This function
-                             # assumes it is given a single line of text. We split words on
-                             # whitespace (space, newline, tab, vertical tab) and the control
-                             # characters carriage return, formfeed and the null character.
-     get_subword_id          # Given a subword, return the index (within input matrix) it hashes to.
-     get_subwords            # Given a word, get the subwords and their indices.
-     get_word_id             # Given a word, get the word id within the dictionary.
-     get_word_vector         # Get the vector representation of a word.
-     get_words               # Get the entire list of words of the dictionary.
-                             # This is equivalent to the `words` property.
-     is_quantized            # Whether the model has been quantized.
-     predict                 # Given a string, get a list of labels and a list of corresponding probabilities.
-     quantize                # Quantize the model, reducing its size and memory footprint.
-     save_model              # Save the model to the given path.
-     test                    # Evaluate a supervised model using the file given by path.
-     test_label              # Return the precision and recall score for each label.
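As a rough illustration of ``get_sentence_vector``: for unsupervised models, our understanding is that fastText L2-normalizes each word vector and averages the results, while supervised models instead average the input-matrix rows of the words and their n-grams. The sketch below assumes the simpler unsupervised behavior, takes word vectors from a plain dictionary, and skips unknown words, whereas fastText would build vectors for them from subwords. ``sentence_vector`` is a hypothetical name.

```python
import math


def sentence_vector(words, word_vectors):
    """Average of L2-normalized word vectors (sketch of the unsupervised case)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # simplification: fastText would use subword vectors here
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:  # zero vectors are left out of the average
            total = [t + x / norm for t, x in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


vectors = {"a": [3.0, 4.0], "b": [0.0, 2.0]}
print(sentence_vector("a b".split(), vectors))  # approximately [0.3, 0.9]
```

Normalizing first keeps a single long vector from dominating the average.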
-
- The ``words`` and ``labels`` properties return the words and labels from
- the dictionary:
-
- .. code:: py
-
-     model.words         # equivalent to model.get_words()
-     model.labels        # equivalent to model.get_labels()
-
- The object overrides the ``__getitem__`` and ``__contains__`` methods so
- that it can return the representation of a word and check whether a word
- is in the vocabulary.
-
- .. code:: py
-
-     model['king']       # equivalent to model.get_word_vector('king')
-     'king' in model     # equivalent to `'king' in model.get_words()`
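These overrides rely on Python's standard ``__getitem__`` and ``__contains__`` protocol. The toy class below is a hypothetical stand-in, not the real ``_FastText`` class, showing how such delegation can be written. Unlike the real model, which builds vectors for out-of-vocabulary words from their subwords, it simply raises ``KeyError`` for unknown words.

```python
class ToyModel:
    """Hypothetical stand-in for a fastText model, backed by a dict."""

    def __init__(self, vectors):
        self._vectors = dict(vectors)

    @property
    def words(self):
        return list(self._vectors)

    def get_words(self):
        return self.words

    def get_word_vector(self, word):
        return self._vectors[word]  # raises KeyError for unknown words

    def __getitem__(self, word):
        # model[word] delegates to get_word_vector(word)
        return self.get_word_vector(word)

    def __contains__(self, word):
        # `word in model` checks the vocabulary
        return word in self._vectors


model = ToyModel({"king": [0.1, 0.2]})
print(model["king"], "king" in model, "queen" in model)  # [0.1, 0.2] True False
```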
-
- Join the fastText community
- ---------------------------
-
- - `Facebook page <https://www.facebook.com/groups/1174547215919768>`__
- - `Stack Overflow <https://stackoverflow.com/questions/tagged/fasttext>`__
- - `Google group <https://groups.google.com/forum/#!forum/fasttext-library>`__
- - `GitHub <https://github.com/facebookresearch/fastText>`__
-
- .. |CircleCI| image:: https://circleci.com/gh/facebookresearch/fastText/tree/master.svg?style=svg
-    :target: https://circleci.com/gh/facebookresearch/fastText/tree/master