data-prep-toolkit-transforms 1.0.1.dev1__9-py3-none-any.whl

This diff shows the changes between publicly released package versions as they appear in their respective public registries. It is provided for informational purposes only.
Files changed (776)
  1. data_prep_toolkit_transforms-1.0.1.dev1.dist-info/METADATA +506 -0
  2. data_prep_toolkit_transforms-1.0.1.dev1.dist-info/RECORD +776 -0
  3. data_prep_toolkit_transforms-1.0.1.dev1.dist-info/WHEEL +5 -0
  4. data_prep_toolkit_transforms-1.0.1.dev1.dist-info/top_level.txt +23 -0
  5. dpk_code_profiler/UAST.py +324 -0
  6. dpk_code_profiler/UAST_parser.py +315 -0
  7. dpk_code_profiler/__init__.py +0 -0
  8. dpk_code_profiler/data/Concept_dataset/uast_comment/agda/0.txt +5 -0
  9. dpk_code_profiler/data/Concept_dataset/uast_comment/c/0.txt +4 -0
  10. dpk_code_profiler/data/Concept_dataset/uast_comment/c_sharp/0.txt +4 -0
  11. dpk_code_profiler/data/Concept_dataset/uast_comment/cpp/0.txt +12 -0
  12. dpk_code_profiler/data/Concept_dataset/uast_comment/d/0.txt +7 -0
  13. dpk_code_profiler/data/Concept_dataset/uast_comment/dart/0.txt +1 -0
  14. dpk_code_profiler/data/Concept_dataset/uast_comment/dart/1.txt +3 -0
  15. dpk_code_profiler/data/Concept_dataset/uast_comment/elm/0.txt +1 -0
  16. dpk_code_profiler/data/Concept_dataset/uast_comment/elm/1.txt +1 -0
  17. dpk_code_profiler/data/Concept_dataset/uast_comment/go/0.txt +3 -0
  18. dpk_code_profiler/data/Concept_dataset/uast_comment/haskell/0.txt +1 -0
  19. dpk_code_profiler/data/Concept_dataset/uast_comment/java/0.txt +7 -0
  20. dpk_code_profiler/data/Concept_dataset/uast_comment/java/1.txt +12 -0
  21. dpk_code_profiler/data/Concept_dataset/uast_comment/js/0.txt +7 -0
  22. dpk_code_profiler/data/Concept_dataset/uast_comment/kotlin/0.txt +1 -0
  23. dpk_code_profiler/data/Concept_dataset/uast_comment/kotlin/1.txt +2 -0
  24. dpk_code_profiler/data/Concept_dataset/uast_comment/nim/0.txt +1 -0
  25. dpk_code_profiler/data/Concept_dataset/uast_comment/nim/1.txt +3 -0
  26. dpk_code_profiler/data/Concept_dataset/uast_comment/objc/0.txt +6 -0
  27. dpk_code_profiler/data/Concept_dataset/uast_comment/ocaml/0.txt +2 -0
  28. dpk_code_profiler/data/Concept_dataset/uast_comment/py/0.txt +2 -0
  29. dpk_code_profiler/data/Concept_dataset/uast_comment/qmljs/0.txt +6 -0
  30. dpk_code_profiler/data/Concept_dataset/uast_comment/rust/0.txt +1 -0
  31. dpk_code_profiler/data/Concept_dataset/uast_comment/scala/0.txt +1 -0
  32. dpk_code_profiler/data/Concept_dataset/uast_comment/scala/1.txt +1 -0
  33. dpk_code_profiler/data/Concept_dataset/uast_comment/ts/0.txt +2 -0
  34. dpk_code_profiler/data/Concept_dataset/uast_comment/verilog/0.txt +4 -0
  35. dpk_code_profiler/data/Concept_dataset/uast_comment/vhdl/0.txt +1 -0
  36. dpk_code_profiler/data/Concept_dataset/uast_function/agda/0.txt +2 -0
  37. dpk_code_profiler/data/Concept_dataset/uast_function/c/0.txt +6 -0
  38. dpk_code_profiler/data/Concept_dataset/uast_function/c_sharp/0.txt +4 -0
  39. dpk_code_profiler/data/Concept_dataset/uast_function/cpp/0.txt +10 -0
  40. dpk_code_profiler/data/Concept_dataset/uast_function/cpp/1.txt +46 -0
  41. dpk_code_profiler/data/Concept_dataset/uast_function/cpp/2.txt +75 -0
  42. dpk_code_profiler/data/Concept_dataset/uast_function/cpp/3.txt +99 -0
  43. dpk_code_profiler/data/Concept_dataset/uast_function/d/0.txt +8 -0
  44. dpk_code_profiler/data/Concept_dataset/uast_function/dart/0.txt +9 -0
  45. dpk_code_profiler/data/Concept_dataset/uast_function/elm/0.txt +1 -0
  46. dpk_code_profiler/data/Concept_dataset/uast_function/go/0.txt +20 -0
  47. dpk_code_profiler/data/Concept_dataset/uast_function/haskell/0.txt +1 -0
  48. dpk_code_profiler/data/Concept_dataset/uast_function/java/0.txt +8 -0
  49. dpk_code_profiler/data/Concept_dataset/uast_function/js/0.txt +10 -0
  50. dpk_code_profiler/data/Concept_dataset/uast_function/kotlin/0.txt +3 -0
  51. dpk_code_profiler/data/Concept_dataset/uast_function/nim/0.txt +3 -0
  52. dpk_code_profiler/data/Concept_dataset/uast_function/objc/0.txt +14 -0
  53. dpk_code_profiler/data/Concept_dataset/uast_function/perl/0.txt +3 -0
  54. dpk_code_profiler/data/Concept_dataset/uast_function/py/0.txt +5 -0
  55. dpk_code_profiler/data/Concept_dataset/uast_function/rust/0.txt +5 -0
  56. dpk_code_profiler/data/Concept_dataset/uast_function/scala/0.txt +6 -0
  57. dpk_code_profiler/data/Concept_dataset/uast_function/scala/1.txt +14 -0
  58. dpk_code_profiler/data/Concept_dataset/uast_function/ts/0.txt +14 -0
  59. dpk_code_profiler/data/Concept_dataset/uast_function/verilog/0.txt +5 -0
  60. dpk_code_profiler/data/Concept_dataset/uast_function/vhdl/0.txt +22 -0
  61. dpk_code_profiler/data/Concept_dataset/uast_package/agda/0.txt +2 -0
  62. dpk_code_profiler/data/Concept_dataset/uast_package/c/0.txt +3 -0
  63. dpk_code_profiler/data/Concept_dataset/uast_package/c_sharp/0.txt +2 -0
  64. dpk_code_profiler/data/Concept_dataset/uast_package/c_sharp/1.txt +2 -0
  65. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/0.txt +8 -0
  66. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/1.txt +4 -0
  67. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/2.txt +25 -0
  68. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/3.txt +25 -0
  69. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/4.txt +29 -0
  70. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/5.txt +29 -0
  71. dpk_code_profiler/data/Concept_dataset/uast_package/cpp/6.txt +29 -0
  72. dpk_code_profiler/data/Concept_dataset/uast_package/d/0.txt +2 -0
  73. dpk_code_profiler/data/Concept_dataset/uast_package/dart/0.txt +5 -0
  74. dpk_code_profiler/data/Concept_dataset/uast_package/elm/0.txt +5 -0
  75. dpk_code_profiler/data/Concept_dataset/uast_package/go/0.txt +26 -0
  76. dpk_code_profiler/data/Concept_dataset/uast_package/haskell/0.txt +11 -0
  77. dpk_code_profiler/data/Concept_dataset/uast_package/java/0.txt +10 -0
  78. dpk_code_profiler/data/Concept_dataset/uast_package/js/0.txt +4 -0
  79. dpk_code_profiler/data/Concept_dataset/uast_package/kotlin/0.txt +3 -0
  80. dpk_code_profiler/data/Concept_dataset/uast_package/nim/0.txt +4 -0
  81. dpk_code_profiler/data/Concept_dataset/uast_package/nim/1.txt +1 -0
  82. dpk_code_profiler/data/Concept_dataset/uast_package/nim/2.txt +1 -0
  83. dpk_code_profiler/data/Concept_dataset/uast_package/objc/0.txt +2 -0
  84. dpk_code_profiler/data/Concept_dataset/uast_package/objc/1.txt +4 -0
  85. dpk_code_profiler/data/Concept_dataset/uast_package/ocaml/0.txt +1 -0
  86. dpk_code_profiler/data/Concept_dataset/uast_package/perl/0.txt +6 -0
  87. dpk_code_profiler/data/Concept_dataset/uast_package/py/0.txt +5 -0
  88. dpk_code_profiler/data/Concept_dataset/uast_package/py/1.txt +2 -0
  89. dpk_code_profiler/data/Concept_dataset/uast_package/qmljs/0.txt +2 -0
  90. dpk_code_profiler/data/Concept_dataset/uast_package/rust/0.txt +2 -0
  91. dpk_code_profiler/data/Concept_dataset/uast_package/scala/0.txt +6 -0
  92. dpk_code_profiler/data/Concept_dataset/uast_package/scala/1.txt +30 -0
  93. dpk_code_profiler/data/Concept_dataset/uast_package/ts/0.txt +14 -0
  94. dpk_code_profiler/data/Concept_dataset/uast_package/verilog/0.txt +1 -0
  95. dpk_code_profiler/data/Concept_dataset/uast_package/verilog/1.txt +2 -0
  96. dpk_code_profiler/data/Concept_dataset/uast_package/vhdl/0.txt +2 -0
  97. dpk_code_profiler/data/few_shot_outputs/uast_comment/agda/0.txt +14 -0
  98. dpk_code_profiler/data/few_shot_outputs/uast_comment/c/0.txt +14 -0
  99. dpk_code_profiler/data/few_shot_outputs/uast_comment/c_sharp/0.txt +14 -0
  100. dpk_code_profiler/data/few_shot_outputs/uast_comment/cpp/0.txt +11 -0
  101. dpk_code_profiler/data/few_shot_outputs/uast_comment/d/0.txt +12 -0
  102. dpk_code_profiler/data/few_shot_outputs/uast_comment/dart/0.txt +11 -0
  103. dpk_code_profiler/data/few_shot_outputs/uast_comment/dart/1.txt +14 -0
  104. dpk_code_profiler/data/few_shot_outputs/uast_comment/elm/0.txt +10 -0
  105. dpk_code_profiler/data/few_shot_outputs/uast_comment/elm/1.txt +8 -0
  106. dpk_code_profiler/data/few_shot_outputs/uast_comment/go/0.txt +14 -0
  107. dpk_code_profiler/data/few_shot_outputs/uast_comment/haskell/0.txt +10 -0
  108. dpk_code_profiler/data/few_shot_outputs/uast_comment/java/0.txt +7 -0
  109. dpk_code_profiler/data/few_shot_outputs/uast_comment/java/1.txt +7 -0
  110. dpk_code_profiler/data/few_shot_outputs/uast_comment/js/0.txt +12 -0
  111. dpk_code_profiler/data/few_shot_outputs/uast_comment/kotlin/0.txt +10 -0
  112. dpk_code_profiler/data/few_shot_outputs/uast_comment/kotlin/1.txt +10 -0
  113. dpk_code_profiler/data/few_shot_outputs/uast_comment/nim/0.txt +10 -0
  114. dpk_code_profiler/data/few_shot_outputs/uast_comment/nim/1.txt +10 -0
  115. dpk_code_profiler/data/few_shot_outputs/uast_comment/objc/0.txt +14 -0
  116. dpk_code_profiler/data/few_shot_outputs/uast_comment/ocaml/0.txt +10 -0
  117. dpk_code_profiler/data/few_shot_outputs/uast_comment/py/0.txt +7 -0
  118. dpk_code_profiler/data/few_shot_outputs/uast_comment/qmljs/0.txt +14 -0
  119. dpk_code_profiler/data/few_shot_outputs/uast_comment/rust/0.txt +10 -0
  120. dpk_code_profiler/data/few_shot_outputs/uast_comment/scala/0.txt +10 -0
  121. dpk_code_profiler/data/few_shot_outputs/uast_comment/scala/1.txt +10 -0
  122. dpk_code_profiler/data/few_shot_outputs/uast_comment/ts/0.txt +14 -0
  123. dpk_code_profiler/data/few_shot_outputs/uast_comment/verilog/0.txt +14 -0
  124. dpk_code_profiler/data/few_shot_outputs/uast_comment/vhdl/0.txt +10 -0
  125. dpk_code_profiler/data/few_shot_outputs/uast_function/agda/0.txt +12 -0
  126. dpk_code_profiler/data/few_shot_outputs/uast_function/c/0.txt +12 -0
  127. dpk_code_profiler/data/few_shot_outputs/uast_function/c_sharp/0.txt +12 -0
  128. dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/0.txt +9 -0
  129. dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/1.txt +12 -0
  130. dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/2.txt +12 -0
  131. dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/3.txt +12 -0
  132. dpk_code_profiler/data/few_shot_outputs/uast_function/d/0.txt +12 -0
  133. dpk_code_profiler/data/few_shot_outputs/uast_function/dart/0.txt +12 -0
  134. dpk_code_profiler/data/few_shot_outputs/uast_function/elm/0.txt +12 -0
  135. dpk_code_profiler/data/few_shot_outputs/uast_function/elm/1.txt +12 -0
  136. dpk_code_profiler/data/few_shot_outputs/uast_function/go/0.txt +9 -0
  137. dpk_code_profiler/data/few_shot_outputs/uast_function/haskell/0.txt +12 -0
  138. dpk_code_profiler/data/few_shot_outputs/uast_function/java/0.txt +9 -0
  139. dpk_code_profiler/data/few_shot_outputs/uast_function/js/0.txt +9 -0
  140. dpk_code_profiler/data/few_shot_outputs/uast_function/kotlin/0.txt +12 -0
  141. dpk_code_profiler/data/few_shot_outputs/uast_function/nim/0.txt +12 -0
  142. dpk_code_profiler/data/few_shot_outputs/uast_function/objc/0.txt +12 -0
  143. dpk_code_profiler/data/few_shot_outputs/uast_function/perl/0.txt +12 -0
  144. dpk_code_profiler/data/few_shot_outputs/uast_function/py/0.txt +9 -0
  145. dpk_code_profiler/data/few_shot_outputs/uast_function/rust/0.txt +12 -0
  146. dpk_code_profiler/data/few_shot_outputs/uast_function/scala/0.txt +12 -0
  147. dpk_code_profiler/data/few_shot_outputs/uast_function/scala/1.txt +14 -0
  148. dpk_code_profiler/data/few_shot_outputs/uast_function/ts/0.txt +11 -0
  149. dpk_code_profiler/data/few_shot_outputs/uast_function/verilog/0.txt +10 -0
  150. dpk_code_profiler/data/few_shot_outputs/uast_function/vhdl/0.txt +12 -0
  151. dpk_code_profiler/data/few_shot_outputs/uast_package/agda/0.txt +16 -0
  152. dpk_code_profiler/data/few_shot_outputs/uast_package/c/0.txt +26 -0
  153. dpk_code_profiler/data/few_shot_outputs/uast_package/c_sharp/0.txt +18 -0
  154. dpk_code_profiler/data/few_shot_outputs/uast_package/c_sharp/1.txt +20 -0
  155. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/0.txt +10 -0
  156. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/1.txt +25 -0
  157. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/2.txt +25 -0
  158. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/3.txt +25 -0
  159. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/4.txt +28 -0
  160. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/5.txt +28 -0
  161. dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/6.txt +28 -0
  162. dpk_code_profiler/data/few_shot_outputs/uast_package/d/0.txt +17 -0
  163. dpk_code_profiler/data/few_shot_outputs/uast_package/dart/0.txt +15 -0
  164. dpk_code_profiler/data/few_shot_outputs/uast_package/elm/0.txt +20 -0
  165. dpk_code_profiler/data/few_shot_outputs/uast_package/go/0.txt +23 -0
  166. dpk_code_profiler/data/few_shot_outputs/uast_package/haskell/0.txt +28 -0
  167. dpk_code_profiler/data/few_shot_outputs/uast_package/java/0.txt +13 -0
  168. dpk_code_profiler/data/few_shot_outputs/uast_package/js/0.txt +10 -0
  169. dpk_code_profiler/data/few_shot_outputs/uast_package/kotlin/0.txt +20 -0
  170. dpk_code_profiler/data/few_shot_outputs/uast_package/nim/0.txt +45 -0
  171. dpk_code_profiler/data/few_shot_outputs/uast_package/nim/1.txt +12 -0
  172. dpk_code_profiler/data/few_shot_outputs/uast_package/nim/2.txt +14 -0
  173. dpk_code_profiler/data/few_shot_outputs/uast_package/objc/0.txt +15 -0
  174. dpk_code_profiler/data/few_shot_outputs/uast_package/objc/1.txt +19 -0
  175. dpk_code_profiler/data/few_shot_outputs/uast_package/ocaml/0.txt +12 -0
  176. dpk_code_profiler/data/few_shot_outputs/uast_package/perl/0.txt +24 -0
  177. dpk_code_profiler/data/few_shot_outputs/uast_package/py/0.txt +38 -0
  178. dpk_code_profiler/data/few_shot_outputs/uast_package/py/1.txt +11 -0
  179. dpk_code_profiler/data/few_shot_outputs/uast_package/qmljs/0.txt +14 -0
  180. dpk_code_profiler/data/few_shot_outputs/uast_package/rust/0.txt +18 -0
  181. dpk_code_profiler/data/few_shot_outputs/uast_package/scala/0.txt +45 -0
  182. dpk_code_profiler/data/few_shot_outputs/uast_package/scala/1.txt +24 -0
  183. dpk_code_profiler/data/few_shot_outputs/uast_package/ts/0.txt +27 -0
  184. dpk_code_profiler/data/few_shot_outputs/uast_package/verilog/0.txt +12 -0
  185. dpk_code_profiler/data/few_shot_outputs/uast_package/verilog/1.txt +19 -0
  186. dpk_code_profiler/data/few_shot_outputs/uast_package/vhdl/0.txt +14 -0
  187. dpk_code_profiler/data/final_UI_outputs/comment/agda/0.txt +14 -0
  188. dpk_code_profiler/data/final_UI_outputs/comment/agda/example_languages.txt +2 -0
  189. dpk_code_profiler/data/final_UI_outputs/comment/agda/prompt.txt +24 -0
  190. dpk_code_profiler/data/final_UI_outputs/comment/agda/test_code.txt +5 -0
  191. dpk_code_profiler/data/final_UI_outputs/comment/c/0.txt +14 -0
  192. dpk_code_profiler/data/final_UI_outputs/comment/c/example_languages.txt +2 -0
  193. dpk_code_profiler/data/final_UI_outputs/comment/c/prompt.txt +24 -0
  194. dpk_code_profiler/data/final_UI_outputs/comment/c/test_code.txt +4 -0
  195. dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/0.txt +14 -0
  196. dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/example_languages.txt +2 -0
  197. dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/prompt.txt +24 -0
  198. dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/test_code.txt +4 -0
  199. dpk_code_profiler/data/final_UI_outputs/comment/cpp/0.txt +11 -0
  200. dpk_code_profiler/data/final_UI_outputs/comment/d/0.txt +12 -0
  201. dpk_code_profiler/data/final_UI_outputs/comment/d/example_languages.txt +2 -0
  202. dpk_code_profiler/data/final_UI_outputs/comment/d/prompt.txt +24 -0
  203. dpk_code_profiler/data/final_UI_outputs/comment/d/test_code.txt +7 -0
  204. dpk_code_profiler/data/final_UI_outputs/comment/dart/0.txt +11 -0
  205. dpk_code_profiler/data/final_UI_outputs/comment/dart/example_languages.txt +2 -0
  206. dpk_code_profiler/data/final_UI_outputs/comment/dart/prompt.txt +24 -0
  207. dpk_code_profiler/data/final_UI_outputs/comment/dart/test_code.txt +1 -0
  208. dpk_code_profiler/data/final_UI_outputs/comment/elm/0.txt +10 -0
  209. dpk_code_profiler/data/final_UI_outputs/comment/elm/example_languages.txt +2 -0
  210. dpk_code_profiler/data/final_UI_outputs/comment/elm/prompt.txt +24 -0
  211. dpk_code_profiler/data/final_UI_outputs/comment/elm/test_code.txt +1 -0
  212. dpk_code_profiler/data/final_UI_outputs/comment/go/0.txt +14 -0
  213. dpk_code_profiler/data/final_UI_outputs/comment/go/example_languages.txt +2 -0
  214. dpk_code_profiler/data/final_UI_outputs/comment/go/prompt.txt +24 -0
  215. dpk_code_profiler/data/final_UI_outputs/comment/go/test_code.txt +3 -0
  216. dpk_code_profiler/data/final_UI_outputs/comment/haskell/0.txt +10 -0
  217. dpk_code_profiler/data/final_UI_outputs/comment/haskell/example_languages.txt +2 -0
  218. dpk_code_profiler/data/final_UI_outputs/comment/haskell/prompt.txt +24 -0
  219. dpk_code_profiler/data/final_UI_outputs/comment/haskell/test_code.txt +1 -0
  220. dpk_code_profiler/data/final_UI_outputs/comment/java/0.txt +7 -0
  221. dpk_code_profiler/data/final_UI_outputs/comment/java/1.txt +7 -0
  222. dpk_code_profiler/data/final_UI_outputs/comment/js/0.txt +14 -0
  223. dpk_code_profiler/data/final_UI_outputs/comment/js/prompt.txt +24 -0
  224. dpk_code_profiler/data/final_UI_outputs/comment/kotlin/0.txt +10 -0
  225. dpk_code_profiler/data/final_UI_outputs/comment/kotlin/example_languages.txt +2 -0
  226. dpk_code_profiler/data/final_UI_outputs/comment/kotlin/prompt.txt +24 -0
  227. dpk_code_profiler/data/final_UI_outputs/comment/kotlin/test_code.txt +1 -0
  228. dpk_code_profiler/data/final_UI_outputs/comment/nim/0.txt +10 -0
  229. dpk_code_profiler/data/final_UI_outputs/comment/nim/example_languages.txt +2 -0
  230. dpk_code_profiler/data/final_UI_outputs/comment/nim/prompt.txt +24 -0
  231. dpk_code_profiler/data/final_UI_outputs/comment/nim/test_code.txt +1 -0
  232. dpk_code_profiler/data/final_UI_outputs/comment/objc/0.txt +14 -0
  233. dpk_code_profiler/data/final_UI_outputs/comment/objc/example_languages.txt +2 -0
  234. dpk_code_profiler/data/final_UI_outputs/comment/objc/prompt.txt +24 -0
  235. dpk_code_profiler/data/final_UI_outputs/comment/objc/test_code.txt +6 -0
  236. dpk_code_profiler/data/final_UI_outputs/comment/ocaml/0.txt +10 -0
  237. dpk_code_profiler/data/final_UI_outputs/comment/ocaml/example_languages.txt +2 -0
  238. dpk_code_profiler/data/final_UI_outputs/comment/ocaml/prompt.txt +24 -0
  239. dpk_code_profiler/data/final_UI_outputs/comment/ocaml/test_code.txt +2 -0
  240. dpk_code_profiler/data/final_UI_outputs/comment/py/0.txt +7 -0
  241. dpk_code_profiler/data/final_UI_outputs/comment/qmljs/0.txt +14 -0
  242. dpk_code_profiler/data/final_UI_outputs/comment/qmljs/example_languages.txt +2 -0
  243. dpk_code_profiler/data/final_UI_outputs/comment/qmljs/prompt.txt +24 -0
  244. dpk_code_profiler/data/final_UI_outputs/comment/qmljs/test_code.txt +6 -0
  245. dpk_code_profiler/data/final_UI_outputs/comment/rust/0.txt +10 -0
  246. dpk_code_profiler/data/final_UI_outputs/comment/rust/example_languages.txt +2 -0
  247. dpk_code_profiler/data/final_UI_outputs/comment/rust/prompt.txt +24 -0
  248. dpk_code_profiler/data/final_UI_outputs/comment/rust/test_code.txt +1 -0
  249. dpk_code_profiler/data/final_UI_outputs/comment/scala/0.txt +10 -0
  250. dpk_code_profiler/data/final_UI_outputs/comment/scala/example_languages.txt +2 -0
  251. dpk_code_profiler/data/final_UI_outputs/comment/scala/prompt.txt +24 -0
  252. dpk_code_profiler/data/final_UI_outputs/comment/scala/test_code.txt +1 -0
  253. dpk_code_profiler/data/final_UI_outputs/comment/ts/0.txt +14 -0
  254. dpk_code_profiler/data/final_UI_outputs/comment/ts/example_languages.txt +2 -0
  255. dpk_code_profiler/data/final_UI_outputs/comment/ts/prompt.txt +24 -0
  256. dpk_code_profiler/data/final_UI_outputs/comment/ts/test_code.txt +2 -0
  257. dpk_code_profiler/data/final_UI_outputs/comment/verilog/0.txt +14 -0
  258. dpk_code_profiler/data/final_UI_outputs/comment/verilog/example_languages.txt +2 -0
  259. dpk_code_profiler/data/final_UI_outputs/comment/verilog/prompt.txt +24 -0
  260. dpk_code_profiler/data/final_UI_outputs/comment/verilog/test_code.txt +4 -0
  261. dpk_code_profiler/data/final_UI_outputs/comment/vhdl/0.txt +10 -0
  262. dpk_code_profiler/data/final_UI_outputs/comment/vhdl/example_languages.txt +2 -0
  263. dpk_code_profiler/data/final_UI_outputs/comment/vhdl/prompt.txt +24 -0
  264. dpk_code_profiler/data/final_UI_outputs/comment/vhdl/test_code.txt +1 -0
  265. dpk_code_profiler/data/final_UI_outputs/function/agda/0.txt +12 -0
  266. dpk_code_profiler/data/final_UI_outputs/function/agda/example_languages.txt +2 -0
  267. dpk_code_profiler/data/final_UI_outputs/function/agda/prompt.txt +23 -0
  268. dpk_code_profiler/data/final_UI_outputs/function/agda/test_code.txt +2 -0
  269. dpk_code_profiler/data/final_UI_outputs/function/c/0.txt +12 -0
  270. dpk_code_profiler/data/final_UI_outputs/function/c/example_languages.txt +2 -0
  271. dpk_code_profiler/data/final_UI_outputs/function/c/prompt.txt +21 -0
  272. dpk_code_profiler/data/final_UI_outputs/function/c/test_code.txt +6 -0
  273. dpk_code_profiler/data/final_UI_outputs/function/c_sharp/0.txt +12 -0
  274. dpk_code_profiler/data/final_UI_outputs/function/c_sharp/example_languages.txt +2 -0
  275. dpk_code_profiler/data/final_UI_outputs/function/c_sharp/prompt.txt +30 -0
  276. dpk_code_profiler/data/final_UI_outputs/function/c_sharp/test_code.txt +4 -0
  277. dpk_code_profiler/data/final_UI_outputs/function/cpp/0.txt +9 -0
  278. dpk_code_profiler/data/final_UI_outputs/function/cpp/1.txt +12 -0
  279. dpk_code_profiler/data/final_UI_outputs/function/cpp/2.txt +12 -0
  280. dpk_code_profiler/data/final_UI_outputs/function/cpp/3.txt +12 -0
  281. dpk_code_profiler/data/final_UI_outputs/function/cpp/example_languages_1.txt +2 -0
  282. dpk_code_profiler/data/final_UI_outputs/function/cpp/example_languages_2.txt +2 -0
  283. dpk_code_profiler/data/final_UI_outputs/function/cpp/example_languages_3.txt +2 -0
  284. dpk_code_profiler/data/final_UI_outputs/function/cpp/prompt_1.txt +31 -0
  285. dpk_code_profiler/data/final_UI_outputs/function/cpp/prompt_2.txt +31 -0
  286. dpk_code_profiler/data/final_UI_outputs/function/cpp/prompt_3.txt +31 -0
  287. dpk_code_profiler/data/final_UI_outputs/function/cpp/test_code_1.txt +46 -0
  288. dpk_code_profiler/data/final_UI_outputs/function/cpp/test_code_2.txt +75 -0
  289. dpk_code_profiler/data/final_UI_outputs/function/cpp/test_code_3.txt +99 -0
  290. dpk_code_profiler/data/final_UI_outputs/function/d/0.txt +12 -0
  291. dpk_code_profiler/data/final_UI_outputs/function/d/example_languages.txt +2 -0
  292. dpk_code_profiler/data/final_UI_outputs/function/d/prompt.txt +30 -0
  293. dpk_code_profiler/data/final_UI_outputs/function/d/test_code.txt +8 -0
  294. dpk_code_profiler/data/final_UI_outputs/function/dart/0.txt +12 -0
  295. dpk_code_profiler/data/final_UI_outputs/function/dart/example_languages.txt +2 -0
  296. dpk_code_profiler/data/final_UI_outputs/function/dart/prompt.txt +30 -0
  297. dpk_code_profiler/data/final_UI_outputs/function/dart/test_code.txt +9 -0
  298. dpk_code_profiler/data/final_UI_outputs/function/elm/0.txt +12 -0
  299. dpk_code_profiler/data/final_UI_outputs/function/elm/example_languages.txt +2 -0
  300. dpk_code_profiler/data/final_UI_outputs/function/elm/prompt.txt +30 -0
  301. dpk_code_profiler/data/final_UI_outputs/function/elm/test_code.txt +1 -0
  302. dpk_code_profiler/data/final_UI_outputs/function/go/0.txt +12 -0
  303. dpk_code_profiler/data/final_UI_outputs/function/go/example_languages.txt +2 -0
  304. dpk_code_profiler/data/final_UI_outputs/function/go/prompt.txt +23 -0
  305. dpk_code_profiler/data/final_UI_outputs/function/go/test_code.txt +3 -0
  306. dpk_code_profiler/data/final_UI_outputs/function/haskell/0.txt +12 -0
  307. dpk_code_profiler/data/final_UI_outputs/function/haskell/example_languages.txt +2 -0
  308. dpk_code_profiler/data/final_UI_outputs/function/haskell/prompt.txt +21 -0
  309. dpk_code_profiler/data/final_UI_outputs/function/haskell/test_code.txt +1 -0
  310. dpk_code_profiler/data/final_UI_outputs/function/java/0.txt +9 -0
  311. dpk_code_profiler/data/final_UI_outputs/function/js/0.txt +12 -0
  312. dpk_code_profiler/data/final_UI_outputs/function/js/prompt.txt +22 -0
  313. dpk_code_profiler/data/final_UI_outputs/function/kotlin/0.txt +12 -0
  314. dpk_code_profiler/data/final_UI_outputs/function/kotlin/example_languages.txt +2 -0
  315. dpk_code_profiler/data/final_UI_outputs/function/kotlin/prompt.txt +30 -0
  316. dpk_code_profiler/data/final_UI_outputs/function/kotlin/test_code.txt +3 -0
  317. dpk_code_profiler/data/final_UI_outputs/function/nim/0.txt +12 -0
  318. dpk_code_profiler/data/final_UI_outputs/function/nim/example_languages.txt +2 -0
  319. dpk_code_profiler/data/final_UI_outputs/function/nim/prompt.txt +30 -0
  320. dpk_code_profiler/data/final_UI_outputs/function/nim/test_code.txt +3 -0
  321. dpk_code_profiler/data/final_UI_outputs/function/objc/0.txt +12 -0
  322. dpk_code_profiler/data/final_UI_outputs/function/objc/example_languages.txt +2 -0
  323. dpk_code_profiler/data/final_UI_outputs/function/objc/prompt.txt +30 -0
  324. dpk_code_profiler/data/final_UI_outputs/function/objc/test_code.txt +14 -0
  325. dpk_code_profiler/data/final_UI_outputs/function/perl/0.txt +12 -0
  326. dpk_code_profiler/data/final_UI_outputs/function/perl/example_languages.txt +2 -0
  327. dpk_code_profiler/data/final_UI_outputs/function/perl/prompt.txt +30 -0
  328. dpk_code_profiler/data/final_UI_outputs/function/perl/test_code.txt +3 -0
  329. dpk_code_profiler/data/final_UI_outputs/function/py/0.txt +9 -0
  330. dpk_code_profiler/data/final_UI_outputs/function/rust/0.txt +12 -0
  331. dpk_code_profiler/data/final_UI_outputs/function/rust/example_languages.txt +2 -0
  332. dpk_code_profiler/data/final_UI_outputs/function/rust/prompt.txt +30 -0
  333. dpk_code_profiler/data/final_UI_outputs/function/rust/test_code.txt +5 -0
  334. dpk_code_profiler/data/final_UI_outputs/function/scala/0.txt +12 -0
  335. dpk_code_profiler/data/final_UI_outputs/function/scala/1.txt +14 -0
  336. dpk_code_profiler/data/final_UI_outputs/function/scala/example_languages.txt +2 -0
  337. dpk_code_profiler/data/final_UI_outputs/function/scala/example_languages_1.txt +2 -0
  338. dpk_code_profiler/data/final_UI_outputs/function/scala/prompt.txt +31 -0
  339. dpk_code_profiler/data/final_UI_outputs/function/scala/prompt_1.txt +31 -0
  340. dpk_code_profiler/data/final_UI_outputs/function/scala/test_code.txt +6 -0
  341. dpk_code_profiler/data/final_UI_outputs/function/scala/test_code_1.txt +14 -0
  342. dpk_code_profiler/data/final_UI_outputs/function/ts/0.txt +11 -0
  343. dpk_code_profiler/data/final_UI_outputs/function/ts/prompt.txt +21 -0
  344. dpk_code_profiler/data/final_UI_outputs/function/verilog/0.txt +10 -0
  345. dpk_code_profiler/data/final_UI_outputs/function/verilog/example_languages.txt +2 -0
  346. dpk_code_profiler/data/final_UI_outputs/function/verilog/prompt.txt +31 -0
  347. dpk_code_profiler/data/final_UI_outputs/function/verilog/test_code.txt +5 -0
  348. dpk_code_profiler/data/final_UI_outputs/function/vhdl/0.txt +12 -0
  349. dpk_code_profiler/data/final_UI_outputs/function/vhdl/example_languages.txt +2 -0
  350. dpk_code_profiler/data/final_UI_outputs/function/vhdl/prompt.txt +31 -0
  351. dpk_code_profiler/data/final_UI_outputs/function/vhdl/test_code.txt +22 -0
  352. dpk_code_profiler/data/final_UI_outputs/package/agda/0.txt +16 -0
  353. dpk_code_profiler/data/final_UI_outputs/package/agda/example_languages.txt +2 -0
  354. dpk_code_profiler/data/final_UI_outputs/package/agda/prompt.txt +31 -0
  355. dpk_code_profiler/data/final_UI_outputs/package/agda/test_code.txt +2 -0
  356. dpk_code_profiler/data/final_UI_outputs/package/c/0.txt +26 -0
  357. dpk_code_profiler/data/final_UI_outputs/package/c/example_languages.txt +2 -0
  358. dpk_code_profiler/data/final_UI_outputs/package/c/prompt.txt +32 -0
  359. dpk_code_profiler/data/final_UI_outputs/package/c/test_code.txt +3 -0
  360. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/0.txt +18 -0
  361. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/1.txt +20 -0
  362. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/example_languages.txt +2 -0
  363. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/example_languages_1.txt +2 -0
  364. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/prompt.txt +31 -0
  365. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/prompt_1.txt +33 -0
  366. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/test_code.txt +2 -0
  367. dpk_code_profiler/data/final_UI_outputs/package/c_sharp/test_code_1.txt +2 -0
  368. dpk_code_profiler/data/final_UI_outputs/package/cpp/0.txt +7 -0
  369. dpk_code_profiler/data/final_UI_outputs/package/cpp/1.txt +25 -0
  370. dpk_code_profiler/data/final_UI_outputs/package/cpp/2.txt +25 -0
  371. dpk_code_profiler/data/final_UI_outputs/package/cpp/3.txt +25 -0
  372. dpk_code_profiler/data/final_UI_outputs/package/cpp/4.txt +28 -0
  373. dpk_code_profiler/data/final_UI_outputs/package/cpp/5.txt +28 -0
  374. dpk_code_profiler/data/final_UI_outputs/package/cpp/6.txt +28 -0
  375. dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_1.txt +2 -0
  376. dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_2.txt +2 -0
  377. dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_3.txt +2 -0
  378. dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_4.txt +2 -0
  379. dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_5.txt +2 -0
  380. dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_6.txt +2 -0
  381. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt.txt +0 -0
  382. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_1.txt +33 -0
  383. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_2.txt +33 -0
  384. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_3.txt +33 -0
  385. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_4.txt +33 -0
  386. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_5.txt +33 -0
  387. dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_6.txt +33 -0
  388. dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_1.txt +4 -0
  389. dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_2.txt +25 -0
  390. dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_3.txt +25 -0
  391. dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_4.txt +29 -0
  392. dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_5.txt +29 -0
  393. dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_6.txt +29 -0
  394. dpk_code_profiler/data/final_UI_outputs/package/d/0.txt +17 -0
  395. dpk_code_profiler/data/final_UI_outputs/package/d/example_languages.txt +2 -0
  396. dpk_code_profiler/data/final_UI_outputs/package/d/prompt.txt +31 -0
  397. dpk_code_profiler/data/final_UI_outputs/package/d/test_code.txt +2 -0
  398. dpk_code_profiler/data/final_UI_outputs/package/dart/0.txt +15 -0
  399. dpk_code_profiler/data/final_UI_outputs/package/dart/example_languages.txt +2 -0
  400. dpk_code_profiler/data/final_UI_outputs/package/dart/prompt.txt +33 -0
  401. dpk_code_profiler/data/final_UI_outputs/package/dart/test_code.txt +5 -0
  402. dpk_code_profiler/data/final_UI_outputs/package/elm/0.txt +20 -0
  403. dpk_code_profiler/data/final_UI_outputs/package/elm/example_languages.txt +2 -0
  404. dpk_code_profiler/data/final_UI_outputs/package/elm/prompt.txt +31 -0
  405. dpk_code_profiler/data/final_UI_outputs/package/elm/test_code.txt +5 -0
  406. dpk_code_profiler/data/final_UI_outputs/package/go/0.txt +23 -0
  407. dpk_code_profiler/data/final_UI_outputs/package/go/prompt.txt +0 -0
  408. dpk_code_profiler/data/final_UI_outputs/package/haskell/0.txt +28 -0
  409. dpk_code_profiler/data/final_UI_outputs/package/haskell/example_languages.txt +2 -0
  410. dpk_code_profiler/data/final_UI_outputs/package/haskell/prompt.txt +32 -0
  411. dpk_code_profiler/data/final_UI_outputs/package/haskell/test_code.txt +11 -0
  412. dpk_code_profiler/data/final_UI_outputs/package/java/0.txt +13 -0
  413. dpk_code_profiler/data/final_UI_outputs/package/java/prompt.txt +0 -0
  414. dpk_code_profiler/data/final_UI_outputs/package/js/0.txt +16 -0
  415. dpk_code_profiler/data/final_UI_outputs/package/js/example_languages.txt +2 -0
  416. dpk_code_profiler/data/final_UI_outputs/package/js/prompt.txt +30 -0
  417. dpk_code_profiler/data/final_UI_outputs/package/js/test_code.txt +4 -0
  418. dpk_code_profiler/data/final_UI_outputs/package/kotlin/0.txt +20 -0
  419. dpk_code_profiler/data/final_UI_outputs/package/kotlin/example_languages.txt +2 -0
  420. dpk_code_profiler/data/final_UI_outputs/package/kotlin/prompt.txt +32 -0
  421. dpk_code_profiler/data/final_UI_outputs/package/kotlin/test_code.txt +3 -0
  422. dpk_code_profiler/data/final_UI_outputs/package/nim/0.txt +45 -0
  423. dpk_code_profiler/data/final_UI_outputs/package/nim/1.txt +12 -0
  424. dpk_code_profiler/data/final_UI_outputs/package/nim/2.txt +14 -0
  425. dpk_code_profiler/data/final_UI_outputs/package/nim/example_languages.txt +2 -0
  426. dpk_code_profiler/data/final_UI_outputs/package/nim/prompt.txt +31 -0
  427. dpk_code_profiler/data/final_UI_outputs/package/nim/test_code.txt +4 -0
  428. dpk_code_profiler/data/final_UI_outputs/package/nim/test_code_0.txt +4 -0
  429. dpk_code_profiler/data/final_UI_outputs/package/nim/test_code_1.txt +1 -0
  430. dpk_code_profiler/data/final_UI_outputs/package/nim/test_code_2.txt +1 -0
  431. dpk_code_profiler/data/final_UI_outputs/package/objc/0.txt +15 -0
  432. dpk_code_profiler/data/final_UI_outputs/package/objc/1.txt +19 -0
  433. dpk_code_profiler/data/final_UI_outputs/package/objc/example_languages.txt +2 -0
  434. dpk_code_profiler/data/final_UI_outputs/package/objc/prompt.txt +31 -0
  435. dpk_code_profiler/data/final_UI_outputs/package/objc/test_code.txt +2 -0
  436. dpk_code_profiler/data/final_UI_outputs/package/objc/test_code_1.txt +4 -0
  437. dpk_code_profiler/data/final_UI_outputs/package/ocaml/0.txt +12 -0
  438. dpk_code_profiler/data/final_UI_outputs/package/ocaml/example_languages.txt +2 -0
  439. dpk_code_profiler/data/final_UI_outputs/package/ocaml/prompt.txt +31 -0
  440. dpk_code_profiler/data/final_UI_outputs/package/ocaml/test_code.txt +1 -0
  441. dpk_code_profiler/data/final_UI_outputs/package/perl/0.txt +24 -0
  442. dpk_code_profiler/data/final_UI_outputs/package/perl/example_languages.txt +2 -0
  443. dpk_code_profiler/data/final_UI_outputs/package/perl/prompt.txt +32 -0
  444. dpk_code_profiler/data/final_UI_outputs/package/perl/test_code.txt +6 -0
  445. dpk_code_profiler/data/final_UI_outputs/package/py/0.txt +47 -0
  446. dpk_code_profiler/data/final_UI_outputs/package/py/1.txt +17 -0
  447. dpk_code_profiler/data/final_UI_outputs/package/py/prompt.txt +0 -0
  448. dpk_code_profiler/data/final_UI_outputs/package/qmljs/0.txt +14 -0
  449. dpk_code_profiler/data/final_UI_outputs/package/qmljs/example_languages.txt +2 -0
  450. dpk_code_profiler/data/final_UI_outputs/package/qmljs/prompt.txt +31 -0
  451. dpk_code_profiler/data/final_UI_outputs/package/qmljs/test_code.txt +2 -0
  452. dpk_code_profiler/data/final_UI_outputs/package/rust/0.txt +18 -0
  453. dpk_code_profiler/data/final_UI_outputs/package/rust/example_languages.txt +2 -0
  454. dpk_code_profiler/data/final_UI_outputs/package/rust/prompt.txt +33 -0
  455. dpk_code_profiler/data/final_UI_outputs/package/rust/test_code.txt +2 -0
  456. dpk_code_profiler/data/final_UI_outputs/package/scala/0.txt +45 -0
  457. dpk_code_profiler/data/final_UI_outputs/package/scala/1.txt +24 -0
  458. dpk_code_profiler/data/final_UI_outputs/package/scala/example_languages.txt +2 -0
  459. dpk_code_profiler/data/final_UI_outputs/package/scala/example_languages_1.txt +2 -0
  460. dpk_code_profiler/data/final_UI_outputs/package/scala/prompt.txt +30 -0
  461. dpk_code_profiler/data/final_UI_outputs/package/scala/prompt_1.txt +33 -0
  462. dpk_code_profiler/data/final_UI_outputs/package/scala/test_code.txt +5 -0
  463. dpk_code_profiler/data/final_UI_outputs/package/scala/test_code_1.txt +30 -0
  464. dpk_code_profiler/data/final_UI_outputs/package/ts/0.txt +27 -0
  465. dpk_code_profiler/data/final_UI_outputs/package/ts/example_languages.txt +2 -0
  466. dpk_code_profiler/data/final_UI_outputs/package/ts/prompt.txt +30 -0
  467. dpk_code_profiler/data/final_UI_outputs/package/ts/test_code.txt +14 -0
  468. dpk_code_profiler/data/final_UI_outputs/package/verilog/0.txt +12 -0
  469. dpk_code_profiler/data/final_UI_outputs/package/verilog/1.txt +19 -0
  470. dpk_code_profiler/data/final_UI_outputs/package/verilog/example_languages.txt +2 -0
  471. dpk_code_profiler/data/final_UI_outputs/package/verilog/prompt.txt +31 -0
  472. dpk_code_profiler/data/final_UI_outputs/package/verilog/test_code.txt +1 -0
  473. dpk_code_profiler/data/final_UI_outputs/package/verilog/test_code_1.txt +2 -0
  474. dpk_code_profiler/data/final_UI_outputs/package/vhdl/0.txt +14 -0
  475. dpk_code_profiler/data/final_UI_outputs/package/vhdl/example_languages.txt +2 -0
  476. dpk_code_profiler/data/final_UI_outputs/package/vhdl/prompt.txt +33 -0
  477. dpk_code_profiler/data/final_UI_outputs/package/vhdl/test_code.txt +2 -0
  478. dpk_code_profiler/data/helper.ipynb +165 -0
  479. dpk_code_profiler/data/prompts/comment.txt +24 -0
  480. dpk_code_profiler/data/prompts/function.txt +31 -0
  481. dpk_code_profiler/data/prompts/package.txt +33 -0
  482. dpk_code_profiler/grammar/UAST_Grammar.json +20 -0
  483. dpk_code_profiler/higher_order_concepts.py +63 -0
  484. dpk_code_profiler/local.py +68 -0
  485. dpk_code_profiler/local_python.py +47 -0
  486. dpk_code_profiler/offline-customizations/cached_requirements.json +198 -0
  487. dpk_code_profiler/offline-customizations/config_LLM_runner_app.py +21 -0
  488. dpk_code_profiler/offline-customizations/generic_LLM_runner_app.py +655 -0
  489. dpk_code_profiler/output_data-prep-kit.sl.cloud9.ibm.com_20250129-062407-115.html +205 -0
  490. dpk_code_profiler/output_data-prep-kit.sl.cloud9.ibm.com_20250129-062407-115.json +596 -0
  491. dpk_code_profiler/profiler-report/template.html +107 -0
  492. dpk_code_profiler/profiler_report.py +195 -0
  493. dpk_code_profiler/ray/local.py +59 -0
  494. dpk_code_profiler/ray/s3.py +56 -0
  495. dpk_code_profiler/ray/transform.py +49 -0
  496. dpk_code_profiler/ruleset/UAST_rules_agda.json +14 -0
  497. dpk_code_profiler/ruleset/UAST_rules_c.json +14 -0
  498. dpk_code_profiler/ruleset/UAST_rules_c_sharp.json +14 -0
  499. dpk_code_profiler/ruleset/UAST_rules_cpp.json +22 -0
  500. dpk_code_profiler/ruleset/UAST_rules_d.json +14 -0
  501. dpk_code_profiler/ruleset/UAST_rules_dart.json +14 -0
  502. dpk_code_profiler/ruleset/UAST_rules_elm.json +18 -0
  503. dpk_code_profiler/ruleset/UAST_rules_go.json +10 -0
  504. dpk_code_profiler/ruleset/UAST_rules_haskell.json +14 -0
  505. dpk_code_profiler/ruleset/UAST_rules_java.json +22 -0
  506. dpk_code_profiler/ruleset/UAST_rules_js.json +10 -0
  507. dpk_code_profiler/ruleset/UAST_rules_kotlin.json +18 -0
  508. dpk_code_profiler/ruleset/UAST_rules_nim.json +26 -0
  509. dpk_code_profiler/ruleset/UAST_rules_objc.json +18 -0
  510. dpk_code_profiler/ruleset/UAST_rules_ocaml.json +10 -0
  511. dpk_code_profiler/ruleset/UAST_rules_perl.json +10 -0
  512. dpk_code_profiler/ruleset/UAST_rules_py.json +26 -0
  513. dpk_code_profiler/ruleset/UAST_rules_qmljs.json +10 -0
  514. dpk_code_profiler/ruleset/UAST_rules_rust.json +14 -0
  515. dpk_code_profiler/ruleset/UAST_rules_scala.json +18 -0
  516. dpk_code_profiler/ruleset/UAST_rules_ts.json +14 -0
  517. dpk_code_profiler/ruleset/UAST_rules_typescript.json +14 -0
  518. dpk_code_profiler/ruleset/UAST_rules_verilog.json +18 -0
  519. dpk_code_profiler/ruleset/UAST_rules_vhdl.json +14 -0
  520. dpk_code_profiler/semantic-ruleset/ikb_model.csv +2002 -0
  521. dpk_code_profiler/semantic-ruleset/null_libs.csv +10105 -0
  522. dpk_code_profiler/semantic-ruleset/offline-ikb-builder/concept_list.csv +14 -0
  523. dpk_code_profiler/semantic-ruleset/offline-ikb-builder/examples/examples-i.csv +27 -0
  524. dpk_code_profiler/semantic-ruleset/offline-ikb-builder/examples/examples-o.csv +27 -0
  525. dpk_code_profiler/semantic-ruleset/offline-ikb-builder/generate_ikb.py +178 -0
  526. dpk_code_profiler/semantic-ruleset/offline-ikb-builder/watsonxai.py +32 -0
  527. dpk_code_profiler/semantic_concepts.py +112 -0
  528. dpk_code_profiler/template.html +107 -0
  529. dpk_code_profiler/tool_utils/aggregate_report.py +57 -0
  530. dpk_code_profiler/tool_utils/aggregated_output_wca_ept_1.json +67757 -0
  531. dpk_code_profiler/tool_utils/report_stats_generation.py +105 -0
  532. dpk_code_profiler/transform.py +371 -0
  533. dpk_code_profiler/transform_python.py +49 -0
  534. dpk_doc_chunk/__init__.py +1 -0
  535. dpk_doc_chunk/chunkers.py +138 -0
  536. dpk_doc_chunk/local.py +34 -0
  537. dpk_doc_chunk/local_python.py +56 -0
  538. dpk_doc_chunk/ray/__init__.py +0 -0
  539. dpk_doc_chunk/ray/local.py +50 -0
  540. dpk_doc_chunk/ray/s3.py +57 -0
  541. dpk_doc_chunk/ray/transform.py +81 -0
  542. dpk_doc_chunk/transform.py +254 -0
  543. dpk_doc_chunk/transform_python.py +69 -0
  544. dpk_doc_id/__init__.py +4 -0
  545. dpk_doc_id/local.py +57 -0
  546. dpk_doc_id/local_python.py +54 -0
  547. dpk_doc_id/ray/__init__.py +0 -0
  548. dpk_doc_id/ray/local.py +59 -0
  549. dpk_doc_id/ray/s3.py +62 -0
  550. dpk_doc_id/ray/transform.py +143 -0
  551. dpk_doc_id/spark/__init__.py +0 -0
  552. dpk_doc_id/spark/local.py +52 -0
  553. dpk_doc_id/spark/transform.py +185 -0
  554. dpk_doc_id/transform.py +178 -0
  555. dpk_doc_id/transform_python.py +143 -0
  556. dpk_doc_quality/__init__.py +4 -0
  557. dpk_doc_quality/cc_net_prepro.py +168 -0
  558. dpk_doc_quality/doc_Gopher_statistics.py +158 -0
  559. dpk_doc_quality/doc_c4_statistics.py +167 -0
  560. dpk_doc_quality/ldnoobw/de +66 -0
  561. dpk_doc_quality/ldnoobw/en +403 -0
  562. dpk_doc_quality/ldnoobw/es +68 -0
  563. dpk_doc_quality/ldnoobw/fr +91 -0
  564. dpk_doc_quality/ldnoobw/ja +180 -0
  565. dpk_doc_quality/ldnoobw/pt +76 -0
  566. dpk_doc_quality/local.py +43 -0
  567. dpk_doc_quality/local_python.py +61 -0
  568. dpk_doc_quality/ray/__init__.py +0 -0
  569. dpk_doc_quality/ray/local.py +59 -0
  570. dpk_doc_quality/ray/s3.py +71 -0
  571. dpk_doc_quality/ray/transform.py +84 -0
  572. dpk_doc_quality/transform.py +241 -0
  573. dpk_doc_quality/transform_python.py +83 -0
  574. dpk_doc_quality/utils.py +67 -0
  575. dpk_ededup/__init__.py +1 -0
  576. dpk_ededup/local.py +46 -0
  577. dpk_ededup/local_python.py +49 -0
  578. dpk_ededup/local_python_incremental.py +53 -0
  579. dpk_ededup/ray/__init__.py +0 -0
  580. dpk_ededup/ray/cluster_estimator.py +59 -0
  581. dpk_ededup/ray/local.py +61 -0
  582. dpk_ededup/ray/local_incremental.py +65 -0
  583. dpk_ededup/ray/s3.py +64 -0
  584. dpk_ededup/ray/transform.py +273 -0
  585. dpk_ededup/transform_base.py +248 -0
  586. dpk_ededup/transform_python.py +171 -0
  587. dpk_extreme_tokenized/__init__.py +1 -0
  588. dpk_extreme_tokenized/common.py +47 -0
  589. dpk_extreme_tokenized/ray/__init__.py +0 -0
  590. dpk_extreme_tokenized/ray/runtime.py +55 -0
  591. dpk_extreme_tokenized/runtime.py +123 -0
  592. dpk_extreme_tokenized/transform.py +125 -0
  593. dpk_fdedup/Murmur_MH.py +112 -0
  594. dpk_fdedup/cluster_analysis/local_python.py +50 -0
  595. dpk_fdedup/cluster_analysis/ray/cluster_estimator.py +99 -0
  596. dpk_fdedup/cluster_analysis/ray/local.py +53 -0
  597. dpk_fdedup/cluster_analysis/ray/transform.py +74 -0
  598. dpk_fdedup/cluster_analysis/spark/local.py +49 -0
  599. dpk_fdedup/cluster_analysis/spark/transform.py +75 -0
  600. dpk_fdedup/cluster_analysis/transform.py +342 -0
  601. dpk_fdedup/cluster_analysis/transform_python.py +76 -0
  602. dpk_fdedup/data_cleaning/local_python.py +60 -0
  603. dpk_fdedup/data_cleaning/ray/local.py +69 -0
  604. dpk_fdedup/data_cleaning/ray/transform.py +138 -0
  605. dpk_fdedup/data_cleaning/spark/local.py +61 -0
  606. dpk_fdedup/data_cleaning/spark/transform.py +124 -0
  607. dpk_fdedup/data_cleaning/transform.py +179 -0
  608. dpk_fdedup/data_cleaning/transform_python.py +103 -0
  609. dpk_fdedup/get_duplicate_list/ray/transform.py +69 -0
  610. dpk_fdedup/get_duplicate_list/transform.py +173 -0
  611. dpk_fdedup/get_duplicate_list/transform_local_python.py +46 -0
  612. dpk_fdedup/get_duplicate_list/transform_python.py +71 -0
  613. dpk_fdedup/ray/transform.py +92 -0
  614. dpk_fdedup/signature_calc/local_python.py +51 -0
  615. dpk_fdedup/signature_calc/ray/local.py +54 -0
  616. dpk_fdedup/signature_calc/ray/transform.py +43 -0
  617. dpk_fdedup/signature_calc/spark/local.py +50 -0
  618. dpk_fdedup/signature_calc/spark/transform.py +42 -0
  619. dpk_fdedup/signature_calc/transform.py +517 -0
  620. dpk_fdedup/signature_calc/transform_python.py +44 -0
  621. dpk_fdedup/spark/transform.py +62 -0
  622. dpk_fdedup/transform_python.py +289 -0
  623. dpk_filter/__init__.py +1 -0
  624. dpk_filter/local.py +58 -0
  625. dpk_filter/local_python.py +60 -0
  626. dpk_filter/ray/__init__.py +0 -0
  627. dpk_filter/ray/local.py +71 -0
  628. dpk_filter/ray/s3.py +74 -0
  629. dpk_filter/ray/transform.py +63 -0
  630. dpk_filter/spark/local.py +60 -0
  631. dpk_filter/spark/transform.py +41 -0
  632. dpk_filter/test_support.py +135 -0
  633. dpk_filter/transform.py +192 -0
  634. dpk_filter/transform_python.py +56 -0
  635. dpk_gneissweb_classification/classification_models.py +63 -0
  636. dpk_gneissweb_classification/local.py +48 -0
  637. dpk_gneissweb_classification/local_python.py +54 -0
  638. dpk_gneissweb_classification/nlp.py +46 -0
  639. dpk_gneissweb_classification/ray/local.py +64 -0
  640. dpk_gneissweb_classification/ray/s3.py +73 -0
  641. dpk_gneissweb_classification/ray/transform.py +75 -0
  642. dpk_gneissweb_classification/transform.py +171 -0
  643. dpk_gneissweb_classification/transform_python.py +66 -0
  644. dpk_hap/__init__.py +4 -0
  645. dpk_hap/local.py +51 -0
  646. dpk_hap/local_python.py +54 -0
  647. dpk_hap/ray/__init__.py +0 -0
  648. dpk_hap/ray/local.py +58 -0
  649. dpk_hap/ray/s3.py +64 -0
  650. dpk_hap/ray/transform.py +40 -0
  651. dpk_hap/transform.py +186 -0
  652. dpk_hap/transform_python.py +65 -0
  653. dpk_html2parquet/__init__.py +4 -0
  654. dpk_html2parquet/local.py +35 -0
  655. dpk_html2parquet/local_python.py +46 -0
  656. dpk_html2parquet/ray/__init__.py +0 -0
  657. dpk_html2parquet/ray/local_ray.py +55 -0
  658. dpk_html2parquet/ray/s3_ray.py +57 -0
  659. dpk_html2parquet/ray/transform.py +60 -0
  660. dpk_html2parquet/transform.py +270 -0
  661. dpk_html2parquet/transform_python.py +66 -0
  662. dpk_lang_id/lang_models.py +52 -0
  663. dpk_lang_id/local.py +49 -0
  664. dpk_lang_id/local_python.py +55 -0
  665. dpk_lang_id/nlp.py +46 -0
  666. dpk_lang_id/ray/local.py +65 -0
  667. dpk_lang_id/ray/s3.py +71 -0
  668. dpk_lang_id/ray/transform.py +73 -0
  669. dpk_lang_id/transform.py +146 -0
  670. dpk_lang_id/transform_python.py +66 -0
  671. dpk_pdf2parquet/.gitignore +39 -0
  672. dpk_pdf2parquet/__init__.py +1 -0
  673. dpk_pdf2parquet/local.py +39 -0
  674. dpk_pdf2parquet/local_python.py +56 -0
  675. dpk_pdf2parquet/ray/.gitignore +39 -0
  676. dpk_pdf2parquet/ray/__init__.py +0 -0
  677. dpk_pdf2parquet/ray/local_ray.py +55 -0
  678. dpk_pdf2parquet/ray/s3_ray.py +60 -0
  679. dpk_pdf2parquet/ray/transform.py +102 -0
  680. dpk_pdf2parquet/transform.py +498 -0
  681. dpk_pdf2parquet/transform_python.py +66 -0
  682. dpk_pii_redactor/__init__.py +1 -0
  683. dpk_pii_redactor/flair_recognizer.py +160 -0
  684. dpk_pii_redactor/local.py +35 -0
  685. dpk_pii_redactor/local_python.py +37 -0
  686. dpk_pii_redactor/pii_analyzer.py +83 -0
  687. dpk_pii_redactor/pii_anonymizer.py +38 -0
  688. dpk_pii_redactor/ray/__init__.py +0 -0
  689. dpk_pii_redactor/ray/local.py +54 -0
  690. dpk_pii_redactor/ray/s3.py +59 -0
  691. dpk_pii_redactor/ray/transform.py +66 -0
  692. dpk_pii_redactor/transform.py +162 -0
  693. dpk_pii_redactor/transform_python.py +56 -0
  694. dpk_profiler/__init__.py +2 -0
  695. dpk_profiler/base_tokenizer.py +36 -0
  696. dpk_profiler/local.py +44 -0
  697. dpk_profiler/local_python.py +45 -0
  698. dpk_profiler/ray/__init__.py +0 -0
  699. dpk_profiler/ray/local.py +52 -0
  700. dpk_profiler/ray/runtime.py +244 -0
  701. dpk_profiler/ray/s3.py +55 -0
  702. dpk_profiler/runtime.py +152 -0
  703. dpk_profiler/spark/__init__.py +0 -0
  704. dpk_profiler/spark/local.py +46 -0
  705. dpk_profiler/spark/runtime.py +108 -0
  706. dpk_profiler/transform_base.py +176 -0
  707. dpk_readability/__init__.py +1 -0
  708. dpk_readability/common.py +85 -0
  709. dpk_readability/ray/__init__.py +0 -0
  710. dpk_readability/ray/runtime.py +57 -0
  711. dpk_readability/runtime.py +173 -0
  712. dpk_readability/transform.py +171 -0
  713. dpk_rep_removal/__init__.py +1 -0
  714. dpk_rep_removal/dedup_Rust_scripts.py +101 -0
  715. dpk_rep_removal/dedup_pq_level.py +203 -0
  716. dpk_rep_removal/gpt2/merges.txt +50001 -0
  717. dpk_rep_removal/gpt2/special_tokens_map.json +23 -0
  718. dpk_rep_removal/gpt2/tokenizer_config.json +33 -0
  719. dpk_rep_removal/gpt2/vocab.json +50259 -0
  720. dpk_rep_removal/make_suffix_array.py +177 -0
  721. dpk_rep_removal/ray/__init__.py +0 -0
  722. dpk_rep_removal/ray/runtime.py +54 -0
  723. dpk_rep_removal/runtime.py +140 -0
  724. dpk_rep_removal/rust/Cargo.toml +19 -0
  725. dpk_rep_removal/rust/src/main.rs +1279 -0
  726. dpk_rep_removal/rust/src/table.rs +940 -0
  727. dpk_rep_removal/rust/target/release/dedup_dataset +0 -0
  728. dpk_rep_removal/rust/target/release/dedup_dataset.d +1 -0
  729. dpk_rep_removal/transform.py +103 -0
  730. dpk_rep_removal/utils.py +316 -0
  731. dpk_resize/__init__.py +2 -0
  732. dpk_resize/local.py +36 -0
  733. dpk_resize/local_python.py +46 -0
  734. dpk_resize/ray/__init__.py +0 -0
  735. dpk_resize/ray/local.py +51 -0
  736. dpk_resize/ray/runtime.py +73 -0
  737. dpk_resize/ray/s3.py +57 -0
  738. dpk_resize/runtime.py +64 -0
  739. dpk_resize/spark/__init__.py +0 -0
  740. dpk_resize/spark/local.py +47 -0
  741. dpk_resize/spark/runtime.py +39 -0
  742. dpk_resize/transform.py +193 -0
  743. dpk_similarity/__init__.py +1 -0
  744. dpk_similarity/data/result_list.json +73 -0
  745. dpk_similarity/local.py +50 -0
  746. dpk_similarity/local_python.py +50 -0
  747. dpk_similarity/ray/__init__.py +0 -0
  748. dpk_similarity/transform.py +356 -0
  749. dpk_similarity/transform_python.py +38 -0
  750. dpk_text_encoder/__init__.py +0 -0
  751. dpk_text_encoder/local.py +44 -0
  752. dpk_text_encoder/local_python.py +44 -0
  753. dpk_text_encoder/ray/__init__.py +0 -0
  754. dpk_text_encoder/ray/local.py +50 -0
  755. dpk_text_encoder/ray/s3.py +56 -0
  756. dpk_text_encoder/ray/transform.py +75 -0
  757. dpk_text_encoder/transform.py +127 -0
  758. dpk_text_encoder/transform_python.py +68 -0
  759. dpk_tokenization/local.py +40 -0
  760. dpk_tokenization/local_long_doc.py +49 -0
  761. dpk_tokenization/ray/local.py +49 -0
  762. dpk_tokenization/ray/s3.py +59 -0
  763. dpk_tokenization/ray/transform.py +62 -0
  764. dpk_tokenization/s3_long_doc.py +52 -0
  765. dpk_tokenization/transform.py +258 -0
  766. dpk_tokenization/transform_python.py +53 -0
  767. dpk_tokenization/utils.py +143 -0
  768. dpk_tokenization2arrow/transform.py +168 -0
  769. dpk_tokenization2arrow/transform_python.py +53 -0
  770. dpk_tokenization2arrow/transform_ray.py +62 -0
  771. dpk_web2parquet/config.py +81 -0
  772. dpk_web2parquet/local.py +26 -0
  773. dpk_web2parquet/local_python.py +49 -0
  774. dpk_web2parquet/python_runtime.py +44 -0
  775. dpk_web2parquet/transform.py +126 -0
  776. dpk_web2parquet/utils.py +38 -0
@@ -0,0 +1,506 @@
+ Metadata-Version: 2.2
+ Name: data_prep_toolkit_transforms
+ Version: 1.0.1.dev1
+ Summary: Data Preparation Toolkit Transforms using Ray
+ Author-email: Maroun Touma <touma@us.ibm.com>
+ License: Apache-2.0
+ Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
+ Requires-Python: <3.13,>=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: data-prep-toolkit>=0.2.4.dev0
+ Provides-Extra: dev
+ Requires-Dist: twine; extra == "dev"
+ Requires-Dist: pytest>=7.3.2; extra == "dev"
+ Requires-Dist: pytest-dotenv>=0.5.2; extra == "dev"
+ Requires-Dist: pytest-env>=1.0.0; extra == "dev"
+ Requires-Dist: pre-commit>=3.3.2; extra == "dev"
+ Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
+ Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
+ Requires-Dist: moto==5.0.5; extra == "dev"
+ Requires-Dist: markupsafe==2.0.1; extra == "dev"
+ Provides-Extra: ray
+ Requires-Dist: data-prep-toolkit[ray]>=0.2.4.dev0; extra == "ray"
+ Requires-Dist: networkx==3.3; extra == "ray"
+ Requires-Dist: colorlog==6.8.2; extra == "ray"
+ Requires-Dist: func-timeout==4.3.5; extra == "ray"
+ Requires-Dist: emerge-viz==2.0.0; extra == "ray"
+ Provides-Extra: all
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+ Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin" and extra == "all"
+ Requires-Dist: timeout-timer==0.2.0; extra == "all"
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+ Requires-Dist: bs4==0.0.2; extra == "all"
+ Requires-Dist: transformers>=4.38.2; extra == "all"
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+ Requires-Dist: parameterized; extra == "all"
+ Requires-Dist: pandas; extra == "all"
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+ Requires-Dist: parameterized>=0.9.0; extra == "all"
+ Requires-Dist: pandas>=2.2.2; extra == "all"
+ Requires-Dist: aiolimiter==1.1.0; extra == "all"
+ Requires-Dist: altair==5.3.0; extra == "all"
+ Requires-Dist: annotated-types==0.7.0; extra == "all"
+ Requires-Dist: anyio==4.4.0; extra == "all"
+ Requires-Dist: appnope==0.1.4; extra == "all"
+ Requires-Dist: asttokens==2.4.1; extra == "all"
+ Requires-Dist: attrs==23.2.0; extra == "all"
+ Requires-Dist: blinker==1.8.2; extra == "all"
+ Requires-Dist: cachetools==5.3.3; extra == "all"
+ Requires-Dist: certifi==2024.7.4; extra == "all"
+ Requires-Dist: charset-normalizer==3.3.2; extra == "all"
+ Requires-Dist: click==8.1.7; extra == "all"
+ Requires-Dist: comm==0.2.2; extra == "all"
+ Requires-Dist: contourpy==1.2.1; extra == "all"
+ Requires-Dist: cycler==0.12.1; extra == "all"
+ Requires-Dist: debugpy==1.8.1; extra == "all"
+ Requires-Dist: decorator==5.1.1; extra == "all"
+ Requires-Dist: Deprecated==1.2.14; extra == "all"
+ Requires-Dist: executing==2.0.1; extra == "all"
+ Requires-Dist: fonttools==4.53.0; extra == "all"
+ Requires-Dist: gitdb==4.0.11; extra == "all"
+ Requires-Dist: GitPython==3.1.43; extra == "all"
+ Requires-Dist: h11==0.14.0; extra == "all"
+ Requires-Dist: htbuilder==0.6.2; extra == "all"
+ Requires-Dist: httpcore==1.0.5; extra == "all"
+ Requires-Dist: httpx==0.27.0; extra == "all"
+ Requires-Dist: httpx-sse==0.4.0; extra == "all"
+ Requires-Dist: ibm-generative-ai==3.0.0; extra == "all"
+ Requires-Dist: idna==3.7; extra == "all"
+ Requires-Dist: ipykernel==6.29.4; extra == "all"
+ Requires-Dist: ipython==8.25.0; extra == "all"
+ Requires-Dist: jedi==0.19.1; extra == "all"
+ Requires-Dist: Jinja2==3.1.4; extra == "all"
+ Requires-Dist: jsonschema==4.22.0; extra == "all"
+ Requires-Dist: jsonschema-specifications==2023.12.1; extra == "all"
+ Requires-Dist: jupyter_client==8.6.2; extra == "all"
+ Requires-Dist: jupyter_core==5.7.2; extra == "all"
+ Requires-Dist: kiwisolver==1.4.5; extra == "all"
+ Requires-Dist: markdown-it-py==3.0.0; extra == "all"
+ Requires-Dist: MarkupSafe==2.1.5; extra == "all"
+ Requires-Dist: matplotlib==3.9.0; extra == "all"
+ Requires-Dist: matplotlib-inline==0.1.7; extra == "all"
+ Requires-Dist: mdurl==0.1.2; extra == "all"
+ Requires-Dist: more-itertools==10.3.0; extra == "all"
+ Requires-Dist: nest-asyncio==1.6.0; extra == "all"
+ Requires-Dist: networkx==3.3; extra == "all"
+ Requires-Dist: numpy==1.26.4; extra == "all"
+ Requires-Dist: packaging==24.0; extra == "all"
+ Requires-Dist: parso==0.8.4; extra == "all"
+ Requires-Dist: pexpect==4.9.0; extra == "all"
+ Requires-Dist: pillow>=10.3.0; extra == "all"
+ Requires-Dist: platformdirs==4.2.2; extra == "all"
+ Requires-Dist: prompt_toolkit==3.0.45; extra == "all"
+ Requires-Dist: protobuf==5.27.2; extra == "all"
+ Requires-Dist: psutil==5.9.8; extra == "all"
+ Requires-Dist: ptyprocess==0.7.0; extra == "all"
+ Requires-Dist: pure-eval==0.2.2; extra == "all"
+ Requires-Dist: pyarrow==16.1.0; extra == "all"
+ Requires-Dist: pydantic>=2.7.4; extra == "all"
+ Requires-Dist: pydantic_core>=2.18.4; extra == "all"
+ Requires-Dist: pydeck==0.9.1; extra == "all"
+ Requires-Dist: Pygments==2.18.0; extra == "all"
+ Requires-Dist: pyparsing==3.1.2; extra == "all"
+ Requires-Dist: python-dateutil==2.9.0.post0; extra == "all"
+ Requires-Dist: pytz==2024.1; extra == "all"
+ Requires-Dist: pyzmq==26.0.3; extra == "all"
+ Requires-Dist: referencing==0.35.1; extra == "all"
+ Requires-Dist: regex==2024.5.15; extra == "all"
+ Requires-Dist: requests==2.32.3; extra == "all"
+ Requires-Dist: rich==13.7.1; extra == "all"
+ Requires-Dist: rpds-py==0.18.1; extra == "all"
+ Requires-Dist: seaborn==0.13.2; extra == "all"
+ Requires-Dist: six==1.16.0; extra == "all"
+ Requires-Dist: smmap==5.0.1; extra == "all"
+ Requires-Dist: sniffio==1.3.1; extra == "all"
+ Requires-Dist: st-annotated-text==4.0.1; extra == "all"
+ Requires-Dist: stack-data==0.6.3; extra == "all"
+ Requires-Dist: streamlit==1.37.0; extra == "all"
+ Requires-Dist: tenacity==8.4.2; extra == "all"
+ Requires-Dist: toml==0.10.2; extra == "all"
+ Requires-Dist: toolz==0.12.1; extra == "all"
+ Requires-Dist: tornado==6.4.1; extra == "all"
+ Requires-Dist: traitlets==5.14.3; extra == "all"
+ Requires-Dist: tree-sitter==0.21.3; extra == "all"
+ Requires-Dist: tree-sitter-cpp==0.22.1; extra == "all"
+ Requires-Dist: tree-sitter-java==0.21.0; extra == "all"
+ Requires-Dist: tree-sitter-languages==1.10.2; extra == "all"
+ Requires-Dist: tree-sitter-php==0.22.5; extra == "all"
+ Requires-Dist: typing_extensions==4.12.2; extra == "all"
+ Requires-Dist: tzdata==2024.1; extra == "all"
+ Requires-Dist: uuid; extra == "all"
+ Requires-Dist: wcwidth==0.2.13; extra == "all"
+ Requires-Dist: wrapt==1.16.0; extra == "all"
+ Requires-Dist: plotly==5.15.0; extra == "all"
+ Requires-Dist: presidio-analyzer>=2.2.355; extra == "all"
+ Requires-Dist: presidio-anonymizer>=2.2.355; extra == "all"
+ Requires-Dist: flair>=0.14.0; extra == "all"
+ Requires-Dist: pandas; extra == "all"
+ Requires-Dist: mmh3==4.1.0; extra == "all"
+ Requires-Dist: xxhash==3.4.1; extra == "all"
+ Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "all"
+ Requires-Dist: langcodes>=3.3.0; extra == "all"
+ Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "all"
+ Requires-Dist: numpy==1.26.4; extra == "all"
+ Requires-Dist: docling-core==2.18.0; extra == "all"
+ Requires-Dist: docling-ibm-models==3.3.1; extra == "all"
+ Requires-Dist: docling-parse==3.3.0; extra == "all"
+ Requires-Dist: deepsearch-glm==1.0.0; extra == "all"
+ Requires-Dist: docling==2.21.0; extra == "all"
+ Requires-Dist: filetype<2.0.0,>=1.2.0; extra == "all"
+ Requires-Dist: docling-core==2.18.0; extra == "all"
+ Requires-Dist: pydantic>=2.0.0; extra == "all"
+ Requires-Dist: llama-index-core<0.12.0,>=0.11.22; extra == "all"
+ Requires-Dist: sentence-transformers>=3.0.1; extra == "all"
+ Requires-Dist: nltk>=3.9.1; extra == "all"
+ Requires-Dist: transformers>=4.38.2; extra == "all"
+ Requires-Dist: pandas; extra == "all"
+ Requires-Dist: requests; extra == "all"
+ Requires-Dist: polars>=1.9.0; extra == "all"
+ Requires-Dist: textstat; extra == "all"
+ Requires-Dist: pandas; extra == "all"
+ Requires-Dist: fasttext>=0.9.3; platform_system != "Windows" and extra == "all"
+ Requires-Dist: langcodes>=3.5.0; extra == "all"
+ Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "all"
+ Requires-Dist: numpy<1.29.0,>=1.26.4; extra == "all"
+ Requires-Dist: duckdb>=0.10.1; extra == "all"
+ Requires-Dist: mmh3>=4.1.0; extra == "all"
+ Requires-Dist: xxhash==3.4.1; extra == "all"
+ Requires-Dist: pyyaml>=6.0.2; extra == "all"
+ Requires-Dist: boto3>=1.34.69; extra == "all"
+ Requires-Dist: kubernetes>=30.1.0; extra == "all"
+ Requires-Dist: polars!=1.10.0,!=1.11.0,!=1.12.0,>=1.9.0; extra == "all"
+ Requires-Dist: disjoint-set>=0.8.0; extra == "all"
+ Requires-Dist: scipy<2.0.0,>=1.12.1; extra == "all"
+ Requires-Dist: numpy<1.29.0; extra == "all"
+ Requires-Dist: sentencepiece>=0.2.0; extra == "all"
+ Requires-Dist: mmh3>=4.1.0; extra == "all"
+ Requires-Dist: nltk==3.9.1; extra == "all"
+ Requires-Dist: transformers>=4.38.2; extra == "all"
+ Requires-Dist: torch<=2.5.1,>=2.2.2; extra == "all"
+ Requires-Dist: pandas; extra == "all"
+ Requires-Dist: transformers>=4.38.2; extra == "all"
+ Requires-Dist: data_prep_connector>=0.2.3; extra == "all"
+ Requires-Dist: nltk>=3.9.1; extra == "all"
+ Requires-Dist: requests; extra == "all"
+ Requires-Dist: transformers; extra == "all"
+ Requires-Dist: pandas; extra == "all"
+ Requires-Dist: psutil; extra == "all"
+ Requires-Dist: GPUtil; extra == "all"
+ Provides-Extra: language
+ Requires-Dist: presidio-analyzer>=2.2.355; extra == "language"
+ Requires-Dist: presidio-anonymizer>=2.2.355; extra == "language"
+ Requires-Dist: flair>=0.14.0; extra == "language"
+ Requires-Dist: pandas; extra == "language"
+ Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "language"
+ Requires-Dist: langcodes>=3.3.0; extra == "language"
+ Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "language"
+ Requires-Dist: numpy==1.26.4; extra == "language"
+ Requires-Dist: docling-core==2.18.0; extra == "language"
+ Requires-Dist: docling-ibm-models==3.3.1; extra == "language"
+ Requires-Dist: docling-parse==3.3.0; extra == "language"
+ Requires-Dist: deepsearch-glm==1.0.0; extra == "language"
+ Requires-Dist: docling==2.21.0; extra == "language"
+ Requires-Dist: filetype<2.0.0,>=1.2.0; extra == "language"
+ Requires-Dist: docling-core==2.18.0; extra == "language"
+ Requires-Dist: pydantic>=2.0.0; extra == "language"
+ Requires-Dist: llama-index-core<0.12.0,>=0.11.22; extra == "language"
+ Requires-Dist: sentence-transformers>=3.0.1; extra == "language"
+ Requires-Dist: nltk>=3.9.1; extra == "language"
+ Requires-Dist: transformers>=4.38.2; extra == "language"
+ Requires-Dist: pandas; extra == "language"
+ Requires-Dist: requests; extra == "language"
+ Requires-Dist: polars>=1.9.0; extra == "language"
+ Requires-Dist: textstat; extra == "language"
+ Requires-Dist: pandas; extra == "language"
+ Requires-Dist: fasttext>=0.9.3; platform_system != "Windows" and extra == "language"
+ Requires-Dist: langcodes>=3.5.0; extra == "language"
+ Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "language"
+ Requires-Dist: numpy<1.29.0,>=1.26.4; extra == "language"
+ Requires-Dist: duckdb>=0.10.1; extra == "language"
+ Requires-Dist: mmh3>=4.1.0; extra == "language"
+ Requires-Dist: xxhash==3.4.1; extra == "language"
+ Requires-Dist: pyyaml>=6.0.2; extra == "language"
+ Requires-Dist: boto3>=1.34.69; extra == "language"
+ Requires-Dist: kubernetes>=30.1.0; extra == "language"
+ Requires-Dist: polars!=1.10.0,!=1.11.0,!=1.12.0,>=1.9.0; extra == "language"
+ Requires-Dist: disjoint-set>=0.8.0; extra == "language"
+ Requires-Dist: scipy<2.0.0,>=1.12.1; extra == "language"
+ Requires-Dist: numpy<1.29.0; extra == "language"
+ Requires-Dist: sentencepiece>=0.2.0; extra == "language"
+ Requires-Dist: mmh3>=4.1.0; extra == "language"
+ Requires-Dist: nltk==3.9.1; extra == "language"
+ Requires-Dist: transformers>=4.38.2; extra == "language"
+ Requires-Dist: torch<=2.5.1,>=2.2.2; extra == "language"
+ Requires-Dist: pandas; extra == "language"
+ Requires-Dist: transformers>=4.38.2; extra == "language"
+ Requires-Dist: data_prep_connector>=0.2.3; extra == "language"
+ Requires-Dist: mmh3==4.1.0; extra == "language"
+ Requires-Dist: xxhash==3.4.1; extra == "language"
+ Requires-Dist: nltk>=3.9.1; extra == "language"
+ Requires-Dist: requests; extra == "language"
+ Requires-Dist: transformers; extra == "language"
+ Requires-Dist: pandas; extra == "language"
+ Requires-Dist: psutil; extra == "language"
+ Requires-Dist: GPUtil; extra == "language"
247
+ Provides-Extra: proglang-select
248
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "proglang-select"
249
+ Provides-Extra: header-cleanser
250
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "header-cleanser"
251
+ Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin" and extra == "header-cleanser"
252
+ Requires-Dist: timeout-timer==0.2.0; extra == "header-cleanser"
253
+ Provides-Extra: license-select
254
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "license-select"
255
+ Provides-Extra: code-quality
256
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "code-quality"
257
+ Requires-Dist: bs4==0.0.2; extra == "code-quality"
258
+ Requires-Dist: transformers>=4.38.2; extra == "code-quality"
259
+ Provides-Extra: code2parquet
260
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "code2parquet"
261
+ Requires-Dist: parameterized; extra == "code2parquet"
262
+ Requires-Dist: pandas; extra == "code2parquet"
263
+ Provides-Extra: profiler
264
+ Requires-Dist: mmh3==4.1.0; extra == "profiler"
265
+ Requires-Dist: xxhash==3.4.1; extra == "profiler"
266
+ Provides-Extra: resize
267
+ Provides-Extra: doc-chunk
268
+ Requires-Dist: docling-core==2.18.0; extra == "doc-chunk"
269
+ Requires-Dist: pydantic>=2.0.0; extra == "doc-chunk"
270
+ Requires-Dist: llama-index-core<0.12.0,>=0.11.22; extra == "doc-chunk"
271
+ Provides-Extra: doc-quality
272
+ Provides-Extra: html2parquet
273
+ Requires-Dist: trafilatura==1.12.0; extra == "html2parquet"
274
+ Provides-Extra: lang-id
275
+ Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "lang-id"
276
+ Requires-Dist: langcodes>=3.3.0; extra == "lang-id"
277
+ Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "lang-id"
278
+ Requires-Dist: numpy==1.26.4; extra == "lang-id"
279
+ Provides-Extra: pdf2parquet
280
+ Requires-Dist: docling-core==2.18.0; extra == "pdf2parquet"
281
+ Requires-Dist: docling-ibm-models==3.3.1; extra == "pdf2parquet"
282
+ Requires-Dist: docling-parse==3.3.0; extra == "pdf2parquet"
283
+ Requires-Dist: deepsearch-glm==1.0.0; extra == "pdf2parquet"
284
+ Requires-Dist: docling==2.21.0; extra == "pdf2parquet"
285
+ Requires-Dist: filetype<2.0.0,>=1.2.0; extra == "pdf2parquet"
286
+ Provides-Extra: text-encoder
287
+ Requires-Dist: sentence-transformers>=3.0.1; extra == "text-encoder"
288
+ Provides-Extra: pii-redactor
289
+ Requires-Dist: presidio-analyzer>=2.2.355; extra == "pii-redactor"
290
+ Requires-Dist: presidio-anonymizer>=2.2.355; extra == "pii-redactor"
291
+ Requires-Dist: flair>=0.14.0; extra == "pii-redactor"
292
+ Requires-Dist: pandas; extra == "pii-redactor"
293
+ Provides-Extra: filter
294
+ Requires-Dist: duckdb>=0.10.1; extra == "filter"
295
+ Provides-Extra: doc-id
296
+ Provides-Extra: hap
297
+ Requires-Dist: nltk==3.9.1; extra == "hap"
298
+ Requires-Dist: transformers>=4.38.2; extra == "hap"
299
+ Requires-Dist: torch<=2.5.1,>=2.2.2; extra == "hap"
300
+ Requires-Dist: pandas; extra == "hap"
301
+ Provides-Extra: ededup
302
+ Requires-Dist: mmh3>=4.1.0; extra == "ededup"
303
+ Requires-Dist: xxhash==3.4.1; extra == "ededup"
304
+ Provides-Extra: fdedup
305
+ Requires-Dist: pyyaml>=6.0.2; extra == "fdedup"
306
+ Requires-Dist: boto3>=1.34.69; extra == "fdedup"
307
+ Requires-Dist: kubernetes>=30.1.0; extra == "fdedup"
308
+ Requires-Dist: polars!=1.10.0,!=1.11.0,!=1.12.0,>=1.9.0; extra == "fdedup"
309
+ Requires-Dist: disjoint-set>=0.8.0; extra == "fdedup"
310
+ Requires-Dist: scipy<2.0.0,>=1.12.1; extra == "fdedup"
311
+ Requires-Dist: numpy<1.29.0; extra == "fdedup"
312
+ Requires-Dist: sentencepiece>=0.2.0; extra == "fdedup"
313
+ Requires-Dist: mmh3>=4.1.0; extra == "fdedup"
314
+ Provides-Extra: tokenization
315
+ Requires-Dist: transformers>=4.38.2; extra == "tokenization"
316
+ Provides-Extra: web2parquet
317
+ Requires-Dist: data_prep_connector>=0.2.3; extra == "web2parquet"
318
+ Provides-Extra: similarity
319
+ Requires-Dist: nltk>=3.9.1; extra == "similarity"
320
+ Requires-Dist: transformers>=4.38.2; extra == "similarity"
321
+ Requires-Dist: pandas; extra == "similarity"
322
+ Requires-Dist: requests; extra == "similarity"
323
+ Provides-Extra: extreme-tokenized
324
+ Requires-Dist: polars>=1.9.0; extra == "extreme-tokenized"
325
+ Provides-Extra: readability
326
+ Requires-Dist: textstat; extra == "readability"
327
+ Requires-Dist: pandas; extra == "readability"
328
+ Provides-Extra: code-profiler
329
+ Requires-Dist: data-prep-toolkit>=0.2.3; extra == "code-profiler"
330
+ Requires-Dist: parameterized>=0.9.0; extra == "code-profiler"
331
+ Requires-Dist: pandas>=2.2.2; extra == "code-profiler"
332
+ Requires-Dist: aiolimiter==1.1.0; extra == "code-profiler"
333
+ Requires-Dist: altair==5.3.0; extra == "code-profiler"
334
+ Requires-Dist: annotated-types==0.7.0; extra == "code-profiler"
335
+ Requires-Dist: anyio==4.4.0; extra == "code-profiler"
336
+ Requires-Dist: appnope==0.1.4; extra == "code-profiler"
337
+ Requires-Dist: asttokens==2.4.1; extra == "code-profiler"
338
+ Requires-Dist: attrs==23.2.0; extra == "code-profiler"
339
+ Requires-Dist: blinker==1.8.2; extra == "code-profiler"
340
+ Requires-Dist: cachetools==5.3.3; extra == "code-profiler"
341
+ Requires-Dist: certifi==2024.7.4; extra == "code-profiler"
342
+ Requires-Dist: charset-normalizer==3.3.2; extra == "code-profiler"
343
+ Requires-Dist: click==8.1.7; extra == "code-profiler"
344
+ Requires-Dist: comm==0.2.2; extra == "code-profiler"
345
+ Requires-Dist: contourpy==1.2.1; extra == "code-profiler"
346
+ Requires-Dist: cycler==0.12.1; extra == "code-profiler"
347
+ Requires-Dist: debugpy==1.8.1; extra == "code-profiler"
348
+ Requires-Dist: decorator==5.1.1; extra == "code-profiler"
349
+ Requires-Dist: Deprecated==1.2.14; extra == "code-profiler"
350
+ Requires-Dist: executing==2.0.1; extra == "code-profiler"
351
+ Requires-Dist: fonttools==4.53.0; extra == "code-profiler"
352
+ Requires-Dist: gitdb==4.0.11; extra == "code-profiler"
353
+ Requires-Dist: GitPython==3.1.43; extra == "code-profiler"
354
+ Requires-Dist: h11==0.14.0; extra == "code-profiler"
355
+ Requires-Dist: htbuilder==0.6.2; extra == "code-profiler"
356
+ Requires-Dist: httpcore==1.0.5; extra == "code-profiler"
357
+ Requires-Dist: httpx==0.27.0; extra == "code-profiler"
358
+ Requires-Dist: httpx-sse==0.4.0; extra == "code-profiler"
359
+ Requires-Dist: ibm-generative-ai==3.0.0; extra == "code-profiler"
360
+ Requires-Dist: idna==3.7; extra == "code-profiler"
361
+ Requires-Dist: ipykernel==6.29.4; extra == "code-profiler"
362
+ Requires-Dist: ipython==8.25.0; extra == "code-profiler"
363
+ Requires-Dist: jedi==0.19.1; extra == "code-profiler"
364
+ Requires-Dist: Jinja2==3.1.4; extra == "code-profiler"
365
+ Requires-Dist: jsonschema==4.22.0; extra == "code-profiler"
366
+ Requires-Dist: jsonschema-specifications==2023.12.1; extra == "code-profiler"
367
+ Requires-Dist: jupyter_client==8.6.2; extra == "code-profiler"
368
+ Requires-Dist: jupyter_core==5.7.2; extra == "code-profiler"
369
+ Requires-Dist: kiwisolver==1.4.5; extra == "code-profiler"
370
+ Requires-Dist: markdown-it-py==3.0.0; extra == "code-profiler"
371
+ Requires-Dist: MarkupSafe==2.1.5; extra == "code-profiler"
372
+ Requires-Dist: matplotlib==3.9.0; extra == "code-profiler"
373
+ Requires-Dist: matplotlib-inline==0.1.7; extra == "code-profiler"
374
+ Requires-Dist: mdurl==0.1.2; extra == "code-profiler"
375
+ Requires-Dist: more-itertools==10.3.0; extra == "code-profiler"
376
+ Requires-Dist: nest-asyncio==1.6.0; extra == "code-profiler"
377
+ Requires-Dist: networkx==3.3; extra == "code-profiler"
378
+ Requires-Dist: numpy==1.26.4; extra == "code-profiler"
379
+ Requires-Dist: packaging==24.0; extra == "code-profiler"
380
+ Requires-Dist: parso==0.8.4; extra == "code-profiler"
381
+ Requires-Dist: pexpect==4.9.0; extra == "code-profiler"
382
+ Requires-Dist: pillow>=10.3.0; extra == "code-profiler"
383
+ Requires-Dist: platformdirs==4.2.2; extra == "code-profiler"
384
+ Requires-Dist: prompt_toolkit==3.0.45; extra == "code-profiler"
385
+ Requires-Dist: protobuf==5.27.2; extra == "code-profiler"
386
+ Requires-Dist: psutil==5.9.8; extra == "code-profiler"
387
+ Requires-Dist: ptyprocess==0.7.0; extra == "code-profiler"
388
+ Requires-Dist: pure-eval==0.2.2; extra == "code-profiler"
389
+ Requires-Dist: pyarrow==16.1.0; extra == "code-profiler"
390
+ Requires-Dist: pydantic>=2.7.4; extra == "code-profiler"
391
+ Requires-Dist: pydantic_core>=2.18.4; extra == "code-profiler"
392
+ Requires-Dist: pydeck==0.9.1; extra == "code-profiler"
393
+ Requires-Dist: Pygments==2.18.0; extra == "code-profiler"
394
+ Requires-Dist: pyparsing==3.1.2; extra == "code-profiler"
395
+ Requires-Dist: python-dateutil==2.9.0.post0; extra == "code-profiler"
396
+ Requires-Dist: pytz==2024.1; extra == "code-profiler"
397
+ Requires-Dist: pyzmq==26.0.3; extra == "code-profiler"
398
+ Requires-Dist: referencing==0.35.1; extra == "code-profiler"
399
+ Requires-Dist: regex==2024.5.15; extra == "code-profiler"
400
+ Requires-Dist: requests==2.32.3; extra == "code-profiler"
401
+ Requires-Dist: rich==13.7.1; extra == "code-profiler"
402
+ Requires-Dist: rpds-py==0.18.1; extra == "code-profiler"
403
+ Requires-Dist: seaborn==0.13.2; extra == "code-profiler"
404
+ Requires-Dist: six==1.16.0; extra == "code-profiler"
405
+ Requires-Dist: smmap==5.0.1; extra == "code-profiler"
406
+ Requires-Dist: sniffio==1.3.1; extra == "code-profiler"
407
+ Requires-Dist: st-annotated-text==4.0.1; extra == "code-profiler"
408
+ Requires-Dist: stack-data==0.6.3; extra == "code-profiler"
409
+ Requires-Dist: streamlit==1.37.0; extra == "code-profiler"
410
+ Requires-Dist: tenacity==8.4.2; extra == "code-profiler"
411
+ Requires-Dist: toml==0.10.2; extra == "code-profiler"
412
+ Requires-Dist: toolz==0.12.1; extra == "code-profiler"
413
+ Requires-Dist: tornado==6.4.1; extra == "code-profiler"
414
+ Requires-Dist: traitlets==5.14.3; extra == "code-profiler"
415
+ Requires-Dist: tree-sitter==0.21.3; extra == "code-profiler"
416
+ Requires-Dist: tree-sitter-cpp==0.22.1; extra == "code-profiler"
417
+ Requires-Dist: tree-sitter-java==0.21.0; extra == "code-profiler"
418
+ Requires-Dist: tree-sitter-languages==1.10.2; extra == "code-profiler"
419
+ Requires-Dist: tree-sitter-php==0.22.5; extra == "code-profiler"
420
+ Requires-Dist: typing_extensions==4.12.2; extra == "code-profiler"
421
+ Requires-Dist: tzdata==2024.1; extra == "code-profiler"
422
+ Requires-Dist: uuid; extra == "code-profiler"
423
+ Requires-Dist: wcwidth==0.2.13; extra == "code-profiler"
424
+ Requires-Dist: wrapt==1.16.0; extra == "code-profiler"
425
+ Requires-Dist: plotly==5.15.0; extra == "code-profiler"
426
+ Provides-Extra: gneissweb-classification
427
+ Requires-Dist: fasttext>=0.9.3; platform_system != "Windows" and extra == "gneissweb-classification"
428
+ Requires-Dist: langcodes>=3.5.0; extra == "gneissweb-classification"
429
+ Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "gneissweb-classification"
430
+ Requires-Dist: numpy<1.29.0,>=1.26.4; extra == "gneissweb-classification"
431
+ Provides-Extra: rep-removal
432
+ Requires-Dist: nltk>=3.9.1; extra == "rep-removal"
433
+ Requires-Dist: requests; extra == "rep-removal"
434
+ Requires-Dist: transformers; extra == "rep-removal"
435
+ Requires-Dist: pandas; extra == "rep-removal"
436
+ Requires-Dist: psutil; extra == "rep-removal"
437
+ Requires-Dist: GPUtil; extra == "rep-removal"
438
+
439
+ # DPK Python Transforms
440
+
441
+ ## Installation
442
+
443
+ The [transforms](https://github.com/IBM/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library available on PyPI and can be installed with pip:
444
+
445
+ `python -m pip install "data-prep-toolkit-transforms[all]"`
446
+ or
447
+ `python -m pip install "data-prep-toolkit-transforms[ray,all]"`
448
+ or
449
+ `python -m pip install "data-prep-toolkit-transforms[language]"`
450
+
451
+
452
+ Installing the Python transforms also installs `data-prep-toolkit`.
453
+
454
+ Installing the Ray transforms also installs `data-prep-toolkit[ray]`.
455
+
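The extras named in the install commands above are declared in this package's metadata as `Provides-Extra` headers, and each `Requires-Dist` line is tied to an extra through an `extra == "..."` marker. The following is a minimal sketch of how that mapping could be read, not part of the toolkit itself: the function name `requirements_by_extra` and the abbreviated `SAMPLE_METADATA` text are illustrative assumptions, with the requirement lines copied from the metadata of this release.

```python
import re
from email.parser import Parser

# Abbreviated, illustrative excerpt; the real METADATA file is much longer.
SAMPLE_METADATA = """\
Name: data-prep-toolkit-transforms
Provides-Extra: lang-id
Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "lang-id"
Requires-Dist: langcodes>=3.3.0; extra == "lang-id"
Provides-Extra: resize
Provides-Extra: filter
Requires-Dist: duckdb>=0.10.1; extra == "filter"

# DPK Python Transforms
"""

# Matches the `extra == "name"` part of an environment marker.
EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def requirements_by_extra(metadata_text):
    """Map each declared extra to the requirement specifiers tied to it."""
    # METADATA uses RFC 822 style headers, so the stdlib email parser can read it.
    msg = Parser().parsestr(metadata_text)
    extras = {name: [] for name in msg.get_all("Provides-Extra", [])}
    for requirement in msg.get_all("Requires-Dist", []):
        match = EXTRA_MARKER.search(requirement)
        if match:
            # Keep only the specifier, dropping the marker after ';'.
            specifier = requirement.split(";")[0].strip()
            extras.setdefault(match.group(1), []).append(specifier)
    return extras


for extra, requirements in requirements_by_extra(SAMPLE_METADATA).items():
    print(extra, requirements)
```

For an installed copy of the package, the same headers should be reachable through `importlib.metadata.metadata("data-prep-toolkit-transforms")`, which also offers `get_all`. An extra such as `resize` that lists no requirements of its own maps to an empty list.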
456
+ ## List of Transforms in current package
457
+
458
+ Note: This list covers the transforms released since data-prep-toolkit-transforms 0.2.1 and may not always be up to date. Users are encouraged to open an issue on GitHub if they find a component that is missing from the list, or one that is listed below but absent from the release they install from PyPI.
459
+
460
+ * code
461
+ * [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/python/README.md)
462
+ * [header_cleanser (not available on macOS)](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/header_cleanser/python/README.md)
463
+ * [code_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_quality/python/README.md)
464
+ * [proglang_select](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/proglang_select/python/README.md)
465
+ * [code_profiler](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_profiler/README.md)
466
+ * language
467
+ * [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_chunk/README.md)
468
+ * [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_quality/README.md)
469
+ * [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/lang_id/README.md)
470
+ * [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/README.md)
471
+ * [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/text_encoder/README.md)
472
+ * [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pii_redactor/python/README.md)
473
+ * universal
474
+ * [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/ededup/README.md)
475
+ * [fdedup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/fdedup/README.md)
476
+ * [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/filter/python/README.md)
477
+ * [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/resize/python/README.md)
478
+ * [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/tokenization/README.md)
479
+ * [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/doc_id/README.md)
480
+ * [web2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/web2parquet/README.md)
481
+
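The macOS note on header_cleanser lines up with the `platform_system != "Darwin"` marker on its `scancode-toolkit` requirement in this package's metadata. The `fasttext` requirement carries a similar `platform_system != "Windows"` marker. The sketch below shows how those two markers resolve, assuming (per PEP 508) that `platform_system` takes the value of `platform.system()`. The table `SKIPPED_ON` and the helper `skipped_requirements` are hypothetical names used only for illustration.

```python
import platform

# Marker-guarded requirements copied from this package's metadata:
# requirement -> platform_system value on which the marker excludes it.
SKIPPED_ON = {
    "scancode-toolkit==32.1.0": "Darwin",  # header-cleanser extra
    "fasttext>=0.9.2": "Windows",          # lang-id extra
}


def skipped_requirements(system=None):
    """Return the guarded requirements that are excluded on `system`."""
    # Default to the platform the interpreter is running on.
    system = platform.system() if system is None else system
    return sorted(req for req, excluded in SKIPPED_ON.items() if system == excluded)


print(skipped_requirements())
```

Only the guarded requirement is excluded. The other requirements of the same extra carry no platform marker and are still installed.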
482
+ ## Release notes
483
+
484
+ ### 1.0.1.dev1
485
+ * Added Gneissweb transforms
486
+ * Fixed fdedup on Windows
487
+ ### 1.0.1.dev0
488
+ * PR #979 (code_profiler)
489
+ ### 1.0.0.a6
490
+ * Added profiler
491
+ * Added resize
492
+ ### 1.0.0.a5
493
+ * Added pii_redactor
494
+ * Relaxed the fasttext requirement to >= 0.9.2
495
+ ### 1.0.0.a4
496
+ * Added the missing Ray implementations for lang_id, doc_quality, tokenization, and filter
497
+ * Added Ray notebooks for lang_id, doc_quality, tokenization, and filter
498
+ ### 1.0.0.a3
499
+ * Added code_profiler
500
+ ### 1.0.0.a2
501
+ * Relaxed the pandas dependency (uses the latest version, or whatever the application has installed)
502
+ * Relaxed the requests dependency (uses the latest version, or whatever the application has installed)
503
+
504
+
505
+
506
+