data-prep-toolkit-transforms 1.0.1.dev1__9-py3-none-any.whl
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data_prep_toolkit_transforms-1.0.1.dev1.dist-info/METADATA +506 -0
- data_prep_toolkit_transforms-1.0.1.dev1.dist-info/RECORD +776 -0
- data_prep_toolkit_transforms-1.0.1.dev1.dist-info/WHEEL +5 -0
- data_prep_toolkit_transforms-1.0.1.dev1.dist-info/top_level.txt +23 -0
- dpk_code_profiler/UAST.py +324 -0
- dpk_code_profiler/UAST_parser.py +315 -0
- dpk_code_profiler/__init__.py +0 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/agda/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/c/0.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/c_sharp/0.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/cpp/0.txt +12 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/d/0.txt +7 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/dart/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/dart/1.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/elm/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/elm/1.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/go/0.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/haskell/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/java/0.txt +7 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/java/1.txt +12 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/js/0.txt +7 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/kotlin/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/kotlin/1.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/nim/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/nim/1.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/objc/0.txt +6 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/ocaml/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/py/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/qmljs/0.txt +6 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/rust/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/scala/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/scala/1.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/ts/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/verilog/0.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_comment/vhdl/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/agda/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/c/0.txt +6 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/c_sharp/0.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/cpp/0.txt +10 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/cpp/1.txt +46 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/cpp/2.txt +75 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/cpp/3.txt +99 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/d/0.txt +8 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/dart/0.txt +9 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/elm/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/go/0.txt +20 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/haskell/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/java/0.txt +8 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/js/0.txt +10 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/kotlin/0.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/nim/0.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/objc/0.txt +14 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/perl/0.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/py/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/rust/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/scala/0.txt +6 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/scala/1.txt +14 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/ts/0.txt +14 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/verilog/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_function/vhdl/0.txt +22 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/agda/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/c/0.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/c_sharp/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/c_sharp/1.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/0.txt +8 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/1.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/2.txt +25 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/3.txt +25 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/4.txt +29 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/5.txt +29 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/cpp/6.txt +29 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/d/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/dart/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/elm/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/go/0.txt +26 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/haskell/0.txt +11 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/java/0.txt +10 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/js/0.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/kotlin/0.txt +3 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/nim/0.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/nim/1.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/nim/2.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/objc/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/objc/1.txt +4 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/ocaml/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/perl/0.txt +6 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/py/0.txt +5 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/py/1.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/qmljs/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/rust/0.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/scala/0.txt +6 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/scala/1.txt +30 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/ts/0.txt +14 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/verilog/0.txt +1 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/verilog/1.txt +2 -0
- dpk_code_profiler/data/Concept_dataset/uast_package/vhdl/0.txt +2 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/agda/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/c/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/c_sharp/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/cpp/0.txt +11 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/d/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/dart/0.txt +11 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/dart/1.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/elm/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/elm/1.txt +8 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/go/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/haskell/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/java/0.txt +7 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/java/1.txt +7 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/js/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/kotlin/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/kotlin/1.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/nim/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/nim/1.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/objc/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/ocaml/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/py/0.txt +7 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/qmljs/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/rust/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/scala/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/scala/1.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/ts/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/verilog/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_comment/vhdl/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/agda/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/c/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/c_sharp/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/0.txt +9 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/1.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/2.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/cpp/3.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/d/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/dart/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/elm/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/elm/1.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/go/0.txt +9 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/haskell/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/java/0.txt +9 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/js/0.txt +9 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/kotlin/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/nim/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/objc/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/perl/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/py/0.txt +9 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/rust/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/scala/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/scala/1.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/ts/0.txt +11 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/verilog/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_function/vhdl/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/agda/0.txt +16 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/c/0.txt +26 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/c_sharp/0.txt +18 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/c_sharp/1.txt +20 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/1.txt +25 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/2.txt +25 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/3.txt +25 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/4.txt +28 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/5.txt +28 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/cpp/6.txt +28 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/d/0.txt +17 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/dart/0.txt +15 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/elm/0.txt +20 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/go/0.txt +23 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/haskell/0.txt +28 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/java/0.txt +13 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/js/0.txt +10 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/kotlin/0.txt +20 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/nim/0.txt +45 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/nim/1.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/nim/2.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/objc/0.txt +15 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/objc/1.txt +19 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/ocaml/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/perl/0.txt +24 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/py/0.txt +38 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/py/1.txt +11 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/qmljs/0.txt +14 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/rust/0.txt +18 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/scala/0.txt +45 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/scala/1.txt +24 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/ts/0.txt +27 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/verilog/0.txt +12 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/verilog/1.txt +19 -0
- dpk_code_profiler/data/few_shot_outputs/uast_package/vhdl/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/agda/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/agda/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/agda/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/agda/test_code.txt +5 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c/test_code.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/c_sharp/test_code.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/comment/cpp/0.txt +11 -0
- dpk_code_profiler/data/final_UI_outputs/comment/d/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/comment/d/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/d/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/d/test_code.txt +7 -0
- dpk_code_profiler/data/final_UI_outputs/comment/dart/0.txt +11 -0
- dpk_code_profiler/data/final_UI_outputs/comment/dart/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/dart/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/dart/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/elm/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/elm/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/elm/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/elm/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/go/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/go/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/go/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/go/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/comment/haskell/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/haskell/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/haskell/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/haskell/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/java/0.txt +7 -0
- dpk_code_profiler/data/final_UI_outputs/comment/java/1.txt +7 -0
- dpk_code_profiler/data/final_UI_outputs/comment/js/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/js/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/kotlin/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/kotlin/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/kotlin/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/kotlin/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/nim/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/nim/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/nim/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/nim/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/objc/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/objc/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/objc/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/objc/test_code.txt +6 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ocaml/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ocaml/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ocaml/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ocaml/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/py/0.txt +7 -0
- dpk_code_profiler/data/final_UI_outputs/comment/qmljs/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/qmljs/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/qmljs/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/qmljs/test_code.txt +6 -0
- dpk_code_profiler/data/final_UI_outputs/comment/rust/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/rust/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/rust/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/rust/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/scala/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/scala/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/scala/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/scala/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ts/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ts/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ts/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/ts/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/verilog/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/comment/verilog/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/verilog/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/verilog/test_code.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/comment/vhdl/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/comment/vhdl/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/comment/vhdl/prompt.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/comment/vhdl/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/function/agda/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/agda/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/agda/prompt.txt +23 -0
- dpk_code_profiler/data/final_UI_outputs/function/agda/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/c/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/c/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/c/prompt.txt +21 -0
- dpk_code_profiler/data/final_UI_outputs/function/c/test_code.txt +6 -0
- dpk_code_profiler/data/final_UI_outputs/function/c_sharp/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/c_sharp/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/c_sharp/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/c_sharp/test_code.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/0.txt +9 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/1.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/2.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/3.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/example_languages_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/example_languages_2.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/example_languages_3.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/prompt_1.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/prompt_2.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/prompt_3.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/test_code_1.txt +46 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/test_code_2.txt +75 -0
- dpk_code_profiler/data/final_UI_outputs/function/cpp/test_code_3.txt +99 -0
- dpk_code_profiler/data/final_UI_outputs/function/d/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/d/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/d/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/d/test_code.txt +8 -0
- dpk_code_profiler/data/final_UI_outputs/function/dart/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/dart/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/dart/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/dart/test_code.txt +9 -0
- dpk_code_profiler/data/final_UI_outputs/function/elm/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/elm/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/elm/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/elm/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/function/go/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/go/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/go/prompt.txt +23 -0
- dpk_code_profiler/data/final_UI_outputs/function/go/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/function/haskell/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/haskell/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/haskell/prompt.txt +21 -0
- dpk_code_profiler/data/final_UI_outputs/function/haskell/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/function/java/0.txt +9 -0
- dpk_code_profiler/data/final_UI_outputs/function/js/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/js/prompt.txt +22 -0
- dpk_code_profiler/data/final_UI_outputs/function/kotlin/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/kotlin/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/kotlin/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/kotlin/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/function/nim/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/nim/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/nim/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/nim/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/function/objc/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/objc/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/objc/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/objc/test_code.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/function/perl/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/perl/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/perl/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/perl/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/function/py/0.txt +9 -0
- dpk_code_profiler/data/final_UI_outputs/function/rust/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/rust/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/rust/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/function/rust/test_code.txt +5 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/1.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/example_languages_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/prompt_1.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/test_code.txt +6 -0
- dpk_code_profiler/data/final_UI_outputs/function/scala/test_code_1.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/function/ts/0.txt +11 -0
- dpk_code_profiler/data/final_UI_outputs/function/ts/prompt.txt +21 -0
- dpk_code_profiler/data/final_UI_outputs/function/verilog/0.txt +10 -0
- dpk_code_profiler/data/final_UI_outputs/function/verilog/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/verilog/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/verilog/test_code.txt +5 -0
- dpk_code_profiler/data/final_UI_outputs/function/vhdl/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/function/vhdl/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/function/vhdl/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/function/vhdl/test_code.txt +22 -0
- dpk_code_profiler/data/final_UI_outputs/package/agda/0.txt +16 -0
- dpk_code_profiler/data/final_UI_outputs/package/agda/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/agda/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/agda/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/c/0.txt +26 -0
- dpk_code_profiler/data/final_UI_outputs/package/c/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/c/prompt.txt +32 -0
- dpk_code_profiler/data/final_UI_outputs/package/c/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/0.txt +18 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/1.txt +20 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/example_languages_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/prompt_1.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/c_sharp/test_code_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/0.txt +7 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/1.txt +25 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/2.txt +25 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/3.txt +25 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/4.txt +28 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/5.txt +28 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/6.txt +28 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_2.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_3.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_4.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_5.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/example_languages_6.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt.txt +0 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_1.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_2.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_3.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_4.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_5.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/prompt_6.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_1.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_2.txt +25 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_3.txt +25 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_4.txt +29 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_5.txt +29 -0
- dpk_code_profiler/data/final_UI_outputs/package/cpp/test_code_6.txt +29 -0
- dpk_code_profiler/data/final_UI_outputs/package/d/0.txt +17 -0
- dpk_code_profiler/data/final_UI_outputs/package/d/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/d/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/d/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/dart/0.txt +15 -0
- dpk_code_profiler/data/final_UI_outputs/package/dart/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/dart/prompt.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/dart/test_code.txt +5 -0
- dpk_code_profiler/data/final_UI_outputs/package/elm/0.txt +20 -0
- dpk_code_profiler/data/final_UI_outputs/package/elm/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/elm/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/elm/test_code.txt +5 -0
- dpk_code_profiler/data/final_UI_outputs/package/go/0.txt +23 -0
- dpk_code_profiler/data/final_UI_outputs/package/go/prompt.txt +0 -0
- dpk_code_profiler/data/final_UI_outputs/package/haskell/0.txt +28 -0
- dpk_code_profiler/data/final_UI_outputs/package/haskell/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/haskell/prompt.txt +32 -0
- dpk_code_profiler/data/final_UI_outputs/package/haskell/test_code.txt +11 -0
- dpk_code_profiler/data/final_UI_outputs/package/java/0.txt +13 -0
- dpk_code_profiler/data/final_UI_outputs/package/java/prompt.txt +0 -0
- dpk_code_profiler/data/final_UI_outputs/package/js/0.txt +16 -0
- dpk_code_profiler/data/final_UI_outputs/package/js/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/js/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/package/js/test_code.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/package/kotlin/0.txt +20 -0
- dpk_code_profiler/data/final_UI_outputs/package/kotlin/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/kotlin/prompt.txt +32 -0
- dpk_code_profiler/data/final_UI_outputs/package/kotlin/test_code.txt +3 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/0.txt +45 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/1.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/2.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/test_code.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/test_code_0.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/test_code_1.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/package/nim/test_code_2.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/package/objc/0.txt +15 -0
- dpk_code_profiler/data/final_UI_outputs/package/objc/1.txt +19 -0
- dpk_code_profiler/data/final_UI_outputs/package/objc/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/objc/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/objc/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/objc/test_code_1.txt +4 -0
- dpk_code_profiler/data/final_UI_outputs/package/ocaml/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/package/ocaml/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/ocaml/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/ocaml/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/package/perl/0.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/package/perl/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/perl/prompt.txt +32 -0
- dpk_code_profiler/data/final_UI_outputs/package/perl/test_code.txt +6 -0
- dpk_code_profiler/data/final_UI_outputs/package/py/0.txt +47 -0
- dpk_code_profiler/data/final_UI_outputs/package/py/1.txt +17 -0
- dpk_code_profiler/data/final_UI_outputs/package/py/prompt.txt +0 -0
- dpk_code_profiler/data/final_UI_outputs/package/qmljs/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/package/qmljs/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/qmljs/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/qmljs/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/rust/0.txt +18 -0
- dpk_code_profiler/data/final_UI_outputs/package/rust/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/rust/prompt.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/rust/test_code.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/0.txt +45 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/1.txt +24 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/example_languages_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/prompt_1.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/test_code.txt +5 -0
- dpk_code_profiler/data/final_UI_outputs/package/scala/test_code_1.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/package/ts/0.txt +27 -0
- dpk_code_profiler/data/final_UI_outputs/package/ts/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/ts/prompt.txt +30 -0
- dpk_code_profiler/data/final_UI_outputs/package/ts/test_code.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/package/verilog/0.txt +12 -0
- dpk_code_profiler/data/final_UI_outputs/package/verilog/1.txt +19 -0
- dpk_code_profiler/data/final_UI_outputs/package/verilog/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/verilog/prompt.txt +31 -0
- dpk_code_profiler/data/final_UI_outputs/package/verilog/test_code.txt +1 -0
- dpk_code_profiler/data/final_UI_outputs/package/verilog/test_code_1.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/vhdl/0.txt +14 -0
- dpk_code_profiler/data/final_UI_outputs/package/vhdl/example_languages.txt +2 -0
- dpk_code_profiler/data/final_UI_outputs/package/vhdl/prompt.txt +33 -0
- dpk_code_profiler/data/final_UI_outputs/package/vhdl/test_code.txt +2 -0
- dpk_code_profiler/data/helper.ipynb +165 -0
- dpk_code_profiler/data/prompts/comment.txt +24 -0
- dpk_code_profiler/data/prompts/function.txt +31 -0
- dpk_code_profiler/data/prompts/package.txt +33 -0
- dpk_code_profiler/grammar/UAST_Grammar.json +20 -0
- dpk_code_profiler/higher_order_concepts.py +63 -0
- dpk_code_profiler/local.py +68 -0
- dpk_code_profiler/local_python.py +47 -0
- dpk_code_profiler/offline-customizations/cached_requirements.json +198 -0
- dpk_code_profiler/offline-customizations/config_LLM_runner_app.py +21 -0
- dpk_code_profiler/offline-customizations/generic_LLM_runner_app.py +655 -0
- dpk_code_profiler/output_data-prep-kit.sl.cloud9.ibm.com_20250129-062407-115.html +205 -0
- dpk_code_profiler/output_data-prep-kit.sl.cloud9.ibm.com_20250129-062407-115.json +596 -0
- dpk_code_profiler/profiler-report/template.html +107 -0
- dpk_code_profiler/profiler_report.py +195 -0
- dpk_code_profiler/ray/local.py +59 -0
- dpk_code_profiler/ray/s3.py +56 -0
- dpk_code_profiler/ray/transform.py +49 -0
- dpk_code_profiler/ruleset/UAST_rules_agda.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_c.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_c_sharp.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_cpp.json +22 -0
- dpk_code_profiler/ruleset/UAST_rules_d.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_dart.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_elm.json +18 -0
- dpk_code_profiler/ruleset/UAST_rules_go.json +10 -0
- dpk_code_profiler/ruleset/UAST_rules_haskell.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_java.json +22 -0
- dpk_code_profiler/ruleset/UAST_rules_js.json +10 -0
- dpk_code_profiler/ruleset/UAST_rules_kotlin.json +18 -0
- dpk_code_profiler/ruleset/UAST_rules_nim.json +26 -0
- dpk_code_profiler/ruleset/UAST_rules_objc.json +18 -0
- dpk_code_profiler/ruleset/UAST_rules_ocaml.json +10 -0
- dpk_code_profiler/ruleset/UAST_rules_perl.json +10 -0
- dpk_code_profiler/ruleset/UAST_rules_py.json +26 -0
- dpk_code_profiler/ruleset/UAST_rules_qmljs.json +10 -0
- dpk_code_profiler/ruleset/UAST_rules_rust.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_scala.json +18 -0
- dpk_code_profiler/ruleset/UAST_rules_ts.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_typescript.json +14 -0
- dpk_code_profiler/ruleset/UAST_rules_verilog.json +18 -0
- dpk_code_profiler/ruleset/UAST_rules_vhdl.json +14 -0
- dpk_code_profiler/semantic-ruleset/ikb_model.csv +2002 -0
- dpk_code_profiler/semantic-ruleset/null_libs.csv +10105 -0
- dpk_code_profiler/semantic-ruleset/offline-ikb-builder/concept_list.csv +14 -0
- dpk_code_profiler/semantic-ruleset/offline-ikb-builder/examples/examples-i.csv +27 -0
- dpk_code_profiler/semantic-ruleset/offline-ikb-builder/examples/examples-o.csv +27 -0
- dpk_code_profiler/semantic-ruleset/offline-ikb-builder/generate_ikb.py +178 -0
- dpk_code_profiler/semantic-ruleset/offline-ikb-builder/watsonxai.py +32 -0
- dpk_code_profiler/semantic_concepts.py +112 -0
- dpk_code_profiler/template.html +107 -0
- dpk_code_profiler/tool_utils/aggregate_report.py +57 -0
- dpk_code_profiler/tool_utils/aggregated_output_wca_ept_1.json +67757 -0
- dpk_code_profiler/tool_utils/report_stats_generation.py +105 -0
- dpk_code_profiler/transform.py +371 -0
- dpk_code_profiler/transform_python.py +49 -0
- dpk_doc_chunk/__init__.py +1 -0
- dpk_doc_chunk/chunkers.py +138 -0
- dpk_doc_chunk/local.py +34 -0
- dpk_doc_chunk/local_python.py +56 -0
- dpk_doc_chunk/ray/__init__.py +0 -0
- dpk_doc_chunk/ray/local.py +50 -0
- dpk_doc_chunk/ray/s3.py +57 -0
- dpk_doc_chunk/ray/transform.py +81 -0
- dpk_doc_chunk/transform.py +254 -0
- dpk_doc_chunk/transform_python.py +69 -0
- dpk_doc_id/__init__.py +4 -0
- dpk_doc_id/local.py +57 -0
- dpk_doc_id/local_python.py +54 -0
- dpk_doc_id/ray/__init__.py +0 -0
- dpk_doc_id/ray/local.py +59 -0
- dpk_doc_id/ray/s3.py +62 -0
- dpk_doc_id/ray/transform.py +143 -0
- dpk_doc_id/spark/__init__.py +0 -0
- dpk_doc_id/spark/local.py +52 -0
- dpk_doc_id/spark/transform.py +185 -0
- dpk_doc_id/transform.py +178 -0
- dpk_doc_id/transform_python.py +143 -0
- dpk_doc_quality/__init__.py +4 -0
- dpk_doc_quality/cc_net_prepro.py +168 -0
- dpk_doc_quality/doc_Gopher_statistics.py +158 -0
- dpk_doc_quality/doc_c4_statistics.py +167 -0
- dpk_doc_quality/ldnoobw/de +66 -0
- dpk_doc_quality/ldnoobw/en +403 -0
- dpk_doc_quality/ldnoobw/es +68 -0
- dpk_doc_quality/ldnoobw/fr +91 -0
- dpk_doc_quality/ldnoobw/ja +180 -0
- dpk_doc_quality/ldnoobw/pt +76 -0
- dpk_doc_quality/local.py +43 -0
- dpk_doc_quality/local_python.py +61 -0
- dpk_doc_quality/ray/__init__.py +0 -0
- dpk_doc_quality/ray/local.py +59 -0
- dpk_doc_quality/ray/s3.py +71 -0
- dpk_doc_quality/ray/transform.py +84 -0
- dpk_doc_quality/transform.py +241 -0
- dpk_doc_quality/transform_python.py +83 -0
- dpk_doc_quality/utils.py +67 -0
- dpk_ededup/__init__.py +1 -0
- dpk_ededup/local.py +46 -0
- dpk_ededup/local_python.py +49 -0
- dpk_ededup/local_python_incremental.py +53 -0
- dpk_ededup/ray/__init__.py +0 -0
- dpk_ededup/ray/cluster_estimator.py +59 -0
- dpk_ededup/ray/local.py +61 -0
- dpk_ededup/ray/local_incremental.py +65 -0
- dpk_ededup/ray/s3.py +64 -0
- dpk_ededup/ray/transform.py +273 -0
- dpk_ededup/transform_base.py +248 -0
- dpk_ededup/transform_python.py +171 -0
- dpk_extreme_tokenized/__init__.py +1 -0
- dpk_extreme_tokenized/common.py +47 -0
- dpk_extreme_tokenized/ray/__init__.py +0 -0
- dpk_extreme_tokenized/ray/runtime.py +55 -0
- dpk_extreme_tokenized/runtime.py +123 -0
- dpk_extreme_tokenized/transform.py +125 -0
- dpk_fdedup/Murmur_MH.py +112 -0
- dpk_fdedup/cluster_analysis/local_python.py +50 -0
- dpk_fdedup/cluster_analysis/ray/cluster_estimator.py +99 -0
- dpk_fdedup/cluster_analysis/ray/local.py +53 -0
- dpk_fdedup/cluster_analysis/ray/transform.py +74 -0
- dpk_fdedup/cluster_analysis/spark/local.py +49 -0
- dpk_fdedup/cluster_analysis/spark/transform.py +75 -0
- dpk_fdedup/cluster_analysis/transform.py +342 -0
- dpk_fdedup/cluster_analysis/transform_python.py +76 -0
- dpk_fdedup/data_cleaning/local_python.py +60 -0
- dpk_fdedup/data_cleaning/ray/local.py +69 -0
- dpk_fdedup/data_cleaning/ray/transform.py +138 -0
- dpk_fdedup/data_cleaning/spark/local.py +61 -0
- dpk_fdedup/data_cleaning/spark/transform.py +124 -0
- dpk_fdedup/data_cleaning/transform.py +179 -0
- dpk_fdedup/data_cleaning/transform_python.py +103 -0
- dpk_fdedup/get_duplicate_list/ray/transform.py +69 -0
- dpk_fdedup/get_duplicate_list/transform.py +173 -0
- dpk_fdedup/get_duplicate_list/transform_local_python.py +46 -0
- dpk_fdedup/get_duplicate_list/transform_python.py +71 -0
- dpk_fdedup/ray/transform.py +92 -0
- dpk_fdedup/signature_calc/local_python.py +51 -0
- dpk_fdedup/signature_calc/ray/local.py +54 -0
- dpk_fdedup/signature_calc/ray/transform.py +43 -0
- dpk_fdedup/signature_calc/spark/local.py +50 -0
- dpk_fdedup/signature_calc/spark/transform.py +42 -0
- dpk_fdedup/signature_calc/transform.py +517 -0
- dpk_fdedup/signature_calc/transform_python.py +44 -0
- dpk_fdedup/spark/transform.py +62 -0
- dpk_fdedup/transform_python.py +289 -0
- dpk_filter/__init__.py +1 -0
- dpk_filter/local.py +58 -0
- dpk_filter/local_python.py +60 -0
- dpk_filter/ray/__init__.py +0 -0
- dpk_filter/ray/local.py +71 -0
- dpk_filter/ray/s3.py +74 -0
- dpk_filter/ray/transform.py +63 -0
- dpk_filter/spark/local.py +60 -0
- dpk_filter/spark/transform.py +41 -0
- dpk_filter/test_support.py +135 -0
- dpk_filter/transform.py +192 -0
- dpk_filter/transform_python.py +56 -0
- dpk_gneissweb_classification/classification_models.py +63 -0
- dpk_gneissweb_classification/local.py +48 -0
- dpk_gneissweb_classification/local_python.py +54 -0
- dpk_gneissweb_classification/nlp.py +46 -0
- dpk_gneissweb_classification/ray/local.py +64 -0
- dpk_gneissweb_classification/ray/s3.py +73 -0
- dpk_gneissweb_classification/ray/transform.py +75 -0
- dpk_gneissweb_classification/transform.py +171 -0
- dpk_gneissweb_classification/transform_python.py +66 -0
- dpk_hap/__init__.py +4 -0
- dpk_hap/local.py +51 -0
- dpk_hap/local_python.py +54 -0
- dpk_hap/ray/__init__.py +0 -0
- dpk_hap/ray/local.py +58 -0
- dpk_hap/ray/s3.py +64 -0
- dpk_hap/ray/transform.py +40 -0
- dpk_hap/transform.py +186 -0
- dpk_hap/transform_python.py +65 -0
- dpk_html2parquet/__init__.py +4 -0
- dpk_html2parquet/local.py +35 -0
- dpk_html2parquet/local_python.py +46 -0
- dpk_html2parquet/ray/__init__.py +0 -0
- dpk_html2parquet/ray/local_ray.py +55 -0
- dpk_html2parquet/ray/s3_ray.py +57 -0
- dpk_html2parquet/ray/transform.py +60 -0
- dpk_html2parquet/transform.py +270 -0
- dpk_html2parquet/transform_python.py +66 -0
- dpk_lang_id/lang_models.py +52 -0
- dpk_lang_id/local.py +49 -0
- dpk_lang_id/local_python.py +55 -0
- dpk_lang_id/nlp.py +46 -0
- dpk_lang_id/ray/local.py +65 -0
- dpk_lang_id/ray/s3.py +71 -0
- dpk_lang_id/ray/transform.py +73 -0
- dpk_lang_id/transform.py +146 -0
- dpk_lang_id/transform_python.py +66 -0
- dpk_pdf2parquet/.gitignore +39 -0
- dpk_pdf2parquet/__init__.py +1 -0
- dpk_pdf2parquet/local.py +39 -0
- dpk_pdf2parquet/local_python.py +56 -0
- dpk_pdf2parquet/ray/.gitignore +39 -0
- dpk_pdf2parquet/ray/__init__.py +0 -0
- dpk_pdf2parquet/ray/local_ray.py +55 -0
- dpk_pdf2parquet/ray/s3_ray.py +60 -0
- dpk_pdf2parquet/ray/transform.py +102 -0
- dpk_pdf2parquet/transform.py +498 -0
- dpk_pdf2parquet/transform_python.py +66 -0
- dpk_pii_redactor/__init__.py +1 -0
- dpk_pii_redactor/flair_recognizer.py +160 -0
- dpk_pii_redactor/local.py +35 -0
- dpk_pii_redactor/local_python.py +37 -0
- dpk_pii_redactor/pii_analyzer.py +83 -0
- dpk_pii_redactor/pii_anonymizer.py +38 -0
- dpk_pii_redactor/ray/__init__.py +0 -0
- dpk_pii_redactor/ray/local.py +54 -0
- dpk_pii_redactor/ray/s3.py +59 -0
- dpk_pii_redactor/ray/transform.py +66 -0
- dpk_pii_redactor/transform.py +162 -0
- dpk_pii_redactor/transform_python.py +56 -0
- dpk_profiler/__init__.py +2 -0
- dpk_profiler/base_tokenizer.py +36 -0
- dpk_profiler/local.py +44 -0
- dpk_profiler/local_python.py +45 -0
- dpk_profiler/ray/__init__.py +0 -0
- dpk_profiler/ray/local.py +52 -0
- dpk_profiler/ray/runtime.py +244 -0
- dpk_profiler/ray/s3.py +55 -0
- dpk_profiler/runtime.py +152 -0
- dpk_profiler/spark/__init__.py +0 -0
- dpk_profiler/spark/local.py +46 -0
- dpk_profiler/spark/runtime.py +108 -0
- dpk_profiler/transform_base.py +176 -0
- dpk_readability/__init__.py +1 -0
- dpk_readability/common.py +85 -0
- dpk_readability/ray/__init__.py +0 -0
- dpk_readability/ray/runtime.py +57 -0
- dpk_readability/runtime.py +173 -0
- dpk_readability/transform.py +171 -0
- dpk_rep_removal/__init__.py +1 -0
- dpk_rep_removal/dedup_Rust_scripts.py +101 -0
- dpk_rep_removal/dedup_pq_level.py +203 -0
- dpk_rep_removal/gpt2/merges.txt +50001 -0
- dpk_rep_removal/gpt2/special_tokens_map.json +23 -0
- dpk_rep_removal/gpt2/tokenizer_config.json +33 -0
- dpk_rep_removal/gpt2/vocab.json +50259 -0
- dpk_rep_removal/make_suffix_array.py +177 -0
- dpk_rep_removal/ray/__init__.py +0 -0
- dpk_rep_removal/ray/runtime.py +54 -0
- dpk_rep_removal/runtime.py +140 -0
- dpk_rep_removal/rust/Cargo.toml +19 -0
- dpk_rep_removal/rust/src/main.rs +1279 -0
- dpk_rep_removal/rust/src/table.rs +940 -0
- dpk_rep_removal/rust/target/release/dedup_dataset +0 -0
- dpk_rep_removal/rust/target/release/dedup_dataset.d +1 -0
- dpk_rep_removal/transform.py +103 -0
- dpk_rep_removal/utils.py +316 -0
- dpk_resize/__init__.py +2 -0
- dpk_resize/local.py +36 -0
- dpk_resize/local_python.py +46 -0
- dpk_resize/ray/__init__.py +0 -0
- dpk_resize/ray/local.py +51 -0
- dpk_resize/ray/runtime.py +73 -0
- dpk_resize/ray/s3.py +57 -0
- dpk_resize/runtime.py +64 -0
- dpk_resize/spark/__init__.py +0 -0
- dpk_resize/spark/local.py +47 -0
- dpk_resize/spark/runtime.py +39 -0
- dpk_resize/transform.py +193 -0
- dpk_similarity/__init__.py +1 -0
- dpk_similarity/data/result_list.json +73 -0
- dpk_similarity/local.py +50 -0
- dpk_similarity/local_python.py +50 -0
- dpk_similarity/ray/__init__.py +0 -0
- dpk_similarity/transform.py +356 -0
- dpk_similarity/transform_python.py +38 -0
- dpk_text_encoder/__init__.py +0 -0
- dpk_text_encoder/local.py +44 -0
- dpk_text_encoder/local_python.py +44 -0
- dpk_text_encoder/ray/__init__.py +0 -0
- dpk_text_encoder/ray/local.py +50 -0
- dpk_text_encoder/ray/s3.py +56 -0
- dpk_text_encoder/ray/transform.py +75 -0
- dpk_text_encoder/transform.py +127 -0
- dpk_text_encoder/transform_python.py +68 -0
- dpk_tokenization/local.py +40 -0
- dpk_tokenization/local_long_doc.py +49 -0
- dpk_tokenization/ray/local.py +49 -0
- dpk_tokenization/ray/s3.py +59 -0
- dpk_tokenization/ray/transform.py +62 -0
- dpk_tokenization/s3_long_doc.py +52 -0
- dpk_tokenization/transform.py +258 -0
- dpk_tokenization/transform_python.py +53 -0
- dpk_tokenization/utils.py +143 -0
- dpk_tokenization2arrow/transform.py +168 -0
- dpk_tokenization2arrow/transform_python.py +53 -0
- dpk_tokenization2arrow/transform_ray.py +62 -0
- dpk_web2parquet/config.py +81 -0
- dpk_web2parquet/local.py +26 -0
- dpk_web2parquet/local_python.py +49 -0
- dpk_web2parquet/python_runtime.py +44 -0
- dpk_web2parquet/transform.py +126 -0
- dpk_web2parquet/utils.py +38 -0
@@ -0,0 +1,506 @@
+Metadata-Version: 2.2
+Name: data_prep_toolkit_transforms
+Version: 1.0.1.dev1
+Summary: Data Preparation Toolkit Transforms using Ray
+Author-email: Maroun Touma <touma@us.ibm.com>
+License: Apache-2.0
+Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
+Requires-Python: <3.13,>=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: data-prep-toolkit>=0.2.4.dev0
+Provides-Extra: dev
+Requires-Dist: twine; extra == "dev"
+Requires-Dist: pytest>=7.3.2; extra == "dev"
+Requires-Dist: pytest-dotenv>=0.5.2; extra == "dev"
+Requires-Dist: pytest-env>=1.0.0; extra == "dev"
+Requires-Dist: pre-commit>=3.3.2; extra == "dev"
+Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
+Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
+Requires-Dist: moto==5.0.5; extra == "dev"
+Requires-Dist: markupsafe==2.0.1; extra == "dev"
+Provides-Extra: ray
+Requires-Dist: data-prep-toolkit[ray]>=0.2.4.dev0; extra == "ray"
+Requires-Dist: networkx==3.3; extra == "ray"
+Requires-Dist: colorlog==6.8.2; extra == "ray"
+Requires-Dist: func-timeout==4.3.5; extra == "ray"
+Requires-Dist: emerge-viz==2.0.0; extra == "ray"
+Provides-Extra: all
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin" and extra == "all"
+Requires-Dist: timeout-timer==0.2.0; extra == "all"
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+Requires-Dist: bs4==0.0.2; extra == "all"
+Requires-Dist: transformers>=4.38.2; extra == "all"
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+Requires-Dist: parameterized; extra == "all"
+Requires-Dist: pandas; extra == "all"
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "all"
+Requires-Dist: parameterized>=0.9.0; extra == "all"
+Requires-Dist: pandas>=2.2.2; extra == "all"
+Requires-Dist: aiolimiter==1.1.0; extra == "all"
+Requires-Dist: altair==5.3.0; extra == "all"
+Requires-Dist: annotated-types==0.7.0; extra == "all"
+Requires-Dist: anyio==4.4.0; extra == "all"
+Requires-Dist: appnope==0.1.4; extra == "all"
+Requires-Dist: asttokens==2.4.1; extra == "all"
+Requires-Dist: attrs==23.2.0; extra == "all"
+Requires-Dist: blinker==1.8.2; extra == "all"
+Requires-Dist: cachetools==5.3.3; extra == "all"
+Requires-Dist: certifi==2024.7.4; extra == "all"
+Requires-Dist: charset-normalizer==3.3.2; extra == "all"
+Requires-Dist: click==8.1.7; extra == "all"
+Requires-Dist: comm==0.2.2; extra == "all"
+Requires-Dist: contourpy==1.2.1; extra == "all"
+Requires-Dist: cycler==0.12.1; extra == "all"
+Requires-Dist: debugpy==1.8.1; extra == "all"
+Requires-Dist: decorator==5.1.1; extra == "all"
+Requires-Dist: Deprecated==1.2.14; extra == "all"
+Requires-Dist: executing==2.0.1; extra == "all"
+Requires-Dist: fonttools==4.53.0; extra == "all"
+Requires-Dist: gitdb==4.0.11; extra == "all"
+Requires-Dist: GitPython==3.1.43; extra == "all"
+Requires-Dist: h11==0.14.0; extra == "all"
+Requires-Dist: htbuilder==0.6.2; extra == "all"
+Requires-Dist: httpcore==1.0.5; extra == "all"
+Requires-Dist: httpx==0.27.0; extra == "all"
+Requires-Dist: httpx-sse==0.4.0; extra == "all"
+Requires-Dist: ibm-generative-ai==3.0.0; extra == "all"
+Requires-Dist: idna==3.7; extra == "all"
+Requires-Dist: ipykernel==6.29.4; extra == "all"
+Requires-Dist: ipython==8.25.0; extra == "all"
+Requires-Dist: jedi==0.19.1; extra == "all"
+Requires-Dist: Jinja2==3.1.4; extra == "all"
+Requires-Dist: jsonschema==4.22.0; extra == "all"
+Requires-Dist: jsonschema-specifications==2023.12.1; extra == "all"
+Requires-Dist: jupyter_client==8.6.2; extra == "all"
+Requires-Dist: jupyter_core==5.7.2; extra == "all"
+Requires-Dist: kiwisolver==1.4.5; extra == "all"
+Requires-Dist: markdown-it-py==3.0.0; extra == "all"
+Requires-Dist: MarkupSafe==2.1.5; extra == "all"
+Requires-Dist: matplotlib==3.9.0; extra == "all"
+Requires-Dist: matplotlib-inline==0.1.7; extra == "all"
+Requires-Dist: mdurl==0.1.2; extra == "all"
+Requires-Dist: more-itertools==10.3.0; extra == "all"
+Requires-Dist: nest-asyncio==1.6.0; extra == "all"
+Requires-Dist: networkx==3.3; extra == "all"
+Requires-Dist: numpy==1.26.4; extra == "all"
+Requires-Dist: packaging==24.0; extra == "all"
+Requires-Dist: parso==0.8.4; extra == "all"
+Requires-Dist: pexpect==4.9.0; extra == "all"
+Requires-Dist: pillow>=10.3.0; extra == "all"
+Requires-Dist: platformdirs==4.2.2; extra == "all"
+Requires-Dist: prompt_toolkit==3.0.45; extra == "all"
+Requires-Dist: protobuf==5.27.2; extra == "all"
+Requires-Dist: psutil==5.9.8; extra == "all"
+Requires-Dist: ptyprocess==0.7.0; extra == "all"
+Requires-Dist: pure-eval==0.2.2; extra == "all"
+Requires-Dist: pyarrow==16.1.0; extra == "all"
+Requires-Dist: pydantic>=2.7.4; extra == "all"
+Requires-Dist: pydantic_core>=2.18.4; extra == "all"
+Requires-Dist: pydeck==0.9.1; extra == "all"
+Requires-Dist: Pygments==2.18.0; extra == "all"
+Requires-Dist: pyparsing==3.1.2; extra == "all"
+Requires-Dist: python-dateutil==2.9.0.post0; extra == "all"
+Requires-Dist: pytz==2024.1; extra == "all"
+Requires-Dist: pyzmq==26.0.3; extra == "all"
+Requires-Dist: referencing==0.35.1; extra == "all"
+Requires-Dist: regex==2024.5.15; extra == "all"
+Requires-Dist: requests==2.32.3; extra == "all"
+Requires-Dist: rich==13.7.1; extra == "all"
+Requires-Dist: rpds-py==0.18.1; extra == "all"
+Requires-Dist: seaborn==0.13.2; extra == "all"
+Requires-Dist: six==1.16.0; extra == "all"
+Requires-Dist: smmap==5.0.1; extra == "all"
+Requires-Dist: sniffio==1.3.1; extra == "all"
+Requires-Dist: st-annotated-text==4.0.1; extra == "all"
+Requires-Dist: stack-data==0.6.3; extra == "all"
+Requires-Dist: streamlit==1.37.0; extra == "all"
+Requires-Dist: tenacity==8.4.2; extra == "all"
+Requires-Dist: toml==0.10.2; extra == "all"
+Requires-Dist: toolz==0.12.1; extra == "all"
+Requires-Dist: tornado==6.4.1; extra == "all"
+Requires-Dist: traitlets==5.14.3; extra == "all"
+Requires-Dist: tree-sitter==0.21.3; extra == "all"
+Requires-Dist: tree-sitter-cpp==0.22.1; extra == "all"
+Requires-Dist: tree-sitter-java==0.21.0; extra == "all"
+Requires-Dist: tree-sitter-languages==1.10.2; extra == "all"
+Requires-Dist: tree-sitter-php==0.22.5; extra == "all"
+Requires-Dist: typing_extensions==4.12.2; extra == "all"
+Requires-Dist: tzdata==2024.1; extra == "all"
+Requires-Dist: uuid; extra == "all"
+Requires-Dist: wcwidth==0.2.13; extra == "all"
+Requires-Dist: wrapt==1.16.0; extra == "all"
+Requires-Dist: plotly==5.15.0; extra == "all"
+Requires-Dist: presidio-analyzer>=2.2.355; extra == "all"
+Requires-Dist: presidio-anonymizer>=2.2.355; extra == "all"
+Requires-Dist: flair>=0.14.0; extra == "all"
+Requires-Dist: pandas; extra == "all"
+Requires-Dist: mmh3==4.1.0; extra == "all"
+Requires-Dist: xxhash==3.4.1; extra == "all"
+Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "all"
+Requires-Dist: langcodes>=3.3.0; extra == "all"
+Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "all"
+Requires-Dist: numpy==1.26.4; extra == "all"
+Requires-Dist: docling-core==2.18.0; extra == "all"
+Requires-Dist: docling-ibm-models==3.3.1; extra == "all"
+Requires-Dist: docling-parse==3.3.0; extra == "all"
+Requires-Dist: deepsearch-glm==1.0.0; extra == "all"
+Requires-Dist: docling==2.21.0; extra == "all"
+Requires-Dist: filetype<2.0.0,>=1.2.0; extra == "all"
+Requires-Dist: docling-core==2.18.0; extra == "all"
+Requires-Dist: pydantic>=2.0.0; extra == "all"
+Requires-Dist: llama-index-core<0.12.0,>=0.11.22; extra == "all"
+Requires-Dist: sentence-transformers>=3.0.1; extra == "all"
+Requires-Dist: nltk>=3.9.1; extra == "all"
+Requires-Dist: transformers>=4.38.2; extra == "all"
+Requires-Dist: pandas; extra == "all"
+Requires-Dist: requests; extra == "all"
+Requires-Dist: polars>=1.9.0; extra == "all"
+Requires-Dist: textstat; extra == "all"
+Requires-Dist: pandas; extra == "all"
+Requires-Dist: fasttext>=0.9.3; platform_system != "Windows" and extra == "all"
+Requires-Dist: langcodes>=3.5.0; extra == "all"
+Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "all"
+Requires-Dist: numpy<1.29.0,>=1.26.4; extra == "all"
+Requires-Dist: duckdb>=0.10.1; extra == "all"
+Requires-Dist: mmh3>=4.1.0; extra == "all"
+Requires-Dist: xxhash==3.4.1; extra == "all"
+Requires-Dist: pyyaml>=6.0.2; extra == "all"
+Requires-Dist: boto3>=1.34.69; extra == "all"
+Requires-Dist: kubernetes>=30.1.0; extra == "all"
+Requires-Dist: polars!=1.10.0,!=1.11.0,!=1.12.0,>=1.9.0; extra == "all"
+Requires-Dist: disjoint-set>=0.8.0; extra == "all"
+Requires-Dist: scipy<2.0.0,>=1.12.1; extra == "all"
+Requires-Dist: numpy<1.29.0; extra == "all"
+Requires-Dist: sentencepiece>=0.2.0; extra == "all"
+Requires-Dist: mmh3>=4.1.0; extra == "all"
+Requires-Dist: nltk==3.9.1; extra == "all"
+Requires-Dist: transformers>=4.38.2; extra == "all"
+Requires-Dist: torch<=2.5.1,>=2.2.2; extra == "all"
+Requires-Dist: pandas; extra == "all"
+Requires-Dist: transformers>=4.38.2; extra == "all"
+Requires-Dist: data_prep_connector>=0.2.3; extra == "all"
+Requires-Dist: nltk>=3.9.1; extra == "all"
+Requires-Dist: requests; extra == "all"
+Requires-Dist: transformers; extra == "all"
+Requires-Dist: pandas; extra == "all"
+Requires-Dist: psutil; extra == "all"
+Requires-Dist: GPUtil; extra == "all"
+Provides-Extra: language
+Requires-Dist: presidio-analyzer>=2.2.355; extra == "language"
+Requires-Dist: presidio-anonymizer>=2.2.355; extra == "language"
+Requires-Dist: flair>=0.14.0; extra == "language"
+Requires-Dist: pandas; extra == "language"
+Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "language"
+Requires-Dist: langcodes>=3.3.0; extra == "language"
+Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "language"
+Requires-Dist: numpy==1.26.4; extra == "language"
+Requires-Dist: docling-core==2.18.0; extra == "language"
+Requires-Dist: docling-ibm-models==3.3.1; extra == "language"
+Requires-Dist: docling-parse==3.3.0; extra == "language"
+Requires-Dist: deepsearch-glm==1.0.0; extra == "language"
+Requires-Dist: docling==2.21.0; extra == "language"
+Requires-Dist: filetype<2.0.0,>=1.2.0; extra == "language"
+Requires-Dist: docling-core==2.18.0; extra == "language"
+Requires-Dist: pydantic>=2.0.0; extra == "language"
+Requires-Dist: llama-index-core<0.12.0,>=0.11.22; extra == "language"
+Requires-Dist: sentence-transformers>=3.0.1; extra == "language"
+Requires-Dist: nltk>=3.9.1; extra == "language"
+Requires-Dist: transformers>=4.38.2; extra == "language"
+Requires-Dist: pandas; extra == "language"
+Requires-Dist: requests; extra == "language"
+Requires-Dist: polars>=1.9.0; extra == "language"
+Requires-Dist: textstat; extra == "language"
+Requires-Dist: pandas; extra == "language"
+Requires-Dist: fasttext>=0.9.3; platform_system != "Windows" and extra == "language"
+Requires-Dist: langcodes>=3.5.0; extra == "language"
+Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "language"
+Requires-Dist: numpy<1.29.0,>=1.26.4; extra == "language"
+Requires-Dist: duckdb>=0.10.1; extra == "language"
+Requires-Dist: mmh3>=4.1.0; extra == "language"
+Requires-Dist: xxhash==3.4.1; extra == "language"
+Requires-Dist: pyyaml>=6.0.2; extra == "language"
+Requires-Dist: boto3>=1.34.69; extra == "language"
+Requires-Dist: kubernetes>=30.1.0; extra == "language"
+Requires-Dist: polars!=1.10.0,!=1.11.0,!=1.12.0,>=1.9.0; extra == "language"
+Requires-Dist: disjoint-set>=0.8.0; extra == "language"
+Requires-Dist: scipy<2.0.0,>=1.12.1; extra == "language"
+Requires-Dist: numpy<1.29.0; extra == "language"
+Requires-Dist: sentencepiece>=0.2.0; extra == "language"
+Requires-Dist: mmh3>=4.1.0; extra == "language"
+Requires-Dist: nltk==3.9.1; extra == "language"
+Requires-Dist: transformers>=4.38.2; extra == "language"
+Requires-Dist: torch<=2.5.1,>=2.2.2; extra == "language"
+Requires-Dist: pandas; extra == "language"
+Requires-Dist: transformers>=4.38.2; extra == "language"
+Requires-Dist: data_prep_connector>=0.2.3; extra == "language"
+Requires-Dist: mmh3==4.1.0; extra == "language"
+Requires-Dist: xxhash==3.4.1; extra == "language"
+Requires-Dist: nltk>=3.9.1; extra == "language"
+Requires-Dist: requests; extra == "language"
+Requires-Dist: transformers; extra == "language"
+Requires-Dist: pandas; extra == "language"
+Requires-Dist: psutil; extra == "language"
+Requires-Dist: GPUtil; extra == "language"
+Provides-Extra: proglang-select
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "proglang-select"
+Provides-Extra: header-cleanser
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "header-cleanser"
+Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin" and extra == "header-cleanser"
+Requires-Dist: timeout-timer==0.2.0; extra == "header-cleanser"
+Provides-Extra: license-select
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "license-select"
+Provides-Extra: code-quality
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "code-quality"
+Requires-Dist: bs4==0.0.2; extra == "code-quality"
+Requires-Dist: transformers>=4.38.2; extra == "code-quality"
+Provides-Extra: code2parquet
+Requires-Dist: data-prep-toolkit>=0.2.3; extra == "code2parquet"
+Requires-Dist: parameterized; extra == "code2parquet"
+Requires-Dist: pandas; extra == "code2parquet"
+Provides-Extra: profiler
+Requires-Dist: mmh3==4.1.0; extra == "profiler"
+Requires-Dist: xxhash==3.4.1; extra == "profiler"
+Provides-Extra: resize
+Provides-Extra: doc-chunk
+Requires-Dist: docling-core==2.18.0; extra == "doc-chunk"
+Requires-Dist: pydantic>=2.0.0; extra == "doc-chunk"
+Requires-Dist: llama-index-core<0.12.0,>=0.11.22; extra == "doc-chunk"
+Provides-Extra: doc-quality
+Provides-Extra: html2parquet
+Requires-Dist: trafilatura==1.12.0; extra == "html2parquet"
+Provides-Extra: lang-id
+Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "lang-id"
+Requires-Dist: langcodes>=3.3.0; extra == "lang-id"
+Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "lang-id"
+Requires-Dist: numpy==1.26.4; extra == "lang-id"
+Provides-Extra: pdf2parquet
+Requires-Dist: docling-core==2.18.0; extra == "pdf2parquet"
+Requires-Dist: docling-ibm-models==3.3.1; extra == "pdf2parquet"
+Requires-Dist: docling-parse==3.3.0; extra == "pdf2parquet"
+Requires-Dist: deepsearch-glm==1.0.0; extra == "pdf2parquet"
+Requires-Dist: docling==2.21.0; extra == "pdf2parquet"
|
|
285
|
+
Requires-Dist: filetype<2.0.0,>=1.2.0; extra == "pdf2parquet"
|
|
286
|
+
Provides-Extra: text-encoder
|
|
287
|
+
Requires-Dist: sentence-transformers>=3.0.1; extra == "text-encoder"
|
|
288
|
+
Provides-Extra: pii-redactor
|
|
289
|
+
Requires-Dist: presidio-analyzer>=2.2.355; extra == "pii-redactor"
|
|
290
|
+
Requires-Dist: presidio-anonymizer>=2.2.355; extra == "pii-redactor"
|
|
291
|
+
Requires-Dist: flair>=0.14.0; extra == "pii-redactor"
|
|
292
|
+
Requires-Dist: pandas; extra == "pii-redactor"
|
|
293
|
+
Provides-Extra: filter
|
|
294
|
+
Requires-Dist: duckdb>=0.10.1; extra == "filter"
|
|
295
|
+
Provides-Extra: doc-id
|
|
296
|
+
Provides-Extra: hap
|
|
297
|
+
Requires-Dist: nltk==3.9.1; extra == "hap"
|
|
298
|
+
Requires-Dist: transformers>=4.38.2; extra == "hap"
|
|
299
|
+
Requires-Dist: torch<=2.5.1,>=2.2.2; extra == "hap"
|
|
300
|
+
Requires-Dist: pandas; extra == "hap"
|
|
301
|
+
Provides-Extra: ededup
|
|
302
|
+
Requires-Dist: mmh3>=4.1.0; extra == "ededup"
|
|
303
|
+
Requires-Dist: xxhash==3.4.1; extra == "ededup"
|
|
304
|
+
Provides-Extra: fdedup
|
|
305
|
+
Requires-Dist: pyyaml>=6.0.2; extra == "fdedup"
|
|
306
|
+
Requires-Dist: boto3>=1.34.69; extra == "fdedup"
|
|
307
|
+
Requires-Dist: kubernetes>=30.1.0; extra == "fdedup"
|
|
308
|
+
Requires-Dist: polars!=1.10.0,!=1.11.0,!=1.12.0,>=1.9.0; extra == "fdedup"
|
|
309
|
+
Requires-Dist: disjoint-set>=0.8.0; extra == "fdedup"
|
|
310
|
+
Requires-Dist: scipy<2.0.0,>=1.12.1; extra == "fdedup"
|
|
311
|
+
Requires-Dist: numpy<1.29.0; extra == "fdedup"
|
|
312
|
+
Requires-Dist: sentencepiece>=0.2.0; extra == "fdedup"
|
|
313
|
+
Requires-Dist: mmh3>=4.1.0; extra == "fdedup"
|
|
314
|
+
Provides-Extra: tokenization
|
|
315
|
+
Requires-Dist: transformers>=4.38.2; extra == "tokenization"
|
|
316
|
+
Provides-Extra: web2parquet
|
|
317
|
+
Requires-Dist: data_prep_connector>=0.2.3; extra == "web2parquet"
|
|
318
|
+
Provides-Extra: similarity
|
|
319
|
+
Requires-Dist: nltk>=3.9.1; extra == "similarity"
|
|
320
|
+
Requires-Dist: transformers>=4.38.2; extra == "similarity"
|
|
321
|
+
Requires-Dist: pandas; extra == "similarity"
|
|
322
|
+
Requires-Dist: requests; extra == "similarity"
|
|
323
|
+
Provides-Extra: extreme-tokenized
|
|
324
|
+
Requires-Dist: polars>=1.9.0; extra == "extreme-tokenized"
|
|
325
|
+
Provides-Extra: readability
|
|
326
|
+
Requires-Dist: textstat; extra == "readability"
|
|
327
|
+
Requires-Dist: pandas; extra == "readability"
|
|
328
|
+
Provides-Extra: code-profiler
|
|
329
|
+
Requires-Dist: data-prep-toolkit>=0.2.3; extra == "code-profiler"
|
|
330
|
+
Requires-Dist: parameterized>=0.9.0; extra == "code-profiler"
|
|
331
|
+
Requires-Dist: pandas>=2.2.2; extra == "code-profiler"
|
|
332
|
+
Requires-Dist: aiolimiter==1.1.0; extra == "code-profiler"
|
|
333
|
+
Requires-Dist: altair==5.3.0; extra == "code-profiler"
|
|
334
|
+
Requires-Dist: annotated-types==0.7.0; extra == "code-profiler"
|
|
335
|
+
Requires-Dist: anyio==4.4.0; extra == "code-profiler"
|
|
336
|
+
Requires-Dist: appnope==0.1.4; extra == "code-profiler"
|
|
337
|
+
Requires-Dist: asttokens==2.4.1; extra == "code-profiler"
|
|
338
|
+
Requires-Dist: attrs==23.2.0; extra == "code-profiler"
|
|
339
|
+
Requires-Dist: blinker==1.8.2; extra == "code-profiler"
|
|
340
|
+
Requires-Dist: cachetools==5.3.3; extra == "code-profiler"
|
|
341
|
+
Requires-Dist: certifi==2024.7.4; extra == "code-profiler"
|
|
342
|
+
Requires-Dist: charset-normalizer==3.3.2; extra == "code-profiler"
|
|
343
|
+
Requires-Dist: click==8.1.7; extra == "code-profiler"
|
|
344
|
+
Requires-Dist: comm==0.2.2; extra == "code-profiler"
|
|
345
|
+
Requires-Dist: contourpy==1.2.1; extra == "code-profiler"
|
|
346
|
+
Requires-Dist: cycler==0.12.1; extra == "code-profiler"
|
|
347
|
+
Requires-Dist: debugpy==1.8.1; extra == "code-profiler"
|
|
348
|
+
Requires-Dist: decorator==5.1.1; extra == "code-profiler"
|
|
349
|
+
Requires-Dist: Deprecated==1.2.14; extra == "code-profiler"
|
|
350
|
+
Requires-Dist: executing==2.0.1; extra == "code-profiler"
|
|
351
|
+
Requires-Dist: fonttools==4.53.0; extra == "code-profiler"
|
|
352
|
+
Requires-Dist: gitdb==4.0.11; extra == "code-profiler"
|
|
353
|
+
Requires-Dist: GitPython==3.1.43; extra == "code-profiler"
|
|
354
|
+
Requires-Dist: h11==0.14.0; extra == "code-profiler"
|
|
355
|
+
Requires-Dist: htbuilder==0.6.2; extra == "code-profiler"
|
|
356
|
+
Requires-Dist: httpcore==1.0.5; extra == "code-profiler"
|
|
357
|
+
Requires-Dist: httpx==0.27.0; extra == "code-profiler"
|
|
358
|
+
Requires-Dist: httpx-sse==0.4.0; extra == "code-profiler"
|
|
359
|
+
Requires-Dist: ibm-generative-ai==3.0.0; extra == "code-profiler"
|
|
360
|
+
Requires-Dist: idna==3.7; extra == "code-profiler"
|
|
361
|
+
Requires-Dist: ipykernel==6.29.4; extra == "code-profiler"
|
|
362
|
+
Requires-Dist: ipython==8.25.0; extra == "code-profiler"
|
|
363
|
+
Requires-Dist: jedi==0.19.1; extra == "code-profiler"
|
|
364
|
+
Requires-Dist: Jinja2==3.1.4; extra == "code-profiler"
|
|
365
|
+
Requires-Dist: jsonschema==4.22.0; extra == "code-profiler"
|
|
366
|
+
Requires-Dist: jsonschema-specifications==2023.12.1; extra == "code-profiler"
|
|
367
|
+
Requires-Dist: jupyter_client==8.6.2; extra == "code-profiler"
|
|
368
|
+
Requires-Dist: jupyter_core==5.7.2; extra == "code-profiler"
|
|
369
|
+
Requires-Dist: kiwisolver==1.4.5; extra == "code-profiler"
|
|
370
|
+
Requires-Dist: markdown-it-py==3.0.0; extra == "code-profiler"
|
|
371
|
+
Requires-Dist: MarkupSafe==2.1.5; extra == "code-profiler"
|
|
372
|
+
Requires-Dist: matplotlib==3.9.0; extra == "code-profiler"
|
|
373
|
+
Requires-Dist: matplotlib-inline==0.1.7; extra == "code-profiler"
|
|
374
|
+
Requires-Dist: mdurl==0.1.2; extra == "code-profiler"
|
|
375
|
+
Requires-Dist: more-itertools==10.3.0; extra == "code-profiler"
|
|
376
|
+
Requires-Dist: nest-asyncio==1.6.0; extra == "code-profiler"
|
|
377
|
+
Requires-Dist: networkx==3.3; extra == "code-profiler"
|
|
378
|
+
Requires-Dist: numpy==1.26.4; extra == "code-profiler"
|
|
379
|
+
Requires-Dist: packaging==24.0; extra == "code-profiler"
|
|
380
|
+
Requires-Dist: parso==0.8.4; extra == "code-profiler"
|
|
381
|
+
Requires-Dist: pexpect==4.9.0; extra == "code-profiler"
|
|
382
|
+
Requires-Dist: pillow>=10.3.0; extra == "code-profiler"
|
|
383
|
+
Requires-Dist: platformdirs==4.2.2; extra == "code-profiler"
|
|
384
|
+
Requires-Dist: prompt_toolkit==3.0.45; extra == "code-profiler"
|
|
385
|
+
Requires-Dist: protobuf==5.27.2; extra == "code-profiler"
|
|
386
|
+
Requires-Dist: psutil==5.9.8; extra == "code-profiler"
|
|
387
|
+
Requires-Dist: ptyprocess==0.7.0; extra == "code-profiler"
|
|
388
|
+
Requires-Dist: pure-eval==0.2.2; extra == "code-profiler"
|
|
389
|
+
Requires-Dist: pyarrow==16.1.0; extra == "code-profiler"
|
|
390
|
+
Requires-Dist: pydantic>=2.7.4; extra == "code-profiler"
|
|
391
|
+
Requires-Dist: pydantic_core>=2.18.4; extra == "code-profiler"
|
|
392
|
+
Requires-Dist: pydeck==0.9.1; extra == "code-profiler"
|
|
393
|
+
Requires-Dist: Pygments==2.18.0; extra == "code-profiler"
|
|
394
|
+
Requires-Dist: pyparsing==3.1.2; extra == "code-profiler"
|
|
395
|
+
Requires-Dist: python-dateutil==2.9.0.post0; extra == "code-profiler"
|
|
396
|
+
Requires-Dist: pytz==2024.1; extra == "code-profiler"
|
|
397
|
+
Requires-Dist: pyzmq==26.0.3; extra == "code-profiler"
|
|
398
|
+
Requires-Dist: referencing==0.35.1; extra == "code-profiler"
|
|
399
|
+
Requires-Dist: regex==2024.5.15; extra == "code-profiler"
|
|
400
|
+
Requires-Dist: requests==2.32.3; extra == "code-profiler"
|
|
401
|
+
Requires-Dist: rich==13.7.1; extra == "code-profiler"
|
|
402
|
+
Requires-Dist: rpds-py==0.18.1; extra == "code-profiler"
|
|
403
|
+
Requires-Dist: seaborn==0.13.2; extra == "code-profiler"
|
|
404
|
+
Requires-Dist: six==1.16.0; extra == "code-profiler"
|
|
405
|
+
Requires-Dist: smmap==5.0.1; extra == "code-profiler"
|
|
406
|
+
Requires-Dist: sniffio==1.3.1; extra == "code-profiler"
|
|
407
|
+
Requires-Dist: st-annotated-text==4.0.1; extra == "code-profiler"
|
|
408
|
+
Requires-Dist: stack-data==0.6.3; extra == "code-profiler"
|
|
409
|
+
Requires-Dist: streamlit==1.37.0; extra == "code-profiler"
|
|
410
|
+
Requires-Dist: tenacity==8.4.2; extra == "code-profiler"
|
|
411
|
+
Requires-Dist: toml==0.10.2; extra == "code-profiler"
|
|
412
|
+
Requires-Dist: toolz==0.12.1; extra == "code-profiler"
|
|
413
|
+
Requires-Dist: tornado==6.4.1; extra == "code-profiler"
|
|
414
|
+
Requires-Dist: traitlets==5.14.3; extra == "code-profiler"
|
|
415
|
+
Requires-Dist: tree-sitter==0.21.3; extra == "code-profiler"
|
|
416
|
+
Requires-Dist: tree-sitter-cpp==0.22.1; extra == "code-profiler"
|
|
417
|
+
Requires-Dist: tree-sitter-java==0.21.0; extra == "code-profiler"
|
|
418
|
+
Requires-Dist: tree-sitter-languages==1.10.2; extra == "code-profiler"
|
|
419
|
+
Requires-Dist: tree-sitter-php==0.22.5; extra == "code-profiler"
|
|
420
|
+
Requires-Dist: typing_extensions==4.12.2; extra == "code-profiler"
|
|
421
|
+
Requires-Dist: tzdata==2024.1; extra == "code-profiler"
|
|
422
|
+
Requires-Dist: uuid; extra == "code-profiler"
|
|
423
|
+
Requires-Dist: wcwidth==0.2.13; extra == "code-profiler"
|
|
424
|
+
Requires-Dist: wrapt==1.16.0; extra == "code-profiler"
|
|
425
|
+
Requires-Dist: plotly==5.15.0; extra == "code-profiler"
|
|
426
|
+
Provides-Extra: gneissweb-classification
|
|
427
|
+
Requires-Dist: fasttext>=0.9.3; platform_system != "Windows" and extra == "gneissweb-classification"
|
|
428
|
+
Requires-Dist: langcodes>=3.5.0; extra == "gneissweb-classification"
|
|
429
|
+
Requires-Dist: huggingface-hub<1.0.0,>=0.21.4; extra == "gneissweb-classification"
|
|
430
|
+
Requires-Dist: numpy<1.29.0,>=1.26.4; extra == "gneissweb-classification"
|
|
431
|
+
Provides-Extra: rep-removal
|
|
432
|
+
Requires-Dist: nltk>=3.9.1; extra == "rep-removal"
|
|
433
|
+
Requires-Dist: requests; extra == "rep-removal"
|
|
434
|
+
Requires-Dist: transformers; extra == "rep-removal"
|
|
435
|
+
Requires-Dist: pandas; extra == "rep-removal"
|
|
436
|
+
Requires-Dist: psutil; extra == "rep-removal"
|
|
437
|
+
Requires-Dist: GPUtil; extra == "rep-removal"
|
|
438
|
+
|
|
439
|
+
# DPK Python Transforms
|
|
440
|
+
|
|
441
|
+
## Installation
|
|
442
|
+
|
|
443
|
+
The [transforms](https://github.com/IBM/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library available on PyPI and can be installed with pip:
|
|
444
|
+
|
|
445
|
+
`python -m pip install data-prep-toolkit-transforms[all]`
|
|
446
|
+
or
|
|
447
|
+
`python -m pip install data-prep-toolkit-transforms[ray,all]`
|
|
448
|
+
or
|
|
449
|
+
`python -m pip install data-prep-toolkit-transforms[language]`
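
To decide which extra to install, it can help to see what each one pulls in. The following is a minimal sketch that groups `Requires-Dist` lines from a METADATA file by their `extra == "..."` marker. The helper name `group_by_extra` is hypothetical, and the regular expression is a simplification rather than a full PEP 508 marker parser (other markers such as `platform_system` are ignored).

```python
import re
from collections import defaultdict

# Matches the extra name inside a marker such as: extra == "hap"
EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(metadata_lines):
    """Map each extra name to the requirement specifiers it adds.

    Requirements without an extra marker are stored under the empty string.
    """
    groups = defaultdict(list)
    for line in metadata_lines:
        if not line.startswith("Requires-Dist:"):
            continue  # skip Provides-Extra and other header fields
        spec = line[len("Requires-Dist:"):].strip()
        requirement, _, marker = spec.partition(";")
        match = EXTRA_RE.search(marker)
        extra = match.group(1) if match else ""
        groups[extra].append(requirement.strip())
    return dict(groups)


# Sample lines copied from this package's metadata.
sample = [
    'Provides-Extra: hap',
    'Requires-Dist: nltk==3.9.1; extra == "hap"',
    'Requires-Dist: transformers>=4.38.2; extra == "hap"',
    'Requires-Dist: fasttext>=0.9.2; platform_system != "Windows" and extra == "lang-id"',
]
grouped = group_by_extra(sample)
for extra, requirements in grouped.items():
    print(extra, requirements)
```

For an installed copy of the package, the same lines could be read from its METADATA file and passed to the helper.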
|
|
450
|
+
|
|
451
|
+
|
|
452
|
+
Installing the Python transforms also installs `data-prep-toolkit`.
|
|
453
|
+
|
|
454
|
+
Installing the Ray transforms also installs `data-prep-toolkit[ray]`.
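
One way to confirm that both distributions ended up in the environment is to query their installed versions. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report on the transforms package and the toolkit it depends on.
for name in ("data-prep-toolkit-transforms", "data-prep-toolkit"):
    print(name, installed_version(name) or "not installed")
```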
|
|
455
|
+
|
|
456
|
+
## List of Transforms in current package
|
|
457
|
+
|
|
458
|
+
Note: This list covers the transforms included in releases starting with data-prep-toolkit-transforms 0.2.1 and may not always be up to date. Users are encouraged to raise an issue on GitHub if they find a component missing from the list, or a package listed below that is absent from the release they installed from PyPI.
|
|
459
|
+
|
|
460
|
+
* code
|
|
461
|
+
* [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/python/README.md)
|
|
462
|
+
* [header_cleanser (not available on macOS)](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/header_cleanser/python/README.md)
|
|
463
|
+
* [code_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_quality/python/README.md)
|
|
464
|
+
* [proglang_select](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/proglang_select/python/README.md)
|
|
465
|
+
* [code_profiler](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_profiler/README.md)
|
|
466
|
+
* language
|
|
467
|
+
* [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_chunk/README.md)
|
|
468
|
+
* [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_quality/README.md)
|
|
469
|
+
* [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/lang_id/README.md)
|
|
470
|
+
* [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/README.md)
|
|
471
|
+
* [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/text_encoder/README.md)
|
|
472
|
+
* [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pii_redactor/python/README.md)
|
|
473
|
+
* universal
|
|
474
|
+
* [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/ededup/README.md)
|
|
475
|
+
* [fdedup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/fdedup/README.md)
|
|
476
|
+
* [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/filter/python/README.md)
|
|
477
|
+
* [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/resize/python/README.md)
|
|
478
|
+
* [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/tokenization/README.md)
|
|
479
|
+
* [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/doc_id/README.md)
|
|
480
|
+
* [web2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/web2parquet/README.md)
|
|
481
|
+
|
|
482
|
+
## Release notes:
|
|
483
|
+
|
|
484
|
+
### 1.0.1.dev1
|
|
485
|
+
Added GneissWeb transforms
|
|
486
|
+
Fixed fdedup on Windows
|
|
487
|
+
### 1.0.1.dev0
|
|
488
|
+
PR #979 (code_profiler)
|
|
489
|
+
### 1.0.0.a6
|
|
490
|
+
Added Profiler
|
|
491
|
+
Added Resize
|
|
492
|
+
### 1.0.0.a5
|
|
493
|
+
Added PII Redactor
|
|
494
|
+
Relaxed fasttext requirement to >= 0.9.2
|
|
495
|
+
### 1.0.0.a4
|
|
496
|
+
Added missing Ray implementations for lang_id, doc_quality, tokenization, and filter
|
|
497
|
+
Added Ray notebooks for lang_id, doc_quality, tokenization, and filter
|
|
498
|
+
### 1.0.0.a3
|
|
499
|
+
Added code_profiler
|
|
500
|
+
### 1.0.0.a2
|
|
501
|
+
Relaxed the pandas dependency (uses the latest version or whatever the application has installed)
|
|
502
|
+
Relaxed the requests dependency (uses the latest version or whatever the application has installed)
|
|
503
|
+
|
|
504
|
+
|
|
505
|
+
|
|
506
|
+
|