PyPI - mindstudio-probe - Versions diffs - 1.0.1__py3-none-any.whl → 1.0.4__py3-none-any.whl - Mend

mindstudio-probe 1.0.1__py3-none-any.whl → 1.0.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (323)

{mindstudio_probe-1.0.1.dist-info → mindstudio_probe-1.0.4.dist-info}/LICENSE +201 -201
{mindstudio_probe-1.0.1.dist-info → mindstudio_probe-1.0.4.dist-info}/METADATA +36 -30
mindstudio_probe-1.0.4.dist-info/RECORD +276 -0
{mindstudio_probe-1.0.1.dist-info → mindstudio_probe-1.0.4.dist-info}/WHEEL +1 -1
{mindstudio_probe-1.0.1.dist-info → mindstudio_probe-1.0.4.dist-info}/entry_points.txt +1 -0
msprobe/README.md +101 -182
msprobe/__init__.py +1 -0
msprobe/{config/config.json → config.json} +49 -27
msprobe/core/__init__.py +0 -0
msprobe/{pytorch → core}/advisor/advisor.py +124 -124
msprobe/{pytorch → core}/advisor/advisor_const.py +59 -59
msprobe/{pytorch → core}/advisor/advisor_result.py +58 -58
msprobe/core/common/const.py +341 -241
msprobe/core/common/exceptions.py +100 -88
msprobe/core/common/{file_check.py → file_utils.py} +478 -265
msprobe/core/common/log.py +76 -55
msprobe/core/common/utils.py +385 -516
msprobe/core/common_config.py +85 -58
msprobe/core/compare/acc_compare.py +300 -0
msprobe/core/compare/check.py +95 -0
msprobe/core/compare/compare_cli.py +49 -0
msprobe/core/compare/highlight.py +223 -0
msprobe/core/compare/multiprocessing_compute.py +149 -0
msprobe/{pytorch → core}/compare/npy_compare.py +295 -244
msprobe/core/compare/utils.py +430 -0
msprobe/core/data_dump/data_collector.py +154 -140
msprobe/core/data_dump/data_processor/base.py +314 -245
msprobe/core/data_dump/data_processor/factory.py +59 -61
msprobe/core/data_dump/data_processor/mindspore_processor.py +186 -0
msprobe/core/data_dump/data_processor/pytorch_processor.py +366 -346
msprobe/core/data_dump/json_writer.py +96 -116
msprobe/core/data_dump/scope.py +178 -178
msprobe/core/grad_probe/__init__.py +0 -0
msprobe/core/grad_probe/constant.py +71 -0
msprobe/core/grad_probe/grad_compare.py +171 -0
msprobe/core/grad_probe/utils.py +64 -0
msprobe/docs/01.installation.md +89 -0
msprobe/docs/02.config_introduction.md +165 -0
msprobe/docs/03.config_examples.md +247 -0
msprobe/docs/04.acl_config_examples.md +76 -0
msprobe/docs/05.data_dump_PyTorch.md +198 -0
msprobe/docs/06.data_dump_MindSpore.md +243 -0
msprobe/docs/07.accuracy_checker_PyTorch.md +274 -0
msprobe/docs/08.accuracy_checker_online_PyTorch.md +198 -0
msprobe/docs/09.accuracy_checker_MindSpore.md +68 -0
msprobe/docs/10.accuracy_compare_PyTorch.md +245 -0
msprobe/docs/11.accuracy_compare_MindSpore.md +202 -0
msprobe/docs/12.overflow_check_PyTorch.md +79 -0
msprobe/docs/13.overflow_check_MindSpore.md +31 -0
msprobe/{pytorch/doc/parse_tool.md → docs/14.data_parse_PyTorch.md} +283 -286
msprobe/docs/15.free_benchmarking_PyTorch.md +164 -0
msprobe/docs/17.grad_probe.md +207 -0
msprobe/docs/FAQ_PyTorch.md +177 -0
msprobe/docs/S02.report_free_benchmarking_validation_performance_baseline.md +146 -0
msprobe/docs/img/free_benchmark_framework.png +0 -0
msprobe/docs/img/grad_probe_image-1.png +0 -0
msprobe/docs/img/grad_probe_image-2.png +0 -0
msprobe/docs/img/grad_probe_image-3.png +0 -0
msprobe/docs/img/grad_probe_image-4.png +0 -0
msprobe/docs/img/grad_probe_image.png +0 -0
msprobe/mindspore/__init__.py +1 -1
msprobe/mindspore/api_accuracy_checker/__init__.py +0 -0
msprobe/mindspore/api_accuracy_checker/api_accuracy_checker.py +255 -0
msprobe/mindspore/api_accuracy_checker/api_info.py +69 -0
msprobe/mindspore/api_accuracy_checker/api_runner.py +156 -0
msprobe/mindspore/api_accuracy_checker/base_compare_algorithm.py +197 -0
msprobe/mindspore/api_accuracy_checker/cmd_parser.py +6 -0
msprobe/mindspore/api_accuracy_checker/compute_element.py +239 -0
msprobe/mindspore/api_accuracy_checker/main.py +9 -0
msprobe/mindspore/api_accuracy_checker/type_mapping.py +114 -0
msprobe/mindspore/api_accuracy_checker/utils.py +80 -0
msprobe/mindspore/cell_processor.py +34 -0
msprobe/mindspore/common/const.py +106 -0
msprobe/mindspore/common/log.py +38 -0
msprobe/mindspore/common/utils.py +81 -0
msprobe/mindspore/compare/distributed_compare.py +75 -0
msprobe/mindspore/compare/ms_compare.py +219 -0
msprobe/mindspore/compare/ms_graph_compare.py +348 -0
msprobe/mindspore/compare/ms_to_pt_api.yaml +399 -0
msprobe/mindspore/debugger/debugger_config.py +66 -51
msprobe/mindspore/debugger/precision_debugger.py +126 -32
msprobe/mindspore/dump/dump_tool_factory.py +35 -38
msprobe/mindspore/dump/hook_cell/api_registry.py +118 -0
msprobe/mindspore/dump/hook_cell/hook_cell.py +55 -0
msprobe/mindspore/dump/hook_cell/support_wrap_ops.yaml +922 -0
msprobe/mindspore/dump/hook_cell/wrap_api.py +113 -0
msprobe/mindspore/dump/jit_dump.py +72 -0
msprobe/mindspore/dump/kernel_graph_dump.py +59 -60
msprobe/mindspore/dump/kernel_kbyk_dump.py +64 -0
msprobe/mindspore/free_benchmark/__init__.py +0 -0
msprobe/mindspore/free_benchmark/api_pynative_self_check.py +116 -0
msprobe/mindspore/free_benchmark/common/__init__.py +0 -0
msprobe/mindspore/free_benchmark/common/config.py +12 -0
msprobe/mindspore/free_benchmark/common/handler_params.py +17 -0
msprobe/mindspore/free_benchmark/common/utils.py +71 -0
msprobe/mindspore/free_benchmark/data/support_wrap_ops.yaml +842 -0
msprobe/mindspore/free_benchmark/decorator/__init__.py +0 -0
msprobe/mindspore/free_benchmark/decorator/dec_forward.py +43 -0
msprobe/mindspore/free_benchmark/decorator/decorator_factory.py +107 -0
msprobe/mindspore/free_benchmark/handler/__init__.py +0 -0
msprobe/mindspore/free_benchmark/handler/base_handler.py +90 -0
msprobe/mindspore/free_benchmark/handler/check_handler.py +41 -0
msprobe/mindspore/free_benchmark/handler/fix_handler.py +36 -0
msprobe/mindspore/free_benchmark/handler/handler_factory.py +21 -0
msprobe/mindspore/free_benchmark/perturbation/add_noise.py +67 -0
msprobe/mindspore/free_benchmark/perturbation/base_perturbation.py +21 -0
msprobe/mindspore/free_benchmark/perturbation/bit_noise.py +63 -0
msprobe/mindspore/free_benchmark/perturbation/exchange_value.py +51 -0
msprobe/mindspore/free_benchmark/perturbation/improve_precision.py +35 -0
msprobe/mindspore/free_benchmark/perturbation/no_change.py +12 -0
msprobe/mindspore/free_benchmark/perturbation/perturbation_factory.py +29 -0
msprobe/mindspore/free_benchmark/self_check_tool_factory.py +33 -0
msprobe/mindspore/grad_probe/__init__.py +0 -0
msprobe/mindspore/grad_probe/global_context.py +90 -0
msprobe/mindspore/grad_probe/grad_analyzer.py +231 -0
msprobe/mindspore/grad_probe/grad_monitor.py +27 -0
msprobe/mindspore/grad_probe/grad_stat_csv.py +132 -0
msprobe/mindspore/grad_probe/hook.py +94 -0
msprobe/mindspore/grad_probe/utils.py +30 -0
msprobe/mindspore/ms_config.py +128 -78
msprobe/mindspore/overflow_check/kernel_graph_overflow_check.py +44 -45
msprobe/mindspore/overflow_check/overflow_check_tool_factory.py +34 -32
msprobe/mindspore/runtime.py +4 -0
msprobe/mindspore/service.py +378 -0
msprobe/mindspore/task_handler_factory.py +24 -21
msprobe/msprobe.py +105 -67
msprobe/pytorch/__init__.py +4 -4
msprobe/pytorch/api_accuracy_checker/common/config.py +53 -50
msprobe/pytorch/api_accuracy_checker/common/utils.py +214 -224
msprobe/pytorch/api_accuracy_checker/compare/algorithm.py +213 -216
msprobe/pytorch/api_accuracy_checker/compare/api_precision_compare.py +606 -545
msprobe/pytorch/api_accuracy_checker/compare/api_precision_standard.yaml +132 -132
msprobe/pytorch/api_accuracy_checker/compare/api_precision_threshold.yaml +390 -390
msprobe/pytorch/api_accuracy_checker/compare/compare.py +386 -345
msprobe/pytorch/api_accuracy_checker/compare/compare_column.py +73 -73
msprobe/pytorch/api_accuracy_checker/compare/compare_utils.py +245 -248
msprobe/pytorch/api_accuracy_checker/config.yaml +10 -4
msprobe/pytorch/api_accuracy_checker/run_ut/data_generate.py +335 -328
msprobe/pytorch/api_accuracy_checker/run_ut/multi_run_ut.py +200 -203
msprobe/pytorch/api_accuracy_checker/run_ut/run_overflow_check.py +133 -127
msprobe/pytorch/api_accuracy_checker/run_ut/run_ut.py +592 -493
msprobe/pytorch/api_accuracy_checker/run_ut/run_ut_utils.py +70 -7
msprobe/pytorch/api_accuracy_checker/run_ut/torch_ut_setting.json +7 -4
msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/__init__.py +0 -0
msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/attl.py +197 -0
msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/client.py +325 -0
msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/device_dispatch.py +204 -0
msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/server.py +219 -0
msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/ssl_config.py +10 -0
msprobe/pytorch/bench_functions/__init__.py +15 -0
msprobe/pytorch/bench_functions/apply_adam_w.py +28 -0
msprobe/pytorch/bench_functions/confusion_transpose.py +19 -0
msprobe/pytorch/bench_functions/fast_gelu.py +55 -0
msprobe/pytorch/bench_functions/layer_norm_eval.py +6 -0
msprobe/pytorch/bench_functions/linear.py +12 -0
msprobe/pytorch/bench_functions/matmul_backward.py +48 -0
msprobe/pytorch/bench_functions/npu_fusion_attention.py +509 -0
msprobe/pytorch/bench_functions/rms_norm.py +15 -0
msprobe/pytorch/bench_functions/rotary_mul.py +52 -0
msprobe/pytorch/bench_functions/scaled_mask_softmax.py +26 -0
msprobe/pytorch/bench_functions/swiglu.py +55 -0
msprobe/pytorch/common/__init__.py +2 -2
msprobe/pytorch/common/compare_script.template +14 -14
msprobe/pytorch/common/log.py +20 -31
msprobe/pytorch/common/parse_json.py +39 -37
msprobe/pytorch/common/utils.py +305 -224
msprobe/pytorch/compare/distributed_compare.py +66 -111
msprobe/pytorch/compare/mapping.yaml +607 -607
msprobe/pytorch/compare/match.py +34 -36
msprobe/pytorch/compare/pt_compare.py +50 -0
msprobe/pytorch/debugger/debugger_config.py +95 -86
msprobe/pytorch/debugger/precision_debugger.py +125 -95
msprobe/pytorch/free_benchmark/__init__.py +8 -8
msprobe/pytorch/free_benchmark/common/constant.py +70 -67
msprobe/pytorch/free_benchmark/common/counter.py +71 -71
msprobe/pytorch/free_benchmark/common/enums.py +37 -37
msprobe/pytorch/free_benchmark/common/params.py +129 -129
msprobe/pytorch/free_benchmark/common/utils.py +102 -98
msprobe/pytorch/free_benchmark/compare/grad_saver.py +179 -183
msprobe/pytorch/free_benchmark/compare/single_benchmark.py +104 -104
msprobe/pytorch/free_benchmark/main.py +105 -102
msprobe/pytorch/free_benchmark/perturbed_layers/base_layer.py +13 -13
msprobe/pytorch/free_benchmark/perturbed_layers/layer_factory.py +41 -41
msprobe/pytorch/free_benchmark/perturbed_layers/npu/add_noise.py +90 -90
msprobe/pytorch/free_benchmark/perturbed_layers/npu/bit_noise.py +104 -104
msprobe/pytorch/free_benchmark/perturbed_layers/npu/change_value.py +63 -63
msprobe/pytorch/free_benchmark/perturbed_layers/npu/improve_precision.py +68 -68
msprobe/pytorch/free_benchmark/perturbed_layers/npu/no_change.py +28 -28
msprobe/pytorch/free_benchmark/perturbed_layers/npu/npu_base_layser.py +45 -45
msprobe/pytorch/free_benchmark/perturbed_layers/run_cpu.py +19 -19
msprobe/pytorch/free_benchmark/result_handlers/base_handler.py +217 -203
msprobe/pytorch/free_benchmark/result_handlers/check_handler.py +39 -39
msprobe/pytorch/free_benchmark/result_handlers/fix_handler.py +23 -23
msprobe/pytorch/free_benchmark/result_handlers/handler_factory.py +30 -31
msprobe/pytorch/free_benchmark/result_handlers/preheat_handler.py +170 -170
msprobe/pytorch/function_factory.py +76 -0
msprobe/pytorch/functional/dump_module.py +39 -39
msprobe/pytorch/grad_probe/__init__.py +0 -0
msprobe/pytorch/grad_probe/grad_monitor.py +91 -0
msprobe/pytorch/grad_probe/grad_stat_csv.py +129 -0
msprobe/pytorch/hook_module/api_registry.py +161 -161
msprobe/pytorch/hook_module/hook_module.py +120 -109
msprobe/pytorch/hook_module/support_wrap_ops.yaml +1879 -1876
msprobe/pytorch/hook_module/utils.py +30 -29
msprobe/pytorch/hook_module/wrap_aten.py +110 -100
msprobe/pytorch/hook_module/wrap_distributed.py +78 -75
msprobe/pytorch/hook_module/wrap_functional.py +105 -108
msprobe/pytorch/hook_module/wrap_npu_custom.py +93 -73
msprobe/pytorch/hook_module/wrap_tensor.py +71 -72
msprobe/pytorch/hook_module/wrap_torch.py +86 -88
msprobe/pytorch/hook_module/wrap_vf.py +62 -64
msprobe/pytorch/module_processer.py +138 -98
msprobe/pytorch/online_dispatch/__init__.py +20 -20
msprobe/pytorch/online_dispatch/compare.py +236 -236
msprobe/pytorch/online_dispatch/dispatch.py +271 -273
msprobe/pytorch/online_dispatch/dump_compare.py +155 -186
msprobe/pytorch/online_dispatch/single_compare.py +391 -391
msprobe/pytorch/online_dispatch/torch_ops_config.yaml +49 -49
msprobe/pytorch/online_dispatch/utils.py +130 -187
msprobe/pytorch/parse.py +4 -4
msprobe/pytorch/parse_tool/cli.py +32 -32
msprobe/pytorch/parse_tool/lib/compare.py +260 -259
msprobe/pytorch/parse_tool/lib/config.py +52 -51
msprobe/pytorch/parse_tool/lib/file_desc.py +31 -31
msprobe/pytorch/parse_tool/lib/interactive_cli.py +102 -102
msprobe/pytorch/parse_tool/lib/parse_exception.py +54 -54
msprobe/pytorch/parse_tool/lib/parse_tool.py +158 -158
msprobe/pytorch/parse_tool/lib/utils.py +316 -367
msprobe/pytorch/parse_tool/lib/visualization.py +85 -90
msprobe/pytorch/pt_config.py +188 -93
msprobe/pytorch/service.py +246 -167
mindstudio_probe-1.0.1.dist-info/RECORD +0 -228
msprobe/config/README.md +0 -397
msprobe/mindspore/doc/dump.md +0 -65
msprobe/mindspore/dump/api_kbk_dump.py +0 -55
msprobe/pytorch/compare/acc_compare.py +0 -1024
msprobe/pytorch/compare/highlight.py +0 -100
msprobe/pytorch/doc/FAQ.md +0 -193
msprobe/pytorch/doc/api_accuracy_checker.md +0 -269
msprobe/pytorch/doc/atat精度工具数据dump标准性能基线报告.md +0 -182
msprobe/pytorch/doc/dump.md +0 -207
msprobe/pytorch/doc/ptdbg_ascend_compare.md +0 -176
msprobe/pytorch/doc/ptdbg_ascend_overview.md +0 -68
msprobe/pytorch/doc/ptdbg_ascend_quickstart.md +0 -381
msprobe/pytorch/doc/run_overflow_check.md +0 -25
msprobe/pytorch/doc/在线精度比对.md +0 -90
msprobe/test/core_ut/common/test_utils.py +0 -345
msprobe/test/core_ut/data_dump/test_data_collector.py +0 -47
msprobe/test/core_ut/data_dump/test_json_writer.py +0 -183
msprobe/test/core_ut/data_dump/test_scope.py +0 -151
msprobe/test/core_ut/test_common_config.py +0 -152
msprobe/test/core_ut/test_file_check.py +0 -218
msprobe/test/core_ut/test_log.py +0 -109
msprobe/test/mindspore_ut/test_api_kbk_dump.py +0 -51
msprobe/test/mindspore_ut/test_debugger_config.py +0 -42
msprobe/test/mindspore_ut/test_dump_tool_factory.py +0 -51
msprobe/test/mindspore_ut/test_kernel_graph_dump.py +0 -66
msprobe/test/mindspore_ut/test_kernel_graph_overflow_check.py +0 -63
msprobe/test/mindspore_ut/test_ms_config.py +0 -69
msprobe/test/mindspore_ut/test_overflow_check_tool_factory.py +0 -51
msprobe/test/mindspore_ut/test_precision_debugger.py +0 -56
msprobe/test/mindspore_ut/test_task_handler_factory.py +0 -58
msprobe/test/pytorch_ut/advisor/test_advisor.py +0 -83
msprobe/test/pytorch_ut/api_accuracy_checker/common/test_common_utils.py +0 -108
msprobe/test/pytorch_ut/api_accuracy_checker/common/test_config.py +0 -39
msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_algorithm.py +0 -112
msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_api_precision_compare.py +0 -77
msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_compare.py +0 -125
msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_compare_column.py +0 -10
msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_compare_utils.py +0 -43
msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/dump.json +0 -179
msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/forward.json +0 -63
msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_data_generate.py +0 -99
msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_multi_run_ut.py +0 -115
msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_run_ut.py +0 -72
msprobe/test/pytorch_ut/compare/test_acc_compare.py +0 -17
msprobe/test/pytorch_ut/free_benchmark/perturbed_layers/test_perturbed_layser.py +0 -105
msprobe/test/pytorch_ut/free_benchmark/result_handlers/test_result_handler.py +0 -121
msprobe/test/pytorch_ut/free_benchmark/test_main.py +0 -101
msprobe/test/pytorch_ut/functional/test_dump_module.py +0 -15
msprobe/test/pytorch_ut/hook_module/test_api_registry.py +0 -130
msprobe/test/pytorch_ut/hook_module/test_hook_module.py +0 -42
msprobe/test/pytorch_ut/hook_module/test_wrap_aten.py +0 -65
msprobe/test/pytorch_ut/hook_module/test_wrap_distributed.py +0 -35
msprobe/test/pytorch_ut/hook_module/test_wrap_functional.py +0 -20
msprobe/test/pytorch_ut/hook_module/test_wrap_tensor.py +0 -35
msprobe/test/pytorch_ut/hook_module/test_wrap_torch.py +0 -43
msprobe/test/pytorch_ut/hook_module/test_wrap_vf.py +0 -11
msprobe/test/pytorch_ut/test_pt_config.py +0 -69
msprobe/test/pytorch_ut/test_service.py +0 -59
msprobe/test/resources/advisor.txt +0 -3
msprobe/test/resources/compare_result_20230703104808.csv +0 -9
msprobe/test/resources/compare_result_without_accuracy.csv +0 -9
msprobe/test/resources/config.yaml +0 -3
msprobe/test/resources/npu_test.pkl +0 -8
msprobe/test/run_test.sh +0 -30
msprobe/test/run_ut.py +0 -58
msprobe/test/test_module_processer.py +0 -64
{mindstudio_probe-1.0.1.dist-info → mindstudio_probe-1.0.4.dist-info}/top_level.txt +0 -0
/msprobe/{pytorch/doc → docs}/img/BLOOM-7B_1.png +0 -0
/msprobe/{pytorch/doc → docs}/img/BLOOM-7B_2.png +0 -0
/msprobe/{pytorch/doc → docs}/img/BLOOM-7B_3.png +0 -0
/msprobe/{pytorch/doc → docs}/img/BLOOM-7B_4.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_1.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_2.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_3.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_4.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_5.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_6.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_7.png +0 -0
/msprobe/{pytorch/doc → docs}/img/GPT-3_8.png +0 -0
/msprobe/{pytorch/doc → docs}/img/YOLOV5S_1.png +0 -0
/msprobe/{pytorch/doc → docs}/img/YOLOV5S_2.png +0 -0
/msprobe/{pytorch/doc → docs}/img/accuracy_checking_details.png +0 -0
/msprobe/{pytorch/doc → docs}/img/accuracy_checking_result.png +0 -0
/msprobe/{pytorch/doc → docs}/img/api_precision_compare_details.png +0 -0
/msprobe/{pytorch/doc → docs}/img/api_precision_compare_result.png +0 -0
/msprobe/{pytorch/doc → docs}/img/auto_analyze_log.png +0 -0
/msprobe/{pytorch/doc → docs}/img/compare_result_pkl.png +0 -0
/msprobe/{pytorch/doc → docs}/img/compare_result_pkl_md5.png.png +0 -0
/msprobe/{pytorch/doc → docs}/img/cpu_info.png +0 -0
/msprobe/{config → docs}/img/free_benchmark.png +0 -0
/msprobe/{pytorch/doc → docs}/img/module_compare.png +0 -0
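Entries such as `msprobe/{pytorch → core}/advisor/advisor.py +124 -124` combine a move or rename (the `{old → new}` segment) with added and removed line counts. As an illustration only, the Python sketch below expands one entry into its old path, its new path, and its counts. `parse_entry` and `ChangeEntry` are hypothetical names, not part of msprobe or Mend, and the parser assumes the `→` arrow and the trailing `+A -R` suffix exactly as they appear in this list.

```python
import re
from typing import NamedTuple


class ChangeEntry(NamedTuple):
    old_path: str
    new_path: str
    added: int
    removed: int


# One "{old → new}" segment, e.g. "{pytorch → core}".
_BRACE = re.compile(r"\{([^{}]*?) → ([^{}]*?)\}")
# Path followed by the trailing "+A -R" line counts.
_COUNTS = re.compile(r"^(.*\S)\s+\+(\d+)\s+-(\d+)\s*$")


def parse_entry(line: str) -> ChangeEntry:
    """Split a file-list entry into old path, new path and line counts."""
    match = _COUNTS.match(line)
    if match is None:
        raise ValueError(f"not a file-list entry: {line!r}")
    path, added, removed = match.group(1), int(match.group(2)), int(match.group(3))
    # Keep the left side of every brace for the old path, the right side for the new one.
    old_path = _BRACE.sub(lambda m: m.group(1), path)
    new_path = _BRACE.sub(lambda m: m.group(2), path)
    return ChangeEntry(old_path, new_path, added, removed)


if __name__ == "__main__":
    for entry in (
        "msprobe/{pytorch → core}/advisor/advisor.py +124 -124",
        "msprobe/{config/config.json → config.json} +49 -27",
        "msprobe/core/compare/acc_compare.py +300 -0",
    ):
        print(parse_entry(entry))
```

Entries without a brace segment come back with identical old and new paths; whether such a file was added, deleted, or edited has to be read from the counts.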

msprobe/pytorch/doc/atat精度工具数据dump标准性能基线报告.md DELETED

@@ -1,182 +0,0 @@
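-# msprobe Accuracy Tool Standard Performance Baseline Report
-## Environment Information
-NPU: Atlas A2 training series products
-CPU:
-![CPU information](img/cpu_info.png)
-Torch: 2.1.0
-CANN: 8.0.T2
-Besides the environment above, the number, types, and shapes of the APIs also affect performance, so specific networks were selected for this test. To exclude operator compilation time, every model runs with binary operators enabled by adding torch.npu.set_compile_mode(jit_compile=False) to the model, and every model dumps the data of the second step.
-## Model Information and Performance Baselines
-When dumping data from a large model with msprobe, reduce the number of model layers first to cut the volume of dumped data.
-The baselines below are averages over multiple runs, so actual performance may vary slightly with the state of the environment.
-### Tool Configuration
-To dump the inputs and outputs of all APIs together with their call stacks, configure: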
-```python
-debugger = PrecisionDebugger(dump_path="./dump_path", hook_name="dump")
-debugger.configure_hook(mode="api_stack")
-```
-To dump only rank 0 in a multi-card run, configure:
-```python
-debugger = PrecisionDebugger(dump_path="./dump_path", hook_name="dump", rank=0)
-debugger.configure_hook(mode="api_stack")
-```
-To dump only the pkl file of API statistics, configure:
-```python
-debugger = PrecisionDebugger(dump_path="./dump_path", hook_name="dump")
-debugger.configure_hook(mode="api_stack", summary_only=True)
-```
-### YOLOV5s
-Single card
-Main data type: FLOAT32
-Launch command: python3 train_ptdbg.py --data ./data/coco.yaml --cfg yolov5s.yaml --weights '' --epochs 1 --batch-size 8 --device 1
-Time to dump the API statistics pkl file: **7s**
-Time to dump all API-level inputs/outputs and call stacks on a single card: **11s**
-- Size of the dumped API numpy files on disk: 13G
-  ![YOLOv5s dump size](img/YOLOV5S_1.png)
-- Number of API numpy files: 3009
-  ![YOLOv5s dump file count](img/YOLOV5S_2.png)
-### GPT-3
-#### NUM_LAYER: 1
-8 cards
-Main data type: FLOAT16
-Launch command:
-```
-python3 -m torch.distributed.launch $DISTRIBUTED_ARGS ../../pretrain_gpt_ptdbg.py --num-layers 1 --hidden-size 12288 --num-attention-heads 24 --micro-batch-size 2 --global-batch-size 2 --seq-length 1024 --max-position-embeddings 1024 --train-iters 10 --lr-decay-iters 320000 --save $CHECKPOINT_PATH --load $CHECKPOINT_PATH --data-path $DATA_PATH --tensor-model-parallel-size 8 --use-distributed-optimizer --pipeline-model-parallel-size 8 --vocab-file gpt2-vocab.json --merge-file gpt2-merges.txt --data-impl mmap --split 949,50,1 --distributed-backend nccl --lr 0.375e-5 --lr-decay-style cosine --min-lr 0.375e-6 --weight-decay 0.1 --clip-grad 1.0 --lr-warmup-fraction .01 --adam-beta1 0.9 --adam-beta2 0.95 --init-method-std 0.006 \
---recompute-granularity full --recompute-method uniform --no-gradient-accumulation-fusion --log-interval 1 --save-interval 10000 --eval-interval 1000 --eval-iters 10 --fp16
-```
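-Time to dump the API statistics pkl file: **3.3s**
-Time to dump all API-level inputs/outputs and call stacks on 8 cards: **53s**
-- Size of the dumped API numpy files on disk: 145G
-  ![GPT-3 (1 layer) dump size](img/GPT-3_1.png)
-- Number of API numpy files: 5130
-  ![GPT-3 (1 layer) dump file count](img/GPT-3_2.png)
-**Testing showed that 8 cards writing at the same time saturate the disk I/O, so the tool's dump speed is bounded by disk performance. On this machine, multi-process writes top out at about 3 GB/s, so saving 145 GB theoretically takes about 50 s (145 GB ÷ 3 GB/s ≈ 48 s). If the dump contains many small files, it takes longer.**
-Time to dump rank 0 only: **9s**
-- Size of the dumped API numpy files on disk: 19G
-  ![GPT-3 (1 layer) rank 0 dump size](img/GPT-3_3.png)
-- Number of API numpy files: 643
-  ![GPT-3 (1 layer) rank 0 dump file count](img/GPT-3_4.png)
-#### NUM_LAYER: 8
-8 cards
-Main data type: FLOAT16
-Launch command: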
-```
-python3 -m torch.distributed.launch $DISTRIBUTED_ARGS ../../pretrain_gpt_ptdbg.py --num-layers 8 --hidden-size 12288 --num-attention-heads 24 --micro-batch-size 2 --global-batch-size 2 --seq-length 1024 --max-position-embeddings 1024 --train-iters 10 --lr-decay-iters 320000 --save $CHECKPOINT_PATH --load $CHECKPOINT_PATH --data-path $DATA_PATH --tensor-model-parallel-size 8 --use-distributed-optimizer --pipeline-model-parallel-size 8 --vocab-file gpt2-vocab.json --merge-file gpt2-merges.txt --data-impl mmap --split 949,50,1 --distributed-backend nccl --lr 0.375e-5 --lr-decay-style cosine --min-lr 0.375e-6 --weight-decay 0.1 --clip-grad 1.0 --lr-warmup-fraction .01 --adam-beta1 0.9 --adam-beta2 0.95 --init-method-std 0.006 --recompute-granularity full --recompute-method uniform --no-gradient-accumulation-fusion --log-interval 1 --save-interval 10000 --eval-interval 1000 --eval-iters 10 --fp16
-```
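-Time to dump the API statistics pkl file: **6.7s**
-Time to dump all API-level inputs/outputs and call stacks on 8 cards: **323s**
-- Size of the dumped API numpy files on disk: 878G
-  ![GPT-3 (8 layers) dump size](img/GPT-3_5.png)
-- Number of API numpy files: 24002
-  ![GPT-3 (8 layers) dump file count](img/GPT-3_6.png)
-Time to dump rank 0 only: **47s**
-- Size of the dumped API numpy files on disk: 110G
-  ![GPT-3 (8 layers) rank 0 dump size](img/GPT-3_7.png)
-- Number of API numpy files: 3002
-  ![GPT-3 (8 layers) rank 0 dump file count](img/GPT-3_8.png)
-### BLOOM-7B
-8 cards
-NUM_LAYER: 1
-Main data type: BFLOAT16
-Launch command: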
-```
-python -m torch.distributed.launch $DISTRIBUTED_ARGS pretrain_llama.py --DDP-impl local --tensor-model-parallel-size 8 --pipeline-model-parallel-size 1 --sequence-parallel --num-layers 1 --hidden-size 12288 --position-embedding-type rope --normalization RMSNorm --ffn-hidden-size 11008 --num-attention-heads 24 --attention-dropout 0.0 --hidden-dropout 0.0 --init-method-std 0.01 --micro-batch-size 2 --global-batch-size 2 --seq-length 1024 --max-position-embeddings 1024 --data-path $DATA_PATH --tokenizer-name-or-path $TOKENIZER_PATH --tokenizer-not-use-fast --split 100,0,0 --distributed-backend nccl --lr 1.25e-5 --min-lr 1.25e-6 --lr-decay-style cosine --weight-decay 1e-1 --clip-grad 1.0 --initial-loss-scale 65536.0 --adam-beta1 0.9 --adam-beta2 0.95 --log-interval 1 --load ${LOAD_CHECKPOINT_PATH} --save ${SAVE_CHECKPOINT_PATH} --save-interval 10000 --eval-interval 10000 --eval-iters 0 --use-fused-rotary-pos-emb --no-masked-softmax-fusion --no-load-optim --no-load-rng --train-iters 20 --lr-warmup-fraction 0.01 --mlp-layer-fusion --use-flash-attn --use-fused-rmsnorm --bf16
-```
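-Time to dump the API statistics pkl file: **3s**
-Time to dump all API-level inputs/outputs and call stacks on 8 cards: **61s**
-- Size of the dumped API numpy files on disk: 160G
-  ![BLOOM-7B dump size](img/BLOOM-7B_1.png)
-- Number of API numpy files: 4924
-  ![BLOOM-7B dump file count](img/BLOOM-7B_2.png)
-Time to dump rank 0 only: **17s**
-- Size of the dumped API numpy files on disk: 20G
-  ![BLOOM-7B rank 0 dump size](img/BLOOM-7B_3.png)
-- Number of API numpy files: 633
-  ![BLOOM-7B rank 0 dump file count](img/BLOOM-7B_4.png)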
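The deleted baseline report attributes the 8-card dump times to a disk ceiling of about 3 GB/s (145 GB in about 50 s). That claim can be checked against the report's own measurements with simple arithmetic. The Python sketch below is illustrative only: `DumpRun`, `min_write_seconds`, and `effective_throughput` are hypothetical names, and the 3 GB/s ceiling is the report's figure for its test machine, not a general constant.

```python
from dataclasses import dataclass

# Assumption taken from the report: multi-process disk writes top out at about 3 GB/s.
DISK_LIMIT_GB_PER_S = 3.0


@dataclass(frozen=True)
class DumpRun:
    name: str
    size_gb: float  # total size of the dumped API numpy files, in GB
    seconds: float  # measured time for the full API-level dump


def min_write_seconds(size_gb: float, limit_gb_per_s: float = DISK_LIMIT_GB_PER_S) -> float:
    """Lower bound on dump time if writing to disk were the only cost."""
    return size_gb / limit_gb_per_s


def effective_throughput(run: DumpRun) -> float:
    """Observed write rate in GB/s."""
    return run.size_gb / run.seconds


# Figures copied from the 8-card full-dump rows of the baseline report.
RUNS = [
    DumpRun("GPT-3, 1 layer", 145, 53),
    DumpRun("GPT-3, 8 layers", 878, 323),
    DumpRun("BLOOM-7B, 1 layer", 160, 61),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.name}: {effective_throughput(run):.2f} GB/s observed, "
              f"at least {min_write_seconds(run.size_gb):.0f} s at the disk limit")
```

With the report's numbers, the three 8-card full dumps come out at roughly 2.6 to 2.7 GB/s, close to the stated 3 GB/s ceiling, which is consistent with the report's conclusion that disk I/O rather than the tool limits dump speed.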

msprobe/pytorch/doc/dump.md DELETED

@@ -1,207 +0,0 @@
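-# **Accuracy Data Collection**
-msprobe collects accuracy data by adding dump interfaces to the training script and then running training.
-Dumping requires the msprobe tool to be installed. For details, see the "Tool Installation" section of [MindStudio Accuracy Debugging Tool](../../README.md).
-## Dump Interfaces
-### PrecisionDebugger
-**Description**
-Loads a dump configuration file to determine the detailed dump settings.
-Add this interface anywhere between `from msprobe.pytorch import PrecisionDebugger` and model initialization.
-**Prototype**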
-```Python
-PrecisionDebugger(config_path=None, task=None, dump_path=None, level=None, model=None, step=None)
-```
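-Note: Apart from config_path and model, all parameters above can also be set in the [config.json](../../config) file. Values passed here take precedence over those in [config.json](../../config), but config.json supports more parameters, so configure [config.json](../../config) if you need to dump accuracy data in more scenarios.
-**Parameters**
-| Parameter   | Description                                                  | Required |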
-| ----------- | ------------------------------------------------------------ | -------- |
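-| config_path | Path to the dump configuration file, String. Example: "./config.json". If not set, the default configuration in the [config.json](../../config) file is used. | No       |
-| task        | Dump task type, String. Options: "statistics" (dump API statistics only), "tensor" (dump API statistics plus the real data that fully reproduces how every API in the network ran), and "overflow_check" (overflow detection). Not set by default, in which case "statistics" is used. Example: task="tensor". | No       |
-| dump_path   | Directory for the dump data, String. Example: dump_path="./dump_path". | No       |
-| level       | Dump level; each level dumps different data. String. Options:<br>        "L0": dump module-level accuracy data (PyTorch only).<br/>        "L1": dump API-level accuracy data (default).<br/>        "L2": dump kernel-level accuracy data.<br/>        "mix": dump both module-level and API-level accuracy data.<br/>Example: level="L1". | No       |
-| model       | The torch.nn.Module to dump. Not set by default; required when level is "L0" or "mix". See "**model configuration example**". | No       |
-| step        | Steps whose data is dumped, list[int]. Not set by default, meaning data of all steps is dumped. When dumping specific steps, they must exist in the training script. Steps are listed individually, e.g. step=[0,1,2]. | No       |
-#### model configuration example
-The example defines a simple nn.Module network and calls PrecisionDebugger with the config_path and model parameters for the dump; model must be of type torch.nn.Module or a subclass of it.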
-```python
-# Import packages as needed
-import torch
-import torch.nn as nn
-import torch_npu
-import torch.nn.functional as F
-from msprobe.pytorch import PrecisionDebugger
-torch.npu.set_device("npu:0")
-# Define a simple network
-class ModuleOP(nn.Module):
-    def __init__(self) -> None:
-        super().__init__()
-        self.linear_1 = nn.Linear(in_features=8, out_features=4)
-        self.linear_2 = nn.Linear(in_features=4, out_features=2)
-    def forward(self, x):
-        x1 = self.linear_1(x)
-        x2 = self.linear_2(x1)
-        r1 = F.relu(x2)
-        return r1
-if __name__ == "__main__":
-    module = ModuleOP()
-    # Register the tool
-    debugger = PrecisionDebugger('./config.json', model=module)
-    debugger.start()
-    x = torch.randn(10, 8)
-    out = module(x)
-    loss = out.sum()
-    loss.backward()
-    debugger.stop()
-```
-### start function
-**Description**
-Starts the dump.
-Add it anywhere after model initialization.
-**Prototype**
-```Python
-debugger.start()
-```
-This is a class method: call it as either debugger.start() or PrecisionDebugger.start().
-### stop function
-**Description**
-Stops the dump.
-Add it anywhere after the **start** function.
-**Prototype**
-```Python
-debugger.stop()
-```
-This is a class method: call it as either debugger.stop() or PrecisionDebugger.stop().
-### step function
-**Description**
-Marks the end of a step.
-Add it after the last **stop** call or at the end of a step.
-**Prototype**
-```Python
-debugger.step()
-```
-This is a class method: call it as either debugger.step() or PrecisionDebugger.step().
-## Sample Code
-```Python
-from msprobe.pytorch import PrecisionDebugger
-debugger = PrecisionDebugger(config_path="./config.json", dump_path="./dump_path")
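-# Do not place the initialization above inside a loop
-# Model initialization
-# The calls below can also be written as PrecisionDebugger.start() and PrecisionDebugger.stop()
-debugger.start()
-# Code segment 1 to dump
-debugger.stop()
-debugger.start()
-# Code segment 2 to dump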
-debugger.stop()
-debugger.step()
-```
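-## Dump Result Files
-After training, the tool saves the dumped data in the directory specified by the dump_path parameter.
-An example directory structure of the dump results:
-```
-├── dump_path
-│   ├── step0
-│   │   ├── rank0
-│   │   │   ├── dump_tensor_data
-│   │   │   │   ├── Tensor.permute.1.forward.pt
-│   │   │   │   ├── MyModule.0.forward.input.pt        # Module-level dump files are present only when module-level dump is enabled
-│   │   │   │   ...
-│   │   │   │   └── Functional.linear.5.backward.output.pt
-│   │   │   ├── dump.json        # Stores forward and backward operator statistics or overflow operator information: the names of the dumped APIs (format: `{api_type}_{api_name}_{call_count}_{forward/backward}_{input/output}.{arg_index}`), dtype, shape, the max, min, mean, and L2norm statistics of each piece of data, and md5 data when summary_mode="md5" is configured. arg_index is the position of the argument within the API, e.g. 1 for the first argument; if that argument is a list, numbering continues into the list, e.g. 1.1 is the first element of the API's first argument. L2norm is the 2-norm (square root of the sum of squares)
-│   │   │   ├── stack.json        # Operator call stack information
-│   │   │   └── construct.json        # Hierarchical structure
-│   │   ├── rank1
-│   │   │   ├── dump_tensor_data
-│   │   │   │   └── ...
-│   │   │   ├── dump.json
-│   │   │   ├── stack.json
-│   │   │   └── construct.json
-│   │   ├── ...
-│   │   │
-│   │   └── rank7
-│   ├── step1
-│   │   ├── ...
-│   ├── step2
-```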
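-During the dump, each pt file is written to disk as soon as its operator or module has executed, whereas the json files are written only after PrecisionDebugger.stop() completes normally. If the program terminates abnormally, the pt files of operators executed before termination are kept, but no json files are generated.
-Here, rank is the ID of each card on the device; the data dumped on each card is written to its own dump directory.
-The mapping between pt file name prefixes and PyTorch modules is as follows:
-| Prefix      | Torch module        |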
-| ----------- | ------------------- |
-| Tensor      | torch.Tensor        |
-| Torch       | torch               |
-| Functional  | torch.nn.functional |
-| NPU         | NPU-affinity operators |
-| VF          | torch._VF           |
-| Aten        | torch.ops.aten      |
-| Distributed | torch.distributed   |
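-## APIs Supported by the Tool
-msprobe maintains a fixed list of supported APIs. To remove or add APIs to dump, edit msprobe/pytorch/hook_module/support_wrap_ops.yaml manually, as in the following example:
-```yaml
-functional:  # functional is the API category; find the matching category and add or remove APIs under it in the format below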
-  - conv1d
-  - conv2d
-  - conv3d
-```
-# FAQ
-[FAQ](./FAQ.md)
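The deleted dump.md above states that pt files are written as each operator or module runs, while the json files appear only after a normal PrecisionDebugger.stop(). Under that assumption, one quick way to spot ranks whose run ended abnormally is to walk the step*/rank* layout and look for missing json files. The sketch below is a hypothetical standard-library helper, not an msprobe API; `inventory` and `JSON_FILES` are names introduced here, and it assumes the directory names match the example tree exactly.

```python
from pathlib import Path
import re
import tempfile

# Files the example layout places in every rank directory after a normal stop().
JSON_FILES = ("dump.json", "stack.json", "construct.json")
_STEP = re.compile(r"^step(\d+)$")
_RANK = re.compile(r"^rank(\d+)$")


def inventory(dump_path: str) -> dict:
    """Map (step, rank) -> {"pt_files": count, "missing_json": [file names]}."""
    result = {}
    for step_dir in sorted(Path(dump_path).iterdir()):
        step_match = _STEP.match(step_dir.name)
        if not (step_dir.is_dir() and step_match):
            continue  # skip anything that is not a stepN directory
        for rank_dir in sorted(step_dir.iterdir()):
            rank_match = _RANK.match(rank_dir.name)
            if not (rank_dir.is_dir() and rank_match):
                continue  # skip anything that is not a rankN directory
            tensor_dir = rank_dir / "dump_tensor_data"
            pt_files = len(list(tensor_dir.glob("*.pt"))) if tensor_dir.is_dir() else 0
            missing = [name for name in JSON_FILES if not (rank_dir / name).is_file()]
            key = (int(step_match.group(1)), int(rank_match.group(1)))
            result[key] = {"pt_files": pt_files, "missing_json": missing}
    return result


if __name__ == "__main__":
    # Build a tiny fake layout: rank0 finished normally, rank1 only has pt files.
    with tempfile.TemporaryDirectory() as root:
        for rank, complete in ((0, True), (1, False)):
            rank_dir = Path(root, "step0", f"rank{rank}")
            (rank_dir / "dump_tensor_data").mkdir(parents=True)
            (rank_dir / "dump_tensor_data" / "Tensor.permute.1.forward.pt").touch()
            if complete:
                for name in JSON_FILES:
                    (rank_dir / name).write_text("{}")
        for (step, rank), info in sorted(inventory(root).items()):
            print(step, rank, info)
```

A rank that has pt files but a non-empty `missing_json` list is a likely candidate for an interrupted run; the helper does not read or validate the json contents.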

msprobe/pytorch/doc/ptdbg_ascend_compare.md DELETED

@@ -1,176 +0,0 @@
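-# **Accuracy Compare Tool**
-## Comparing CPU or GPU Accuracy Data with NPU Accuracy Data
-### Overview
-- This section describes the functions for comparing CPU or GPU accuracy data with NPU accuracy data and gives examples. Before comparing, dump the accuracy data on the CPU or GPU and on the NPU; see [Accuracy Data Collection](./dump.md).
-- The compare functions are run from a separately created compare script and support accuracy data comparison in both single-card and multi-card scenarios.
-- Tool performance: when the amount of data is small (reference: each file under 10 GB), the reference compare speed is 0.1 GB/s; when it is large, the reference compare speed is 0.3 GB/s. Recommended environment: exclusive use, 192 CPU cores, and a solid-state drive (reference I/O speeds: SSD > 500 MB/s, HDD 60 to 170 MB/s).
-  If the user environment is weaker than this or not used exclusively, expect the compare speed to drop accordingly. Compare speed is calculated as: combined size of the two compared files / compare time.
-### Constraints
-- NPU-specific APIs that have no counterpart on CPU or GPU are not compared.
-- Errors between NPU and CPU or GPU results may accumulate as the model runs, so eventually the same API may not be comparable because its inputs differ too much.
-- If the same API is called a different number of times on the CPU or GPU and on the NPU, it may fail to be compared or be compared against the wrong API. This does not affect the overall run; such APIs are ignored.
-### compare_distributed
-**Description**
-Compares CPU or GPU dump files with NPU dump files. Supports single-card and multi-card runs and can compare the dump data of multiple cards at once. In multi-machine scenarios, run the comparison separately on each machine. The function automatically finds and matches the data files dumped by corresponding cards and processes, then calls compare on them. On a single machine with a single card, use either this function or compare.
-**Prototype**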
-```Python
-compare_distributed(npu_dump_dir, bench_dump_dir, output_path, **kwargs)
-```
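-**Parameters**
-| Parameter      | Description                                                  | Required |
-| -------------- | ------------------------------------------------------------ | -------- |
-| npu_dump_dir   | Dump directory from the NPU environment. It must point to the step level. Example: './npu_dump/step0'. Type: str. | Yes      |
-| bench_dump_dir | Dump directory from the CPU, GPU, or NPU environment. Example: './gpu_dump/step0'. Type: str. | Yes      |
-| output_path    | Directory where the compare result file is saved. Create output_path beforehand. Example: './output'. The file name is generated from a timestamp in the format `compare_result_rank{npu_ID}-rank{cpu/gpu/npu_ID}_{timestamp}.xlsx`. Type: str. | Yes      |
-| **kwargs       | Accepts all optional parameters of compare.                  | No       |
-**Example**
-Create a compare script, for example compare_distributed.py, copy in the following code, and adjust the parameters for your environment.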
-```Python
-from msprobe.pytorch import compare_distributed
-compare_distributed('./npu_dump/step0', './gpu_dump/step0', './output')
-```
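-The dump data directories must be specified down to the step level.
-### compare
-**Description**
-Compares CPU or GPU dump files with NPU dump files. Only single-machine, single-card setups are supported.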
-**Prototype**
-```Python
-compare(input_param, output_path, stack_mode=False, auto_analyze=True, fuzzy_match=False)
-```
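-**Parameters**
-| Parameter    | Description                                                  | Required |
-| ------------ | ------------------------------------------------------------ | -------- |
-| input_param  | Dump data files and directories. Type: dict. Keys:<br>        "npu_json_path": the dump.json file in the NPU dump directory, for example "npu_json_path": "./npu_dump/dump.json". Required.<br/>        "bench_json_path": the dump.json file in the CPU, GPU, or NPU dump directory, for example "bench_json_path": "./gpu_dump/dump.json". Required.<br/>        "stack_json_path": the stack.json file in the NPU dump directory, for example "stack_json_path": "./npu_dump/stack.json". Optional.<br/>        "is_print_compare_log": whether to print logs to the screen. True or False. Optional. | Yes      |
-| output_path  | Directory where the comparison result files are saved, for example './output'. File names are generated automatically from a timestamp, in the format `compare_result_{timestamp}.xlsx`. Type: str. | Yes      |
-| stack_mode   | Turns stack_mode on or off. Enable it only when "stack_json_path" is configured. True or False, for example stack_mode=True. Default: False. Type: bool. | No       |
-| auto_analyze | Automatic accuracy analysis. When enabled, the tool analyzes the comparison results and identifies the first node that fails the accuracy standard (shown as No in the "Accuracy Reached or Not" column of the result file). It then reports possible causes, both on screen and in an advisor_{timestamp}.txt file. True or False, for example auto_analyze=False. Default: True. Type: bool. | No       |
-| fuzzy_match  | Fuzzy matching. When enabled, APIs at the same level of the network whose names differ only in call count can be matched and compared. True or False, for example fuzzy_match=True. Default: False. Type: bool. | No       |
-**Example**
-For a single-machine, single-card setup, create a comparison script, for example compare.py, and copy in the following code. Adjust the parameters to your environment.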
-```Python
-from msprobe.pytorch import compare
-dump_result_param = {
-    "npu_json_path": "./npu_dump/dump.json",
-    "bench_json_path": "./gpu_dump/dump.json",
-    "stack_json_path": "./npu_dump/stack.json",
-    "is_print_compare_log": True,
-}
-compare(dump_result_param, output_path="./output", stack_mode=True)
-```
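-### Statistics Comparison
-If the data was dumped with "task": "statistics" set in [config.json](../../config/config.json), a comparison script created with **compare** or **compare_distributed** performs a statistics comparison, which compares the statistics in the dump.json files. The result file then contains Max diff, Min diff, Mean diff, and L2norm diff columns. These are the differences between the maximum, minimum, mean, and L2 norm of each API input or output in the NPU dump and those of the corresponding benchmark input or output. Use these values to judge whether an API has an accuracy problem. If Max diff, Min diff, Mean diff, and L2norm diff of an API's inputs and outputs are all 0 or very close to 0, the API can be considered free of accuracy problems. Otherwise, it may have one.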
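The statistics comparison above reduces to four subtractions per tensor. This minimal sketch computes them. It assumes each tensor's statistics are available as a dict with the keys "Max", "Min", "Mean", and "Norm", and that each diff is taken as NPU minus benchmark. The function names and the closeness tolerance are hypothetical choices for illustration, not the tool's own.

```python
# Assumed dump.json statistic keys, mapped to the diff columns named in the doc.
STAT_KEYS = {"Max": "Max diff", "Min": "Min diff", "Mean": "Mean diff", "Norm": "L2norm diff"}


def statistic_diffs(npu_stats: dict, bench_stats: dict) -> dict:
    """Return the four diff columns for one API input or output (NPU minus benchmark)."""
    return {column: npu_stats[key] - bench_stats[key] for key, column in STAT_KEYS.items()}


def looks_accurate(diffs: dict, tol: float = 1e-6) -> bool:
    """True when every diff is 0 or close to 0 (tol is an arbitrary illustrative choice)."""
    return all(abs(value) <= tol for value in diffs.values())


npu = {"Max": 1.5, "Min": -1.0, "Mean": 0.25, "Norm": 3.0}
bench = {"Max": 1.0, "Min": -1.0, "Mean": 0.25, "Norm": 2.5}
diffs = statistic_diffs(npu, bench)
print(diffs)                  # {'Max diff': 0.5, 'Min diff': 0.0, 'Mean diff': 0.0, 'L2norm diff': 0.5}
print(looks_accurate(diffs))  # False
```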
-**Example comparison script**
-Using compare.py as an example:
-```Python
-from msprobe.pytorch import compare
-dump_result_param = {
-    "npu_json_path": "./npu_dump/dump.json",
-    "bench_json_path": "./gpu_dump/dump.json",
-    "stack_json_path": "./npu_dump/stack.json",
-    "is_print_compare_log": True,
-}
-compare(dump_result_param, output_path="./output", stack_mode=True)
-```
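-**Comparison Results**
-Statistics comparison also generates `compare_result_{timestamp}.xlsx` and `advisor_{timestamp}.txt`. `advisor_{timestamp}.txt` gives troubleshooting suggestions for the APIs in `compare_result_{timestamp}.xlsx` that may have accuracy problems (Result is Warning). `compare_result_{timestamp}.xlsx` covers one of two cases:
-- With "summary_mode": "statistics", the dump.json files are compared:
-  ![compare_result_pkl](img/compare_result_pkl.png)
-  The figure above compares the statistics of NPU and benchmark APIs in the dump.json files to identify APIs that may have accuracy problems. The file records basic information and statistics for each NPU and benchmark API. Focus on the Result column, which can contain: Warning (at least one NPU statistic has a relative error greater than 0.5 against the benchmark; check this API closely); empty (relative error no greater than 0.5; no close attention is needed, although this does not rule out accuracy problems); Nan (the statistics could not be matched).
-- With "summary_mode": "md5", the dump.json files are compared:
-  ![compare_result_pkl_md5.png](img/compare_result_pkl_md5.png.png)
-  The figure above compares the MD5 values of NPU and benchmark APIs in the dump.json files to check data integrity. The file records basic information and MD5 values for each NPU and benchmark API. Focus on the Result column, which can contain: Pass (the NPU and benchmark MD5 values match, so the API data is complete); Different (the MD5 values differ, so the API data is not fully identical; look up the API's call stack in the NPU_Stack_Info column for details); Nan (the MD5 information could not be matched).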
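The two Result-column rules above can be sketched as plain functions. This is only an illustration under assumptions. The statistic keys are assumed, and the sketch defines relative error as |NPU − bench| / max(|bench|, eps), a formula the doc does not state. Both function names are hypothetical, and `None` stands for "not matched".

```python
import hashlib
from typing import Optional

STAT_KEYS = ("Max", "Min", "Mean", "Norm")  # assumed dump.json statistic keys


def statistics_result(npu_stats: Optional[dict], bench_stats: Optional[dict],
                      threshold: float = 0.5, eps: float = 1e-10) -> str:
    """Result column in "statistics" mode: "Warning", "" (empty), or "Nan".

    Assumption: relative error = |NPU - bench| / max(|bench|, eps) for each statistic.
    """
    if npu_stats is None or bench_stats is None:
        return "Nan"  # statistics could not be matched
    for key in STAT_KEYS:
        relative_error = abs(npu_stats[key] - bench_stats[key]) / max(abs(bench_stats[key]), eps)
        if relative_error > threshold:
            return "Warning"
    return ""


def md5_result(npu_md5: Optional[str], bench_md5: Optional[str]) -> str:
    """Result column in "md5" mode: "Pass", "Different", or "Nan"."""
    if npu_md5 is None or bench_md5 is None:
        return "Nan"  # MD5 information could not be matched
    return "Pass" if npu_md5 == bench_md5 else "Different"


digest = hashlib.md5(b"tensor bytes").hexdigest()
print(md5_result(digest, digest))  # Pass
print(statistics_result({"Max": 2.0, "Min": 0.0, "Mean": 1.0, "Norm": 2.0},
                        {"Max": 1.0, "Min": 0.0, "Mean": 1.0, "Norm": 2.0}))  # Warning
```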
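-## Analyzing Comparison Results
-PyTorch accuracy comparison uses the CPU or GPU results as the benchmark. It computes accuracy evaluation metrics to determine whether APIs have accuracy problems at run time.
-- `advisor_{timestamp}.txt` contains expert suggestions for APIs that may have accuracy problems and can be opened directly.
-- `compare_result_{timestamp}.xlsx` lists detailed information and comparison results for every compared API, as in the following example:
-  ![compare_result](https://gitee.com/cai-weiwei1989/att_ptdbg/raw/master/debug/accuracy_tools/ptdbg_ascend/doc/img/compare_result.png)
-  Use this file to **determine whether computation accuracy meets the standard**, **analyze accuracy evaluation metrics**, and **identify anomalies**.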
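-### **Determining Whether Computation Accuracy Meets the Standard**
-In `compare_result_{timestamp}.xlsx`, only the Accuracy Reached or Not column is needed to determine whether computation accuracy meets the standard. The criteria are:
-1. If Cosine < 0.99 and MaxAbsErr > 0.001, accuracy does not meet the standard and is marked "No".
-2. If Cosine < 0.9, accuracy does not meet the standard and is marked "No".
-3. If MaxAbsErr > 1, accuracy does not meet the standard and is marked "No".
-4. In all other cases, accuracy meets the standard and is marked "Yes".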
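The four criteria above translate directly into a small decision function. This is a minimal sketch with a hypothetical name, not the tool's own code. The doc does not say how a nan Cosine is treated. Because Python comparisons against nan are False, a nan Cosine falls through to "Yes" in this sketch.

```python
def accuracy_reached(cosine: float, max_abs_err: float) -> str:
    """Return the "Accuracy Reached or Not" value for one tensor, following the four rules."""
    if cosine < 0.99 and max_abs_err > 0.001:  # rule 1
        return "No"
    if cosine < 0.9:                           # rule 2
        return "No"
    if max_abs_err > 1:                        # rule 3
        return "No"
    return "Yes"                               # rule 4: all other cases


for cos, err in [(0.995, 0.5), (0.98, 0.01), (0.95, 0.0005), (0.85, 0.0), (0.999, 2.0)]:
    print(cos, err, accuracy_reached(cos, err))
```

Note that (0.95, 0.0005) passes: its Cosine is below 0.99, but its MaxAbsErr does not exceed 0.001, so rule 1 does not fire.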
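-### **Analyzing Accuracy Evaluation Metrics**
-1. Cosine: measures the similarity of two vectors by the cosine of the angle between them. The closer the value is to 1, the more similar the two tensors. In practice, values above 0.99 are acceptable. The result can be nan, mainly when one of the vectors is all zeros.
-2. MaxAbsErr: maximum absolute error. The closer it is to 0, the smaller the computation error. In practice, values below 0.001 are acceptable.
-3. MaxRelativeErr: maximum relative error. The closer it is to 0, the smaller the computation error.
-   If the dump data contains 0 or Nan, the maximum relative error in the results can be inf or Nan. This is expected.
-4. One Thousandth Err Ratio (dual one-thousandth) and Five Thousandths Err Ratio (dual five-thousandths): each element of the NPU tensor is compared with the corresponding benchmark element. The criterion is that the elements whose relative error exceeds one thousandth (or five thousandths) make up less than one thousandth (or five thousandths) of the total element count. These metrics serve only as a reference for trends in accuracy degradation and do not count toward whether accuracy is judged to meet the standard.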
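For intuition, the metrics above can be sketched in plain Python over flat lists. This is an illustrative sketch, not msprobe's implementation, and all function names are hypothetical. Two choices are assumptions. A zero benchmark element yields inf, or nan for 0/0, mirroring NumPy division. The "dual" criterion is coded as the description above states it. How the tool derives the value it reports in the ratio columns is not shown here.

```python
import math
from typing import Sequence


def cosine(npu: Sequence[float], bench: Sequence[float]) -> float:
    """Cosine similarity; nan when either vector is all zeros, as the doc notes."""
    dot = sum(x * y for x, y in zip(npu, bench))
    norm_npu = math.sqrt(sum(x * x for x in npu))
    norm_bench = math.sqrt(sum(y * y for y in bench))
    if norm_npu == 0 or norm_bench == 0:
        return math.nan
    return dot / (norm_npu * norm_bench)


def max_abs_err(npu: Sequence[float], bench: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(npu, bench))


def _relative_err(x: float, y: float) -> float:
    diff = abs(x - y)
    if y == 0:
        # A zero benchmark element gives inf (or nan for 0/0), matching the doc's note.
        return math.inf if diff > 0 else math.nan
    return diff / abs(y)


def max_relative_err(npu: Sequence[float], bench: Sequence[float]) -> float:
    errors = [_relative_err(x, y) for x, y in zip(npu, bench)]
    if any(math.isnan(e) for e in errors):
        return math.nan  # propagate nan, as NumPy's max would
    return max(errors)


def exceed_ratio(npu: Sequence[float], bench: Sequence[float], tol: float) -> float:
    """Fraction of elements whose relative error is greater than tol."""
    errors = [_relative_err(x, y) for x, y in zip(npu, bench)]
    return sum(e > tol for e in errors) / len(errors)


def meets_dual_criterion(npu: Sequence[float], bench: Sequence[float], tol: float) -> bool:
    """Dual criterion as described: fewer than tol of the elements exceed relative error tol."""
    return exceed_ratio(npu, bench, tol) < tol


npu = [1.0, 1.0005, 1.003, 1.01]
bench = [1.0, 1.0, 1.0, 1.0]
print(exceed_ratio(npu, bench, 0.001))  # 0.5
print(exceed_ratio(npu, bench, 0.005))  # 0.25
```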
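-### **Identifying Anomalies**
-In `compare_result_{timestamp}.xlsx`, APIs with anomalies are highlighted:
-- Red can indicate any of the following:
-  - NPU max or NPU min contains nan/inf
-  - Max diff contains a value greater than 1e+10
-  - in statistics data, the output's Max diff divided by max(0.01, Bench max) is > 0.5
-  - in real data, the input's One Thousandth Err Ratio is > 0.9 while the output's is < 0.6
-- Yellow can indicate any of the following:
-  - Max diff is greater than 1 for both input and output, and the output's is more than an order of magnitude larger than the input's
-  - in statistics data, Max diff divided by max(0.01, Bench max) is > 0.1 for the output while it is < 0.01 for the input
-  - in real data, the input's One Thousandth Err Ratio minus the output's is > 0.1
-  - in real data, the input's Cosine minus the output's is > 0.1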
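The highlighting rules above can be expressed as one classification function over an API's input and output rows. This is a hypothetical sketch: the `TensorRow` fields are an assumed simplification of the result-file columns. Taking abs() of Max diff is also an assumption, since the doc does not specify a sign. All red rules are checked before any yellow rule.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TensorRow:
    """Hypothetical view of one input or output row of compare_result_{timestamp}.xlsx."""
    npu_max: float
    npu_min: float
    bench_max: float
    max_diff: float
    cosine: Optional[float] = None                # real-data comparison only
    one_thousandth_ratio: Optional[float] = None  # real-data comparison only


def _relative_max_diff(row: TensorRow) -> float:
    # Max diff / max(0.01, Bench max); taking abs(Max diff) is an assumption.
    return abs(row.max_diff) / max(0.01, row.bench_max)


def _both(a: Optional[float], b: Optional[float]) -> bool:
    return a is not None and b is not None


def highlight(inp: TensorRow, out: TensorRow, statistics_mode: bool) -> str:
    """Return "red", "yellow", or "" for one API; all red rules are checked first."""
    rows = (inp, out)
    # Red: nan/inf in NPU max or NPU min.
    if any(not math.isfinite(v) for r in rows for v in (r.npu_max, r.npu_min)):
        return "red"
    # Red: a Max diff value greater than 1e+10.
    if any(abs(r.max_diff) > 1e10 for r in rows):
        return "red"
    if statistics_mode:
        if _relative_max_diff(out) > 0.5:
            return "red"
        if _relative_max_diff(out) > 0.1 and _relative_max_diff(inp) < 0.01:
            return "yellow"
    else:
        in_ratio, out_ratio = inp.one_thousandth_ratio, out.one_thousandth_ratio
        if _both(in_ratio, out_ratio) and in_ratio > 0.9 and out_ratio < 0.6:
            return "red"
        if _both(in_ratio, out_ratio) and in_ratio - out_ratio > 0.1:
            return "yellow"
        if _both(inp.cosine, out.cosine) and inp.cosine - out.cosine > 0.1:
            return "yellow"
    # Yellow: both Max diffs > 1 and the output's is more than 10x the input's.
    if abs(inp.max_diff) > 1 and abs(out.max_diff) > 10 * abs(inp.max_diff):
        return "yellow"
    return ""


ok = TensorRow(1.0, -1.0, 1.0, 0.001)
print(highlight(ok, TensorRow(1.0, -1.0, 1.0, 0.2), statistics_mode=True))  # yellow
```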
-# FAQ
-[FAQ](./FAQ.md)

msprobe/pytorch/doc/ptdbg_ascend_overview.md DELETED

@@ -1,68 +0,0 @@
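-# **Accuracy Comparison Tool**
-## Introduction
-When training networks in PyTorch, or when debugging a single model or API, locating API-related computation accuracy problems takes considerable time and effort.
-The msprobe accuracy comparison tool performs API-level data dumping, accuracy comparison, and overflow detection across an entire PyTorch network to locate accuracy problems in PyTorch training.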
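-**Use Cases**
-The main use cases are:
-- The same model loses accuracy after being ported from CPU or GPU to NPU. Compare the API results computed on the NPU with those computed on the CPU or GPU to locate the problem.
-- The same model loses accuracy across iterations (model or framework version upgrades, or hardware upgrades). Compare the API results of the same model before and after the change to locate the problem.
-## How It Works
-The accuracy comparison tool registers hooks in the PyTorch model. These hooks track the inputs and outputs of each API in the computation graph during forward and backward propagation. This exposes computation accuracy errors and pinpoints where they occur.
-**Comparison process**
-1. While the model runs forward and backward propagation on the CPU or GPU, the numerical inputs and outputs of each layer are dumped.
-2. While the model runs on the NPU, the corresponding data is dumped the same way.
-3. The dumped values are compared by computing cosine similarity and maximum absolute error, which locates computation accuracy problems in NPU APIs, as shown below.
-   Accuracy comparison logic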
-   ![module_compare](img/module_compare.png)
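-**API matching conditions**
-Before comparing accuracy, the tool must determine whether a CPU or GPU API and an NPU API are the same and can be compared. They must meet both of the following conditions:
-- The two APIs have the same name. API naming rule: `{api_type}.{api_name}.{call_count}.{forward/backward}.{input/output}.index`, for example Functional.conv2d.1.backward.input.0.
-- The two APIs have the same number of input and output tensors, and each corresponding tensor has the same shape.
-When both conditions are met, the tool generally treats the two as the same API, matches them, and then compares their computation accuracy.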
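The naming rule and the two matching conditions above can be sketched as follows. This sketch rests on assumptions. Each API call is represented as a hypothetical dict mapping full tensor names to shapes. Only the documented six-part pattern is parsed, so names whose api_name contains dots are not handled. The `fuzzy` flag loosely imitates the fuzzy_match option by ignoring the call count; it is not the tool's actual matching logic.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TensorName:
    api_type: str    # e.g. "Functional"
    api_name: str    # e.g. "conv2d"
    call_count: int  # how many times the API had been called
    direction: str   # "forward" or "backward"
    io: str          # "input" or "output"
    index: int       # position of the tensor


def parse_tensor_name(full_name: str) -> TensorName:
    """Split a name like "Functional.conv2d.1.backward.input.0" into its six fields."""
    parts = full_name.split(".")
    if len(parts) != 6:
        raise ValueError(f"unexpected tensor name: {full_name!r}")
    api_type, api_name, count, direction, io, index = parts
    return TensorName(api_type, api_name, int(count), direction, io, int(index))


def _match_key(name: TensorName, fuzzy: bool) -> tuple:
    # In fuzzy mode the call count is ignored, so names differing only in it still match.
    count = None if fuzzy else name.call_count
    return (name.api_type, name.api_name, count, name.direction, name.io, name.index)


def apis_match(npu: Dict[str, Tuple[int, ...]], bench: Dict[str, Tuple[int, ...]],
               fuzzy: bool = False) -> bool:
    """True if both APIs have the same tensor names (so the same count) and the same shapes."""
    npu_keys = {_match_key(parse_tensor_name(n), fuzzy): s for n, s in npu.items()}
    bench_keys = {_match_key(parse_tensor_name(n), fuzzy): s for n, s in bench.items()}
    return npu_keys == bench_keys


npu_api = {"Functional.conv2d.1.forward.input.0": (1, 3, 224, 224),
           "Functional.conv2d.1.forward.output.0": (1, 64, 112, 112)}
bench_api = {k.replace(".1.", ".2."): v for k, v in npu_api.items()}  # different call count
print(apis_match(npu_api, bench_api), apis_match(npu_api, bench_api, fuzzy=True))  # False True
```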
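-## Overall Comparison Workflow
-1. Prepare the CPU or GPU training project.
-2. Install msprobe in that environment. See the "Tool Installation" section of [MindStudio Accuracy Debugging Tool](../../README.md).
-3. Add the msprobe dump interface PrecisionDebugger to the training script to collect benchmark data. See [Accuracy Data Collection](./dump.md).
-4. Run training to dump the data.
-5. Migrate the CPU or GPU training project to an NPU training project. See [PyTorch Model Migration and Tuning Guide](https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/PT_LMTMOG_0003.html).
-6. Install msprobe in the NPU environment. See the "Tool Installation" section of [MindStudio Accuracy Debugging Tool](../../README.md).
-7. Add the msprobe dump interface PrecisionDebugger to the NPU training script to collect NPU data. See [Accuracy Data Collection](./dump.md).
-8. Run training in the NPU environment to dump the data.
-9. Run the accuracy comparison.
-   1. Create and configure a comparison script, for example compare.py.
-   2. Compare the CPU or GPU dump data with the NPU dump data.
-   3. Analyze the comparison results.
-      For details, see [Comparing CPU or GPU Accuracy Data with NPU Accuracy Data](./ptdbg_ascend_compare.md).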
