math-precision 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,3 @@
+ This is proprietary code.
+
+ Its copying, alteration, and distribution outside official media are strictly prohibited.
@@ -0,0 +1,10 @@
+ Metadata-Version: 2.2
+ Name: math_precision
+ Version: 1.0.0
+ Home-page: https://github.com/sapiens-technology/math_precision
+ Author: SAPIENS TECHNOLOGY
+ License: Proprietary Software
+ License-File: LICENSE.txt
+ Dynamic: author
+ Dynamic: home-page
+ Dynamic: license
@@ -0,0 +1,77 @@
+ """
+ The presented algorithm implements a benchmarking method designed to evaluate the **mathematical precision** of language models,
+ targeting a well-known limitation of these systems: their ability to reliably perform numerical reasoning under high precision constraints.
+
+ The `MathPrecision` class functions as a synthetic data generator, automatically producing a dataset of mathematically structured problems.
+ Each sample consists of an expression built from three high-precision numbers (generated using the `decimal` module) combined with two randomly selected arithmetic operators:
+ addition, subtraction, multiplication, or division. These numbers are not standard floats but high-precision values with many decimal places,
+ significantly increasing the difficulty and minimizing the likelihood of correct answers through approximation.
+
+ For every generated expression, the algorithm computes the correct result with high numerical precision and then constructs a multiple-choice question.
+ Only one option corresponds to the correct answer, while the remaining alternatives are generated by applying small perturbations to the true result.
+ These incorrect options are intentionally designed to be plausible,
+ forcing the evaluated model to perform actual numerical reasoning rather than relying on superficial pattern recognition.
+
+ Additionally, both operators and answer choices are shuffled, ensuring structural diversity across samples and reducing positional bias.
+ The final output is a structured dataset containing input-output pairs, ready to be used in automated evaluation pipelines.
+
+ Among the main advantages of this benchmarking approach:
+
+ First, the **high numerical precision requirement** makes the test substantially more rigorous than traditional benchmarks based on simple arithmetic.
+ It allows clear differentiation between models that merely approximate reasoning and those that can consistently execute precise calculations.
+
+ Another key advantage is **scalable synthetic data generation**. Since the dataset is produced programmatically, it can be expanded arbitrarily,
+ with full control over complexity, distribution, and format. This removes reliance on static datasets and mitigates the risk of overfitting to known benchmarks.
+
+ The method is also **robust against memorization**, as each execution generates entirely new and unique problems.
+ This prevents models from gaining artificial advantages by memorizing fixed question sets.
+
+ Furthermore, the use of closely related incorrect alternatives enables **fine-grained evaluation**, capturing not only whether a model gets the answer right,
+ but also how close it gets. This is particularly useful for identifying systematic errors, such as precision loss or difficulties handling operator precedence.
+
+ Finally, the approach is **simple, automated, and easily integrable** into evaluation workflows,
+ making it suitable for both offline benchmarking and continuous testing during model development.
+
+ In summary, this algorithm provides an effective and rigorous benchmark for assessing a fundamental capability (mathematical precision) while balancing experimental control,
+ scalability, and realistic difficulty.
+ """
+ # --------------------------> A SAPIENS TECHNOLOGY®️ PRODUCTION <--------------------------
+ from .math_precision import *
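The docstring's central premise, that `decimal` values carry far more digits than standard floats, can be checked with a tiny standalone snippet (illustrative only, not part of the package):

```python
from decimal import Decimal, getcontext

# Widen the Decimal context to 50 significant digits; Python floats top out
# at roughly 15-17 significant digits.
getcontext().prec = 50
third = Decimal('1') / Decimal('3')

print(third)                           # 0.33333333333333333333333333333333333333333333333333
print(len(str(third).split('.')[1]))   # 50 fractional digits retained
print(len(repr(1 / 3).split('.')[1]))  # only 16 for a plain float
```

This is why the generator formats `Decimal` operands into the expression strings rather than float literals.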
@@ -0,0 +1,159 @@
+ # --------------------------> A SAPIENS TECHNOLOGY®️ PRODUCTION <--------------------------
+ class MathPrecision:
+     def __init__(self, show_errors=True, display_error_point=False):
+         try:
+             self.__show_errors = bool(show_errors) if type(show_errors) in (bool, int, float) else True
+             self.__display_error_point = bool(display_error_point) if type(display_error_point) in (bool, int, float) else False
+             try:
+                 # Silence library warnings and logging so benchmark output stays clean.
+                 from warnings import simplefilter, filterwarnings
+                 from logging import disable, CRITICAL
+                 from os import environ
+                 simplefilter('ignore')
+                 filterwarnings('ignore')
+                 disable(CRITICAL)
+             except: pass
+             from traceback import print_exc
+             self.__print_exc = print_exc
+         except Exception as error:
+             try:
+                 if self.__show_errors:
+                     error_message = 'ERROR in MathPrecision.__init__: '+str(error)
+                     print(error_message)
+                     try: self.__print_exc() if self.__display_error_point else None
+                     except: pass
+             except: pass
+     def generateDatabase(self, n_samples=10):
+         try:
+             return_dictionary, database = {}, []
+             n_samples = max(1, int(n_samples)) if type(n_samples) in (int, float) else 10
+             def _generate_high_precision_number(start=100, end=200, precision=100):
+                 # Random integer part plus a random fraction, under a widened Decimal context.
+                 from decimal import getcontext, Decimal
+                 from random import SystemRandom
+                 getcontext().prec = precision
+                 random_generator = SystemRandom()
+                 integer_part = Decimal(random_generator.randint(start, end))
+                 decimal_part = Decimal(str(random_generator.random()))
+                 high_precision_number = integer_part + decimal_part
+                 return high_precision_number
+             def _shuffle_list(input_list=[]):
+                 from random import shuffle
+                 shuffle(input_list)
+                 return input_list
+             from random import randint
+             operators, alternatives = ['+', '-', '*', '/'], ['A)', 'B)', 'C)', 'D)']
+             for _ in range(n_samples):
+                 operator_a = _shuffle_list(input_list=operators)[randint(0, 3)]
+                 operator_b = _shuffle_list(input_list=operators)[randint(0, 3)]
+                 number1, number2, number3 = _generate_high_precision_number(), _generate_high_precision_number(), _generate_high_precision_number()
+                 # Sort operands in descending order so subtraction and division stay well-behaved.
+                 number1, number2, number3 = sorted([number1, number2, number3])[::-1]
+                 formatted_input = f'{number1} {operator_a} {number2} {operator_b} {number3}'
+                 correct_answer = eval(formatted_input)
+                 alternative_answers, input_output, used_values = [], {}, set()
+                 for index, alternative in enumerate(_shuffle_list(input_list=alternatives)):
+                     if index == 0:
+                         # First shuffled label gets the true result.
+                         formatted_correct = f'{correct_answer:.10f}'
+                         correct_alternative = f'{alternative} {formatted_correct}'
+                         alternative_answers.append(correct_alternative)
+                         input_output['output'] = correct_alternative
+                         used_values.add(formatted_correct)
+                     else:
+                         # Remaining labels get small unique perturbations of the true result.
+                         while True:
+                             operator_x = _shuffle_list(input_list=['+', '-'])[randint(0, 1)]
+                             wrong_answer = eval(f'{correct_answer} {operator_x} {_generate_high_precision_number(start=0, end=1)}')
+                             formatted_wrong = f'{wrong_answer:.10f}'
+                             if formatted_wrong not in used_values:
+                                 used_values.add(formatted_wrong)
+                                 string_wrong_answer = f'{alternative} {formatted_wrong}'
+                                 alternative_answers.append(string_wrong_answer)
+                                 break
+                 _input = f'{formatted_input} = ?\n\n'
+                 alternative_answers = sorted(alternative_answers)
+                 for alternative_answer in alternative_answers: _input += f'{alternative_answer}\n'
+                 input_output['input'] = _input.strip()
+                 database.append(input_output)
+             return_dictionary['data'] = database
+             return return_dictionary
+         except Exception as error:
+             try:
+                 if self.__show_errors:
+                     error_message = 'ERROR in MathPrecision.generateDatabase: '+str(error)
+                     print(error_message)
+                     try: self.__print_exc() if self.__display_error_point else None
+                     except: pass
+             except: pass
+             return {'data': []}
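For readers who want to see the shape of the output without installing the package, the loop above can be mirrored in a compact standalone sketch; `make_number` and `make_sample` are illustrative names, and with the real package you would simply call `MathPrecision().generateDatabase(n_samples=10)`:

```python
from decimal import Decimal, getcontext
from random import SystemRandom, shuffle, choice

def make_number(start=100, end=200, precision=100):
    # High-precision operand: random integer part plus a random fraction.
    getcontext().prec = precision
    rng = SystemRandom()
    return Decimal(rng.randint(start, end)) + Decimal(str(rng.random()))

def make_sample():
    op_a, op_b = choice(['+', '-', '*', '/']), choice(['+', '-', '*', '/'])
    n1, n2, n3 = sorted([make_number() for _ in range(3)], reverse=True)
    expression = f'{n1} {op_a} {n2} {op_b} {n3}'
    correct = eval(expression)  # operands re-parsed as floats, as in the class above
    labels = ['A)', 'B)', 'C)', 'D)']
    shuffle(labels)  # randomize which label carries the correct answer
    options, used, answer = [], set(), None
    for index, label in enumerate(labels):
        if index == 0:
            value = f'{correct:.10f}'  # the true result
            answer = f'{label} {value}'
        else:
            while True:  # perturb until the distractor is unique
                perturbed = correct + choice([1.0, -1.0]) * float(make_number(start=0, end=1))
                value = f'{perturbed:.10f}'
                if value not in used:
                    break
        used.add(value)
        options.append(f'{label} {value}')
    prompt = f'{expression} = ?\n\n' + '\n'.join(sorted(options))
    return {'input': prompt, 'output': answer}

sample = make_sample()
print(sample['input'])
print('correct:', sample['output'])
```

Each call yields one dictionary with an `input` (the question plus four shuffled options) and an `output` (the labeled correct option), matching the entries the class collects under `data`.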
+ # --------------------------> A SAPIENS TECHNOLOGY®️ PRODUCTION <--------------------------
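One caveat worth noting about the implementation: `eval(formatted_input)` re-parses the 100-digit operands as ordinary Python floats, so the reference answer is computed at float precision despite the `Decimal` generation step. A standalone illustration of the effect (independent of the package):

```python
from decimal import Decimal, getcontext

getcontext().prec = 100
a = Decimal('123.45678901234567890123456789')
b = Decimal('100.00000000000000000000000001')

exact = a + b                   # full-precision Decimal sum
as_floats = eval(f'{a} + {b}')  # what eval on the formatted string actually does

print(exact)                             # 223.45678901234567890123456790
print(as_floats)                         # float result; digits beyond ~16 significant figures are gone
print(Decimal(str(as_floats)) == exact)  # False
```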
@@ -0,0 +1,10 @@
+ Metadata-Version: 2.2
+ Name: math_precision
+ Version: 1.0.0
+ Home-page: https://github.com/sapiens-technology/math_precision
+ Author: SAPIENS TECHNOLOGY
+ License: Proprietary Software
+ License-File: LICENSE.txt
+ Dynamic: author
+ Dynamic: home-page
+ Dynamic: license
@@ -0,0 +1,9 @@
+ LICENSE.txt
+ setup.cfg
+ setup.py
+ math_precision/__init__.py
+ math_precision/math_precision.py
+ math_precision.egg-info/PKG-INFO
+ math_precision.egg-info/SOURCES.txt
+ math_precision.egg-info/dependency_links.txt
+ math_precision.egg-info/top_level.txt
@@ -0,0 +1 @@
+ math_precision
@@ -0,0 +1,7 @@
+ [metadata]
+ license_file = LICENSE.txt
+
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,87 @@
+ # --------------------------> A SAPIENS TECHNOLOGY®️ PRODUCTION <--------------------------
+ from setuptools import setup, find_packages
+ package_name = 'math_precision'
+ version = '1.0.0'
+ setup(
+     name=package_name,
+     version=version,
+     author='SAPIENS TECHNOLOGY',
+     packages=find_packages(),
+     url='https://github.com/sapiens-technology/math_precision',
+     license='Proprietary Software'
+ )
+ # --------------------------> A SAPIENS TECHNOLOGY®️ PRODUCTION <--------------------------