israeli-invoice-parser 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Yohay Cohen
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,152 @@
1
+ Metadata-Version: 2.4
2
+ Name: israeli-invoice-parser
3
+ Version: 0.1.0
4
+ Summary: A unified parsing library for Israeli digital grocery and retail receipts
5
+ Author-email: Yohay Cohen <yohaybn@gmail.com>
6
+ Project-URL: Homepage, https://github.com/yohaybn/israeli-invoice-parser-lib
7
+ Project-URL: Bug Tracker, https://github.com/yohaybn/israeli-invoice-parser-lib/issues
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.8
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: beautifulsoup4>=4.12.0
15
+ Dynamic: license-file
16
+
17
+ # israeli-invoice-parser
18
+
19
+ A unified Python library for parsing Israeli digital receipts, grocery bills, and retail invoices. It normalizes vendor-specific payloads (direct API responses, raw HTML, and serialized Nuxt 3 payloads) into a single structured Python dictionary.
20
+
21
+ ## Supported Retailers and Providers
22
+
23
+ The library supports major Israeli storefronts directly or via central receipt infrastructure aggregators:
24
+
25
+ * **Rami Levy (רמי לוי)**: Direct support for digital receipt pages, with a fallback to the receipts API.
26
+ * **Weezmo / Wee.ai**: Multi-brand support covering grocery chains such as **Yohananof (יוחננוף)**, high-street retailers, and fashion brands (**TopTen**, **Tamnun**).
27
+ * **Pairzon (פיירזון)**: Resolves short links and direct document URLs for partner stores (e.g., **Osher Ad (אושר עד)** and **Max Stock (מקס סטוק)**).
28
+
29
+ > **Current Limitations:** Automated extraction of **Shufersal (שופרסל)** invoices is currently blocked: the endpoint's bot protection (WAF) rejects standard programmatic requests. We are still looking for a way to restore this functionality, and contributions or ideas on this issue are very welcome.
30
+
31
+ ---
32
+
33
+ ## Installation
34
+
35
+ Install the package via `pip`:
36
+
37
+ ```bash
38
+ pip install israeli-invoice-parser
39
+
40
+ ```
41
+
42
+ ---
43
+
44
+ ## Quick Start and Usage Examples
45
+
46
+ Every parser inherits from a common interface (`BaseReceiptParser`) and returns a standardized data model, making it simple to process invoices interchangeably.
47
+
48
+ ### 1. Parsing a Rami Levy URL
49
+
50
+ ```python
51
+ from invoice_parser import RamiLevyParser
52
+
53
+ # Initialize the dedicated parser
54
+ parser = RamiLevyParser()
55
+
56
+ # Pass a live receipt or invoice URL directly
57
+ url = "https://api-digi.rami-levy.co.il/api/v1/receipts/example-token-12345"
58
+ receipt = parser.parse(url)
59
+
60
+ # Access standardized fields uniformly
61
+ print(f"Store: {receipt['store_name']}")
62
+ print(f"Total Paid: ₪{receipt['total_paid']}")
63
+ print(f"Date: {receipt['date']} at {receipt['time']}")
64
+
65
+ for item in receipt['items']:
66
+     print(f" - {item['description']}: ₪{item['final_price']} (Qty: {item['quantity_or_weight']})")
67
+
68
+ ```
69
+
70
+ ### 2. Parsing a Weezmo / Wee.ai Provider Short-Link (e.g., Yohananof)
71
+
72
+ ```python
73
+ from invoice_parser import WeezmoParser
74
+
75
+ parser = WeezmoParser()
76
+
77
+ # Works with central wee.ai tracking tokens or short links
78
+ weezmo_url = "https://wee.ai/r/v123abcd"
79
+ receipt = parser.parse(weezmo_url)
80
+
81
+ # The parser dynamically extracts real corporate metadata to identify the sub-brand
82
+ print(f"Identified Brand: {receipt['store_name']}") # e.g., 'יוחננוף'
83
+ print(f"Legal Business ID: {receipt['company_legal_id']}")
84
+
85
+ ```
86
+
87
+ ### 3. Standardized Output Format Matrix
88
+
89
+ Whichever vendor parser is called, the output dictionary follows this structure:
90
+
91
+ ```python
92
+ {
93
+     "store_name": "רמי לוי",
94
+     "company_legal_id": "513770669",
95
+     "branch_name": "סניף תל אביב",
96
+     "store_address": "דרך מנחם בגין 123",
97
+     "store_phone": "03-1234567",
98
+     "customer_name": "ישראל ישראלי",
99
+     "date": "23/06/2026",
100
+     "time": "14:30:00",
101
+     "receipt_id": "987654321",
102
+     "total_paid": 245.50,
103
+     "vat_rate": 17.0,
104
+     "total_vat_paid": 35.67,
105
+     "payment_method": "אשראי",
106
+     "items": [
107
+         {
108
+             "description": "חלב תנובה 3%",
109
+             "barcode": "7290000042431",
110
+             "is_by_weight": False,
111
+             "quantity_or_weight": 2.0,
112
+             "unit_price": 6.50,
113
+             "original_total_price": 13.00,
114
+             "is_part_of_deal": True,
115
+             "deal_text": "2 ב-₪11",
116
+             "discount_amount": 2.00,
117
+             "final_price": 11.00,
118
+             "category_path": ["סופרמרקט"]
119
+         }
120
+     ]
121
+ }
122
+
123
+ ```
124
+
125
+ ---
126
+
127
+ ## Contributing and Helping Out
128
+
129
+ Parsing real-world digital invoices is a game of cat-and-mouse as retailers update their internal schemas. We need your help to make this library resilient!
130
+
131
+ ### Have an Unsupported Receipt / Found a Bug?
132
+
133
+ If you run into an invoice that fails to parse (such as **Shufersal** or a newly formatted receipt layout):
134
+
135
+ 1. Open a new issue on the [israeli-invoice-parser-lib Bug Tracker](https://github.com/yohaybn/israeli-invoice-parser-lib/issues).
136
+ 2. **Crucial:** Provide a **real, live link to the invoice**. Without a working URL, it is impossible to inspect the underlying network payload structure, test backend responses, or map out the necessary payload parameters.
137
+ 3. If you have suggestions or workarounds for bypassing Shufersal's anti-bot restrictions, please detail them inside the dedicated discussion issues!
138
+
139
+ ### Want to Add a New Parser?
140
+
141
+ We warmly welcome pull requests! To contribute a new parser:
142
+
143
+ 1. Subclass `BaseReceiptParser` from `base_parser.py`.
144
+ 2. Implement the `.parse(self, source_data: str) -> Dict[str, Any]` method.
145
+ 3. Map the data cleanly into our uniform dictionary format.
146
+ 4. Submit your PR directly to the [GitHub Repository](https://github.com/yohaybn/israeli-invoice-parser-lib/).
147
+
148
+ ---
149
+
150
+ ## License
151
+
152
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
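As a minimal sketch of consuming the standardized dictionary described in the README above, the following totals item prices and discounts and checks that each item's `final_price` equals `original_total_price` minus `discount_amount`. The `summarize_receipt` helper is hypothetical (it is not part of the package), and the sample data is trimmed from the README example.

```python
from typing import Any, Dict


def summarize_receipt(receipt: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical helper: total the item prices and discounts of a parsed receipt."""
    items_total = 0.0
    discounts_total = 0.0
    for item in receipt.get("items", []):
        expected = round(item["original_total_price"] - item["discount_amount"], 2)
        # Flag items whose pricing fields are internally inconsistent.
        if abs(expected - item["final_price"]) > 0.01:
            raise ValueError(f"Inconsistent pricing for {item['description']!r}")
        items_total += item["final_price"]
        discounts_total += item["discount_amount"]
    return {
        "items_total": round(items_total, 2),
        "discounts_total": round(discounts_total, 2),
    }


# Sample data trimmed from the README's output example.
sample_receipt = {
    "store_name": "רמי לוי",
    "total_paid": 245.50,
    "items": [
        {
            "description": "חלב תנובה 3%",
            "quantity_or_weight": 2.0,
            "unit_price": 6.50,
            "original_total_price": 13.00,
            "discount_amount": 2.00,
            "final_price": 11.00,
        }
    ],
}

summary = summarize_receipt(sample_receipt)
print(summary)
```

Because every parser is documented to return the same keys, a helper like this should work unchanged on output from any of them.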
@@ -0,0 +1,136 @@
1
+ # israeli-invoice-parser
2
+
3
+ A unified Python library for parsing Israeli digital receipts, grocery bills, and retail invoices. It normalizes vendor-specific payloads (direct API responses, raw HTML, and serialized Nuxt 3 payloads) into a single structured Python dictionary.
4
+
5
+ ## Supported Retailers and Providers
6
+
7
+ The library supports major Israeli storefronts directly or via central receipt infrastructure aggregators:
8
+
9
+ * **Rami Levy (רמי לוי)**: Direct support for digital receipt pages, with a fallback to the receipts API.
10
+ * **Weezmo / Wee.ai**: Multi-brand support covering grocery chains such as **Yohananof (יוחננוף)**, high-street retailers, and fashion brands (**TopTen**, **Tamnun**).
11
+ * **Pairzon (פיירזון)**: Resolves short links and direct document URLs for partner stores (e.g., **Osher Ad (אושר עד)** and **Max Stock (מקס סטוק)**).
12
+
13
+ > **Current Limitations:** Automated extraction of **Shufersal (שופרסל)** invoices is currently blocked: the endpoint's bot protection (WAF) rejects standard programmatic requests. We are still looking for a way to restore this functionality, and contributions or ideas on this issue are very welcome.
14
+
15
+ ---
16
+
17
+ ## Installation
18
+
19
+ Install the package via `pip`:
20
+
21
+ ```bash
22
+ pip install israeli-invoice-parser
23
+
24
+ ```
25
+
26
+ ---
27
+
28
+ ## Quick Start and Usage Examples
29
+
30
+ Every parser inherits from a common interface (`BaseReceiptParser`) and returns a standardized data model, making it simple to process invoices interchangeably.
31
+
32
+ ### 1. Parsing a Rami Levy URL
33
+
34
+ ```python
35
+ from invoice_parser import RamiLevyParser
36
+
37
+ # Initialize the dedicated parser
38
+ parser = RamiLevyParser()
39
+
40
+ # Pass a live receipt or invoice URL directly
41
+ url = "https://api-digi.rami-levy.co.il/api/v1/receipts/example-token-12345"
42
+ receipt = parser.parse(url)
43
+
44
+ # Access standardized fields uniformly
45
+ print(f"Store: {receipt['store_name']}")
46
+ print(f"Total Paid: ₪{receipt['total_paid']}")
47
+ print(f"Date: {receipt['date']} at {receipt['time']}")
48
+
49
+ for item in receipt['items']:
50
+     print(f" - {item['description']}: ₪{item['final_price']} (Qty: {item['quantity_or_weight']})")
51
+
52
+ ```
53
+
54
+ ### 2. Parsing a Weezmo / Wee.ai Provider Short-Link (e.g., Yohananof)
55
+
56
+ ```python
57
+ from invoice_parser import WeezmoParser
58
+
59
+ parser = WeezmoParser()
60
+
61
+ # Works with central wee.ai tracking tokens or short links
62
+ weezmo_url = "https://wee.ai/r/v123abcd"
63
+ receipt = parser.parse(weezmo_url)
64
+
65
+ # The parser dynamically extracts real corporate metadata to identify the sub-brand
66
+ print(f"Identified Brand: {receipt['store_name']}") # e.g., 'יוחננוף'
67
+ print(f"Legal Business ID: {receipt['company_legal_id']}")
68
+
69
+ ```
70
+
71
+ ### 3. Standardized Output Format Matrix
72
+
73
+ Whichever vendor parser is called, the output dictionary follows this structure:
74
+
75
+ ```python
76
+ {
77
+     "store_name": "רמי לוי",
78
+     "company_legal_id": "513770669",
79
+     "branch_name": "סניף תל אביב",
80
+     "store_address": "דרך מנחם בגין 123",
81
+     "store_phone": "03-1234567",
82
+     "customer_name": "ישראל ישראלי",
83
+     "date": "23/06/2026",
84
+     "time": "14:30:00",
85
+     "receipt_id": "987654321",
86
+     "total_paid": 245.50,
87
+     "vat_rate": 17.0,
88
+     "total_vat_paid": 35.67,
89
+     "payment_method": "אשראי",
90
+     "items": [
91
+         {
92
+             "description": "חלב תנובה 3%",
93
+             "barcode": "7290000042431",
94
+             "is_by_weight": False,
95
+             "quantity_or_weight": 2.0,
96
+             "unit_price": 6.50,
97
+             "original_total_price": 13.00,
98
+             "is_part_of_deal": True,
99
+             "deal_text": "2 ב-₪11",
100
+             "discount_amount": 2.00,
101
+             "final_price": 11.00,
102
+             "category_path": ["סופרמרקט"]
103
+         }
104
+     ]
105
+ }
106
+
107
+ ```
108
+
109
+ ---
110
+
111
+ ## Contributing and Helping Out
112
+
113
+ Parsing real-world digital invoices is a game of cat-and-mouse as retailers update their internal schemas. We need your help to make this library resilient!
114
+
115
+ ### Have an Unsupported Receipt / Found a Bug?
116
+
117
+ If you run into an invoice that fails to parse (such as **Shufersal** or a newly formatted receipt layout):
118
+
119
+ 1. Open a new issue on the [israeli-invoice-parser-lib Bug Tracker](https://github.com/yohaybn/israeli-invoice-parser-lib/issues).
120
+ 2. **Crucial:** Provide a **real, live link to the invoice**. Without a working URL, it is impossible to inspect the underlying network payload structure, test backend responses, or map out the necessary payload parameters.
121
+ 3. If you have suggestions or workarounds for bypassing Shufersal's anti-bot restrictions, please detail them inside the dedicated discussion issues!
122
+
123
+ ### Want to Add a New Parser?
124
+
125
+ We warmly welcome pull requests! To contribute a new parser:
126
+
127
+ 1. Subclass `BaseReceiptParser` from `base_parser.py`.
128
+ 2. Implement the `.parse(self, source_data: str) -> Dict[str, Any]` method.
129
+ 3. Map the data cleanly into our uniform dictionary format.
130
+ 4. Submit your PR directly to the [GitHub Repository](https://github.com/yohaybn/israeli-invoice-parser-lib/).
131
+
132
+ ---
133
+
134
+ ## License
135
+
136
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
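The contribution steps in the README above (subclass `BaseReceiptParser`, implement `parse`, map into the uniform dictionary) can be sketched roughly as follows. This is an illustration under stated assumptions: `BaseReceiptParser` is re-declared locally as a stand-in mirroring `base_parser.py`, and `ExampleStoreParser` with its `lines`/`qty`/`price`/`discount` input layout is invented for the example, not a real vendor schema. A real parser would also fill the remaining top-level fields (`date`, `time`, `vat_rate`, and so on).

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseReceiptParser(ABC):
    """Local stand-in mirroring the interface declared in base_parser.py."""

    def __init__(self, store_name: str) -> None:
        self.store_name = store_name

    @abstractmethod
    def parse(self, source_data: str) -> Dict[str, Any]:
        ...


class ExampleStoreParser(BaseReceiptParser):
    """Hypothetical parser for an invented JSON layout."""

    def __init__(self) -> None:
        super().__init__(store_name="Example Store")

    def parse(self, source_data: str) -> Dict[str, Any]:
        payload = json.loads(source_data)
        items = []
        for raw in payload.get("lines", []):
            quantity = float(raw.get("qty", 1.0))
            unit_price = float(raw.get("price", 0.0))
            original_total = round(quantity * unit_price, 2)
            discount = float(raw.get("discount", 0.0))
            # Map the vendor fields onto the uniform item keys.
            items.append({
                "description": str(raw.get("name", "")).strip(),
                "barcode": str(raw["code"]) if raw.get("code") else None,
                "is_by_weight": False,
                "quantity_or_weight": quantity,
                "unit_price": unit_price,
                "original_total_price": original_total,
                "is_part_of_deal": discount > 0,
                "deal_text": None,
                "discount_amount": discount,
                "final_price": round(original_total - discount, 2),
                "category_path": [],
            })
        return {
            "store_name": self.store_name,
            "receipt_id": str(payload.get("id", "")),
            "total_paid": float(payload.get("total", 0.0)),
            "items": items,
        }


raw_input = '{"id": 42, "total": 11.0, "lines": [{"name": "Milk", "qty": 2, "price": 6.5, "discount": 2}]}'
receipt = ExampleStoreParser().parse(raw_input)
print(receipt["items"][0]["final_price"])
```

Keeping the vendor-to-uniform mapping in one place inside `parse` makes it easier to adjust when a retailer changes its schema.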
@@ -0,0 +1,28 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "israeli-invoice-parser"
7
+ version = "0.1.0"
8
+ authors = [
9
+ { name = "Yohay Cohen", email = "yohaybn@gmail.com" }
10
+ ]
11
+ description = "A unified parsing library for Israeli digital grocery and retail receipts"
12
+ readme = "README.md"
13
+ requires-python = ">=3.8"
14
+ classifiers = [
15
+ "Programming Language :: Python :: 3",
16
+ "License :: OSI Approved :: MIT License",
17
+ "Operating System :: OS Independent",
18
+ ]
19
+ dependencies = [
20
+ "beautifulsoup4>=4.12.0",
21
+ ]
22
+
23
+ [project.urls]
24
+ "Homepage" = "https://github.com/yohaybn/israeli-invoice-parser-lib"
25
+ "Bug Tracker" = "https://github.com/yohaybn/israeli-invoice-parser-lib/issues"
26
+
27
+ [tool.setuptools.packages.find]
28
+ where = ["src"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,12 @@
1
+ from .base_parser import BaseReceiptParser, NuxtDataHydrator
2
+ from .pairzon_parser import PairzonParser
3
+ from .rami_levy_parser import RamiLevyParser
4
+ from .weezmo_parser import WeezmoParser
5
+
6
+ __all__ = [
7
+ "BaseReceiptParser",
8
+ "NuxtDataHydrator",
9
+ "PairzonParser",
10
+ "RamiLevyParser",
11
+ "WeezmoParser",
12
+ ]
@@ -0,0 +1,40 @@
1
+ from abc import ABC, abstractmethod
2
+ from typing import Dict, Any, List
3
+
4
+ class BaseReceiptParser(ABC):
5
+     def __init__(self, store_name: str) -> None:
6
+         self.store_name: str = store_name
7
+
8
+     @abstractmethod
9
+     def parse(self, source_data: str) -> Dict[str, Any]:
10
+         pass
11
+
12
+ class NuxtDataHydrator:
13
+     """
14
+     Decompresses multi-type transport data matrices from Nuxt 3 back
15
+     into standard dictionaries, cleanly handling cross-referenced table indices.
16
+     """
17
+     def __init__(self, data_list: List[Any]) -> None:
18
+         self.raw_pool: List[Any] = data_list
19
+         self.visited = set()
20
+
21
+     def hydrate_node(self, node: Any) -> Any:
22
+         if isinstance(node, int) and 0 <= node < len(self.raw_pool):
23
+             if node in self.visited:
24
+                 return f"<CircularRef: Index {node}>"
25
+
26
+             self.visited.add(node)
27
+             resolved = self.raw_pool[node]
28
+             result = self._transform(resolved)
29
+             self.visited.remove(node)
30
+             return result
31
+         return self._transform(node)
32
+
33
+     def _transform(self, val: Any) -> Any:
34
+         if isinstance(val, dict):
35
+             return {k: self.hydrate_node(v) for k, v in val.items()}
36
+         elif isinstance(val, list):
37
+             if val and val[0] in ("ShallowReactive", "Reactive", "ShallowRef", "Ref"):
38
+                 return self.hydrate_node(val[1])
39
+             return [self.hydrate_node(item) for item in val]
40
+         return val
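To illustrate how `NuxtDataHydrator` resolves index references, here is a small sketch. The class body is copied from `base_parser.py` above so the block runs on its own; the `pool` list is a made-up miniature payload, not real Rami Levy data. Integer values inside dicts and lists are treated as indices into the pool, while a value stored in the pool itself (such as `6.5`) is returned as-is.

```python
from typing import Any, List


class NuxtDataHydrator:
    """Copy of the class from base_parser.py, reproduced for a standalone demo."""

    def __init__(self, data_list: List[Any]) -> None:
        self.raw_pool: List[Any] = data_list
        self.visited = set()

    def hydrate_node(self, node: Any) -> Any:
        if isinstance(node, int) and 0 <= node < len(self.raw_pool):
            if node in self.visited:
                return f"<CircularRef: Index {node}>"
            self.visited.add(node)
            resolved = self.raw_pool[node]
            result = self._transform(resolved)
            self.visited.remove(node)
            return result
        return self._transform(node)

    def _transform(self, val: Any) -> Any:
        if isinstance(val, dict):
            return {k: self.hydrate_node(v) for k, v in val.items()}
        elif isinstance(val, list):
            if val and val[0] in ("ShallowReactive", "Reactive", "ShallowRef", "Ref"):
                return self.hydrate_node(val[1])
            return [self.hydrate_node(item) for item in val]
        return val


# Made-up pool: each integer inside a dict or list points at another slot.
pool = [
    ["ShallowReactive", 1],        # 0: wrapper pointing at slot 1
    {"store": 2, "items": 3},      # 1: receipt root
    "Rami Levy",                   # 2
    [4],                           # 3: list of item references
    {"name": 5, "price": 6},       # 4: one item
    "Milk",                        # 5
    6.5,                           # 6
]

hydrated = NuxtDataHydrator(pool).hydrate_node(0)
print(hydrated)

# A slot that refers back to itself is replaced by a marker string.
cyclic = NuxtDataHydrator([{"self": 0}]).hydrate_node(0)
print(cyclic)
```

The `visited` set only tracks the current resolution path, so the same slot can be referenced from several places without being flagged as circular.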
@@ -0,0 +1,186 @@
1
+ import json
2
+ import logging
3
+ import os
4
+ import urllib.request
5
+ import urllib.error
6
+ from urllib.parse import urlparse, parse_qs
7
+ from typing import Dict, Any
8
+ from .base_parser import BaseReceiptParser
9
+
10
+ logging.basicConfig(level=logging.INFO)
11
+ logger = logging.getLogger("PairzonParser")
12
+
13
+ class PairzonParser(BaseReceiptParser):
14
+     def __init__(self) -> None:
15
+         # Initialize with a flexible placeholder; we will dynamically
16
+         # override self.store_name based on the receipt's real corporate metadata.
17
+         super().__init__(store_name="Pairzon Provider")
18
+
19
+     def parse(self, source_data: str) -> Dict[str, Any]:
20
+         raw_json: str = ""
21
+
22
+         if source_data.startswith("http://") or source_data.startswith("https://"):
23
+             try:
24
+                 parsed_url = urlparse(source_data.strip())
25
+                 queries = parse_qs(parsed_url.query)
26
+
27
+                 doc_id = queries.get("id", [None])[0]
28
+                 pin_id = queries.get("p", [None])[0]
29
+
30
+                 headers = {
31
+                     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
32
+                     'Accept': 'application/json, text/plain, */*',
33
+                     'Accept-Language': 'he-IL,he;q=0.9,en-US;q=0.8,en;q=0.7'
34
+                 }
35
+
36
+                 # Custom handler to block automatic 302 jumps so we can catch short-links gracefully
37
+                 class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
38
+                     def http_error_302(self, req, fp, code, msg, headers):
39
+                         return fp
40
+
41
+                 opener = urllib.request.build_opener(NoRedirectHandler())
42
+
43
+                 # 1. Coordinate Extraction (Direct Parameters vs Short-Link Token Routing)
44
+                 if doc_id and pin_id:
45
+                     api_url = f"https://{parsed_url.netloc}/v1.0/documents/{doc_id}?p={pin_id}"
46
+                 else:
47
+                     logger.info("Processing standard short-link sequence. Resolving internal payload routing...")
48
+                     path_parts = [p for p in parsed_url.path.split('/') if p]
49
+                     if len(path_parts) < 2:
50
+                         raise ValueError("The provided Pairzon short-link route structure is invalid or missing paths.")
51
+
52
+                     prefix = path_parts[0]
53
+                     token = path_parts[1]
54
+
55
+                     # Target Pairzon's centralized cross-brand mapping api
56
+                     link_lookup_url = f"https://{parsed_url.netloc}/v1.0/links/{prefix}/{token}"
57
+                     logger.info(f"Querying endpoint metadata resolver: {link_lookup_url}")
58
+
59
+                     lookup_req = urllib.request.Request(link_lookup_url, headers=headers)
60
+                     try:
61
+                         with urllib.request.urlopen(lookup_req, timeout=15) as lookup_res:
62
+                             lookup_data = json.loads(lookup_res.read().decode('utf-8'))
63
+                             if isinstance(lookup_data, dict) and "data" in lookup_data:
64
+                                 lookup_data = lookup_data["data"]
65
+
66
+                             doc_id = lookup_data.get("documentId") or lookup_data.get("id")
67
+                             pin_id = lookup_data.get("prefix") or prefix
68
+                     except Exception as e:
69
+                         logger.warning(f"Metadata link API resolution dropped ({e}). Trying 302 header interception loop...")
70
+
71
+                         req = urllib.request.Request(source_data, headers=headers)
72
+                         with opener.open(req, timeout=15) as response:
73
+                             redirect_location = response.headers.get('Location', '')
74
+                             if redirect_location:
75
+                                 parsed_redirect = urlparse(redirect_location)
76
+                                 redirect_queries = parse_qs(parsed_redirect.query)
77
+                                 doc_id = redirect_queries.get("id", [None])[0]
78
+                                 pin_id = redirect_queries.get("p", [None])[0]
79
+
80
+                     if doc_id and pin_id:
81
+                         api_url = f"https://{parsed_url.netloc}/v1.0/documents/{doc_id}?p={pin_id}"
82
+                     else:
83
+                         raise ValueError("Failed to extract backend parameters from token signature.")
84
+
85
+                 # 2. Complete Transaction JSON Fetch
86
+                 logger.info(f"Targeting active data stream gateway: {api_url}")
87
+                 data_req = urllib.request.Request(api_url, headers=headers)
88
+                 with urllib.request.urlopen(data_req, timeout=15) as response:
89
+                     raw_json = response.read().decode('utf-8')
90
+
91
+                 os.makedirs("temp", exist_ok=True)
92
+                 with open("temp/pairzon_generic_raw.json", "w", encoding="utf-8") as f:
93
+                     f.write(raw_json)
94
+
95
+             except urllib.error.HTTPError as http_err:
96
+                 error_body = http_err.read().decode('utf-8', errors='ignore')[:500]
97
+                 logger.error(f"Pairzon backend connection returned error code ({http_err.code}). Payload: {error_body}")
98
+                 raise ValueError(f"Communication error with the Pairzon receipt server: {http_err.code}") from http_err
99
+             except Exception as e:
100
+                 logger.error(f"Critical execution error resolving Pairzon network document: {e}")
101
+                 raise
102
+         else:
103
+             raw_json = source_data
104
+
105
+         # 3. Dynamic Mapping & Extraction Grid
106
+         try:
107
+             if not raw_json.strip():
108
+                 raise ValueError("The resolved data stream payload came back empty.")
109
+
110
+             payload = json.loads(raw_json)
111
+             if isinstance(payload, dict) and "data" in payload:
112
+                 payload = payload["data"]
113
+
114
+             # Dynamic Brand Identification
115
+             store_info = payload.get("store", {}) or {}
116
+             biz_info = store_info.get("business", {}) or {}
117
+
118
+             # Extract brand name from the business node, falling back to the store name, then a generic label
119
+             dynamic_store_name = biz_info.get("name", store_info.get("name", "רשת קמעונאות")).strip()
120
+             self.store_name = dynamic_store_name
121
+             logger.info(f"Dynamic branding identity successfully verified as: '{self.store_name}'")
122
+
123
+             # Standardize date segments
124
+             created_date = payload.get("createdDate", "2026-01-01T00:00:00")
125
+             date_part, time_part = created_date.split("T") if "T" in created_date else (created_date, "00:00:00")
126
+             if "-" in date_part:
127
+                 parts = date_part.split("-")
128
+                 formatted_date = f"{parts[2]}/{parts[1]}/{parts[0]}"
129
+             else:
130
+                 formatted_date = date_part
131
+
132
+             unified_receipt: Dict[str, Any] = {
133
+                 "store_name": self.store_name,
134
+                 "company_legal_id": str(biz_info.get("companyLeagalId", payload.get("businessID", "513461053"))),
135
+                 "branch_name": store_info.get("name", "סניף כללי").strip(),
136
+                 "store_address": store_info.get("address", "").strip() or biz_info.get("address", "").strip(),
137
+                 "store_phone": store_info.get("phone", "").strip() or biz_info.get("phone", "").strip(),
138
+                 "customer_name": payload.get("cashierName", "").strip() or None,
139
+                 "date": formatted_date,
140
+                 "time": time_part[:8],
141
+                 "receipt_id": str(payload.get("transactionID", payload.get("id", ""))),
142
+                 "total_paid": float(payload.get("total", 0.0)),
143
+                 "vat_rate": float(payload.get("Vat", 17.0)),
144
+                 "total_vat_paid": float(payload.get("totalVat", 0.0)),
145
+                 "payment_method": ((payload.get("payments") or [{}])[0].get("name") or "").strip() or "אשראי",  # tolerate a missing or empty payments list
146
+                 "items": []
147
+             }
148
+
149
+             for item in payload.get("items", []):
150
+                 weight = item.get("weight")
151
+                 quantity = float(weight / 1000.0) if weight else float(item.get("quantity", 1.0))
152
+                 unit_price = float(item.get("price", 0.0))
153
+                 final_price = float(item.get("total", quantity * unit_price))
154
+                 expected_total = round(quantity * unit_price, 2)
155
+
156
+                 deal_description = ""
157
+                 add_info = item.get("additionalInfo", [])
158
+                 if isinstance(add_info, list) and len(add_info) > 0:
159
+                     deal_description = str(add_info[0].get("key", "")).strip()
160
+
161
+                 has_deal = False
162
+                 discount_amount = 0.0
163
+                 if deal_description or final_price < expected_total:
164
+                     has_deal = True
165
+                     discount_amount = max(0.0, round(expected_total - final_price, 2))
166
+                     if not deal_description:
167
+                         deal_description = "מבצע רשת"
168
+
169
+                 unified_receipt["items"].append({
170
+                     "description": item.get("name", "פריט").strip(),
171
+                     "barcode": str(item.get("code")) if item.get("code") else None,
172
+                     "is_by_weight": bool(weight),
173
+                     "quantity_or_weight": quantity,
174
+                     "unit_price": unit_price,
175
+                     "original_total_price": expected_total,
176
+                     "is_part_of_deal": has_deal,
177
+                     "deal_text": deal_description or None,
178
+                     "discount_amount": discount_amount,
179
+                     "final_price": final_price,
180
+                     "category_path": item.get("category", ["כללי"])
181
+                 })
182
+
183
+             return unified_receipt
184
+         except Exception as ex:
185
+             logger.error(f"Failed parsing inner metrics via Pairzon schema definition matrix: {ex}")
186
+             raise
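The first step of `PairzonParser.parse` above, deciding whether a URL carries `id`/`p` query parameters directly or is a `/<prefix>/<token>` short link, can be isolated into a pure function for testing without network access. The following is a sketch only: `classify_pairzon_url` is a hypothetical helper that is not part of the package, and `receipts.example.com` is a placeholder host.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse


def classify_pairzon_url(url: str) -> Dict[str, str]:
    """Hypothetical helper: split a receipt URL into direct or short-link parts."""
    parsed = urlparse(url.strip())
    queries = parse_qs(parsed.query)
    doc_id = queries.get("id", [None])[0]
    pin_id = queries.get("p", [None])[0]
    if doc_id and pin_id:
        # Both coordinates are present, so no lookup request would be needed.
        return {"kind": "direct", "doc_id": doc_id, "pin_id": pin_id}

    # Otherwise expect a short link shaped like /<prefix>/<token>.
    path_parts = [p for p in parsed.path.split("/") if p]
    if len(path_parts) < 2:
        raise ValueError("Expected a short link of the form /<prefix>/<token>")
    return {"kind": "short_link", "prefix": path_parts[0], "token": path_parts[1]}


direct = classify_pairzon_url("https://receipts.example.com/view?id=abc123&p=77")
short = classify_pairzon_url("https://receipts.example.com/oa/XyZ9")
print(direct)
print(short)
```

Separating URL classification from the HTTP calls would let the routing logic be unit-tested with plain strings.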
@@ -0,0 +1,208 @@
1
+ import json
2
+ import logging
3
+ import os
4
+ import re
5
+ import urllib.request
6
+ import urllib.error
7
+ from typing import Dict, Any
8
+ from bs4 import BeautifulSoup
9
+ from .base_parser import BaseReceiptParser, NuxtDataHydrator
10
+
11
+ logging.basicConfig(level=logging.INFO)
12
+ logger = logging.getLogger("RamiLevyParser")
13
+
14
+ class RamiLevyParser(BaseReceiptParser):
15
+     def __init__(self) -> None:
16
+         super().__init__(store_name="Rami Levy")
17
+
18
+     def parse(self, source_data: str) -> Dict[str, Any]:
19
+         html_content = ""
20
+
21
+         if source_data.startswith("http://") or source_data.startswith("https://"):
22
+             try:
23
+                 # Clean and identify short links vs direct data resources
24
+                 target_url = source_data.strip()
25
+                 logger.info(f"Downloading live Rami Levy content context: {target_url}")
26
+
27
+                 # Emulate a complete browser identity to step through security filters
28
+                 headers = {
29
+                     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
30
+                     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
31
+                     'Accept-Language': 'he-IL,he;q=0.9,en-US;q=0.8,en;q=0.7',
32
+                     'Cache-Control': 'no-cache',
33
+                     'Pragma': 'no-cache'
34
+                 }
35
+
36
+                 req = urllib.request.Request(target_url, headers=headers)
37
+                 with urllib.request.urlopen(req, timeout=15) as response:
38
+                     html_content = response.read().decode('utf-8')
39
+
40
+                 # Save data backup trace locally for evaluation
41
+                 os.makedirs("temp", exist_ok=True)
42
+                 with open("temp/rami_levy_raw_page.html", "w", encoding="utf-8") as f:
43
+                     f.write(html_content)
44
+
45
+             except urllib.error.HTTPError as http_err:
46
+                 logger.error(f"Rami Levy gateway connection dropped (HTTP {http_err.code})")
47
+                 raise ValueError(f"Failed to connect to the Rami Levy server. Error code: {http_err.code}") from http_err
48
+             except Exception as e:
49
+                 logger.error(f"Failed downloading remote Rami Levy page context: {e}")
50
+                 raise
51
+         else:
52
+             html_content = source_data
53
+
54
+         try:
55
+             raw_json_text = ""
56
+
57
+             # 1. Standard HTML Extraction Flow
58
+             if "<script" in html_content or "<body" in html_content:
59
+                 soup = BeautifulSoup(html_content, "html.parser")
60
+                 nuxt_script = soup.find("script", id="__NUXT_DATA__")
61
+                 if not nuxt_script:
62
+                     nuxt_script = soup.find("script", string=re.compile(r'__NUXT_DATA__'))
63
+
64
+                 if nuxt_script:
65
+                     raw_json_text = nuxt_script.string if nuxt_script.string else nuxt_script.text
66
+
67
+             # 2. API Fallback Flow (If Nuxt text blocks are missing or encoded)
68
+             if not raw_json_text.strip():
69
+                 logger.info("Script extraction returned empty text. Attempting API transformation route...")
70
+                 # Extract the token out of the URL path (e.g. /0fFlup4Bp5Iw_ikNn9ZU); urllib.parse is loaded by urllib.request
71
+                 url_path = urllib.parse.urlparse(source_data).path if source_data.startswith("http") else ""
72
+                 token_match = re.search(r'/([^/]+)$', url_path)
73
+
74
+                 if token_match:
75
+                     token = token_match.group(1)
76
+                     # Query Rami Levy's microservice data endpoint directly
77
+                     api_fallback_url = f"https://api-digi.rami-levy.co.il/api/v1/receipts/{token}"
78
+                     logger.info(f"Querying production backup data endpoint: {api_fallback_url}")
79
+
80
+                     api_headers = {
81
+                         'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
82
+                         'Accept': 'application/json'
83
+                     }
84
+                     fallback_req = urllib.request.Request(api_fallback_url, headers=api_headers)
85
+                     with urllib.request.urlopen(fallback_req, timeout=15) as fallback_res:
86
+                         raw_json_text = fallback_res.read().decode('utf-8')
87
+                 else:
88
+                     raise ValueError("Could not extract document tracking tokens from provided link layout.")
89
+
90
+             if not raw_json_text.strip():
91
+                 raise ValueError("Rami Levy parsing sequence generated an empty payload text string.")
92
+
93
+             raw_payload = json.loads(raw_json_text.strip())
94
+
95
+             os.makedirs("temp", exist_ok=True)  # the backup directory may not exist yet for non-URL input
96
+             with open("temp/rami_levy_raw.json", "w", encoding="utf-8") as f:
97
+                 json.dump(raw_payload, f, ensure_ascii=False, indent=2)
98
+
99
+             # Initialize hydration pipeline
100
+             # If the payload is from the backup direct API, it will already be unflattened.
101
+             # If it's standard Nuxt transport text, probe the leading indices (5 down to 1) for the root data tree.
102
+             if isinstance(raw_payload, list):
103
+                 hydrator = NuxtDataHydrator(raw_payload)
104
+                 # Search across primary structural indices for receipt context arrays
105
+                 receipt_core = None
106
+                 for base_index in (5, 4, 3, 2, 1):
107
+                     try:
108
+                         node = hydrator.hydrate_node(base_index)
109
+                         if isinstance(node, dict) and ("items" in node or "branch" in node):
110
+                             receipt_core = node
111
+                             break
112
+                         elif isinstance(node, dict) and "data" in node:
113
+                             inner_data = node["data"]
114
+                             if isinstance(inner_data, dict):
115
+                                 # Loop keys to check for hidden instances
116
+                                 for k, v in inner_data.items():
117
+                                     if isinstance(v, dict) and "items" in v:
118
+                                         receipt_core = v
119
+                                         break
120
+                     except Exception:
121
+                         continue
122
+                 if not receipt_core:
123
+                     raise ValueError("Failed to locate receipt parameters inside the Nuxt transport grid.")
124
+             else:
125
+                 # Payload is already standard dictionary data from API backup stream
126
+                 receipt_core = raw_payload.get("data", raw_payload)
127
+
128
+ branch_info = receipt_core.get("branch", {}) or {}
129
+ company_info = receipt_core.get("company", {}) or {}
130
+ payment_core = receipt_core.get("payments", {}) or {}
131
+
132
+ methods_list = payment_core.get("methods", [])
133
+ primary_method = methods_list[0] if isinstance(methods_list, list) and len(methods_list) > 0 else {}
134
+
135
+ created_at = receipt_core.get("created_at", "2026-01-01T00:00:00.000Z")
136
+ date_part, time_part = created_at.split("T") if "T" in created_at else (created_at, "00:00:00")
137
+ if "-" in date_part:
138
+ parts = date_part.split("-")
139
+ formatted_date = f"{parts[2]}/{parts[1]}/{parts[0]}"
140
+ else:
141
+ formatted_date = date_part
142
+
143
+ unified_receipt: Dict[str, Any] = {
144
+ "store_name": self.store_name,
145
+ "company_legal_id": str(company_info.get("tax_id", receipt_core.get("business_id", "513770669"))),
146
+ "branch_name": str(branch_info.get("name", "רמי לוי סניף")).strip(),
147
+ "store_address": str(company_info.get("address", "")).strip(),
148
+ "store_phone": str(company_info.get("customer_service", {}).get("branch_phone", "")).strip(),
149
+ "customer_name": str(receipt_core.get("customer", {}).get("name", "")).strip() or None,
150
+ "date": formatted_date,
151
+ "time": time_part[:8],
152
+ "receipt_id": str(receipt_core.get("transaction_id", receipt_core.get("id", ""))),
153
+ "total_paid": float(payment_core.get("total", receipt_core.get("total", 0.0))),
154
+ "vat_rate": float(receipt_core.get("vat_rate", 18.0)),
155
+ "total_vat_paid": float(payment_core.get("total_vat", 0.0)),
156
+ "payment_method": str(primary_method.get("name", "אשראי")).strip(),
157
+ "items": []
158
+ }
159
+
160
+ for item in receipt_core.get("items", []):
161
+ if not isinstance(item, dict):
162
+ continue
163
+
164
+ weight_val = item.get("weight")
165
+ quantity = float(weight_val / 1000.0) if weight_val else float(item.get("quantity", 1.0))
166
+ unit_price = float(item.get("price", 0.0))
167
+
168
+ expected_total = round(quantity * unit_price, 2)
169
+
170
+ discount_amount = 0.0
171
+ deal_description = ""
172
+ add_info = item.get("additional_info", [])
173
+ if isinstance(add_info, list):
174
+ for info_node in add_info:
175
+ if isinstance(info_node, dict) and "value" in info_node:
176
+ val_str = str(info_node.get("value", "0"))
177
+ if "-" in val_str:
178
+ try:
179
+ discount_amount = abs(float(val_str))
180
+ deal_description = str(info_node.get("key", "")).strip()
181
+ except ValueError:
182
+ pass
183
+
184
+ final_price = round(expected_total - discount_amount, 2)
185
+ # Fall back to the item's reported total when the computed price is not positive
186
+ if final_price <= 0 and item.get("total"):
187
+ final_price = float(item.get("total"))
188
+
189
+ has_deal = bool(discount_amount > 0 or deal_description)
190
+
191
+ unified_receipt["items"].append({
192
+ "description": str(item.get("name", "פריט")).strip(),
193
+ "barcode": str(item.get("code")) if item.get("code") else None,
194
+ "is_by_weight": True if weight_val else False,
195
+ "quantity_or_weight": quantity,
196
+ "unit_price": unit_price,
197
+ "original_total_price": expected_total,
198
+ "is_part_of_deal": has_deal,
199
+ "deal_text": deal_description or None,
200
+ "discount_amount": discount_amount,
201
+ "final_price": final_price,
202
+ "category_path": ["סופרמרקט"]
203
+ })
204
+
205
+ return unified_receipt
206
+ except Exception as ex:
207
+ logger.error(f"Error expanding serialized Rami Levy tables: {ex}")
208
+ raise
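The per-item discount scan above (look through the `additional_info` nodes, treat any value containing a minus sign as a discount, and take the node's `key` as the deal text) can be sketched in isolation. The helper name `extract_discount` is hypothetical and not part of the package; it only mirrors the inline logic under the assumption that each node is a `{"key": ..., "value": ...}` dictionary.

```python
from typing import Any, Tuple


def extract_discount(nodes: Any) -> Tuple[float, str]:
    """Return (discount_amount, deal_description) from key/value info nodes."""
    discount_amount, deal_description = 0.0, ""
    if not isinstance(nodes, list):
        return discount_amount, deal_description
    for node in nodes:
        if not isinstance(node, dict) or "value" not in node:
            continue
        value = str(node.get("value", ""))
        if "-" not in value:
            continue  # only negative values are treated as discounts
        try:
            discount_amount = abs(float(value))
        except ValueError:
            continue  # e.g. a dash inside non-numeric text
        deal_description = str(node.get("key", "")).strip()
    return discount_amount, deal_description


# Hypothetical node, shaped like the ones the parser iterates over.
print(extract_discount([{"key": "2 for 11", "value": "-2.00"}]))
```

As in the inline version, the last matching node wins when several nodes carry negative values.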
@@ -0,0 +1,180 @@
1
+ import json
2
+ import logging
3
+ import os
4
+ import urllib.request
5
+ import urllib.error
6
+ from urllib.parse import urlparse, parse_qs, urljoin
7
+ from typing import Dict, Any
8
+ from .base_parser import BaseReceiptParser
9
+
10
+ logging.basicConfig(level=logging.INFO)
11
+ logger = logging.getLogger("WeezmoParser")
12
+
13
+ class WeezmoParser(BaseReceiptParser):
14
+ def __init__(self) -> None:
15
+ super().__init__(store_name="Weezmo Provider")
16
+
17
+ def parse(self, source_data: str) -> Dict[str, Any]:
18
+ raw_json: str = ""
19
+
20
+ if source_data.startswith("http://") or source_data.startswith("https://"):
21
+ try:
22
+ base_url = source_data.strip()
23
+
24
+ # Browser-like request headers, so the WAF does not stall or drop the connection
25
+ headers = {
26
+ 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
27
+ 'Accept': 'application/json, text/plain, */*',
28
+ 'Accept-Language': 'he-IL,he;q=0.9,en-US;q=0.8,en;q=0.7',
29
+ 'Connection': 'close' # Explicitly close the connection to prevent hanging sockets
30
+ }
31
+
32
+ parsed_url = urlparse(base_url)
33
+ queries = parse_qs(parsed_url.query)
34
+
35
+ receipt_token = queries.get("q", [None])[0]
36
+
37
+ # If it's a short link (wee.ai), catch the 302 redirect location header
38
+ if not receipt_token and "wee.ai" in base_url:
39
+ logger.info("Intercepting wee.ai short-link redirection matrix...")
40
+
41
+ class InterceptRedirectHandler(urllib.request.HTTPRedirectHandler):
42
+ def http_error_302(self, req, fp, code, msg, headers):
43
+ return fp
44
+
45
+ # Build an opener that returns the 302 response instead of following it
46
+ opener = urllib.request.build_opener(InterceptRedirectHandler())
47
+ req = urllib.request.Request(base_url, headers=headers)
48
+
49
+ with opener.open(req, timeout=8) as response:
50
+ redirect_location = response.headers.get('Location', '')
51
+ if redirect_location:
52
+ # Safely handle both absolute and relative paths (/cms.html?q=...)
53
+ if not redirect_location.startswith('http'):
54
+ redirect_location = urljoin(base_url, redirect_location)
55
+
56
+ logger.info(f"Redirection resolved to target landing: {redirect_location}")
57
+ parsed_redirect = urlparse(redirect_location)
58
+ redirect_queries = parse_qs(parsed_redirect.query)
59
+ receipt_token = redirect_queries.get("q", [None])[0]
60
+
61
+ if not receipt_token:
62
+ raise ValueError("Could not extract a valid query token 'q' from the Weezmo link framework.")
63
+
64
+ api_url = f"https://receipts.weezmo.com/api/receipts/{receipt_token}"
65
+ logger.info(f"Targeting Weezmo data provider gateway: {api_url}")
66
+
67
+ # Fetch the receipt JSON with an explicit 10-second socket timeout
68
+ api_req = urllib.request.Request(api_url, headers=headers)
69
+ with urllib.request.urlopen(api_req, timeout=10) as response:
70
+ raw_json = response.read().decode('utf-8')
71
+
72
+ os.makedirs("temp", exist_ok=True)
73
+ with open("temp/weezmo_generic_raw.json", "w", encoding="utf-8") as f:
74
+ f.write(raw_json)
75
+
76
+ except urllib.error.HTTPError as http_err:
77
+ logger.error(f"Weezmo engine connection exception: {http_err.code}")
78
+ raise ValueError(f"שגיאת תקשורת מול שרת וויזמו: קוד {http_err.code}")
79
+ except urllib.error.URLError as url_err:
80
+ logger.error(f"Weezmo network timeout or unresolved destination: {url_err.reason}")
81
+ raise ValueError("חיבור הרשת לשרת וויזמו נותק או הגיע למגבלת זמן (Timeout).")
82
+ except Exception as e:
83
+ logger.error(f"Critical execution error resolving Weezmo document: {e}")
84
+ raise
85
+ else:
86
+ raw_json = source_data
87
+
88
+ try:
89
+ if not raw_json.strip():
90
+ raise ValueError("Payload data stream returned empty string bounds.")
91
+
92
+ payload_data = json.loads(raw_json)
93
+ if isinstance(payload_data, list):
94
+ if len(payload_data) == 0:
95
+ raise ValueError("Weezmo data matrix array is empty.")
96
+ payload = payload_data[0]
97
+ else:
98
+ payload = payload_data
99
+
100
+ branch_info = payload.get("tBranch", {}) or {}
101
+ business_info = payload.get("tBusiness", {}) or {}
102
+
103
+ dynamic_store_name = str(business_info.get("businessName") or branch_info.get("branchName") or "רשת קמעונאות").strip()
104
+ self.store_name = dynamic_store_name
105
+ logger.info(f"Dynamic branding identity successfully verified as: '{self.store_name}'")
106
+
107
+ created_date = payload.get("createdDate", "2026-01-01T00:00:00Z")
108
+ date_part, time_part = created_date.split("T") if "T" in created_date else (created_date, "00:00:00")
109
+ if "-" in date_part:
110
+ parts = date_part.split("-")
111
+ formatted_date = f"{parts[2]}/{parts[1]}/{parts[0]}"
112
+ else:
113
+ formatted_date = date_part
114
+
115
+ payments_list = payload.get("payments", [])
116
+ primary_payment = payments_list[0] if isinstance(payments_list, list) and len(payments_list) > 0 else {}
117
+
118
+ unified_receipt: Dict[str, Any] = {
119
+ "store_name": self.store_name,
120
+ "company_legal_id": str(branch_info.get("vatNumber", payload.get("businessID", "515136893"))),
121
+ "branch_name": str(branch_info.get("branchName", "סניף כללי")).strip(),
122
+ "store_address": str(branch_info.get("branchAddress", "")).strip(),
123
+ "store_phone": str(business_info.get("phone", "")).strip() or str(branch_info.get("branchPhone", "")).strip(),
124
+ "customer_name": str(payload.get("loyalName", "")).strip() or None,
125
+ "date": formatted_date,
126
+ "time": time_part[:8],
127
+ "receipt_id": str(payload.get("transactionNumber", payload.get("id", ""))),
128
+ "total_paid": float(payload.get("total", 0.0)),
129
+ "vat_rate": float(payload.get("vat", 17.0)),
130
+ "total_vat_paid": float(payload.get("vatTotal", 0.0)),
131
+ "payment_method": str(primary_payment.get("name", "אשראי")).strip(),
132
+ "items": []
133
+ }
134
+
135
+ for item in payload.get("items", []):
136
+ if not isinstance(item, dict):
137
+ continue
138
+
139
+ quantity = float(item.get("quantity", 1.0))
140
+ unit_price = float(item.get("price", 0.0))
141
+ final_price = float(item.get("total", quantity * unit_price))
142
+ expected_total = round(quantity * unit_price, 2)
143
+
144
+ discount_amount = 0.0
145
+ deal_description = ""
146
+ additional_data = item.get("additionalData", [])
147
+
148
+ if isinstance(additional_data, list):
149
+ for data_node in additional_data:
150
+ if isinstance(data_node, dict) and "value" in data_node:
151
+ val_str = str(data_node.get("value", ""))
152
+ if "-" in val_str:
153
+ try:
154
+ discount_amount = abs(float(val_str))
155
+ deal_description = str(data_node.get("key", "")).strip()
156
+ except ValueError:
157
+ pass
158
+
159
+ has_deal = bool(discount_amount > 0 or deal_description)
160
+ is_weight = not quantity.is_integer()
161
+
162
+ unified_receipt["items"].append({
163
+ "description": str(item.get("name", "פריט")).strip(),
164
+ "barcode": str(item.get("itemCode")) if item.get("itemCode") else None,
165
+ "is_by_weight": is_weight,
166
+ "quantity_or_weight": quantity,
167
+ "unit_price": unit_price,
168
+ "original_total_price": expected_total,
169
+ "is_part_of_deal": has_deal,
170
+ "deal_text": deal_description or None,
171
+ "discount_amount": discount_amount,
172
+ "final_price": final_price,
173
+ "category_path": ["ביגוד ואופנה"] if self.store_name == "H&O" else ["סופרמרקט"]
174
+ })
175
+
176
+ return unified_receipt
177
+
178
+ except Exception as ex:
179
+ logger.error(f"Error mapping Weezmo dynamic parameters: {ex}")
180
+ raise
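The token lookup used by the Weezmo parser (read the `q` query parameter from the URL, or from the redirect target of a short link) can be sketched on its own with the standard library. The function name `extract_receipt_token` and the example URLs are hypothetical; only `urlparse` and `parse_qs` are taken from the parser itself.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_receipt_token(url: str) -> Optional[str]:
    """Return the first value of the `q` query parameter, or None if absent."""
    query = parse_qs(urlparse(url.strip()).query)
    return query.get("q", [None])[0]


# Hypothetical landing URL of the kind a short link might redirect to.
print(extract_receipt_token("https://example.com/cms.html?q=abc123"))
# A short link without a query string yields None, which is the case that
# makes the parser fall back to reading the redirect's Location header.
print(extract_receipt_token("https://wee.ai/r/v123abcd"))
```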
@@ -0,0 +1,152 @@
1
+ Metadata-Version: 2.4
2
+ Name: israeli-invoice-parser
3
+ Version: 0.1.0
4
+ Summary: A unified parsing library for Israeli digital grocery and retail receipts
5
+ Author-email: Yohay Cohen <yohaybn@gmail.com>
6
+ Project-URL: Homepage, https://github.com/yohaybn/israeli-invoice-parser-lib
7
+ Project-URL: Bug Tracker, https://github.com/yohaybn/israeli-invoice-parser-lib/issues
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.8
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: beautifulsoup4>=4.12.0
15
+ Dynamic: license-file
16
+
17
+ # israeli-invoice-parser
18
+
19
+ A unified Python parsing library for Israeli digital receipts, grocery bills, and retail invoices. It normalizes vendor-specific payloads (direct API responses, raw HTML, and serialized Nuxt payloads) into a single structured Python dictionary.
20
+
21
+ ## Supported Retailers and Providers
22
+
23
+ The library supports major Israeli storefronts directly or via central receipt infrastructure aggregators:
24
+
25
+ * **Rami Levy (רמי לוי)**: native support for standard digital bills and API receipt payloads.
26
+ * **Weezmo / Wee.ai**: one parser covers the brands served through this provider, including grocery chains such as **Yohananof (יוחננוף)**, high-street retailers, and fashion brands (**TopTen**, **Tamnun**).
27
+ * **Pairzon (פיירזון)**: resolves short-link tokens and handles partner store layouts (e.g., **Osher Ad (אושר עד)**, **Max Stock (מקס סטוק)**).
28
+
29
+ > **Current Limitations:** Automated extraction of **Shufersal (שופרסל)** invoices is currently blocked. The endpoint sits behind bot-protection / WAF rules that reject standard programmatic requests, and we have not yet found a way to bypass them or emulate a browser signature closely enough to restore this functionality. Contributions or ideas on this issue are highly appreciated!
30
+
31
+ ---
32
+
33
+ ## Installation
34
+
35
+ Install the package via `pip`:
36
+
37
+ ```bash
38
+ pip install israeli-invoice-parser
39
+
40
+ ```
41
+
42
+ ---
43
+
44
+ ## Quick Start and Usage Examples
45
+
46
+ Every parser inherits from a common interface (`BaseReceiptParser`) and returns a standardized data model, making it simple to process invoices interchangeably.
47
+
48
+ ### 1. Parsing a Rami Levy URL
49
+
50
+ ```python
51
+ from invoice_parser import RamiLevyParser
52
+
53
+ # Initialize the dedicated parser
54
+ parser = RamiLevyParser()
55
+
56
+ # Pass a live receipt or invoice URL directly
57
+ url = "https://api-digi.rami-levy.co.il/api/v1/receipts/example-token-12345"
58
+ receipt = parser.parse(url)
59
+
60
+ # Access standardized fields uniformly
61
+ print(f"Store: {receipt['store_name']}")
62
+ print(f"Total Paid: ₪{receipt['total_paid']}")
63
+ print(f"Date: {receipt['date']} at {receipt['time']}")
64
+
65
+ for item in receipt['items']:
66
+ print(f" - {item['description']}: ₪{item['final_price']} (Qty: {item['quantity_or_weight']})")
67
+
68
+ ```
69
+
70
+ ### 2. Parsing a Weezmo / Wee.ai Provider Short-Link (e.g., Yohananof)
71
+
72
+ ```python
73
+ from invoice_parser import WeezmoParser
74
+
75
+ parser = WeezmoParser()
76
+
77
+ # Works with central wee.ai tracking tokens or short links
78
+ weezmo_url = "https://wee.ai/r/v123abcd"
79
+ receipt = parser.parse(weezmo_url)
80
+
81
+ # The parser dynamically extracts real corporate metadata to identify the sub-brand
82
+ print(f"Identified Brand: {receipt['store_name']}") # e.g., 'יוחננוף'
83
+ print(f"Legal Business ID: {receipt['company_legal_id']}")
84
+
85
+ ```
86
+
87
+ ### 3. Standardized Output Format
88
+
89
+ Regardless of which vendor parser is called, the output dictionary always has the following structure:
90
+
91
+ ```python
92
+ {
93
+ "store_name": "רמי לוי",
94
+ "company_legal_id": "513770669",
95
+ "branch_name": "סניף תל אביב",
96
+ "store_address": "דרך מנחם בגין 123",
97
+ "store_phone": "03-1234567",
98
+ "customer_name": "ישראל ישראלי",
99
+ "date": "23/06/2026",
100
+ "time": "14:30:00",
101
+ "receipt_id": "987654321",
102
+ "total_paid": 245.50,
103
+ "vat_rate": 17.0,
104
+ "total_vat_paid": 35.67,
105
+ "payment_method": "אשראי",
106
+ "items": [
107
+ {
108
+ "description": "חלב תנובה 3%",
109
+ "barcode": "7290000042431",
110
+ "is_by_weight": False,
111
+ "quantity_or_weight": 2.0,
112
+ "unit_price": 6.50,
113
+ "original_total_price": 13.00,
114
+ "is_part_of_deal": True,
115
+ "deal_text": "2 ב-₪11",
116
+ "discount_amount": 2.00,
117
+ "final_price": 11.00,
118
+ "category_path": ["סופרמרקט"]
119
+ }
120
+ ]
121
+ }
122
+
123
+ ```
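Because every parser is meant to return the same keys, a caller can check a result against that layout before using it. The following is a minimal sketch, not part of the library: the names `RECEIPT_KEYS`, `ITEM_KEYS`, and `missing_keys` are hypothetical, and the key sets are copied from the structure shown above.

```python
from typing import Any, Dict, List

# Top-level and per-item keys, copied from the documented output structure.
RECEIPT_KEYS = {
    "store_name", "company_legal_id", "branch_name", "store_address",
    "store_phone", "customer_name", "date", "time", "receipt_id",
    "total_paid", "vat_rate", "total_vat_paid", "payment_method", "items",
}
ITEM_KEYS = {
    "description", "barcode", "is_by_weight", "quantity_or_weight",
    "unit_price", "original_total_price", "is_part_of_deal", "deal_text",
    "discount_amount", "final_price", "category_path",
}


def missing_keys(receipt: Dict[str, Any]) -> List[str]:
    """List documented keys that are absent from a parsed receipt."""
    missing = sorted(RECEIPT_KEYS - set(receipt))
    for index, item in enumerate(receipt.get("items", [])):
        for key in sorted(ITEM_KEYS - set(item)):
            missing.append(f"items[{index}].{key}")
    return missing
```

An empty list means the dictionary carries every documented key; the check says nothing about value types or whether the amounts add up.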
124
+
125
+ ---
126
+
127
+ ## Contributing and Helping Out
128
+
129
+ Parsing real-world digital invoices is a game of cat-and-mouse as retailers update their internal schemas. We need your help to make this library resilient!
130
+
131
+ ### Have an Unsupported Receipt / Found a Bug?
132
+
133
+ If you run into an invoice that fails to parse (such as **Shufersal** or a newly formatted receipt layout):
134
+
135
+ 1. Open a New Issue on the [israeli-invoice-parser-lib Bug Tracker](https://github.com/yohaybn/israeli-invoice-parser-lib/issues).
136
+ 2. **Crucial:** Provide a **real, live link to the invoice**. Without a working URL, it is impossible to inspect the underlying network payload structure, test backend responses, or map out the necessary payload parameters.
137
+ 3. If you have suggestions or workarounds for bypassing Shufersal's anti-bot restrictions, please detail them inside the dedicated discussion issues!
138
+
139
+ ### Want to Add a New Parser?
140
+
141
+ We warmly welcome pull requests! To contribute a new parser:
142
+
143
+ 1. Subclass `BaseReceiptParser` from `base_parser.py`.
144
+ 2. Implement the `.parse(self, source_data: str) -> Dict[str, Any]` method.
145
+ 3. Map the data cleanly into our uniform dictionary format.
146
+ 4. Submit your PR directly to the [GitHub Repository](https://github.com/yohaybn/israeli-invoice-parser-lib/).
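The steps above can be sketched as a skeleton parser. Everything vendor-specific here is hypothetical: `ExampleStoreParser`, the store name, and the input fields (`lines`, `title`, `qty`, `price`, `id`) are invented for illustration. The `BaseReceiptParser` class below is a stand-in so the sketch runs on its own; it assumes the real base class accepts `store_name` in its constructor, as the bundled parsers' `super().__init__` calls suggest.

```python
import json
from typing import Any, Dict


class BaseReceiptParser:
    """Stand-in for the package's base class (assumed interface)."""

    def __init__(self, store_name: str) -> None:
        self.store_name = store_name

    def parse(self, source_data: str) -> Dict[str, Any]:
        raise NotImplementedError


class ExampleStoreParser(BaseReceiptParser):
    """Hypothetical parser for an invented JSON receipt format."""

    def __init__(self) -> None:
        super().__init__(store_name="Example Store")

    def parse(self, source_data: str) -> Dict[str, Any]:
        payload = json.loads(source_data)
        items = []
        for line in payload.get("lines", []):
            quantity = float(line.get("qty", 1.0))
            unit_price = float(line.get("price", 0.0))
            total = round(quantity * unit_price, 2)
            items.append({
                "description": str(line.get("title", "")).strip(),
                "barcode": None,
                "is_by_weight": False,
                "quantity_or_weight": quantity,
                "unit_price": unit_price,
                "original_total_price": total,
                "is_part_of_deal": False,
                "deal_text": None,
                "discount_amount": 0.0,
                "final_price": total,
                "category_path": [],
            })
        return {
            "store_name": self.store_name,
            "company_legal_id": "",
            "branch_name": "",
            "store_address": "",
            "store_phone": "",
            "customer_name": None,
            "date": "",
            "time": "",
            "receipt_id": str(payload.get("id", "")),
            "total_paid": round(sum(i["final_price"] for i in items), 2),
            "vat_rate": 0.0,
            "total_vat_paid": 0.0,
            "payment_method": "",
            "items": items,
        }


sample = '{"id": 7, "lines": [{"title": "Milk", "qty": 2, "price": 6.5}]}'
receipt = ExampleStoreParser().parse(sample)
print(receipt["receipt_id"], receipt["total_paid"])
```

A real parser would fill the empty fields from the vendor payload and import `BaseReceiptParser` from `base_parser.py` instead of redefining it.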
147
+
148
+ ---
149
+
150
+ ## License
151
+
152
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
@@ -0,0 +1,13 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/israeli-invoice-parser/__init__.py
5
+ src/israeli-invoice-parser/base_parser.py
6
+ src/israeli-invoice-parser/pairzon_parser.py
7
+ src/israeli-invoice-parser/rami_levy_parser.py
8
+ src/israeli-invoice-parser/weezmo_parser.py
9
+ src/israeli_invoice_parser.egg-info/PKG-INFO
10
+ src/israeli_invoice_parser.egg-info/SOURCES.txt
11
+ src/israeli_invoice_parser.egg-info/dependency_links.txt
12
+ src/israeli_invoice_parser.egg-info/requires.txt
13
+ src/israeli_invoice_parser.egg-info/top_level.txt