mymagic-0.0.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
mymagic-0.0.1/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2022 digipodium
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
mymagic-0.0.1/PKG-INFO ADDED
@@ -0,0 +1,171 @@
+ Metadata-Version: 2.4
+ Name: mymagic
+ Version: 0.0.1
+ Summary: This is a test library
+ License: MIT
+ License-File: LICENSE
+ Author: Mohammad Mubassir
+ Author-email: triplem656@gmail.com
+ Maintainer: Mohammad Mubassir
+ Maintainer-email: triplem656@gmail.com
+ Requires-Python: >=3.10,<4.0
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Requires-Dist: requests (>=2.32.5,<3.0.0)
+ Project-URL: Documentation, https://digipodium.github.io/dputils/
+ Project-URL: Homepage, https://digipodium.github.io/dputils/
+ Project-URL: Repository, https://github.com/digipodium/dputils
+ Description-Content-Type: text/markdown
+
+ <img alt="Python Version" src="https://img.shields.io/badge/python-3.8+-blue"> <img alt="Contributions welcome" src="https://img.shields.io/badge/contributions-welcome-brightgreen.svg"> <img alt="License" src="https://img.shields.io/badge/license-MIT-green"> <img alt="Build Status" src="https://img.shields.io/badge/build-passing-brightgreen.svg">
+
+ <img alt="Documentation Status" src="https://img.shields.io/badge/documentation-up%20to%20date-brightgreen.svg"> <img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/dputils"> <img alt="Stars" src="https://img.shields.io/github/stars/digipodium/dputils?style=social">
+
+ A Python library for extracting data from text, binary, PDF, and doc(x) files, as well as saving data into these
+ files. It can also be used to scrape and extract data from webpages.
+
+ # Installation Requirements and Instructions
+
+ Python 3.8 or above must be installed. Then open your terminal.
+ For Windows users:
+
+ ```shell
+ pip install dputils
+ ```
+
+ For Mac/Linux users:
+
+ ```shell
+ pip3 install dputils
+ ```
+
+ # Files Module
+
+ Functions from dputils.files:
+ For now, the files module has two functions:
+
+ 1. get_data:
+ - To import, use:
+ ```python3
+ from dputils.files import get_data
+ ```
+ - Obtains data from a file of any supported extension passed as an argument (text files, binary files, PDF, and DOC for now; more coming!)
+ - Sample call:
+ ```python3
+ content = get_data(r"sample.docx")
+ print(content)
+ ```
+ - Returns a string or binary data, depending on the output argument
+ - Images will not be extracted
+
+ 2. save_data:
+ - save_data can be used to write and save data into a file with a valid extension.
+ - Sample call:
+ ```python3
+ from dputils.files import save_data
+
+ pdfContent = save_data("sample.pdf", "Sample text to insert")
+ print(pdfContent)
+ ```
+ - Returns True if the file is successfully accessed and modified; otherwise False.
+
+ # Scrape Module
+
+ #### Data extraction from a page
+
+ Here's a basic tutorial to help you get started with the `scraper` module.
+
+ 1. **Import the required classes and functions:**
+
+ ```python
+ from dputils.scrape import Scraper, Tag
+ ```
+
+ 2. **Initialize the `Scraper` class with the URL of the webpage you want to scrape:**
+
+ ```python
+ url = "https://www.example.com"
+ scraper = Scraper(url)
+ ```
+
+ 3. **Define the tags you want to scrape using the `Tag` class:**
+
+ ```python
+ title_tag = Tag(name='h1', cls='title', output='text')
+ price_tag = Tag(name='span', cls='price', output='text')
+ ```
+
+ 4. **Extract data from the page:**
+
+ ```python
+ data = scraper.get_data_from_page(title=title_tag, price=price_tag)
+ print(data)
+ ```
+
+ #### Extracting a list of items from a page
+ For more advanced usage, such as extracting repeated data from lists of items on a page, use the following approach:
+
+ 1. **Initialize the `Scraper` class:**
+
+ ```python
+ url = "https://www.example.com/products"
+ scraper = Scraper(url)
+ ```
+
+ 2. **Define the tags for the target section and the items within that section:**
+ For repeated data extraction, define a *target* and an *items* tag and pass them to the `get_repeating_data_from_page()` method.
+ - *target* - the `Tag()` for the area of the page containing the list of items.
+ - *items* - the `Tag()` for the repeated items within the target section, such as a product card in a product grid/list.
+ ```python
+ target_tag = Tag(name='div', cls='product-list')
+ item_tag = Tag(name='div', cls='product-item')
+ title_tag = Tag(name='h2', cls='product-title', output='text')
+ price_tag = Tag(name='span', cls='product-price', output='text')
+ link_tag = Tag(name='a', cls='product-link', output='href')
+ ```
+
+ 3. **Extract repeated data from the page:**
+
+ ```python
+ products = scraper.get_repeating_data_from_page(
+ target=target_tag,
+ items=item_tag,
+ title=title_tag,
+ price=price_tag,
+ link=link_tag
+ )
+ for product in products:
+ print(product)
+ ```
+
+ These functions can be used on Python 3.8 or greater.
+
+ For more help, see: https://digipodium.github.io/dputils/
+
+ # Contribution
+ If you want to contribute to this project and make it better, your help is very welcome.
+ * Fork the project
+ * Create your feature branch (`git checkout -b feature/fooBar`)
+ * Commit your changes (`git commit -am 'Add some fooBar'`)
+ * Push to the branch (`git push origin feature/fooBar`)
+ * Create a new Pull Request
+ * Wait for your PR to be reviewed and merged
+ * Star the project if you've found it useful
+ * Share the project with your friends
+ * Create an issue if you find a bug or want to request a new feature
+ * Improve the project by refactoring the code
+ * Review the PRs of other contributors
+ * Suggest new features
+ * Suggest new technologies to be used
+
+ Thank you for using dputils!
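The scrape tutorial above depends on dputils itself. As a rough, standard-library-only illustration of the kind of repeated-item extraction that `get_repeating_data_from_page()` performs, here is a sketch using Python's built-in `html.parser`; the class name, HTML, and field handling are invented for this example and are not dputils' actual implementation:

```python
# Illustration only: stdlib sketch of repeated-item extraction, conceptually
# similar to Scraper.get_repeating_data_from_page (not the real dputils code).
from html.parser import HTMLParser

HTML = """
<div class="product-list">
  <div class="product-item"><h2 class="product-title">Pen</h2>
    <span class="product-price">$1</span></div>
  <div class="product-item"><h2 class="product-title">Book</h2>
    <span class="product-price">$5</span></div>
</div>
"""

class ItemExtractor(HTMLParser):
    """Collects title/price text for each .product-item in the target div."""
    def __init__(self):
        super().__init__()
        self.items = []     # one dict per repeated item
        self._field = None  # field whose text we are about to read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "div" and cls == "product-item":
            self.items.append({})           # new repeated item begins
        elif tag == "h2" and cls == "product-title":
            self._field = "title"
        elif tag == "span" and cls == "product-price":
            self._field = "price"

    def handle_data(self, data):
        if self._field and self.items:
            self.items[-1][self._field] = data.strip()
            self._field = None

parser = ItemExtractor()
parser.feed(HTML)
print(parser.items)  # [{'title': 'Pen', 'price': '$1'}, {'title': 'Book', 'price': '$5'}]
```

In practice dputils wraps this kind of traversal behind the `Tag` objects shown in the tutorial, so you declare *what* to extract rather than *how* to walk the HTML.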
@@ -0,0 +1,7 @@
+ __version__ = '0.0.1'
+
+ '''
+ mymagic
+ ~~~~~~~
+
+ '''
@@ -0,0 +1,4 @@
+ import requests
+
+ def testme():
+     print('its working')
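A minimal sketch of what calling this function prints. The function is redefined locally so the example is self-contained; in practice you would import it from the installed mymagic package (the module's exact import path is not shown in this diff):

```python
import io
from contextlib import redirect_stdout

# Mirrors the testme() defined in the module above (redefined here so the
# sketch runs without the package installed).
def testme():
    print('its working')

buf = io.StringIO()
with redirect_stdout(buf):  # capture what testme() writes to stdout
    testme()
print(repr(buf.getvalue()))  # 'its working\n'
```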
@@ -0,0 +1,20 @@
+ [tool.poetry]
+ name = "mymagic"
+ version = "0.0.1"
+ description = "This is a test library"
+ authors = ["Mohammad Mubassir <triplem656@gmail.com>"]
+ maintainers = ["Mohammad Mubassir <triplem656@gmail.com>"]
+ readme = "readme.md"
+ license = "MIT"
+
+ homepage = "https://digipodium.github.io/dputils/"
+ repository = "https://github.com/digipodium/dputils"
+ documentation = "https://digipodium.github.io/dputils/"
+
+ [tool.poetry.dependencies]
+ python = "^3.10"
+ requests = "^2.32.5"
+
+ [build-system]
+ requires = ["poetry-core>=1.0.0"]
+ build-backend = "poetry.core.masonry.api"
@@ -0,0 +1,147 @@
+
+
+ <img alt="Python Version" src="https://img.shields.io/badge/python-3.8+-blue"> <img alt="Contributions welcome" src="https://img.shields.io/badge/contributions-welcome-brightgreen.svg"> <img alt="License" src="https://img.shields.io/badge/license-MIT-green"> <img alt="Build Status" src="https://img.shields.io/badge/build-passing-brightgreen.svg">
+
+ <img alt="Documentation Status" src="https://img.shields.io/badge/documentation-up%20to%20date-brightgreen.svg"> <img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/dputils"> <img alt="Stars" src="https://img.shields.io/github/stars/digipodium/dputils?style=social">
+
+ A Python library for extracting data from text, binary, PDF, and doc(x) files, as well as saving data into these
+ files. It can also be used to scrape and extract data from webpages.
+
+ # Installation Requirements and Instructions
+
+ Python 3.8 or above must be installed. Then open your terminal.
+ For Windows users:
+
+ ```shell
+ pip install dputils
+ ```
+
+ For Mac/Linux users:
+
+ ```shell
+ pip3 install dputils
+ ```
+
+ # Files Module
+
+ Functions from dputils.files:
+ For now, the files module has two functions:
+
+ 1. get_data:
+ - To import, use:
+ ```python3
+ from dputils.files import get_data
+ ```
+ - Obtains data from a file of any supported extension passed as an argument (text files, binary files, PDF, and DOC for now; more coming!)
+ - Sample call:
+ ```python3
+ content = get_data(r"sample.docx")
+ print(content)
+ ```
+ - Returns a string or binary data, depending on the output argument
+ - Images will not be extracted
+
+ 2. save_data:
+ - save_data can be used to write and save data into a file with a valid extension.
+ - Sample call:
+ ```python3
+ from dputils.files import save_data
+
+ pdfContent = save_data("sample.pdf", "Sample text to insert")
+ print(pdfContent)
+ ```
+ - Returns True if the file is successfully accessed and modified; otherwise False.
+
+ # Scrape Module
+
+ #### Data extraction from a page
+
+ Here's a basic tutorial to help you get started with the `scraper` module.
+
+ 1. **Import the required classes and functions:**
+
+ ```python
+ from dputils.scrape import Scraper, Tag
+ ```
+
+ 2. **Initialize the `Scraper` class with the URL of the webpage you want to scrape:**
+
+ ```python
+ url = "https://www.example.com"
+ scraper = Scraper(url)
+ ```
+
+ 3. **Define the tags you want to scrape using the `Tag` class:**
+
+ ```python
+ title_tag = Tag(name='h1', cls='title', output='text')
+ price_tag = Tag(name='span', cls='price', output='text')
+ ```
+
+ 4. **Extract data from the page:**
+
+ ```python
+ data = scraper.get_data_from_page(title=title_tag, price=price_tag)
+ print(data)
+ ```
+
+ #### Extracting a list of items from a page
+ For more advanced usage, such as extracting repeated data from lists of items on a page, use the following approach:
+
+ 1. **Initialize the `Scraper` class:**
+
+ ```python
+ url = "https://www.example.com/products"
+ scraper = Scraper(url)
+ ```
+
+ 2. **Define the tags for the target section and the items within that section:**
+ For repeated data extraction, define a *target* and an *items* tag and pass them to the `get_repeating_data_from_page()` method.
+ - *target* - the `Tag()` for the area of the page containing the list of items.
+ - *items* - the `Tag()` for the repeated items within the target section, such as a product card in a product grid/list.
+ ```python
+ target_tag = Tag(name='div', cls='product-list')
+ item_tag = Tag(name='div', cls='product-item')
+ title_tag = Tag(name='h2', cls='product-title', output='text')
+ price_tag = Tag(name='span', cls='product-price', output='text')
+ link_tag = Tag(name='a', cls='product-link', output='href')
+ ```
+
+ 3. **Extract repeated data from the page:**
+
+ ```python
+ products = scraper.get_repeating_data_from_page(
+ target=target_tag,
+ items=item_tag,
+ title=title_tag,
+ price=price_tag,
+ link=link_tag
+ )
+ for product in products:
+ print(product)
+ ```
+
+ These functions can be used on Python 3.8 or greater.
+
+ For more help, see: https://digipodium.github.io/dputils/
+
+ # Contribution
+ If you want to contribute to this project and make it better, your help is very welcome.
+ * Fork the project
+ * Create your feature branch (`git checkout -b feature/fooBar`)
+ * Commit your changes (`git commit -am 'Add some fooBar'`)
+ * Push to the branch (`git push origin feature/fooBar`)
+ * Create a new Pull Request
+ * Wait for your PR to be reviewed and merged
+ * Star the project if you've found it useful
+ * Share the project with your friends
+ * Create an issue if you find a bug or want to request a new feature
+ * Improve the project by refactoring the code
+ * Review the PRs of other contributors
+ * Suggest new features
+ * Suggest new technologies to be used
+
+ Thank you for using dputils!