als-document 0.12.0 → 1.0.0-beta

package/readme.md CHANGED
@@ -1,204 +1,273 @@
1
- # Als-Document
1
+ # als-document: HTML Parser & DOM Manipulation Library
2
2
 
3
- *If something wrong or not working properly, please write me to: sh.mashkanta@gmail.com*
3
+ ## Overview
4
4
 
5
+ `als-document` is a powerful library for parsing HTML and manipulating the DOM structure on backend and frontend. It provides a robust and intuitive API for querying and interacting with DOM elements using selectors, making it a valuable tool for web developers.
5
6
 
6
- Document is a class which gets html as string and return new object with DOM tree.
7
- You can add or remove elements from DOM tree and modify each element.
8
- You can select elements and collections and read and modify them with given instruments.
7
+ ## Release notes
8
+ * als-document is still in alpha testing. All tested features work, but real-world use keeps uncovering bugs and behavior that should work differently. For example, this release changes how attributes with an empty value are stored.
9
+ * This release also adds a very powerful feature: caching the DOM tree as JSON and rebuilding the DOM from that cache.
9
10
 
10
- **0.11**
11
- * events issue fixed
12
11
 
13
- **0.12**
14
- * added getAttribute and setAttribute
12
+ ## Installation
15
13
 
16
- ## Write and Read files
17
- Document have 2 static methods to read and write files.
18
- The syntax:
19
- ```javascript
20
- Document.writeFile(filePath,obj,encoding = 'utf-8')
21
- Document.readFile(filePath,encoding = 'utf-8')
14
+ To install the `als-document` library, use the following npm command:
15
+
16
+ ```bash
17
+ npm i als-document
22
18
  ```
23
19
 
24
- * ``filepath`` - filepath can be string (absolute path to file) or array for joining.
25
- * ``obj`` - obj can be string or object for stringify.
26
- * ``encoding`` - encoding for read or write file
27
20
 
28
- Example:
21
+ ## Including the Library
22
+
23
+ The library provides three different files to cater to different module systems:
24
+
25
+ 1. **index.js**: This file uses the CommonJS module system. It's suitable for projects using Node.js or bundlers like Browserify or Webpack. The entry point in `package.json` for this file is "main".
26
+
29
27
  ```javascript
30
- let {Document} = require('als-document')
31
- let html = Document.readFile([__dirname,'index.html'])
28
+ const { parseHTML, Node, Query, TextNode, SingleNode, Root } = require('als-document');
32
29
  ```
33
30
 
31
+ 2. **index.mjs**: This file uses the ES Modules (ESM) system. It's suitable for modern JavaScript environments that support ESM. The entry point in `package.json` for this file is "module".
34
32
 
35
- ## Creating new object
33
+ ```js
34
+ import { parseHTML, Node, Query, TextNode, SingleNode, Root } from 'als-document';
35
+ ```
36
36
 
37
- Document constructor get single string parameter - the outerHTML for converting to virtual DOM tree.
37
+ 3. **document.js**: By including this file, a constant variable named `alsDocument` is created, which wraps all the exports.
38
38
 
39
- ```javascript
40
- let document = new Document(html) // html has to be string
41
- document.domTree // includes virtual DOM tree as array of elements
39
+ ```html
40
+ <script src="/node_modules/als-document/document.js"></script>
41
+ <script>
42
+ const { parseHTML, Node, Query, TextNode, SingleNode, buildFromCache, cacheDoc, Root } = alsDocument
43
+ </script>
42
44
  ```
43
45
 
46
+ ## parseHTML
44
47
 
45
- ## QuerySelector for single element
46
- Then document object has created, you can select elements or collections.
47
- For selecting single element, use ``$(selector)`` and for selecting collections ``$$(selector)``.
48
+ `parseHTML` is a function that takes an HTML string and constructs a DOM tree representation from it. It recognizes various HTML elements, such as comments, scripts, styles, and CDATA, and organizes them into nodes that can be manipulated and queried.
48
49
 
49
- **Selecting element**
50
+ ### API:
51
+ `parseHTML(html: string) -> Node`
50
52
 
51
- ```javascript
52
- document.$('div') // select first div in document
53
- document.$('div.some') // select first div element with some class
54
- ```
53
+ Parses an HTML string and returns a tree structure representing its content.
55
54
 
56
- At this time, selector supports this:
57
- * Selects all elements - ``*``
58
- * element - ``div``
59
- * class - ``.some-class``
60
- * id - ``#some-id``
61
- * parent - ``div > p``
62
- * next - ``div + p``
63
- * previous - ``p ~ ul``
64
- * attribute - ``[some-attribute="some value"]``
65
- * ``[prop]``
66
- * ``[prop~=value]``
67
- * ``[prop|=value]``
68
- * ``[prop^="value"]``
69
- * ``[prop$="value"]``
70
- * ``[prop*="value"]``
55
+ * `html`: The HTML string to parse.
56
+ * `Returns`: A Node object representing the root of the parsed HTML content tree.
71
57
 
58
+ ### Expected Outcome:
59
+ When using the parseHTML function, the output will be a tree of nodes representing the HTML content. Each node can be one of the following:
60
+ * **Node**: A standard HTML element node with tag name, attributes, and child nodes.
61
+ * **SingleNode**: Represents self-closing or void HTML elements.
62
+ * **TextNode**: Represents text content in the HTML.
72
63
 
73
- The folowing, **won't work**: ``div p``.
64
+ Each node will have a tag name, a dictionary of attributes, and a list of child nodes (if applicable).
74
65
 
66
+ ### Examples
75
67
 
76
- Each returned element, has the folowing:
77
- ```javascript
78
- {
79
- parent, // parent element
80
- prev, // previous element (null if no exists)
81
- next, // next element (null if no exists)
82
- innerText, // innner text of element and it's childNodes separated by |
83
- children, // array of childNodes(elements and text nodes) - includes text element too
84
- tagName, // tag name of element
85
- id, // id of element if exists (not included in attributes)
86
- attributes, // object of attributes (id not included)
87
- classList, // array of classes and add and remove methods
88
- getAttribute(name)
89
- setAttribute(name,value)
90
- $(selector),
91
- $$(selector),
92
- json(), // remove all methods and circular objects from object
93
- remove(), // remove this element
94
- add(element/outerHtml,place),
95
- add0(element/outerHtml),
96
- add1(element/outerHtml),
97
- add2(element/outerHtml),
98
- add3(element/outerHtml),
99
- }
100
- ```
68
+ ```js
69
+ const parsedHTML = parseHTML('<div class="container"><img src="image.jpg" alt="Image"/><p>Hello, world!</p></div>');
101
70
 
102
- Text node has the folowing:
103
- ```javascript
104
- {
105
- text,
106
- prev,
107
- next
108
- }
71
+ // The returned `parsedHTML` object will be a tree-like structure.
72
+ // For instance, parsedHTML.childNodes[0] would represent the <div> element,
73
+ // and parsedHTML.childNodes[0].childNodes[0] would represent the <img> element inside it.
109
74
  ```
110
75
 
111
- Comment node:
112
- ```javascript
113
- tagName:comment,
114
- comment // comment it self
115
- ```
76
+ ```js
77
+ const parsedScript = parseHTML('<script>console.log("Hello, world!");</script>');
116
78
 
117
- You can add or remove classes with classList methods.
118
- Example:
119
- ```javascript
120
- let element = document.$('div')
121
- element.classList.remove('some')
122
- element.classList.add('another')
123
- element.classList.add('onemore')
79
+ // The returned `parsedScript` object will contain a `script` Node with a child node
80
+ // holding the JavaScript code as text content.
124
81
  ```
125
82
 
126
- Also you can change element's id:
127
- ```javascript
128
- let element = document.$('div')
129
- element.id = 'new-id'
130
- ```
83
+ Remember, the actual tree structure will be more complex and detailed, but the provided examples give you a basic understanding of how to navigate through the parsed result.
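
To make the navigation described above concrete, here is a minimal sketch that walks a tree of this shape. It deliberately uses plain objects as stand-ins for `Node`, `SingleNode`, and `TextNode` (the `text` field and the `outline` helper are assumptions for illustration, not part of the library), so it runs without the package installed.

```js
// Plain objects standing in for the parsed tree of the first example above.
// The { text } shape for text nodes is an assumption made for this sketch.
const tree = {
  tagName: 'root', attributes: {}, childNodes: [
    {
      tagName: 'div', attributes: { class: 'container' }, childNodes: [
        { tagName: 'img', attributes: { src: 'image.jpg', alt: 'Image' }, childNodes: [] },
        { tagName: 'p', attributes: {}, childNodes: [{ text: 'Hello, world!' }] },
      ],
    },
  ],
};

// Depth-first walk that records one indented line per node.
function outline(node, depth = 0, lines = []) {
  const label = node.tagName ? `<${node.tagName}>` : JSON.stringify(node.text);
  lines.push('  '.repeat(depth) + label);
  (node.childNodes || []).forEach(child => outline(child, depth + 1, lines));
  return lines;
}

const outlineLines = outline(tree);
console.log(outlineLines.join('\n'));
```

The same recursive pattern should carry over to a real parsed tree, as long as each node exposes `childNodes` as described in this section.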
131
84
 
132
- ## Element methods
85
+ ## Node
133
86
 
134
- ```javascript
135
- json() // remove all methods and circular objects from object
136
- remove() // remove this element
137
- add(element/outerHtml,place) // adding AdjacentHTML or AdjacentElement to place(0-3)
138
- add0(element/outerHtml) // adding AdjacentHTML or AdjacentElement beforebegin
139
- add1(element/outerHtml) // adding AdjacentHTML or AdjacentElement afterbegin
140
- add2(element/outerHtml) // adding AdjacentHTML or AdjacentElement beforeend
141
- add3(element/outerHtml) // adding AdjacentHTML or AdjacentElement afterend
142
- ```
87
+ `Node` is a fundamental class that represents an element node in the DOM tree. It provides functionality similar to the native DOM API in browsers, but with its own implementation.
143
88
 
144
- Example:
145
- ```javascript
146
- let document = new Document(html)
147
- let a = document.$('a')
148
- let div = document.$('div')
149
- div.add2('<div id="test">Hello world</div>')
150
- div.add3(a)
151
- a.remove()
152
- ```
89
+ ### Properties:
90
+ - **tagName**: Represents the tag name of the element.
91
+ - **attributes**: A dictionary of attributes and their values.
92
+ - **childNodes**: An array of child nodes for the element.
93
+ - **isSingle**: Boolean value to check if the node is a self-closing tag.
94
+ - **parentNode, previousElementSibling, nextElementSibling, children**: Navigation properties to move through the DOM tree.
95
+ - **dataset, classList, style**: Special properties for interacting with `data-*` attributes, classes, and inline styles.
153
96
 
97
+ ### Methods:
98
+ - **getAttribute, setAttribute, removeAttribute**: Manipulate element's attributes.
99
+ - **remove**: Removes the element from its parent.
100
+ - **innerHTML, outerHTML**: Get and set the inner or entire HTML of the element.
101
+ - **querySelector, querySelectorAll**: Find elements within the node based on CSS-like selectors.
102
+ - Limitation: pseudo-selectors such as `:first-of-type` or `:checked` are not supported.
103
+ - Namespaced tag names such as `some:namespace` are supported.
104
+ - `$` is a shorthand for `querySelector`, and `$$` for `querySelectorAll`.
105
+ - **getElementsByClassName, getElementsByTagName, getElementById**: Get elements by class, tag, or id respectively.
106
+ - **insertAdjacentElement, insertAdjacentHTML, insertAdjacentText**: Insert content relative to the element.
107
+ - **appendChild**: Add a child node to the element.
108
+ - **insert(place, element)**: `place` is a position index (0-3) or a position name (`beforebegin`, `afterbegin`, ...); `element` is raw HTML or an element.
154
109
 
155
- Create new element with ``Document.newElement(outerHtml)``
156
110
 
157
- ```javascript
158
- Document.newElement('<div id="test">Hello world</div>')
159
- ```
111
+ ### SingleNode
112
+
113
+ `SingleNode` extends from the `Node` class and represents elements that don't have closing tags (self-closing tags) in HTML. Examples include `<img>`, `<br>`, and `<!DOCTYPE>`. This class has restricted methods and properties since these elements can't have child nodes.
114
+
115
+ ### TextNode
116
+
117
+ `TextNode` is a class that represents text content within the DOM. A TextNode holds raw text data and does not have child nodes.
118
+
119
+
120
+ ### Root node (extends Node)
160
121
 
122
+ Has additional getters and setters:
123
+ * getter root.title
124
+ * setter root.title
125
+ * getter root.body
126
+ * getter root.head
161
127
 
162
- ## QuerySelector for Collection ``$$()``
163
- To select few elements, use ``$$(selector)`` method.
128
+
129
+
130
+ ### Examples:
164
131
 
165
132
  ```javascript
166
- document.$$('div') // return collection of all div elements
133
+ const div = new Node('div');
134
+ div.setAttribute('class', 'container');
135
+
136
+ const img = new SingleNode('img', { src: 'image.jpg', alt: 'An image' });
137
+ div.appendChild(img);
138
+
139
+ console.log(div.outerHTML); // Outputs: <div class="container"><img src="image.jpg" alt="An image"></div>
140
+
141
+ const p = new Node('p',{},div); // adding as last child to parent div
142
+ p.textContent = "Hello, world!";
143
+
144
+ const foundP = div.querySelector('p');
145
+ console.log(foundP.textContent); // Outputs: Hello, world!
167
146
  ```
168
147
 
169
- The collection is array which has the elements and two methods: ``each`` and ``parse``.
170
148
 
171
- ``each`` method gets callback function with 3 parameters: element it self, index of the element in collection and collection itself.
149
+ ## Query
150
+
151
+ The `Query` class is designed to parse CSS selector strings and transform them into a structured object format, providing detailed insights into each selector and its components.
152
+
153
+ By using the class, one can expect to transform a CSS selector string into an array of objects.
154
+
155
+ Each object will represent a selector, containing detailed information such as its tag, identifier, classes, attributes, and associated selectors if any. This can be useful for further processing or analysis of CSS selectors in an application.
156
+
157
+ ### Example
172
158
 
173
- Here example:
174
159
  ```javascript
175
- let array = []
176
- document.$$('div').each((element,index,collection) => {
177
- if(element.innerText.includes('some text'))
178
- array.push(element)
179
- })
160
+ let q1 = 'html>body>div.tabs~.some[type $= "radio and some"]>p+div>.some-id .tab-content~input[disabled] div.some'
161
+ let result = new Query(q1).selectors
162
+ let result1 = Query.get(q1)
163
+ // result and result1 should be identical
164
+ console.log(result)
180
165
  ```
181
166
 
182
- ``parse`` method, gets two parameters: ``part`` and ``fn`` and return array with results.
183
- * ``part`` is a part of element. It can be innerText, id, tagName or any property inside attributes.
184
- * ``fn`` is a filter function which gets content of part. If return true, content will be included.
185
-
186
- Example:
167
+ Result:
187
168
  ```javascript
188
- new Document(htmlText).$$('div')
189
- .parse('innerText',
190
- content=> (content.length > 0) ? true : false)
169
+ [
170
+ {
171
+ "query": "div.some",
172
+ "tag": "div",
173
+ "classList": [
174
+ "some"
175
+ ],
176
+ "ancestors": [
177
+ {
178
+ "query": ".some-id",
179
+ "classList": [
180
+ "some-id"
181
+ ],
182
+ "parents": [
183
+ {
184
+ "query": "div",
185
+ "tag": "div"
186
+ }
187
+ ],
188
+ "prev": {
189
+ "query": "p",
190
+ "tag": "p",
191
+ "parents": [
192
+ {
193
+ "query": ".some[0]",
194
+ "classList": [
195
+ "some"
196
+ ],
197
+ "attribs": [
198
+ {
199
+ check:(f),
200
+ "query": "[type$=\"radio and some\"]",
201
+ "name": "type",
202
+ "value": "radio and some",
203
+ "sign": "$="
204
+ }
205
+ ]
206
+ }
207
+ ],
208
+ "prevAny": {
209
+ "query": "div.tabs",
210
+ "tag": "div",
211
+ "classList": [
212
+ "tabs"
213
+ ],
214
+ "parents": [
215
+ {
216
+ "query": "html",
217
+ "tag": "html"
218
+ },
219
+ {
220
+ "query": "body",
221
+ "tag": "body"
222
+ }
223
+ ]
224
+ },
225
+ "group": "html>body>div.tabs~.some[0]>p"
226
+ },
227
+ "group": "html>body>div.tabs~.some[0]>p+div>.some-id"
228
+ },
229
+ {
230
+ "query": "input[1]",
231
+ "tag": "input",
232
+ "attribs": [
233
+ {
234
+ "query": "[disabled]",
235
+ "name": "disabled"
236
+ }
237
+ ],
238
+ "prevAny": {
239
+ "query": ".tab-content",
240
+ "classList": [
241
+ "tab-content"
242
+ ]
243
+ },
244
+ "group": ".tab-content~input[1]"
245
+ }
246
+ ],
247
+ "group": "html>body>div.tabs~.some[type $= \"radio and some\"]>p+div>.some-id .tab-content~input[disabled] div.some"
248
+ }
249
+ ]
191
250
  ```
192
251
 
193
- ## Building html
252
+ ### Attribs and check function
253
+ If an attribute selector has a value, its attrib object contains a `check` function that takes one parameter: the value to test.
194
254
 
195
- For building html again, use ``build`` method.
196
- Example:
197
255
  ```javascript
198
- let element = document.$('div')
199
- element.classList.add('another')
200
- element.classList.remove('some')
201
- element.id = 'new-id'
202
- document.build() // return new html text
203
- document.build([__dirname,'new-index.html']) // will create a file with new html text
256
+ let s = Query.get('[test^="some"]')[0]
257
+ console.log(s.attribs[0].check('some value test')) // true
204
258
  ```
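
As a rough illustration of what such a `check` function might do for each `sign`, the sketch below builds a predicate from an operator and an expected value, following the standard CSS attribute-selector semantics. `makeCheck` is a hypothetical helper written for this note; it is not taken from the library's source, and the library's actual behavior may differ in edge cases.

```js
// Hypothetical factory: returns a predicate for one CSS attribute operator.
// This mirrors standard CSS semantics, not necessarily the library's internals.
function makeCheck(sign, expected) {
  switch (sign) {
    case '=':  return value => value === expected;                       // exact match
    case '~=': return value => value.split(/\s+/).includes(expected);    // whitespace-separated word
    case '|=': return value => value === expected || value.startsWith(expected + '-'); // exact or "expected-" prefix
    case '^=': return value => value.startsWith(expected);               // prefix
    case '$=': return value => value.endsWith(expected);                 // suffix
    case '*=': return value => value.includes(expected);                 // substring
    default:   throw new Error(`Unsupported operator: ${sign}`);
  }
}

const startsWithSome = makeCheck('^=', 'some');
console.log(startsWithSome('some value test')); // true
console.log(startsWithSome('awesome'));         // false
```

Building the predicate once per selector, rather than re-parsing the operator for every element, keeps the per-element test down to a single function call.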
259
+
260
+ ## buildFromCache and cacheDoc
261
+
262
+ Building a DOM from raw HTML usually takes tens of milliseconds. Now you can build the DOM once and save its cache as regular stringified JSON.
263
+ Caching and rebuilding from the cache each take less than 5 ms and require very few resources.
264
+
265
+
266
+ How does it work?
267
+ ```js
268
+ const html = `` // some real html 255KB
269
+ const root = parseHTML(html); // 31.9ms
270
+ const cache = cacheDoc(root); // 2.4ms
271
+ const root1 = buildFromCache(cache); // 1.2ms
272
+ console.log(root.innerHTML === root1.innerHTML) // true
273
+ ```
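
The README does not describe the cache format, so the following is only a sketch of the general idea under stated assumptions: a DOM-like tree with `parentNode` back-references cannot be passed to `JSON.stringify` directly, so the cache step strips it down to plain data and the rebuild step restores the parent links. `toCache` and `fromCache` are hypothetical names, and the compact `{ t, a, c }` shape is invented for this example; neither is the library's `cacheDoc` / `buildFromCache`.

```js
// Hypothetical helpers illustrating a cache round trip; not the library's implementation.
function toCache(node) {
  if (typeof node === 'string') return node; // text is stored as a bare string
  return { t: node.tagName, a: node.attributes, c: node.childNodes.map(toCache) };
}

function fromCache(entry, parentNode = null) {
  if (typeof entry === 'string') return entry;
  const node = { tagName: entry.t, attributes: { ...entry.a }, childNodes: [], parentNode };
  node.childNodes = entry.c.map(child => fromCache(child, node)); // restore parent links
  return node;
}

// A tiny tree with a circular parentNode reference, as a DOM tree would have.
const sourceRoot = { tagName: 'div', attributes: { class: 'box' }, childNodes: [], parentNode: null };
sourceRoot.childNodes.push({ tagName: 'p', attributes: {}, childNodes: ['Hello'], parentNode: sourceRoot });

const cachedJson = JSON.stringify(toCache(sourceRoot)); // safe: no circular references
const rebuiltRoot = fromCache(JSON.parse(cachedJson));
console.log(rebuiltRoot.childNodes[0].parentNode === rebuiltRoot); // true
```

Rebuilding from such a structure skips tokenizing and attribute parsing entirely, which is one plausible reason the cached path is so much faster than `parseHTML`.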
package/src/build.js ADDED
@@ -0,0 +1,66 @@
1
+ const { readFileSync, writeFileSync, watch } = require('fs')
2
+ const { join } = require('path')
3
+
4
+ function optimizeCode(content) {
5
+ // content = content.replace(/(?<!\\)\/\/.*$/gm, '') // remove comments
6
+ content = content.replace(/^(?<!\\)\/\/.*$|(?<=\s)(?<!\\)\/\/.*$/gm, '') // remove comments
7
+ // content = content.replace(/\;\s*?$/gm,'') // remove ; at end of line
8
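+ content = content.replace(/\[\s*?\n\s*/gm, '[') // join "[" with the first item on the following line
9
+ content = content.replace(/\s*?\]/gm, ']') // remove whitespace (including line breaks) before "]"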
10
+ content = content.replace(/\s*?$/gm, '') // remove space at end of line
11
+ content = content.replace(/^\s\s\s/gm, ' ') // change triple space to double space
12
+ content = content.replace(/\s(\=|\>|\<|\-|\+|\*|\!)/g, (s) => s.trim()) // replace space before operators
13
+ content = content.replace(/(\=|\>|\-|\+|\*)\s/g, (s) => s.includes(' ') ? s.trim() : s) // replace space after operators
14
+ content = content.replace(/\,\s*?$\s*/gm, ',') // join lines separated with comma
15
+ content = content.replace(/\s?,\s?/g, ',') // trim comma
16
+ content = content.replace(/\s\{/g, '{')
17
+ return content
18
+ }
19
+
20
+ const root = join(__dirname, '..')
21
+ const files = {
22
+ 'query': ['query','check-element'],
23
+ 'node':[
24
+ 'dataset','find','text-node',
25
+ 'style','class-list','node','single-node','root'
26
+ ],
27
+ 'parse':['parse-atts','void-tags','parser','cache'],
28
+ }
29
+ const fileList = []
30
+ function buildFileList() {
31
+ Object.entries(files).forEach(([dir,filenames]) => {
32
+ filenames.forEach(filename => {
33
+ fileList.push(join(__dirname,dir,filename+'.js'))
34
+ });
35
+ });
36
+ }
37
+
38
+ buildFileList()
39
+
40
+ function build() {
41
+ let content = fileList.map(filePath => readFileSync(filePath, 'utf-8')).join('\n');
42
+
43
+ const toReturn = '{ parseHTML, Node, Query, TextNode, SingleNode, buildFromCache, cacheDoc, Root }'
44
+ content = optimizeCode(content)
45
+ writeFileSync(join(root, 'document.js'), `const alsDocument = (function(){\n${content}\nreturn ${toReturn}\n})()`)
46
+ writeFileSync(join(root, 'index.js'), content + '\n' + `module.exports = ${toReturn}`)
47
+ writeFileSync(join(root, 'index.mjs'), content + '\n' + `export ${toReturn}\nexport default ${toReturn}`) // named exports (as imported in the readme) plus the default export
48
+ console.log('Files are built')
49
+ }
50
+
51
+ build()
52
+ if (process.argv[2] === '--watch') {
53
+ let count = 1;
54
+ console.log('Watching...')
55
+ let lastChangeTime = Date.now()
56
+ Object.keys(files).forEach(dirName => {
57
+ const dirPath = join(__dirname,dirName)
58
+ watch(dirPath, (eventType, filename) => {
59
+ let newChangeTime = Date.now()
60
+ if(newChangeTime - lastChangeTime < 1000) return
61
+ build()
62
+ lastChangeTime = newChangeTime
63
+ console.log(`${filename} has changed (${count++})`)
64
+ });
65
+ })
66
+ }
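
A note on the comment-stripping step in `optimizeCode`: the regex only removes `//` when it starts a line or follows whitespace. The sketch below applies that same regex in isolation (the `stripComments` wrapper and the sample input are made up for this note) to show what that rule protects and what it does not.

```js
// Same regex as the "remove comments" step in optimizeCode, wrapped for a standalone demo.
const stripComments = src =>
  src.replace(/^(?<!\\)\/\/.*$|(?<=\s)(?<!\\)\/\/.*$/gm, '');

const sample = [
  '// leading comment',
  'const url = "http://example.com"; // trailing comment',
  'const s = "a // b";',
].join('\n');

const stripped = stripComments(sample).split('\n');
console.log(stripped);
// [ '', 'const url = "http://example.com"; ', 'const s = "a ' ]
```

The URL survives because its `//` follows a colon rather than whitespace, but a string literal containing ` //` is truncated, so source files fed to this build step need to avoid that pattern.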
@@ -0,0 +1,25 @@
1
+ class NodeClassList {
2
+ constructor(node) { this.node = node }
3
+ get classes() { return (this.node.attributes.class || "").split(" ").filter(Boolean) }
4
+ set classes(val) { this.node.attributes.class = val.join(" ") }
5
+ contains(className) { return this.classes.includes(className) }
6
+
7
+ add(className) {
8
+ const currentClasses = this.classes;
9
+ if (!currentClasses.includes(className)) this.classes = [...currentClasses, className];
10
+ }
11
+
12
+ remove(className) { this.classes = this.classes.filter(cls => cls !== className); }
13
+
14
+ toggle(className) {
15
+ if (this.classes.includes(className)) this.remove(className);
16
+ else this.add(className);
17
+ }
18
+
19
+ replace(oldClass, newClass) {
20
+ if (this.classes.includes(oldClass)) {
21
+ this.remove(oldClass);
22
+ this.add(newClass);
23
+ }
24
+ }
25
+ }
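
For reference, a short usage sketch of the class above. The class body is repeated in condensed form so the snippet runs on its own, and the `node` object is a stand-in carrying only an `attributes` field, which is all `NodeClassList` reads or writes.

```js
// Condensed copy of NodeClassList from the hunk above, so this snippet is self-contained.
class NodeClassList {
  constructor(node) { this.node = node }
  get classes() { return (this.node.attributes.class || "").split(" ").filter(Boolean) }
  set classes(val) { this.node.attributes.class = val.join(" ") }
  contains(className) { return this.classes.includes(className) }
  add(className) {
    const currentClasses = this.classes;
    if (!currentClasses.includes(className)) this.classes = [...currentClasses, className];
  }
  remove(className) { this.classes = this.classes.filter(cls => cls !== className); }
  toggle(className) { this.contains(className) ? this.remove(className) : this.add(className); }
  replace(oldClass, newClass) {
    if (this.contains(oldClass)) { this.remove(oldClass); this.add(newClass); }
  }
}

// Stand-in node: only the attributes object matters here.
const node = { attributes: { class: 'a  b' } }; // the double space is normalized away
const classList = new NodeClassList(node);
classList.add('c');          // class is now "a b c"
classList.toggle('a');       // class is now "b c"
classList.replace('b', 'd'); // class is now "c d"
console.log(node.attributes.class); // "c d"
```

Because every operation re-reads and re-writes the `class` attribute string, the attribute stays the single source of truth and there is no separate state to keep in sync.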
@@ -0,0 +1,15 @@
1
+ const getDataName = prop => 'data-' + prop.toLowerCase()
2
+ function getDataset(element) {
3
+ return new Proxy(element.attributes, {
4
+ get: (target, prop) => {return target[getDataName(prop)]},
5
+ set: (target, prop, value) => {target[getDataName(prop)] = value; return true},
6
+ deleteProperty: (target, prop) => { // Remove the data-* attribute
7
+ const dataAttr = getDataName(prop)
8
+ if (dataAttr in target) {
9
+ delete target[dataAttr];
10
+ return true; // signals successful deletion
11
+ }
12
+ return false;
13
+ }
14
+ });
15
+ }
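
A small usage sketch of the proxy above, with the two functions repeated so the snippet runs standalone and a plain object standing in for an element. One thing it makes visible: `getDataName` lowercases the property name, so `userId` maps to `data-userid` rather than the `data-user-id` a browser's `dataset` would use.

```js
// Copies of the helpers from the hunk above, so this snippet is self-contained.
const getDataName = prop => 'data-' + prop.toLowerCase();
function getDataset(element) {
  return new Proxy(element.attributes, {
    get: (target, prop) => target[getDataName(prop)],
    set: (target, prop, value) => { target[getDataName(prop)] = value; return true; },
    deleteProperty: (target, prop) => {
      const dataAttr = getDataName(prop);
      if (dataAttr in target) { delete target[dataAttr]; return true; }
      return false;
    },
  });
}

// Stand-in element: only the attributes object is needed.
const element = { attributes: { id: 'main' } };
const dataset = getDataset(element);

dataset.userId = '42';
console.log(Object.keys(element.attributes)); // [ 'id', 'data-userid' ]
console.log(dataset.userId);                  // 42

const deleted = delete dataset.userId;        // removes data-userid again
console.log(deleted, 'data-userid' in element.attributes); // true false
```

Since the proxy wraps the attributes object itself, reads and writes through `dataset` are immediately reflected in the element's attributes without any copying.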
@@ -0,0 +1,12 @@
1
+ function find(selectors,element,collection,first=false,firstTime = true) {
2
+ for(let selector of selectors) {
3
+ if(checkElement(element,selector)) collection.add(element)
4
+ }
5
+ if(element.children)
6
+ element.children.forEach(child => {
7
+ if(first && collection.size > 0) return
8
+ find(selectors,child,collection,first,false)
9
+ })
10
+
11
+ return firstTime ? [...collection] : collection
12
+ }
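
To show how the `first` flag and the shared `collection` interact, here is a runnable sketch of the same traversal. `checkElement` is defined elsewhere in the package and is not shown in this diff, so a trivial tag-name matcher is substituted as an assumption; the sample tree is likewise made up.

```js
// Stand-in for the library's checkElement: matches on tag name only (assumption).
const checkElement = (element, selector) => element.tagName === selector.tag;

// Same traversal as the hunk above: depth-first, results collected in a Set.
function find(selectors, element, collection, first = false, firstTime = true) {
  for (const selector of selectors) {
    if (checkElement(element, selector)) collection.add(element);
  }
  if (element.children) {
    element.children.forEach(child => {
      if (first && collection.size > 0) return; // stop descending once a match exists
      find(selectors, child, collection, first, false);
    });
  }
  return firstTime ? [...collection] : collection;
}

const domTree = {
  tagName: 'div', id: 'root', children: [
    { tagName: 'p', id: 'p1', children: [] },
    { tagName: 'span', id: 's1', children: [{ tagName: 'p', id: 'p2', children: [] }] },
    { tagName: 'p', id: 'p3', children: [] },
  ],
};

const allMatches = find([{ tag: 'p' }], domTree, new Set());
const firstMatch = find([{ tag: 'p' }], domTree, new Set(), true);
console.log(allMatches.map(n => n.id)); // [ 'p1', 'p2', 'p3' ]
console.log(firstMatch.map(n => n.id)); // [ 'p1' ]
```

Using a `Set` means an element that matches several selectors in a group is still returned once, and the depth-first order keeps results in document order.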