generative-bayesian-network 0.1.0-beta.3 → 2.0.0-dev.0

@@ -178,7 +178,7 @@
  APPENDIX: How to apply the Apache License to your work.

  To apply the Apache License to your work, attach the following
- boilerplate notice, with the fields enclosed by brackets "[]"
+ boilerplate notice, with the fields enclosed by brackets "{}"
  replaced with your own identifying information. (Don't include
  the brackets!) The text should be enclosed in the appropriate
  comment syntax for the file format. We also recommend that a
@@ -186,7 +186,7 @@
  same "printed page" as the copyright notice for easier
  identification within third-party archives.

- Copyright [yyyy] [name of copyright owner]
+ Copyright 2018 Apify Technologies s.r.o.

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
package/README.md CHANGED
@@ -1,155 +1,3 @@
- # Generative bayesian network
- NodeJs package containing a bayesian network capable of randomly sampling from a distribution defined by a json object.
-
- <!-- toc -->
-
- - [Installation](#installation)
- - [Usage](#usage)
- - [API Reference](#api-reference)
-
- <!-- tocstop -->
-
- ## Installation
- Run the `npm i generative-bayesian-network` command. No further setup is needed afterwards.
- ## Usage
- To use the network, you need to create an instance of the `BayesianNetwork` class which is exported from this package. Constructor of this class accepts a JSON object containing the network definition. This definition can either include the probability distributions for the nodes, or these can be calculated later using data. An example of such a definition saved in a JSON file could look like:
- ```json
- {
-     "nodes": [
-         {
-             "name": "ParentNode",
-             "values": ["A", "B", "C"],
-             "parentNames": [],
-             "conditionalProbabilities": {
-                 "A": 0.1,
-                 "B": 0.8,
-                 "C": 0.1
-             }
-         },
-         {
-             "name": "ChildNode",
-             "values": [".", ",", "!", "?"],
-             "parentNames": ["ParentNode"],
-             "conditionalProbabilities": {
-                 "A": {
-                     ".": 0.7,
-                     "!": 0.3
-                 },
-                 "B": {
-                     ",": 0.3,
-                     "?": 0.7
-                 },
-                 "C": {
-                     ".": 0.5,
-                     "?": 0.5
-                 }
-             }
-         }
-     ]
- }
- ```
- Once you have the network definition ready, you can create an instance simply by executing:
- ```js
- let generatorNetwork = new BayesianNetwork(networkDefinition);
- ```
- If the network definition didn't include the probabilities, you also need to call the `setProbabilitiesAccordingToData` method and provide it with a [Danfo.js dataframe](https://danfo.jsdata.org/api-reference/dataframe) containing the dataset you want to be used to calculate the probabilities:
- ```js
- generatorNetwork.setProbabilitiesAccordingToData(dataframe);
- ```
- After the setup, you can save the current network's definition by doing:
- ```js
- generatorNetwork.saveNetworkDefinition(networkDefinitionFilePath);
- ```
- Once you have the network all set up, you can use two methods to actually generate the samples - `generateSample` and `generateConsistentSampleWhenPossible`. The first one generates a sample of all node values given (optionally) the values we already know in the form of an object. The second does much the same thing, but instead of just getting the known values of some of the attributes, the object you can give it as an argument can contain multiple possible values for each node, not just one. You could run them for example like this:
- ```js
- let sample = generatorNetwork.generateSample({ "ParentNode": "A" });
- let consistentSample = generatorNetwork.generateSample({
-     "ParentNode": ["A","B"], "ChildNode": [",","!"]
- });
- ```
-
- ## API Reference
- All public classes, methods and their parameters can be inspected in this API reference.
-
- <a name="BayesianNetwork"></a>
-
- ### BayesianNetwork
- BayesianNetwork is an implementation of a bayesian network capable of randomly sampling from the distribution
- represented by the network.
-
-
- * [BayesianNetwork](#BayesianNetwork)
-     * [`new BayesianNetwork(networkDefinition)`](#new_BayesianNetwork_new)
-     * [`.generateSample(inputValues)`](#BayesianNetwork+generateSample)
-     * [`.generateConsistentSampleWhenPossible(valuePossibilities)`](#BayesianNetwork+generateConsistentSampleWhenPossible)
-     * [`.setProbabilitiesAccordingToData(dataframe)`](#BayesianNetwork+setProbabilitiesAccordingToData)
-     * [`.saveNetworkDefinition(networkDefinitionFilePath)`](#BayesianNetwork+saveNetworkDefinition)
-
-
- * * *
-
- <a name="new_BayesianNetwork_new"></a>
-
- #### `new BayesianNetwork(networkDefinition)`
-
- | Param | Type | Description |
- | --- | --- | --- |
- | networkDefinition | <code>object</code> | object defining the network structure and distributions |
-
-
- * * *
-
- <a name="BayesianNetwork+generateSample"></a>
-
- #### `bayesianNetwork.generateSample(inputValues)`
- Randomly samples from the distribution represented by the bayesian network.
-
-
- | Param | Type | Description |
- | --- | --- | --- |
- | inputValues | <code>object</code> | node values that are known already |
-
-
- * * *
-
- <a name="BayesianNetwork+generateConsistentSampleWhenPossible"></a>
-
- #### `bayesianNetwork.generateConsistentSampleWhenPossible(valuePossibilities)`
- Randomly samples from the distribution represented by the bayesian network,
- making sure the sample is consistent with the provided restrictions on value possibilities.
- Returns false if no such sample can be generated.
-
-
- | Param | Type | Description |
- | --- | --- | --- |
- | valuePossibilities | <code>object</code> | a dictionary of lists of possible values for nodes (if a node isn't present in the dictionary, all values are possible) |
-
-
- * * *
-
- <a name="BayesianNetwork+setProbabilitiesAccordingToData"></a>
-
- #### `bayesianNetwork.setProbabilitiesAccordingToData(dataframe)`
- Sets the conditional probability distributions of this network's nodes to match the given data.
-
-
- | Param | Type | Description |
- | --- | --- | --- |
- | dataframe | <code>object</code> | a Danfo.js dataframe containing the data |
-
-
- * * *
-
- <a name="BayesianNetwork+saveNetworkDefinition"></a>
-
- #### `bayesianNetwork.saveNetworkDefinition(networkDefinitionFilePath)`
- Saves the network definition to the specified file path to be used later.
-
-
- | Param | Type | Description |
- | --- | --- | --- |
- | networkDefinitionFilePath | <code>string</code> | a file path where the network definition should be saved |
-
-
- * * *
+ # Fingerprint suite

+ This repository contains a set of fingerprinting tools developed by Apify.
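
The removed README described a network definition in which each node carries a table of conditional probabilities keyed by its parent's value. As a rough illustration of how such a definition can be sampled node by node, here is a minimal, self-contained sketch. It is not the package's implementation: `pickWeighted` and `sampleNetwork` are hypothetical helpers, the injectable `rng` parameter is an assumption added so the result is reproducible, and nesting the table one level per parent is an assumption based on the single-parent example in the README.

```javascript
// Hypothetical helper: pick one key of `probabilities`, weighted by its value.
function pickWeighted(probabilities, rng = Math.random) {
    const entries = Object.entries(probabilities);
    const total = entries.reduce((sum, [, p]) => sum + p, 0);
    const anchor = rng() * total;
    let cumulative = 0;
    for (const [value, p] of entries) {
        cumulative += p;
        if (cumulative > anchor) return value;
    }
    // Guard against floating-point round-off leaving the loop without a pick.
    return entries[entries.length - 1][0];
}

// Hypothetical helper: visit nodes in definition order (parents first) and
// sample every node whose value is not already known.
function sampleNetwork(definition, known = {}, rng = Math.random) {
    const sample = { ...known }; // copy, so the caller's object is left alone
    for (const node of definition.nodes) {
        if (node.name in sample) continue;
        let table = node.conditionalProbabilities;
        // Assumption: the table is nested one level per parent, in parentNames order.
        for (const parentName of node.parentNames) {
            table = table[sample[parentName]];
        }
        sample[node.name] = pickWeighted(table, rng);
    }
    return sample;
}

// The example definition from the removed README.
const definition = {
    nodes: [
        {
            name: 'ParentNode',
            values: ['A', 'B', 'C'],
            parentNames: [],
            conditionalProbabilities: { A: 0.1, B: 0.8, C: 0.1 },
        },
        {
            name: 'ChildNode',
            values: ['.', ',', '!', '?'],
            parentNames: ['ParentNode'],
            conditionalProbabilities: {
                A: { '.': 0.7, '!': 0.3 },
                B: { ',': 0.3, '?': 0.7 },
                C: { '.': 0.5, '?': 0.5 },
            },
        },
    ],
};

// A fixed rng makes the outcome deterministic for demonstration purposes.
const conditioned = sampleNetwork(definition, { ParentNode: 'A' }, () => 0.9);
const unconditioned = sampleNetwork(definition, {}, () => 0.5);
console.log(conditioned, unconditioned);
```

In the removed README's last snippet, `consistentSample` is produced by calling `generateSample` with lists of values, although the surrounding prose describes `generateConsistentSampleWhenPossible` for that case; the latter is probably what was intended.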
package/index.js ADDED
@@ -0,0 +1,6 @@
+ "use strict";
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.BayesianNetwork = void 0;
+ var bayesian_network_1 = require("./bayesian-network");
+ Object.defineProperty(exports, "BayesianNetwork", { enumerable: true, get: function () { return bayesian_network_1.BayesianNetwork; } });
+ //# sourceMappingURL=index.js.map
package/package.json CHANGED
@@ -1,41 +1,40 @@
  {
-   "author": {
-     "name": "Apify"
-   },
-   "files": [
-     "src"
-   ],
-   "bugs": {
-     "url": "https://github.com/apify/generative-bayesian-network/issues"
-   },
-   "dependencies": {
-     "ow": "^0.23.0"
-   },
-   "description": "An implementation of a bayesian network usable for .",
-   "devDependencies": {
-     "@apify/eslint-config": "^0.1.3",
-     "eslint": "^7.19.0",
-     "fs-extra": "^9.1.0",
-     "jest": "^26.6.3",
-     "jsdoc-to-markdown": "^7.0.0",
-     "markdown-toc": "^1.2.0",
-     "jest-os-detection": "^1.3.1",
-     "danfojs-node": "^0.2.2"
-   },
-   "homepage": "https://github.com/apify/generative-bayesian-network#readme",
-   "license": "Apache-2.0",
-   "main": "src/main.js",
-   "name": "generative-bayesian-network",
-   "repository": {
-     "type": "git",
-     "url": "git+https://github.com/apify/generative-bayesian-network.git"
-   },
-   "scripts": {
-     "build-docs": "npm run build-toc && node docs/build-docs.js",
-     "build-toc": "markdown-toc docs/README.md -i",
-     "lint": "eslint ./src --ext .js,.jsx",
-     "lint:fix": "eslint ./src --ext .js,.jsx --fix",
-     "test": "jest --maxWorkers=3 --forceExit"
-   },
-   "version": "0.1.0-beta.3"
+   "name": "generative-bayesian-network",
+   "version": "2.0.0-dev.0",
+   "author": {
+     "name": "Apify"
+   },
+   "files": [
+     "dist"
+   ],
+   "main": "index.js",
+   "module": "index.mjs",
+   "types": "index.d.ts",
+   "exports": {
+     ".": {
+       "import": "./index.mjs",
+       "require": "./index.js"
+     }
+   },
+   "bugs": {
+     "url": "https://github.com/apify/generative-bayesian-network/issues"
+   },
+   "dependencies": {
+     "adm-zip": "^0.5.9",
+     "danfojs-node": "^1.1.1"
+   },
+   "description": "An implementation of a bayesian network usable for .",
+   "homepage": "https://github.com/apify/generative-bayesian-network#readme",
+   "license": "Apache-2.0",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/apify/generative-bayesian-network.git"
+   },
+   "scripts": {
+     "build": "npm run clean && npm run compile",
+     "clean": "rimraf ./dist",
+     "compile": "tsc -p tsconfig.build.json && gen-esm-wrapper ./index.js ./index.mjs",
+     "copy": "ts-node -T ../../scripts/copy.ts"
+   },
+   "gitHead": "2dd3d8c9ced1a14cb74ef16ebdd50b349bb5971b"
  }
package/src/bayesian-network.js DELETED
@@ -1,112 +0,0 @@
- const fs = require('fs');
-
- const { BayesianNode } = require('./bayesian-node');
-
- /**
-  * BayesianNetwork is an implementation of a bayesian network capable of randomly sampling from the distribution
-  * represented by the network.
-  */
- class BayesianNetwork {
-     /**
-      * @param {object} networkDefinition - object defining the network structure and distributions
-      */
-     constructor(networkDefinition) {
-         this.nodesInSamplingOrder = networkDefinition.nodes.map((nodeDefinition) => new BayesianNode(nodeDefinition));
-         this.nodesByName = {};
-         for (const node of this.nodesInSamplingOrder) {
-             this.nodesByName[node.name] = node;
-         }
-     }
-
-     /**
-      * Randomly samples from the distribution represented by the bayesian network.
-      * @param {object} inputValues - node values that are known already
-      */
-     generateSample(inputValues = {}) {
-         const sample = inputValues;
-         for (const node of this.nodesInSamplingOrder) {
-             if (!(node.name in sample)) {
-                 sample[node.name] = node.sample(sample);
-             }
-         }
-
-         return sample;
-     }
-
-     /**
-      * Randomly samples from the distribution represented by the bayesian network,
-      * making sure the sample is consistent with the provided restrictions on value possibilities.
-      * Returns false if no such sample can be generated.
-      * @param {object} valuePossibilities - a dictionary of lists of possible values for nodes
-      * (if a node isn't present in the dictionary, all values are possible)
-      */
-     generateConsistentSampleWhenPossible(valuePossibilities) {
-         return this._recursivelyGenerateConsistentSampleWhenPossible({}, valuePossibilities, 0);
-     }
-
-     /**
-      * Recursively generates a random sample consistent with the given restrictions on possible values.
-      * @param {object} sampleSoFar - node values that are known already
-      * @param {object} valuePossibilities - a dictionary of lists of possible values for nodes
-      * (if a node isn't present in the dictionary, all values are possible)
-      * @param {number} depth - in what depth of the recursion this function call is,
-      * specifies what node this function call is sampling
-      * @private
-      */
-     _recursivelyGenerateConsistentSampleWhenPossible(sampleSoFar, valuePossibilities, depth) {
-         const bannedValues = [];
-         const node = this.nodesInSamplingOrder[depth];
-
-         let sampleValue;
-         do {
-             sampleValue = node.sampleAccordingToRestrictions(sampleSoFar, valuePossibilities[node.name], bannedValues);
-
-             if (!sampleValue) break;
-
-             sampleSoFar[node.name] = sampleValue;
-
-             if (depth + 1 < this.nodesInSamplingOrder.length) {
-                 const sample = this._recursivelyGenerateConsistentSampleWhenPossible(sampleSoFar, valuePossibilities, depth + 1);
-                 if (sample) {
-                     return sample;
-                 }
-             } else {
-                 return sampleSoFar;
-             }
-
-             bannedValues.push(sampleValue);
-         } while (sampleValue);
-
-         return false;
-     }
-
-     /**
-      * Sets the conditional probability distributions of this network's nodes to match the given data.
-      * @param {object} dataframe - a Danfo.js dataframe containing the data
-      */
-     setProbabilitiesAccordingToData(dataframe) {
-         this.nodesInSamplingOrder.forEach((node) => {
-             const possibleParentValues = {};
-             for (const parentName of node.parentNames) {
-                 possibleParentValues[parentName] = this.nodesByName[parentName].possibleValues;
-             }
-             node.setProbabilitiesAccordingToData(dataframe, possibleParentValues);
-         });
-     }
-
-     /**
-      * Saves the network definition to the specified file path to be used later.
-      * @param {string} networkDefinitionFilePath - a file path where the network definition should be saved
-      */
-     saveNetworkDefinition(networkDefinitionFilePath) {
-         const network = {
-             nodes: this.nodesInSamplingOrder.map((node) => node.nodeDefinition),
-         };
-
-         fs.writeFileSync(networkDefinitionFilePath, JSON.stringify(network));
-     }
- }
-
- module.exports = {
-     BayesianNetwork,
- };
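
The removed `_recursivelyGenerateConsistentSampleWhenPossible` is a depth-first search with backtracking: it picks a value for the current node, recurses into the next node, and bans the value if the rest of the network cannot be completed. The sketch below illustrates only that control flow under simplifying assumptions. It tries candidates in list order instead of sampling them at random, the `nodes` table and `findConsistentSample` are hypothetical stand-ins for `BayesianNode` and the removed method, and the partial sample is copied on each step where the removed code mutates `sampleSoFar` in place.

```javascript
// Hypothetical stand-in network: for each parent value, the values a node can take.
// The empty-string key is used for the root node, which has no parent.
const nodes = [
    { name: 'ParentNode', parent: null, allowed: { '': ['A', 'B', 'C'] } },
    {
        name: 'ChildNode',
        parent: 'ParentNode',
        allowed: { A: ['.', '!'], B: [',', '?'], C: ['.', '?'] },
    },
];

// Returns a complete sample that satisfies `restrictions`, or false if none exists.
function findConsistentSample(restrictions, sampleSoFar = {}, depth = 0) {
    if (depth === nodes.length) return sampleSoFar;

    const node = nodes[depth];
    const parentValue = node.parent === null ? '' : sampleSoFar[node.parent];
    const inDistribution = node.allowed[parentValue];

    // A node without an entry in `restrictions` may take any value in its distribution.
    const candidates = (restrictions[node.name] || inDistribution)
        .filter((value) => inDistribution.includes(value));

    for (const value of candidates) {
        const extended = { ...sampleSoFar, [node.name]: value };
        const result = findConsistentSample(restrictions, extended, depth + 1);
        if (result) return result;
        // Otherwise this value leads to a dead end; fall through and try the next one.
    }
    return false;
}

const direct = findConsistentSample({ ParentNode: ['A', 'B'], ChildNode: [',', '!'] });
const backtracked = findConsistentSample({ ParentNode: ['A', 'B'], ChildNode: ['?'] });
const impossible = findConsistentSample({ ParentNode: ['A'], ChildNode: ['?'] });
console.log(direct, backtracked, impossible);
```

In the second call, `ParentNode: 'A'` is tried first, no permitted `ChildNode` value exists for it, and the search backs up to `'B'`. Separately, the removed `generateSample` assigns `inputValues` directly to `sample`, so the object passed by the caller receives the sampled values as well; callers that need the input unchanged would have to pass a copy.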
package/src/bayesian-node.js DELETED
@@ -1,182 +0,0 @@
- /**
-  * Calculates relative frequencies of values of specific attribute from the given data
-  * @param {object} dataframe - a Danfo.js dataframe containing the data
-  * @param {string} attributeName - name of the attribute
-  * @private
-  */
- function getRelativeFrequencies(dataframe, attributeName) {
-     const frequencies = {};
-     const totalCount = dataframe.shape[0];
-     const valueCounts = dataframe[attributeName].value_counts();
-
-     for (let index = 0; index < valueCounts.index.length; index++) {
-         frequencies[valueCounts.index[index]] = valueCounts.values[index] / totalCount;
-     }
-
-     return frequencies;
- }
-
- /**
-  * BayesianNode is an implementation of a single node in a bayesian network allowing
-  * sampling from its conditional distribution.
-  */
- class BayesianNode {
-     /**
-      * @param {object} nodeDefinition - node structure and distributions definition
-      * taken from the network definition file
-      */
-     constructor(nodeDefinition) {
-         this.nodeDefinition = nodeDefinition;
-     }
-
-     /**
-      * Extracts unconditional probabilities of node values given the values of the parent nodes
-      * @param {object} parentValues - values of the parent nodes
-      * @private
-      */
-     _getProbabilitiesGivenKnownValues(parentValues = {}) {
-         let probabilities = this.nodeDefinition.conditionalProbabilities;
-
-         for (const parentName of this.parentNames) {
-             const parentValue = parentValues[parentName];
-             if (parentValue in probabilities.deeper) {
-                 probabilities = probabilities.deeper[parentValue];
-             } else {
-                 probabilities = probabilities.skip;
-             }
-         }
-         return probabilities;
-     }
-
-     /**
-      * Randomly samples from the given values using the given probabilities
-      * @param {array} possibleValues - a list of values to sample from
-      * @param {number} totalProbabilityOfPossibleValues - a sum of probabilities of possibleValues
-      * in the conditional distribution
-      * @param {object} probabilities - a dictionary of probabilities from the conditional distribution
-      * indexed by the values
-      * @private
-      */
-     _sampleRandomValueFromPossibilities(possibleValues, totalProbabilityOfPossibleValues, probabilities) {
-         let chosenValue = possibleValues[0];
-         const anchor = Math.random() * totalProbabilityOfPossibleValues;
-         let cumulativeProbability = 0;
-         for (const possibleValue of possibleValues) {
-             cumulativeProbability += probabilities[possibleValue];
-             if (cumulativeProbability > anchor) {
-                 chosenValue = possibleValue;
-                 break;
-             }
-         }
-
-         return chosenValue;
-     }
-
-     /**
-      * Randomly samples from the conditional distribution of this node given values of parents
-      * @param {object} parentValues - values of the parent nodes
-      */
-     sample(parentValues = {}) {
-         const probabilities = this._getProbabilitiesGivenKnownValues(parentValues);
-         const possibleValues = Object.keys(probabilities);
-
-         return this._sampleRandomValueFromPossibilities(possibleValues, 1.0, probabilities);
-     }
-
-     /**
-      * Randomly samples from the conditional distribution of this node given restrictions on the possible
-      * values and the values of the parents.
-      * @param {object} parentValues - values of the parent nodes
-      * @param {object} valuePossibilities - a list of possible values for this node
-      * @param {object} bannedValues - what values of this node are banned
-      */
-     sampleAccordingToRestrictions(parentValues, valuePossibilities, bannedValues) {
-         const probabilities = this._getProbabilitiesGivenKnownValues(parentValues);
-         let totalProbability = 0.0;
-         const validValues = [];
-         const valuesInDistribution = Object.keys(probabilities);
-         const possibleValues = valuePossibilities || valuesInDistribution;
-         for (const value of possibleValues) {
-             if (!bannedValues.includes(value) && valuesInDistribution.includes(value)) {
-                 validValues.push(value);
-                 totalProbability += probabilities[value];
-             }
-         }
-
-         if (validValues.length === 0) return false;
-         return this._sampleRandomValueFromPossibilities(validValues, totalProbability, probabilities);
-     }
-
-     /**
-      * Sets the conditional probability distribution for this node to match the given data.
-      * @param {object} dataframe - a Danfo.js dataframe containing the data
-      * @param {object} possibleParentValues - a dictionary of lists of possible values for parent nodes
-      */
-     setProbabilitiesAccordingToData(dataframe, possibleParentValues = {}) {
-         this.nodeDefinition.possibleValues = dataframe[this.name].unique().values;
-         this.nodeDefinition.conditionalProbabilities = this._recursivelyCalculateConditionalProbabilitiesAccordingToData(
-             dataframe,
-             possibleParentValues,
-             0,
-         );
-     }
-
-     /**
-      * Recursively calculates the conditional probability distribution for this node from the data.
-      * @param {object} dataframe - a Danfo.js dataframe containing the data
-      * @param {object} possibleParentValues - a dictionary of lists of possible values for parent nodes
-      * @param {number} depth - depth of the recursive call
-      * @private
-      */
-     _recursivelyCalculateConditionalProbabilitiesAccordingToData(dataframe, possibleParentValues, depth) {
-         let probabilities = {
-             deeper: {},
-         };
-
-         if (depth < this.parentNames.length) {
-             const currentParentName = this.parentNames[depth];
-             for (const possibleValue of possibleParentValues[currentParentName]) {
-                 const skip = !dataframe[currentParentName].unique().values.includes(possibleValue);
-                 let filteredDataframe = dataframe;
-                 if (!skip) {
-                     filteredDataframe = dataframe.query({
-                         column: currentParentName,
-                         is: '==',
-                         to: possibleValue,
-                     });
-                 }
-                 const nextLevel = this._recursivelyCalculateConditionalProbabilitiesAccordingToData(
-                     filteredDataframe,
-                     possibleParentValues,
-                     depth + 1,
-                 );
-
-                 if (!skip) {
-                     probabilities.deeper[possibleValue] = nextLevel;
-                 } else {
-                     probabilities.skip = nextLevel;
-                 }
-             }
-         } else {
-             probabilities = getRelativeFrequencies(dataframe, this.name);
-         }
-
-         return probabilities;
-     }
-
-     get name() {
-         return this.nodeDefinition.name;
-     }
-
-     get parentNames() {
-         return this.nodeDefinition.parentNames;
-     }
-
-     get possibleValues() {
-         return this.nodeDefinition.possibleValues;
-     }
- }
-
- module.exports = {
-     BayesianNode,
- };
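
The removed `_getProbabilitiesGivenKnownValues` walks a nested table in which every level has a `deeper` object keyed by parent value and, optionally, a `skip` branch used when the parent's value was not seen in the training data. This shape appears to differ from the flat `conditionalProbabilities` shown in the removed README, which has no `deeper` key. The sketch below extracts that lookup into a standalone function so the two branches can be seen in isolation; `lookupProbabilities` is a hypothetical name and the table values are made up for illustration.

```javascript
// Standalone version of the lookup: descend one level per parent,
// falling back to the `skip` branch for parent values missing from `deeper`.
function lookupProbabilities(conditionalProbabilities, parentNames, parentValues = {}) {
    let probabilities = conditionalProbabilities;
    for (const parentName of parentNames) {
        const parentValue = parentValues[parentName];
        if (parentValue in probabilities.deeper) {
            probabilities = probabilities.deeper[parentValue];
        } else {
            probabilities = probabilities.skip;
        }
    }
    return probabilities;
}

// Illustrative table for a node with a single parent (values are invented).
const childTable = {
    deeper: {
        A: { '.': 0.7, '!': 0.3 },
        B: { ',': 0.3, '?': 0.7 },
    },
    skip: { '.': 0.5, '?': 0.5 },
};

// A parent value present in `deeper` selects its own distribution.
const seen = lookupProbabilities(childTable, ['ParentNode'], { ParentNode: 'A' });
// A parent value absent from `deeper` falls back to `skip`.
const unseen = lookupProbabilities(childTable, ['ParentNode'], { ParentNode: 'C' });
// A node without parents uses its table as-is, so a flat object works there.
const root = lookupProbabilities({ A: 0.1, B: 0.8, C: 0.1 }, []);
console.log(seen, unseen, root);
```

Note that `skip` is only assigned in the removed training code when at least one possible parent value is absent from the data, so a lookup can return `undefined` if a table has no `skip` branch and the parent value is unknown.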
package/src/main.js DELETED
@@ -1,3 +0,0 @@
- const BayesianNetwork = require('./bayesian-network');
-
- module.exports = BayesianNetwork;