graph2emb 0.9.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- graph2emb-0.9.0/.gitignore +8 -0
- graph2emb-0.9.0/LICENSE +21 -0
- graph2emb-0.9.0/PKG-INFO +104 -0
- graph2emb-0.9.0/README.md +83 -0
- graph2emb-0.9.0/README_ORG.md +107 -0
- graph2emb-0.9.0/graph2emb/__init__.py +11 -0
- graph2emb-0.9.0/graph2emb/doc2vec.py +483 -0
- graph2emb-0.9.0/graph2emb/edges.py +107 -0
- graph2emb-0.9.0/graph2emb/graph2vec.py +157 -0
- graph2emb-0.9.0/graph2emb/keyedvectors.py +345 -0
- graph2emb-0.9.0/graph2emb/node2vec.py +273 -0
- graph2emb-0.9.0/graph2emb/parallel.py +153 -0
- graph2emb-0.9.0/graph2emb/utils.py +125 -0
- graph2emb-0.9.0/graph2emb/word2vec.py +326 -0
- graph2emb-0.9.0/pyproject.toml +52 -0
- graph2emb-0.9.0/tests/__init__.py +2 -0
- graph2emb-0.9.0/tests/conftest.py +37 -0
- graph2emb-0.9.0/tests/test_doc2vec.py +100 -0
- graph2emb-0.9.0/tests/test_edges.py +158 -0
- graph2emb-0.9.0/tests/test_graph2vec.py +361 -0
- graph2emb-0.9.0/tests/test_keyedvectors.py +147 -0
- graph2emb-0.9.0/tests/test_node2vec.py +164 -0
- graph2emb-0.9.0/tests/test_parallel.py +99 -0
- graph2emb-0.9.0/tests/test_word2vec.py +91 -0
graph2emb-0.9.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2018, 2025 Elior Cohen, Sangkon Han
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
graph2emb-0.9.0/PKG-INFO
ADDED
@@ -0,0 +1,104 @@
+Metadata-Version: 2.4
+Name: graph2emb
+Version: 0.9.0
+Summary: Implementation of the graph2vec algorithm
+Project-URL: Homepage, https://github.com/sigmadream/graph2emb
+Project-URL: Repository, https://github.com/sigmadream/graph2emb
+Author-email: Sangkon Heo <sigmadream@gmail.com>
+License: MIT
+License-File: LICENSE
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.12
+Requires-Dist: joblib>=1.4.0
+Requires-Dist: networkx>=3.1.0
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: tqdm>=4.66.1
+Description-Content-Type: text/markdown
+
+# graph2emb
+
+> An implementation that combines Node2Vec and Graph2Vec functionality in a single project, based on [Elior Cohen's node2vec](https://github.com/eliorc/node2vec) and [graph2vec](https://github.com/benedekrozemberczki/graph2vec).
+
+## Features
+
+- Node2Vec: random-walk-based node embeddings
+- Edge embedding: edge embeddings via the Hadamard, average, L1, or L2 operator
+- Graph2Vec: graph embeddings based on WL (Weisfeiler–Lehman) hashing + Doc2Vec
+- Development and execution with uv
+
+> The Python module name is `graph2emb`.
+
+## Quick start (running the samples)
+
+```bash
+# Install dependencies and the project
+uv sync
+
+# Node2Vec (loading an edge list)
+uv run python sample/node2vec_from_edgelist.py
+
+# Node2Vec + edge embedding
+uv run python sample/node2vec_edge_embeddings.py
+
+# Graph2Vec
+uv run python sample/graph2vec_basic.py
+```
+
+See `sample/README.md` for a description of the samples.
+
+## Usage examples (code)
+
+### Node2Vec (node embeddings)
+
+```python
+import networkx as nx
+from graph2emb import Node2Vec
+
+g = nx.fast_gnp_random_graph(n=100, p=0.3, seed=42)
+node2vec = Node2Vec(g, dimensions=64, walk_length=30, num_walks=20, workers=1, seed=42)
+model = node2vec.fit(window=10, min_count=1, epochs=5)
+
+print(model.wv.most_similar("2", topn=5))  # node ids are looked up as strings
+```
+
+### Edge embedding
+
+```python
+from graph2emb.edges import HadamardEmbedder
+
+edges_embs = HadamardEmbedder(model.wv)
+print(edges_embs[("1", "2")])
+```
+
+### Graph2Vec (graph embeddings)
+
+```python
+import networkx as nx
+from graph2emb import Graph2Vec
+
+graphs = [
+    nx.fast_gnp_random_graph(n=12, p=0.3, seed=1),
+    nx.fast_gnp_random_graph(n=14, p=0.2, seed=2),
+]
+
+g2v = Graph2Vec(dimensions=32, workers=1, min_count=1, epochs=3, seed=42)
+g2v.fit(graphs)
+emb = g2v.get_embedding()  # shape: (len(graphs), dimensions)
+```
+
+## Development and testing
+
+```bash
+# Run tests
+uv run pytest
+
+# Run tests in parallel
+uv run pytest -n auto
+
+# Run tests with coverage
+uv run pytest --cov=graph2emb --cov-report=html
+```
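The feature list above describes Graph2Vec as WL (Weisfeiler–Lehman) hashing followed by Doc2Vec. The following is a minimal sketch of the WL relabeling step only, not the code in `graph2emb/graph2vec.py`: the function name `wl_words`, the plain-dict adjacency format, the use of node degree as the initial label, and SHA-256 as the hash are all assumptions made for illustration. Each graph is turned into a list of label "words", which a Doc2Vec-style model could then treat as one document.

```python
import hashlib


def wl_words(adjacency: dict[str, list[str]], iterations: int = 2) -> list[str]:
    """Collect the WL labels of every round; together they form one 'document'.

    `adjacency` maps each node to its neighbors (hypothetical input format).
    """
    # Round 0: with no node attributes available, use the degree as the label.
    labels = {node: str(len(neighbors)) for node, neighbors in adjacency.items()}
    words = list(labels.values())
    for _ in range(iterations):
        refreshed = {}
        for node, neighbors in adjacency.items():
            # New label = hash of the node's own label plus its sorted neighbor labels.
            signature = "_".join([labels[node], *sorted(labels[n] for n in neighbors)])
            refreshed[node] = hashlib.sha256(signature.encode()).hexdigest()
        labels = refreshed
        words.extend(labels.values())
    return words


# Two isomorphic 3-node paths with different node names (toy data).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
renamed = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}

doc = wl_words(path)
print(len(doc))                                   # 3 nodes x (1 + 2 rounds) labels
print(sorted(doc) == sorted(wl_words(renamed)))   # node names do not affect the labels
```

Because the labels depend only on structure, isomorphic graphs yield the same bag of words, which is what lets a document-embedding model place structurally similar graphs close together.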
graph2emb-0.9.0/README.md
ADDED
@@ -0,0 +1,83 @@
+# graph2emb
+
+> An implementation that combines Node2Vec and Graph2Vec functionality in a single project, based on [Elior Cohen's node2vec](https://github.com/eliorc/node2vec) and [graph2vec](https://github.com/benedekrozemberczki/graph2vec).
+
+## Features
+
+- Node2Vec: random-walk-based node embeddings
+- Edge embedding: edge embeddings via the Hadamard, average, L1, or L2 operator
+- Graph2Vec: graph embeddings based on WL (Weisfeiler–Lehman) hashing + Doc2Vec
+- Development and execution with uv
+
+> The Python module name is `graph2emb`.
+
+## Quick start (running the samples)
+
+```bash
+# Install dependencies and the project
+uv sync
+
+# Node2Vec (loading an edge list)
+uv run python sample/node2vec_from_edgelist.py
+
+# Node2Vec + edge embedding
+uv run python sample/node2vec_edge_embeddings.py
+
+# Graph2Vec
+uv run python sample/graph2vec_basic.py
+```
+
+See `sample/README.md` for a description of the samples.
+
+## Usage examples (code)
+
+### Node2Vec (node embeddings)
+
+```python
+import networkx as nx
+from graph2emb import Node2Vec
+
+g = nx.fast_gnp_random_graph(n=100, p=0.3, seed=42)
+node2vec = Node2Vec(g, dimensions=64, walk_length=30, num_walks=20, workers=1, seed=42)
+model = node2vec.fit(window=10, min_count=1, epochs=5)
+
+print(model.wv.most_similar("2", topn=5))  # node ids are looked up as strings
+```
+
+### Edge embedding
+
+```python
+from graph2emb.edges import HadamardEmbedder
+
+edges_embs = HadamardEmbedder(model.wv)
+print(edges_embs[("1", "2")])
+```
+
+### Graph2Vec (graph embeddings)
+
+```python
+import networkx as nx
+from graph2emb import Graph2Vec
+
+graphs = [
+    nx.fast_gnp_random_graph(n=12, p=0.3, seed=1),
+    nx.fast_gnp_random_graph(n=14, p=0.2, seed=2),
+]
+
+g2v = Graph2Vec(dimensions=32, workers=1, min_count=1, epochs=3, seed=42)
+g2v.fit(graphs)
+emb = g2v.get_embedding()  # shape: (len(graphs), dimensions)
+```
+
+## Development and testing
+
+```bash
+# Run tests
+uv run pytest
+
+# Run tests in parallel
+uv run pytest -n auto
+
+# Run tests with coverage
+uv run pytest --cov=graph2emb --cov-report=html
+```
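The README lists four edge-embedding operators (Hadamard, average, L1, L2). As a rough illustration of what each one computes from two node vectors, here is a small sketch using the standard binary operators from the node2vec paper. It is written with plain Python lists so it runs without dependencies; the function names and the toy vectors are made up for this example and are not taken from `graph2emb/edges.py`, which presumably operates on NumPy arrays.

```python
from collections.abc import Sequence

Vector = Sequence[float]


def average(u: Vector, v: Vector) -> list[float]:
    # Element-wise mean of the two node vectors.
    return [(a + b) / 2 for a, b in zip(u, v)]


def hadamard(u: Vector, v: Vector) -> list[float]:
    # Element-wise product.
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u: Vector, v: Vector) -> list[float]:
    # Element-wise absolute difference.
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u: Vector, v: Vector) -> list[float]:
    # Element-wise squared difference.
    return [(a - b) ** 2 for a, b in zip(u, v)]


# Toy node embeddings (invented values, keyed by string node ids).
node_vectors = {"1": [1.0, 2.0, -1.0], "2": [3.0, 0.5, 1.0]}
u, v = node_vectors["1"], node_vectors["2"]

for operator in (average, hadamard, weighted_l1, weighted_l2):
    print(operator.__name__, operator(u, v))
```

All four operators are symmetric in their arguments, so the embedding of the pair `("1", "2")` equals that of `("2", "1")`, and the result has the same dimensionality as the node vectors.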
graph2emb-0.9.0/README_ORG.md
ADDED
@@ -0,0 +1,107 @@
+# Node2Vec
+[](http://pepy.tech/project/node2vec)
+
+Python3 implementation of the node2vec algorithm by Aditya Grover, Jure Leskovec and Vid Kocijan.
+[node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.](https://snap.stanford.edu/node2vec/)
+
+# Maintenance
+
+### I no longer have time to maintain this; if someone wants to pick up the baton, let me know
+
+## Installation
+
+`pip install node2vec`
+
+## Usage
+```python
+import networkx as nx
+from graph2emb import Node2Vec
+
+# Create a graph
+graph = nx.fast_gnp_random_graph(n=100, p=0.5)
+
+# Precompute probabilities and generate walks - **ON WINDOWS ONLY WORKS WITH workers=1**
+node2vec = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=200, workers=4)  # Use temp_folder for big graphs
+
+# Embed nodes
+model = node2vec.fit(window=10, min_count=1, batch_words=4)  # Any keywords acceptable by gensim.Word2Vec can be passed, `dimensions` and `workers` are automatically passed (from the Node2Vec constructor)
+
+# Look for most similar nodes
+model.wv.most_similar('2')  # Output node names are always strings
+
+# Save embeddings for later use
+model.wv.save_word2vec_format(EMBEDDING_FILENAME)
+
+# Save model for later use
+model.save(EMBEDDING_MODEL_FILENAME)
+
+# Embed edges using Hadamard method
+from graph2emb.edges import HadamardEmbedder
+
+edges_embs = HadamardEmbedder(keyed_vectors=model.wv)
+
+# Look for embeddings on the fly - here we pass normal tuples
+edges_embs[('1', '2')]
+''' OUTPUT
+array([ 5.75068220e-03, -1.10937878e-02,  3.76693785e-01,  2.69105062e-02,
+       ... ... ....
+       ..................................................................],
+      dtype=float32)
+'''
+
+# Get all edges in a separate KeyedVectors instance - use with caution, this could be huge for big networks
+edges_kv = edges_embs.as_keyed_vectors()
+
+# Look for most similar edges - this time tuples must be sorted and passed as str
+edges_kv.most_similar(str(('1', '2')))
+
+# Save embeddings for later use
+edges_kv.save_word2vec_format(EDGES_EMBEDDING_FILENAME)
+
+```
+
+### Parameters
+
+#### `node2vec.Node2Vec`
+
+- `Node2Vec` constructor:
+    1. `graph`: The first positional argument has to be a networkx graph. Node names must be all integers or all strings. In the output model they will always be strings.
+    2. `dimensions`: Embedding dimensions (default: 128)
+    3. `walk_length`: Number of nodes in each walk (default: 80)
+    4. `num_walks`: Number of walks per node (default: 10)
+    5. `p`: Return hyperparameter (default: 1)
+    6. `q`: In-out parameter (default: 1)
+    7. `weight_key`: On weighted graphs, this is the key for the weight attribute (default: 'weight')
+    8. `workers`: Number of workers for parallel execution (default: 1)
+    9. `sampling_strategy`: Node-specific sampling strategies; supports setting node-specific 'q', 'p', 'num_walks' and 'walk_length'.
+        Use these keys exactly. If not set, the global values passed at object initialization are used.
+    10. `quiet`: Boolean controlling the verbosity. (default: False)
+    11. `temp_folder`: String path pointing to a folder in which to save a shared memory copy of the graph. Supply it when working on graphs that are too big to fit in memory during algorithm execution.
+    12. `seed`: Seed for the random number generator (default: None). Deterministic results can be obtained if seed is set and `workers=1`.
+
+- `Node2Vec.fit` method:
+    Accepts any keyword argument accepted by gensim.Word2Vec
+
+#### `node2vec.EdgeEmbedder`
+
+`EdgeEmbedder` is an abstract class from which all the concrete edge embedding classes inherit.
+The concrete classes are `AverageEmbedder`, `HadamardEmbedder`, `WeightedL1Embedder` and `WeightedL2Embedder`; their definitions can be found in Table 1 of the [paper](https://arxiv.org/pdf/1607.00653.pdf).
+Note that edge embeddings are defined for any pair of nodes, connected or not, and even for a node paired with itself.
+
+- Constructor:
+    1. `keyed_vectors`: A gensim.models.KeyedVectors instance containing the node embeddings
+    2. `quiet`: Boolean controlling the verbosity. (default: False)
+
+- `EdgeEmbedder.__getitem__(item)` method, better known as `EdgeEmbedder[item]`:
+    1. `item` - A tuple consisting of 2 nodes from the `keyed_vectors` passed in the constructor. Returns the embedding of the edge.
+
+- `EdgeEmbedder.as_keyed_vectors` method: Returns a `gensim.models.KeyedVectors` instance containing all possible node pairs, each as a *sorted* tuple converted to a string.
+  For example, for nodes ['1', '2', '3'] the keys are "('1', '1')", "('1', '2')", "('1', '3')", "('2', '2')", "('2', '3')" and "('3', '3')".
+
+## Caveats
+- Node names in the input graph must be all strings, or all ints
+- Parallel execution does not work on Windows (a known `joblib` issue). To run without parallelism on Windows, pass `workers=1` to the `Node2Vec` constructor
+
+## TODO
+- [x] Parallel implementation for walk generation
+- [ ] Parallel implementation for probability precomputation
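The `as_keyed_vectors` description above says that keys are sorted node pairs rendered as strings, including each node paired with itself. A short sketch of how such keys could be enumerated and looked up is shown here; `edge_keys` and `lookup_key` are hypothetical helper names for illustration and are not part of the package.

```python
from itertools import combinations_with_replacement


def edge_keys(nodes: list[str]) -> list[str]:
    """Enumerate every unordered node pair (self-pairs included) as a string key."""
    # Sorting first guarantees that each emitted tuple is itself in sorted order.
    return [str(pair) for pair in combinations_with_replacement(sorted(nodes), 2)]


def lookup_key(u: str, v: str) -> str:
    """Build the key for a pair regardless of the order the nodes are given in."""
    return str(tuple(sorted((u, v))))


print(edge_keys(["3", "1", "2"]))
# ["('1', '1')", "('1', '2')", "('1', '3')", "('2', '2')", "('2', '3')", "('3', '3')"]
print(lookup_key("2", "1"))
# ('1', '2')
```

For n nodes this produces n(n+1)/2 keys, which is why the usage comment warns that the resulting `KeyedVectors` instance can be huge for big networks.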