graph2emb 0.9.0__tar.gz

@@ -0,0 +1,8 @@
1
+ *.pyc
2
+ .idea
3
+ .vscode
4
+ dist/
5
+ *.egg-info/
6
+ .venv/
7
+ .pytest_cache/
8
+ __pycache__/
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018, 2025 Elior Cohen, Sangkon Han
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,104 @@
1
+ Metadata-Version: 2.4
2
+ Name: graph2emb
3
+ Version: 0.9.0
4
+ Summary: Implementation of the graph2vec algorithm
5
+ Project-URL: Homepage, https://github.com/sigmadream/graph2emb
6
+ Project-URL: Repository, https://github.com/sigmadream/graph2emb
7
+ Author-email: Sangkon Heo <sigmadream@gmail.com>
8
+ License: MIT
9
+ License-File: LICENSE
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Operating System :: OS Independent
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.12
14
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
15
+ Requires-Python: >=3.12
16
+ Requires-Dist: joblib>=1.4.0
17
+ Requires-Dist: networkx>=3.1.0
18
+ Requires-Dist: numpy>=1.24.0
19
+ Requires-Dist: tqdm>=4.66.1
20
+ Description-Content-Type: text/markdown
21
+
22
+ # graph2emb
23
+
24
+ > An implementation that combines Node2Vec and Graph2Vec functionality into a single project, based on [Elior Cohen's node2vec](https://github.com/eliorc/node2vec) and [graph2vec](https://github.com/benedekrozemberczki/graph2vec).
25
+
26
+ ## Features
27
+
28
+ - Node2Vec: random-walk-based node embeddings
29
+ - Edge embedding: edge embeddings using the Hadamard, average, L1, or L2 operator
30
+ - Graph2Vec: graph embeddings based on Weisfeiler–Lehman (WL) hashing + Doc2Vec
31
+ - uv-based development and execution
32
+
33
+ > The Python module name is `graph2emb`.
34
+
35
+ ## Quick start (running the samples)
36
+
37
+ ```bash
38
+ # Install dependencies and the project
39
+ uv sync
40
+
41
+ # Node2Vec (loading an edge list)
42
+ uv run python sample/node2vec_from_edgelist.py
43
+
44
+ # Node2Vec + edge embedding
45
+ uv run python sample/node2vec_edge_embeddings.py
46
+
47
+ # Graph2Vec
48
+ uv run python sample/graph2vec_basic.py
49
+ ```
50
+
51
+ See `sample/README.md` for a description of each sample.
52
+
53
+ ## Usage examples (code)
54
+
55
+ ### Node2Vec (node embeddings)
56
+
57
+ ```python
58
+ import networkx as nx
59
+ from graph2emb import Node2Vec
60
+
61
+ g = nx.fast_gnp_random_graph(n=100, p=0.3, seed=42)
62
+ node2vec = Node2Vec(g, dimensions=64, walk_length=30, num_walks=20, workers=1, seed=42)
63
+ model = node2vec.fit(window=10, min_count=1, epochs=5)
64
+
65
+ print(model.wv.most_similar("2", topn=5))  # node IDs are looked up as strings
66
+ ```
67
+
68
+ ### Edge embedding
69
+
70
+ ```python
71
+ from graph2emb.edges import HadamardEmbedder
72
+
73
+ edges_embs = HadamardEmbedder(model.wv)
74
+ print(edges_embs[("1", "2")])
75
+ ```
76
+
77
+ ### Graph2Vec (graph embeddings)
78
+
79
+ ```python
80
+ import networkx as nx
81
+ from graph2emb import Graph2Vec
82
+
83
+ graphs = [
84
+ nx.fast_gnp_random_graph(n=12, p=0.3, seed=1),
85
+ nx.fast_gnp_random_graph(n=14, p=0.2, seed=2),
86
+ ]
87
+
88
+ g2v = Graph2Vec(dimensions=32, workers=1, min_count=1, epochs=3, seed=42)
89
+ g2v.fit(graphs)
90
+ emb = g2v.get_embedding() # shape: (len(graphs), dimensions)
91
+ ```
92
+
93
+ ## Development and testing
94
+
95
+ ```bash
96
+ # Run the tests
97
+ uv run pytest
98
+
99
+ # Run the tests in parallel
100
+ uv run pytest -n auto
101
+
102
+ # Run the tests with coverage
103
+ uv run pytest --cov=graph2emb --cov-report=html
104
+ ```
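The README lists four edge-embedding operators without defining them. The sketch below spells them out with NumPy, following the definitions in Table 1 of the node2vec paper (average, Hadamard, weighted-L1, weighted-L2). The function `edge_embedding` and its string operator names are hypothetical and are not part of the graph2emb API, which exposes classes such as `HadamardEmbedder` in `graph2emb.edges`; the package's own implementation may differ in details.

```python
import numpy as np


def edge_embedding(fu: np.ndarray, fv: np.ndarray, op: str) -> np.ndarray:
    """Combine the vectors of nodes u and v into one edge vector.

    Hypothetical helper; the four operators follow Table 1 of the node2vec paper.
    """
    if op == "average":
        return (fu + fv) / 2.0
    if op == "hadamard":
        return fu * fv  # element-wise product
    if op == "weighted_l1":
        return np.abs(fu - fv)  # element-wise absolute difference
    if op == "weighted_l2":
        return (fu - fv) ** 2  # element-wise squared difference
    raise ValueError(f"unknown operator: {op!r}")


# Toy 3-dimensional node vectors (illustrative values, not trained embeddings)
fu = np.array([1.0, -2.0, 0.5])
fv = np.array([3.0, 1.0, 0.5])
for op in ("average", "hadamard", "weighted_l1", "weighted_l2"):
    print(op, edge_embedding(fu, fv, op))
```

All four operators are symmetric in u and v, so `(u, v)` and `(v, u)` map to the same edge vector; that is why storing each node pair once, in sorted order, loses no information.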
@@ -0,0 +1,83 @@
1
+ # graph2emb
2
+
3
+ > An implementation that combines Node2Vec and Graph2Vec functionality into a single project, based on [Elior Cohen's node2vec](https://github.com/eliorc/node2vec) and [graph2vec](https://github.com/benedekrozemberczki/graph2vec).
4
+
5
+ ## Features
6
+
7
+ - Node2Vec: random-walk-based node embeddings
8
+ - Edge embedding: edge embeddings using the Hadamard, average, L1, or L2 operator
9
+ - Graph2Vec: graph embeddings based on Weisfeiler–Lehman (WL) hashing + Doc2Vec
10
+ - uv-based development and execution
11
+
12
+ > The Python module name is `graph2emb`.
13
+
14
+ ## Quick start (running the samples)
15
+
16
+ ```bash
17
+ # Install dependencies and the project
18
+ uv sync
19
+
20
+ # Node2Vec (loading an edge list)
21
+ uv run python sample/node2vec_from_edgelist.py
22
+
23
+ # Node2Vec + edge embedding
24
+ uv run python sample/node2vec_edge_embeddings.py
25
+
26
+ # Graph2Vec
27
+ uv run python sample/graph2vec_basic.py
28
+ ```
29
+
30
+ See `sample/README.md` for a description of each sample.
31
+
32
+ ## Usage examples (code)
33
+
34
+ ### Node2Vec (node embeddings)
35
+
36
+ ```python
37
+ import networkx as nx
38
+ from graph2emb import Node2Vec
39
+
40
+ g = nx.fast_gnp_random_graph(n=100, p=0.3, seed=42)
41
+ node2vec = Node2Vec(g, dimensions=64, walk_length=30, num_walks=20, workers=1, seed=42)
42
+ model = node2vec.fit(window=10, min_count=1, epochs=5)
43
+
44
+ print(model.wv.most_similar("2", topn=5))  # node IDs are looked up as strings
45
+ ```
46
+
47
+ ### Edge embedding
48
+
49
+ ```python
50
+ from graph2emb.edges import HadamardEmbedder
51
+
52
+ edges_embs = HadamardEmbedder(model.wv)
53
+ print(edges_embs[("1", "2")])
54
+ ```
55
+
56
+ ### Graph2Vec (graph embeddings)
57
+
58
+ ```python
59
+ import networkx as nx
60
+ from graph2emb import Graph2Vec
61
+
62
+ graphs = [
63
+ nx.fast_gnp_random_graph(n=12, p=0.3, seed=1),
64
+ nx.fast_gnp_random_graph(n=14, p=0.2, seed=2),
65
+ ]
66
+
67
+ g2v = Graph2Vec(dimensions=32, workers=1, min_count=1, epochs=3, seed=42)
68
+ g2v.fit(graphs)
69
+ emb = g2v.get_embedding() # shape: (len(graphs), dimensions)
70
+ ```
71
+
72
+ ## Development and testing
73
+
74
+ ```bash
75
+ # Run the tests
76
+ uv run pytest
77
+
78
+ # Run the tests in parallel
79
+ uv run pytest -n auto
80
+
81
+ # Run the tests with coverage
82
+ uv run pytest --cov=graph2emb --cov-report=html
83
+ ```
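Graph2Vec treats each graph as a "document" whose "words" are Weisfeiler–Lehman (WL) subtree labels, and then learns one vector per graph with a Doc2Vec-style model. The sketch below covers only the WL relabeling step. It mirrors the reference graph2vec code linked above (node degrees as initial labels, MD5-hashed relabels), but `wl_document` is a hypothetical helper rather than graph2emb's API, and graph2emb's own feature extraction may differ in details such as the number of iterations or the initial labels.

```python
import hashlib

import networkx as nx


def wl_document(graph: nx.Graph, iterations: int = 2) -> list[str]:
    """Return the bag of WL labels ("words") that describes one graph (hypothetical helper)."""
    # Initial labels: node degrees, as in the reference graph2vec code.
    labels = {node: str(degree) for node, degree in graph.degree()}
    words = list(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for node in graph.nodes():
            # New label = hash of the node's own label plus the sorted labels of its neighbors.
            neighborhood = sorted(labels[nb] for nb in graph.neighbors(node))
            signature = "_".join([labels[node], *neighborhood])
            new_labels[node] = hashlib.md5(signature.encode()).hexdigest()
        words.extend(new_labels.values())
        labels = new_labels
    return words


path = nx.path_graph(4)
renamed = nx.relabel_nodes(path, {0: "d", 1: "c", 2: "b", 3: "a"})
star = nx.star_graph(3)

print(len(wl_document(path)))  # 12: 4 nodes x (1 initial label + 2 WL rounds)
print(sorted(wl_document(path)) == sorted(wl_document(renamed)))  # True: isomorphic graphs
print(sorted(wl_document(path)) == sorted(wl_document(star)))  # False: different structure
```

Because neighbor labels are sorted before hashing, the bag of words does not depend on node names, so two isomorphic graphs produce the same document and therefore similar embeddings.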
@@ -0,0 +1,107 @@
1
+ # Node2Vec
2
+ [![Downloads](http://pepy.tech/badge/node2vec)](http://pepy.tech/project/node2vec)
3
+
4
+ Python 3 implementation of the node2vec algorithm by Aditya Grover, Jure Leskovec and Vid Kocijan.
5
+ [node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.](https://snap.stanford.edu/node2vec/)
6
+
7
+ # Maintenance
8
+
9
+ ### I no longer have time to maintain this; if someone wants to pick up the baton, let me know
10
+
11
+ ## Installation
12
+
13
+ `pip install graph2emb`
14
+
15
+ ## Usage
16
+ ```python
17
+ import networkx as nx
18
+ from graph2emb import Node2Vec
19
+
20
+ # Create a graph
21
+ graph = nx.fast_gnp_random_graph(n=100, p=0.5)
22
+
23
+ # Precompute probabilities and generate walks - **ON WINDOWS ONLY WORKS WITH workers=1**
24
+ node2vec = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=200, workers=4) # Use temp_folder for big graphs
25
+
26
+ # Embed nodes
27
+ model = node2vec.fit(window=10, min_count=1, batch_words=4)  # Any keyword accepted by gensim.Word2Vec can be passed; `dimensions` and `workers` are passed automatically from the Node2Vec constructor
28
+
29
+ # Look for most similar nodes
30
+ model.wv.most_similar('2') # Output node names are always strings
31
+
32
+ # Save embeddings for later use
33
+ model.wv.save_word2vec_format("node_embeddings.txt")  # example output path
34
+
35
+ # Save model for later use
36
+ model.save("node2vec.model")  # example output path
37
+
38
+ # Embed edges using Hadamard method
39
+ from graph2emb.edges import HadamardEmbedder
40
+
41
+ edges_embs = HadamardEmbedder(keyed_vectors=model.wv)
42
+
43
+ # Look for embeddings on the fly - here we pass normal tuples
44
+ edges_embs[('1', '2')]
45
+ ''' OUTPUT
46
+ array([ 5.75068220e-03, -1.10937878e-02, 3.76693785e-01, 2.69105062e-02,
47
+ ... ... ....
48
+ ..................................................................],
49
+ dtype=float32)
50
+ '''
51
+
52
+ # Get all edges in a separate KeyedVectors instance - use with caution, as it could be huge for big networks
53
+ edges_kv = edges_embs.as_keyed_vectors()
54
+
55
+ # Look for most similar edges - this time the tuples must be sorted and passed as str
56
+ edges_kv.most_similar(str(('1', '2')))
57
+
58
+ # Save embeddings for later use
59
+ edges_kv.save_word2vec_format("edge_embeddings.txt")  # example output path
60
+
61
+ ```
62
+
63
+ ### Parameters
64
+
65
+ #### `node2vec.Node2Vec`
66
+
67
+ - `Node2Vec` constructor:
68
+ 1. `graph`: The first positional argument has to be a networkx graph. Node names must be all integers or all strings. On the output model they will always be strings.
69
+ 2. `dimensions`: Embedding dimensions (default: 128)
70
+ 3. `walk_length`: Number of nodes in each walk (default: 80)
71
+ 4. `num_walks`: Number of walks per node (default: 10)
72
+ 5. `p`: Return hyperparameter (default: 1)
73
+ 6. `q`: In-out hyperparameter (default: 1)
74
+ 7. `weight_key`: On weighted graphs, this is the key for the weight attribute (default: 'weight')
75
+ 8. `workers`: Number of workers for parallel execution (default: 1)
76
+ 9. `sampling_strategy`: Node-specific sampling strategies; supports setting node-specific 'q', 'p', 'num_walks' and 'walk_length'.
77
+ Use these keys exactly. If a key is not set for a node, the global value passed at object initialization is used.
78
+ 10. `quiet`: Boolean controlling the verbosity. (default: False)
79
+ 11. `temp_folder`: Path to a folder where a shared-memory copy of the graph is saved. Supply it when working with graphs that are too big to fit in memory during algorithm execution.
80
+ 12. `seed`: Seed for the random number generator (default: None). Deterministic results can be obtained if seed is set and `workers=1`.
81
+
82
+ - `Node2Vec.fit` method:
83
+ Accepts any keyword argument accepted by gensim.Word2Vec
84
+
85
+ #### `node2vec.EdgeEmbedder`
86
+
87
+ `EdgeEmbedder` is an abstract class from which all concrete edge embedding classes inherit.
88
+ The concrete classes are `AverageEmbedder`, `HadamardEmbedder`, `WeightedL1Embedder` and `WeightedL2Embedder`; their definitions can be found in Table 1 of the [paper](https://arxiv.org/pdf/1607.00653.pdf).
89
+ Note that edge embeddings are defined for any pair of nodes, whether connected or not, and even for a node paired with itself.
90
+
91
+ - Constructor:
92
+ 1. `keyed_vectors`: A gensim.models.KeyedVectors instance containing the node embeddings
93
+ 2. `quiet`: Boolean controlling the verbosity. (default: False)
94
+
95
+ - `EdgeEmbedder.__getitem__(item)` method, better known as `EdgeEmbedder[item]`:
96
+ 1. `item` - A tuple consisting of 2 nodes from the `keyed_vectors` passed in the constructor. Returns the embedding of the edge.
97
+
98
+ - `EdgeEmbedder.as_keyed_vectors` method: Returns a `gensim.models.KeyedVectors` instance containing all possible node pairs, with each pair *sorted* and stored as a string key.
99
+ For example, for nodes ['1', '2', '3'] the keys will be "('1', '1')", "('1', '2')", "('1', '3')", "('2', '2')", "('2', '3')" and "('3', '3')".
100
+
101
+ ## Caveats
102
+ - Node names in the input graph must be all strings or all ints
103
+ - Parallel execution does not work on Windows (a known `joblib` issue). To run non-parallel on Windows, pass `workers=1` to the `Node2Vec` constructor
104
+
105
+ ## TODO
106
+ - [x] Parallel implementation for walk generation
107
+ - [ ] Parallel implementation for probability precomputation
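The `p` and `q` parameters listed above bias each step of node2vec's second-order random walk. The sketch below implements that search bias from the node2vec paper for a single step: after the walk moves from `prev` to `cur`, each neighbor of `cur` has its edge weight multiplied by 1/p, 1 or 1/q depending on whether it is at distance 0, 1 or 2 from `prev`. `transition_probs` is a hypothetical helper for illustration only; graph2emb precomputes these probabilities internally, and its actual code may be organized differently.

```python
from collections.abc import Hashable

import networkx as nx


def transition_probs(graph: nx.Graph, prev: Hashable, cur: Hashable,
                     p: float = 1.0, q: float = 1.0,
                     weight_key: str = "weight") -> dict:
    """Probabilities of the next step from `cur`, given the walk arrived from `prev` (hypothetical helper)."""
    scores = {}
    for nxt in graph.neighbors(cur):
        weight = graph[cur][nxt].get(weight_key, 1.0)  # unweighted edges count as 1
        if nxt == prev:                  # distance 0 from prev: step back
            alpha = 1.0 / p
        elif graph.has_edge(prev, nxt):  # distance 1 from prev: stay local
            alpha = 1.0
        else:                            # distance 2 from prev: move outward
            alpha = 1.0 / q
        scores[nxt] = weight * alpha
    total = sum(scores.values())
    return {nxt: score / total for nxt, score in scores.items()}


# The walk just moved t -> v; x1 is also adjacent to t, x2 is not.
g = nx.Graph([("t", "v"), ("v", "x1"), ("v", "x2"), ("t", "x1")])
probs = transition_probs(g, prev="t", cur="v", p=2.0, q=0.5)
print({node: round(pr, 3) for node, pr in probs.items()})
# {'t': 0.143, 'x1': 0.286, 'x2': 0.571}
```

With the defaults `p=1` and `q=1`, every neighbor keeps its plain edge weight, so the walk reduces to a first-order random walk. A small `q` pushes walks outward (DFS-like exploration), while a small `p` makes them backtrack and stay near the start node.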
@@ -0,0 +1,11 @@
1
+ from . import edges
2
+ from .node2vec import Node2Vec
3
+ from .graph2vec import Graph2Vec
4
+ from importlib import metadata
5
+
6
+ try:
7
+ __version__ = metadata.version("graph2emb")
8
+ except metadata.PackageNotFoundError:
9
+ __version__ = "0.0.0"
10
+
11
+