difflayers-0.1.0.tar.gz
- difflayers-0.1.0/LICENSE +79 -0
- difflayers-0.1.0/PKG-INFO +210 -0
- difflayers-0.1.0/README.md +176 -0
- difflayers-0.1.0/difflayers/__init__.py +965 -0
- difflayers-0.1.0/difflayers/activation.py +339 -0
- difflayers-0.1.0/difflayers/attention_operator.py +157 -0
- difflayers-0.1.0/difflayers/auxiliary/__init__.py +0 -0
- difflayers-0.1.0/difflayers/auxiliary/data.py +252 -0
- difflayers-0.1.0/difflayers/diffused_attention.py +427 -0
- difflayers-0.1.0/difflayers/diffusion.py +395 -0
- difflayers-0.1.0/difflayers/dynamics_engine.py +540 -0
- difflayers-0.1.0/difflayers/functional.py +459 -0
- difflayers-0.1.0/difflayers/graph/__init__.py +18 -0
- difflayers-0.1.0/difflayers/graph/build_graph.py +77 -0
- difflayers-0.1.0/difflayers/graph/builder.py +120 -0
- difflayers-0.1.0/difflayers/graph/laplacian.py +76 -0
- difflayers-0.1.0/difflayers/graph/laplacian_builder.py +64 -0
- difflayers-0.1.0/difflayers/transformer.py +212 -0
- difflayers-0.1.0/difflayers.egg-info/PKG-INFO +210 -0
- difflayers-0.1.0/difflayers.egg-info/SOURCES.txt +25 -0
- difflayers-0.1.0/difflayers.egg-info/dependency_links.txt +1 -0
- difflayers-0.1.0/difflayers.egg-info/not-zip-safe +1 -0
- difflayers-0.1.0/difflayers.egg-info/requires.txt +3 -0
- difflayers-0.1.0/difflayers.egg-info/top_level.txt +2 -0
- difflayers-0.1.0/pyproject.toml +42 -0
- difflayers-0.1.0/setup.cfg +4 -0
- difflayers-0.1.0/setup.py +41 -0
difflayers-0.1.0/LICENSE
ADDED
@@ -0,0 +1,79 @@
+From Hopfield layers:
+
+Copyright (c) 2020, Institute for Machine Learning, Johannes Kepler University Linz (Bernhard Schäfl)
+All rights reserved.
+
+All other contributions:
+Copyright (c) 2020 the respective contributors
+All rights reserved.
+
+From PyTorch:
+
+Copyright (c) 2016- Facebook, Inc (Adam Paszke)
+Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
+Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
+Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
+Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
+Copyright (c) 2011-2013 NYU (Clement Farabet)
+Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
+Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
+Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
+
+From Caffe2:
+
+Copyright (c) 2016-present, Facebook Inc. All rights reserved.
+
+All contributions by Facebook:
+Copyright (c) 2016 Facebook Inc.
+
+All contributions by Google:
+Copyright (c) 2015 Google Inc.
+All rights reserved.
+
+All contributions by Yangqing Jia:
+Copyright (c) 2015 Yangqing Jia
+All rights reserved.
+
+All contributions from Caffe:
+Copyright(c) 2013, 2014, 2015, the respective contributors
+All rights reserved.
+
+All other contributions:
+Copyright(c) 2015, 2016 the respective contributors
+All rights reserved.
+
+Caffe2 uses a copyright model similar to Caffe: each contributor holds
+copyright over their contributions to Caffe2. The project versioning records
+all such contribution and copyright details. If a contributor wants to further
+mark their specific copyright on a particular contribution, they should
+indicate their copyright solely in the commit message of the change when it is
+committed.
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
+and IDIAP Research Institute nor the names of its contributors may be
+used to endorse or promote products derived from this software without
+specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
difflayers-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,210 @@
+Metadata-Version: 2.4
+Name: difflayers
+Version: 0.1.0
+Summary: difflayers: Diffusion-Augmented Hopfield Networks
+Home-page: https://github.com/hopfileds/hopfield-layers
+Author: Priyam Ghosh
+Author-email: Priyam Ghosh <priyamghosh9753@gmail.com>
+License: BSD
+Project-URL: Homepage, https://github.com/hopfileds/hopfield-layers
+Project-URL: Repository, https://github.com/hopfileds/hopfield-layers
+Project-URL: Bug Tracker, https://github.com/hopfileds/hopfield-layers/issues
+Keywords: hopfield networks,deep learning,attention,diffusion,graph
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: BSD License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: torch>=1.9.0
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: scipy>=1.7.0
+Dynamic: author
+Dynamic: home-page
+Dynamic: license-file
+Dynamic: requires-python
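Review note: the header block of PKG-INFO uses the RFC 822-style core metadata layout, so it can be read with the standard library's `email.parser`. The sketch below is only an illustration of that: the excerpt is copied from the hunk above, and the helper name `read_core_metadata` is hypothetical, not something shipped in the package.

```python
from email.parser import Parser

# Excerpt of the header fields shown in the PKG-INFO hunk (not the full file).
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: difflayers
Version: 0.1.0
Requires-Python: >=3.8
Requires-Dist: torch>=1.9.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: scipy>=1.7.0
"""


def read_core_metadata(text):
    """Parse RFC 822-style metadata headers into a small dictionary."""
    message = Parser().parsestr(text)
    return {
        "name": message["Name"],
        "version": message["Version"],
        "requires_python": message["Requires-Python"],
        # Multi-use fields appear once per value, so collect all of them.
        "requires_dist": message.get_all("Requires-Dist", []),
    }


metadata = read_core_metadata(PKG_INFO_EXCERPT)
print(metadata["name"], metadata["version"], metadata["requires_dist"])
```

Repeated fields such as `Requires-Dist` are why `get_all` is used instead of plain indexing, which would return only the first occurrence.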
+
+# Hopfield Networks is All You Need
+
+_Hubert Ramsauer<sup>1</sup>, Bernhard Schäfl<sup>1</sup>, Johannes Lehner<sup>1</sup>, Philipp Seidl<sup>1</sup>,
+Michael Widrich<sup>1</sup>, Lukas Gruber<sup>1</sup>, Markus Holzleitner<sup>1</sup>, Milena Pavlović<sup>3, 4</sup>,
+Geir Kjetil Sandve<sup>4</sup>, Victor Greiff<sup>3</sup>, David Kreil<sup>2</sup>, Michael Kopp<sup>2</sup>, Günter
+Klambauer<sup>1</sup>, Johannes Brandstetter<sup>1</sup>, Sepp Hochreiter<sup>1, 2</sup>_
+
+<sup>1</sup> ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
+<sup>2</sup> Institute of Advanced Research in Artificial Intelligence (IARAI)
+<sup>3</sup> Department of Immunology, University of Oslo, Norway
+<sup>4</sup> Department of Informatics, University of Oslo, Norway
+
+---
+
+##### Detailed blog post on this paper as well as the necessary background on Hopfield networks at [this link](https://ml-jku.github.io/hopfield-layers/).
+
+---
+
+The transformer and BERT models pushed the performance on NLP tasks to new levels via their attention mechanism. We show
+that this attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield
+network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially
+small retrieval errors. The number of stored patterns must be traded off against convergence speed and retrieval error.
+The new Hopfield network has three types of energy minima (fixed points of the update):
+
+1. global fixed point averaging over all patterns,
+2. metastable states averaging over a subset of patterns, and
+3. fixed points which store a single pattern.
+
+Transformers learn an attention mechanism by constructing an embedding of patterns and queries into an associative
+space. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they
+operate in higher layers in metastable states. The gradient in transformers is maximal in the regime of metastable
+states, is uniformly distributed when averaging globally, and vanishes when a fixed point is near a stored pattern.
+Based on the Hopfield network interpretation, we analyzed learning of transformer and BERT architectures. Learning
+starts with attention heads that average and then most of them switch to metastable states. However, the majority of
+heads in the first layers still averages and can be replaced by averaging operations like the Gaussian weighting that we
+propose. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information
+created in lower layers. These heads seem a promising target for improving transformers. Neural networks that integrate
+Hopfield networks that are equivalent to attention heads outperform other methods on immune repertoire classification,
+where the Hopfield net stores several hundreds of thousands of patterns.
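Review note: the abstract's claim that attention is the update rule of a modern Hopfield network can be illustrated with a small, dependency-free sketch. It is not the package's implementation: the function names `softmax` and `hopfield_update`, the inverse-temperature argument `beta`, and the toy patterns are all assumptions made for illustration. The update computed is a softmax-weighted average of the stored patterns, with weights given by the scaled dot products between each stored pattern and the current state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def hopfield_update(stored, state, beta=1.0):
    """One update step: softmax-weighted average of the stored patterns."""
    # Similarity of the state (query) to every stored pattern (key).
    scores = [beta * sum(a * b for a, b in zip(pattern, state)) for pattern in stored]
    weights = softmax(scores)
    # Weighted sum of stored patterns, one coordinate at a time.
    return [
        sum(weight * pattern[d] for weight, pattern in zip(weights, stored))
        for d in range(len(state))
    ]


stored_patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_query = [0.9, 0.1, 0.0]

# Large beta: the update lands (almost) on a single stored pattern.
retrieved = hopfield_update(stored_patterns, noisy_query, beta=20.0)
# beta = 0: all weights are equal, so the result is the global average.
averaged = hopfield_update(stored_patterns, noisy_query, beta=0.0)
print(retrieved, averaged)
```

In this toy setting the two calls correspond to the single-pattern fixed point and the global averaging fixed point from the list above; intermediate values of `beta` give mixtures over subsets of patterns.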
+
+With _this_ repository, we provide a PyTorch implementation of a new layer called “Hopfield” which allows to equip deep
+learning architectures with Hopfield networks as new memory concepts.
+
+The full paper is available at [https://arxiv.org/abs/2008.02217](https://arxiv.org/abs/2008.02217).
+
+## Requirements
+
+The software was developed and tested on the following 64-bit operating systems:
+
+- CentOS Linux release 8.1.1911 (Core)
+- macOS 10.15.5 (Catalina)
+
+As the development environment, [Python](https://www.python.org) 3.8.3 in combination
+with [PyTorch](https://pytorch.org) 1.6.0 was used (a version of at least 1.5.0 should be sufficient). More details on
+how to install PyTorch are available on the [official project page](https://pytorch.org).
+
+## Installation
+
+The recommended way to install the software is to use `pip/pip3`:
+
+```bash
+$ pip3 install git+https://github.com/ml-jku/hopfield-layers
+```
+
+To successfully run the [Jupyter notebooks](https://jupyter.org) contained in [examples](examples/), additional
+third-party modules are needed:
+
+```bash
+$ pip3 install -r examples/requirements.txt
+```
+
+The installation of the [Jupyter software](https://jupyter.org/install.html) itself is not covered. More details on how
+to install Jupyter are available at the [official installation page](https://jupyter.org/install.html).
+
+## Usage
+
+To get up and running with Hopfield-based networks, only <i>one</i> argument needs to be set, the size (depth) of the
+input.
+
+```python
+from hflayers import Hopfield
+
+hopfield = Hopfield(input_size=...)
+```
+
+It is also possible to replace commonly used pooling functions with a Hopfield-based one. Internally, a <i>state
+pattern</i> is trained, which in turn is used to compute pooling weights with respect to the input.
+
+```python
+from hflayers import HopfieldPooling
+
+hopfield_pooling = HopfieldPooling(input_size=...)
+```
+
+A second variant of our Hopfield-based modules is one which employs a trainable but fixed lookup mechanism. Internally,
+one or multiple <i>stored patterns</i> and <i>pattern projections</i> are trained (optionally in a non-shared manner),
+which in turn are used as a lookup mechanism independent of the input data.
+
+```python
+from hflayers import HopfieldLayer
+
+hopfield_lookup = HopfieldLayer(input_size=...)
+```
+
+The usage is as <i>simple</i> as with the main module, but equally <i>powerful</i>.
+
+## Examples
+
+Generally, the Hopfield layer is designed to be used to implement or to substitute different layers like:
+
+- <b>Pooling layers:</b> We consider the Hopfield layer as a pooling layer if only one static state (query) pattern
+exists. Then, it is de facto a pooling over the sequence, which results from the softmax values applied on the stored
+patterns. Therefore, our Hopfield layer can act as a pooling layer.
+
+- <b>Permutation equivariant layers:</b> Our Hopfield layer can be used as a plug-in replacement for permutation
+equivariant layers. Since the Hopfield layer is an associative memory it assumes no dependency between the input
+patterns.
+
+- <b>GRU & LSTM layers:</b> Our Hopfield layer can be used as a plug-in replacement for GRU & LSTM layers. Optionally,
+for substituting GRU & LSTM layers, positional encoding might be considered.
+
+- <b>Attention layers:</b> Our Hopfield layer can act as an attention layer, where state (query) and stored (key)
+patterns are different, and need to be associated.
+
+The folder [examples](examples/) contains multiple demonstrations on how to use the <code>Hopfield</code>, <code>
+HopfieldPooling</code> as well as the <code>HopfieldLayer</code> modules. To successfully run the
+contained [Jupyter notebooks](https://jupyter.org), additional third-party modules
+like [pandas](https://pandas.pydata.org) and [seaborn](https://seaborn.pydata.org) are required.
+
+- [Bit Pattern Set](examples/bit_pattern/bit_pattern_demo.ipynb): The dataset of this demonstration falls into the
+category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance Learning (MIL)</i> problems. Each
+bag comprises a collection of bit pattern instances, where each instance is a sequence of <b>0s</b> and <b>1s</b>.
+The positive class has specific bit patterns injected, which are absent in the negative one. This demonstration shows
+that <code>Hopfield</code>, <code>HopfieldPooling</code> and <code>HopfieldLayer</code> are capable of learning and
+filtering each bag with respect to the class-defining bit patterns.
+
+- [Latch Sequence Set](examples/latch_sequence/latch_sequence_demo.ipynb): We study an easy example of learning
+long-term dependencies by using a simple <i>latch task</i>,
+see [Hochreiter and Mozer](https://link.springer.com/chapter/10.1007/3-540-44668-0_92). The essence of this task is
+that a sequence of inputs is presented, beginning with one of two symbols, <b>A</b> or <b>B</b>, and after a variable
+number of time steps, the model has to output a corresponding symbol. Thus, the task requires memorizing the original
+input over time. It has to be noted that both class-defining symbols must only appear at the first position of a
+sequence. This task was specifically designed to demonstrate the capability of recurrent neural networks to capture
+long-term dependencies. This demonstration shows that <code>Hopfield</code>, <code>HopfieldPooling</code> and <code>
+HopfieldLayer</code> adapt extremely fast to this specific task, concentrating only on the first entry of the
+sequence.
+
+- [Attention-based Deep Multiple Instance Learning](examples/mnist_bags/mnist_bags_demo.ipynb): The dataset of this
+demonstration falls into the category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance
+Learning (MIL)</i> problems, see [Ilse and Tomczak](https://arxiv.org/abs/1802.04712). Each bag comprises a collection
+of <b>28x28</b> grayscale images/instances, whereas each instance is a sequence of pixel values in the range
+of <b>[0; 255]</b>. The amount of instances per bag is drawn from a Gaussian with specified mean and variance. The
+positive class is defined by the presence of the target number/digit, whereas the negative one by its absence.
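Review note: the bag structure described for the Bit Pattern Set (bags of bit sequences, with a class-defining pattern injected only into positive bags) can be sketched as follows. This is a hypothetical generator for illustration, not the code of the referenced notebook: the names `random_instance`, `make_bag`, `make_dataset`, the 8-bit `SIGNAL` pattern, and all sizes are assumptions.

```python
import random

# Hypothetical class-defining bit pattern; the real demo may use others.
SIGNAL = [1, 1, 0, 0, 1, 1, 0, 0]


def random_instance(rng, num_bits, forbidden):
    """Draw a random bit sequence that differs from the forbidden pattern."""
    while True:
        instance = [rng.randint(0, 1) for _ in range(num_bits)]
        if instance != forbidden:
            return instance


def make_bag(rng, num_instances, signal, positive):
    """Build one bag; only positive bags contain the signal pattern."""
    bag = [random_instance(rng, len(signal), signal) for _ in range(num_instances)]
    if positive:
        # Overwrite one randomly chosen instance with the signal pattern.
        bag[rng.randrange(num_instances)] = list(signal)
    return bag


def make_dataset(num_bags, num_instances, signal, seed=0):
    """Return (bag, label) pairs, alternating negative and positive bags."""
    rng = random.Random(seed)
    dataset = []
    for index in range(num_bags):
        label = index % 2
        dataset.append((make_bag(rng, num_instances, signal, bool(label)), label))
    return dataset


dataset = make_dataset(num_bags=10, num_instances=6, signal=SIGNAL)
for bag, label in dataset:
    print(label, SIGNAL in bag)
```

The label of a bag depends only on whether the signal pattern occurs among its instances, which is the defining property of a Multiple Instance Learning task.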
+
+## Disclaimer
+
+Some implementations of this repository are based on existing ones of the
+official [PyTorch repository v1.6.0](https://github.com/pytorch/pytorch/tree/v1.6.0) and accordingly extended and
+modified. In the following, the involved parts are listed:
+
+- The implementation of [HopfieldCore](hflayers/activation.py#L16) is based on the implementation
+of [MultiheadAttention](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/activation.py#L771)
+.
+- The implementation of [hopfield_core_forward](hflayers/functional.py#L8) is based on the implementation
+of [multi_head_attention_forward](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/functional.py#L3854)
+.
+- The implementation of [HopfieldEncoderLayer](hflayers/transformer.py#L12) is based on the implementation
+of [TransformerEncoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L241)
+.
+- The implementation of [HopfieldDecoderLayer](hflayers/transformer.py#L101) is based on the implementation
+of [TransformerDecoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L303)
+.
+
+## License
+
+This repository is BSD-style licensed (see [LICENSE](LICENSE)), except where noted otherwise.
difflayers-0.1.0/README.md
ADDED
@@ -0,0 +1,176 @@
+# Hopfield Networks is All You Need
+
+_Hubert Ramsauer<sup>1</sup>, Bernhard Schäfl<sup>1</sup>, Johannes Lehner<sup>1</sup>, Philipp Seidl<sup>1</sup>,
+Michael Widrich<sup>1</sup>, Lukas Gruber<sup>1</sup>, Markus Holzleitner<sup>1</sup>, Milena Pavlović<sup>3, 4</sup>,
+Geir Kjetil Sandve<sup>4</sup>, Victor Greiff<sup>3</sup>, David Kreil<sup>2</sup>, Michael Kopp<sup>2</sup>, Günter
+Klambauer<sup>1</sup>, Johannes Brandstetter<sup>1</sup>, Sepp Hochreiter<sup>1, 2</sup>_
+
+<sup>1</sup> ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
+<sup>2</sup> Institute of Advanced Research in Artificial Intelligence (IARAI)
+<sup>3</sup> Department of Immunology, University of Oslo, Norway
+<sup>4</sup> Department of Informatics, University of Oslo, Norway
+
+---
+
+##### Detailed blog post on this paper as well as the necessary background on Hopfield networks at [this link](https://ml-jku.github.io/hopfield-layers/).
+
+---
+
+The transformer and BERT models pushed the performance on NLP tasks to new levels via their attention mechanism. We show
+that this attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield
+network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially
+small retrieval errors. The number of stored patterns must be traded off against convergence speed and retrieval error.
+The new Hopfield network has three types of energy minima (fixed points of the update):
+
+1. global fixed point averaging over all patterns,
+2. metastable states averaging over a subset of patterns, and
+3. fixed points which store a single pattern.
+
+Transformers learn an attention mechanism by constructing an embedding of patterns and queries into an associative
+space. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they
+operate in higher layers in metastable states. The gradient in transformers is maximal in the regime of metastable
+states, is uniformly distributed when averaging globally, and vanishes when a fixed point is near a stored pattern.
+Based on the Hopfield network interpretation, we analyzed learning of transformer and BERT architectures. Learning
+starts with attention heads that average and then most of them switch to metastable states. However, the majority of
+heads in the first layers still averages and can be replaced by averaging operations like the Gaussian weighting that we
+propose. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information
+created in lower layers. These heads seem a promising target for improving transformers. Neural networks that integrate
+Hopfield networks that are equivalent to attention heads outperform other methods on immune repertoire classification,
+where the Hopfield net stores several hundreds of thousands of patterns.
+
+With _this_ repository, we provide a PyTorch implementation of a new layer called “Hopfield” which allows to equip deep
+learning architectures with Hopfield networks as new memory concepts.
+
+The full paper is available at [https://arxiv.org/abs/2008.02217](https://arxiv.org/abs/2008.02217).
+
+## Requirements
+
+The software was developed and tested on the following 64-bit operating systems:
+
+- CentOS Linux release 8.1.1911 (Core)
+- macOS 10.15.5 (Catalina)
+
+As the development environment, [Python](https://www.python.org) 3.8.3 in combination
+with [PyTorch](https://pytorch.org) 1.6.0 was used (a version of at least 1.5.0 should be sufficient). More details on
+how to install PyTorch are available on the [official project page](https://pytorch.org).
+
+## Installation
+
+The recommended way to install the software is to use `pip/pip3`:
+
+```bash
+$ pip3 install git+https://github.com/ml-jku/hopfield-layers
+```
+
+To successfully run the [Jupyter notebooks](https://jupyter.org) contained in [examples](examples/), additional
+third-party modules are needed:
+
+```bash
+$ pip3 install -r examples/requirements.txt
+```
+
+The installation of the [Jupyter software](https://jupyter.org/install.html) itself is not covered. More details on how
+to install Jupyter are available at the [official installation page](https://jupyter.org/install.html).
+
+## Usage
+
+To get up and running with Hopfield-based networks, only <i>one</i> argument needs to be set, the size (depth) of the
+input.
+
+```python
+from hflayers import Hopfield
+
+hopfield = Hopfield(input_size=...)
+```
+
+It is also possible to replace commonly used pooling functions with a Hopfield-based one. Internally, a <i>state
+pattern</i> is trained, which in turn is used to compute pooling weights with respect to the input.
+
+```python
+from hflayers import HopfieldPooling
+
+hopfield_pooling = HopfieldPooling(input_size=...)
+```
+
+A second variant of our Hopfield-based modules is one which employs a trainable but fixed lookup mechanism. Internally,
+one or multiple <i>stored patterns</i> and <i>pattern projections</i> are trained (optionally in a non-shared manner),
+which in turn are used as a lookup mechanism independent of the input data.
+
+```python
+from hflayers import HopfieldLayer
+
+hopfield_lookup = HopfieldLayer(input_size=...)
+```
+
+The usage is as <i>simple</i> as with the main module, but equally <i>powerful</i>.
+
+## Examples
+
+Generally, the Hopfield layer is designed to be used to implement or to substitute different layers like:
+
+- <b>Pooling layers:</b> We consider the Hopfield layer as a pooling layer if only one static state (query) pattern
+exists. Then, it is de facto a pooling over the sequence, which results from the softmax values applied on the stored
+patterns. Therefore, our Hopfield layer can act as a pooling layer.
+
+- <b>Permutation equivariant layers:</b> Our Hopfield layer can be used as a plug-in replacement for permutation
+equivariant layers. Since the Hopfield layer is an associative memory it assumes no dependency between the input
+patterns.
+
+- <b>GRU & LSTM layers:</b> Our Hopfield layer can be used as a plug-in replacement for GRU & LSTM layers. Optionally,
+for substituting GRU & LSTM layers, positional encoding might be considered.
+
+- <b>Attention layers:</b> Our Hopfield layer can act as an attention layer, where state (query) and stored (key)
+patterns are different, and need to be associated.
+
+The folder [examples](examples/) contains multiple demonstrations on how to use the <code>Hopfield</code>, <code>
+HopfieldPooling</code> as well as the <code>HopfieldLayer</code> modules. To successfully run the
+contained [Jupyter notebooks](https://jupyter.org), additional third-party modules
+like [pandas](https://pandas.pydata.org) and [seaborn](https://seaborn.pydata.org) are required.
+
+- [Bit Pattern Set](examples/bit_pattern/bit_pattern_demo.ipynb): The dataset of this demonstration falls into the
+category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance Learning (MIL)</i> problems. Each
+bag comprises a collection of bit pattern instances, where each instance is a sequence of <b>0s</b> and <b>1s</b>.
+The positive class has specific bit patterns injected, which are absent in the negative one. This demonstration shows
+that <code>Hopfield</code>, <code>HopfieldPooling</code> and <code>HopfieldLayer</code> are capable of learning and
+filtering each bag with respect to the class-defining bit patterns.
+
+- [Latch Sequence Set](examples/latch_sequence/latch_sequence_demo.ipynb): We study an easy example of learning
+long-term dependencies by using a simple <i>latch task</i>,
+see [Hochreiter and Mozer](https://link.springer.com/chapter/10.1007/3-540-44668-0_92). The essence of this task is
+that a sequence of inputs is presented, beginning with one of two symbols, <b>A</b> or <b>B</b>, and after a variable
+number of time steps, the model has to output a corresponding symbol. Thus, the task requires memorizing the original
+input over time. It has to be noted that both class-defining symbols must only appear at the first position of a
+sequence. This task was specifically designed to demonstrate the capability of recurrent neural networks to capture
+long-term dependencies. This demonstration shows that <code>Hopfield</code>, <code>HopfieldPooling</code> and <code>
+HopfieldLayer</code> adapt extremely fast to this specific task, concentrating only on the first entry of the
+sequence.
+
+- [Attention-based Deep Multiple Instance Learning](examples/mnist_bags/mnist_bags_demo.ipynb): The dataset of this
+demonstration falls into the category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance
+Learning (MIL)</i> problems, see [Ilse and Tomczak](https://arxiv.org/abs/1802.04712). Each bag comprises a collection
+of <b>28x28</b> grayscale images/instances, whereas each instance is a sequence of pixel values in the range
+of <b>[0; 255]</b>. The amount of instances per bag is drawn from a Gaussian with specified mean and variance. The
+positive class is defined by the presence of the target number/digit, whereas the negative one by its absence.
+
+## Disclaimer
+
+Some implementations of this repository are based on existing ones of the
+official [PyTorch repository v1.6.0](https://github.com/pytorch/pytorch/tree/v1.6.0) and accordingly extended and
+modified. In the following, the involved parts are listed:
+
+- The implementation of [HopfieldCore](hflayers/activation.py#L16) is based on the implementation
+of [MultiheadAttention](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/activation.py#L771)
+.
+- The implementation of [hopfield_core_forward](hflayers/functional.py#L8) is based on the implementation
+of [multi_head_attention_forward](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/functional.py#L3854)
+.
+- The implementation of [HopfieldEncoderLayer](hflayers/transformer.py#L12) is based on the implementation
+of [TransformerEncoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L241)
+.
+- The implementation of [HopfieldDecoderLayer](hflayers/transformer.py#L101) is based on the implementation
+of [TransformerDecoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L303)
+.
+
+## License
+
+This repository is BSD-style licensed (see [LICENSE](LICENSE)), except where noted otherwise.