difflayers 0.1.0__tar.gz

@@ -0,0 +1,79 @@
1
+ From Hopfield layers:
2
+
3
+ Copyright (c) 2020, Institute for Machine Learning, Johannes Kepler University Linz (Bernhard Schäfl)
4
+ All rights reserved.
5
+
6
+ All other contributions:
7
+ Copyright (c) 2020 the respective contributors
8
+ All rights reserved.
9
+
10
+ From PyTorch:
11
+
12
+ Copyright (c) 2016- Facebook, Inc (Adam Paszke)
13
+ Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
14
+ Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
15
+ Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
16
+ Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
17
+ Copyright (c) 2011-2013 NYU (Clement Farabet)
18
+ Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
19
+ Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
20
+ Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
21
+
22
+ From Caffe2:
23
+
24
+ Copyright (c) 2016-present, Facebook Inc. All rights reserved.
25
+
26
+ All contributions by Facebook:
27
+ Copyright (c) 2016 Facebook Inc.
28
+
29
+ All contributions by Google:
30
+ Copyright (c) 2015 Google Inc.
31
+ All rights reserved.
32
+
33
+ All contributions by Yangqing Jia:
34
+ Copyright (c) 2015 Yangqing Jia
35
+ All rights reserved.
36
+
37
+ All contributions from Caffe:
38
+ Copyright(c) 2013, 2014, 2015, the respective contributors
39
+ All rights reserved.
40
+
41
+ All other contributions:
42
+ Copyright(c) 2015, 2016 the respective contributors
43
+ All rights reserved.
44
+
45
+ Caffe2 uses a copyright model similar to Caffe: each contributor holds
46
+ copyright over their contributions to Caffe2. The project versioning records
47
+ all such contribution and copyright details. If a contributor wants to further
48
+ mark their specific copyright on a particular contribution, they should
49
+ indicate their copyright solely in the commit message of the change when it is
50
+ committed.
51
+
52
+ All rights reserved.
53
+
54
+ Redistribution and use in source and binary forms, with or without
55
+ modification, are permitted provided that the following conditions are met:
56
+
57
+ 1. Redistributions of source code must retain the above copyright
58
+ notice, this list of conditions and the following disclaimer.
59
+
60
+ 2. Redistributions in binary form must reproduce the above copyright
61
+ notice, this list of conditions and the following disclaimer in the
62
+ documentation and/or other materials provided with the distribution.
63
+
64
+ 3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
65
+ and IDIAP Research Institute nor the names of its contributors may be
66
+ used to endorse or promote products derived from this software without
67
+ specific prior written permission.
68
+
69
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
70
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
71
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
72
+ ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
73
+ LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
74
+ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
75
+ SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
76
+ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
77
+ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
78
+ ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
79
+ POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,210 @@
1
+ Metadata-Version: 2.4
2
+ Name: difflayers
3
+ Version: 0.1.0
4
+ Summary: difflayers: Diffusion-Augmented Hopfield Networks
5
+ Home-page: https://github.com/hopfileds/hopfield-layers
6
+ Author: Priyam Ghosh
7
+ Author-email: Priyam Ghosh <priyamghosh9753@gmail.com>
8
+ License: BSD
9
+ Project-URL: Homepage, https://github.com/hopfileds/hopfield-layers
10
+ Project-URL: Repository, https://github.com/hopfileds/hopfield-layers
11
+ Project-URL: Bug Tracker, https://github.com/hopfileds/hopfield-layers/issues
12
+ Keywords: hopfield networks,deep learning,attention,diffusion,graph
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: License :: OSI Approved :: BSD License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.8
18
+ Classifier: Programming Language :: Python :: 3.9
19
+ Classifier: Programming Language :: Python :: 3.10
20
+ Classifier: Programming Language :: Python :: 3.11
21
+ Classifier: Programming Language :: Python :: 3.12
22
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
23
+ Classifier: Operating System :: OS Independent
24
+ Requires-Python: >=3.8
25
+ Description-Content-Type: text/markdown
26
+ License-File: LICENSE
27
+ Requires-Dist: torch>=1.9.0
28
+ Requires-Dist: numpy>=1.20.0
29
+ Requires-Dist: scipy>=1.7.0
30
+ Dynamic: author
31
+ Dynamic: home-page
32
+ Dynamic: license-file
33
+ Dynamic: requires-python
34
+
35
+ # Hopfield Networks is All You Need
36
+
37
+ _Hubert Ramsauer<sup>1</sup>, Bernhard Schäfl<sup>1</sup>, Johannes Lehner<sup>1</sup>, Philipp Seidl<sup>1</sup>,
38
+ Michael Widrich<sup>1</sup>, Lukas Gruber<sup>1</sup>, Markus Holzleitner<sup>1</sup>, Milena Pavlović<sup>3, 4</sup>,
39
+ Geir Kjetil Sandve<sup>4</sup>, Victor Greiff<sup>3</sup>, David Kreil<sup>2</sup>, Michael Kopp<sup>2</sup>, Günter
40
+ Klambauer<sup>1</sup>, Johannes Brandstetter<sup>1</sup>, Sepp Hochreiter<sup>1, 2</sup>_
41
+
42
+ <sup>1</sup> ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
43
+ <sup>2</sup> Institute of Advanced Research in Artificial Intelligence (IARAI)
44
+ <sup>3</sup> Department of Immunology, University of Oslo, Norway
45
+ <sup>4</sup> Department of Informatics, University of Oslo, Norway
46
+
47
+ ---
48
+
49
+ ##### Detailed blog post on this paper as well as the necessary background on Hopfield networks at [this link](https://ml-jku.github.io/hopfield-layers/).
50
+
51
+ ---
52
+
53
+ The transformer and BERT models pushed the performance on NLP tasks to new levels via their attention mechanism. We show
54
+ that this attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield
55
+ network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially
56
+ small retrieval errors. The number of stored patterns must be traded off against convergence speed and retrieval error.
57
+ The new Hopfield network has three types of energy minima (fixed points of the update):
58
+
59
+ 1. global fixed point averaging over all patterns,
60
+ 2. metastable states averaging over a subset of patterns, and
61
+ 3. fixed points which store a single pattern.
62
+
63
+ Transformers learn an attention mechanism by constructing an embedding of patterns and queries into an associative
64
+ space. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they
65
+ operate in higher layers in metastable states. The gradient in transformers is maximal in the regime of metastable
66
+ states, is uniformly distributed when averaging globally, and vanishes when a fixed point is near a stored pattern.
67
+ Based on the Hopfield network interpretation, we analyzed learning of transformer and BERT architectures. Learning
68
+ starts with attention heads that average and then most of them switch to metastable states. However, the majority of
69
+ heads in the first layers still averages and can be replaced by averaging operations like the Gaussian weighting that we
70
+ propose. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information
71
+ created in lower layers. These heads seem a promising target for improving transformers. Neural networks that integrate
72
+ Hopfield networks that are equivalent to attention heads outperform other methods on immune repertoire classification,
73
+ where the Hopfield net stores several hundreds of thousands of patterns.
74
+
75
+ With _this_ repository, we provide a PyTorch implementation of a new layer called “Hopfield”, which allows equipping deep
76
+ learning architectures with Hopfield networks as new memory concepts.
77
+
78
+ The full paper is available at [https://arxiv.org/abs/2008.02217](https://arxiv.org/abs/2008.02217).
79
+
80
+ ## Requirements
81
+
82
+ The software was developed and tested on the following 64-bit operating systems:
83
+
84
+ - CentOS Linux release 8.1.1911 (Core)
85
+ - macOS 10.15.5 (Catalina)
86
+
87
+ As the development environment, [Python](https://www.python.org) 3.8.3 in combination
88
+ with [PyTorch](https://pytorch.org) 1.6.0 was used (a version of at least 1.5.0 should be sufficient). More details on
89
+ how to install PyTorch are available on the [official project page](https://pytorch.org).
90
+
91
+ ## Installation
92
+
93
+ The recommended way to install the software is to use `pip/pip3`:
94
+
95
+ ```bash
96
+ $ pip3 install git+https://github.com/ml-jku/hopfield-layers
97
+ ```
98
+
99
+ To successfully run the [Jupyter notebooks](https://jupyter.org) contained in [examples](examples/), additional
100
+ third-party modules are needed:
101
+
102
+ ```bash
103
+ $ pip3 install -r examples/requirements.txt
104
+ ```
105
+
106
+ The installation of the [Jupyter software](https://jupyter.org/install.html) itself is not covered. More details on how
107
+ to install Jupyter are available at the [official installation page](https://jupyter.org/install.html).
108
+
109
+ ## Usage
110
+
111
+ To get up and running with Hopfield-based networks, only <i>one</i> argument needs to be set: the size (depth) of the
112
+ input.
113
+
114
+ ```python
115
+ from hflayers import Hopfield
116
+
117
+ hopfield = Hopfield(input_size=...)
118
+ ```
119
+
120
+ It is also possible to replace commonly used pooling functions with a Hopfield-based one. Internally, a <i>state
121
+ pattern</i> is trained, which in turn is used to compute pooling weights with respect to the input.
122
+
123
+ ```python
124
+ from hflayers import HopfieldPooling
125
+
126
+ hopfield_pooling = HopfieldPooling(input_size=...)
127
+ ```
128
+
129
+ A second variant of our Hopfield-based modules is one which employs a trainable but fixed lookup mechanism. Internally,
130
+ one or multiple <i>stored patterns</i> and <i>pattern projections</i> are trained (optionally in a non-shared manner),
131
+ which in turn are used as a lookup mechanism independent of the input data.
132
+
133
+ ```python
134
+ from hflayers import HopfieldLayer
135
+
136
+ hopfield_lookup = HopfieldLayer(input_size=...)
137
+ ```
138
+
139
+ Its usage is as <i>simple</i> as that of the main module, and it is equally <i>powerful</i>.
140
+
141
+ ## Examples
142
+
143
+ Generally, the Hopfield layer is designed to implement or substitute different layers, such as:
144
+
145
+ - <b>Pooling layers:</b> We consider the Hopfield layer as a pooling layer if only one static state (query) pattern
146
+ exists. Then, it is de facto a pooling over the sequence, which results from the softmax values applied on the stored
147
+ patterns. Therefore, our Hopfield layer can act as a pooling layer.
148
+
149
+ - <b>Permutation equivariant layers:</b> Our Hopfield layer can be used as a plug-in replacement for permutation
150
+ equivariant layers. Since the Hopfield layer is an associative memory, it assumes no dependency between the input
151
+ patterns.
152
+
153
+ - <b>GRU & LSTM layers:</b> Our Hopfield layer can be used as a plug-in replacement for GRU & LSTM layers. Optionally,
154
+ for substituting GRU & LSTM layers, positional encoding might be considered.
155
+
156
+ - <b>Attention layers:</b> Our Hopfield layer can act as an attention layer, where state (query) and stored (key)
157
+ patterns are different, and need to be associated.
158
+
159
+ The folder [examples](examples/) contains multiple demonstrations on how to use the <code>Hopfield</code>,
160
+ <code>HopfieldPooling</code> as well as the <code>HopfieldLayer</code> modules. To successfully run the
161
+ contained [Jupyter notebooks](https://jupyter.org), additional third-party modules
162
+ like [pandas](https://pandas.pydata.org) and [seaborn](https://seaborn.pydata.org) are required.
163
+
164
+ - [Bit Pattern Set](examples/bit_pattern/bit_pattern_demo.ipynb): The dataset of this demonstration falls into the
165
+ category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance Learning (MIL)</i> problems. Each
166
+ bag comprises a collection of bit pattern instances, whereas each instance is a sequence of <b>0s</b> and <b>1s</b>.
167
+ The positive class has specific bit patterns injected, which are absent in the negative one. This demonstration shows
168
+ that <code>Hopfield</code>, <code>HopfieldPooling</code> and <code>HopfieldLayer</code> are capable of learning and
169
+ filtering each bag with respect to the class-defining bit patterns.
170
+
171
+ - [Latch Sequence Set](examples/latch_sequence/latch_sequence_demo.ipynb): We study an easy example of learning
172
+ long-term dependencies by using a simple <i>latch task</i>,
173
+ see [Hochreiter and Mozer](https://link.springer.com/chapter/10.1007/3-540-44668-0_92). The essence of this task is
174
+ that a sequence of inputs is presented, beginning with one of two symbols, <b>A</b> or <b>B</b>, and after a variable
175
+ number of time steps, the model has to output a corresponding symbol. Thus, the task requires memorizing the original
176
+ input over time. Note that both class-defining symbols may appear only at the first position of a
177
+ sequence. This task was specifically designed to demonstrate the capability of recurrent neural networks to capture
178
+ long-term dependencies. This demonstration shows that <code>Hopfield</code>, <code>HopfieldPooling</code> and
179
+ <code>HopfieldLayer</code> adapt extremely fast to this specific task, concentrating only on the first entry of the
180
+ sequence.
181
+
182
+ - [Attention-based Deep Multiple Instance Learning](examples/mnist_bags/mnist_bags_demo.ipynb): The dataset of this
183
+ demonstration falls into the category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance
184
+ Learning (MIL)</i> problems, see [Ilse and Tomczak](https://arxiv.org/abs/1802.04712). Each bag comprises a collection
185
+ of <b>28x28</b> grayscale images/instances, whereas each instance is a sequence of pixel values in the range
186
+ of <b>[0; 255]</b>. The number of instances per bag is drawn from a Gaussian with specified mean and variance. The
187
+ positive class is defined by the presence of the target number/digit, whereas the negative one by its absence.
188
+
189
+ ## Disclaimer
190
+
191
+ Some implementations of this repository are based on existing ones of the
192
+ official [PyTorch repository v1.6.0](https://github.com/pytorch/pytorch/tree/v1.6.0) and accordingly extended and
193
+ modified. In the following, the involved parts are listed:
194
+
195
+ - The implementation of [HopfieldCore](hflayers/activation.py#L16) is based on the implementation
196
+ of [MultiheadAttention](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/activation.py#L771)
197
+ .
198
+ - The implementation of [hopfield_core_forward](hflayers/functional.py#L8) is based on the implementation
199
+ of [multi_head_attention_forward](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/functional.py#L3854)
200
+ .
201
+ - The implementation of [HopfieldEncoderLayer](hflayers/transformer.py#L12) is based on the implementation
202
+ of [TransformerEncoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L241)
203
+ .
204
+ - The implementation of [HopfieldDecoderLayer](hflayers/transformer.py#L101) is based on the implementation
205
+ of [TransformerDecoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L303)
206
+ .
207
+
208
+ ## License
209
+
210
+ This repository is BSD-style licensed (see [LICENSE](LICENSE)), except where noted otherwise.
@@ -0,0 +1,176 @@
1
+ # Hopfield Networks is All You Need
2
+
3
+ _Hubert Ramsauer<sup>1</sup>, Bernhard Schäfl<sup>1</sup>, Johannes Lehner<sup>1</sup>, Philipp Seidl<sup>1</sup>,
4
+ Michael Widrich<sup>1</sup>, Lukas Gruber<sup>1</sup>, Markus Holzleitner<sup>1</sup>, Milena Pavlović<sup>3, 4</sup>,
5
+ Geir Kjetil Sandve<sup>4</sup>, Victor Greiff<sup>3</sup>, David Kreil<sup>2</sup>, Michael Kopp<sup>2</sup>, Günter
6
+ Klambauer<sup>1</sup>, Johannes Brandstetter<sup>1</sup>, Sepp Hochreiter<sup>1, 2</sup>_
7
+
8
+ <sup>1</sup> ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
9
+ <sup>2</sup> Institute of Advanced Research in Artificial Intelligence (IARAI)
10
+ <sup>3</sup> Department of Immunology, University of Oslo, Norway
11
+ <sup>4</sup> Department of Informatics, University of Oslo, Norway
12
+
13
+ ---
14
+
15
+ ##### Detailed blog post on this paper as well as the necessary background on Hopfield networks at [this link](https://ml-jku.github.io/hopfield-layers/).
16
+
17
+ ---
18
+
19
+ The transformer and BERT models pushed the performance on NLP tasks to new levels via their attention mechanism. We show
20
+ that this attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield
21
+ network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially
22
+ small retrieval errors. The number of stored patterns must be traded off against convergence speed and retrieval error.
23
+ The new Hopfield network has three types of energy minima (fixed points of the update):
24
+
25
+ 1. global fixed point averaging over all patterns,
26
+ 2. metastable states averaging over a subset of patterns, and
27
+ 3. fixed points which store a single pattern.
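As a rough illustration of the update rule and the kinds of fixed points listed above, the following dependency-free sketch applies one update of the form `new_state = sum_i softmax(beta * <x_i, state>)_i * x_i`, which is the rule given in the paper. The helper names (`softmax`, `hopfield_update`), the toy patterns, and the `beta` values are assumptions made for this example and are not part of the package.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_update(patterns, state, beta):
    """One update step: a softmax-weighted average of the stored patterns."""
    scores = [beta * sum(p * s for p, s in zip(pattern, state)) for pattern in patterns]
    weights = softmax(scores)
    return [
        sum(w * pattern[d] for w, pattern in zip(weights, patterns))
        for d in range(len(state))
    ]


# Three toy stored patterns and a noisy query close to the first one.
patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]

averaged = hopfield_update(patterns, query, beta=0.0)    # uniform weights
retrieved = hopfield_update(patterns, query, beta=50.0)  # sharply peaked weights
```

With `beta = 0` every stored pattern receives the same weight, so the result is the average over all patterns. With a large `beta`, a single update already lands very close to the stored pattern nearest to the query.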
28
+
29
+ Transformers learn an attention mechanism by constructing an embedding of patterns and queries into an associative
30
+ space. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they
31
+ operate in higher layers in metastable states. The gradient in transformers is maximal in the regime of metastable
32
+ states, is uniformly distributed when averaging globally, and vanishes when a fixed point is near a stored pattern.
33
+ Based on the Hopfield network interpretation, we analyzed learning of transformer and BERT architectures. Learning
34
+ starts with attention heads that average and then most of them switch to metastable states. However, the majority of
35
+ heads in the first layers still averages and can be replaced by averaging operations like the Gaussian weighting that we
36
+ propose. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information
37
+ created in lower layers. These heads seem a promising target for improving transformers. Neural networks that integrate
38
+ Hopfield networks that are equivalent to attention heads outperform other methods on immune repertoire classification,
39
+ where the Hopfield net stores several hundreds of thousands of patterns.
40
+
41
+ With _this_ repository, we provide a PyTorch implementation of a new layer called “Hopfield”, which allows equipping deep
42
+ learning architectures with Hopfield networks as new memory concepts.
43
+
44
+ The full paper is available at [https://arxiv.org/abs/2008.02217](https://arxiv.org/abs/2008.02217).
45
+
46
+ ## Requirements
47
+
48
+ The software was developed and tested on the following 64-bit operating systems:
49
+
50
+ - CentOS Linux release 8.1.1911 (Core)
51
+ - macOS 10.15.5 (Catalina)
52
+
53
+ As the development environment, [Python](https://www.python.org) 3.8.3 in combination
54
+ with [PyTorch](https://pytorch.org) 1.6.0 was used (a version of at least 1.5.0 should be sufficient). More details on
55
+ how to install PyTorch are available on the [official project page](https://pytorch.org).
56
+
57
+ ## Installation
58
+
59
+ The recommended way to install the software is to use `pip/pip3`:
60
+
61
+ ```bash
62
+ $ pip3 install git+https://github.com/ml-jku/hopfield-layers
63
+ ```
64
+
65
+ To successfully run the [Jupyter notebooks](https://jupyter.org) contained in [examples](examples/), additional
66
+ third-party modules are needed:
67
+
68
+ ```bash
69
+ $ pip3 install -r examples/requirements.txt
70
+ ```
71
+
72
+ The installation of the [Jupyter software](https://jupyter.org/install.html) itself is not covered. More details on how
73
+ to install Jupyter are available at the [official installation page](https://jupyter.org/install.html).
74
+
75
+ ## Usage
76
+
77
+ To get up and running with Hopfield-based networks, only <i>one</i> argument needs to be set: the size (depth) of the
78
+ input.
79
+
80
+ ```python
81
+ from hflayers import Hopfield
82
+
83
+ hopfield = Hopfield(input_size=...)
84
+ ```
85
+
86
+ It is also possible to replace commonly used pooling functions with a Hopfield-based one. Internally, a <i>state
87
+ pattern</i> is trained, which in turn is used to compute pooling weights with respect to the input.
88
+
89
+ ```python
90
+ from hflayers import HopfieldPooling
91
+
92
+ hopfield_pooling = HopfieldPooling(input_size=...)
93
+ ```
94
+
95
+ A second variant of our Hopfield-based modules is one which employs a trainable but fixed lookup mechanism. Internally,
96
+ one or multiple <i>stored patterns</i> and <i>pattern projections</i> are trained (optionally in a non-shared manner),
97
+ which in turn are used as a lookup mechanism independent of the input data.
98
+
99
+ ```python
100
+ from hflayers import HopfieldLayer
101
+
102
+ hopfield_lookup = HopfieldLayer(input_size=...)
103
+ ```
104
+
105
+ Its usage is as <i>simple</i> as that of the main module, and it is equally <i>powerful</i>.
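To make the lookup idea more concrete, here is a small sketch in plain Python rather than the package's own code. It assumes that the lookup behaves like attention over a fixed table, with `stored_patterns` acting as keys and `pattern_projections` as the values that are returned. In `HopfieldLayer` both would be trainable parameters; here they are hard-coded, and the function name `lookup` and the default `beta` are hypothetical.

```python
import math

# Fixed lookup table; in a trained layer these would be learned parameters.
stored_patterns = [[1.0, 0.0], [0.0, 1.0]]                   # keys
pattern_projections = [[10.0, 0.0, 0.0], [0.0, 0.0, 10.0]]   # values


def lookup(state_pattern, beta=8.0):
    """Return a softmax-weighted mix of the projections for one input vector."""
    scores = [
        beta * sum(k * s for k, s in zip(key, state_pattern))
        for key in stored_patterns
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(pattern_projections[0])
    return [
        sum(w * value[d] for w, value in zip(weights, pattern_projections))
        for d in range(width)
    ]


near_first = lookup([0.9, 0.1])   # input resembles the first stored pattern
near_second = lookup([0.0, 1.0])  # input matches the second stored pattern
```

The table itself never depends on the input; only the weights over its entries do, which is the sense in which the lookup is "independent of the input data" in this sketch.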
106
+
107
+ ## Examples
108
+
109
+ Generally, the Hopfield layer is designed to implement or substitute different layers, such as:
110
+
111
+ - <b>Pooling layers:</b> We consider the Hopfield layer as a pooling layer if only one static state (query) pattern
112
+ exists. Then, it is de facto a pooling over the sequence, which results from the softmax values applied on the stored
113
+ patterns. Therefore, our Hopfield layer can act as a pooling layer.
114
+
115
+ - <b>Permutation equivariant layers:</b> Our Hopfield layer can be used as a plug-in replacement for permutation
116
+ equivariant layers. Since the Hopfield layer is an associative memory, it assumes no dependency between the input
117
+ patterns.
118
+
119
+ - <b>GRU & LSTM layers:</b> Our Hopfield layer can be used as a plug-in replacement for GRU & LSTM layers. Optionally,
120
+ for substituting GRU & LSTM layers, positional encoding might be considered.
121
+
122
+ - <b>Attention layers:</b> Our Hopfield layer can act as an attention layer, where state (query) and stored (key)
123
+ patterns are different, and need to be associated.
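The pooling and permutation points above can be sketched without any dependencies. The example assumes a single static state (query) pattern, which would be trainable in `HopfieldPooling` but is fixed here, and pools a bag of instances with softmax weights. `hopfield_pool`, `state_pattern`, and `beta` are illustrative names, not the package API.

```python
import math


def hopfield_pool(instances, state_pattern, beta=1.0):
    """Pool a variable-length list of vectors into a single vector."""
    scores = [
        beta * sum(x * q for x, q in zip(instance, state_pattern))
        for instance in instances
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(state_pattern)
    return [
        sum(e / total * instance[d] for e, instance in zip(exps, instances))
        for d in range(dim)
    ]


state_pattern = [1.0, -1.0]  # static query (a learned parameter in practice)
bag = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
shuffled_bag = [bag[2], bag[0], bag[1]]

pooled = hopfield_pool(bag, state_pattern)
pooled_shuffled = hopfield_pool(shuffled_bag, state_pattern)
```

Up to floating-point rounding, `pooled` and `pooled_shuffled` agree, because the weights are computed per instance and then summed. This is the sense in which no dependency between the input patterns is assumed.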
124
+
125
+ The folder [examples](examples/) contains multiple demonstrations on how to use the <code>Hopfield</code>,
126
+ <code>HopfieldPooling</code> as well as the <code>HopfieldLayer</code> modules. To successfully run the
127
+ contained [Jupyter notebooks](https://jupyter.org), additional third-party modules
128
+ like [pandas](https://pandas.pydata.org) and [seaborn](https://seaborn.pydata.org) are required.
129
+
130
+ - [Bit Pattern Set](examples/bit_pattern/bit_pattern_demo.ipynb): The dataset of this demonstration falls into the
131
+ category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance Learning (MIL)</i> problems. Each
132
+ bag comprises a collection of bit pattern instances, whereas each instance is a sequence of <b>0s</b> and <b>1s</b>.
133
+ The positive class has specific bit patterns injected, which are absent in the negative one. This demonstration shows
134
+ that <code>Hopfield</code>, <code>HopfieldPooling</code> and <code>HopfieldLayer</code> are capable of learning and
135
+ filtering each bag with respect to the class-defining bit patterns.
136
+
137
+ - [Latch Sequence Set](examples/latch_sequence/latch_sequence_demo.ipynb): We study an easy example of learning
138
+ long-term dependencies by using a simple <i>latch task</i>,
139
+ see [Hochreiter and Mozer](https://link.springer.com/chapter/10.1007/3-540-44668-0_92). The essence of this task is
140
+ that a sequence of inputs is presented, beginning with one of two symbols, <b>A</b> or <b>B</b>, and after a variable
141
+ number of time steps, the model has to output a corresponding symbol. Thus, the task requires memorizing the original
142
+ input over time. Note that both class-defining symbols may appear only at the first position of a
143
+ sequence. This task was specifically designed to demonstrate the capability of recurrent neural networks to capture
144
+ long-term dependencies. This demonstration shows that <code>Hopfield</code>, <code>HopfieldPooling</code> and
145
+ <code>HopfieldLayer</code> adapt extremely fast to this specific task, concentrating only on the first entry of the
146
+ sequence.
147
+
148
+ - [Attention-based Deep Multiple Instance Learning](examples/mnist_bags/mnist_bags_demo.ipynb): The dataset of this
149
+ demonstration falls into the category of <i>binary classification</i> tasks in the domain of <i>Multiple Instance
150
+ Learning (MIL)</i> problems, see [Ilse and Tomczak](https://arxiv.org/abs/1802.04712). Each bag comprises a collection
151
+ of <b>28x28</b> grayscale images/instances, whereas each instance is a sequence of pixel values in the range
152
+ of <b>[0; 255]</b>. The number of instances per bag is drawn from a Gaussian with specified mean and variance. The
153
+ positive class is defined by the presence of the target number/digit, whereas the negative one by its absence.
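As a hedged illustration of the bag structure used in the Multiple Instance Learning demonstrations above, the following sketch builds bit-pattern bags in plain Python. It is not the data generator from the notebooks: the function `make_bag`, the bag and instance sizes, and the `signature` pattern are all assumptions chosen for the example.

```python
import random


def make_bag(rng, positive, signature, bag_size=8):
    """Return a list of bit-pattern instances; positive bags contain `signature`."""
    num_bits = len(signature)
    bag = []
    while len(bag) < bag_size:
        instance = [rng.randint(0, 1) for _ in range(num_bits)]
        if instance != signature:  # keep the signature out of the background
            bag.append(instance)
    if positive:
        # Inject the class-defining pattern at a random position.
        bag[rng.randrange(bag_size)] = list(signature)
    return bag


rng = random.Random(0)  # seeded for reproducibility
signature = [1, 0, 1, 1, 0, 1]
positive_bag = make_bag(rng, positive=True, signature=signature)
negative_bag = make_bag(rng, positive=False, signature=signature)

# A bag is labelled positive exactly when it contains the signature.
labels = [int(signature in bag) for bag in (positive_bag, negative_bag)]
```

A model for this kind of task only sees the bag-level label, so it has to learn which instance carries the class-defining pattern.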
154
+
155
+ ## Disclaimer
156
+
157
+ Some implementations of this repository are based on existing ones of the
158
+ official [PyTorch repository v1.6.0](https://github.com/pytorch/pytorch/tree/v1.6.0) and accordingly extended and
159
+ modified. In the following, the involved parts are listed:
160
+
161
+ - The implementation of [HopfieldCore](hflayers/activation.py#L16) is based on the implementation
162
+ of [MultiheadAttention](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/activation.py#L771)
163
+ .
164
+ - The implementation of [hopfield_core_forward](hflayers/functional.py#L8) is based on the implementation
165
+ of [multi_head_attention_forward](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/functional.py#L3854)
166
+ .
167
+ - The implementation of [HopfieldEncoderLayer](hflayers/transformer.py#L12) is based on the implementation
168
+ of [TransformerEncoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L241)
169
+ .
170
+ - The implementation of [HopfieldDecoderLayer](hflayers/transformer.py#L101) is based on the implementation
171
+ of [TransformerDecoderLayer](https://github.com/pytorch/pytorch/blob/b31f58de6fa8bbda5353b3c77d9be4914399724d/torch/nn/modules/transformer.py#L303)
172
+ .
173
+
174
+ ## License
175
+
176
+ This repository is BSD-style licensed (see [LICENSE](LICENSE)), except where noted otherwise.