HDLRuby 2.11.12 → 3.0.0

Files changed (86)
  1. checksums.yaml +4 -4
  2. data/README.html +3274 -0
  3. data/README.md +556 -84
  4. data/ext/hruby_sim/hruby_sim_calc.c +2 -0
  5. data/lib/HDLRuby/backend/hruby_allocator.rb +2 -2
  6. data/lib/HDLRuby/backend/hruby_c_allocator.rb +7 -7
  7. data/lib/HDLRuby/hdr_samples/mei8_bench.rb +1 -1
  8. data/lib/HDLRuby/hdr_samples/with_bram.rb +3 -3
  9. data/lib/HDLRuby/hdr_samples/with_bram_frame_stack.rb +105 -0
  10. data/lib/HDLRuby/hdr_samples/with_bram_stack.rb +69 -0
  11. data/lib/HDLRuby/hdr_samples/with_register_stack.rb +150 -0
  12. data/lib/HDLRuby/hdr_samples/with_sequencer.rb +190 -0
  13. data/lib/HDLRuby/hdr_samples/with_sequencer_deep.rb +91 -0
  14. data/lib/HDLRuby/hdr_samples/with_sequencer_enumerable.rb +405 -0
  15. data/lib/HDLRuby/hdr_samples/with_sequencer_enumerator.rb +89 -0
  16. data/lib/HDLRuby/hdr_samples/with_sequencer_sync.rb +120 -0
  17. data/lib/HDLRuby/hdrcc.rb +15 -2
  18. data/lib/HDLRuby/hdrlib.rb +1 -1
  19. data/lib/HDLRuby/hruby_db.rb +2 -2
  20. data/lib/HDLRuby/hruby_high.rb +38 -20
  21. data/lib/HDLRuby/hruby_high_fullname.rb +3 -1
  22. data/lib/HDLRuby/hruby_low.rb +2 -2
  23. data/lib/HDLRuby/hruby_low2c.rb +58 -43
  24. data/lib/HDLRuby/hruby_low2hdr.rb +66 -40
  25. data/lib/HDLRuby/hruby_low2high.rb +86 -44
  26. data/lib/HDLRuby/hruby_low2seq.rb +26 -18
  27. data/lib/HDLRuby/hruby_low2sym.rb +14 -13
  28. data/lib/HDLRuby/hruby_low2vhd.rb +78 -43
  29. data/lib/HDLRuby/hruby_low_bool2select.rb +61 -46
  30. data/lib/HDLRuby/hruby_low_casts_without_expression.rb +56 -44
  31. data/lib/HDLRuby/hruby_low_cleanup.rb +18 -16
  32. data/lib/HDLRuby/hruby_low_fix_types.rb +64 -32
  33. data/lib/HDLRuby/hruby_low_mutable.rb +53 -118
  34. data/lib/HDLRuby/hruby_low_resolve.rb +26 -31
  35. data/lib/HDLRuby/hruby_low_with_bool.rb +33 -16
  36. data/lib/HDLRuby/hruby_low_with_port.rb +3 -3
  37. data/lib/HDLRuby/hruby_low_with_var.rb +23 -9
  38. data/lib/HDLRuby/hruby_low_without_concat.rb +19 -13
  39. data/lib/HDLRuby/hruby_low_without_namespace.rb +47 -32
  40. data/lib/HDLRuby/hruby_low_without_parinseq.rb +18 -12
  41. data/lib/HDLRuby/hruby_low_without_select.rb +36 -23
  42. data/lib/HDLRuby/hruby_low_without_subsignals.rb +29 -28
  43. data/lib/HDLRuby/hruby_rcsim.rb +79 -64
  44. data/lib/HDLRuby/hruby_rsim.rb +64 -15
  45. data/lib/HDLRuby/hruby_rsim_mute.rb +2 -3
  46. data/lib/HDLRuby/hruby_rsim_vcd.rb +28 -25
  47. data/lib/HDLRuby/hruby_values.rb +13 -2
  48. data/lib/HDLRuby/hruby_verilog.rb +90 -48
  49. data/lib/HDLRuby/soft/stacks.rb +219 -0
  50. data/lib/HDLRuby/std/bram.rb +9 -5
  51. data/lib/HDLRuby/std/clocks.rb +1 -1
  52. data/lib/HDLRuby/std/fsm.rb +29 -9
  53. data/lib/HDLRuby/std/sequencer.rb +1857 -0
  54. data/lib/HDLRuby/std/sequencer_sync.rb +400 -0
  55. data/lib/HDLRuby/std/std.rb +12 -0
  56. data/lib/HDLRuby/version.rb +1 -1
  57. data/tuto/adder_sat_flags_vcd.png +0 -0
  58. data/tuto/addsub_vcd.png +0 -0
  59. data/tuto/alu_vcd.png +0 -0
  60. data/tuto/bit_pong_vcd.png +0 -0
  61. data/tuto/checksum_vcd.png +0 -0
  62. data/tuto/circuit_hdr.odg +0 -0
  63. data/tuto/circuit_hdr.png +0 -0
  64. data/tuto/circuit_hie.odg +0 -0
  65. data/tuto/circuit_hie.png +0 -0
  66. data/tuto/circuit_view.odg +0 -0
  67. data/tuto/circuit_view.png +0 -0
  68. data/tuto/clock_counter_vcd.png +0 -0
  69. data/tuto/counter_ext_vcd.png +0 -0
  70. data/tuto/fact_vcd.png +0 -0
  71. data/tuto/hw_flow.odg +0 -0
  72. data/tuto/hw_flow.png +0 -0
  73. data/tuto/maxxer_vcd.png +0 -0
  74. data/tuto/pingpong0_vcd.png +0 -0
  75. data/tuto/pingpong1_vcd.png +0 -0
  76. data/tuto/pingpong2_vcd.png +0 -0
  77. data/tuto/ram_vcd.png +0 -0
  78. data/tuto/serializer_vcd.png +0 -0
  79. data/tuto/sw_flow.odg +0 -0
  80. data/tuto/sw_flow.png +0 -0
  81. data/tuto/the_counter_vcd.png +0 -0
  82. data/tuto/tutorial_sw.html +2359 -0
  83. data/tuto/tutorial_sw.md +2684 -0
  84. data/tuto/tutorial_sw.pdf +0 -0
  85. data/tuto/tutorial_sw_jp.md +417 -0
  86. metadata +44 -2
@@ -0,0 +1,2684 @@
1
+ # HDLRuby Tutorial for Software People
2
+
3
+ In this tutorial, you will learn the basics about the description of digital circuits using HDLRuby from the software point of view. In detail you will learn:
4
+
5
+ 1. [What is HDLRuby and how to use its framework.](#1-what-is-hdlruby-and-how-to-use-its-framework)
6
+
7
+ 2. [How to represent a circuit.](#2-how-to-represent-a-circuit-in-hdlruby)
8
+
9
+ 3. [How to describe an algorithm to be implemented by a circuit.](#3-how-to-describe-an-algorithm-to-be-implemented-by-a-circuit)
10
+
11
+ 4. [How to add parallelism to your algorithms.](#4-how-to-add-parallelism-to-your-algorithms)
12
+
13
+ Then, the following sections will introduce advanced concepts about hardware design and HDLRuby:
14
+
15
+ 5. [Toward lower level hardware design: the processes.](#5-toward-lower-level-hardware-design-the-processes)
16
+
17
+ 6. [Not enough? What about genericity, object orientation, metaprogramming, and reflection?](#6-not-enough-what-about-genericity-object-orientation-metaprogramming-and-reflection)
18
+
19
+ Within these topics, you will also have an explanation of how the following high-level concepts can be used in HDLRuby:
20
+
21
+ * Object-oriented programming
22
+
23
+ * Reflection
24
+
25
+ * Genericity
26
+
27
+ * Metaprogramming
28
+
29
+ But, before going further, here are a few...
30
+
31
+ ## Prerequisites
32
+
33
+ Since this tutorial is aimed at software people, it is assumed you have a good understanding of programming and the relevant tools (e.g., editor, compiler). However, it is not assumed that you have any knowledge about digital hardware design. Knowing the Ruby programming language is recommended but not mandatory.
34
+
35
+ To use HDLRuby the following software is required:
36
+
37
+ * A distribution of the Ruby language.
38
+
39
+ * A text editor. If you like syntax highlighting or other fancy features, please choose one supporting Ruby.
40
+
41
+ * A command line interface (e.g., command prompt, terminal emulator.)
42
+
43
+ The following software is also recommended:
44
+
45
+ * A wave viewer supporting *vcd* files (e.g., [GTKWave](https://gtkwave.sourceforge.net/).)
46
+
47
+ ## 1. What is HDLRuby and how to use its framework
48
+
49
+ HDLRuby is a hardware description language (HDL) based on the Ruby programming language. It is implemented as a Ruby library so that, by construction, any Ruby code can be used and executed within the HDLRuby description.
50
+
51
+ Before going further, let us briefly explain what an [HDL](#11-what-is-a-hardware-description-language-hdl) is. Then, more details will be given about how to [install HDLRuby](#12-installing-hdlruby) and how to [use it](#13-using-hdlruby).
52
+
53
+ ### 1.1. What is a hardware description language (HDL)
54
+
55
+ A hardware description language (HDL) is a formal language that is used for describing electronic circuits. Such circuits can be divided into two categories: analog circuits and digital circuits. While there exist HDLs for describing the first category of circuits, a large majority of them only support the second one. Hence, in practice, HDL usually means a language for describing digital circuits only. Among the many HDLs, two became de facto standards: Verilog HDL and VHDL.
56
+
57
+ Nowadays, there exist powerful software tools that can automatically produce circuits from HDL descriptions. However, like in software, there may be errors in the descriptions, or they may be sub-optimal, so that the final circuit does not work, or does not meet some constraints. Unfortunately, in hardware, producing a circuit is very time-consuming and expensive so, contrary to software, errors or low-performance results must be caught as early as possible. This is why it is common in hardware to **simulate** your circuit description before starting to produce it. In other words, while in software it is common to perform the following loop:
58
+
59
+ <p align="center">
60
+ <img src="sw_flow.png" width="50%">
61
+ </p>
62
+
63
+
64
+ In hardware, the design loop looks more like the following:
65
+
66
+ <p align="center">
67
+ <img src="hw_flow.png" width="50%">
68
+ </p>
69
+
70
+
71
+ At first glance, an HDL has the same look and feel as classical programming languages like C or Java: it includes expressions, control statements, and kinds of variables. However, the underlying model of computation is quite different, especially because circuits are inherently parallel devices. This will be explained progressively during this tutorial, but for now, it is enough to remember the following:
72
+
73
+ * HDLs are used for describing digital circuits and the most common ones are Verilog HDL and VHDL.
74
+
75
+ * Using software tools called synthesis tools, you can produce real digital circuits from an HDL description. But before that, you will need to simulate your description.
76
+
77
+ * An HDL looks like a common programming language but does not work the same.
78
+
79
+
80
+ #### And what about HDLRuby?
81
+
82
+ Well, HDLRuby is an HDL for describing digital circuits like Verilog HDL or VHDL but aiming at being more flexible and productive than them by relying on many concepts inherited from the Ruby programming language. So everything said earlier about HDL applies to HDLRuby, but we try to make it much easier for the designers.
83
+
84
+
85
+ ### 1.2. Installing HDLRuby
86
+
87
+ HDLRuby is distributed as a RubyGems package. It can therefore be installed using the following command:
88
+
89
+ ```bash
90
+ gem install HDLRuby
91
+ ```
92
+
93
+ If everything is OK, the following should be displayed:
94
+
95
+ ```
96
+ Fetching HDLRuby-<version>.gem
97
+ Building native extensions. This could take a while...
98
+ Successfully installed HDLRuby-<version>
99
+ Parsing documentation for HDLRuby-<version>
100
+ Done installing documentation for HDLRuby after <x> seconds.
101
+ ```
102
+
103
+ The `version` number should be the latest version of HDLRuby.
104
+
105
+ You can then check if HDLRuby is properly installed by using the following command:
106
+
107
+ ```bash
108
+ hdrcc --version
109
+ ```
110
+
111
+ And the result should be:
112
+
113
+ ```
114
+ <version>
115
+ ```
116
+
117
+ If the resulting `version` number does not match the installed one, there may have been a problem somewhere. It may be a good idea to close the terminal or command prompt and try the command again in a new one.
118
+
119
+
120
+
121
+ ### 1.3. Using HDLRuby
122
+
123
+ Up to now, we said that HDLRuby is a language, but it is in truth a complete framework for designing and simulating digital circuits. It includes several compilers, simulators, and libraries that are all accessible through a single command: `hdrcc`.
124
+
125
+ Basically, `hdrcc` is used as follows:
126
+
127
+ ```bash
128
+ hdrcc <options> <input file> <output directory>
129
+ ```
130
+
131
+ Where `options` specifies the action to be performed, `input file` specifies the input HDLRuby file, and `output directory` specifies the directory where the command results will be saved. As a general rule, when an input file is specified, an output directory must also be specified.
132
+
133
+ Several actions are possible using `hdrcc`, the main ones being the following:
134
+
135
+ * Simulate a circuit description:
136
+
137
+ ```bash
138
+ hdrcc --sim <input file> <output directory>
139
+ ```
140
+
141
+ * Generate the equivalent Verilog HDL code:
142
+
143
+ ```bash
144
+ hdrcc --verilog <input file> <output directory>
145
+ ```
146
+
147
+ This second action is necessary if you want to produce a real circuit since HDLRuby is not yet supported by synthesis tools.
148
+
149
+ __Note__: VHDL generation is also possible using the following command.
150
+
151
+ ```bash
152
+ hdrcc --vhdl <input file> <output directory>
153
+ ```
154
+
155
+ And that's it! For details about all the actions that can be performed, how to write an input file, and what kind of output can be produced, let us see the remainder of the tutorial.
156
+
157
+
158
+ ## 2. How to represent a circuit in HDLRuby
159
+
160
+ In this section we will see:
161
+
162
+ * [How to declare a circuit.](#21-declaring-a-circuit)
163
+
164
+ * [How to reuse a circuit already declared.](#22-how-to-reuse-a-circuit-already-declared)
165
+
166
+ ### 2.1. Declaring a circuit
167
+
168
+ In HDLRuby as well as in other HDLs, a circuit is viewed as a box that communicates with its environment through ports. The following diagram illustrates such a view with a circuit including 5 ports:
169
+
170
+ <p align="center">
171
+ <img src="circuit_view.png" width="35%">
172
+ </p>
173
+
174
+
175
+ Ports are not simple entry points though, because they also have a data type and a direction that can be one of the following:
176
+
177
+ * `input`: such a port can only be used for transmitting data from the outside of the circuit to its inside.
178
+
179
+ * `output`: such a port can only be used for transmitting data from the inside of the circuit to its outside.
180
+
181
+ * `inout`: such a port can be used both as an `input` port and as an `output` port. Due to physical limitations, these ports are difficult to use though, and will therefore be ignored until the [advanced part of this tutorial](#tri-state).
182
+
183
+
184
+ #### That's all very well, but when do I write HDLRuby code?
185
+
186
+ With that in mind, declaring a circuit consists in specifying its name and its ports. In HDLRuby this is done as follows:
187
+
188
+ ```ruby
189
+ system :my_circuit do
190
+ input :clk, :rst
191
+ [16].input :addr
192
+ input :ce
193
+ [8].input :data_in
194
+ [8].output :data_out
195
+ end
196
+ ```
197
+
198
+ So please write the code above in a file called (for example) `my_circuit.rb`, and let us explain its meaning.
199
+
200
+ * In the first line, the keyword `system` indicates a new circuit will be described. Its name is given after the colon, `my_circuit` here.
201
+
202
+ * The `do` &rarr; `end` block contains the description of the circuit. Here, only the ports are specified as follows:
203
+
204
+ - On the second line, `input` specifies two one-bit input ports named respectively `clk` and `rst`.
205
+
206
+ - The third line specifies a 16-bit input port named `addr`.
207
+
208
+ - The fourth line specifies a one-bit input port named `ce`.
209
+
210
+ - The fifth line specifies an 8-bit input port named `data_in`.
211
+
212
+ - The sixth line specifies an 8-bit output port named `data_out`.
213
+
214
+
215
+ And here is a diagram of this circuit:
216
+
217
+ <p align="center">
218
+ <img src="circuit_hdr.png" width="50%">
219
+ </p>
220
+
221
+
222
+ To sum up things:
223
+
224
+ * `system` declares a new circuit description.
225
+
226
+ * `input` specifies one or several input ports, `output` one or several output ports, and `inout` one or several input-output ports.
227
+
228
+ * The data type of the port is given before the direction as follows:
229
+
230
+ ```ruby
231
+ <type>.input <list of ports names>
232
+ ```
233
+
234
+ We will give later more details about data types in HDLRuby.
235
+
236
+
237
+ Now let us see if your circuit description is all right with the following command:
238
+
239
+ ```bash
240
+ hdrcc my_circuit.rb work
241
+ ```
242
+
243
+ ...Nothing happened? Great, that means that there was no syntax error in your description. Now let us try something else:
244
+
245
+ ```bash
246
+ hdrcc --hdr my_circuit.rb work
247
+ ```
248
+
249
+ If everything was OK, a file named `my_circuit.rb` should have appeared in the `work` directory. Open it with a text editor; its content should be the following:
250
+
251
+ ```ruby
252
+ system :"__:T:0:1" do
253
+ bit.input :clk
254
+ bit.input :rst
255
+ bit[15..0].input :addr
256
+ bit.input :ce
257
+ bit[7..0].input :data_in
258
+ bit[7..0].output :data_out
259
+
260
+ end
261
+ ```
262
+
263
+ It looks somewhat similar to the code you have just written. This is because it is the internal representation (IR) of your circuit in HDLRuby. You can see that the name of the circuit changed to some weird character string and that the data types also changed. The weird string is for avoiding name clashes, so you do not need to be concerned about it. The data types are low-level representations of the same data types that were used in the initial file. Here, this low-level representation is still very close to the original one, but that will be less and less the case as features are added to the circuit.
264
+
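The relation between the types written in `my_circuit.rb` and the ones shown in the generated file can be sketched in plain Ruby. This is only an illustration of the three cases visible in the listing above: `low_level_type` is a hypothetical helper, not a part of HDLRuby, and it does not claim to reproduce how the framework handles other types.

```ruby
# Hypothetical helper (not part of HDLRuby): renders the low-level type
# name that the listing above shows for a port of the given bit width.
def low_level_type(width = 1)
  # A port declared without a type is a single bit.
  return "bit" if width == 1
  # A [16] port spans bit indices 15 down to 0.
  "bit[#{width - 1}..0]"
end

puts low_level_type       # type of clk, rst, and ce
puts low_level_type(16)   # type of addr
puts low_level_type(8)    # type of data_in and data_out
```

In other words, the number between brackets in the original description is a width, while the generated file spells out the range of bit indices.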
265
+ Now, out of curiosity, what will the equivalent Verilog HDL code look like? To check, just type the following command:
266
+
267
+ ```bash
268
+ hdrcc --verilog my_circuit.rb work
269
+ ```
270
+
271
+ If everything was OK, a file named `my_circuit.v` should have appeared in the `work` directory. Open it with a text editor; its content should be the following:
272
+
273
+ ```verilog
274
+ `timescale 1ps/1ps
275
+
276
+ module _v0_1( _v1_clk, _v2_rst, _v3_addr, _v4_ce, _v5_data_in, _v6_data_out );
277
+ input _v1_clk;
278
+ input _v2_rst;
279
+ input [15:0] _v3_addr;
280
+ input _v4_ce;
281
+ input [7:0] _v5_data_in;
282
+ output [7:0] _v6_data_out;
283
+
284
+
285
+ endmodule
286
+ ```
287
+
288
+ The syntax does look a little different from HDLRuby, but you should be able to recognize the description of the circuit. The names of the ports are different though: this is because HDLRuby supports any Unicode character for names and, to avoid compatibility problems, it recreates the names when generating Verilog. Still, an effort is made to keep the original name, e.g., `clk` became `_v1_clk`. But, just for fun, please replace `:addr` in the HDLRuby file with `:☺` and regenerate Verilog HDL from it... It works! And the result is:
289
+
290
+ ```verilog
291
+ `timescale 1ps/1ps
292
+
293
+ module _v0_1( _v1_clk, _v2_rst, _v3_, _v4_ce, _v5_data_in, _v6_data_out );
294
+ input _v1_clk;
295
+ input _v2_rst;
296
+ input [15:0] _v3_;
297
+ input _v4_ce;
298
+ input [7:0] _v5_data_in;
299
+ output [7:0] _v6_data_out;
300
+
301
+
302
+ endmodule
303
+ ```
304
+
305
+ Unfortunately, there is no more smiling face. This is because Verilog HDL only supports a subset of ASCII for names. But even without smiling, the code is valid because the HDLRuby framework did recreate Verilog HDL-friendly names.
306
+
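The renaming seen in the two listings above can be imitated with a few lines of plain Ruby. This is a sketch under assumptions: `verilog_friendly_name` is a hypothetical function written for this explanation, and the actual algorithm used by HDLRuby may differ. It only reproduces the visible pattern, i.e., a unique numbered prefix followed by the characters of the original name that are ASCII letters, digits, or underscores.

```ruby
# Hypothetical sketch of the renaming visible in the listings above;
# HDLRuby's real algorithm may differ.
def verilog_friendly_name(index, name)
  # Keep only ASCII letters, digits, and underscores, then prefix the
  # result with a unique counter so that names never clash.
  "_v#{index}_#{name.to_s.gsub(/[^A-Za-z0-9_]/, '')}"
end

puts verilog_friendly_name(1, :clk)   # prints _v1_clk
puts verilog_friendly_name(3, :"☺")   # prints _v3_
```

Thanks to the numbered prefix, the name stays unique and valid even when nothing remains of the original one.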
307
+
308
+ ### 2.2. How to reuse a circuit already declared
309
+
310
+ Like functions in software, a circuit is often used as part of one or several larger circuits. Contrary to software, however, the circuit must be physically copied to be reused. This copy is called an *instance* and the act of copying is called an *instantiation*. In HDLRuby, an instantiation is done as follows:
311
+
312
+ ```ruby
313
+ <circuit name>(:<copy name>)
314
+ ```
315
+
316
+ For example, if you want to use copies of the previously defined circuit `my_circuit` in a new circuit called `another_circuit` you can do as follows:
317
+
318
+ ```ruby
319
+ system :another_circuit do
320
+ input :clk, :rst
321
+ [16].input :addr
322
+ input :ce0, :ce1
323
+ [8].input :data_in
324
+ [8].output :data_out
325
+
326
+ my_circuit(:my_circuit0)
327
+ my_circuit(:my_circuit1)
328
+ end
329
+ ```
330
+
331
+ For testing purposes, write the code above into another file called `another_circuit.rb`, and try to generate Verilog HDL from it:
332
+
333
+ ```bash
334
+ hdrcc --verilog another_circuit.rb work
335
+ ```
336
+
337
+ Oh, it appears that something went wrong since the following should have appeared:
338
+
339
+ ```
340
+ another_circuit.rb:8:in `block in <main>': undefined HDLRuby construct, local variable or method `my_circuit'.
341
+ ```
342
+
343
+ This error message indicates that `my_circuit` is not known. This is because, like the Ruby language, in HDLRuby you must specify the files you are using. Please add as the first line in your `another_circuit.rb` file the following code:
344
+
345
+ ```ruby
346
+ require_relative "my_circuit.rb"
347
+ ```
348
+
349
+ Then retry the Verilog HDL generation command:
350
+
351
+ ```bash
352
+ hdrcc --verilog another_circuit.rb work
353
+ ```
354
+
355
+ Three new files should have appeared in the `work` directory: `_v10_5.v`, `_v8_4.v` and `another_circuit.v`. If you open the third file you should see:
356
+
357
+ ```verilog
358
+ `timescale 1ps/1ps
359
+
360
+ module _v0_3( _v1_clk, _v2_rst, _v3_addr, _v4_ce0, _v5_ce1, _v6_data_in, _v7_data_out );
361
+ input _v1_clk;
362
+ input _v2_rst;
363
+ input [15:0] _v3_addr;
364
+ input _v4_ce0;
365
+ input _v5_ce1;
366
+ input [7:0] _v6_data_in;
367
+ output [7:0] _v7_data_out;
368
+
369
+ _v8_4 _v9_my_circuit0();
370
+ _v10_5 _v11_my_circuit1();
371
+
372
+ endmodule
373
+ ```
374
+
375
+ Again, we can see similarities between the resulting Verilog HDL code and the original HDLRuby one. Still, what are `_v8_4` and `_v10_5`? You can see them by opening the corresponding files `_v8_4.v` and `_v10_5.v`: those are the descriptions of `my_circuit` in Verilog HDL.
376
+
377
+ > __But why two of them?__ I would like to answer that this is because of a limitation of Verilog HDL, but this is not the case. It is because HDLRuby's instantiation mechanism is very different from the Verilog HDL (and the VHDL) one, so for the moment, and only for easing the coding work of the HDLRuby compiler, one description of `my_circuit` is generated per instance.
378
+
379
+
380
+ Copying a circuit is easy, but it achieves no purpose if the copied circuit is not in a relationship with its environment. This is where the ports become useful: they are the communication points between a circuit and its outside world. Concretely, to interact with a circuit, its ports must be connected to something that will interact with them. How this interaction works is a story for the other sections of this tutorial. For now, let us focus on connections: in HDLRuby this is done using the assignment operator `<=` as follows:
381
+
382
+ * For an input port of the current circuit:
383
+ ```ruby
384
+ <something> <= <input port>
385
+ ```
386
+
387
+ * For an output port of the current circuit:
388
+ ```ruby
389
+ <output port> <= <something>
390
+ ```
391
+
392
+ Many things can be connected to a port, but right now, we only know about ports, so let us do the connection in `another_circuit` with them. So here is the new code of `another_circuit.rb`, please modify the file accordingly:
393
+
394
+ ```ruby
395
+ require_relative "my_circuit.rb"
396
+
397
+ system :another_circuit do
398
+ input :clk, :rst
399
+ [16].input :addr
400
+ input :ce0, :ce1
401
+ [8].input :data_in
402
+ [8].output :data_out
403
+
404
+ my_circuit(:my_circuit0)
405
+ my_circuit(:my_circuit1)
406
+
407
+ my_circuit0.clk <= clk
408
+ my_circuit0.rst <= rst
409
+ my_circuit0.addr <= addr
410
+ my_circuit0.ce <= ce0
411
+ my_circuit0.data_in <= data_in
412
+
413
+ my_circuit1.clk <= clk
414
+ my_circuit1.rst <= rst
415
+ my_circuit1.addr <= addr
416
+ my_circuit1.ce <= ce1
417
+ my_circuit1.data_in <= data_in
418
+ end
419
+ ```
420
+
421
+ If you are familiar with object-oriented or structured software programming, this code should be straightforward: the dot `.` operator is used to access a sub-element, in this case the ports of `my_circuit0` and `my_circuit1`. For example, the first connection line (line 13) connects the `clk` port of `another_circuit` to the one of `my_circuit0`, so that any data that goes through the former port will also go through the latter.
422
+
423
+ Now, the `data_out` ports are still not connected. It may be tempting to connect them like `data_in` as follows:
424
+
425
+ ```ruby
426
+ data_out <= my_circuit0.data_out
427
+ data_out <= my_circuit1.data_out
428
+ ```
429
+
430
+ This will indeed work, but not the way you may think: in hardware, you cannot normally connect several different objects to one port. It is like assigning several values at the **same** time to a single variable. What will happen in HDLRuby is that only the last statement will be kept, i.e., port `data_out` of `my_circuit0` will not be connected.
431
+
432
+ In such cases, what we often want to do is to connect to `data_out` some computation result between the outputs of `my_circuit0` and `my_circuit1`. This is the opportunity to see another kind of construct that can be connected to a port: an expression. Like in software, an expression represents an arithmetic and logic computation. For example, let us consider the following connection:
433
+
434
+ ```ruby
435
+ data_out <= my_circuit0.data_out + my_circuit1.data_out
436
+ ```
437
+
438
+ With this connection, the sum of the outputs of `my_circuit0` and `my_circuit1` is transmitted through the output port `data_out` of `another_circuit`.
439
+
440
+ > __But when is this computation performed?__ This is a very good question: while in software, programs are executed one instruction after the other, in hardware, there is no such thing as the execution of instructions. The expression connected to `data_out` is not an instruction at all! It is a description of a part of the circuit that specifies that an adder (a circuit that does addition) must be instantiated with the output ports `data_out` of `my_circuit0` and `my_circuit1` connected to its inputs and its output connected to the output port `data_out` of `another_circuit`. The following figures show the schematic of this hardware portion:
441
+
442
+
443
+ <p align="center">
444
+ <img src="circuit_hie.png" width="40%">
445
+ </p>
446
+
447
+
448
+ > __So, when is this expression executed?__ It is continuously executed, i.e., as soon as one of the outputs `data_out` of `my_circuit0` or `my_circuit1` changes, so does the output `data_out` of `another_circuit`.
449
+
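For readers who prefer code to schematics, the behavior of this adder can be modeled in plain Ruby as a pure function of its two inputs. This is only a software illustration, not HDLRuby code: `adder_output` is a hypothetical name, and the truncation of the sum to 8 bits is an assumption based on the width of `data_out`.

```ruby
# Plain-Ruby model of the adder, for illustration only.
# The output depends on nothing but the current values of the two inputs.
# Assumption: the sum is truncated to the 8 bits of data_out.
def adder_output(out0, out1)
  (out0 + out1) & 0xFF
end

# Each pair models a new combination of values on the two inputs:
# whenever one of them changes, the output follows.
[[1, 2], [1, 10], [200, 100]].each do |out0, out1|
  puts "my_circuit0: #{out0}, my_circuit1: #{out1}, data_out: #{adder_output(out0, out1)}"
end
```

The difference with software is that nobody calls this function: in the circuit, the adder is always there, and its output always reflects its inputs.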
450
+ For trying this new circuit, please update the code of `another_circuit.rb` as follows:
451
+
452
+ ```ruby
453
+ require_relative "my_circuit.rb"
454
+
455
+ system :another_circuit do
456
+ input :clk, :rst
457
+ [16].input :addr
458
+ input :ce0, :ce1
459
+ [8].input :data_in
460
+ [8].output :data_out
461
+
462
+ my_circuit(:my_circuit0)
463
+ my_circuit(:my_circuit1)
464
+
465
+ my_circuit0.clk <= clk
466
+ my_circuit0.rst <= rst
467
+ my_circuit0.addr <= addr
468
+ my_circuit0.ce <= ce0
469
+ my_circuit0.data_in <= data_in
470
+
471
+ my_circuit1.clk <= clk
472
+ my_circuit1.rst <= rst
473
+ my_circuit1.addr <= addr
474
+ my_circuit1.ce <= ce1
475
+ my_circuit1.data_in <= data_in
476
+
477
+ data_out <= my_circuit0.data_out + my_circuit1.data_out
478
+ end
479
+ ```
480
+
481
+ Then, let us generate Verilog HDL from it again:
482
+
483
+ ```bash
484
+ hdrcc --verilog another_circuit.rb work
485
+ ```
486
+
487
+ Oh! If you get the following error message:
488
+
489
+ ```
490
+ another_circuit.rb:15:in `block in <main>': undefined method `addr' for #<HDLRuby::High::Namespace:<whatever number>>
491
+ ```
492
+
493
+ Do not forget to replace the smiling face with `addr` in `my_circuit.rb`.
494
+
495
+ When the compilation succeeds (no error message), two new files appear in `work`, namely `_v20_4.v` and `_v23_5.v`. Those are the new descriptions of `my_circuit`: their content did not change, but since new hardware has been added, their names did. For the interesting part, let us open `another_circuit.v` again; the result should be as follows:
496
+
497
+ ```verilog
498
+ `timescale 1ps/1ps
499
+
500
+ module _v0_3( _v1_clk, _v2_rst, _v3_addr, _v4_ce0, _v5_ce1, _v6_data_in, _v7_data_out );
501
+ input _v1_clk;
502
+ input _v2_rst;
503
+ input [15:0] _v3_addr;
504
+ input _v4_ce0;
505
+ input _v5_ce1;
506
+ input [7:0] _v6_data_in;
507
+ output [7:0] _v7_data_out;
508
+ wire _v8_0;
509
+ wire _v9_1;
510
+ wire [15:0] _v10_2;
511
+ wire _v11_3;
512
+ wire [7:0] _v12_4;
513
+ wire _v13_5;
514
+ wire _v14_6;
515
+ wire [15:0] _v15_7;
516
+ wire _v16_8;
517
+ wire [7:0] _v17_9;
518
+ wire [7:0] _v18_10;
519
+ wire [7:0] _v19_11;
520
+
521
+ _v20_4 _v21_my_circuit0(._v1_clk(_v8_0),._v2_rst(_v9_1),._v3_addr(_v10_2),._v22_ce(_v11_3),._v6_data_in(_v12_4),._v7_data_out(_v18_10));
522
+ _v23_5 _v24_my_circuit1(._v1_clk(_v13_5),._v2_rst(_v14_6),._v3_addr(_v15_7),._v22_ce(_v16_8),._v6_data_in(_v17_9),._v7_data_out(_v19_11));
523
+ assign _v8_0 = _v1_clk;
524
+
525
+ assign _v9_1 = _v2_rst;
526
+
527
+ assign _v10_2 = _v3_addr;
528
+
529
+ assign _v11_3 = _v4_ce0;
530
+
531
+ assign _v12_4 = _v6_data_in;
532
+
533
+ assign _v13_5 = _v1_clk;
534
+
535
+ assign _v14_6 = _v2_rst;
536
+
537
+ assign _v15_7 = _v3_addr;
538
+
539
+ assign _v16_8 = _v5_ce1;
540
+
541
+ assign _v17_9 = _v6_data_in;
542
+
543
+ assign _v7_data_out = (_v18_10 + _v19_11);
544
+
545
+
546
+ endmodule
547
+ ```
548
+
549
+ The code is starting to get complicated and seems to differ much more from the HDLRuby description than before. This is because this time, real syntactic limitations of Verilog HDL compared to HDLRuby have to be bypassed. Here, the limitation is that while in HDLRuby, ports can be connected wherever we want, in Verilog HDL, this must be done only while instantiating.
550
+
551
+ In fact, in HDLRuby too you can do the connection while instantiating, and this is even recommended for better readability of the code. There are two ways to do so: by position (like for the arguments of a function call) or by name. Let us see both by editing `another_circuit.rb` again: please just replace both instantiation lines with the following:
552
+
553
+ ```ruby
554
+ my_circuit(:my_circuit0).(clk,rst,addr,ce0,data_in,data_out)
555
+ my_circuit(:my_circuit1).(clk: clk, rst: rst, addr: addr, ce: ce1,
556
+ data_in: data_in, data_out: data_out)
557
+ ```
558
+
559
+ The instantiation for `my_circuit0` does the connection by position, that is to say, each port given between the second pair of parentheses is connected following the order of declaration of the ports in `my_circuit.rb`. For `my_circuit1` the connection is done by name: the syntax `<name>: <something>` connects the port named `name` of the instance of `my_circuit` to `something`. For example, `clk: clk` connects port `clk` of `my_circuit1` to port `clk` of `another_circuit`.
560
+
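Since the comparison with function arguments was made, here is a plain-Ruby sketch of the two connection styles using ordinary positional and keyword arguments. The methods `my_circuit_ports`, `connect_by_position`, and `connect_by_name` are hypothetical names introduced for this illustration; they are not a part of HDLRuby and only model which signal ends up on which port.

```ruby
# Ports of my_circuit, in their order of declaration.
def my_circuit_ports
  [:clk, :rst, :addr, :ce, :data_in, :data_out]
end

# By position: the n-th signal goes to the n-th declared port.
def connect_by_position(*signals)
  my_circuit_ports.zip(signals).to_h
end

# By name: each keyword names the port it connects to.
def connect_by_name(**signals)
  unknown = signals.keys - my_circuit_ports
  raise ArgumentError, "unknown ports: #{unknown}" unless unknown.empty?
  signals
end

p connect_by_position(:clk, :rst, :addr, :ce0, :data_in, :data_out)
p connect_by_name(clk: :clk, rst: :rst, addr: :addr, ce: :ce1,
                  data_in: :data_in, data_out: :data_out)
```

As with function calls, the connection by name is longer to write but does not depend on remembering the order of declaration.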
561
+
562
+ #### That's all for this section!
563
+
564
+ ---
565
+
566
+ Now you know:
567
+
568
+ * How to declare a new circuit with its name and ports in HDLRuby.
569
+
570
+ * How to check it with hdrcc.
571
+
572
+ * How to convert it to Verilog HDL.
573
+
574
+ * How to reuse it in another circuit.
575
+
576
+ * And even, how to describe the computation of some expressions and connect them to an output port.
577
+
578
+ But you still do not know: how to describe more complex computations, e.g., controls, and how to simulate a circuit. Let us start slowly (from the software person's point of view) with how to describe an algorithm the simple way in HDLRuby.
579
+
580
+
581
+
582
+ ## 3. How to describe an algorithm to be implemented by a circuit
583
+
584
+ In this section we will see:
585
+
586
+ * [How to handle values in a circuit.](#31-how-to-handle-values-in-a-circuit)
587
+
588
+ * [How to describe an algorithm that a circuit can implement.](#32-how-to-describe-an-algorithm-that-a-circuit-can-implement)
589
+
590
+ ### 3.1. How to handle values in a circuit
591
+
592
+ In software, handling values looks straightforward enough: they are computed with expressions and stored in variables. In hardware it is slightly different since two kinds of components handle values: the wires and the storage components:
593
+
594
+ * The wires are abstractions of physical wires or sets of wires that transmit data to the hardware component they are connected to. A wire cannot hold a value: if a component drives a value to a wire, this wire takes this value, but only for as long as the drive lasts.
595
+
596
+ * The storage components are abstractions of registers or memories that can hold values. Depending on how they are described, the value they hold can be changed on specific events like the rising edge of a clock. Hence, the storage components are closer to software variables than the wires are.
597
+
598
+ With HDL like Verilog HDL, there is a real distinction between wires and storage components. However, with other HDL like VHDL, or here with HDLRuby, both are abstracted by a single concept: the signal. Specifically, when a signal is only driven on a given event, it becomes a storage element, otherwise, it will remain a wire.
599
+
600
+ __Note__: one may ask what about the `variable` construct in VHDL: those are syntactic intermediates used for simplifying the descriptions. HDLRuby also supports such a `variable` construct, but those are simply the variables of the Ruby language. Therefore, these variables can hold not only values but any Ruby or HDLRuby constructs.
601
+
602
+ In HDLRuby, there are four kinds of signals: the input ports, the output ports, the inout ports, and the inner signals. The first three kinds have already been described in the section about declaring a [circuit](#circuit-declare). At that time they were called ports because that is what such constructs are called in common HDL. However, in HDLRuby there is no distinction at all between ports and signals, hence from now on, we will use the term `signal` in general, and the term `port` only when we want to refer to the interface of a circuit. The inner signals can be used inside a circuit and usually serve as intermediates for internal computations. They are declared like the ports, but using `inner` instead of `input`, `output`, or `inout`. For example, the following code declares a module named `a_circuit` with two 1-bit inputs, one 8-bit output, and one 4-bit inner signal.
603
+
604
+ ```ruby
605
+ system(:a_circuit) do
606
+ input :u, :v
607
+ [8].output :z
608
+
609
+ [4].inner :sig
610
+ end
611
+ ```
612
+
613
+ __Note__: the inner signals as well as the ports can be declared anywhere within the body of a module. It is however recommended to declare the ports at the beginning, and the inner signal when relevant, e.g., it may not be a good idea to declare an inner signal at the beginning of a circuit description if it is only used at the end.
614
+
615
+ #### What about the data types?
616
+
617
+ For now, we only used positive integer values, e.g., `[8].inner` declares an 8-bit positive integer. To tell the truth, `[8]` is not the designation of the 8-bit positive integer data type, but a syntactic sugar that stands for: `bit[8]`. As a general rule, in HDLRuby, the designation of a data type is as follows:
618
+
619
+ ```ruby
620
+ <base>[<range>]
621
+ ```
622
+
623
+ Where `base` is another data type and `range` describes the range of indexes used for accessing its elements individually. In addition, there are three root data types used for building all the other ones:
624
+
625
+ * `bit`: for boolean, or 1-bit unsigned values, i.e., the only possible values are 0 and 1.
626
+
627
+ * `signed`: for 1-bit signed values in 2's complement, i.e., the only possible values are 0 and -1.
628
+
629
+ * `float`: for 1-bit floating point values, but, __WARNING__: this data type is not supported yet!
630
+
631
+ For example, a 32-bit input signed integer signal and an inner array of eight 24-bit integers are declared respectively as follows:
632
+
633
+ ```ruby
634
+ signed[32].input :sig
635
+ bit[24][8].inner :ar
636
+ ```
637
+
638
+ > Unfortunately, the following is not valid:
639
+ `[24][8].inner :ar`
640
+ This is because it conflicts with other constructs in the language.
641
+
642
+ When the code above is declared, you can access one element of array `ar` using the brackets like before, e.g., its first element is `ar[0]`, its second one `ar[1]`, and so on. But you can also access its last element as follows: `ar[-1]`, its element before the last one as follows `ar[-2]`, and so on. Moreover, since an integer is just an array of bits, it is possible to access its bits individually in the same way. For example, accessing the first bit of the last element of ar can be done as follows: `ar[-1][0]`.
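The indexing rules above mirror what plain Ruby already does for arrays and integers, which gives a quick way to build intuition without a simulator. The following is a plain-Ruby sketch, not HDLRuby: `AR` is a hypothetical stand-in for the signal `ar`, modeled as an ordinary array of integers, and the two helper methods are illustrative only.

```ruby
# Plain-Ruby model (not HDLRuby): an array of eight 24-bit values.
# AR is a hypothetical stand-in for the signal `ar`.
AR = [0x000001, 0x000002, 0x000003, 0x000004,
      0x000005, 0x000006, 0x000007, 0xABCDEF].freeze

# Ruby arrays accept negative indexes counted from the end.
def last_element(array)
  array[-1]
end

# Integer#[] returns the bit at the given position (0 is the least
# significant bit), so an integer can be read like an array of bits.
def first_bit_of_last(array)
  array[-1][0]
end

puts last_element(AR).to_s(16)   # last element, in hexadecimal
puts first_bit_of_last(AR)       # its bit 0
```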
643
+
644
+ But compared to software, you can do even more than that: you can also access a sub-range of the elements of a signal. For example, assigning bits from index 4 to 27 of signal `sig` to element 1 of array `ar` can be written as follows using a *range*:
645
+
646
+ ```ruby
647
+ ar[1] <= sig[27..4]
648
+ ```
649
+
650
+ A range is noted as follows: `<start value>..<end value>`, and indicates respectively the value of the first and the last indexes. Convenient, isn't it? But wait... Why `27..4` and not `4..27`?
651
+
652
+ #### Let us talk about endianness.
653
+
654
+ If you are familiar with hardware or with low-level software, you may be aware of this topic. There are two possible ways for reading a value or the content of an array: from the most significant bit or index to the least significant one (big-endian) and the opposite (little-endian). Since big-endian is more natural for reading numbers it is the default mode in HDL, including HDLRuby. However, for an array, software programming usually does the opposite. In HDLRuby, using little-endian instead of big-endian is declared by using a negative number for the size instead of a positive one.
655
+
656
+ As a general rule, code will be much easier to understand if integer values are big-endian and the array accesses little-endian. That is to say that the following declaration is the best for `ar`:
657
+
658
+ ```ruby
659
+ bit[24][-8].inner :ar
660
+ ```
661
+
662
+ > One last thing: it is also possible to set the numbering of each bit of a signal using again a *range* instead of an integer indicating the size. For example, with the following declaration, `a` is an 8-bit big-endian integer with its bits numbered from 0 to 7, and `b` is a 9-bit little-endian integer with its bits numbered from 1 to 9:
663
+ `[7..0].inner :a`
664
+ `[1..9].inner :b`
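The link between a range, the width of the signal, and its bit ordering can be checked with a few lines of plain Ruby. This is only a sketch under the rules stated above (a descending range is big-endian, an ascending one is little-endian); `describe_range` is a hypothetical helper, not part of HDLRuby.

```ruby
# Plain-Ruby sketch (not HDLRuby): derive the width and the bit
# ordering from a range. `describe_range` is a hypothetical helper.
def describe_range(range)
  width = (range.begin - range.end).abs + 1
  order = range.begin >= range.end ? :big_endian : :little_endian
  { width: width, order: order }
end

p describe_range(7..0)    # the range used for signal `a`
p describe_range(1..9)    # the range used for signal `b`
p describe_range(27..4)   # the sub-range taken from `sig` earlier
```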
665
+
666
+ So now you know about basic types in HDLRuby, but what happens when we make operations between signals of different types?
667
+
668
+ #### About casting and auto-casting
669
+
670
+ First, you can convert the type of a signal using the cast operator `as`. For example, in the following code, a 16-bit unsigned addition is performed between the 8-bit signal `x` converted to 16-bit and the 16-bit signal `y`:
671
+
672
+ ```ruby
673
+ [8].inner :x
674
+ [16].inner :y, :z
675
+
676
+ <some code>
677
+
678
+ z <= x.as(bit[16]) + y
679
+ ```
680
+
681
+ However, if you do not use the cast operator, HDLRuby will do it automatically for you and cast the operands to the largest of their types. That is to say that for the following code the result is identical to the previous one:
682
+
683
+ ```ruby
684
+ [8].inner :x
685
+ [16].inner :y, :z
686
+
687
+ <some code>
688
+
689
+ z <= x + y
690
+ ```
691
+
692
+ This rule means that for the following code, `z` will have the result of an 8-bit addition since both operands `x` and `y` are 8-bit:
693
+
694
+ ```ruby
695
+ [8].inner :x, :y
696
+ [16].inner :z
697
+
698
+ <some code>
699
+
700
+ z <= x + y
701
+ ```
702
+
703
+ If you want a 16-bit addition, you must add a cast to at least one of the operands, e.g.:
704
+
705
+ ```ruby
706
+ [8].inner :x, :y
707
+ [16].inner :z
708
+
709
+ <some code>
710
+
711
+ z <= x.as(bit[16]) + y
712
+ ```
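To see why the width of the addition matters, here is a plain-Ruby model of the two cases. It is a sketch, not HDLRuby, and it assumes the usual wrap-around behavior of unsigned hardware adders, where only the `width` least significant bits of the sum are kept; `add_unsigned` is a hypothetical helper.

```ruby
# Plain-Ruby model (not HDLRuby) of fixed-width unsigned addition.
# `add_unsigned` is a hypothetical helper: it keeps only the `width`
# least significant bits of the sum (assumed wrap-around behavior).
def add_unsigned(a, b, width)
  (a + b) & ((1 << width) - 1)
end

x = 200
y = 100
puts add_unsigned(x, y, 8)    # 8-bit addition: the carry is lost
puts add_unsigned(x, y, 16)   # 16-bit addition: the full sum fits
```

With these values the 8-bit addition yields 44 while the 16-bit one yields 300, which is the reason for casting one operand before adding.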
713
+
714
+ ---
715
+
716
+ These explanations about data types may not have been much fun, but we had better get them out of the way first so that we can now start the real fun with actual "programming"...
717
+
718
+ __Note__: "Programming" is put between quotes because this is not programming since there is no program, but instead, a description of a physical circuit.
719
+
720
+
721
+ ### 3.2. How to describe an algorithm that a circuit can implement
722
+
723
+ So now you know how to declare a circuit, use it, connect it, and declare variable-like constructs that are called signals. We will see later that signals are definitely **not** software variables, but let us go on with the illusion for a little while and see how you can describe an algorithm in HDLRuby. As a side note, the features described in this section are unique to HDLRuby so do not expect to be able to do the same with Verilog HDL or VHDL.
724
+
725
+ With software languages, when you want to describe an algorithm you use assignments but also control statements like `if`, `for`, or `while`. If you know higher-level languages like `Java` or `Ruby` you may also use design patterns like `iterators` (or `enumerators`). You can do the same in HDLRuby with constructs called *sequencers*. At first glance, a sequencer can be seen as the description of a hardware component implementing a software algorithm. Hence, a sequencer may look like a processor, but there are differences:
726
+
727
+ * It is not possible to change the algorithm implemented by a sequencer (you cannot "load" a new program).
728
+
729
+ * A sequencer starts execution according to a given start signal, and from there advances execution at each cycle of a given clock.
730
+
731
+ * You have control of the algorithm at the clock cycle level, and incidentally, a sequencer is much more efficient than a processor executing a program.
732
+
733
+
734
+ This section will explain the following about sequencers:
735
+
736
+ * [How to declare and control a sequencer.](#321-how-to-declare-and-control-a-sequencer)
737
+
738
+ * [How to write a structured programming algorithm in a sequencer.](#322-how-to-write-a-structured-programming-algorithm-in-a-sequencer)
739
+
740
+ * [How to use enumerators in a sequencer.](#323-how-to-use-enumerators-in-a-sequencer)
741
+
742
+ * [What happens when there are several sequencers?](#324-what-happens-when-there-are-several-sequencers)
743
+
744
+
745
+ #### 3.2.1 How to declare and control a sequencer
746
+
747
+ As said in the previous section, a sequencer advances execution according to a clock and starts execution according to a start signal. A clock is a single-bit signal that alternates from 0 to 1, and it is the transition from 0 to 1, or from 1 to 0 of this signal that will make a sequencer progress execution. Let us see two examples of sequencers to understand better:
748
+
749
+ ```ruby
750
+ system :the_counter do
751
+ inner :clk, :start
752
+
753
+ sequencer(clk,start) do
754
+ end
755
+ end
756
+ ```
757
+
758
+ In the code above, a module containing one sequencer is described. The sequencer is controlled by the start signal `start` and clock signal `clk`, which means that:
759
+
760
+ * it will advance execution every time `clk` switches from 0 to 1 - we call such an event a *positive edge* of `clk`;
761
+
762
+ * it will start when signal `start` switches from 1 to 0 - we call such an event a *negative edge* of `start`.
763
+
764
+ Why a positive edge for `clk` and a negative one for `start`? It does not matter much, and you can specify the edges you want using the `posedge` and `negedge` methods. For example, the following sequencer advances execution on a negative edge of `clk` and starts on a positive edge of `start`:
765
+
766
+ ```ruby
767
+ sequencer(clk.negedge,start.posedge) { }
768
+ ```
769
+
770
+ __Note__: the curly brackets used here are equivalent to the `do` and `end` keywords we used until now.
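The positive and negative edges mentioned above are simply transitions between two successive values of a 1-bit signal. The following plain-Ruby sketch (not HDLRuby) classifies such transitions in a list of samples; `edges` is a hypothetical helper used only for illustration.

```ruby
# Plain-Ruby sketch (not HDLRuby): find the edges in a list of
# successive samples of a 1-bit signal. `edges` is a hypothetical helper.
def edges(samples)
  samples.each_cons(2).map do |before, after|
    if before == 0 && after == 1
      :posedge
    elsif before == 1 && after == 0
      :negedge
    else
      :none
    end
  end
end

p edges([0, 1, 0, 0, 1])
```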
771
+
772
+ > __WARNING__: there is a caveat though, since a sequencer advances execution according to a clock signal, it will not be able to start properly unless the start signal is strictly slower than the clock.
773
+
774
+ But now, what will this sequencer do? ... Nothing right now since its body is empty. So let us start small and make it count a few clock cycles:
775
+
776
+ ```ruby
777
+ system :the_counter do
778
+ inner :clk, :start
779
+ [8].inner :count
780
+
781
+ sequencer(clk,start) do
782
+ count <= 0
783
+ step
784
+ count <= count + 1
785
+ step
786
+ count <= count + 1
787
+ step
788
+ count <= count + 1
789
+ end
790
+ end
791
+ ```
792
+
793
+ The first line of the body of the counter looks like the connection of ports we described [previously](#circuit-use). However it is inside the body of a sequencer and will therefore be *executed* by it, that is to say, before this statement is executed, `count` may not be 0. More generally:
794
+
795
+ * Every assignment statement outside sequencers (and later processes) is a connection: the signal will *always* and *continuously* have the value that is assigned to it.
796
+
797
+ * The other assignment statements, e.g., the ones inside a sequencer, are called *transmission* in HDLRuby and happen only when "executed."
798
+
799
+ In our example, the count is set to 0 when the sequencer starts execution. But before that, no one knows the value of count: it is said to be *undefined*.
800
+
801
+ Just after this initial line, there is the `step` keyword. There we go from the software world to the hardware one: this keyword indicates that the sequencer suspends execution here until the next clock cycle. Looking at the subsequent line you may understand why we waited: this line increases the value of `count` by one, i.e., the sequencer counted one clock cycle. Again, a `step` is inserted, and `count` is increased, and so on, so that this is indeed the code of a sequencer that counts the clock cycles. However, the code ends after 3 clock cycles, and like with software, when the code of a sequencer is over, it stops execution.
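For readers coming from software, the behavior of `step` resembles a coroutine that hands control back to its caller. The following plain-Ruby model is only an analogy, not how HDLRuby implements sequencers: a `Fiber` plays the role of the sequencer body, `Fiber.yield` plays the role of `step`, and each `resume` stands for one active clock edge. The method name `simulate_counter` is hypothetical.

```ruby
# Plain-Ruby model (not HDLRuby): a Fiber plays the role of the
# sequencer body and Fiber.yield plays the role of `step`.
def simulate_counter
  count = nil                 # undefined before the sequencer starts
  body = Fiber.new do
    count = 0
    Fiber.yield               # step
    count += 1
    Fiber.yield               # step
    count += 1
    Fiber.yield               # step
    count += 1
  end

  trace = []
  4.times do                  # one resume per active clock edge
    body.resume
    trace << count            # value of count after this cycle
  end
  trace
end

p simulate_counter
```

The trace holds the successive values 0, 1, 2, and 3, after which the body is finished, just like the sequencer stops once its code is over.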
802
+
803
+ This may look abstract, so let us try to simulate this circuit. For that purpose, write the code into a file named `the_counter.rb` (for example) and try the following command that will simulate the circuit:
804
+
805
+ ```bash
806
+ hdrcc --sim the_counter.rb the_counter
807
+ ```
808
+
809
+ If everything was all right the following text will appear:
810
+
811
+ ```
812
+ __:T:0:::2: 000
813
+ ```
814
+
815
+ It indicates that a signal named `__:T:0:::2` has been initialized to 0 (000), and that's all... Why is that, and why only that? First, this signal with a strange name has been generated by HDLRuby for the internal processing of the sequencer and is required to be initialized to 0. So, ok, the simulation did some initialization, but it did not seem to do any execution. This is because we did not give any description of the physical environment of the circuit, and especially, we did not say that the clock and the start signal need to change value. For describing the behavior of the environment of a circuit, we use a construct called in HDLRuby the *timed process*. Timed processes are somewhat close to sequencers, but they are controlled by physical time instead of a clock signal. So let us add the following code just after the sequencer (but still inside the module `the_counter`):
816
+
817
+ ```ruby
818
+ timed do
819
+ clk <= 0
820
+ start <= 0
821
+ !10.ns
822
+ clk <= 1
823
+ !10.ns
824
+ clk <= 0
825
+ start <= 1
826
+ !10.ns
827
+ clk <= 1
828
+ !10.ns
829
+ clk <= 0
830
+ start <= 0
831
+ !10.ns
832
+ clk <= 1
833
+ !10.ns
834
+ clk <= 0
835
+ !10.ns
836
+ clk <= 1
837
+ !10.ns
838
+ clk <= 0
839
+ !10.ns
840
+ clk <= 1
841
+ !10.ns
842
+ clk <= 0
843
+ !10.ns
844
+ clk <= 1
845
+ !10.ns
846
+ clk <= 0
847
+ !10.ns
848
+ clk <= 1
849
+ !10.ns
850
+ end
851
+ ```
852
+
853
+ Sorry, this code is a little bit long... Did you add it to the file `the_counter.rb`? Now please try the simulation command again:
854
+
855
+ ```bash
856
+ hdrcc --sim the_counter.rb the_counter
857
+ ```
858
+
859
+ You should have the following result:
860
+
861
+ ```
862
+ __:T:0:::2: 000
863
+ __:T:0::clk: 0
864
+ __:T:0::start: 0
865
+ # 10000ps
866
+ __:T:0::clk: 1
867
+ __:T:0:::1: 000
868
+ # 20000ps
869
+ __:T:0::clk: 0
870
+ __:T:0::start: 1
871
+ # 30000ps
872
+ __:T:0::clk: 1
873
+ __:T:0:::2: 001
874
+ # 40000ps
875
+ __:T:0::start: 0
876
+ __:T:0::clk: 0
877
+ # 50000ps
878
+ __:T:0::clk: 1
879
+ __:T:0:::1: 001
880
+ __:T:0::count: 00000000
881
+ __:T:0:::2: 010
882
+ # 60000ps
883
+ __:T:0::clk: 0
884
+ # 70000ps
885
+ __:T:0::clk: 1
886
+ __:T:0:::1: 010
887
+ __:T:0::count: 00000001
888
+ __:T:0:::2: 011
889
+ # 80000ps
890
+ __:T:0::clk: 0
891
+ # 90000ps
892
+ __:T:0::clk: 1
893
+ __:T:0:::1: 011
894
+ __:T:0::count: 00000010
895
+ __:T:0:::2: 100
896
+ # 100000ps
897
+ __:T:0::clk: 0
898
+ # 110000ps
899
+ __:T:0::clk: 1
900
+ __:T:0:::1: 100
901
+ __:T:0::count: 00000011
902
+ __:T:0:::2: 101
903
+ # 120000ps
904
+ __:T:0::clk: 0
905
+ # 130000ps
906
+ __:T:0::clk: 1
907
+ __:T:0:::1: 101
908
+ __:T:0:::2: 000
909
+ # 140000ps
910
+ ```
911
+
912
+ If you look at the last line of the simulation result you should have `# 140000ps`, which means that the final simulation time was 140000 picoseconds (140 nanoseconds). If you look at the result in totality, you will see an alternation of signal names followed by a value in binary (e.g., `__:T:0::clk: 1` means that the `clk` signal is 1) and an indication of the time of the simulation. The names look indeed messy, but please only focus on their last parts: that will be the names of the signals you defined in your HDLRuby file. You will also notice some signals that you did not declare (e.g., `__:T:0:::1` and `__:T:0:::2`), but those are only signals generated by HDLRuby.
913
+
914
+ Now, let us see what happened, and what was the meaning of the `timed` construct we added. As explained before, `timed` constructs are like sequencers, but controlled by physical time instead of a clock signal. Inside, you can put assignments but also statements that advance the time. For instance: `!10.ns` is a statement that advances time by 10 nanoseconds (i.e., 10000 picoseconds). Hence, with such a construct you can declare that a signal will take a given value at a given time. For this example, it is enough to tell how the `clk` and `start` signals change over time to make the sequencer run by itself. Hence, the `clk` signal is made to switch from 0 to 1 then from 1 to 0 every 10 nanoseconds, which simulates a 50 MHz clock, and the `start` signal is set to 1 and then to 0 so that the sequencer starts execution. The first thing it does is set `count` to 0, and then in the following cycles this counter is increased until the execution completes. Even when the sequencer completes execution, the simulation goes on though, until the timed process also completes.
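The figures quoted above (a 50 MHz clock and a final time of 140000 ps) follow directly from the 10 ns time advances. The plain-Ruby sketch below (not HDLRuby) redoes this arithmetic with integers; the constant and the two helper methods are hypothetical names introduced for the illustration.

```ruby
# Plain-Ruby sketch (not HDLRuby) of the timing arithmetic above.
HALF_PERIOD_PS = 10_000   # `!10.ns` expressed in picoseconds

# Frequency in MHz of a clock that toggles every `half_period_ps`.
def clock_mhz(half_period_ps)
  1_000_000 / (2 * half_period_ps)   # 1 MHz has a period of 1_000_000 ps
end

# Total simulated time in picoseconds after `waits` time advances.
def total_time_ps(waits, half_period_ps)
  waits * half_period_ps
end

puts clock_mhz(HALF_PERIOD_PS)           # clock frequency in MHz
puts total_time_ps(14, HALF_PERIOD_PS)   # the timed process has 14 waits
```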
915
+
916
+ > __WARNING__: in HDLRuby, everything is synthesizable, i.e., it can be turned into a real circuit, **apart** from the timed processes. These constructs are only there for simulating the physical environment of a circuit.
917
+
918
+ #### Ugh! The simulation result is ugly!
919
+
920
+ Indeed, the text output of the simulator is hard to read, and therefore we highly recommend generating a graphical one using the following command:
921
+
922
+ ```bash
923
+ hdrcc --sim --vcd the_counter.rb the_counter
924
+ ```
925
+
926
+ The new option `--vcd` makes the simulator produce a *Value Change Dump* file (VCD) that can be visualized graphically by many HW design tools. If you do not have any such tool you can get [GTKWave](https://gtkwave.sourceforge.net/) which is under GNU General Public License v2.0.
927
+
928
+ The resulting VCD file can be found in the `the_counter` directory with the name `hruby_simulator.vcd`. If you open it and select the `clk`, `start`, and `count` signals you will see something like the following picture:
929
+
930
+ ![hruby_simulator.vcd](the_counter_vcd.png)
931
+
932
+ Such a graph is called a time chart and displays the evolution of signals with time on the X-axis. You can see that `clk` is indeed alternating from 0 to 1, and that `start` is set to 1 for one clock cycle only. After this latter signal becomes 0 again, the sequencer starts. This can be verified by looking at the value of `count`: at first, it is undefined, then it is set to 0, then increased by 1 at each clock cycle until it reaches 3.
933
+
934
+ ---
935
+
936
+ So now, you know:
937
+
938
+ * How to describe an algorithm in HDLRuby.
939
+
940
+ * How to simulate it.
941
+
942
+ * How to display a cool graph showing the simulation results (maybe?)
943
+
944
+ Hum, the algorithm part is quite limited, and the timed processes seem to be real pains to write. So let us see how to do better in the following section.
945
+
946
+
947
+ #### 3.2.2. How to write a structured programming algorithm in a sequencer
948
+
949
+ Now, when we speak about algorithms, we often think about software constructs like `if` for conditional executions and `for` for loops. Unfortunately, in HW design this is usually not that simple at all... But for the HDLRuby sequencers, it is indeed that simple: all these control statements are supported without any limitation. The only thing you have to be careful about is their name: to avoid confusion with the Ruby language, their names are all prefixed with an `s` (for *sequencer*), e.g., you have to use `sif` for doing an *if*.
950
+
951
+ In detail here is a list of the control statements you can use within a sequencer:
952
+
953
+ * `sif(<condition>) <block>`: executes `block` if `condition` is true (i.e., different from 0).
954
+
955
+ * `selsif(<condition>) <block>`: executes `block` if the previous `sif` and `selsif` conditions are false and if the current `condition` is true.
956
+
957
+ * `selse <block>`: executes `block` if the conditions of the previous `sif` and `selsif` are false (i.e., equal to 0). It is also used for giving the default block for the `scase` statements (see below).
958
+
959
+ * `scase(<choice>)`: initializes a multiple case statement.
960
+
961
+ * `swhen(<value>) <block>`: executes block if `value` is equal to the `choice` of the previous `scase` statement.
962
+
963
+ * `sloop <block>`: loops infinitely repeating the execution of `block` (yes, like Ruby, if you want to do an infinite loop, do it explicitly, not with constructs like `while(true)`... even if you can).
964
+
965
+ * `swhile(<condition>) <block>`: repeats the execution of `block` as long as `condition` is true.
966
+
967
+ * `sfor(<enumerable>) <block>`: executes `block` on each element of `enumerable`.
968
+
969
+ * `sbreak`: ends the current loop (even if it is a `sloop`).
970
+
971
+ * `scontinue`: ends the current iteration.
972
+
973
+ > __IMPORTANT__: each control statement requires one clock cycle for each branch it performs, independently of what its block contains. For example, in the following code, the value of `y` will be set one cycle after `x` is set to 0, and one cycle before `z` is set to 1.
974
+
975
+ ```ruby
976
+ x <= 0
977
+ sif(~x) { y <= 2 }
978
+ z <= 1
979
+ ```
980
+
981
+ Let us see the details about these constructs with a few examples. First, let us make a circuit that computes the factorial of an input value when asked for:
982
+
983
+ ```ruby
984
+ system :fact do
985
+ input :clk, :start, :req
986
+ [4].input :data_in
987
+ output :ack
988
+ [24].output :data_out
989
+
990
+ [4].inner :val
991
+ [24].inner :res
992
+
993
+ sequencer(clk,start) do
994
+ sloop do
995
+ ack <= 0
996
+ swhile(req != 1)
997
+ val <= data_in
998
+ res <= 1
999
+ swhile(val>1) do
1000
+ res <= res*val
1001
+ val <= val - 1
1002
+ end
1003
+ data_out <= res
1004
+ ack <= 1
1005
+ end
1006
+ end
1007
+ end
1008
+ ```
1009
+
1010
+ The code is more complex than what we have seen up to now, so let us study it progressively:
1011
+
1012
+ * The circuit is named `fact` (for factorial).
1013
+
1014
+ * It has four inputs:
1015
+
1016
+ - `clk` and `start`: the signals that control the execution of the sequencer (nothing new here).
1017
+
1018
+ - `req`: the signal that will ask for a factorial computation (*req* stands for request).
1019
+
1020
+ - `data_in`: the signal that will provide the input value. It is a 4-bit unsigned integer.
1021
+
1022
+ * It has two outputs:
1023
+
1024
+ - `ack`: the signal that will tell that the current computation of factorial is over. Such a signal is necessary for the environment to know if the output result is valid or not since the computation time of a factorial depends on the input value.
1025
+
1026
+ - `data_out`: the signal that will hold the computation result. It is a 24-bit unsigned integer, much larger than the input, but we are computing factorials here! (This is not enough yet: 15! equals 1307674368000 which requires 41 bits... But 24-bit was the largest that could fit in the picture of the time chart that is shown after.)
1027
+
1028
+ * The sequencer includes an infinite loop (`sloop`) so that it does not stop after one computation. Within this loop:
1029
+
1030
+ - It first sets `ack` to 0 to tell that currently, its output is not valid (no computation is done yet).
1031
+
1032
+ - Then it waits for a request, i.e., it waits until signal `req` is 1. This loop (`swhile(req != 1)`) does not have any block to execute, it will therefore loop without doing anything.
1033
+
1034
+ - When `req` becomes 1, it loads the input value into an internal temporary signal (`val`) and initializes the computed value (`res`) to one.
1035
+
1036
+ - Finally, it loops by multiplying `res` by `val` and decreasing `val` until 1 is reached before assigning the computation result to `data_out` and setting `ack` to 1 for telling the computation is over.
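The loop described above, and the remark about bit widths, can be checked in plain Ruby before running any simulation. This is a sketch, not HDLRuby: `fact_model` and `fact_bits` are hypothetical helpers, and the model assumes that `res` simply keeps its `width` least significant bits when a product is too large.

```ruby
# Plain-Ruby model (not HDLRuby) of the loop run by the sequencer.
# `fact_model` is a hypothetical helper; `width` is the size of `res`.
def fact_model(data_in, width = 24)
  val = data_in
  res = 1
  while val > 1
    res = (res * val) & ((1 << width) - 1)   # keep `width` bits only
    val -= 1
  end
  res
end

# Number of bits needed to hold n! exactly.
def fact_bits(n)
  (1..n).reduce(1, :*).bit_length
end

puts fact_model(10)   # 10! still fits in 24 bits
puts fact_bits(10)    # bits needed for 10!
puts fact_bits(15)    # bits needed for 15!, the case mentioned above
```

Under this model, 10 is the largest input whose factorial fits in the 24-bit result; from 11 onward the value wraps around.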
1037
+
1038
+ #### Does it work?
1039
+
1040
+ Now let us simulate this circuit. For that, you now know that we must describe the environment using a timed process. But this time, let us do this properly by putting the description of the environment in a different module to separate the circuit description from the simulation: please write in a file named `fact.rb` the code of module `fact`, and then the following code:
1041
+
1042
+ ```ruby
1043
+ system :fact_bench do
1044
+ inner :clk, :start, :req, :ack
1045
+ [4].inner :x
1046
+ [24].inner :z
1047
+
1048
+ fact(:factI).(clk,start,req,x,ack,z)
1049
+
1050
+ def clk!
1051
+ clk <= 0
1052
+ !10.ns
1053
+ clk <= 1
1054
+ !10.ns
1055
+ end
1056
+
1057
+ timed do
1058
+ start <= 0
1059
+ !10.ns
1060
+ clk!
1061
+ start <= 1
1062
+ req <= 0
1063
+ clk!
1064
+ start <= 0
1065
+ repeat(2) do
1066
+ clk!
1067
+ end
1068
+ req <= 1
1069
+ x <= 10
1070
+ clk!
1071
+ req <= 0
1072
+ repeat(20) do
1073
+ clk!
1074
+ end
1075
+ end
1076
+ end
1077
+ ```
1078
+
1079
+ This module introduces the `repeat` construct. It can be used within a timed process only and has the following syntax:
1080
+
1081
+ ```ruby
1082
+ repeat(<number>) <block>
1083
+ ```
1084
+
1085
+ It executes the `block` `number` times with the number of iterations as an argument.
1086
+
1087
+ __WARNING__: the repeat construct cannot be used outside a timed process. Inside a sequencer, you can use instead `swhile` or `sfor`.
1088
+
1089
+ > There seems to be another new kind of command in the timed process: `clk!`. This is not a command though, but a Ruby function defined just before with the `def` block. If you are not familiar with Ruby, `def` is used for defining a new function (actually a method); also, it is not necessary to put parentheses when calling a function. Hence, `clk!` is simply the call of this previously defined function (in Ruby, function names can terminate with characters like `!` or `?`). If you look at the body of this function, you will see not plain Ruby code, but again HDLRuby code:
1090
+
1091
+ ```Ruby
1092
+ clk <= 0
1093
+ !10.ns
1094
+ clk <= 1
1095
+ !10.ns
1096
+ ```
1097
+
1098
+ > __IMPORTANT__: as said when presenting HDLRuby, this language is implemented on top of the Ruby language, and is fully compatible with it. For instance, you can write any Ruby code within HDLRuby constructs (e.g., `def`), and you can write HDLRuby code within Ruby constructs. However, there is an important difference: Ruby code is executed at compile time (i.e., when hdrcc runs) and does not produce any hardware, whereas HDLRuby code is the description of the hardware that will be produced and will be then executed either through simulation or after production physically.
1099
+ Then, what calling `clk!` does is paste the HDLRuby code in place. Here it is used to shorten the code: instead of setting each time the clock to 0, advancing time, then setting it to 1, writing `clk!` is enough to obtain the same result.
1100
+ It is from this capability to mix Ruby and HDLRuby that comes the *meta programmability* of HDLRuby.
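The "paste in place" effect can be imitated in plain Ruby with a toy model where a description is just a list of statements. This is only an analogy under that assumption, not the way HDLRuby works internally; the `Description` class and `build_description` method are hypothetical.

```ruby
# Plain-Ruby sketch (not HDLRuby): a toy "description" is a list of
# statements, and a Ruby method appends statements to it when called.
# The Description class and its methods are hypothetical.
class Description
  attr_reader :statements

  def initialize
    @statements = []
  end

  # Each call pastes the same four statements into the description.
  def clk!
    @statements << "clk <= 0" << "!10.ns" << "clk <= 1" << "!10.ns"
  end
end

def build_description
  d = Description.new
  d.clk!
  2.times { d.clk! }   # Ruby loop: runs while building, not in hardware
  d
end

puts build_description.statements.size
```

Three calls produce twelve statements: the Ruby method and the Ruby loop have disappeared from the result, and only the generated statements remain.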
1101
+
1102
+ Finally, when you simulate with the following command:
1103
+
1104
+ ```bash
1105
+ hdrcc --sim --vcd fact.rb fact
1106
+ ```
1107
+
1108
+ You should obtain the following kind of resulting VCD file:
1109
+
1110
+ ![hruby_simulator.vcd](fact_vcd.png)
1111
+
1112
+
1113
+
1114
+
1115
+ #### Now about `sfor`
1116
+
1117
+ With a for statement you may expect something like C with the definition of an index, its initialization, and its increment. However, the `sfor` of HDLRuby is simpler: it only takes as argument an enumerable object and will iterate on each of its elements. An enumerable object is any Ruby enumerable object as well as any signal. For example, the following circuit outputs bit by bit the content of a 32-bit input value:
1118
+
1119
+ ```ruby
1120
+ system :serializer do
1121
+ input :clk, :start, :req
1122
+ [32].input :data_in
1123
+ output :tx, :data_out
1124
+
1125
+ sequencer(clk,start) do
1126
+ tx <= 0
1127
+ data_out <= 0
1128
+ sloop do
1129
+ sif(req) do
1130
+ tx <= 1
1131
+ sfor(data_in) do |b|
1132
+ data_out <= b
1133
+ end
1134
+ tx <= 0
1135
+ data_out <= 0
1136
+ end
1137
+ end
1138
+ end
1139
+ end
1140
+ ```
1141
+
1142
+ This circuit takes as input `data_in` and sends its content bit by bit to `data_out` when `req` is set to 1. In detail, it first indicates the beginning of the transmission by setting `tx` to 1, then from the next clock cycle it sends one bit of `data_in` per clock cycle. When the transmission is over, it sets `tx` to 0 again. There are a few more details to unwrap:
1143
+
1144
+ * By default both `tx` and `data_out` are set to 0, this is done at the beginning of the sequencer but also just after the `sfor`.
1145
+
1146
+ * `tx` is set to 1 just before the `sfor` loop, i.e., one cycle before the actual transmission. As said in the description of the [control statements](#sequencer-structured), each of them requires one clock cycle before branching.
1147
+
1148
+ > This for loop could also have been implemented using an index as follows:
1149
+
1150
+ ```ruby
1151
+ sfor(0..31) do |i|
1152
+ data_out <= data_in[i]
1153
+ end
1154
+ ```
1155
+
1156
+ > However, the advantage of the first version of the code is that it is still valid when the data type of `data_in` is changed whereas the second version must be rewritten to adjust the range of the index.
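The difference between the two versions of the loop can be illustrated in plain Ruby. This is a sketch, not HDLRuby: `ToySignal` is a hypothetical stand-in for a signal that knows its own width, and the bits are listed from the least significant one, which is an assumption of this model.

```ruby
# Plain-Ruby sketch (not HDLRuby). ToySignal is a hypothetical
# stand-in for a signal that carries its value and its width.
ToySignal = Struct.new(:value, :width) do
  # Yield each bit, least significant first (assumed ordering).
  def each_bit
    return enum_for(:each_bit) unless block_given?
    width.times { |i| yield value[i] }
  end
end

# Element-based version: adapts to the width of the signal.
def serialize(sig)
  out = []
  sig.each_bit { |b| out << b }
  out
end

# Index-based version with a hard-coded range, as in `sfor(0..31)`.
def serialize_fixed(sig)
  (0..31).map { |i| sig.value[i] }
end

byte = ToySignal.new(0xAB, 8)
p serialize(byte)               # 8 bits, as expected
p serialize_fixed(byte).size    # still 32: the range must be edited
```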
1157
+
1158
+ If you want to try this serializer, you need to add a module representing the environment. For example:
1159
+
1160
+ ```ruby
1161
+ system :serializer_bench do
1162
+ inner :clk, :start, :req, :tx
1163
+ [8].inner :x
1164
+ inner :z
1165
+
1166
+ serializer(:serializerI).(clk,start,req,x,tx,z)
1167
+
1168
+ def clk!
1169
+ clk <= 0
1170
+ !10.ns
1171
+ clk <= 1
1172
+ !10.ns
1173
+ end
1174
+
1175
+ timed do
1176
+ start <= 0
1177
+ !10.ns
1178
+ clk!
1179
+ start <= 1
1180
+ req <= 0
1181
+ clk!
1182
+ start <= 0
1183
+ repeat(2) do
1184
+ clk!
1185
+ end
1186
+ req <= 1
1187
+ x <= _hAB
1188
+ clk!
1189
+ req <= 0
1190
+ repeat(16) do
1191
+ clk!
1192
+ end
1193
+ end
1194
+ end
1195
+ ```
1196
+
1197
+ > There is a new notation: `_hAB`, this is used for representing fixed bit-width hexadecimal. Since there are two digits, it is an 8-bit value. You can also use binary representations as follows: `_b10101011`.
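The relation between the number of digits and the bit width can be made explicit with a small plain-Ruby decoder. This is a sketch, not the HDLRuby parser: `decode_literal` is a hypothetical helper that only understands the `_h` and `_b` prefixes shown above.

```ruby
# Plain-Ruby sketch (not HDLRuby): decode the `_h` and `_b` notations
# into a value and a bit width. `decode_literal` is a hypothetical
# helper that only handles these two prefixes.
def decode_literal(text)
  case text
  when /\A_h([0-9a-fA-F]+)\z/
    { value: $1.to_i(16), width: $1.length * 4 }   # 4 bits per hex digit
  when /\A_b([01]+)\z/
    { value: $1.to_i(2), width: $1.length }        # 1 bit per binary digit
  else
    raise ArgumentError, "unsupported literal: #{text}"
  end
end

p decode_literal("_hAB")
p decode_literal("_b10101011")
```

Both notations decode to the same 8-bit value.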
1198
+
1199
+ When everything is put in a file named: `serializer.rb` and simulated as follows:
1200
+
1201
+ ```bash
1202
+ hdrcc --sim --vcd serializer.rb serializer
1203
+ ```
1204
+
1205
+ You should obtain the following time chart:
1206
+
1207
+ ![hruby_simulator.vcd](serializer_vcd.png)
1208
+
1209
+
1210
+
1211
+ ---
1212
+
1213
+ So now, you know:
1214
+
1215
+ * How to describe fully fledged algorithms with sequencers.
1216
+
1217
+ Also, if you are not familiar with languages like Ruby or Python, the `sfor` may look great compared to what C can provide. But this is just the beginning, HDLRuby can do better than that.
1218
+
1219
+
1220
+ #### 3.2.3. How to use enumerators in a sequencer
1221
+
1222
+ If you sometimes program in Ruby, you may know about enumerators: they are objects used for iteratively processing the elements of other objects. HDLRuby's sequencer provides the same concept: it is possible to build hardware enumerators for any enumerable object, and they will run like Ruby's.
1223
+
1224
+ > Before going on with this tutorial, we recommend that you study Ruby's enumerables and enumerators. Indeed, since the HDLRuby enumerators and enumerable objects reproduce the functionalities of the Ruby ones, we will only point out what is specific to the hardware.
1225
+
1226
+ #### First, what is an enumerable?
1227
+
1228
+ So, enumerators iterate over enumerables, but what is an enumerable? In HDLRuby, an enumerable is either a Ruby enumerable object, an enumerator (Ruby or HDLRuby), or a signal of any type. If the signal is an array, the iteration is over its elements; if it is a number, the iteration is over its bits, starting from the least significant one. For example, `0..2` is an enumerable including 0, 1, and 2 as elements, `[3,6,7]` is one including 3, 6, and 7, and the following signal `sig` includes 0, 1, 1, 0, 1, 1, 1, 0:
1229
+
1230
+ ```ruby
1231
+ [8].inner sig: _b01110110
1232
+ ```
1233
+
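If the order of the bits looks surprising, the following plain-Ruby sketch (a software model, not HDLRuby) lists the bits of the same 8-bit value starting from bit 0. The helper name `bits_lsb_first` is an assumption made for this illustration:

```ruby
# Plain-Ruby model: list the bits of a fixed-width value, bit 0 first.
# `bits_lsb_first` is a hypothetical helper, not part of HDLRuby.
def bits_lsb_first(value, width)
  (0...width).map { |i| value[i] }   # Integer#[] returns bit number i
end

p bits_lsb_first(0b01110110, 8)   # => [0, 1, 1, 0, 1, 1, 1, 0]
```

This matches the element list given above for `sig`.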
1234
+ #### Let us build HDLRuby enumerators.
1235
+
1236
+ Just as Ruby builds enumerators using the `each` method and its derivatives (`each_with_index` and so on), HDLRuby uses the `seach` method to build its enumerators. For example, an enumerator over the bits of signal `sig` is built as follows: `sig.seach`, and if you want an enumerator with index, just like in Ruby: `sig.seach_with_index`, and so on. An enumerator can then be executed when created, by providing the block that will be applied to each element like in Ruby, or executed later by calling the `seach` method again. E.g., the following sequencer code first sums the bits of signal `sig` at once, then does it again later with another enumerator previously stored in the Ruby variable `another_enum`:
1237
+
1238
+ ```ruby
1239
+ input :clk, :start
1240
+
1241
+ [8].inner sig: _b01110110
1242
+ [4].inner :num0, :num1
1243
+
1244
+ sequencer(clk,start) do
1245
+ another_enum = sig.seach
1246
+ num0 <= 0
1247
+ num1 <= 0
1248
+ sig.seach {|b| num0 <= num0 + b }
1249
+ sig <= _hCD
1250
+ another_enum.seach {|b| num1 <= num1 + b }
1251
+ end
1252
+ ```
1253
+
1254
+ That's all that we will explain here; the rest works exactly like in Ruby. Moreover, almost all the methods related to enumerables and enumerators have been reimplemented in HDLRuby using the prefix `s`, so please refer to the main documentation of HDLRuby about [enumerators](README.md#enumerator) for details. Let us just see a few examples:
1255
+
1256
+ * Count the number of 1 bits of a signal - again - but within an infinite loop, and with much shorter code using the HDLRuby version of the Ruby enumerator method `reduce`:
1257
+
1258
+ ```ruby
1259
+ input :clk,:start
1260
+ [8].input :sin
1261
+ [4].output :sout
1262
+
1263
+ sequencer(clk,start) do
1264
+ sloop do
1265
+ sout <= sin.sreduce(_h0,&:+)
1266
+ end
1267
+ end
1268
+ ```
1269
+
1270
+ Here are a couple of remarks about this code:
1271
+
1272
+ - You may have noticed that the initial value of reduce is set to a 4-bit 0 (`_h0`, since a hexadecimal digit is 4-bit). If this value is not set, the data type of the elements will be used for the additions, in this case, 1-bit (and a 1-bit addition is actually an exclusive or).
1273
+
1274
+ - While the count of the bits is a single-line statement, it is executed like a usual sequencer loop and therefore requires 8 clock cycles for completing.
1275
+
1276
+ * Fill array `ar` from input `sin` one byte per cycle, then sort it and output its content one byte per cycle.
1277
+
1278
+ ```ruby
1279
+ input :clk,:start
1280
+ [8].input :sin
1281
+ [8].output :sout
1282
+
1283
+ bit[8][-16].inner :ar
1284
+
1285
+ sequencer(clk,start) do
1286
+ 16.stimes {|i| ar[i] <= sin }
1287
+ ar.ssort
1288
+ 16.stimes {|i| sout <= ar[i] }
1289
+ end
1290
+ ```
1291
+
1292
+ In this example, `16.stimes` generates an enumerator over the `0..15` range, and is a way to build an enumerator from an integer value.
1293
+
1294
+ * Apply a 4-point FIR filter over an array obtained from input signal `sin` with 0-padding at the beginning and output the result to `sout`
1295
+
1296
+ ```ruby
1297
+ input :clk,:start
1298
+ [8].input :sin
1299
+ [8].output :sout
1300
+
1301
+ bit[8][-4].inner coefs: [_h01,_h05,_h0A,_hFE]
1302
+ bit[8][-16].inner :ar
1303
+
1304
+ sequencer(clk,start) do
1305
+ 16.stimes {|i| ar[i] <= sin }
1306
+ sout <= (([_h00]*3).seach + ar.seach).seach_slice(4).sreduce(_h00) do |a,b,c,d|
1307
+ a*coefs[0] + b * coefs[1] + c * coefs[2] + d * coefs[3]
1308
+ end
1309
+ end
1310
+ ```
1311
+
1312
+ In this example, `[_h00]*3` builds an array of three 8-bit zeros for the padding, and `seach` creates an iterator over this padding. This iterator is added to one over `ar`, which creates a global iterator over them all. `seach_slice` and `sreduce` work the same way as their Ruby equivalents `each_slice` and `reduce`.
1313
+
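The remark about the initial value of `sreduce` in the first example can be checked with a plain-Ruby model (not HDLRuby). The sketch below assumes that an addition is simply truncated to the width of the accumulator; `add_wrapped` is a hypothetical helper written for this illustration:

```ruby
# Plain-Ruby model of an addition truncated to `width` bits.
# `add_wrapped` is a hypothetical helper, not part of HDLRuby.
def add_wrapped(a, b, width)
  (a + b) & ((1 << width) - 1)
end

value = 0b01110110
bits  = (0...8).map { |i| value[i] }   # the bits of the signal, bit 0 first

count  = bits.reduce(0) { |acc, b| add_wrapped(acc, b, 4) }  # 4-bit accumulator
parity = bits.reduce(0) { |acc, b| add_wrapped(acc, b, 1) }  # 1-bit accumulator
xor    = bits.reduce(:^)                                     # exclusive or of all bits

p count    # number of 1 bits
p parity   # same as the exclusive or below
p xor
```

With a 4-bit accumulator the result is the bit count, while with a 1-bit accumulator only the parity survives, which is why the 4-bit initial value matters.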
1314
+
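Since these methods mirror Ruby's, the enumerator chaining of the last example can be tried in plain Ruby first. The sketch below is a software model only: the sample values are made up, `weighted_sum` is a hypothetical helper, and results are truncated to 8 bits as an assumption. It also shows the difference between `each_slice` (non-overlapping groups) and `each_cons` (sliding windows), the latter being what a sliding-window filter would use in plain Ruby:

```ruby
# Plain-Ruby model of chained enumerators (requires Ruby 2.6 or later).
coefs   = [0x01, 0x05, 0x0A, 0xFE]
samples = [1, 0, 0, 0, 2, 0, 0, 0]          # made-up input samples

padded  = ([0] * 3).each + samples.each     # one enumerator over padding then samples

slices  = padded.each_slice(4).to_a         # non-overlapping groups of up to 4
windows = padded.each_cons(4).to_a          # sliding windows of exactly 4

# 8-bit weighted sum of one window (hypothetical helper).
def weighted_sum(window, coefs)
  window.zip(coefs).sum { |x, c| x * c } & 0xFF
end

fir = windows.map { |w| weighted_sum(w, coefs) }

p slices
p fir
```

Note that the last slice produced by `each_slice` is shorter than 4 when the total number of elements is not a multiple of 4.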
1315
+
1316
+ #### Create your enumerator.
1317
+
1318
+ The method `seach` and its derivatives make it possible to iterate over signals or other enumerable objects. But you can also create your own enumerator that iterates over whatever you want, the way you want. For that purpose, please use the following command:
1319
+
1320
+ ```ruby
1321
+ <variable> = senumerator(<element type>, <number of elements>) <block>
1322
+ ```
1323
+
1324
+ Where `block` is a block of sequencer code for accessing an element by index. For example, the following defines the enumerator named `enum0` enumerating over the first 100 square numbers as 32-bit values:
1325
+
1326
+ ```ruby
1327
+ enum0 = senumerator(bit[32],100) {|i| i*i }
1328
+ ```
1329
+
1330
+ It is also possible to put a sequencer control statement in the access block, for example, the following enumerator accesses each bit of signal x waiting for 2 cycles between each access:
1331
+
1332
+ ```ruby
1333
+ [128].input :x
1334
+ enum1 = senumerator(bit,x.width) do |i|
1335
+ step
1336
+ step
1337
+ x[i]
1338
+ end
1339
+ ```
1340
+
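For comparison, here is how a similar index-based enumerator could be written in plain Ruby with `Enumerator.new`. This is only a software analogy: `indexed_enumerator` is a hypothetical helper, and the 32-bit truncation is an assumption mimicking the `bit[32]` element type of `enum0`:

```ruby
# Build an enumerator of `size` elements from an index-based accessor.
# `indexed_enumerator` is a hypothetical helper written for this sketch.
def indexed_enumerator(size, &access)
  Enumerator.new(size) do |yielder|
    size.times { |i| yielder << access.call(i) }
  end
end

# Software counterpart of enum0: the first 100 squares, kept on 32 bits.
enum0 = indexed_enumerator(100) { |i| (i * i) & 0xFFFF_FFFF }

p enum0.size       # => 100
p enum0.first(5)   # => [0, 1, 4, 9, 16]
```

As with `senumerator`, the block only says how to access element `i`; the iteration itself is handled by the enumerator.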
1341
+ ---
1342
+
1343
+ Now you know:
1344
+
1345
+ * What enumerators are and how to build them.
1346
+
1347
+ * How to use enumerators for easily building complex algorithms.
1348
+
1349
+ * How to create your own enumerators.
1350
+
1351
+ And with that, you may be able to build sequential hardware almost as easily as software. But why bother? Simply because the performance of the resulting hardware is much better than that of software, at least in terms of energy and/or power consumption.
1352
+
1353
+ But this is not all: contrary to software, hardware is inherently parallel, a property that has not been considered at all yet. From now on, that will be the main topic of this tutorial. We will start slowly by studying what happens when there are several sequencers.
1354
+
1355
+
1356
+
1357
+ #### 3.2.4. What happens when there are several sequencers?
1358
+
1359
+
1360
+ #### General considerations
1361
+
1362
+ You can have as many sequencers as you want in a module, and of course, you can have as many modules as you want in a circuit description. In total, all these sequencers will run independently of one another and in parallel. Furthermore, these parallel sequencers can be controlled by the same clock and start signals or by different ones.
1363
+
1364
+ As long as the sequencers do not access the same signals, they will not interfere with one another and what has been explained up to now is enough. However, it is common that the various parts of a circuit cooperate, and therefore exchange information, i.e., access common signals. There are two ways to handle access to the same signal from several sequencers...
1365
+
1366
+ #### The easy way: stick to a simple rule
1367
+
1368
+ There is a simple rule to follow in hardware design to avoid any trouble when accessing a signal:
1369
+
1370
+ > A signal can be read from anywhere at any time, but only one component (e.g., a port connection or a sequencer) can write to a given signal.
1371
+
1372
+ This rule is not absolute as you will see in the [hard way](#the-hard-way-arbitrating-between-writes-to-signals) to access signals, but it has the advantage of not requiring any additional hardware construct to be implemented. Hence, we do recommend following this rule as much as possible.
1373
+
1374
+ Let us see an example to understand how several sequencers can interact while following this rule. First, let us consider a sequencer that periodically increases a value and sends it to another sequencer, which counts the number of bits of this value and tells the first one to proceed with the increase:
1375
+
1376
+ ```ruby
1377
+ system :bit_pong do
1378
+ inner :clk,:start
1379
+ inner :req, :ack
1380
+ [4].inner :value
1381
+ [3].inner :count
1382
+
1383
+ sequencer(clk,start) do
1384
+ req <= 0
1385
+ value <= 0
1386
+ sloop do
1387
+ req <= 1
1388
+ swhile(ack != 1)
1389
+ value <= value + 1
1390
+ req <= 0
1391
+ end
1392
+ end
1393
+
1394
+ sequencer(clk,start) do
1395
+ ack <= 0
1396
+ count <= 0
1397
+ sloop do
1398
+ ack <= 0
1399
+ swhile(req != 1)
1400
+ count <= value.sreduce(_h0,&:+)
1401
+ ack <= 1
1402
+ end
1403
+ end
1404
+
1405
+ def clk!
1406
+ !10.ns
1407
+ clk <= 1
1408
+ !10.ns
1409
+ clk <= 0
1410
+ end
1411
+
1412
+ timed do
1413
+ clk <= 0
1414
+ start <= 0
1415
+ clk!
1416
+ start <= 1
1417
+ clk!
1418
+ start <= 0
1419
+ repeat(100) { clk! }
1420
+ end
1421
+
1422
+ end
1423
+ ```
1424
+
1425
+ If you write this code in a file named `bit_pong.rb` and simulate it as follows:
1426
+
1427
+ ```bash
1428
+ hdrcc --sim --vcd bit_pong.rb bit_pong
1429
+ ```
1430
+
1431
+ You will obtain the following kind of time chart:
1432
+
1433
+ ![hruby_simulator.vcd](bit_pong_vcd.png)
1434
+
1435
+ You may notice a detail that is crucial in hardware: the increase of `value` by the first sequencer starts not when `ack` becomes 1, but at the next clock cycle. This is the second important rule in hardware design:
1436
+
1437
+ > When a signal is read in a clock-synchronized component (e.g., a sequencer), the value obtained is not the value of the signal at this exact instant, but the one it had a little while before. This is because, in hardware, signals take physical time to propagate.
1438
+
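This rule can be mimicked in software with a two-phase update: all reads use the state from before the clock edge, and all writes become visible together after it. The following plain-Ruby sketch is a deliberately simplified model of the `ack`/`value` exchange (it is not the `bit_pong` circuit itself, and `clock_edge` is a hypothetical helper):

```ruby
# One clock edge: every rule reads the state from *before* the edge,
# and all writes become visible together after it.
def clock_edge(state)
  nxt = state.dup
  nxt[:ack]   = 1                                       # one component raises ack
  nxt[:value] = state[:value] + 1 if state[:ack] == 1   # the other tests the old ack
  nxt
end

s0 = { ack: 0, value: 0 }
s1 = clock_edge(s0)   # ack becomes 1, value is still 0
s2 = clock_edge(s1)   # value increases one cycle after ack rose

p s1
p s2
```

The increase happens on the edge that follows the one where `ack` was raised, as in the time chart above.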
1439
+ With this simple rule and enough training, one can easily and efficiently implement a large range of applications. But a few cases do require several components (e.g., sequencers) to write to the same signal. For example, when a resource must be shared among several components. Let us see what can be done in such cases.
1440
+
1441
+
1442
+ #### The hard way: arbitrating between writes to signals
1443
+
1444
+ The problem when several components write to the same signal is handling the cases where different values are assigned at the *same* time. This requires extra arbitration circuitry that must be described by the designer or generated by the synthesis tool. Since there is no general good way of arbitrating, the first option is the most common, but with HDLRuby's sequencer, there is an intermediate solution that does not require designing a new circuit but still lets the designer control the arbitration. It is this approach that is explained here. For designing an arbitration circuit from scratch, please consult tutorials about advanced RTL design.
1445
+
1446
+ #### The hard way but not so much: shared signals
1447
+
1448
+ The HDLRuby sequencer supports another kind of signal called the *shared signals*. They are inner signals of a module that can be written to by any number of sequencers without causing any trouble. They are declared like any other inner signals, but using the `shared` keyword instead of `inner`. For example, a shared 8-bit signed signal named `sh` can be declared as follows:
1449
+
1450
+ ```ruby
1451
+ signed[8].shared :sh
1452
+ ```
1453
+
1454
+ After being declared, shared signals can be used exactly like any other inner signals, but without any restriction when written within a sequencer.
1455
+
1456
+ #### But what about the arbitration?
1457
+
1458
+ By default, the arbitration when writing to a shared signal is blunt: the first declared sequencer that writes to this signal is **the only one** that changes its value. The other sequencers' writes are simply ignored. For example, in the following code, the value of `sh` is 55, whatever the second sequencer does:
1459
+
1460
+ ```ruby
1461
+ signed[8].shared :sh
1462
+
1463
+ sequencer(clk,start) do
1464
+ sh <= 55
1465
+ end
1466
+
1467
+ sequencer(clk,start) do
1468
+ 100.stimes { |i| sh <= i }
1469
+ end
1470
+ ```
1471
+
1472
+ This is not so useful that way, but you can control that by setting which sequencer has exclusive access using the `select` method of a shared signal. For example, in the following code, the first sequencer first writes 55 to `sh` but then gives access to the second one:
1473
+
1474
+ ```ruby
1475
+ signed[8].shared :sh
1476
+
1477
+ sequencer(clk,start) do
1478
+ sh <= 55
1479
+ sh.select <= 1
1480
+ end
1481
+
1482
+ sequencer(clk,start) do
1483
+ 100.stimes { |i| sh <= i }
1484
+ end
1485
+ ```
1486
+
1487
+ With this code, the value of `sh` will be at each clock: 55, 1, 2, 3,...
1488
+
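The behavior of `select` can be summarized by a tiny plain-Ruby model: a write is taken into account only if it comes from the currently selected sequencer. This is a software sketch under that single assumption; `new_shared` and `shared_write` are hypothetical helpers, and sequencers are identified by their declaration order:

```ruby
# Plain-Ruby model of a shared signal with a `select` field.
# By default, the first declared sequencer (number 0) owns the signal.
def new_shared
  { value: nil, select: 0 }
end

# `writer` is the declaration order of the sequencer attempting the write.
def shared_write(sh, writer, value)
  sh[:value] = value if writer == sh[:select]   # other writes are ignored
  sh[:value]
end

sh = new_shared
shared_write(sh, 0, 55)   # accepted: sequencer 0 is selected
shared_write(sh, 1, 0)    # ignored: sequencer 1 is not selected
p sh[:value]              # => 55

sh[:select] = 1           # hand the signal over to sequencer 1
shared_write(sh, 1, 1)    # accepted
p sh[:value]              # => 1
```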
1489
+ This is simple enough, right? However, there is a limitation: only one sequencer can access the `select` of a shared signal... Otherwise, another layer of arbitration is required and we are back to the beginning. Often, this is all right, but sometimes, complex arbitration between several sequencers and several signals makes things messy.
1490
+
1491
+ #### Enter the global arbiters
1492
+
1493
+ So, you have multiple sequencers fighting with one another for accessing several shared signals, what to do? You need a global arbiter that will decide for everyone. The sequencer library provides such a module, named `arbiter` that is instantiated like any module. However, it does not have any specific port, instead, it must be connected to the shared signals it will arbitrate. For example, an arbiter named `arb01` controlling shared signals `sh0` and `sh1` can be declared as follows:
1494
+
1495
+ ```ruby
1496
+ arbiter(:arb01).(sh0,sh1)
1497
+ ```
1498
+
1499
+ Then, when a sequencer wants to write to either `sh0` or `sh1`, it must request the right by sending 1 to the arbiter, and when the access is done, it must release that right by sending 0 so that it does not monopolize the signal. For example, the following sequencers write in turn to the 2-bit shared signal `pingpong`:
1500
+
1501
+ ```ruby
1502
+ system :pingpong do
1503
+ inner :clk, :start
1504
+
1505
+ [2].shared :pingpong
1506
+ arbiter(:arb).(pingpong)
1507
+
1508
+ sequencer(clk,start) do
1509
+ sloop do
1510
+ arb <= 1
1511
+ pingpong <= 1
1512
+ step
1513
+ arb <= 0
1514
+ end
1515
+ end
1516
+
1517
+ sequencer(clk,start) do
1518
+ sloop do
1519
+ arb <= 1
1520
+ pingpong <= 2
1521
+ step
1522
+ arb <= 0
1523
+ end
1524
+ end
1525
+
1526
+ timed do
1527
+ clk <= 0
1528
+ start <= 1
1529
+ !10.ns
1530
+ clk <= 1
1531
+ !10.ns
1532
+ clk <= 0
1533
+ start <= 0
1534
+ !10.ns
1535
+ repeat(20) { clk <= ~clk ; !10.ns }
1536
+ end
1537
+ end
1538
+ ```
1539
+
1540
+ Now, you should be used to it, so please try to simulate the code above. If you look at the VCD file in detail you will see a lot of signals but not any signal called `pingpong`. This is because shared signals hide underlying hardware that is exposed at simulation. Usually, what is relevant is the output value of the shared signal, which is called `<shared signal name>_out$<number>`. For our `pingpong` it is `pingpong_out$2`. Hence, you should get the following graph:
1541
+
1542
+ ![hruby_simulator.vcd](pingpong0_vcd.png)
1543
+
1544
+ Well, this is not an interesting result: `pingpong` is always 1, what about the second sequencer? There are two reasons for this:
1545
+
1546
+ * By default, an arbiter gives priority to sequencers in order of declaration.
1547
+
1548
+ * When a sequencer is denied access, its write access is ignored and that's all.
1549
+
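These two points can be modeled in a few lines of plain Ruby. The sketch below is a software model only (`grant` is a hypothetical helper): at each cycle, the requesting sequencer with the lowest declaration order wins, and the write of the other one is dropped:

```ruby
# Returns the declaration order of the sequencer that is granted access,
# or nil when nobody requests it. Lower declaration order wins.
def grant(requests)
  requests.index(true)
end

writes = [1, 2]   # value written by sequencer 0 and by sequencer 1

# Both sequencers request access at every cycle: sequencer 0 always wins.
seen = 4.times.map { writes[grant([true, true])] }
p seen                     # => [1, 1, 1, 1]

# If only the second sequencer requests access, it is granted.
p grant([false, true])     # => 1
```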
1550
+ Here, it happens that both sequencers always write to `pingpong` at the exact *same* time and the first one has higher priority. For instance, if the second sequencer were delayed a bit compared to the first one, it would obtain access too:
1551
+
1552
+ ```ruby
1553
+ system :pingpong do
1554
+ inner :clk, :start
1555
+
1556
+ [2].shared :pingpong
1557
+ arbiter(:arb).(pingpong)
1558
+
1559
+ sequencer(clk,start) do
1560
+ sloop do
1561
+ pingpong <= 1
1562
+ arb <= 1
1563
+ step
1564
+ arb <= 0
1565
+ end
1566
+ end
1567
+
1568
+ sequencer(clk,start) do
1569
+ step
1570
+ sloop do
1571
+ pingpong <= 2
1572
+ arb <= 1
1573
+ step
1574
+ arb <= 0
1575
+ end
1576
+ end
1577
+
1578
+ timed do
1579
+ clk <= 0
1580
+ start <= 1
1581
+ !10.ns
1582
+ clk <= 1
1583
+ !10.ns
1584
+ clk <= 0
1585
+ start <= 0
1586
+ !10.ns
1587
+ repeat(20) { clk <= ~clk ; !10.ns }
1588
+ end
1589
+ end
1590
+ ```
1591
+
1592
+ And the simulation result should be:
1593
+
1594
+ ![hruby_simulator.vcd](pingpong1_vcd.png)
1595
+
1596
+ Still, it may be annoying that some writes of a sequencer can be ignored. In such a case, you can use a blocking version of the arbiter called the `monitor`. This module is used like the arbiter, but when write access is required, the sequencer will be blocked until the access is granted. Hence to avoid confusion with the arbiters' syntax, requiring access to a monitor is done by the `lock` method, and releasing it is done by the `unlock` one. For example, with the following code, the expected pingpong exchange will happen even when both sequencers try to write at the same time:
1597
+
1598
+ ```ruby
1599
+ system :pingpong do
1600
+ inner :clk, :start
1601
+
1602
+ [2].shared :pingpong
1603
+ monitor(:arb).(pingpong)
1604
+
1605
+ sequencer(clk,start) do
1606
+ sloop do
1607
+ pingpong <= 1
1608
+ arb.lock
1609
+ arb.unlock
1610
+ end
1611
+ end
1612
+
1613
+ sequencer(clk,start) do
1614
+ sloop do
1615
+ pingpong <= 2
1616
+ arb.lock
1617
+ arb.unlock
1618
+ end
1619
+ end
1620
+
1621
+ timed do
1622
+ clk <= 0
1623
+ start <= 1
1624
+ !10.ns
1625
+ clk <= 1
1626
+ !10.ns
1627
+ clk <= 0
1628
+ start <= 0
1629
+ !10.ns
1630
+ repeat(20) { clk <= ~clk ; !10.ns }
1631
+ end
1632
+ end
1633
+ ```
1634
+
1635
+ As seen in the example, since monitors lock processes, no `step` is required, and the simulation result should be:
1636
+
1637
+ ![hruby_simulator.vcd](pingpong2_vcd.png)
1638
+
1639
+ > __WARNING__: while with an arbiter it was of no importance, with a monitor the lock must be made *after* the shared signal is written, otherwise this value will be taken into account one cycle later.
1640
+
1641
+ By the way, why does HDLRuby provide the arbiters if the monitors do a better job? For performance reasons first: using an arbiter produces a smaller and faster circuit. Also, it is sometimes better not to block a sequencer: for example, maybe you want to write to a shared signal if possible, and if not, do something else. This can be done as follows:
1642
+
1643
+ ```ruby
1644
+ sequencer(clk,start) do
1645
+ arb <= 1
1646
+ sif(sh.select == arb.cur_priority) { sh <= 2 }
1647
+ selse { <do something else> }
1648
+ arb <= 0
1649
+ end
1650
+ ```
1651
+
1652
+ In the code above, the sequencer checks if it has access by comparing the shared signal selection with its priority (obtained by `arb.cur_priority`).
1653
+
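If you come from software, the difference between the two may remind you of `try_lock` versus `lock` on a mutex. The following plain-Ruby sketch is only an analogy (threads and hardware sequencers are very different things): `try_lock` returns immediately like a request to an arbiter, whereas `synchronize` waits for the access like a monitor:

```ruby
resource = Mutex.new
log = []

resource.lock                      # the resource is already taken

# Arbiter-like behavior: ask, and do something else when denied.
if resource.try_lock
  log << :written
  resource.unlock
else
  log << :did_something_else
end

resource.unlock                    # the owner releases the resource

# Monitor-like behavior: wait until the access is granted, then write.
resource.synchronize { log << :written }

p log   # => [:did_something_else, :written]
```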
1654
+ #### I do not like the priority rule of the arbiter / monitor
1655
+
1656
+ Indeed, sometimes we need to define our priority rules. This can be done when instantiating an arbiter or a monitor in two possible fashions:
1657
+
1658
+ * You can provide your priority table as an array containing the declaration order of the sequencers in order of priority, e.g., for reversing the priority in our example:
1659
+
1660
+ ```ruby
1661
+ arbiter(:arb,[1,0]).(pingpong)
1662
+ ```
1663
+
1664
+ * You can provide your policy as a chunk of HDLRuby code whose argument is the declaration order of a sequencer and that returns true if the sequencer is granted priority. Here is an example that switches priority at each acquisition:
1665
+
1666
+ ```ruby
1667
+ inner switch: 1
1668
+ arbiter(:arb) { |i| switch <= ~switch ; i == switch }.(pingpong)
1669
+ ```
1670
+
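To see what these two options change, here is a plain-Ruby model of the priority resolution. It is a software sketch, not the arbiter's implementation: `grant` is a hypothetical helper, and the alternating policy is imitated by rotating the table after each grant rather than by reproducing the cycle-level behavior of the `switch` signal:

```ruby
# `table` lists the declaration orders from highest to lowest priority.
# Returns the declaration order of the sequencer that is granted access.
def grant(requests, table)
  table.find { |i| requests[i] }
end

both = [true, true]        # both sequencers request access

p grant(both, [0, 1])      # => 0 (default order)
p grant(both, [1, 0])      # => 1 (reversed priority table)

# Alternating policy, modeled by rotating the table after each grant.
table = [0, 1]
winners = 4.times.map do
  winner = grant(both, table)
  table  = table.rotate
  winner
end
p winners                  # => [0, 1, 0, 1]
```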
1671
+
1672
+ ---
1673
+
1674
+ Now you know:
1675
+
1676
+ * That sequencers run in parallel.
1677
+
1678
+ * How to synchronize them.
1679
+
1680
+ * How to share resources between them.
1681
+
1682
+ * How to make intricate deadlocks and how to escape them using arbiters and monitors... Just to make even more intricate ones (trust me with that).
1683
+
1684
+ * And...
1685
+
1686
+ #### That's all for this section!
1687
+
1688
+ Congratulations! You know everything about describing a working circuit and using your knowledge of software design, I am sure you can already do great things.
1689
+
1690
+ But in hardware, you can go much more into the details for even more performance. So let us go to the more advanced parts of this tutorial.
1691
+
1692
+
1693
+
1694
+
1695
+ ## 4. How to add parallelism to your algorithms
1696
+
1697
+ In software, when you want to do parallelism you usually need specific libraries because the model of computation of software is inherently sequential. But this is not the case with the HDLRuby sequencers because:
1698
+
1699
+ * [A sequencer is not a program](#41-a-sequencer-is-not-a-program)
1700
+
1701
+ * [Adding parallelism is genuine](#42-parallel-control-statements-and-enumerators)
1702
+
1703
+ ### 4.1. A sequencer is not a program
1704
+
1705
+ We said earlier that hardware is inherently parallel and we showed that multiple sequencers run concurrently by default. However, you may want to increase parallelism within an algorithm too: after all, it is what recent processors do, so why not hardware?
1706
+
1707
+ To tell the truth, we lied a little when we said that the sequencers were "executing" statements in order. They do not execute anything and statements are not *instructions*. Rather, they are *instantiations* of circuits performing basic computations, and the sequencer only schedules when each circuit is to be used. For example, the following code describes two 3-value addition circuits selected respectively at the first and second clock cycles:
1708
+
1709
+ ```ruby
1710
+ sequencer(clk,start) do
1711
+ d <= a + b + c
1712
+ step
1713
+ z <= u + v + w
1714
+ end
1715
+ ```
1716
+
1717
+ Now, what about the following:
1718
+
1719
+ ```ruby
1720
+ sequencer(clk,start) do
1721
+ d <= a + b + c
1722
+ z <= u + v + w
1723
+ end
1724
+ ```
1725
+
1726
+ ... Yes, as said earlier, when there is no control statement or step, the clock does not advance. That means that both adder circuits are selected at the same time and do "run" in parallel. Then what about the following:
1727
+
1728
+ ```ruby
1729
+ sequencer(clk,start) do
1730
+ d <= a + b + c
1731
+ z <= u + v + d
+ end
1732
+ ```
1733
+
1734
+ ... This time, `z` requires the value of `d`, but both circuits will again be selected at the same time. However, the output of the first one is connected to the third input of the second one: since they are combinatorial, waiting a little bit is enough for obtaining the right `d` input for computing `z`.
1735
+
1736
+ > __WARNING__: in hardware design with HDLRuby (and with all similar languages like Verilog or VHDL), it is assumed that a clock is slow enough for the relevant combinatorial circuits to complete computation before the next cycle. If this is not the case, the resulting circuits will not function properly. Fortunately, the synthesis frameworks usually provide tools for verifying these timings.
1737
+
1738
+
1739
+ ### 4.2. Parallel control statements and enumerators
1740
+
1741
+ Sequencers provide many constructs for easy control and enumeration. However, those constructs are sequential by construction. What if you would like to do the same in parallel? This is possible, but there is an important restriction:
1742
+
1743
+ > __WARNING__: there cannot be any loop within a parallel section of an algorithm.
1744
+
1745
+ #### 4.2.1. Parallel conditionals
1746
+
1747
+ With that being said, even without loops you can do a lot of things. First, let us see parallel conditional and case statements. We already presented the sequential versions: `sif` and `scase`. The parallel versions have the same syntax but use an `h` instead of an `s`:
1748
+
1749
+ * `hif(<condition>) <block>`: executes `block` if `condition` is true (i.e., different from 0).
1750
+
1751
+ - `helsif(<condition>) <block>`: executes `block` if the conditions of the previous `hif` and `helsif` are false and if the current `condition` is true.
1752
+
1753
+ * `helse <block>`: executes `block` if the conditions of the previous `hif` and `helsif` are false (i.e., equal to 0). It is also used for giving the default block for the `hcase` statement (see below).
1754
+
1755
+ * `hcase(<choice>)`: initializes a multiple case statement.
1756
+
1757
+ > __WARNING__: these are parallel statements, hence their blocks **CANNOT** contain any sequential statement like `step`, `sif`, `sloop`, or even `sbreak`.
1758
+
1759
+ However, they have the advantage of not waiting for the next clock cycle, hence high-speed algorithms can be devised when using them. For example, the following assigns to `d` the max of signals `u`, `v`, and `w` in less than one cycle:
1760
+
1761
+ ```ruby
1762
+ sequencer(clk,start) do
1763
+ hif(u > v) do
1764
+ hif(u > w) { d <= u }
1765
+ helse { d <= w }
1766
+ end
1767
+ helsif(v > w) { d <= v }
1768
+ helse { d <= w }
1769
+ ...
1770
+ end
1771
+ ```
1772
+
1773
+ #### 4.2.2. Yes, but without loops, we are quite limited, no?
1774
+
1775
+ There is a whole range of computations that do not require any loop: the *dataflow* computations. Indeed, more often than not, in software programs, the loops like `for` or `while` are only there for ease of design or conciseness of code, and do not have any algorithmic purpose. For example, if you want to pick up the max value of a fixed-sized array, you do not need a loop for that... Heh, we just did that in the previous section (just replace `u`, `v`, and `w` with `ar[0]`, `ar[1]`, and `ar[2]`).
1776
+
1777
+ Still, the code of the previous section was only for three variables, and the code was already a bit messy. Fortunately, you can use Ruby for this purpose... Do you remember? In HDLRuby, Ruby code is executed at compile time, so Ruby loops or iterators can be used to produce complex hardware descriptions cleanly. However...
1778
+
1779
+ > __WARNING__: Ruby code itself does not generate nor interact with hardware, for obtaining real hardware description you **must** put HDLRuby code within the Ruby loops.
1780
+
1781
+ Also:
1782
+
1783
+ > __WARNING__: Ruby iterators **do not** apply to HDLRuby constructs. But you can convert an HDLRuby signal to a Ruby array for applying Ruby iterators to it. For that purpose, please use the `to_a` method.
1784
+
1785
+ Let us wrap up everything we said with a sequencer selecting in parallel the largest element of an array signal of eight 8-bit elements:
1786
+
1787
+ ```ruby
1788
+ system :maxxer do
1789
+ inner :clk, :start
1790
+
1791
+ bit[8][-8].inner ar: [ _h05, _hA3, _h54, _h2F, _h33, _hE4, _h6B, _h1C ]
1792
+ bit[8].inner :ar_max
1793
+
1794
+ sequencer(clk,start) do
1795
+ ar_max <= 0
1796
+ ar.to_a.each do |e|
1797
+ hif(ar_max < e) { ar_max <= e }
1798
+ end
1799
+ end
1800
+
1801
+ timed do
1802
+ clk <= 0
1803
+ start <= 1
1804
+ !10.ns
1805
+ clk <= 1
1806
+ !10.ns
1807
+ clk <= 0
1808
+ start <= 0
1809
+ !10.ns
1810
+ repeat(20) { clk <= ~clk ; !10.ns }
1811
+ end
1812
+ end
1813
+ ```
1814
+
1815
+ The simulation result should be:
1816
+
1817
+ ![hruby_simulator.vcd](maxxer_vcd.png)
1818
+
1819
+ As promised, the max is obtained at the first cycle of the sequencer execution!
1820
+
1821
+ However, the code given below will not produce anything other than setting `ar_max` to 0, because the Ruby `max` method does not use any HDLRuby code:
1822
+
1823
+ ```ruby
1824
+ sequencer(clk,start) do
1825
+ ar_max <= 0
1826
+ ar.to_a.max
1827
+ end
1828
+ ```
1829
+
1830
+ So, you can use `each` but not `max`... Why? Because the `max` method does the comparison between elements at HDLRuby compile time and does not generate any hardware. More generally:
1831
+
1832
+ > If the method iterates without any **implicit** computation, then it can be used for parallel HDLRuby description.
1833
+
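One way to picture this rule is to replace the hardware statements with something that merely records them. In the plain-Ruby sketch below, `record_hif` is a hypothetical stand-in for an HDLRuby statement (it stores a textual description instead of building hardware), and the element names are plain strings:

```ruby
# `record_hif` is a hypothetical stand-in for an HDLRuby statement:
# it records a textual description instead of building hardware.
def record_hif(log, condition, action)
  log << "hif(#{condition}) { #{action} }"
end

elements = (0...8).map { |i| "ar[#{i}]" }

# `each` only iterates: the block runs once per element and records
# one statement each time.
with_each = []
elements.each { |e| record_hif(with_each, "ar_max < #{e}", "ar_max <= #{e}") }

# `max` does the comparisons itself, right now, in Ruby: nothing is recorded.
with_max = []
elements.max

puts with_each
p with_each.size   # => 8
p with_max.size    # => 0
```

The loop with `each` leaves eight recorded statements behind, one per element, whereas the call to `max` leaves none.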
1834
+ Yes, this is becoming confusing because it is hard to know what Ruby does when executing its methods. Unfortunately, there is nothing to help with that yet, but we hope to clarify things in future versions of HDLRuby. For now, if unsure, it is recommended to stick to the following Ruby methods:
1835
+
1836
+ * `each`, `each_cons`, `each_slice`
1837
+
1838
+ * `reduce` (or `inject`)
1839
+
1840
+ * `zip`
1841
+
1842
+
1843
+ ---
1844
+
1845
+ Now you know:
1846
+
1847
+ * How to use Ruby code for concisely describing parallel HDLRuby code.
1848
+
1849
+
1850
+ #### 4.2.3. Parallel is faster, so why do sequential computations?
1851
+
1852
+ While parallel dataflow computations are indeed faster than sequential ones in theory, they also have some drawbacks.
1853
+
1854
+ * First, not every algorithm can be implemented by a dataflow computation. For example, the initial sequencer example that counts the clock cycles is necessarily sequential. More generally, any algorithm that has as input or output a sequence of values must be sequential. But that is not all:
1855
+
1856
+ > Any algorithm that processes data of arbitrary size requires at least partly some sequential computation.
1857
+
1858
+ For example, how can you process a text whose size is unknown without any sequential computation? You may say: OK, let us first look for the end of the text, and then we can process it in parallel. But wait... Looking for this end requires some sequential processing, e.g., checking one character after the other to see whether it is the end (EOF).
1859
+
1860
+ * Second, parallel implementation of algorithms requires much more chip area than sequential ones. This is expensive, and there are some physical limitations with the size of a chip.
1861
+
1862
+ * Third, while faster in theory, parallel implementations may end up slower than sequential ones in practice. This is because the larger a combinatorial circuit is, the longer its delays are.
1863
+
1864
+
1865
+ Now, the big question is: how do we know which part of our circuit would better be parallel?
1866
+
1867
+ > Unfortunately, the best answer is the designer's experience.
1868
+
1869
+ Indeed, some design tools can choose for you, but the solution they give may not match your expectations. For HDLRuby, the idea is to let the designers decide, but to help them with sequential and parallel constructs that are as easy to use as possible.
1870
+
1871
+
1872
+ #### That's all for this section!
1873
+
1874
+ That was short this time because almost everything has already been said in the previous sections. But now it is time to go past the sequencers and dive into real RTL design.
1875
+
1876
+
1877
+
1878
+ ## 5. Toward lower level hardware design: the processes
1879
+
1880
+ While sequencers are easy to use with a software mindset, they are implemented on top of a control circuit that schedules their operations. While this circuit does not represent much overhead, you may want top-notch optimizations, or simply you may want more precise control over what hardware will be produced. It is therefore time to leave the comfort of the sequencer and study genuine hardware description constructs: the *processes*. Things will become more difficult, but here is an attempt to do it progressively:
1881
+
1882
+ * [What are the processes?](#51-what-are-the-processes)
1883
+
1884
+ * [What are the connection processes?](#52-what-are-the-connection-processes)
1885
+
1886
+ * [What are the combinatorial processes?](#53-what-are-the-combinatorial-processes)
1887
+
1888
+ * [What are the clocked processes?](#54-what-are-the-clocked-processes)
1889
+
1890
+ * [What about the control statements in processes?](#55-what-about-the-control-statements-in-processes)
1891
+
1892
+ * [How to share signals in processes?](#56-how-to-share-signals-in-processes)
1893
+
1894
+
1895
+ ### 5.1. What are the processes?
1896
+
1897
+ First, we must make things clear:
1898
+
1899
+ > Processes in hardware have very little (nothing?) to do with any kind of software process.
1900
+
1901
+ In hardware, a process is a list of data flow statements that are activated (we would say *executed* if they were software instructions) on a common condition. Depending on the activation condition, there are three kinds of processes:
1902
+
1903
+ * The connection processes (in HDLRuby they are considered to be processes)
1904
+
1905
+ * The combinatorial processes
1906
+
1907
+ * The clocked processes
1908
+
1909
+ We also saw the timed processes earlier. They are indeed a kind of process, but they do not represent hardware and are therefore not considered in this section.
1910
+
1911
+ We will detail these three kinds of hardware processes in the following sections, but there is another property to discuss first: the way the data flow statements are scheduled within a process. As with the sequencer between clock cycles, everything is parallel, but unlike the sequencer, there are two possible ways of handling dependencies:
1912
+
1913
+ * Blocking assignments scheduling where the result of a statement can be used directly as input of the following statements. Such processes are declared with the `seq` keyword.
1914
+
1915
+ * Non-blocking assignments where all the statements are independent of each other. Such processes are declared with the `par` keyword.
1916
+
1917
+ For example, if we assume that the initial value of `x` is 1 and that the initial value of `y` is 0, when the following process is activated, the value of `z` will become 4:
1918
+
1919
+ ```ruby
1920
+ seq(<activation condition>) do
1921
+ y <= x + 1
1922
+ z <= y + 2
1923
+ end
1924
+ ```
1925
+
1926
+ Whereas, with the following process, `z` will become 2, i.e., it will be the result of the addition of the initial value of `y` which is still 0:
1927
+
1928
+ ```ruby
1929
+ par(<activation condition>) do
1930
+ y <= x + 1
1931
+ z <= y + 2
1932
+ end
1933
+ ```
1934
+
1935
+ > There are a couple of caveats with the non-blocking processes. First, only the last assignment prevails when a signal is assigned multiple times. For example, with the following process, assuming the same initial values for `x` and `y` as before, `z` will become 3 and not 2:
1936
+
1937
+ ```ruby
1938
+ par(<activation condition>) do
1939
+ y <= x + 1
1940
+ z <= y + 2
1941
+ z <= y + 3
1942
+ end
1943
+ ```
1944
+
1945
+ > Second, the order of statements assigning to different signals does not affect the result. For example, the following process is identical to the previous one:
1946
+
1947
+ ```ruby
1948
+ par(<activation condition>) do
1949
+ z <= y + 2
1950
+ z <= y + 3
1951
+ y <= x + 1
1952
+ end
1953
+ ```
1954
+
1955
+ > But this one is different (we switched two assignments to the same signal):
1956
+
1957
+ ```ruby
1958
+ par(<activation condition>) do
1959
+ y <= x + 1
1960
+ z <= y + 3
1961
+ z <= y + 2
1962
+ end
1963
+ ```
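The difference between the two scheduling modes can be reproduced with a small plain-Ruby model. This is only a sketch and not HDLRuby code: the helper names `run_seq` and `run_par` are hypothetical, signals are modeled as entries of a Hash, and the initial value of `z` (0) is an assumption since the text does not specify it.

```ruby
# Plain-Ruby model of the two scheduling modes (this is not HDLRuby code).
# Signals are entries of a Hash; a statement is a pair [target, expression],
# where the expression is a lambda reading the signal Hash.

# Blocking (seq): each statement sees the results of the previous ones.
def run_seq(signals, statements)
  state = signals.dup
  statements.each { |target, expr| state[target] = expr.call(state) }
  state
end

# Non-blocking (par): every right-hand side reads the values the signals had
# before activation; updates are applied afterwards, in order, so the last
# assignment to a given signal wins.
def run_par(signals, statements)
  updates = statements.map { |target, expr| [target, expr.call(signals)] }
  state = signals.dup
  updates.each { |target, value| state[target] = value }
  state
end

init = { x: 1, y: 0, z: 0 } # the initial value of z is an assumption
body = [
  [:y, ->(s) { s[:x] + 1 }],
  [:z, ->(s) { s[:y] + 2 }]
]

puts run_seq(init, body)[:z] # => 4
puts run_par(init, body)[:z] # => 2
puts run_par(init, body + [[:z, ->(s) { s[:y] + 3 }]])[:z] # => 3
```

In this model, swapping two statements that target different signals leaves the result of `run_par` unchanged, while swapping two statements that target the same signal changes which one wins.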
1964
+
1965
+ Now let us see the different kinds of processes in detail.
1966
+
1967
+
1968
+ ### 5.2. What are the connection processes?
1969
+
1970
+ We already mentioned the connections at the beginning of this tutorial about how to [reuse circuits](#circuit-use). Each connection is considered to be a standalone process. Such a process is continuously activated, and its left value is *always* equal to its right value.
1971
+
1972
+ Since the connection processes consist of a single statement, they do not require a `seq` or `par` keyword. For example, the following module includes two connections to its output ports:
1973
+
1974
+ ```ruby
1975
+ system :a_system_with_cnx do
1976
+ [8].input :x, :y
1977
+ [8].output :u,:v
1978
+
1979
+ u <= x + y
1980
+ v <= x - y
1981
+ end
1982
+ ```
1983
+
1984
+ ### 5.3. What are the combinatorial processes?
1985
+
1986
+ These processes are declared with the `seq` or `par` keyword and a list of signals as the activation condition. They are activated each time one or more signals of their activation condition change value. For example, the following process will only be activated when `x` or `y` changes value, but not when `z`, `u` or `v` does:
1987
+
1988
+ ```ruby
1989
+ par(x,y) do
1990
+ u <= x + z
1991
+ v <= y - z
1992
+ end
1993
+ ```
1994
+
1995
+ Another kind of combinatorial process is activated each time a signal that its computations depend on changes value. Such processes are declared with the `seq` or `par` keyword without any activation condition. For example, the following process will be activated when `x`, `y`, or `z` changes value:
1996
+
1997
+ ```ruby
1998
+ par do
1999
+ u <= x + z
2000
+ v <= y - z
2001
+ end
2002
+ ```
2003
+
2004
+ Here is a more practical example: a circuit that computes the 8-bit checksum of an input 32-bit vector.
2005
+
2006
+ ```ruby
2007
+ system :checksum do
2008
+ [32].input :x
2009
+ [32+8].output :z
2010
+
2011
+ [8].inner :sum
2012
+
2013
+ seq do
2014
+ sum <= x[31..24] ^ x[23..16] ^ x[15..8] ^ x[7..0]
2015
+ z <= [x,sum]
2016
+ end
2017
+ end
2018
+ ```
2019
+
2020
+ In this code, the statement `z <= [x,sum]` concatenates the value of `x` with the value of `sum` using the concatenation operator `[]`.
2021
+
2022
+ > Please note that the concatenation operator is different from the access operator even if they use the same characters: the syntax `<sig>[<integer or range>]` is the access operator, and the syntax `[<list of signals or values>]` is the concatenation operator.
2023
+
2024
+ You can check this module with the following simulation environment:
2025
+
2026
+ ```ruby
2027
+ system :checksum_sim do
2028
+ [32].inner :x
2029
+ [40].inner :z
2030
+
2031
+ checksum(:checksumI).(x,z)
2032
+
2033
+ timed do
2034
+ x <= 0
2035
+ !10.ns
2036
+ x <= _h34F4AC68
2037
+ !10.ns
2038
+ end
2039
+ end
2040
+ ```
2041
+
2042
+ The simulation result should be:
2043
+
2044
+ ![hruby_simulator.vcd](checksum_vcd.png)
2045
+
2046
+ That is to say, the checksum of `x` is `04`, and it is appended at the end of `z`.
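The expected value can be cross-checked with ordinary Ruby integers. The following is a minimal sketch of the same computation in software, not the HDLRuby circuit itself; the method names `checksum8` and `with_checksum` are hypothetical.

```ruby
# XOR of the four bytes of a 32-bit value, as done on x[31..24] ... x[7..0].
def checksum8(x)
  [24, 16, 8, 0].map { |shift| (x >> shift) & 0xFF }.reduce(:^)
end

# Software equivalent of the concatenation [x, sum]: x in the upper 32 bits,
# the checksum in the lower 8 bits of a 40-bit value.
def with_checksum(x)
  ((x & 0xFFFF_FFFF) << 8) | checksum8(x)
end

puts format('%02X', checksum8(0x34F4AC68))      # => 04
puts format('%010X', with_checksum(0x34F4AC68)) # => 34F4AC6804
```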
2047
+
2048
+ #### Combinatorial processes that cannot become combinatorial circuits.
2049
+
2050
+ In the final hardware, a circuit whose output signals are also used as inputs is **not** a combinatorial circuit, and may not be a viable circuit at all. Such non-combinatorial circuits are used for making memory points (e.g., registers), or local synchronization (e.g., C elements). It is possible to describe such an element with a combinatorial process: it is enough to make a signal dependent on itself in a process. For example, in both of the following combinatorial processes a signal depends on itself (`x` in the first one, `u` in the second one), so neither describes a combinatorial circuit:
2051
+
2052
+ ```ruby
2053
+ par do
2054
+ x <= x + 1
2055
+ end
2056
+
2057
+ seq do
2058
+ u <= v + 1
2059
+ v <= u + w
2060
+ end
2061
+ ```
2062
+
2063
+ > __WARNING__: while such processes are accepted in HDLRuby (as well as in other hardware description languages like Verilog HDL), it is strongly recommended **not** to use them because the behavior of the resulting circuit is hard to predict. If you want to use memory elements, you should use the clocked processes instead.
2064
+
2065
+
2066
+ ### 5.4. What are the clocked processes?
2067
+
2068
+ These processes are declared with the `par` or `seq` keyword with a list of signal edges as an activation condition. There are two kinds of signal edges:
2069
+
2070
+ * The positive edges indicate changes in single-bit signals from 0 to 1. They are noted as follows: `sig.posedge` where `sig` is a single-bit signal.
2071
+
2072
+ * The negative edges indicate changes in single-bit signals from 1 to 0. They are noted as follows: `sig.negedge` where `sig` is a single-bit signal.
2073
+
2074
+ These processes are usually used for describing parts of circuits that are synchronized by a global clock. This includes the cases where you want to memorize a value.
2075
+
2076
+ Here is a simple example for illustrating this kind of process:
2077
+
2078
+ ```ruby
2079
+ system :clock_counter do
2080
+ input :clk, :run
2081
+ [32].output :count
2082
+
2083
+ par(clk.posedge) do
2084
+ hif(run) { count <= count + 1 }
2085
+ helse { count <= 0 }
2086
+ end
2087
+ end
2088
+
2089
+ system :clock_counter_sim do
2090
+ inner :clk, :run
2091
+ [32].inner :count
2092
+
2093
+ clock_counter(:clock_counterI).(clk,run,count)
2094
+
2095
+ timed do
2096
+ clk <= 0
2097
+ run <= 0
2098
+ !10.ns
2099
+ clk <= 1
2100
+ !10.ns
2101
+ run <= 1
2102
+ repeat(30) { clk <= ~clk ; !10.ns }
2103
+ end
2104
+ end
2105
+ ```
2106
+
2107
+ The result should be:
2108
+
2109
+ ![hruby_simulator.vcd](clock_counter_vcd.png)
2110
+
2111
+ Did you get what the `clock_counter` circuit does? Yes, it simply counts the positive edges of the signal `clk` while `run` is 1: signal `count` is initialized at 0 when declared, then the process, which is activated on each positive edge of `clk`, increases this signal by one. If you remember the sequencers, these clocked processes should look somewhat similar. Indeed, sequencers are built upon such processes. However, within processes, the fancy control statements like `sif`, `sloop`, or the enumerators **cannot** be used.
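For readers coming from software, the behavior of the clocked process can be approximated by a plain-Ruby model that reacts only to rising edges. This is a sketch under assumptions: the class `ClockCounterModel` and the method `replay_testbench` are hypothetical names, time is reduced to successive calls, and the 32-bit wrap-around is modeled with a mask.

```ruby
# Plain-Ruby model of a process activated on clk.posedge (not HDLRuby code).
class ClockCounterModel
  attr_reader :count

  def initialize
    @clk = 0
    @count = 0
  end

  # Apply new input values; the body only runs on a 0 -> 1 change of clk.
  def drive(clk, run)
    rising = @clk.zero? && clk == 1
    @clk = clk
    return unless rising

    @count = run == 1 ? (@count + 1) & 0xFFFF_FFFF : 0
  end
end

# Replays the stimuli of the simulation environment: one rising edge with
# run at 0, then 30 clock toggles with run at 1.
def replay_testbench
  dut = ClockCounterModel.new
  clk = 0
  dut.drive(clk, 0)
  clk = 1
  dut.drive(clk, 0)
  30.times do
    clk ^= 1
    dut.drive(clk, 1)
  end
  dut.count
end

puts replay_testbench # => 15
```

In this model, the 30 toggles contain 15 rising edges, hence the final count of 15.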
2112
+
2113
+ As seen in the `clock_counter` code, a previously seen control statement has been used: the `hif`. Yes, this statement was used for parallel computation in sequencers... But more generally, are there any other control statements that can be used in processes?
2114
+
2115
+ ### 5.5. What about the control statements in processes?
2116
+
2117
+ There is not much to say in this section, because everything has already been said in the section about [parallelism in sequencer](#parallelism). In fact, all the constructs described in that section were initially designed for processes and, by extension, can also be used in sequencers.
2118
+
2119
+ These statements can be used for both clocked processes and combinatorial processes. For example, the following is a combinatorial circuit that implements a simple ALU (Arithmetic and Logic Unit), able to perform the addition, the subtraction, the bitwise AND, and the bitwise OR of the two input values, while the operation is selected by a third input signal.
2120
+
2121
+ ```ruby
2122
+ system :alu do
2123
+ [8].input :x,:y
2124
+ [2].input :opr
2125
+ [8].output :z
2126
+
2127
+ par do
2128
+ hcase(opr)
2129
+ hwhen(0) { z <= x + y }
2130
+ hwhen(1) { z <= x - y }
2131
+ hwhen(2) { z <= x & y }
2132
+ hwhen(3) { z <= x | y }
2133
+ end
2134
+ end
2135
+ ```
2136
+
2137
+ This circuit is combinatorial, hence every time `x`, `y`, or `opr` changes value, `z` also changes instantaneously (in reality, there is a small physical delay that is not simulated).
2138
+
2139
+ You can check it as follows:
2140
+
2141
+ ```ruby
2142
+ system :alu_sim do
2143
+ [8].inner :x, :y
2144
+ [2].inner :opr
2145
+ [8].inner :z
2146
+
2147
+ alu(:aluI).(x,y,opr,z)
2148
+
2149
+ timed do
2150
+ x <= _h2A
2151
+ y <= _h31
2152
+ opr <= 0
2153
+ !10.ns
2154
+ opr <= 1
2155
+ !10.ns
2156
+ opr <= 2
2157
+ !10.ns
2158
+ opr <= 3
2159
+ !10.ns
2160
+ end
2161
+ end
2162
+ ```
2163
+
2164
+ The result should be:
2165
+
2166
+ ![hruby_simulator.vcd](alu_vcd.png)
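To know which values to look for in the waveform, the four operations can be computed with a plain-Ruby reference model. This is only a sketch: the method name `alu_model` is hypothetical, and the 8-bit truncation of `z` is modeled with a mask.

```ruby
# Plain-Ruby reference model of the ALU (not HDLRuby code).
def alu_model(opr, x, y)
  result = case opr
           when 0 then x + y
           when 1 then x - y
           when 2 then x & y
           when 3 then x | y
           else raise ArgumentError, 'opr must be in 0..3'
           end
  result & 0xFF # z is an 8-bit signal
end

# Same stimuli as in the simulation environment.
(0..3).each { |opr| puts format('%02X', alu_model(opr, 0x2A, 0x31)) }
# => 5B
# => F9
# => 20
# => 3B
```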
2167
+
2168
+ ...So we can do controls with processes, but what about sharing signals?
2169
+
2170
+
2171
+ ### 5.6. How to share signals in processes?
2172
+
2173
+ This was the last feature of the sequencer: signals that can be written to by multiple sequencers. But can we do the same with processes?
2174
+
2175
+ - The short answer is no: the shared signals and the arbiter or monitor constructs can only be used in sequencers.
2176
+
2177
+ - The long answer is yes, there are even two possible ways, but you will have to describe the underlying hardware from scratch.
2178
+
2179
+
2180
+ #### 5.6.1. The first way: use a multiplexer.
2181
+
2182
+ A multiplexer is a circuit that selects one input among several ones to be output. Usually, this is done according to an additional input that indicates the number of the selected input. In HDLRuby, there are two simple ways to describe multiplexers:
2183
+
2184
+ - The use of the `hcase` statements. We had an example previously with the ALU:
2185
+
2186
+ ```ruby
2187
+ system :alu do
2188
+ [8].input :x,:y
2189
+ [2].input :opr
2190
+ [8].output :z
2191
+
2192
+ par do
2193
+ hcase(opr)
2194
+ hwhen(0) { z <= x + y }
2195
+ hwhen(1) { z <= x - y }
2196
+ hwhen(2) { z <= x & y }
2197
+ hwhen(3) { z <= x | y }
2198
+ end
2199
+ end
2200
+ ```
2201
+
2202
+ With this example, all the operations are performed in parallel, and the `hcase` statement selects which result is to be sent to `z` according to the selection signal `opr`.
2203
+
2204
+ - The use of the `mux` operator that has the following syntax: `mux(<selection>,<input 0>,<input 1>, ...)` where `selection` is the selection signal, and the other arguments are the possible inputs. The same ALU circuit can therefore be described as follows:
2205
+
2206
+ ```ruby
2207
+ system :alu do
2208
+ [8].input :x,:y
2209
+ [2].input :opr
2210
+ [8].output :z
2211
+
2212
+ z <= mux(opr, x+y, x-y, x & y, x | y)
2213
+ end
2214
+ ```
2215
+
2216
+ A connection is used instead of a combinatorial process for this second example since there is a single statement.
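As a software analogy, the selection performed by `mux` resembles indexing a list of already computed values. The sketch below is plain Ruby, not the HDLRuby operator: the Ruby method `mux` defined here and `alu_with_mux` are hypothetical stand-ins.

```ruby
# Plain-Ruby stand-in for a multiplexer. Ruby evaluates every argument before
# the call, which mirrors the hardware computing all candidates in parallel
# and only selecting one of them.
def mux(selection, *inputs)
  inputs.fetch(selection)
end

def alu_with_mux(opr, x, y)
  mux(opr, x + y, x - y, x & y, x | y) & 0xFF # keep 8 bits, like `z`
end

puts format('%02X', alu_with_mux(2, 0x2A, 0x31)) # => 20
```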
2217
+
2218
+
2219
+ #### 5.6.2. The second way: use of tri-state bus <a name="tri-state"></a>
2220
+
2221
+ Multiplexers are easy to describe and use, but their size grows quickly: linearly with the number of inputs (hence exponentially with the width of the selection signal) and linearly with their bit width. Moreover, the number of inputs of a chip is often strongly limited. For all these reasons, an alternative is sometimes used: the tri-state buses. These are signals to which several processes or circuits can write a value, provided that only one does so at a time and that the others set their access to high impedance when not writing.
2222
+
2223
+ > Before explaining further the tri-state buses, we need to clarify a possible misconception about hardware values: we often spoke about bits, and you may have assumed that their possible values were 0 or 1 for standard boolean logic. However, in hardware design it is common to use a 4-valued logic that represents the relevant states of a wire in a circuit:
2224
+
2225
+ - `0` for ground level, used for representing `false` or the digit `0`;
2226
+
2227
+ - `1` for power supply level, used for representing `true` or the digit `1`;
2228
+
2229
+ - `Z` for high impedance, used in tri-state buses;
2230
+
2231
+ - `X` for undefined level, usually indicating a problem in the circuit or an uninitialized state.
2232
+
2233
+ Physically, when a signal is set to the `Z` state, it is possible to safely write another value on it, its state becoming this written value. Such a signal is then called a tri-state signal or bus.
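The way several writers combine on such a signal can be illustrated with a small plain-Ruby model. This is a simplified sketch, not the resolution rule of any particular simulator: the names `resolve_bit` and `resolve_bus` are hypothetical, bits are represented by the characters `0`, `1`, `Z` and `X`, and drive strengths are ignored.

```ruby
# Resolve one wire from the values written by all of its drivers.
def resolve_bit(drivers)
  active = drivers.reject { |d| d == 'Z' }.uniq
  return 'Z' if active.empty?               # nobody drives the wire
  return active.first if active.size == 1   # one value driven (may be 'X')

  'X'                                       # conflicting values
end

# Resolve a whole bus, each driver being a string with one character per bit.
def resolve_bus(*drivers)
  drivers.map(&:chars).transpose.map { |bits| resolve_bit(bits) }.join
end

puts resolve_bus('ZZZZZZZZ', '00101010') # => 00101010
puts resolve_bus('ZZZZZZZZ', 'ZZZZZZZZ') # => ZZZZZZZZ
puts resolve_bus('1Z', '0Z')             # => XZ
```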
2234
+
2235
+ Tri-state signals can be `inner` or `inout`, and as long as there are no simultaneous `0` or `1` writes on them, they can be used freely... In theory. In practice, however, synthesis tools prioritize safety over flexibility and will reject several descriptions that should work. This is highly dependent on the tool you are using, hence we cannot give precise guidelines for the proper use of such signals. Instead, here is an example of tri-state bus usage that should work with a majority of the design tools. Please feel free to adapt it to your needs:
2236
+
2237
+ ```ruby
2238
+ system :ram8 do
2239
+ input :clk, :en, :rwb
2240
+ [8].input :addr
2241
+ [8].inout :data
2242
+
2243
+ bit[8][-2**8].inner :content
2244
+
2245
+ [8].inner :tmp
2246
+
2247
+ par do
2248
+ hif(en & rwb) { data <= tmp }
2249
+ helse { data <= _bZZZZZZZZ }
2250
+ end
2251
+
2252
+ par(clk.posedge) do
2253
+ hif(en) do
2254
+ # Read case
2255
+ hif(rwb) { tmp <= content[addr] }
2256
+ helse { content[addr] <= data }
2257
+ end
2258
+ end
2259
+ end
2260
+
2261
+ system :ram8_sim do
2262
+ inner :clk, :en, :rwb
2263
+ [8].inner :addr, :data
2264
+
2265
+ [8].inner :value
2266
+
2267
+ ram8(:ram8I).(clk,en,rwb,addr,data)
2268
+
2269
+ par do
2270
+ hif(en & ~rwb) { data <= value }
2271
+ helse { data <= _bZZZZZZZZ }
2272
+ end
2273
+
2274
+ timed do
2275
+ addr <= 0
2276
+ en <= 0
2277
+ rwb <= 1
2278
+ value <= 0
2279
+ clk <= 0
2280
+ !10.ns
2281
+ clk <= 1
2282
+ !10.ns
2283
+ repeat(256) do
2284
+ en <= 1
2285
+ rwb <= 1
2286
+ clk <= 0
2287
+ !10.ns
2288
+ clk <= 1
2289
+ !10.ns
2290
+ en <= 0
2291
+ addr <= addr + 1
2292
+ value <= value + 1
2293
+ clk <= 0
2294
+ !10.ns
2295
+ clk <= 1
2296
+ !10.ns
2297
+ en <= 1
2298
+ rwb <= 0
2299
+ clk <= 0
2300
+ !10.ns
2301
+ clk <= 1
2302
+ !10.ns
2303
+ en <= 0
2304
+ clk <= 0
2305
+ !10.ns
2306
+ clk <= 1
2307
+ !10.ns
2308
+ end
2309
+ end
2310
+ end
2311
+ ```
2312
+
2313
+ And the simulation result should be:
2314
+
2315
+ ![hruby_simulator.vcd](ram_vcd.png)
2316
+
2317
+ The first module is an 8-bit address 8-bit data RAM with a single data bus for both reading and writing. When the RAM is not read (`en` is 0 or `rwb` is 0), the data bus is set to `Z`. The second module simply accesses the RAM, and for that purpose uses a process specifically for writing to the data bus: when a write operation is performed (`en` is 1 for activating the access to the RAM and `rwb` is set to 0), the value is written to the bus, otherwise, a value is nonetheless written but it is a `Z` value. The coding style of this process is crucial for ensuring synthesis tools will succeed: we cannot guarantee it will work if it is described differently.
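The protocol described above can be summarized with a plain-Ruby model of the RAM side. This is a sketch under assumptions, not HDLRuby code: the class `Ram8Model` and its methods are hypothetical names, the memory is assumed to start filled with 0, and the high-impedance state is represented by the symbol `:z`.

```ruby
# Plain-Ruby model of the ram8 module (not HDLRuby code).
class Ram8Model
  def initialize
    @content = Array.new(256, 0) # initial content is an assumption
    @tmp = 0
  end

  # Clocked process: on a rising edge with en at 1, read into tmp when rwb
  # is 1, otherwise store the value present on the data bus.
  def posedge(en:, rwb:, addr:, data: 0)
    return unless en == 1

    if rwb == 1
      @tmp = @content[addr]
    else
      @content[addr] = data & 0xFF
    end
  end

  # Combinatorial process: the RAM drives the bus only during a read.
  def bus_drive(en:, rwb:)
    en == 1 && rwb == 1 ? @tmp : :z
  end
end

ram = Ram8Model.new
ram.posedge(en: 1, rwb: 0, addr: 3, data: 42) # write 42 at address 3
ram.posedge(en: 1, rwb: 1, addr: 3)           # read it back
puts ram.bus_drive(en: 1, rwb: 1)             # => 42
puts ram.bus_drive(en: 0, rwb: 1).inspect     # => :z
```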
2318
+
2319
+
2320
+
2321
+ #### This section may have been quite a leap from the software mindset...
2322
+
2323
+ ---
2324
+
2325
+ But, now you may understand:
2326
+
2327
+ * How to describe low-level hardware with HDLRuby with processes.
2328
+
2329
+ * The existence of three kinds of processes: the connections, combinatorial, and clocked processes.
2330
+
2331
+ * How to use controls in processes.
2332
+
2333
+ * How to write several values to a single signal without the comfort brought by the sequencers and their shared signals.
2334
+
2335
+ * What is a tri-state bus and how to use it.
2336
+
2337
+ With that, you have all the constructs required for describing optimized hardware. Of course, mastering them requires a lot of know-how, but this is out of the scope of this tutorial. To go further, please follow lessons about hardware design using popular hardware description languages like Verilog HDL or VHDL, since what you can do with them you can also do with HDLRuby.
2338
+
2339
+ However, there remain a few major features for more efficient coding inherited from the Ruby language.
2340
+
2341
+
2342
+ ## 6. Not enough? What about genericity, object orientation, metaprogramming, and reflection?
2343
+
2344
+ Let us imagine you spent hours designing a circuit processing 8-bit values for outputting a 16-bit result and you learn that the input should be 16-bit and the output 8-bit. Well, that's not so tough, you just need to modify the data types in your circuit description... **everywhere**. When you just finished, you learn that it would be great if there was also another version of the circuit that could output flags about the result. Ok, so let us make another version of the circuit, and since it is not so different, let us do some copy and paste, and then make modifications for the flags... But whoop! You did not notice that with your modifications you unintentionally modified a few lines of code introducing new bugs... And of course, you realize it after hours of testing.
2345
+
2346
+ Such situations happen all the time in software design... But it is much worse in hardware. Indeed, due to the variety of hardware circuits, it is very common to want to reuse a design with different bit widths. This is why existing hardware description languages support genericity to a certain degree. However, HDLRuby goes further in terms of genericity, and also adds object-oriented programming, metaprogramming, and reflection concepts to maximize the possibilities of code reuse. From now on, let us detail:
2347
+
2348
+ * [Genericity in HDLRuby](#61-genericity-in-hdlruby)
2349
+
2350
+ * [Object-oriented programming in HDLRuby](#62-object-oriented-programming-in-hdlruby)
2351
+
2352
+ * [Metaprogramming and reflection in HDLRuby](#63-metaprogramming-and-reflection-in-hdlruby)
2353
+
2354
+ ### 6.1. Genericity in HDLRuby
2355
+
2356
+ In HDLRuby, genericity is supported through the *generic module* constructs. These constructs describe circuits like standard modules do, but with the addition of parameters that control their content. They are declared as follows:
2357
+
2358
+ ```ruby
2359
+ system :<name> do |<list of parameters>|
2360
+ <some hardware description code>
2361
+ end
2362
+ ```
2363
+
2364
+ The list of parameters is simply a list of variable names that can represent anything, e.g., some values of any type, some data types, or even some chunks of Ruby or HDLRuby code.
2365
+
2366
+ Before being used, a generic module must be specialized by specifying each of its parameters. This is done as follows:
2367
+
2368
+ ```ruby
2369
+ <generic system>(<list of objects>)
2370
+ ```
2371
+
2372
+ Where `generic system` is the name of the generic module to specialize.
2373
+
2374
+ And there is nothing more to explain since there is no restriction with the parameters. Instead, here are a couple of examples. First, a module counting the clock cycles whose counter bit width is a generic parameter:
2375
+
2376
+ ```ruby
2377
+ system :counter do |size|
2378
+ input :clk, :rst
2379
+ [size].output :count
2380
+
2381
+ par(clk.posedge) do
2382
+ hif(rst) { count <= 0 }
2383
+ helse { count <= count + 1 }
2384
+ end
2385
+ end
2386
+ ```
2387
+
2388
+ If you want to instantiate this circuit, you will first need to specialize it. For example, here are two instances, named `counterI8` and `counterI16`, that have respectively an 8-bit and a 16-bit output:
2389
+
2390
+ ```ruby
2391
+ counter(8).(:counterI8)
2392
+ counter(16).(:counterI16)
2393
+ ```
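The effect of the `size` parameter can be illustrated with a plain-Ruby analogy in which specialization returns a closure. This is only a sketch of the idea, not how HDLRuby implements generic modules; the method name `counter_model` is hypothetical.

```ruby
# Plain-Ruby analogy of a generic counter: "specializing" with a size returns
# a step function that wraps around at 2**size (not HDLRuby code).
def counter_model(size)
  mask = (1 << size) - 1
  count = 0
  # One call stands for one rising clock edge.
  ->(rst) { count = rst == 1 ? 0 : (count + 1) & mask }
end

counter8 = counter_model(8)
counter16 = counter_model(16)
256.times { counter8.call(0) }
256.times { counter16.call(0) }
puts counter8.call(0)  # => 1   (the 8-bit counter wrapped around)
puts counter16.call(0) # => 257
```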
2394
+
2395
+ The second example is again a counter of clock cycles but whose full data type of the output is a generic parameter (not only its size), and which has a given value that when reached triggers some hardware code also provided as a generic parameter:
2396
+
2397
+ ```ruby
2398
+ system :counter_ext do |typ,trig,code|
2399
+ input :clk, :rst
2400
+ typ.output :count
2401
+
2402
+ seq(clk.posedge) do
2403
+ hif(rst) { count <= 0 }
2404
+ helse do
2405
+ count <= count + 1
2406
+ hif(count == trig) { code.call(count) }
2407
+ end
2408
+ end
2409
+ end
2410
+ ```
2411
+
2412
+ There is more to unwrap here:
2413
+
2414
+ * `typ` is the data type of the output, and is therefore used as is for declaring `count`
2415
+
2416
+ * `trig` is the value that will activate the hardware provided as a generic parameter.
2417
+
2418
+ * `code` is a chunk of HDLRuby code. To paste this code chunk, the `call` method is used, and for giving access to the `count` signal to this code chunk, it is passed as an argument of the method.
2419
+
2420
+ And here is one instance of this module called `counter_extI`, with a 16-bit signed output, a trigger at value 5, and a triggered code that sets the count to -4:
2421
+
2422
+ ```ruby
2423
+ system :counter_ext_sim do
2424
+ inner :clk,:rst
2425
+ signed[16].inner :count
2426
+ counter_ext(signed[16],5, proc { |count| count <= -4 }).(:counter_extI).(clk,rst,count)
2427
+
2428
+ timed do
2429
+ clk <= 0
2430
+ rst <= 0
2431
+ !10.ns
2432
+ clk <= 1
2433
+ !10.ns
2434
+ clk <= 0
2435
+ rst <= 1
2436
+ !10.ns
2437
+ clk <= 1
2438
+ !10.ns
2439
+ clk <= 0
2440
+ rst <= 0
2441
+ repeat(20) { !10.ns; clk <= ~clk }
2442
+ !10.ns
2443
+ end
2444
+ end
2445
+ ```
2446
+
2447
+ In this code, the syntax `proc <block>` is for creating a chunk of HDLRuby (or Ruby) code from the content of `block`. With this environment, the simulation result should be:
2448
+
2449
+ ![hruby_simulator.vcd](counter_ext_vcd.png)
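To make the role of the `code` parameter more concrete, here is a plain-Ruby model of one activation of the `seq` process. It is a sketch under assumptions: the method name `counter_ext_step` is hypothetical, the bit width of the signed type is ignored, and the chunk of code is modeled as a proc returning the new value of `count` instead of assigning a signal.

```ruby
# One activation of the clocked seq process of counter_ext, modeled in plain
# Ruby (not HDLRuby code). Because the process is blocking, the comparison
# with trig sees the already incremented value of count.
def counter_ext_step(count, rst, trig, code)
  return 0 if rst == 1

  count += 1
  count = code.call(count) if count == trig
  count
end

set_to_minus_four = proc { |_count| -4 }

count = 0
trace = Array.new(10) { count = counter_ext_step(count, 0, 5, set_to_minus_four) }
p trace # => [1, 2, 3, 4, -4, -3, -2, -1, 0, 1]
```

In this model the value 5 is never visible on `count`: it is replaced by -4 within the same activation.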
2450
+
2451
+
2452
+ ### 6.2. Object-oriented programming in HDLRuby
2453
+
2454
+ Genericity allows using a single design for several different circuits but it does not allow modifying this design. This possibility is brought by object-oriented programming and its key concept: [inheritance](#621-inheritance). With it, it is possible to modify afterward a circuit design without touching the original description, to define a new circuit based on an original one without redoing everything from scratch, and to change a single instance of a circuit without influencing the other ones, all that with a simple [construct](#622-modifying-a-module-or-an-instance). Finally, it is possible under certain conditions to [reimplement parts of the code](#623-overriding).
2455
+
2456
+
2457
+ #### 6.2.1 Inheritance
2458
+
2459
+ In HDLRuby, inheritance can be applied to modules. When a module inherits from another one, the latter is called its *superclass module*. By default, a module contains all the ports and all the content of its superclass modules. Then, new ports or content can be added to the inheriting module at will. Inheritance is declared as follows:
2460
+
2461
+ ```ruby
2462
+ system :<name>, <list of superclass modules> <block>
2463
+ ```
2464
+
2465
+ Alternatively, the following syntax is also possible:
2466
+
2467
+ ```ruby
2468
+ system :<name> do
2469
+ include <list of superclass modules>
2470
+ <HDLRuby code>
2471
+ end
2472
+ ```
2473
+
2474
+ For example, let us consider the following 8-bit adder/subtracter module:
2475
+
2476
+ ```ruby
2477
+ system :addsub8 do
2478
+ input :s
2479
+ signed[8].input :x, :y
2480
+ signed[8].output :z
2481
+
2482
+ par do
2483
+ hif(s) { z <= x - y }
2484
+ helse { z <= x + y }
2485
+ end
2486
+ end
2487
+ ```
2488
+
2489
+ And let us assume we want to design a new 8-bit adder/subtracter module that also outputs a zero flag. That can be done with inheritance as follows:
2490
+
2491
+ ```ruby
2492
+ system :addsub_zf8, addsub8 do
2493
+ output :zf
2494
+
2495
+ zf <= (z == 0)
2496
+ end
2497
+ ```
2498
+
2499
+ So why is there an alternate syntax with the `include` keyword? This is for easing the mix of genericity and inheritance. For example, the previous circuit was 8-bit only. What if you want to do the same with generic input and output widths? For the initial adder/subtracter, this is straightforward:
2500
+
2501
+ ```ruby
2502
+ system :addsub do |width|
2503
+ input :s
2504
+ [width].input :x, :y
2505
+ [width].output :z
2506
+
2507
+ par do
2508
+ hif(s) { z <= x - y }
2509
+ helse { z <= x + y }
2510
+ end
2511
+ end
2512
+ ```
2513
+
2514
+ But for the second one? It needs to transmit its generic parameter to its superclass module, but the parameter is only available within its block description. In such a case, the second syntax is useful:
2515
+
2516
+ ```ruby
2517
+ system :addsub_zf do |width|
2518
+ include addsub(width)
2519
+ output :zf
2520
+
2521
+ zf <= (z == 0)
2522
+ end
2523
+ ```
2524
+
2525
+ #### 6.2.2. Modifying a module or an instance
2526
+
2527
+ Let us assume that you decide that, after all, the adder-subtracter circuit you described previously should definitely have a `zf` flag for your new project. You can of course create a new module with inheritance for that, but you can also modify the initial `addsub` module afterward using the `open` method as follows:
2528
+
2529
+ ```ruby
2530
+ addsub.open do
2531
+ output :zf
2532
+
2533
+ zf <= (z == 0)
2534
+ end
2535
+ ```
2536
+
2537
+ After that, all the instances that are created from `addsub` will include the `zf` flag computation and output.
2538
+
2539
+ In contrast, you may want only one instance to have the flag. In such a case, it may not be necessary to define a new module: you can modify this instance only. This is possible using the `open` method again, but on the instance. For example, with the instance used in the following simulation environment:
2540
+
2541
+ ```ruby
2542
+ system :addsub_sim do
2543
+ signed[8].inner :x, :y, :z
2544
+ inner :zf, :s
2545
+
2546
+ addsub(signed[8]).(:addsubI8).(s,x,y,z)
2547
+
2548
+ addsubI8.open do
2549
+ output :zf
2550
+
2551
+ zf <= (z == 0)
2552
+ end
2553
+
2554
+ zf <= addsubI8.zf
2555
+
2556
+ timed do
2557
+ x <= 0
2558
+ y <= 0
2559
+ s <= 0
2560
+ repeat(3) { !10.ns; x <= x + 1 }
2561
+ repeat(3) { !10.ns; y <= y + 1 }
2562
+ s <= 1
2563
+ repeat(3) { !10.ns; x <= x - 1 }
2564
+ repeat(3) { !10.ns; y <= y - 1 }
2565
+ !10.ns
2566
+ end
2567
+ end
2568
+ ```
2569
+
2570
+ And the simulation result should be:
2571
+
2572
+ ![hruby_simulator.vcd](addsub_vcd.png)
2573
+
2574
+ #### 6.2.3. Overriding
2575
+
2576
+ Let us consider the following circuit that performs an addition and computes four flags:
2577
+
2578
+ ```ruby
2579
+ system :adder_flags do |width|
2580
+ signed[width].input :x, :y
2581
+ signed[width].output :z
2582
+ output :cf, :zf, :sf, :of
2583
+
2584
+ [cf,z] <= x.as(signed[width+1]) + y
2585
+
2586
+ zf <= (z == 0)
2587
+ sf <= z[-1]
2588
+ of <= (x[-1] & y[-1] & ~z[-1]) | (~x[-1] & ~y[-1] & z[-1])
2589
+ end
2590
+ ```
2591
+
2592
+ And let us assume we want to make a circuit that does the same but saturates the addition result between -300 and +300. For that purpose, the computation of `z` must be replaced. This is called *overriding*, and for the sake of code readability, overriding is only permitted for *named sub-sections* of a module. A named sub-section is declared as follows:
2593
+
2594
+ ```ruby
2595
+ sub(:<name>) <block>
2596
+ ```
2597
+
2598
+ Where `block` can contain any HDLRuby code apart from the port definition. A named subsection can be overridden anywhere in the code, and also in subclass modules by redefining it. In our previous example, if `adder_flags` is rewritten as follows:
2599
+
2600
+ ```ruby
2601
+ system :adder_flags do |width|
2602
+ signed[width].input :x, :y
2603
+ signed[width].output :z
2604
+ output :cf, :zf, :sf, :of
2605
+
2606
+ sub(:add) do
2607
+ [cf,z] <= x.as(signed[width+1]) + y
2608
+ end
2609
+
2610
+ zf <= (z == 0)
2611
+ sf <= z[-1]
2612
+ of <= (x[-1] & y[-1] & ~z[-1]) | (~x[-1] & ~y[-1] & z[-1])
2613
+ end
2614
+ ```
2615
+
2616
+ Then a new module with saturated addition can be defined as follows using inheritance and overriding:
2617
+
2618
+ ```ruby
2619
+ system :adder_sat_flags do |width|
2620
+ include adder_flags(width)
2621
+
2622
+ sub(:add) do
2623
+ seq do
2624
+ [cf,z] <= x.as(signed[width+1]) + y
2625
+ hif(z<-300) { z <= -300 }
2626
+ hif(z>300) { z <= 300 }
2627
+ end
2628
+ end
2629
+ end
2630
+ ```
2631
+
2632
+ If you wish to see how this adder with saturation works, here is a simple simulation environment for it:
2633
+
2634
+ ```ruby
2635
+ system :adder_sat_flags_sim do
2636
+ signed[16].inner :x, :y, :z
2637
+ inner :cf, :zf, :sf, :of
2638
+
2639
+ adder_sat_flags(16).(:adderI).(x,y,z,cf,zf,sf,of)
2640
+
2641
+ timed do
2642
+ x <= 0
2643
+ y <= 0
2644
+ repeat(3) { !10.ns; x <= x + 1 }
2645
+ repeat(3) { !10.ns; y <= y + 1 }
2646
+ !10.ns
2647
+ x <= 299
2648
+ y <= 0
2649
+ repeat(3) { !10.ns; x <= x + 1 }
2650
+ !10.ns
2651
+ x <= -1
2652
+ y <= -300
2653
+ repeat(3) { !10.ns; x <= x + 1 }
2654
+ !10.ns
2655
+ end
2656
+ end
2657
+ ```
2658
+
2659
+ And the simulation result should be:
2660
+
2661
+ ![hruby_simulator.vcd](adder_sat_flags_vcd.png)
2662
+
2663
+ __Note__: with this circuit, the `of` flag (overflow) is always 0 because there cannot be any overflow for a 10-bit number with saturation at -300, 300.
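For readers familiar with object-oriented software, overriding a named sub-section resembles overriding a method in a subclass. The following plain-Ruby analogy is only a sketch: the classes `AdderModel` and `SaturatingAdderModel` are hypothetical, the private method `add` plays the role of `sub(:add)`, and only the `zf` and `sf` flags are modeled (`cf` and `of` are left out).

```ruby
# Plain-Ruby analogy of inheritance with overriding (not HDLRuby code).
class AdderModel
  def initialize(width)
    @width = width
  end

  # Computes z and two of the flags; the addition itself is delegated to the
  # overridable method `add`.
  def call(x, y)
    z = add(x, y)
    { z: z, zf: z.zero? ? 1 : 0, sf: z.negative? ? 1 : 0 }
  end

  private

  # Overridable part: two's-complement addition on @width bits.
  def add(x, y)
    half = 1 << (@width - 1)
    ((x + y + half) % (1 << @width)) - half
  end
end

class SaturatingAdderModel < AdderModel
  private

  # Same addition, but the result is saturated between -300 and 300.
  def add(x, y)
    super.clamp(-300, 300)
  end
end

p AdderModel.new(16).call(301, 0)[:z]            # => 301
p SaturatingAdderModel.new(16).call(301, 0)[:z]  # => 300
p SaturatingAdderModel.new(16).call(-1, -300)    # => {:z=>-300, :zf=>0, :sf=>1} (Ruby 3.4+ prints {z: -300, zf: 0, sf: 1})
```

As in the HDLRuby version, the flag computations are inherited unchanged and automatically use the overridden addition.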
2664
+
2665
+
2666
+
2667
+ ### 6.3. Metaprogramming and reflection in HDLRuby
2668
+
2669
+ There is not much to say about metaprogramming and reflection because both have been used extensively in this tutorial.
2670
+
2671
+ * Metaprogramming consists in using programs that generate the final code. In HDLRuby, this is done by using Ruby code for generating, passing as an argument, and appending chunks of HDLRuby code. This has been done when presenting how to describe parallel code in sequencers, but this can be used everywhere, e.g., within the process too. Moreover, the possibility to use a chunk of code as a generic argument presented in this section is also metaprogramming.
2672
+
2673
+ * Reflection has been used in several cases, e.g., the `width` method for knowing the bit width of a type or the `open` method. More generally, HDLRuby provides a large number of methods for inspecting and directly modifying the objects of the internal representation. Please consult the documentation of the HDLRuby classes for details about them (good luck, this is low-level coding documentation).
2674
+
2675
+ > __Note__: the sequencers, the shared signals, the arbiter, and the monitor are not native HDLRuby constructs, they have all been implemented with metaprogramming and reflection.
2676
+
2677
+
2678
+ ## 7. What next?
2679
+
2680
+ There are still many aspects of HDLRuby that have not been addressed in this tutorial. For example, finite state machines (FSM) and decoders are crucial hardware components that you should learn about, and for which HDLRuby provides specific constructs for easier design. So from now on, please consult the main documentation of HDLRuby, and have a look at the code samples provided in the HDLRuby distribution. They can be copied to your working directory using the following command:
2681
+
2682
+ ```bash
2683
+ hdrcc --get-samples
2684
+ ```