High-speed hardware implementation of fixed and runtime variable window length 1-D median filters
Supplementary results regarding this paper can be found below. The Verilog source codes for regenerating the results of the paper are available for download. The article and its source codes can be cited as follows: Nikahd, Eesa, Payman Behnam, and Reza Sameni. "High-speed hardware implementation of fixed and runtime variable window length 1-D median filters." IEEE Transactions on Circuits and Systems II: Express Briefs 63, no. 5 (2016): 478-482.
Supplementary Results
Nonlinear digital filters play an important role in digital signal processing applications. In this paper, a novel architecture is proposed for the hardware implementation of fixed and run-time variable window length one-dimensional (1-D) median filters. In the proposed architecture, the maximum working clock frequency is almost independent of the median filter window length, while the hardware complexity is proportional to the number of samples in the window. This feature enables the construction of filters with relatively large window lengths with negligible reduction in the maximum clock frequency; while in previous architectures the maximum clock frequency drops significantly as the window length is increased. The benchmark results show the efficiency of the proposed architecture in comparison with state-of-the-art techniques.
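For readers who want a software reference to compare against the hardware output, the following is a minimal behavioural sketch of a 1-D running median filter. It is not taken from the paper or its Verilog sources: the function name `median_filter_1d`, the zero-initialised window, and the choice of the upper-middle element for even window lengths are all assumptions made for illustration.

```python
from collections import deque


def median_filter_1d(samples, window_len):
    """Return the running median over the most recent `window_len` samples.

    Assumptions (not from the paper): the window starts filled with zeros,
    like a register chain after reset, and for an even window length the
    upper-middle element of the sorted window is reported.
    """
    window = deque([0] * window_len, maxlen=window_len)
    out = []
    for x in samples:
        window.append(x)          # the oldest sample drops out automatically
        ordered = sorted(window)  # software stand-in for the sorted cell array
        out.append(ordered[window_len // 2])
    return out


if __name__ == "__main__":
    print(median_filter_1d([5, 1, 9, 3, 7], 3))
```

A model like this re-sorts the whole window for every sample, which is exactly the cost the hardware architecture avoids by keeping the window sorted incrementally.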
The following additional analyses were carried out during the revision and are presented here for the reviewers' information. Due to lack of space, only a summary of these results has been added to the paper.
1- Detailed post-synthesis power results for the variable length case
The detailed dynamic and static power consumption results are as follows. The power results are extracted from the Design Compiler tool with the "power_preserve_rtl_hier_names" attribute set to true. In Table IV of the paper, only the total power (static plus dynamic) has been reported to save space.
MAXIMUM THROUGHPUT, AREA AND POWER RESULTS FOR VARIABLE WINDOW LENGTH ON 45NM CMOS ASIC
| Max L | Max Freq. (MHz), 8-bit | Max Freq. (MHz), 16-bit | Area (µm²), 8-bit | Area (µm²), 16-bit | Power (µW), 8-bit, Static | Power (µW), 8-bit, Dyn. | Power (µW), 16-bit, Static | Power (µW), 16-bit, Dyn. |
|---|---|---|---|---|---|---|---|---|
| 5 | 1324.2 | 1324.2 | 1029.1 | 1873.2 | 9.2 | 70.2 | 17.9 | 131.1 |
| 8 | 1309.7 | 1309.7 | 1625.4 | 3086.4 | 15.7 | 113.1 | 29.5 | 212.3 |
| 20 | 1261.6 | 1255.2 | 3946.3 | 7891.5 | 43.7 | 287.2 | 83.3 | 530.8 |
| 51 | 1219.7 | 1210.8 | 9614.1 | 18113.6 | 104.2 | 692 | 188.8 | 1298.3 |
| 100 | 1157.8 | 1148.1 | 19783.1 | 38397.9 | 213.4 | 1410.8 | 385.2 | 2641.7 |
| 201 | 1089.2 | 1075.2 | 39411.6 | 76469.9 | 435.5 | 2852.2 | 792.7 | 537.2 |
| 301 | 998.9 | 974.2 | 59099.2 | 110625.4 | 685.3 | 4308 | 1235.6 | 8079 |
| 600 | 915.8 | 880.1 | 118286.2 | 211306.2 | 1372.8 | 8667.6 | 2467 | 16261.7 |
| 1001 | 877.1 | 828.4 | 233693.2 | 403147.1 | 2298.2 | 14382.9 | 4101.2 | 26101.5 |
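Since the paper reports only total power, the relation between the table above and that total can be sketched as follows. The three rows are copied from the 8-bit static and dynamic columns above; the helper name `power_summary` is hypothetical and only illustrates the arithmetic.

```python
# (max L, static power in uW, dynamic power in uW) for the 8-bit design,
# copied from three rows of the table above.
ROWS_8BIT = [
    (5, 9.2, 70.2),
    (100, 213.4, 1410.8),
    (1001, 2298.2, 14382.9),
]


def power_summary(static_uw, dynamic_uw):
    """Return (total power in uW, static share of the total)."""
    total = static_uw + dynamic_uw
    return total, static_uw / total


if __name__ == "__main__":
    for max_l, static_uw, dynamic_uw in ROWS_8BIT:
        total, static_share = power_summary(static_uw, dynamic_uw)
        print(f"L={max_l}: total={total:.1f} uW, static share={static_share:.1%}")
```

For these rows the static share stays between roughly 11% and 14% of the total, so dynamic power dominates at every window length shown.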
2- FPGA-based implementation
For the FPGA implementation, Quartus II 12.1 from Altera was used as the synthesis tool, and an Altera Cyclone IVE FPGA with the following specifications was used as the target device: chip number EP4CE115F29I8L, 114480 logic cells, 3981312 memory bits, 1.0 V core voltage. Tables I and II show the results for fixed window length in terms of maximum clock frequency and logic cells, with 8-bit and 16-bit word lengths, respectively. As expected, the overall trends of the maximum frequency and area are the same as in the ASIC results. Table III repeats these results for the variable-length filter, and Tables IV and V report the FPGA hardware utilization for the fixed window length designs.
Table I. Maximum Frequency And Area Results For Fixed Window Length And 8-Bit Word Length On Altera Cyclone IVE FPGA
| L | Max Freq. (MHz), Moshnyaga[4] | Max Freq. (MHz), Chen[10] | Max Freq. (MHz), Our arch. | Area (logic cells), Moshnyaga[4] | Area (logic cells), Chen[10] | Area (logic cells), Our arch. |
|---|---|---|---|---|---|---|
| 5 | 144.7 | 151.4 | 157.6 | 210 | 201 | 197 |
| 8 | 127.8 | 141.3 | 145.8 | 354 | 345 | 337 |
| 20 | 91.8 | 110.7 | 140.5 | 1101 | 919 | 865 |
| 51 | 55.5 | 73.2 | 129.6 | 2689 | 2483 | 2187 |
| 100 | 36.1 | 44.3 | 127.3 | 5198 | 4498 | 4389 |
| 201 | 22.2 | 24.1 | 120.5 | 10229 | 9811 | 8789 |
| 301 | 15.8 | 16.7 | 116.8 | 16921 | 15623 | 13191 |
| 600 | 7.6 | 8.1 | 110.2 | 34989 | 33042 | 26393 |
| 1001 | 4.2 | 4.9 | 107.1 | 57289 | 55375 | 43995 |
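To make the frequency trend in Table I concrete, the following sketch computes the ratio of the proposed architecture's maximum frequency to each baseline for a few window lengths. The values are copied from Table I; the helper name `speedup` is hypothetical.

```python
# (L, Moshnyaga[4], Chen[10], proposed) maximum frequency in MHz,
# copied from three rows of Table I (8-bit word length).
FMAX_8BIT = [
    (5, 144.7, 151.4, 157.6),
    (100, 36.1, 44.3, 127.3),
    (1001, 4.2, 4.9, 107.1),
]


def speedup(proposed_mhz, baseline_mhz):
    """Ratio of the proposed maximum frequency to a baseline maximum frequency."""
    return proposed_mhz / baseline_mhz


if __name__ == "__main__":
    for window_len, moshnyaga, chen, proposed in FMAX_8BIT:
        print(
            f"L={window_len}: "
            f"{speedup(proposed, moshnyaga):.1f}x vs [4], "
            f"{speedup(proposed, chen):.1f}x vs [10]"
        )
```

The ratio grows with the window length because the baselines slow down sharply while the proposed design loses only about a third of its L=5 frequency at L=1001.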
Table II. Maximum Frequency And Area Results For Fixed Window Length And 16-Bit Word Length On Altera Cyclone IVE FPGA
| L | Max Freq. (MHz), Moshnyaga[4] | Max Freq. (MHz), Chen[10] | Max Freq. (MHz), Our arch. | Area (logic cells), Moshnyaga[4] | Area (logic cells), Chen[10] | Area (logic cells), Our arch. |
|---|---|---|---|---|---|---|
| 5 | 132.9 | 135.8 | 142.2 | 332 | 291 | 319 |
| 8 | 123.8 | 128.6 | 137.8 | 575 | 476 | 557 |
| 20 | 88.8 | 99.1 | 121.4 | 1402 | 1102 | 1381 |
| 51 | 52.9 | 72.9 | 119.7 | 3481 | 2907 | 3464 |
| 100 | 33.5 | 43.1 | 116.2 | 6929 | 6107 | 6812 |
| 201 | 20.9 | 22.9 | 111.6 | 15498 | 13028 | 13865 |
| 301 | 13.3 | 15.8 | 109.7 | 23987 | 20452 | 20811 |
| 600 | 6.9 | 7.4 | 104.4 | 43987 | 40701 | 41233 |
| 1001 | 3.9 | 4.2 | 100.38 | 72567 | 68272 | 69282 |
Table III. Maximum Frequency And Area FPGA Results For Variable Window Size, 8-Bit And 16-Bit Word Length
| Window-size | Max Freq. (MHz), 8-bit | Max Freq. (MHz), 16-bit | Area (Logic Cell), 8-bit | Area (Logic Cell), 16-bit |
|---|---|---|---|---|
| 5 | 133.07 | 116.5 | 258 | 491 |
| 8 | 125.1 | 114.0 | 410 | 785 |
| 20 | 109.5 | 102.8 | 1050 | 2014 |
| 51 | 99.38 | 86.6 | 2694 | 5086 |
| 100 | 89.04 | 78.6 | 5386 | 10179 |
| 201 | 79.27 | 69.9 | 10763 | 20371 |
| 301 | 74.6 | 66.5 | 16159 | 30554 |
| 600 | 65.5 | 56.6 | 32389 | 61332 |
| 1001 | 60.6 | 52.2 | 53957 | 102563 |
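The claim that hardware complexity grows in proportion to the window length can be checked against Table III by dividing the logic-cell count by the maximum window size. The sketch below does this for three rows copied from Table III; the helper name `cells_per_sample` is hypothetical.

```python
# (maximum window size, 8-bit logic cells, 16-bit logic cells),
# copied from three rows of Table III.
AREA_VARIABLE = [
    (5, 258, 491),
    (100, 5386, 10179),
    (1001, 53957, 102563),
]


def cells_per_sample(logic_cells, window_len):
    """Logic cells spent per window sample; roughly constant if area is linear in L."""
    return logic_cells / window_len


if __name__ == "__main__":
    for window_len, cells_8, cells_16 in AREA_VARIABLE:
        print(
            f"L={window_len}: "
            f"{cells_per_sample(cells_8, window_len):.1f} cells/sample (8-bit), "
            f"{cells_per_sample(cells_16, window_len):.1f} cells/sample (16-bit)"
        )
```

For these rows the per-sample cost stays within a few percent across a 200-fold change in window size, which is consistent with linear growth in area.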
Table IV. Hardware Utilization For Fixed Window Length And 8-bit Word Length On FPGA
| Win. Len. | Total logic elements (%) | Dedicated logic registers (%) |
|---|---|---|
| 5 | 1 | 1 |
| 8 | 1 | 1 |
| 20 | 1 | 1 |
| 51 | 2 | 1 |
| 100 | 4 | 1 |
| 201 | 8 | 3 |
| 301 | 12 | 4 |
| 600 | 23 | 8 |
| 1001 | 38 | 14 |
Table V. Hardware Utilization For Fixed Window Length And 16-bit Word Length On FPGA
| Win. Len. | Total logic elements (%) | Dedicated logic registers (%) |
|---|---|---|
| 5 | 1 | 1 |
| 8 | 1 | 1 |
| 20 | 1 | 1 |
| 51 | 3 | 1 |
| 100 | 6 | 3 |
| 201 | 12 | 6 |
| 301 | 18 | 8 |
| 600 | 36 | 17 |
| 1001 | 61 | 28 |
3- Detailed hardware utilization
In comparison with the latest work, Chen [10], our ASIC results show that in the proposed architecture, for different sizes, the combinational area occupies about 42%-44% of the total area, the non-combinational area takes up 32%, and the net interconnection area occupies about 24%-26% of the total area. The corresponding results for Chen [10] are 47%-49%, 24%-26% and 25%-27%, respectively. These results are reflected in the ASIC Tables IV and V below. Considering only the component areas, that is, excluding the net area (as noted by one of the reviewers), the average resource utilization of the proposed architecture over the different word and window lengths is as follows: the combinational area occupies about 57% and the non-combinational area takes 43% (27% of which is due to the register chain) of the total component area. For Chen's architecture, these ratios are 66% and 34%, respectively.
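The component-only ratios can be reproduced approximately by dropping the net area and renormalising the remaining two shares. The sketch below assumes this is how the figures were obtained, which the text does not state explicitly; the function name `component_shares` is hypothetical, and the inputs are the rounded percentages quoted above for the proposed architecture.

```python
def component_shares(comb_pct, noncomb_pct):
    """Renormalise combinational and non-combinational shares after dropping net area."""
    component_total = comb_pct + noncomb_pct
    return comb_pct / component_total, noncomb_pct / component_total


if __name__ == "__main__":
    # Proposed architecture: 44% / 32% (8-bit) and 42% / 32% (16-bit) of total area.
    shares_8bit = component_shares(44, 32)
    shares_16bit = component_shares(42, 32)
    avg_comb = (shares_8bit[0] + shares_16bit[0]) / 2
    avg_noncomb = (shares_8bit[1] + shares_16bit[1]) / 2
    print(f"combinational: {avg_comb:.0%}, non-combinational: {avg_noncomb:.0%}")
```

With the rounded table percentages this gives about 57% and 43%, matching the averages quoted for the proposed architecture. The 27% register-chain figure cannot be derived from these shares alone.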
Table IV. Hardware Utilization For Fixed Window Length And 8-bit Word Length On ASIC
| Win. Len. | Comb Area / Total Area | Non Comb Area / Total Area | Net Area / Total Area | Comb Area / Total Area [10] | Non Comb Area / Total Area [10] | Net Area / Total Area [10] |
|---|---|---|---|---|---|---|
| 5 | 44% | 32% | 24% | 49% | 26% | 25% |
| 8 | 44% | 32% | 24% | 49% | 26% | 25% |
| 25 | 44% | 32% | 24% | 49% | 26% | 25% |
| 51 | 44% | 32% | 24% | 49% | 26% | 25% |
| 100 | 44% | 32% | 24% | 49% | 26% | 25% |
| 201 | 44% | 32% | 24% | 51% | 24% | 25% |
| 301 | 44% | 32% | 24% | 51% | 24% | 25% |
| 600 | 44% | 32% | 24% | 51% | 24% | 25% |
| 1001 | 44% | 32% | 24% | 51% | 24% | 25% |
Table V. Hardware Utilization For Fixed Window Length And 16-bit Word Length On ASIC
| Win. Len. | Comb Area / Total Area | Non Comb Area / Total Area | Net Area / Total Area | Comb Area / Total Area [10] | Non Comb Area / Total Area [10] | Net Area / Total Area [10] |
|---|---|---|---|---|---|---|
| 5 | 42% | 32% | 26% | 47% | 26% | 27% |
| 8 | 42% | 32% | 26% | 47% | 26% | 27% |
| 25 | 42% | 32% | 26% | 47% | 26% | 27% |
| 51 | 42% | 32% | 26% | 47% | 26% | 27% |
| 100 | 42% | 32% | 26% | 47% | 26% | 27% |
| 201 | 42% | 32% | 26% | 48% | 25% | 27% |
| 301 | 42% | 32% | 26% | 48% | 25% | 27% |
| 600 | 42% | 32% | 26% | 48% | 25% | 27% |
| 1001 | 42% | 32% | 26% | 48% | 25% | 27% |
4- Hardware sharing for high system clock and low data throughputs
The memory elements of our architecture cannot be shared, and according to the results of the proposed architecture approximately 32% of the total area belongs to non-combinational elements. In the proposed architecture, the two comparators and the controller in each cell are combinational resources, and these elements can be shared. To do so, we can use one comparator (instead of two) in each cell; in this case, two clock cycles are needed to generate the output. In the first clock cycle, the oldest element in the register chain is compared with the content of Ri of all array cells in parallel, using the single comparator in each cell. The cell holding the oldest sample is identified, and the elements to the right of this cell i are shifted left. In the second clock cycle, the new input sample is compared with the content of all array cells in parallel, and the correct position for the new sample is located. The element at this position and the elements to its right are shifted right, and the new sample is placed at this position. Sharing the comparators in this way reduces the area. However, the latency increases to two clock cycles (decreasing the effective working frequency). The control unit can also be shared by using a multiplexer, but since its design is simple, this is unlikely to reduce the occupied area significantly. Additionally, by adding some multiplexers and slightly modifying the control unit, one could select between the two proposed designs automatically. We leave this extension for future research.
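The two-cycle scheme described above can be sketched as a behavioural Python model. This is an illustration only, not the authors' Verilog: the class name `TwoCycleMedian`, the zero reset value, the median tap position, and the rule of removing the first matching cell when several cells hold the same value are all assumptions.

```python
from collections import deque


class TwoCycleMedian:
    """Behavioural model of the shared-comparator variant: one compare per cell per cycle."""

    def __init__(self, window_len, fill=0):
        self.chain = deque([fill] * window_len)  # register chain, oldest sample at the left
        self.cells = [fill] * window_len         # sorted array cells (ascending order)

    def step(self, new_sample):
        # Cycle 1: every cell compares its content with the oldest sample.
        oldest = self.chain.popleft()
        hits = [value == oldest for value in self.cells]
        victim = hits.index(True)  # assumption: the first matching cell is removed
        del self.cells[victim]     # cells to the right of the victim shift left

        # Cycle 2: the same comparators are reused against the new sample.
        greater = [value > new_sample for value in self.cells]
        pos = greater.index(True) if True in greater else len(self.cells)
        self.cells.insert(pos, new_sample)  # cells from pos onward shift right
        self.chain.append(new_sample)

        return self.cells[len(self.cells) // 2]  # median tap (upper-middle if even)


if __name__ == "__main__":
    filt = TwoCycleMedian(3)
    print([filt.step(x) for x in [5, 1, 9, 3, 7]])
```

Each call to `step` stands for two hardware clock cycles, which is the latency cost the paragraph above trades for the smaller comparator count.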