Enhanced SPIHT Algorithm with Pipelined Datapath Architecture Design

Serap Çekli, Ali Akman
Department of Computer Engineering, Maltepe University, Istanbul, Turkey


ABSTRACT

Set partitioning in hierarchical trees (SPIHT) is an efficient algorithm which is used for the image compression widely. SPIHT operates sequentially so, its parallel implementation is difficult. In this study, the SPIHT algorithm is improved for providing that it is suitable for the parallel processing applications, and the corresponding pipelined datapath is designed for the proposed enhanced SPIHT algorithm. The datapath is designed to have three stages as preprocessing, list generation and output stream. In the preprocessing stage, the flags which are supports the list generation stage are constituted. List of insignificant sets (LIS), list of insignificant pixels (LIP) and list of significant pixels (LSP) are formed in list generation stage. These lists contain the bit values which generate the output bit stream. The performance of the improved datapath design has been tested by compressing different images, and the obtained results are given.

Keywords: SPIHT, enhanced SPIHT, image compression, pipelined datapath

Introduction

The images require large storage capacity and high transmission bandwidth resulting from including considerable information. The compression operation which provides to eliminate the redundant information is advantageous for many aspects such as storage capacity, transmission speed.

Therefore, several studies have been fulfilled for the image compression, and still much efforts continue to improve the compression performance from some different aspects. Some of these studies have been focused on wavelet transform which provides the multi resolution analysis, based compression algorithms such as embedded zero tree wavelet (EZW), set partitioning in hierarchical trees (SPIHT), optimal truncated embedded block coding (EBCOT) used in JPEG2000 standard [1-3].

Set partitioning in hierarchical trees is a discrete wavelet transform (DWT) based, efficient, adaptive and computationally fast compression algorithm [2]. This algorithm is known as an enhanced implementation of the EZW algorithm which has improved performance, which provides embedded coding output. The algorithm provides satisfactory PSNR (peak signal to noise ratio) values, effective image quality and compression ratio, and fast processing time. It can be used for the lossy or lossless data compression according to the necessary application. By means of the mentioned advantages, SPIHT is a suitable option for image compression applications especially for small devices with limited power consumption and memory capacity [4, 5].

Until today, some studies have been performed to increase the performance of the SPIHIT coders, and many researches are carried on in this field yet. Several studies which have been fulfilled have aimed at providing improvement of the compression performance [6], reduction of the memory requirement [4, 5, 7, 8], decrease of the output coding time [9-12], to provide efficient hardware resource usage [13, 14], security of the SPIHT algorithm [15, 16] or biomedical applications [17].
The use of breadth first search (BFS) procedure is a suitable option while implementing the architectural design of SPIHT algorithm to examine the SOTs. At first, BFS deals with the examination of neighboring pixels in the same level. By this means, extra register necessity is eliminated when the inspection is carried out for the decision process on a tree, and also parallel processing architecture could be designed.

The rest of the study can be summarized as follows; The SPIHT algorithm and the system architecture are described after the underlying concepts of the SPIHT algorithm are given. Then, the obtained results have been presented and discussed. As the last, the conclusion section is given.

**SPIHT algorithm**

In the SPIHT algorithm, the image is divided into quadtree structure repeatedly by calculating the DWT coefficients. In the algorithm, the values or positions of the pixels in the subbands are not transmitted. Instead, the decision points that define the structural properties of the image in the tree structure and the results of the decisions at these points are coded as output. Since the positions of the decision points in the tree structure and the output values are known, the image can be recovered in the decoder.

The SPIHT algorithm consists of three steps which are initialization, sorting pass, and refinement pass. The algorithm uses three types of list structures for memory operations: LIS; list of insignificant pixels, and LSP; list of significant pixels. A threshold value is determined considering insignificant sets, LIP; list of insignificant pixels, and LSP; list of significant pixels. The significance test applied to a given set of is defined in the following equation;

\[
S_n(T) = \begin{cases} 
1, & \max_{(i,j) \in T} |c_{i,j}| \geq 2^n \\
0, & \text{otherwise}
\end{cases}
\]

(1)

**Enhanced SPIHT algorithm**

One of the main obstacles to design of the high performance digital system of the SPIHT compression algorithm is difficulty of the parallel implementation resulting from use of lists obtained in one step of algorithm in the next step. In this study, a novel algorithm which reconfigures the constitution of LIP and LIS is proposed to eliminate this drawback, and the outline of the proposed algorithm is given in the Figure 2. The algorithm performs list constitution based on BFS in quadtree structure.

In the step of construction of LIP and LIS, the constitution of LIP and LIS for the corresponding threshold value is fulfilled. The dependency between consecutive threshold values is removed because the constitution of these lists are repeated again for each new threshold value. Construction of LIP and LIS starts with four coefficients located at the lowest frequency band of wavelet transform coefficient matrix. Coefficient coordinates not in LSP are inserted in LIP. Other entries of LIP are obtained while the construction of LIS.

LIS is obtained by testing in sequence the roots of three subtrees. The root \((i,j)\) of subtree is inserted to LIS. The offspring \((O(i,j))\) of that coordinate not in LSP is inserted to LIP, if the result of significance test of all descendants of \((i,j)\) coordinate is \(S_n(D(i,j))=1\). After that, significance test of all descendants excluding offspring of \((i,j)\) coordinate is performed.

If \(S_n(L(i,j))=1\) then \((i,j)\) subtree is tested using BFS. In this test, if significance test of all descendant excluding offspring of the parent pixel of \((k,l)\) coordinate in subtree is \(S(L(\text{parent}(k,l)))=1\), this \((k,l)\) coordinate is added to LIS. If the significance test of all descendant of that \((k,l)\) coordinate is \(S(D(k,l))=1\), then offspring \((O(k,l))\) not in LSP are added to LIP. Thus, test continues on the other subtree upon completion of scanning entire subtree.

Each pixel has four offspring except for the upper left pixel. \(O(i,j)\): Defines the set of the coordinates of the four offspring of a pixel at the \((i,j)\).

- \(D(i,j)\): The set of all descendants of a position \((i,j)\).
- \(H(i,j)\): The set including the coordinates of the roots of all SOTs.
- \(L(i,j)\): The set of the coordinates of all descendants excluding offspring of a position \((i,j)\), and given as \(D(i,j) - O(i,j)\).

The entries in the lists are determined by a \((i,j)\) coordinate. These coordinates represent the individual pixels in LIP and LSP, and also an A type entry of the set \(D(i,j)\) or a B type entry of the set \(L(i,j)\).

The results of the decisions at these points are coded as output.
In LIS output stream, two bits are coded in for significance of all descendant and all descendant excluding offspring for each coefficient. Each coefficient in LIP is subjected to significance test. If the result of significance test for coefficient is 1, a 1 and sign of coefficient is added to LIP output stream. This coefficient coordinate is added to LSP. Otherwise, 0 is coded to LIP output stream. LIP output stream and LSP are completed by this manner. The LSP output stream is similar to the refinement pass in the original algorithm. The threshold value is halved and the process continues by the construction of the new LIP and LIS. The enhanced SPIHT coding algorithm is given in Figure 2.

**Proposed SPIHT architecture**

The architectural design of the enhanced SPIHT algorithm is described in this section. This architectural design has been performed in pipelined datapath form. The system flow diagram of the designed architecture is given in Figure 3.

The block diagram of the proposed SPIHT architecture is given in Figure 4. The architecture is designed in the form of a three stage pipelined architecture corresponding with the system flow in Figure 3. The first stage of the designed architecture is the preprocessing stage. This stage starts with the reading the four successive coefficients from the Main Memory. The transform coefficients are stored in the main memory according to the Morton scanning order from lower addresses to higher addresses. In the preprocessing unit, these coefficients are read in the reverse of the Morton scanning order beginning from the root node.
higher addresses. In this manner, data gathering starts from the lower right corner of the image, so that it is avoided that examination of the same pixels by visiting same positions.

Performing reading operation of the coefficients in this way provides considerable advantages in the significance tests of all descendants \((D(i,j))\) and all descendants excluding offspring \((L(i,j))\). If a coefficient is significant then all descendants of the parent and also all descendants of grandparent excluding offspring of that coefficient is significant. Since the coefficients in the SPIHT algorithm are arranged according to the quadtree structure, the same situation is also valid for the parent and grandparent of each four coefficients. By this means, reading of four coefficients from the main memory at the same time for the preprocessing stage provides efficiency in execution time. Moreover, four coefficients can be tested in parallel. Coefficient test unit fulfils the determination operation of five flags of a coefficient. These flag bits are defined as the significance of the coefficient, the sign bit if the coefficient is significant, significance of all descendants of the coefficient, the significance of all descendants excluding offspring of the coefficient, and the refinement bit. The flags of four coefficients are stored in the significance flag memory by aligning in accordance with the data structure given in Figure 5.

The flag data of concerned coefficient is read from the significance flags memory to find the significance flag value of all descendants or all descendants excluding offspring for a pixel. This data contains flags which possibly may have been written earlier belonging to offspring while examining offspring of that pixel.

Therefore, that data is necessary to calculate the significance flag of all descendants or all descendants excluding offspring for a coefficient. If the significance flag of a coefficient is 1, this situation affects the significance flags of all descendants of parent of the coefficient and all descendants excluding offspring of grandparent of the coefficient. Hence, the flag data stored in the significance flags memory of the parent pixels and grandparent pixels should be updated. The related fields of the data structure is updated by taking the flag data of parent pixel from memory, and data is stored to significance flags memory again. The same operation process is repeated for the grandparent pixel.

Another important unit at this stage is significance flags memory. The memory structure shown in Figure 6 also acts as pipeline register between pipeline stages since enhanced SPIHT encoder is designed to conform to pipelined datapath architecture. This designed memory structure is consisting of two individual memory unit, essentially. The use of data calculated in previous stage is made possible in the second stage while significance flag data is created in the first stage.

![Figure 4. SPIHT encoder architecture](image-url)
The second stage of the proposed architecture is generation of SPIHT Lists. Lists are constructed using flags generated in the first stage. Because of the quadtree structure and the fact that the flags are structured in quadtree form at first stage, the coefficients are also processed as quadtree form at this stage.

Initially, the flags of the first four coefficients are read. The second, third and fourth coefficients of the first four coefficients are the roots of quadtrees. Each quadtree is processed in parallel.

LIP, LIS and LSP are generated in FIFO (first in first out) memory structure. The size of lists is determined depending on image size and are not altered later. Some fields may stay empty in lists because all coefficients may not enter into these lists in an encoding stage. A 1-bit data, v-bit, is used to discriminate the empty and used fields. If this bit is 1, then this means that this field is used, and next bits will be coded in the output stream. There is a two-bit data field in LIP for each coefficient which enters this list except for the v-bit. These fields are the coefficient significance flag (CSF) and coefficient sign flag (CSGF). Two fields are defined in the LIS except for the v-bit. These fields are CDF and CLF bits. The CRB bit is existed in LSP except for the v-bit.
Besides other list memories, another memory (LSP check) else is used to indicate which coefficient is inserted into LSP. The size of this memory is $N \times N$ bits. In LSP check memory, corresponding Morton order number bit of this coefficient is set as 1 when a coefficient enters into LSP. LIP and LIS memories are reseted at beginning of each quantization step. LSP and LSP check memories are used during the encoding without cleared. In other words, the entries in a quantization step are added following the entries of previous step. The memory structures of these lists are designed in complying with the memory structure used in the first stage and given in Figure 6.

Since the coefficients are processed using BFS in this stage the BFS address generator block given in Figure 7 is crucial.

Three registers are used in the design. Address of the coefficient at the root of the tree which will be scanned initially is recorded to the first register. The addresses of the coefficients on the left and right sides of these levels are recorded in the Low and High registers as the sublevels in the tree are examined by. The address scan starts from the Low register and continues to until the reach of the value in the High register. When the High register value is reached, the register values are updated according to the new level by descending one level down in the tree.

The third stage is the output stream stage. In the second step, list data stored in FIFO memory is transmitted to the output stream. Firstly, LIS FIFO memory, secondly LIP FIFO memory, and finally LSP FIFO memory contents are transmitted to the output stream.

Results

The designed SPIHT datapath has been tested using different images with 256×256 pixels. The wavelet coefficients of images at this size can be expressed by a 16-bit sign-magnitude representation.

First of all, the memory analysis necessary for encoding the images used in the tests has been performed. Since the coefficients and the output stream are stored in main memory, the memory space used for this data is not included in the analysis. Two types of memory have been used in the SPIHT datapath: The registers used for the intermediate operations, and SRAM between pipeline stages memories (significance flags memory and list memories). The analysis is performed for SRAM memory elements, since the memory space used for the registers is negligible as compared to the other memory space.

In significance flags memory, a 20-bit field is used for each of the four coefficients. Therefore, the total memory space used;

$$\text{Significance Flags Memory Space} = (2N^2/4) \times 20\text{bit}$$  \hspace{1cm} (2)

If $N=256$ then this value is 80 Kbyte. The contents of LIP and LIS memories change at every calculation step. The size of the list memories has been determined by taking into account the maximum list size obtained in the tests performed and fixed for each calculation step. The number of coefficients entering the LIS increases as the threshold value decreases. The number of coefficients have the maximum value when the threshold value is equal to 1, and all the coefficients enter into this list except for the first one.

Besides other list memories, another memory (LSP check) else is used to indicate which coefficient is inserted into LSP. The size of this memory is $N \times N$ bits. In LSP check memory, corresponding Morton order number bit of this coefficient is set as 1 when a coefficient enters into LSP. LIP and LIS memories are reseted at beginning of each quantization step. LSP and LSP check memories are used during the encoding without cleared. In other words, the entries in a quantization step are added following the entries of previous step. The memory structures of these lists are designed in complying with the memory structure used in the first stage and given in Figure 6.

Since the coefficients are processed using BFS in this stage the BFS address generator block given in Figure 7 is crucial.

Three registers are used in the design. Address of the coefficient at the root of the tree which will be scanned initially is recorded to the first register. The addresses of the coefficients on the left and right sides of these levels are recorded in the Low and High registers as the sublevels in the tree are examined by. The address scan starts from the Low register and continues to until the reach of the value in the High register. When the High register value is reached, the register values are updated according to the new level by descending one level down in the tree.

The third stage is the output stream stage. In the second step, list data stored in FIFO memory is transmitted to the output stream. Firstly, LIS FIFO memory, secondly LIP FIFO memory, and finally LSP FIFO memory contents are transmitted to the output stream.

Results

The designed SPIHT datapath has been tested using different images with 256×256 pixels. The wavelet coefficients of images at this size can be expressed by a 16-bit sign-magnitude representation.

First of all, the memory analysis necessary for encoding the images used in the tests has been performed. Since the coefficients and the output stream are stored in main memory, the memory space used for this data is not included in the analysis. Two types of memory have been used in the SPIHT datapath: The registers used for the intermediate operations, and SRAM between pipeline stages memories (significance flags memory and list memories). The analysis is performed for SRAM memory elements, since the memory space used for the registers is negligible as compared to the other memory space.

In significance flags memory, a 20-bit field is used for each of the four coefficients. Therefore, the total memory space used;

$$\text{Significance Flags Memory Space} = (2N^2/4) \times 20\text{bit}$$  \hspace{1cm} (2)

If $N=256$ then this value is 80 Kbyte. The contents of LIP and LIS memories change at every calculation step. The size of the list memories has been determined by taking into account the maximum list size obtained in the tests performed and fixed for each calculation step. The number of coefficients entering the LIS increases as the threshold value decreases. The number of coefficients have the maximum value when the threshold value is equal to 1, and all the coefficients enter into this list except for the first one.
\[ LIS\ Memory = 2(N^2 - 1) \times 3bit \]  \hspace{1cm} (3)

If \( N = 256 \) then this value is 48 Kbyte. The number of coefficients entering LIP memory has been determined by conducting different trials. As a result of the different trials, it has been concluded that the maximum size of LIP changes from 45% to 65% of number of coefficients. The size of the memory has been determined as given follows ensuring that not to overflow the memory.

\[ LIP\ Memory = 2(0.75N^2) \times 3bit \]  \hspace{1cm} (4)

If \( N = 256 \) then this value is 36 Kbyte. LSP memory and LSP check memory contains all the coefficients, so the total LSP memory is given as;

\[ LSP\ Memory = N^2 \times 3bit \]  \hspace{1cm} (5)

If \( N = 256 \) then this value is 24 Kbyte. Total memory space is 188 Kbyte if \( N = 256 \).

The images used in the trials are given in Figure 8. The simulation of the design has been performed by storing the wavelet transform coefficients of these images to main memory. The output bit streams for the different bit rates are also has been stored in the main memory. The images have been recovered by decoding the output bit streams in the decoder. The recovered images are shown in Figure 9. The PSNR criterion has been used to measure the similarity of the reconstructed images with the original image. The variation of PSNR values with bit rate of the reconstructed images is given in Figure 10.

**Conclusion**

In this study, the SPIHT image compression algorithm has been improved providing that the more compatible with hardware design process. The pipelined datapath design of the proposed enhanced SPIHT algorithm has been also performed. This datapath is comprised of three stages as preprocessing, list generation and output stream. As distinct from present studies, taking the coefficients in reverse Morton order avoids the unnecessary repeated operations for significance tests of all descendants and all descendants excluding offspring, in preprocessing stage. In list generation stage, the BFS has been considered, and a BFS address generator has been designed. The designed datapath has been tested by using different images. According to these tests, memory analysis of the datapath has been fulfilled and the obtained compression results has been given. The proposed method can be used for fast and less complex calculation required image compression systems in which low power and low capacity devices.

**Peer-review:** Externally peer-reviewed.

**Conflict of Interest:** The authors have no conflicts of interest to declare.

**Financial Disclosure:** The authors declared that this study has received no financial support.

**References**

Serap Çekli was born in Germany in 1978. She received her BSc. degree in Electronics Engineering from Istanbul University in 2000, MSc. degree in Electronics and Communication Eng. from Istanbul Technical University in 2003. She has received PhD. degree from Istanbul University, Elect. Electronics Eng. Dept. in 2009. She worked for Istanbul University Engineering Faculty as a research assistant between 2001-2009. She has been working as an assistant professor at Maltepe University, Computer Engineering Department since 2009. Her research interests are digital systems, digital design, computer organization and architecture.

Ali Akman received her BSc., MSc. and PhD. degrees in Electronics Engineering from Uludağ University, Bursa, Turkey in 1993, 1996 and 2004, respectively. He has been working as a research assistant and lecturer. His research interests are digital systems, computer architecture, embedded systems, networked embedded systems and wireless sensor networks.