# DESIGN AND IMPLEMENTATION OF A DIGITAL DOWN CONVERTER CHIP

Ulf Sjöström, Magnus Carlsson<sup> $\dagger$ </sup> and Magnus Hörlin<sup> $\dagger$ </sup>

National Defence Research Establishment, Box 1165, S-581 11 Linköping, Sweden Tel: +46 13 318000 Fax: +46 13 318170 e-mail: ulfs@lin.foa.se

#### ABSTRACT

The design and implementation of a CMOS ASIC containing a digital down converter and a channel equalizer for a digital array antenna system is presented. The chip performs nearly $342 \times 10^6$ *MAC/s* (multiply-accumulate operations per second) at an internal bit-rate of 51.6 *MHz*. The circuit is based on a highly flexible architecture with a few full-custom bit-serial arithmetic units.

# 1 INTRODUCTION

A demonstration system for an S-band digital beamforming (DBF) array antenna has been developed at the National Defence Research Establishment in Linköping, Sweden [1].

A DBF array antenna has a number of advantages, e.g., fast adaptive nulling, super resolution and direction finding, ultra-low sidelobes, closely spaced multiple beams, possibility for space-time adaptive processing, etc. However, this implies a number of specific requirements on the signal processing, e.g., channel equality, dynamic range, I/Q-balance, linearity and low noise [2].

These requirements can be met using digital I/Q-split, digital down conversion and digital channel equalization in each channel. As a consequence, high performance is needed in terms of numerical accuracy, sampling-rate and number of arithmetic operations. Since the signal processing must be performed in every channel, there are additional constraints on physical size, power dissipation and cost. An ASIC implementation therefore becomes the most feasible solution.



Figure 1 Digital Array Antenna System.

#### 1.1 The demonstrator system

The 12-channel demonstration receiver system is configured as shown in figure 1. It consists of an antenna array followed by a calibration network and analog receiver modules that down-convert the 2.8-3.3 *GHz* RF signal to a 19.35 *MHz* IF signal with 3 *MHz* bandwidth. The signal is then sampled at 25.8 *MHz* by a 12-bit analog-to-digital converter (ADC).

<sup>†</sup>Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden Tel: +46 13 281000 Fax: +46 13 139282 e-mail: magnusc, magnusch@isy.liu.se

The digital part starts with a digital down converter (DDC) and then a channel equalizer (EQU). A digital beamformer (DBF) and a beam controller are used to calculate the output beams. Twelve channels are a compromise between development cost and the possibility to take full advantage of array signal processing. In a real application the number of channels would be substantially higher.

# 2 DIGITAL FILTER BLOCK

The DDC and EQU parts of figure 1 are implemented by the presented chip. Figure 2 shows the digital filter subsystem in detail. It contains an IQ-split stage followed by two decimation stages where the sampling-rate is reduced to 3.225 *MHz*. The channel equalizer stage is essential to minimize the amplitude and phase differences between the channels. It consists of a complex programmable 12-tap FIR filter.
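The frequency plan implied by these figures can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the sampling rate and IF quoted in the text, while the helper name `alias_frequency` and the assumption that each of the three stages (IQ-split and two decimations) halves the per-branch rate are ours.

```python
# Frequency-plan sketch. All frequencies are integers in Hz to keep the arithmetic exact.
F_S = 25_800_000    # ADC sampling rate quoted in the text
F_IF = 19_350_000   # analog IF centre frequency quoted in the text

def alias_frequency(f, fs):
    """Fold a real tone at frequency f into the first Nyquist zone [0, fs/2]."""
    f = f % fs
    return fs - f if 2 * f > fs else f

# The IF lies in the second Nyquist zone, so it folds down after sampling.
f_alias = alias_frequency(F_IF, F_S)

# Per-branch sampling rate after the IQ-split and each of the two decimation stages,
# assuming every stage halves the rate.
rates = [F_S]
for _ in range(3):
    rates.append(rates[-1] // 2)

print("aliased centre frequency [Hz]:", f_alias)
print("sampling rates [Hz]:", rates)
```

Under these assumptions the aliased centre frequency equals one quarter of the sampling rate, and the last entry of `rates` is the 3.225 *MHz* output rate.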



*Figure 2 Digital filter block diagram.* 

#### 2.1 Filter Design

The objective when designing the filters, apart from fulfilling the specifications, was to find a configuration well suited for ASIC implementation. Half-band finite impulse response (FIR) filters were compared to half-band wave digital lattice filters (WDLF) with approximately linear phase [3]. The WDLF requires fewer operations and less memory, but the FIR solution was chosen since it gave a simpler implementation and exactly linear phase.

#### 2.2 IQ stage

The I/Q extraction and demodulation is accomplished by a standard solution using a *Hilbert filter* with 11 significant bits for the coefficients, figure 3. This results in a mirror image suppression of 96 dB, figure 4. Due to the multiplexer at the input, where the even samples are switched to the upper branch and the odd samples to the lower, the sampling-rate is reduced by a factor of two and the hardware can operate at the lower output sampling-rate. Still, this is not a true decimation since the net data-rate remains unchanged.
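The role of the input multiplexer can be illustrated with a minimal sketch. Only the even/odd switching is modelled here; the Hilbert filter and the delay in the two branches are not, and the function name `iq_demultiplex` is hypothetical.

```python
def iq_demultiplex(samples):
    """Switch even-indexed samples to the upper branch and odd-indexed ones to the lower."""
    upper = samples[0::2]
    lower = samples[1::2]
    return upper, lower

x = list(range(16))              # stand-in for 16 consecutive ADC samples
upper, lower = iq_demultiplex(x)

# Each branch runs at half the input rate, but together they still carry every sample.
print(len(x), len(upper), len(lower))
```

Each branch holds half of the samples, which is why the hardware can run at the lower rate although the net data-rate is unchanged.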



Figure 3 FIR IQ-split and decimation filters.

#### 2.3 Decimation stage

A bandpass decimation is required in the next stage since the desired signal is centered around 6.45 *MHz* after sampling. This can be achieved by a highpass decimation followed by a lowpass decimation, according to figure 2. The decimation should be performed identically for the *I* and the *Q* channels, i.e., identical filters must be used so as not to introduce additional errors into the signal.

Half-band FIR filters were used for both stages, giving a significant saving in hardware. The three-multiplier highpass filter also removes spurious signals from the last local oscillator stage by placing a transmission zero at DC. Due to the narrower transition band, a higher-order filter is required for the lowpass stage, figure 3. The coefficients were quantized to 11 bits (sign included) for both decimation stages.
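Two properties mentioned here, the zero-valued taps of a half-band filter and a transmission zero at DC for the highpass variant, can be seen in a small numerical sketch. The 7-tap coefficients below are a textbook-style illustration chosen by us and are not the coefficients used on the chip.

```python
import cmath
import math

# Illustrative 7-tap half-band lowpass prototype (NOT the chip's coefficients).
LOWPASS = [-1 / 32, 0, 9 / 32, 16 / 32, 9 / 32, 0, -1 / 32]
CENTER = len(LOWPASS) // 2

# Alternating the tap signs around the centre tap moves the passband from DC to fs/2.
HIGHPASS = [-h if (n - CENTER) % 2 else h for n, h in enumerate(LOWPASS)]

def magnitude(h, omega):
    """Magnitude response of FIR filter h at normalized frequency omega [rad/sample]."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h)))

# Half-band property: every second tap away from the centre is zero,
# so those taps cost no multiplier.
zero_taps = [n for n, h in enumerate(HIGHPASS) if h == 0]

print("zero-valued taps:", zero_taps)
print("gain at DC:", magnitude(HIGHPASS, 0.0))
print("gain at fs/2:", magnitude(HIGHPASS, math.pi))
```

In this example the highpass taps sum to zero, which is what places the transmission zero at DC.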

In figure 4 it can be noted that the stopband attenuation comfortably exceeds the specifications, since half-band FIR filters in this case become constrained by the passband requirements. It should be pointed out that many different solutions have been evaluated, but the presented configuration gave the best trade-off between performance and implementation complexity. The overall response corresponds to an FIR filter with 210 complex-valued taps.

#### 2.4 Channel Equalizer

The channel equalizer is used to eliminate internal channel inequalities. In a calibration phase the channels are measured, and weights are calculated and loaded into the equalizer. With the selected signal bandwidth and the measured parameter spread of the receiver modules, it turned out that a 12-tap programmable complex FIR filter was sufficient.



Figure 4 Overall magnitude response for the DDC.

# 3 ARCHITECTURE

To meet the requirements, an internal 16-bit data representation is sufficient. Since one of the main goals was to make the chip as area-efficient as possible, bit-serial arithmetic became the natural choice. However, two main problems arise when implementing the four different filter stages bit-serially. First, an input bit-rate of $16 \times 25.8 = 412.8$ *MHz* would demand high-speed circuits and interfaces. Second, three different internal bit-rates are necessary in the individual filter stages, and this is cumbersome to handle in a bit-serial system.

These problems were solved using state-decimation techniques and multiple-path time-multiplexing, figure 5. By taking care of eight samples simultaneously, the bit-rate in every stage is reduced to a more manageable 51.6 *MHz*, because it becomes possible to use the output 3.225 *MHz* sampling-rate on the entire chip. The price to pay for this is that a total of four IQ-filters and four HP-filters are needed, figure 5.
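The bit-rate figures follow directly from the word length, the sampling rate and the number of samples handled in parallel. A minimal check, with variable names of our own choosing:

```python
WORD_LENGTH = 16            # internal data word length [bits]
F_S = 25_800_000            # input sampling rate [Hz]
PARALLEL_SAMPLES = 8        # samples processed simultaneously

# A single bit-serial path fed directly at the input rate.
direct_bit_rate = WORD_LENGTH * F_S

# With eight samples in flight, every path sees the output sampling rate only.
path_sample_rate = F_S // PARALLEL_SAMPLES
multiplexed_bit_rate = WORD_LENGTH * path_sample_rate

print(direct_bit_rate, path_sample_rate, multiplexed_bit_rate)
```

The three values correspond to the 412.8 *MHz*, 3.225 *MHz* and 51.6 *MHz* figures in the text.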

# 4 DISTRIBUTED ARITHMETIC

It is well known that an efficient way of implementing fixed-coefficient digital FIR filters is to use distributed arithmetic [4]. A complete inner product of *N* terms can be implemented by one shift-accumulator (SA) and one look-up table (LUT) containing $2^N$ entries.

Using bit-serial input data, one bit from each of the *N* inputs is used to form an address into the LUT. The addressed value is added to an earlier value, divided by two, i.e., right-shifted one position, and stored. By using offset binary coding (OBC) the size of the LUT can be halved [5].
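The principle can be sketched in software as follows. This is a behavioural illustration under our own assumptions, not a model of the chip: inputs are two's complement integers processed least significant bit first, the full $2^N$-entry table is used (OBC is not modelled), the sum is halved after every addition, and the function names are hypothetical.

```python
from fractions import Fraction

def build_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] over all bits k that are set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_inner_product(coeffs, samples, word_length=16):
    """Compute sum(coeffs[k] * samples[k]) bit-serially with one LUT and one accumulator."""
    lut = build_lut(coeffs)
    acc = Fraction(0)
    for b in range(word_length):                 # least significant bit first
        addr = 0
        for k, x in enumerate(samples):
            # >> on Python ints is arithmetic, so this extracts two's complement bits.
            addr |= ((x >> b) & 1) << k
        term = lut[addr]
        if b == word_length - 1:
            term = -term                         # the sign bit carries negative weight
        acc = (acc + term) / 2                   # add, then right-shift one position
    return acc * (1 << word_length)              # undo the accumulated scaling

coeffs = [3, -5, 7, 2]
samples = [100, -200, 300, -400]
direct = sum(c * x for c, x in zip(coeffs, samples))
print(da_inner_product(coeffs, samples), direct)
```

No multiplier is needed: each of the 16 bit-slices costs one table look-up and one addition, independent of the number of terms in the inner product.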

#### 4.1 Distributed Arithmetic Processor

To increase the throughput, the SA is pipelined with two extra registers, figure 6. When OBC is used, the SA must be able to perform both addition and subtraction. This is accomplished by the XOR-gates and programmable carry-save registers. A large effort was put into creating regular, area-efficient and high-performance layouts for the SA.



Figure 5 DDC Architecture.

The LUT can be implemented as a small ROM since the DDC uses fixed coefficients. In the IQ- and HP-stages N = 4, which means that the ROM contains only $2^{(4-1)} = 8$ entries. In the LP-stage, however, the inner product is calculated using two separate DAPs, one with N = 7 and the other with N = 6, to avoid a large and slow ROM.
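The benefit of splitting the LP-stage inner product can be quantified with the $2^{N-1}$ rule. The comparison below assumes that the two partial inner products together cover 13 inputs, so that the alternative would be a single ROM with N = 13; that alternative is our reading and is not stated in the text.

```python
def obc_rom_entries(n_inputs):
    """Number of ROM words for an n-input inner product when OBC halves the table."""
    return 2 ** (n_inputs - 1)

iq_hp_entries = obc_rom_entries(4)                      # IQ- and HP-stages
split_entries = obc_rom_entries(7) + obc_rom_entries(6) # two DAPs in the LP-stage
single_entries = obc_rom_entries(7 + 6)                 # hypothetical single ROM

print(iq_hp_entries, split_entries, single_entries)
```

Under this assumption the split needs 96 words in total instead of 4096, at the cost of one extra addition to combine the two partial results.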



Figure 6 A four-bit shift-accumulator block diagram.

#### 4.2 Complex Multiplier

An efficient implementation of the complex multiplier (CMULT) used in the EQU can be obtained by using two DAPs. The basic idea is to calculate the short (two-term) inner product for both $y_{re}$ and $y_{im}$ on separate DAPs.



Figure 7 Complex multiplier based on DA with OBC.

By using OBC the LUT is reduced to two words in both cases. Comparing the contents of the two tables shows that they are identical, so they can be shared between the DAPs, see figure 7. An advantage is that the same basic arithmetic unit can be used for the EQU as well as for the DDC. It has been shown [6] that this CMULT is suitable for implementing the beamformer of figure 1.
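Why two words suffice, and why both DAPs can share them, can be made plausible with a short enumeration. The sketch assumes the usual OBC formulation in which every input bit is recoded to a digit $d \in \{-1, +1\}$ and each table entry is half the corresponding signed sum of coefficients; the constant offset term of OBC is ignored, and the function name `obc_tables` and the example weight $3 + 5j$ are ours.

```python
from itertools import product

def obc_tables(a_re, a_im):
    """Table entries for y_re = a_re*x_re - a_im*x_im and y_im = a_im*x_re + a_re*x_im."""
    re_table, im_table = {}, {}
    for d_re, d_im in product((+1, -1), repeat=2):
        re_table[(d_re, d_im)] = (a_re * d_re - a_im * d_im) / 2
        im_table[(d_re, d_im)] = (a_im * d_re + a_re * d_im) / 2
    return re_table, im_table

re_table, im_table = obc_tables(3, 5)

# Entries come in +/- pairs, so only the magnitudes need to be stored;
# the sign is applied by adding or subtracting in the shift-accumulator.
re_words = {abs(v) for v in re_table.values()}
im_words = {abs(v) for v in im_table.values()}

print(sorted(re_words), sorted(im_words))
```

For a weight $a_{re} + j a_{im}$ both tables reduce to the same two magnitudes, $|a_{re} + a_{im}|/2$ and $|a_{re} - a_{im}|/2$, which is consistent with sharing one table between the two DAPs.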

# 5 IMPLEMENTATION

The implementation is made in a 0.8 $\mu$m CMOS technology. The low clock-rate allows the use of a non-overlapping two-phase clocking strategy [7]. This strategy gives simple and area-efficient latches and register cells.

To further reduce some critical logic blocks, pass-transistor logic was used. This has ensured fast, simple, compact and efficient layouts with low power dissipation. A goal for all included modules has been that they should be able to operate at 200 *MHz*, giving a design margin close to four, since a 51.6 *MHz* clock is sufficient for this application.

#### 5.1 Floorplan

A large effort was put into making the chip as small as possible, both during the work on the arithmetic modules and on the global floorplan [8]. Another important issue was to minimize the routing distance between the different stages. The final result fulfilled both these requirements, as can be seen in figure 8. To prevent excessive local power dissipation on the chip, the clock distribution was divided between local clock buffers on every SA and the main clock buffer (CBuf).

#### 5.2 Filters

The DDC is implemented according to figure 5, with 4 DAPs for each filter stage. The LP-stage occupies a larger area only because of the increased number of registers.

The EQU is implemented with direct mapping of the complex FIR filter to 12 CMULTs, two sets of 16-bit shift-registers and two adder-trees, one for the real part and the other for the imaginary part. Since the coefficients are loaded serially into the EQU, a coefficient register has been implemented to simplify the programming. It consists of an asynchronous static register with parallel load, controlled by a write signal, and a dynamic synchronized shift-register which shifts the coefficients into the EQU. In this way asynchronous loading of new weights is supported.
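Functionally, the direct mapping corresponds to a direct-form complex FIR filter: a delay line, one complex multiplication per tap, and a summation of the products. The following behavioural sketch ignores word lengths and bit-serial timing; the function name and the test weights are hypothetical.

```python
def complex_fir(weights, samples):
    """Direct-form FIR: y[n] = sum_k weights[k] * x[n - k], starting from a cleared delay line."""
    delay_line = [0j] * len(weights)
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]       # shift the new sample in
        out.append(sum(w * d for w, d in zip(weights, delay_line)))
    return out

# Twelve arbitrary complex weights standing in for calibrated equalizer weights.
weights = [complex(k, -k) for k in range(12)]

# Feeding a unit impulse returns the weights themselves, one per output sample.
impulse = [1 + 0j] + [0j] * 11
print(complex_fir(weights, impulse) == weights)
```

Loading a single unit weight followed by zeros makes the filter transparent, which is a convenient starting point before calibration weights are available.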

#### 5.3 Parallel/Serial conversion

A parallel/serial register bank (P/S-REG) is needed to convert the 12-bit parallel binary offset input from the ADC to the internal 16-bit two's complement bit-serial representation. The P/S-REG can be combined with the branching network at the input of the DDC, cf. figure 5. This unit is implemented with 16 parallel-loadable shift-registers in 2 groups and 8 muxes. The conversion between binary offset and two's complement is performed by inverting the most significant bit. The data word length is extended by resettable 4-bit registers at the output.
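The code conversion can be illustrated as follows. Only the MSB inversion is shown; the extension from 12 to 16 bits is not modelled, since the text does not specify how the extra four bits are filled. The function name is hypothetical.

```python
def binary_offset_to_twos_complement(code, bits=12):
    """Invert the MSB of a binary offset code and read the result as a signed integer."""
    msb = 1 << (bits - 1)
    flipped = code ^ msb                         # the only hardware operation needed
    # Interpret the flipped bit pattern as two's complement.
    return flipped - (1 << bits) if flipped & msb else flipped

for code in (0x000, 0x7FF, 0x800, 0xFFF):
    print(f"{code:03X} -> {binary_offset_to_twos_complement(code)}")
```

The lowest code maps to the most negative value, mid-scale maps to zero and the highest code maps to the most positive value, i.e. the result equals the code minus $2^{11}$.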

#### 5.4 Control unit

Very few global control signals are needed since similar arithmetic units are used in all stages. The control signals are generated by a simple state-machine in the form of a self-addressing clocked ROM.

To generate the non-overlapping two-phase clock as safely as possible, a solution using an on-chip phase-locked loop (PLL) was employed. It multiplies the input clock frequency by a factor of four. Some additional logic generates the non-overlapping clocks with 25% duty cycle each. This results in a robust solution where clock skew problems are virtually eliminated, at the cost of a shorter evaluation time due to the short duty cycle.
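The timing idea can be pictured by dividing every clock period into four equal slots. The sketch below is an assumption about the phase placement, not a description of the actual clock logic: phase 1 is taken to be high in the first slot and phase 2 in the third, which gives each phase a 25% duty cycle with an idle slot after each active one.

```python
def two_phase_clock(n_periods):
    """Return (phi1, phi2) as lists of 0/1 values, four slots per clock period."""
    phi1, phi2 = [], []
    for _ in range(n_periods):
        phi1 += [1, 0, 0, 0]     # assumed: phase 1 active in slot 0
        phi2 += [0, 0, 1, 0]     # assumed: phase 2 active in slot 2
    return phi1, phi2

phi1, phi2 = two_phase_clock(3)

overlap = any(a and b for a, b in zip(phi1, phi2))
duty_phi1 = sum(phi1) / len(phi1)
duty_phi2 = sum(phi2) / len(phi2)
print(overlap, duty_phi1, duty_phi2)
```

With this placement the two phases are never high at the same time, and each active slot is followed by an idle slot that acts as a guard interval against skew.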

To be able to test each stage individually, multiplexers are included between every filter stage. This makes it possible to use only selected parts of the chip, if desired. Provisions for bypassing the PLL and using an external clock have also been made.



Figure 8 Chip Layout.

#### 5.5 Figures of merit

The chip contains approximately 105,000 transistors and occupies $4.6 \times 4.4$ $mm^2$ ($4.0 \times 3.6$ $mm^2$ core area) in a 0.8 $\mu$m CMOS technology (AMS CYB). HSPICE simulations have indicated a total power dissipation of less than 1 *W* at 5 *V* and 51.6 *MHz* internal clock. Both area and power dissipation could probably be reduced by relaxing the design margin. The chip is packaged in a PGA 85a package and is currently under test.

# 6 CONCLUSIONS

This paper has demonstrated an implementation of a digital down converter and channel equalizer for a digital array antenna application.

It has been demonstrated that high performance can be achieved even with a moderate clock-rate and bit-serial arithmetic. This has been made possible by optimizing the DSP algorithms for the architecture. The proposed architecture is highly modular and supports simple control and communication. It also offers the possibility to exploit speed-area trade-offs.

Distributed arithmetic ensures high area efficiency and low power dissipation. The chosen clock strategy and logic style have also contributed to this.

The built-in design margin could be used for either voltage scaling or an increased clock-rate. In current technology it should be possible to achieve more than $10^{10}$ *MAC/s* on a single chip. Clearly, this makes the proposed concept interesting for many other front-end systems.

#### References

- [1] L. Petterson, M. Danestig and U. Sjöström: "An Experimental S-Band Digital Beamforming Antenna", Accepted for publication at Int. Symp. on Phased Array Systems and Technology, Boston, USA, October 1996.
- [2] H. Steyskal: "Digital Beamforming Antennas: An Introduction", *Microwave Journal*, Vol. 30, No. 1, January 1987.
- [3] U. Sjöström: "Design of a Digital Down Converter", FOA document D--95-00133-3.2--SE, Linköping, Sweden, June 1995.
- [4] A. Peled and B. Liu: "A New Hardware Realization of Digital Filters", *IEEE Trans. Acoust., Speech and Signal Processing*, Vol. ASSP-22, No. 6, pp. 456-462, December 1974.
- [5] S. G. Smith and P. B. Denyer: "Serial-Data Computations", Kluwer Academic Publishers, Boston, USA, 1988.
- [6] U. Sjöström: "Design of a Digital Down Converter", FOA Document D--95-001333-3.2--SE, 32 pages, Linköping, Sweden, June 1995.
- [7] C. Mead and L. Conway: "Introduction to VLSI Systems", Addison-Wesley, Series in Computer Science, USA, 1980.
- [8] M. Carlsson and M. Hörlin: "ASIC design of a digital down converter", LiTH-ISY-EX-1609, Linköping, Sweden, December 1995 (in Swedish).