# Optimising the Implementation of a FFT-based Multicarrier CDMA receiver

P. M. Grant<sup>1</sup>, A. C. McCormick<sup>2</sup>, J. S. Thompson<sup>1</sup>, T. Arslan<sup>1</sup> and A. T. Erdogan<sup>1\*</sup>
<sup>1</sup> Department of Electronics and Electrical Engineering, The University of Edinburgh, The King's Buildings, Mayfield Road, Edinburgh, EH9 3JL, UK.
<sup>2</sup> Now at Alphadata Parallel Systems, 58 Timber Bush, Edinburgh, EH6 6QH, UK.
Email: Peter.Grant@ee.ed.ac.uk

# ABSTRACT

Implementation aspects of a multicarrier code division multiple access (MC-CDMA) receiver with software-reconfigurable power consumption are discussed. The receiver allows power consumption to be reduced at the expense of increased processing delay. This delay may matter in real-time applications such as voice communications, but not in others such as data download, so the power consumption can be matched to the traffic. The performance of MC-CDMA receivers is also briefly discussed at the end of the paper.

## 1 Introduction

Multi-carrier CDMA [1, 2] is a spread spectrum technology which combines the advantages of OFDM (orthogonal frequency division multiplexing) and CDMA (code division multiple access) to produce a spectrally efficient multi-user radio access system. This radio access system may be utilised in future mobile wireless systems, and hence the power consumption of terminals is an issue. In [3], the implementation of the combiner circuit of a low power MC-CDMA receiver using a block processing algorithm was investigated. This paper extends that work to implement the entire baseband receiver circuit, including the FFT.

Future mobile wireless systems will have to deal with a variety of traffic such as voice, data and video. Each of these types of traffic has different requirements for bandwidth and latency. In [3], the power reduction was achieved at the cost of increasing the latency of the data symbols. For some traffic, this latency may not be acceptable. Therefore a more flexible receiver is proposed, which can be reconfigured to provide either low latency or low power operation.

## 2 Multi-Carrier CDMA

The multi-carrier CDMA considered here uses 1 code chip per carrier and should not be confused with direct sequence CDMA systems transmitted on multiple carriers [4]. The signal is spread before being converted into a parallel data stream which is then transmitted over multiple carriers. If the processing gain is equal to the number of carriers then this system modulates all the carriers with the same data bit, but with a phase shift on each carrier determined by the spreading code, as shown in Figure 1. This multi-carrier modulation can also be implemented using an inverse FFT.





If the kth chip of the spreading code for user u is defined as  $c(k, u) \in \{-1, +1\}$  then the transmitted baseband signal for the mth data symbol b(m) is:

$$x(n) = \sum_{k=0}^{N-1} \exp(j2\pi kn/N)\,c(k,u)\,b(m), \qquad n = 0, \ldots, N-1 \tag{1}$$
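As a quick numerical check of Eq. (1), the sketch below evaluates the sum over carriers directly and confirms that a forward DFT returns $c(k,u)b(m)$ on every carrier. The 8-chip code and the function names `mc_cdma_symbol` and `dft` are illustrative assumptions, not part of the receiver described in this paper.

```python
import cmath

def mc_cdma_symbol(code, bit):
    """Evaluate Eq. (1) directly: carrier k carries chip code[k] times the data bit."""
    n_carriers = len(code)
    return [
        sum(cmath.exp(2j * cmath.pi * k * n / n_carriers) * code[k] * bit
            for k in range(n_carriers))
        for n in range(n_carriers)
    ]

def dft(samples):
    """Normalised forward DFT, standing in for the receiver FFT block."""
    size = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / size)
            for n in range(size)) / size
        for k in range(size)
    ]

code = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical spreading code for one user
bit = -1                              # data symbol b(m)
x = mc_cdma_symbol(code, bit)         # N time-domain samples of one symbol
carriers = dft(x)                     # each bin should hold code[k] * bit
```

Because the sum in Eq. (1) is an unnormalised inverse DFT of the chip sequence scaled by the data bit, a practical transmitter would replace the direct sum with an inverse FFT.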

To overcome the effect of inter-symbol interference, this baseband signal is cyclically extended by more than the channel delay spread, so that an interference-free portion of each symbol reaches the receiver. The full structure of an MC-CDMA transmitter is shown in Figure 2.

\* This work was funded by EPSRC Grant No. GR/L/98091.



Figure 2: Multi-Carrier CDMA Transmitter

By using a guard interval, the receiver selects the portion of the signal that is free from inter-symbol interference. This is processed by an FFT block to demodulate the multiple carriers.
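The role of the cyclic extension can be illustrated with a small sketch. It assumes a hypothetical 3-tap real channel and an 8-sample symbol (neither taken from the paper) and shows that, once the guard samples are discarded, the received block equals the circular convolution of the symbol with the channel, which is what makes the per-carrier channel $H(k)$ a single complex gain after the FFT.

```python
def add_cyclic_extension(symbol, guard):
    """Prefix the symbol with a copy of its last `guard` samples."""
    return symbol[len(symbol) - guard:] + symbol

def linear_convolve(signal, taps):
    """Multipath channel modelled as a plain FIR filter."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out

def circular_convolve(symbol, taps):
    """Reference result: circular convolution over one symbol period."""
    n = len(symbol)
    return [sum(taps[j] * symbol[(t - j) % n] for j in range(len(taps)))
            for t in range(n)]

symbol = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, -0.5, 1.0]  # illustrative samples
taps = [1.0, 0.5, 0.25]     # hypothetical channel, delay spread of 2 samples
guard = 3                   # guard interval longer than the delay spread

received = linear_convolve(add_cyclic_extension(symbol, guard), taps)
useful = received[guard:guard + len(symbol)]   # guard interval removal
reference = circular_convolve(symbol, taps)
```

In hardware, the slice that removes the guard interval corresponds simply to choosing when sampling into the buffer starts.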

The channel effect of a multipath channel h(n) at the output of the FFT is narrowband for each carrier, H(k), and therefore the equalisation and despreading can be incorporated into a single combining operation to estimate the transmitted data bit. If the output of the FFT block at frequency bin k is defined as Y(k) then the combining operation can be represented by:

$$\hat{b}(m) = \operatorname{sign}\left\{\sum_{k=0}^{N-1} \Re\{c(k, u)A(k)Y(k)\}\right\} \tag{2}$$

The entire receiver structure is shown in Figure 3.

The equaliser coefficient $A(k)$ can be used to implement a frequency domain rake filter if it is set to match the channel: $A(k) = H^*(k)$. Linear multiuser detection can also be implemented simply by changing the coefficient $A(k)$: a decorrelator is obtained by setting $A(k) = H^*(k)/|H(k)|^2$, and the MMSE solution by setting $A(k) = H^*(k)/(|H(k)|^2+\lambda)$, where $\lambda$ is a parameter that depends on the signal to noise level and the number of users.
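The combining operation and the three coefficient choices can be sketched as follows. The channel values, the code, the value of `lam` and the function names are hypothetical, chosen only to exercise the formulas; the example is noise-free and single-user, so all three settings should recover the transmitted bit.

```python
def equaliser_coefficients(channel, mode, lam=0.0):
    """Return A(k) for each carrier; `lam` is only used in MMSE mode."""
    coeffs = []
    for h in channel:
        power = abs(h) ** 2
        if mode == "rake":
            coeffs.append(h.conjugate())
        elif mode == "decorrelator":
            coeffs.append(h.conjugate() / power)
        elif mode == "mmse":
            coeffs.append(h.conjugate() / (power + lam))
        else:
            raise ValueError("unknown mode: " + mode)
    return coeffs

def combine(fft_out, code, coeffs):
    """Joint equalisation and despreading: sign of the summed real parts."""
    total = sum((c * a * y).real for c, a, y in zip(code, coeffs, fft_out))
    return 1 if total >= 0 else -1

channel = [1 + 1j, 0.5 - 0.2j, -0.8 + 0.3j, 0.1 + 0.9j]  # hypothetical H(k)
code = [1, -1, 1, -1]                                     # hypothetical c(k, u)
bit = -1
fft_out = [h * c * bit for h, c in zip(channel, code)]    # noise-free Y(k)

decisions = {
    mode: combine(fft_out, code, equaliser_coefficients(channel, mode, lam=0.1))
    for mode in ("rake", "decorrelator", "mmse")
}
```

Only the table of coefficients differs between the three detectors, which is why the circuit can store $A(k)$ in memory and reuse the same multiply-accumulate hardware.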

## 3 Implementation

Implementing an MC-CDMA receiver requires three blocks: guard interval removal, the FFT and the combiner. However, these need not be implemented individually. Guard interval removal is merely the selection of the ISI-free part of the received signal, and can therefore be implemented very simply by controlling when the signal is sampled. The FFT and the combiner both involve multiplications and additions, so both can be implemented on the same hardware with appropriate control signals.



Figure 3: Multi-Carrier CDMA Receiver

Figure 4 shows the overall architecture of the circuit. It contains 4 memory blocks, a control unit, an arithmetic unit and an interface to the analogue receiver. Two of the memory blocks are interchangeable buffers: one stores the incoming signal, which is sampled at a rate much lower than the processing clock rate, and the other is used as the workspace for the FFT operation. These blocks are interchanged when the processing of the FFT is finished and sufficient new samples are in the other buffer.
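The buffer interchange rule can be modelled in a few lines. This is a behavioural sketch only: the class `PingPongBuffers` and its method names are assumptions made for illustration and do not describe the actual control unit.

```python
class PingPongBuffers:
    """Two interchangeable buffers: one captures samples, one is the FFT workspace."""

    def __init__(self, size):
        self.size = size              # buffer length in complex samples
        self.capture = []             # filled at the (slow) sample rate
        self.work = []                # processed at the (fast) clock rate
        self.processing_done = True   # nothing is being processed at start-up

    def write_sample(self, sample):
        """Store one incoming sample; returns False if the capture buffer is full."""
        if len(self.capture) >= self.size:
            return False
        self.capture.append(sample)
        return True

    def finish_processing(self):
        """Called when the FFT and combiner have finished with the workspace."""
        self.processing_done = True

    def try_swap(self):
        """Interchange only when processing is done AND enough samples are captured."""
        if self.processing_done and len(self.capture) == self.size:
            self.work, self.capture = self.capture, []
            self.processing_done = False
            return True
        return False

buffers = PingPongBuffers(size=4)
for sample in [1, 2, 3]:
    buffers.write_sample(sample)
early = buffers.try_swap()     # not enough samples yet
buffers.write_sample(4)
swapped = buffers.try_swap()   # both conditions now hold
```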



Figure 4: Receiver Circuit Block Diagram

The other memories are used to store the equaliser coefficients and to sum the totals produced by the combiner operation. The arithmetic unit provides the multipliers required for both the FFT and combining algorithms as well as a divider to compute the MMSE equaliser coefficients.

The reconfigurable power consumption is provided by implementing a flexible length block algorithm. In the circuit the granularity of processing can be varied to achieve the desired balance between power consumption and minimising the delay in the data. The data are divided into blocks of M data bits. This requires memory buffers of size NM complex elements, where N is the size of the FFT.

The highest power, lowest latency mode operates when M = 1. In this case the data bits are processed one at a time. The received signal is buffered in sections of N samples (in bit reversed order). When one section is buffered, the memory buffers are swapped and the FFT operation is then performed on the N point memory through  $\log_2 N$  stages, with the results overwriting the data. To allow an easier implementation of a low power block based algorithm, the butterfly operations of the FFT are performed using two multipliers, achieving the 4 multiplications required in 2 clock cycles. Otherwise the FFT operates sequentially, completing each stage in N clock cycles. The combiner stage also uses the two multipliers, performing its 2N multiplications in N clock cycles.

Figure 5: Algorithm for FFT and Combiner Operation

When switching to a lower power mode, data is buffered in blocks containing M data bits. The sampled data is stored in sections of MN samples, with the least significant address bits reverse ordered in preparation for the FFT. The FFT is then performed in block fashion on the M data bits. Each butterfly operation is performed sequentially on the equivalent array elements for each successive data bit. This keeps one input of the multipliers (the FFT coefficients) fixed for at least M clock cycles. This block based approach is also applied to the combiner multiplication where the equaliser coefficients are held constant for M cycles and the combined sums are stored in the accumulator memory. When decoding is complete, the resulting data estimates can be read out of accumulator memory.
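The loop ordering described above can be sketched in software. The code below is a minimal radix-2 decimation-in-time FFT applied to M symbols at once, with the loop over symbols innermost so that each twiddle factor stays on the multiplier input for a run of consecutive butterflies. The exact ordering used in the circuit is given by Figure 5; the ordering here, and the names `block_fft` and `hold_lengths`, are assumptions made for illustration.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev

def block_fft(symbols):
    """FFT of M equal-length symbols (length a power of two), processed as a block.

    Returns the spectra and, per stage, the number of consecutive butterflies
    for which one twiddle factor is held on the multiplier input.
    """
    n = len(symbols[0])
    stages = n.bit_length() - 1
    # Buffer each symbol in bit-reversed order, as the receiver memory does.
    work = [[sym[bit_reverse(i, stages)] for i in range(n)] for sym in symbols]
    hold_lengths = []
    for stage in range(stages):
        half = 1 << stage
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / (2 * half))  # coefficient held fixed
            for start in range(0, n, 2 * half):
                for sym in work:                 # innermost loop: the M symbols
                    top = sym[start + j]
                    bottom = sym[start + j + half] * w
                    sym[start + j] = top + bottom
                    sym[start + j + half] = top - bottom
        hold_lengths.append((n // (2 * half)) * len(symbols))
    return work, hold_lengths

symbols = [[float((3 * i + m) % 5) for i in range(8)] for m in range(4)]
spectra, holds = block_fft(symbols)         # holds == [16, 8, 4] for M = 4
_, holds_single = block_fft(symbols[:1])    # holds_single == [4, 2, 1] for M = 1
```

With M = 1 the coefficient in the last stage changes on every butterfly, whereas with a block of M symbols it is held for M butterflies; the earlier stages already hold each coefficient for longer runs. In the circuit each butterfly occupies two clock cycles, so the hold time in clock cycles is twice these counts.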

The power reduction here is less than the 50% obtained when the combiner circuit is implemented alone. The reason is that the FFT coefficients are not switched on every clock cycle even when M = 1. The rate of change depends on which stage of the FFT is being processed: at earlier stages the coefficient changes slowly, whereas at the last stage it is switched at the clock rate, and it is therefore here that a significant power saving is made.

Figure 5 shows the algorithm used by the FFT-combiner circuitry. The inner loops where one of the multiplier inputs is held constant are shaded. In the combiner phase of the algorithm this is for M clock cycles, but in the FFT phase of the algorithm the length of time one of the inputs is held constant depends on the stage of the FFT being processed.

The algorithm also includes a channel estimation phase, where pilot symbols are transmitted to allow the receiver to estimate the channel. The channel estimation processing occurs after the FFT in place of the combiner algorithm. To compute the MMSE coefficients from the channel estimates, an additional division circuit is required. This does not have a significant effect on the power consumption, as it is only used during the channel estimation phase, which is 3% of the time used for combiner operations.

## 4 Power Analysis

The circuit was synthesised using the Alcatel 0.35 micron library. The bit resolution for the multipliers, divider and memory buffers was 8-bit. Fixed point FFT simulations with an average signal power of  $32^2$ , indicated that a quantisation noise level of -18 dB is produced and this should not have a significant impact on the combining algorithm. The accumulator memory and associated addition circuit were 16-bit precision.

A 64 carrier MC-CDMA system was implemented, requiring a 6 stage FFT. A cyclic extension of 16 samples was assumed, giving a total symbol length of 80 samples. To allow time to perform the FFT, the circuit was clocked at 8 times the sample rate. A clock rate of 20 MHz could be achieved comfortably, resulting in a sample rate of 2.5 MHz and an underlying symbol rate of 31.25 kHz. Block lengths (M) were varied from 1 up to 32 symbols, giving processing delays from 32  $\mu$s up to approximately 1 ms.
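The timing figures quoted above follow from simple arithmetic, reproduced in the sketch below. The cycle budget at the end is an added consistency check based on the per-stage cycle counts given in Section 3; it ignores control overhead and the channel estimation phase, and the variable names are illustrative.

```python
N_CARRIERS = 64            # FFT size
GUARD_SAMPLES = 16         # cyclic extension
CLOCK_HZ = 20_000_000      # processing clock
CLOCKS_PER_SAMPLE = 8      # circuit clocked at 8 times the sample rate

symbol_samples = N_CARRIERS + GUARD_SAMPLES               # 80 samples per symbol
sample_rate_hz = CLOCK_HZ / CLOCKS_PER_SAMPLE             # 2.5 MHz
symbol_rate_hz = sample_rate_hz / symbol_samples          # 31.25 kHz
clocks_per_symbol = symbol_samples * CLOCKS_PER_SAMPLE    # 640 clock cycles

# Section 3: each FFT stage takes N cycles and the combiner takes N cycles.
fft_stages = N_CARRIERS.bit_length() - 1                  # log2(64) = 6
busy_clocks = fft_stages * N_CARRIERS + N_CARRIERS        # 448 cycles per symbol

def latency_us(block_length):
    """Time needed to buffer a block of M symbols, in microseconds."""
    return block_length * clocks_per_symbol * 1_000_000 / CLOCK_HZ

def buffer_elements(block_length):
    """Complex elements per memory buffer (N * M)."""
    return N_CARRIERS * block_length

latencies = {m: latency_us(m) for m in (1, 2, 4, 8, 16, 32)}
```

Under these assumptions the arithmetic unit is busy for 448 of the 640 clock cycles available per symbol, which is consistent with the choice of a clock 8 times faster than the sample rate.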

Table 1 shows the effect on power of varying the block length. The data used for these simulations assumed one user transmitting across a Rayleigh fading channel with an average receiver  $E_b/N_0$  of 10 dB. The simulations show clearly that a latency/power trade-off can be achieved: at the cost of a 1 ms processing delay, a 13% reduction in power is obtained.

| Block Length $(M)$ | Latency/µs | Power/mW |
|--------------------|------------|----------|
| 1                  | 32         | 18.8     |
| 2                  | 64         | 18.4     |
| 4                  | 128        | 17.8     |
| 8                  | 256        | 17.3     |
| 16                 | 512        | 16.9     |
| 32                 | 1024       | 16.3     |

Table 1: Power Consumption with Block Length

This appears poor compared to the 50% power reduction which can be achieved in the combiner circuit alone [3]. However, the combiner accounts for only 8% of all multiplications, so only about 4 percentage points of the overall 13% reduction can be attributed to it. The remaining 9 percentage points occur in the FFT, giving an average power reduction of about 10% across the whole FFT. This saving arises mostly in the last stages of the FFT, where the coefficients are switched at a high rate.
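The breakdown above can be reproduced from Table 1. The sketch assumes, as the argument implicitly does, that each block's share of the power is proportional to its share of the multiplications; the variable names are illustrative.

```python
# Power in mW against block length M, copied from Table 1.
power_mw = {1: 18.8, 2: 18.4, 4: 17.8, 8: 17.3, 16: 16.9, 32: 16.3}

total_reduction = (power_mw[1] - power_mw[32]) / power_mw[1]   # about 0.13

combiner_share = 0.08        # fraction of all multiplications (from the text)
combiner_reduction = 0.50    # saving in the combiner alone, from [3]

# Assumption: power share is proportional to multiplication share.
combiner_contribution = combiner_share * combiner_reduction     # 0.04
fft_contribution = total_reduction - combiner_contribution      # about 0.09
fft_average_reduction = fft_contribution / (1 - combiner_share) # about 0.10

summary = {
    "total_percent": round(total_reduction * 100),
    "combiner_points": round(combiner_contribution * 100),
    "fft_points": round(fft_contribution * 100),
    "fft_average_percent": round(fft_average_reduction * 100),
}
```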



Figure 6: Performance of uplink MC-CDMA systems

## 5 Receiver Performance Extensions

In order to further improve the performance of these MC-CDMA systems, it is preferable to increase the receiver complexity and introduce some form of multiuser detection. Successive interference cancellation (SIC) has been shown to be a useful way to reduce the error ratio to more acceptable levels in Rayleigh fading channels, as shown in Figure 6. This figure shows that 48 users can be supported at a bit error ratio of  $10^{-2}$  with the SIC architecture; with single user matched filter detection, this number drops to about 8 users. This highlights the practical importance of multi-user detection techniques in MC-CDMA systems. The implementation of this advanced SIC receiver has also been investigated [5].

# References

- [1] N. Yee, J. P. Linnartz, and G. Fettweis, "Multi-Carrier-CDMA in Indoor Wireless Networks," in *PIMRC '93*, Yokohama, Japan, 1993, pp. 109–113.
- [2] R. Prasad and S. Hara, "An Overview of Multicarrier CDMA," *Proceedings of the IEEE Int. Symposium on Spread Spectrum Techniques and Applications, Mainz, Germany*, pp. 107–114, September 1996.

- [3] A. C. McCormick, P. M. Grant, J. S. Thompson, T. Arslan, and A. T. Erdogan, "A low power MMSE receiver architecture for multi-carrier CDMA," in *Proceedings of ISCAS 2001*, Sydney, Australia, May 2001.
- [4] Q. Chen, E. S. Sousa, and S. Pasupathy, "Performance of a Coded Multi-Carrier DS-CDMA System in Multi-Path Fading Channels," *Wireless Personal Communications Journal*, vol. 2, no. 1-2, pp. 167–183, 1995.
- [5] A. C. McCormick, P. M. Grant, J. S. Thompson, T. Arslan, and A. T. Erdogan, "Implementation of an SIC Based MC-CDMA Base Station Receiver," in *Proceedings of the 2001 Third International Workshop on Multi-carrier Spread Spectrum and Related Topics*, Oberpfaffenhofen, Germany, September 2001.