# An Efficient Baugh-WooleyArchitecture for Signed & Unsigned Fast Multiplication

Pramodini Mohanty

Department of Electrical & Electronics Engineering,

Noida Institute of Engineering & Technology (NIET), Greater Noida (UP)

Email-id: pramodini swain@yahoo.co.in, pamina2009@gmail.com

Abstract— This paper presents an efficient implementation of a high speed multiplier using the shift and adds method of Baugh-Wooley Multiplier. This parallel multiplier uses lesser adders and lesser iterative steps. As a result of which they occupy lesser space as compared to the serial multiplier. This is very important criteria because in the fabrication of chips and high performance system requires components which are as small as possible. Experimental results demonstrate that the proposed circuit not only improves the accurate performance but also reduces the hardware complexity and also less power consumption that is dynamic power of 15.3mW and maximum clock period of 3.912ns is required which is very efficient as compared to the reference paper.

Keywords— Baugh-Wooley Multiplier, Pipeline resister, Power Efficient, Carry Save Adder.

#### I. INTRODUCTION

Multiplication involves two basic operations— the generation of the partial product and their accumulation [5]. Therefore, there are possible ways to speed up the multiplication that reduces the complexity, and as a result reduces the time needed to accumulate the partial products. Both solutions can be applied simultaneously.

Baugh-Wooley Two's Complement Signed Multiplier: Two's Compliments is the most popular method in representing signed integers in Computer Sciences. It is also an operation of negation (Converting positive to negative numbers or vice—versa) in computers which represent negative numbers using two's complements. Its use is so wide today because it does not require the addition and subtraction circuitry to examine the signs of the operands to determine whether to add or subtract. Two's complement and one's complement

| +N | - Positive<br>Integers | -N | Negative Integers   |                   |                   |  |  |
|----|------------------------|----|---------------------|-------------------|-------------------|--|--|
|    |                        |    | Sign &<br>Magnitude | 2's<br>Complement | 1's<br>Complement |  |  |
| +0 | 0000                   | -0 | 1000                |                   | 1111              |  |  |
| +1 | 0001                   | -1 | 1001                | 1111              | 1110              |  |  |
| +2 | 0010                   | -2 | 1010                | 1110              | 1101              |  |  |
| +3 | 0011                   | -3 | 1011                | 1101              | 1100              |  |  |
| +4 | 0100                   | -4 | 1100                | 1100              | 1011              |  |  |
| +5 | 0101                   | -5 | 1101                | 1011              | 1010              |  |  |
| +6 | 0110                   | -6 | 1110                | 1010              | 1001              |  |  |
| +7 | 0111                   | -7 | 1111                | 1001              | 1000              |  |  |
| +8 |                        | -8 |                     | 1000              |                   |  |  |

FIG1(a): Two's complement & one's complement representation

representations are commonly used since arithmetic units are simpler to design. Fig.1 shows Two's & One's complement representation.

Baugh-Wooley Two's complement Signed numbers: Baugh-Wooley Two's complement Signed multipliers is the best known algorithm for signed multiplication because it maximizes the regularity of the multiplier and allow all the partial products to have positive sign bits [3]. Baugh-Wooley technique was developed to design direct multipliers for Two's complement numbers [9]. When multiplying Two's complement numbers directly, each of the partial products to be added is a signed number. Thus each partial product has to be sign extended to the width of the final product in order to form a correct sum by the Carry Save Adder

FIG1 (b): 5\*5 unsigned multiplications

|  | $a_4x_4$ | $-a_3x_4$     | $-a_{2}x_{4}$ | $-a_1x_4$     | $-a_0x_4$      |             |          |           |         |
|--|----------|---------------|---------------|---------------|----------------|-------------|----------|-----------|---------|
|  |          | $-a_{4}x_{3}$ | $a_3x_3$      | $a_2x_3$      | $\alpha_1 x_3$ | $a_0x_3$    |          |           |         |
|  |          |               | $-a_4x_2$     | $a_3x_2$      | $a_2x_2$       | $a_1x_2$    | $a_0x_2$ |           |         |
|  |          |               |               | $-a_{4}x_{1}$ | $a_3x_1$       | $a_2x_1$    | $a_1x_1$ | $a_0 x_1$ |         |
|  |          |               |               |               | $-a_{4}x_{0}$  | $a_3 x_0$   | $a_2x_0$ | $a_1x_0$  | $a_0 x$ |
|  |          |               |               |               | $X_4$          | $x_4$ $x_3$ | $x_2$    | $x_1$     | $x_0$   |
|  |          |               |               |               | $a_4$          | $a_3$       | $a_2$    | $a_1$     | ao      |

FIG1 (c): 5\*5 signed multiplications

(CSA) tree. According to Baugh-Wooley approach, an efficient method of adding extra entries to the bit matrix suggested to avoid having deal with the negatively weighted bits in the partial product matrix. In fig1 (a) & (b) partial product arrays of 5\*5 bits Unsigned and Signed bits are shown.

Figure 1 (c) shows how this algorithm works in the case of a 5x5 multiplication. The first three rows are

referred to as PM (partial products with magnitude part) and generated by one NAND and three AND operations. The fourth row is called as PS (partial products with sign bit) and generated by one AND and three NAND operations with a sign (with magnitude part) and generated by one NAND and three AND operations. Consider the partial products of PM. Suppose b2= b0in figure 1 (c). Then the third row can be obtained by shifting the first row by 2 bits. Likewise, shift operation can be used to obtain a partial product of different bit level as in sign magnitude multiplication.

Baugh-Wooley schemes become an area consuming when operands are greater than or equal to 32 bits. The rest of the paper is organised as follows. The Baugh-Wooley architecture is explained in section 2. Implementation results in terms of power, area, and speed 4 bit multipliers and comparison are presented.

## II. BAUGH-WOOLEY ARCHITECTURE

Hardware architecture for Baugh-Wooley multiplier is shown in fig 2. It follows left shift algorithm. Through MUX (multiplexer) we can select which bit will multiply.

Suppose we are adding +5 and -5 in decimal we get '0'. Now, represent these numbers in 2's complement form, and then we get +5 as 0101 and -5 as 1011. On adding these two numbers we get 10000. Discard carry, then the number is represented as '0'.



FIG1 (d): 5\*5 Multiplication Example of Baugh-Wooley Architecture



Fig 2(a): Signed 2's-Complement Baugh-Wooley Hardware Multiplier

# Baugh-Wooley Multiplier [5]:

Baugh-Wooley Multiplier is used for both unsigned and signed number multiplication. Signed Number operands which are represented in 2's complemented form. Partial Products are adjusted such that negative sign move to last step, which in turn maximize the regularity of the multiplication array. Baugh-Wooley Multiplier operates on signed operands with 2's complement representation to make sure that the signs of all partial products are positive.

Here are using fewer steps and also lesser adders. Here a0, a1, a2, a3& b0, b1, b2, b3 are the inputs. I am getting the outputs that are p0, p1... p7. As I am using pipelining resister in this architecture, so it will take less time to multiply large number of 2's complement but less than 32 bit. Above 32 bit Modified Baugh-Wooley Multiplier is used.



Fig 2(b): Block diagram of a 4\*4 Baugh-Wooley multiplier

# III. IMPLEMENTATION RESULTS AND COMPARISONS

In this paper, I use 4-bit pipelined multipliers and are implemented in VHDL and logic simulation is done by using ModelSim Designer and the synthesis is done using Xilinx ISE 8.2i of Device 4VFX20FF672-12. The synthesis result of 4-bit pipelined Baugh-Wooley architecture is shown in table above of the reference paper and my paper.

In my paper, I am using Brent-Kung adder (BK adder), an advanced design prefix adder, which is a very good balance between area and power cost and also it will present better performance. This adder has a complex carry and inverse carry tree. A tree can be divided into 2

types that is a tree and an inverse tree. The upper tree based on periodic power of 2. The inverse tree is offset 1, beginning from the bottom of the matrix and expanding outwards at power of 2.

The results show that the Baugh-Wooley multiplier has increased speed since clock period is only15.861ns. Pipeline stages further improve the Baugh - Wooley architecture speed. Number of LUTs represents the area required for implementation.

The number of LUTs required in Baugh-Wooley architecture is 30 compared to 32. The fan-out of the multiplier architecture is also given which directly gives the possibility of the multiplier to form large circuits. This can be extended to the pipelined multiplier architecture also to verify the parameters. Latency and speed are the important factors with pipelining under consideration. The synthesis results of 4-bit pipelined multipliers are shown in Table 2. Power consumption in Baugh-Wooley multipliers is minimum compared to other conventional multiplier units. So it clears that the signed binary multiplication through Baugh-Wooley multiplication is suited for large multiplier implementation. The improvements in constraint can be used to make Baugh-Wooley multiplier more efficient. The fan-out of the multiplier architectures are also given which directly gives the possibility of the multiplier to form large circuits. This can be extended to the pipelined multiplier architecture also to verify the parameters. Latency and speed are the important factors with pipelining under consideration. The synthesis results of 4-bit pipelined multipliers are shown in Table 2. The pipeline constraint



Fig 3: Synthesis Report of Baugh-Wooley Architecture

Output: Simulation Result for Baugh-Wooley Architecture:



Fig 4: Simulation Result for Both Unsigned Numbers Multiplication

TABLE 1 SYNTHESIS RESULTS OF DIFFERENT 4-BIT PIPELINED MULTIPLIER ARCHITECTURES

| Multiplier<br>Architecture | Number of<br>LUTS | Fan out | Clock-<br>Period<br>(ns) | Power<br>Dissipation<br>(mW) |  |
|----------------------------|-------------------|---------|--------------------------|------------------------------|--|
| Add-and-<br>Shift          | 74                | 18      | 8.939                    | 68.89                        |  |
| Baugh-<br>Wooley           | 32                | 104     | 15.029                   | 67.84                        |  |

TABLE 2
MY SYNTHESIS RESULTS OF DIFFERENT 4-BIT
PIPELINED MULTIPLIER ARCHITECTURES

| Multiplier<br>Architecture | Number of<br>LUTS | Fan out | Clock-<br>Period<br>(ns) | Power<br>Dissipation<br>(mW) |
|----------------------------|-------------------|---------|--------------------------|------------------------------|
| Add-and-<br>Shift          | 74                | 18      | 8.939                    | 68.89                        |
| Baugh-<br>Wooley           | 30                | 46      | 3.921                    | 15.3                         |

increases the speed of the multiplier considerably with a increase in power consumption. For the Baugh-Wooley multipliers, the clock period reduces to 3.321ns as a result of pipeline registers implemented. This improves the speed which may reduce due to the BK adder which I used in my architecture. The maximum delay for this architecture is 2.143ns. I am using 65 Flip Flops out of 17088 and maximum frequency is 527.037MHz, which is a good sign. The incorporation of the pipeline multipliers thus can be effectively done to make the chip efficiently



Fig. 5 Simulation Result For both Signed Number Multiplication

reconfigurable among the two reconfiguration modes and this work is in progress. The possibility of other reconfiguration constraints is under work and the implementation of the reconfiguration modes according to these constraints are the future work.

### IV. CONCLUSIONS

An efficient multiplier to deal with the latency problem is proposed. This paper presents a comparison between various multiplier architectures with area, speed and power as the main constraints. It is observed that the Baugh-Wooley architecture gives optimized values for various constraints and hence suited for long bit multiplication up to less than 32 bit. The pipelining resister techniques are used to improve the multiplier characteristics.

#### REFERENCES

- [1] Power Reduction Techniques for Ultra-Low-Power Solutions, by Virage Logic Corporation.
- [2] R.M.Badghare, S.K.Mangal, R.B.Deshmukh, R. M. Patrikar (2009), "Design of Low Power Parallel Multiplier", *Journal of Low Power Electronics*, vol. 5, no. 1, April 2009, pp. 31-39.
- [3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), "A 1.2- ns16×16-Bit Binary Multiplier Using High Speed Compressors", *International Journal of Electrical, Computer, and Systems Engineering*, 2009, pp. 234-239.
- [4] K.Z. Pekmestzi, "Complex Number Multipliers" IEE Proceedings (Computers and Digital Technology), vol. 136, no. 1, 1989, pp. 70-75.
- [5] Jin-Hao Tu and Lan-Da Van, "Power-Efficient Pipelined Reconfigurable Fixed-Width Baugh-Wooley Multipliers," *IEEE Transactions on computers*, vol. 58, Nno. 10, October 2009.
- [6] C. R. Baugh and B. A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm," *IEEE Transactions on Computers*, vol. 22, pp. 1045–1047, December 1973.
- [7] H. Eriksson, P. Larsson-Edefors, M.Sheeran, M. Själander, D. Johansson, and M.Schölin, "Multiplier Reduction Tree with Logarithmic Logic Depth and Regular Connectivity," in *IEEE International Symposium on Circuits and Systems*, May 2006.
- [8] E.E.Swartzlander, Jr., "Truncated multiplication with approximate rounding," in *Proc. 33rd Asilomar Conference on Signals, Systems, and Computers*, 1999, vol. 2, pp. 1480-1483.
- [9] J. Di and J. S. Yuan, "Run-time reconfigurable power-aware pipelined signed array multiplier design," in *Proc. IEEE International Symposium on Signals Circuits, and Systems*, July 2003, vol. 2, pp. 405-406.
- [10] S.Krithivasan, M. J. Schulte, and J. Glossner, "A sub word-parallel multiplication and sum-of-squares unit," *IEEE Comp. society Annual Symposium on VLSI*, pp. 273-274, Feb. 2004.



Pramodini Mohanty received the B.Tech. degree in electrical engineering from Utkal University, Orissa, in 2004. She is currently working towards her M.Tech. degree in VLSI Design at N.I.E.T, Greater-Noida. Her research interests include fast multiplication

named Baugh-Wooley Architecture and other electronic circuits. Presently she is working as a faculty member in N.I.E.T, Greater-Noida.