Here's a collection of algorithms that can be used to find the square root of a number.

Up until some time in the 1960s, school children were taught a really clever algorithm for taking the square root of an arbitrarily large number. The technique is quite similar to long division. Here's how it works for decimal square roots.

a) Starting at the decimal point, pair off the digits.

b) Find the largest square that can be subtracted from the left-most pair while still leaving a non-negative result. The difference is the remainder that will be used in the next step; note that it is at most two digits. The square root of this largest square is the first digit of the square root of the whole number.

c) Concatenate the next pair of digits with the remainder.

d) Multiply the square root developed so far by 20. Note that the least significant digit is a zero.

e) The next digit in the square root is the one that satisfies the inequality:

(20 * current + digit) * digit <= remainder

where "current" is the current square root and "digit" is the next digit produced by the algorithm.

f) Form the new remainder by subtracting the left side of the inequality in the previous step from the right side.

g) Go to step (c).

As an example, suppose we want to find the square root of 31415.92653:

a) Pair off the digits

   3 14 15 . 92 65 3
   ^  ^  ^    ^  ^  ^

b) The leftmost pair is 03 and the largest square that can be subtracted from it is 1. And since the square root of 1 is 1, the first digit of our answer will be 1.

c-g)

           1  7  7.  2
         -------------
       ) 03 14 15.92
        -1
        --
  27  |  2 14       1*20 = 20      *7= 1 89   (20+x)*x < 214    ==> x=7
         ----
 347  | 25 15       17*20 = 340    *7= 24 29  (340+x)*x < 2515  ==> x=7
        -----
3542  | 86 92       177*20 = 3540  *2= 70 84  (3540+x)*x < 8692 ==> x=2
        -----
        16 08
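The steps above translate almost directly into code. Here's a sketch in C of the decimal digit-by-digit method for whole numbers; the function name and the fixed pair buffer are my own choices, not from the original:

```c
/* Decimal digit-by-digit square root, following steps (a)-(g) above.
   Returns the integer part of sqrt(n). */
unsigned long dec_sqrt(unsigned long n)
{
    unsigned long pairs[16];            /* digit pairs, least significant first */
    int np = 0;

    do {                                /* step (a): pair off the digits */
        pairs[np++] = n % 100;
        n /= 100;
    } while (n > 0);

    unsigned long root = 0, rem = 0;
    for (int i = np - 1; i >= 0; i--) {
        rem = rem * 100 + pairs[i];     /* step (c): bring down the next pair */
        unsigned long d = 0;            /* step (e): largest digit satisfying  */
        while ((20 * root + d + 1) * (d + 1) <= rem)  /* (20*root+d)*d <= rem  */
            d++;
        rem -= (20 * root + d) * d;     /* step (f): new remainder */
        root = root * 10 + d;           /* append the digit to the root */
    }
    return root;
}
```

Calling dec_sqrt(31415) develops the digits 1, 7, 7 exactly as in the worked example; the fractional digits would come from bringing down 92, 65, ... in the same way.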

The reason this works is easy to show. We can rewrite the number we wish to root:

N = A_n * 10^(2n) + A_(n-1) * 10^(2(n-1)) + . . . + A_1

(Not finished...)

In general, the procedure consists of taking the square root developed so far, appending 01 to it and subtracting it, properly shifted, from the current remainder. The 0 in 01 corresponds to multiplying by 2; the 1 is a new trial bit. If the resulting remainder is positive, the new root bit developed is truly 1; if the remainder is negative, the new bit developed is 0 and the remainder must be restored (thus the name) by adding the quantity just subtracted.

The example Flores gives takes the square root of 01011111:

         1  0  0  1 .
       -----------------
     ) 01 01 11 11 . 00
      -1
      ---
      00 01          <--- positive: first bit is a 1
     -1 01
     -----
     11 00           <--- negative: 2nd bit is a 0
    +1 01            <--- restore the wrong guess
    ------
     00 01 11
    -10 01
    ---------
     11 11 10        <--- negative: 3rd bit is a zero
   +10 01            <--- restore the wrong guess
   ---------
     01 11 11
    -1 00 01
    ---------
     0 11 10         <--- positive: 4th bit is a one

etc...
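In C, the restoring procedure for a 16-bit operand might be sketched like this (my own phrasing of the algorithm above, with a signed remainder so a failed subtraction shows up as a negative value):

```c
/* Restoring square root: append "01" to the root developed so far,
   subtract, and restore whenever the remainder goes negative. */
unsigned int restoring_sqrt(unsigned int n)
{
    unsigned int root = 0;
    long rem = 0;

    for (int i = 14; i >= 0; i -= 2) {
        rem = rem * 4 + ((n >> i) & 3);  /* bring down the next bit pair */
        long trial = (long)root * 4 + 1; /* root developed so far, 01 appended */
        rem -= trial;
        root <<= 1;
        if (rem < 0)
            rem += trial;                /* negative: restore, new bit is 0 */
        else
            root |= 1;                   /* positive: new bit is 1 */
    }
    return root;
}
```

restoring_sqrt(0x5F) returns 9 (binary 1001), matching the bits developed in the example above.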

The other method does not restore the subtraction if the result was negative. Instead, it appends 11 to the root developed so far, and on the next iteration it performs an addition. If the addition causes an overflow, then the next iteration goes back to subtraction mode. Before I botch the explanation much further, let me quote Flores again:

As long as the remainder is positive, we proceed as in the previous section; we enter a 1 in the corresponding root bit being developed; we append 01 to this number; we shift it the correct number of times and _subtract_ it from the previous remainder.

When the remainder goes negative, we do not restore as in the previous section. First we enter a 0 as the next root bit developed. To this we append 11. This result is shifted left the proper number of times and "added" to the present remainder.

Again, the same example:

         1  0  0  1 .
       -----------------
     ) 01 01 11 11 . 00
      -1
      ---
      00 01          <--- positive: first bit is a 1
     -1 01           <--- Developed root is "1"; appended 01; subtract
     --------
     11 00 11        <--- negative: 2nd bit is a 0
    +10 11           <--- Developed root is "10"; append 11 and add.
    ---------
     11 11 10 11     <--- positive (i.e. didn't cause overflow): 3rd bit is a 0
    +1 00 11         <--- Developed root is "100"; append 11 and add
    ------------
    1 00 00 11 10    <--- Overflow: 4th bit is a one
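A C sketch of the non-restoring variant (again, the naming is mine): the sign of the remainder decides whether to subtract root*4+1 ("append 01") or add root*4+3 ("append 11"), and no restore step is needed:

```c
/* Non-restoring square root: subtract (root:01) while the remainder is
   positive; once it goes negative, add (root:11) instead of restoring. */
unsigned int nonrestoring_sqrt(unsigned int n)
{
    unsigned int root = 0;
    long rem = 0;

    for (int i = 14; i >= 0; i -= 2) {
        rem = rem * 4 + ((n >> i) & 3);  /* bring down the next bit pair */
        if (rem >= 0)
            rem -= (long)root * 4 + 1;   /* append 01 and subtract */
        else
            rem += (long)root * 4 + 3;   /* append 11 and add */
        root <<= 1;
        if (rem >= 0)
            root |= 1;                   /* non-negative result: new bit is 1 */
    }
    return root;
}
```

It develops exactly the same root bits as the restoring version, just without the restore additions.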

Here's another iterative algorithm. It's something I did by brute force (I'm sure I wasn't the first...) before I knew about the above algorithm. After implementing code for each I discovered that the two algorithms are quite similar.

If you didn't know about the techniques described above, you'd probably be inclined to implement the square root routine like so:

unsigned char sqrt(unsigned int N)
{
  unsigned int x = 0, j;

  for (j = 1 << 7; j != 0; j >>= 1) {
    x = x + j;          /* tentatively set this bit of the root */
    if (x * x > N)      /* overshot? then clear the bit again   */
      x = x - j;
  }
  return x;
}

In other words, x is built up one bit at a time, starting with the most significant bit. Then it is squared and compared to N. This algorithm works quite well for processors that have a multiply instruction. Unfortunately, the PIC doesn't fall into that category. However, it is possible to efficiently multiply a number by itself (i.e. square it).

For example suppose you had an 8 bit number:

y = a*2^7 + b*2^6 + c*2^5 + d*2^4 + e*2^3 + f*2^2 + g*2^1 + h*2^0

If you square it and collect terms, you get:

y^2 = (a^2+ab)*2^14 + ac*2^13 + (b^2+ad+bc)*2^12 + (ae+bd)*2^11 +
      (c^2+af+be+cd)*2^10 + (ag+bf+ce)*2^9 + (d^2+ah+bg+cf+de)*2^8 +
      (bh+cg+df)*2^7 + (e^2+ch+dg+ef)*2^6 + (dh+eg)*2^5 +
      (f^2+eh+fg)*2^4 + fh*2^3 + (g^2+gh)*2^2 + h^2

There are several things to note in this expression:

1) The bits a-h can only be zero or one. So, a^2 == a.

2) If we are trying to build y iteratively by starting with a, then b, etc. then all of the least significant bits are assumed to be zero. Thus most of the terms do not need to be computed at each iteration.

3) The bit that is being squared is always multiplied by an even power of two.
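One way to convince yourself of this expansion (and of note 1, since each bit is 0 or 1) is to evaluate it term by term and compare the result with y*y over 8-bit values. This brute-force check is my own, not part of the original:

```c
/* Evaluate the collected-terms expansion of y^2 bit by bit.
   For any 8-bit y this must equal y*y. */
unsigned long expanded_square(unsigned int y)
{
    unsigned long a = (y >> 7) & 1, b = (y >> 6) & 1, c = (y >> 5) & 1,
                  d = (y >> 4) & 1, e = (y >> 3) & 1, f = (y >> 2) & 1,
                  g = (y >> 1) & 1, h = y & 1;

    return (a*a + a*b)                 * (1ul << 14)
         + (a*c)                       * (1ul << 13)
         + (b*b + a*d + b*c)           * (1ul << 12)
         + (a*e + b*d)                 * (1ul << 11)
         + (c*c + a*f + b*e + c*d)     * (1ul << 10)
         + (a*g + b*f + c*e)           * (1ul << 9)
         + (d*d + a*h + b*g + c*f + d*e) * (1ul << 8)
         + (b*h + c*g + d*f)           * (1ul << 7)
         + (e*e + c*h + d*g + e*f)     * (1ul << 6)
         + (d*h + e*g)                 * (1ul << 5)
         + (f*f + e*h + f*g)           * (1ul << 4)
         + (f*h)                       * (1ul << 3)
         + (g*g + g*h)                 * (1ul << 2)
         + h*h;
}
```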

The following un-optimized algorithm implements this squared polynomial:

s1 = 1<<14
s2 = 0
s  = 0

do {
  s = s + s1 + s2
  if (s > N)
    s = s - s1 - s2     /* too big: undo the trial */
  else
    s2 = s2 | (s1<<1)   /* accept: record the new root bit */
  s1 = s1 >> 2
  s2 = s2 >> 1
} while (s1)

return(s2)

And a more optimized version:

s1 = 1<<14
s2 = 0

do {
  if ((s1 + s2) <= N) {
    N  = N - s2 - s1
    s2 = s2 | (s1<<1)
  }
  s1 = s1 >> 1
  N  = N << 1
} while (s1 > (1<<6))

return(s2>>8)

If you follow the details of this last algorithm, you will notice that all of the arithmetic is being performed on bits 8-15 i.e. the most significant bits. The one exception is the very last iteration where s1 = 2^7. Another thing you will notice is that under certain situations, N will be larger than 2^16. But, it will never be larger than 2^17 - 1. This extra overflow bit can be handled with a relatively simple trick. Instead of shifting N one position to the left, roll N one position. The difference is that the most significant bit of N will get moved to the least significant position after the roll operation. (A simple shift zeroes the least significant bit.)
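Here's the optimized loop written out as portable C. For simplicity this sketch holds N in a 32-bit word, which absorbs the one-bit overflow directly; a PIC implementation would keep N in 16 bits and use the roll trick just described instead:

```c
#include <stdint.h>

/* Optimized iterative square root of a 16-bit value.  The root bits are
   accumulated in the top byte of s2, so the final result is s2 >> 8. */
uint8_t isqrt16(uint16_t n)
{
    uint32_t N  = n;          /* 32 bits: room for the overflow bit */
    uint32_t s1 = 1ul << 14;
    uint32_t s2 = 0;

    while (s1 > (1ul << 6)) {
        if (s1 + s2 <= N) {
            N -= s2 + s1;
            s2 |= s1 << 1;    /* record this root bit */
        }
        s1 >>= 1;
        N <<= 1;              /* shift N instead of shifting s1 by two */
    }
    return (uint8_t)(s2 >> 8);
}
```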

To be filled in later.... The only thing I'd like to say about Newton's Method for square roots is that it works well on a processor with hardware multiplication and division support (e.g. a Pentium) and that it can converge quite quickly in only a few iterations. However, for microcontrollers and even DSPs (without hardware division), iterative algorithms like those discussed above are more suitable.
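As a point of comparison, here's a minimal sketch of the Newton iteration in floating point (the initial guess and the iteration cap are my own choices): each step replaces x with the average of x and N/x, roughly doubling the number of correct digits.

```c
/* Newton's method for square roots: x -> (x + N/x) / 2. */
double newton_sqrt(double N)
{
    double x = N > 1.0 ? N : 1.0;    /* crude but safe initial guess */

    for (int i = 0; i < 60; i++) {
        double next = 0.5 * (x + N / x);
        if (next == x)               /* converged to machine precision */
            break;
        x = next;
    }
    return x;
}
```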

Here's a square root routine implemented in a PIC. And here's more software.

This page is maintained by Scott Dattalo. You can reach me at home: scott@dattalo.com Last modified on 10mar00.