Sometimes we write code just for the fun of it. If you have ever run across something interesting and thought, “I’d like to give that a try!” you know what I mean, and who knows, you (I) might learn something along the way. That is how this came about. It started with this:
“…the information about prime or not prime was stored for 30 integers in each byte…”
Thirty integers in one byte, what? Research of that statement led to the link, “Black Key Sieve” (BKS) by Larry Deering (Deering). Mr. Deering’s work is based on Euler’s totient function, φ(n) with n = 30. This is also interesting and fun because it requires bits to be twiddled.
What follows is an implementation of Larry Deering’s BKS. It improves processing time and memory requirements of the classic Sieve of Eratosthenes, the first algorithm for calculating prime numbers. Future modifications to SieveBase30, such as segmentation, should result in even better performance.
A Sieve of Eratosthenes works in the following manner:
All the numbers not marked in the list are prime.
BKS (SieveBase30) differs in what numbers are stored in the sieve, and how numbers are marked. Before examining how elements are struck from the sieve it will be useful to see how the end product is used to determine if a number is prime or not.
This implementation of the BKS stores primes as zero-bits and non-primes as one-bits in an array of bytes. All odd numbers greater than five are represented in the sieve. BASE is a constant in the class with a value of 30.
The BKS is a base 30 (2 × 3 × 5) sieve. This means that that none of the prime factors of 30 (2, 3, 5) are stored in the sieve and must be handled by the method that determines primality of a given number. The .IsPrime method of the SieveBase30 class performs that function and the code excerpt for it follows:
Const
BASE
As
Long
= 30L
Public
ReadOnly
BaseFactors
New
List(Of
) From {2L, 3L, 5L}
'2 * 3 * 5 = 30
Function
IsPrime(PrimeQ
,
Optional
TryFactors
Boolean
=
True
)
Nullable(Of
If
PrimeQ < 2
Then
Return
False
'less than 2 is not prime
For
Each
pn
In
Me
.BaseFactors
'check base factors 2, 3, 5
PrimeQ = pn
'one of the factors of 30
PrimeQ
Mod
pn = 0L
'number to check is a mod of the factors of 30
Next
Dim
unknown
PrimeQ >
.Max
'check if number is greater than max
'try factoring
afactor
.FindaFactor(PrimeQ)
afactor = 0L
'no value
ElseIf
afactor < 0
Else
End
numStruct
OffsetRemainder(PrimeQ)
'get the offset and remainder
.theRmndrs.IndexOf(numStruct.rmndr) >= 0
'is remainder in theRmndrs then
'it might be prime
'check the bit
bit
Byte
.theBits(
.theRmndrs.IndexOf(numStruct.rmndr))
'0 is prime, 1 is not
(
.Sieve(numStruct.offs)
And
bit) = PrimeChk
'not one of theRmndrs
Private
Structure
OffsetRemainder
offs
Integer
'offset
rmndr
'remainder
Sub
(num
offs =
CInt
(num \ BASE)
'calculate offset into sieve
rmndr =
BASE)
'and remainder
Up to this point the .IsPrime method has not referenced the sieve and the base factors have been accounted for. Other types of sieves that store only odd numbers use similar methodology to account for even numbers.
The next part of .IsPrime gets to the heart of the algorithm. What is important is that all prime numbers, not in the base factors, mod 30 will have a remainder from this set of eight numbers:
{1, 7, 11, 13, 17, 19, 23, 29}
(Any Prime>5) Mod 30 ∈{1,7,11,13,19,23,29}
(you can prove this by performing a Mod 30 function on any known prime greater than five. Here is a list of some primes you can use to test this.)
How convenient; eight values, and eight bits in a byte! This means that the sieve is thirty times smaller.
.IsPrime now creates a new OffsetRemainder which takes the number being checked and returns a structure with two members. As can be seen, the members of the Structure are offs and rmndr, which are the offset into the sieve and the remainder. .IsPrime uses the remainder to access two Lists, theRmndrs and theBits. theRmndrs contains the eight remainders listed above, and theBits contain a binary mask. Table 1 shows what the lists look like and defines the relationship between bit and remainder:
Index
theRmndrs
theBits (in binary)
0
1
00000001
7
00000010
2
11
00000100
3
13
00001000
4
17
00010000
5
19
00100000
6
23
01000000
29
10000000
Table 1
Using the Structure returned .IsPrime performs additional checks. They are:
Our examination of .IsPrime shows how, “…the information about prime or not prime was stored for 30 integers in each byte…” is true. It also shows how the number being checked is broken down into a sieve offset and a remainder with corresponding bit mapping. Initialization of the sieve uses this same methodology.
Let’s take a look at an actual sieve. This sieve is for numbers up to 570. A binary dump of the 19 bytes comprising the entire sieve is shown.
Sieve size in bytes: 19
Maximum number: 570
Number of Primes: 104
Last prime in sieve: 569
Prime Percentage: 18.25 %
<<--------------0---------1---------2---------3----<>---4---------5---------6---------7--->>
0 > 00000001 00100000 00010000 10000001 01001001 00100100 11000010 00000110
8 > 00101010 10110000 11100001 00001100 00010101 01011001 00010010 01100001
16 > 00011001 11110011 00101100
+ 120
theBits
AND 01001001
121
127
00000000
131
133
137
139
143
149
Table 2
For any offset, this procedure can be used to determine the numbers represented by the offset, and whether or not they are prime. Prime numbers are shown in red in the example.
For any sieve offset, call it O, the byte at O will represent odd numbers from O × 30 through ((O + 1) × 30) – 1. Using our example, offset four, this is 4 × 30 through ((4 + 1) × 30) – 1 = 120 – 149.
The .FindaFactor method checks the number against all of the primes stored in the sieve. If the number is evenly divisible by the prime the number is not prime. Only primes less than the square root of the number being checked are tried. Why square root of some number N? This explanation from mathforum.org explains it:
“We are seeking to find out whether N is prime. We have tested all primes less than the square root of N, and found that none is a factor of N. We can stop here, because we know there can be no prime factor of N that is greater than the square root of N. Why? Because, if there were, then there would also be a prime factor less than the square root of N, and we have already found out that there is none.” (Rick)
If the number of factors for the number being checked is less than the largest prime then the primality is known certain. In other words the .IsPrime method knows primality for numbers up to (.LastPrime-1)2 .
In all sieves the goal is to mark numbers that are not prime. SieveBase30 starts marking non-primes in a loop that is similar, if not identical, to other sieves. Starting at some prime it marks products of that prime and then continues with the next prime number. The loop is terminated when the next number is greater than the square root of the maximum number in the sieve. (Rick)
SieveBase30 has a method (.markProducts) to mark numbers not prime. This method uses a set of numbers, {7, 11, 13, 17, 19, 23, 29, 31}, the next eight primes that follow the base factors (2, 3, 5) to accomplish the marking. The first thing that happens in .markProducts is:
Let’s look at that using 11 as an example.
set
product
offset
77
187
209
253
8
319
10
31
341
Table 3
The section of code that performs this is:
factors()
() {7L, 11L, 13L, 17L, 19L, 23L, 29L, 31L}
offsSV(7)
'offsets into sieve
'calculate offsets
i
= 0
To
factors.Length - 1
prod
= thePrime * factors(i)
'product
offsSV(i) =
(prod \ BASE)
'calc offset
preComputedChords()()
= {
() {1, 64, 32, 4, 2, 128, 16, 8},
() {2, 4, 8, 16, 32, 64, 128, 1},
() {4, 8, 128, 1, 16, 32, 2, 64},
() {8, 128, 2, 64, 1, 16, 4, 32},
() {16, 1, 64, 2, 128, 8, 32, 4},
() {32, 16, 1, 128, 8, 4, 64, 2},
() {64, 32, 16, 8, 4, 2, 1, 128},
() {128, 2, 4, 32, 64, 1, 8, 16}}
preComputedChordOffs
((7L * thePrime)
'calculate first remainder
'get bits to turn on (not prime) based on remainder
Chords()
= preComputedChords(
.theRmndrs.IndexOf(preComputedChordOffs))
preComputedChordOffs is the remainder of the first product times the prime MOD’ed with the base. The remainder is used to find an index in .theRmndrs. That index is used as an index into preComputedChords. Now that is a mouthful. Let’s look at our example again. . The index of 17 in .theRmndrs is 4 (see Table 1 ). preComputedChords(4) is the array that starts with 16. So Chrods() = {16, 1, 64, 2, 128, 8, 32, 4} in this example. If you do the math for each product you will find that the pre-computed chords are correct. Any prime number passed to the method will have its own set of offsets and one of the pre-computed byte arrays.
At this point we have two arrays with eight elements each. The arrays are offSV for offsets, and Chords for the bits. The name ‘Chords’ comes from BKS and was used here for clarity when comparing BKS to this implementation. Now we are ready to mark all products of the prime which the following piece of code does:
svB
sv.Length - 1
Step
primeNum
realoffs
'used as the index into the sieve
Chords.Length - 1
'the following statement is required because the offsets are not linear
'e.g. offsets for 11 are 2, 4, 4, 6, 6, 8, 10, 11
realoffs = offsSV(i) + svB
realoffs < sv.Length
'within array bounds?
sv(realoffs) = sv(realoffs)
Or
Chords(i)
'turn the bits on to indicate not prime
Exit
The outside for loop iterates the sieve in increments of the prime number being marked.
The inner for loop iterates Chords and in the process calculates an index into the sieve for the iteration. In the example we have been using, prime number 11, the first time through the offsets would be those shown in Table 3 . Every iteration the offset is increased by 11 until the end of the sieve is reached. As you may have guessed the smaller the prime the more times the loops execute.
To speed the process up the four smallest primes (7, 11, 13, and 17) are marked by creating an array of 17,017 (7 x 11 x 13 x17) bytes and cancelling them within that. Then this smaller array is copied to the sieve in blocks of 17,017 bytes. When complete the main loop can start from 19. This method is most helpful for larger sieves.
Now that you have seen how it works, let’s look at how it can be used.
If you want just the class SieveBase30 (SB30), it is available here: https://skydrive.live.com/redir?resid=95E9849B11BDE22F!171&authkey=!ANrVw-Biikrdp0s&ithint=file%2c.zip .
The first declaration seen in the form is the instantiation of a SieveBase30 class. The New method can take up to two arguments. The first argument is the limit, and tells the class to include all primes <= that number in the sieve. In this case it is 1,000,000. The New method accepts a second argument that determines if the method should return before the sieve is initialized. The default is to wait for sieve initialization to complete before returning. Sieve initialization is performed on a background thread. By setting this argument to false the initialization code can be started when the form is loaded. Loading of the form can proceed while the sieve is being initialized. Methods are provided to indicate if sieve initialization is complete, and progress if not. If you choose NOT to wait the Ready method should be checked before the sieve is used.
The Shown event is next. It initializes a ComboBox with some sieve sizes that will be used. You will note that there is a Debug.WriteLineIf statement in the event handler. If you have a multi-core processor you will likely not see this message. When the handler is completed the sieve is ready for use.
Next are two utility methods. The ShowStats method shows statistics about the sieve. The ShowProgress method shows the progress of sieve initialization. Depending upon the size of the sieve, number of cores, and speed of your system, you may or may not see the progress. Note that a TrackBar was intentionally used instead of a ProgressBar.
At this point the sieve is ready to use. Type numbers into TextBox1 and you will see the primality of the number in Label1 and the prime factorization of the number in TextBox2, if possible. The TextBox1.TextChanged event handler is used to accomplish this. If a number cannot be factored it will have a term with a negative exponent. The negative exponent should not be interpreted mathematically. The term with the negative exponent, disregarding the exponent, is what is left to be factored. The primality of this term is unknown.
ComboBox1 allows you to change the maximum number that the sieve contains, prime numbers less than or equal to the number selected. If you are wondering why 7,919 is in the list, it is because 7,919 is the 1,000th prime.
Button1 shows the first 1,100 primes if there is that many. It uses the IEnumerable implementation in the class to do this.
Button2 generates numbers randomly and shows the factorization of the number. Try this with different size sieves to see the difference.
Button 3, on the Encryption tab, shows simple RSA encryption. It is simple because it limits the size of the prime numbers being used, but it gives insight into the algorithm.
The SieveBase30 class has methods to .Save / .Load the sieve to / from disk. The default location is a file on the desktop with a filename of "primesieve.txt". To change the filename / location pass a path to the method or set the Path property. The Path property will contain the last path used to save or load.
The last method in the class is a self-check. It was used during the development of the class to ensure that changes to the code were viable.
The SieveBase30 class has many other methods and properties that you can explore on your own.
The SieveBase30 class improves on the performance of the classic Sieve of Eratosthenes. It also improves on memory requirements by storing the sieve in a compressed format. Your comments and suggestions are welcomed.
'Imports System.Runtime.CompilerServices
Module bitcount
Private Function CountBits(num As ULong, Optional sign As Boolean = False) As Integer
' based on http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
' m1 = 0x5555555555555555; //binary: 0101...
' m2 = 0x3333333333333333; //binary: 00110011..
' m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ...
Dim i As ULong = num
i -= ((i >> 1) And &H5555555555555555UL) 'm1 ops = 3
i = (i And &H3333333333333333UL) + ((i >> 2) And &H3333333333333333UL) 'm2 ops = 4
i = (i + (i >> 4)) And &HF0F0F0F0F0F0F0FUL 'm4 ops = 3
i += i >> 8 'ops = 2
i += i >> 16 'ops = 2
i += i >> 32 'ops = 2
i = i And &H7FUL 'ops = 1
If sign Then i += 1UL 'ops = 1
Return CInt(i) 'total ops = 18
End Function
'unsigned
<
Extension
()>
Public Function CountBits(num As Byte) As Integer
Return CountBits(CULng(num), False)
Public Function CountBits(num As UShort) As Integer
Public Function CountBits(num As UInteger) As Integer
Public Function CountBits(num As ULong) As Integer
'signed
Public Function CountBits(num As SByte) As Integer
Return CountBits(CULng(num And &H7F), num <
<Extension()>
Public Function CountBits(num As Short) As Integer
Return CountBits(CULng(num And &H7FFF), num <
Public Function CountBits(num As Integer) As Integer
Return CountBits(CULng(num And &H7FFFFFFF), num <
Public Function CountBits(num As Long) As Integer
Return CountBits(CULng(num And &H7FFFFFFFFFFFFFFF), num < 0)
End Module
Contributors and Sean Eron Anderson. Bit Twiddling Hacks. n.d. <http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel>.
Deering, Larry. The Black Key Sieve. 1998. <http://www.qsl.net/w2gl/blackkey.html>.
Rick, Doctor. Ask Dr. Math. 13 June 2001. 13 Jube 2001 <http://mathforum.org/library/drmath/view/56001.html>.