Atlanta Custom Software Development 

 
   Search        Code/Page
 

User Login
Email

Password

 

Forgot the Password?
Services
» Web Development
» Maintenance
» Data Integration/BI
» Information Management
Programming
  Database
Automation
OS/Networking
Graphics
Links
Tools
» Regular Expr Tester
» Free Tools

MSD RadixSort - An algorithm that can achieve O(N) behavior

Total Hit ( 2332)

Rate this article:     Poor     Excellent 

 Submit Your Question/Comment about this article

Rating


 


Click here to copy the following block
' MSD RadixSort. No sort based on comparisons can be faster than O(N log N). 
' RadixSort makes no comparisons and can therefore achieve O(N) behavior. To
' do this, it examines keys one byte at a time, counting the number of keys
' that have each possible byte value. The counts are then used to build an
' offset table specifying the sorted order of the keys. It's easiest to work
' backwards from the least significant digit of the keys (LSD Radix),
' since order based on more significant digits is not disturbed by less
' significant digits. Unfortunately, the LSD approach requires padding short
' keys if key length is variable, and guarantees that all digits will be
' examined even if the first 3-4 digits contain all the information needed to
' achieve sorted order. Most significant digit (MSD) RadixSort takes a lot
' more bookkeeping -- the list must repeatedly be split into sublists for each
' value of the last digit processed -- but the pay-off is that only as many
' digits will be examined as are needed. As in QuickSort and MergeSort,
' it's worthwhile to hand off the sublists to InsertionSort when they get
' short enough.
'
' MSD RadixSort is stable and runs in linear O(N) time. It is fairly memory
' intensive, needing space for some extra counting arrays and stack space for
' recursive calls, but still uses less memory than MergeSort. This version is
' set up for strings and would take some work to adapt to other data types. 
' Integers and longs could be handled by converting them to strings (and
' limiting the arrays CNT and IND to the ten numerical digit values plus the
' minus sign); this is probably worthwhile only if you have hundreds of
' thousands of integers or longs to sort. Doubles are more of a challenge,
' requiring conversion to an array of bytes by type casting. I have not
' pursued this, since VBA on the Mac does not include the CopyMemory function
' that enables type casting on Windows systems. 
'
' NOTE: This version is set up to count byte values from 1 to 127. If your
' keys use (for instance) only lowercase or only uppercase alphabetic
' characters or only numerical digits, you can trim the CNT and IND arrays and
' your sorts will run correspondingly faster.
'
' Reference: P. M. McIlroy, K. Bostic and M. D. McIlroy,
' "Engineering Radix Sort", Computing Systems 6(1):5-27 (1993).
'
' Speed: MSDRadixSortS sorts 500,000 random strings in 28.3 sec; sorts 100186
' library call numbers in 21.3 sec; sorts 25479 dictionary words in 1.8 sec
' (random order), 1.5 sec (presorted) or 1.9 sec (reverse sorted). Timed in
' Excel 2001 on an 800 mhz PowerBook.
'
' Bottom line: complex and best suited to strings, but there's nothing faster
' for really long lists.

' Usage: 

Dim S1(L To R) As Strings
Dim B1(1 To nChars) As Byte
Dim P1(L To R) As Long

For I = L To R
  S1(I) = GetRandomString()
Next I

StrsToBytes S1, B1, P1  'a routine that stores the strings in 0 terminated
             ' byte arrays with
          'P1() holding pointers to the start of each byte series.

MSDRadixSortS L, R, B1, P1


' CODE:

Sub MSDRadixSortS(N As Long)
  Dim CH() As Integer
  
  ReDim CH(1 To N) 
  'See below for type PILE (stack records)
  ReDim STACK(1 To 1000)
  StackPtr = 1
  With STACK(StackPtr)
    .L = 1
    .R = N
    .D = 0
  End With
  StackPtr = StackPtr + 1
  RadixS CH
End Sub

Sub RadixS(CH() As Integer)
  Dim L As Long
  Dim R As Long
  Dim D As Integer
  Dim I As Long
  Dim J As Long
  Dim TMP As Long
  Dim C As Integer
  Dim NextC As Integer
  Dim CNT(-1 To 127) As Long
  Dim IND(-1 To 127) As Long
  Dim NuCnt As Long
  Dim BigCnt As Long
  Dim OldSp As Long
  Dim BigSp As Long
  
  While StackPtr > 1
  'Pop a PILE off the stack.
    StackPtr = StackPtr - 1
  'Get left and right limits of sublist and depth of byte to examine
    With STACK(StackPtr)
      L = .L
      R = .R
      D = .D
    End With
  'Sublists of <= 24 keys will be finished by InsertionSort
    If R - L > 24 Then
    'Clear the count array.
      For I = -1 To 127
        CNT(I) = 0
      Next I
    'Get the byte at depth D in each key and count the number of each byte
    ' value.
      For I = L To R
        C = CInt(B(P(I) + D))
        CH(I) = C
        CNT(C) = CNT(C) + 1
      Next I
    'We will add the counts to create sorted addresses in array IND().
      IND(-1) = L
    'At the same time we'll push the sublists for each byte value onto the
    ' stack.
      OldSp = StackPtr
    'We'll track the biggest sublist and make sure it comes off the stack
    ' last;
    'this ensures the stack will not get more than logarithmically deep.
      BigCnt = 0
    'Now we build the addresses out of the counts.
      For I = 0 To 127
        J = I - 1
        NuCnt = CNT(J)
        IND(I) = IND(J) + NuCnt
     'If there is a sublist of keys starting with the current byte value,
     ' stack it.
        If NuCnt > 1 Then
          With STACK(StackPtr)
            .L = IND(J)
            .R = IND(I) - 1
            .D = D + 1
          End With
          StackPtr = StackPtr + 1
       'Keep track of the largest count / sublist.
          If NuCnt > BigCnt Then
            BigCnt = NuCnt
            BigSp = StackPtr
          End If
        End If
      Next I
    'Swap the biggest sublist down into the stack so it comes off last.
      TMP = BigSp
      BigSp = OldSp
      OldSp = TMP
    'Now use the counts to move the pointers to their sorted positions
    ' based on the
    'bytes examined so far; doing this in place this gets a bit ugly.
      For I = L To R
     'We will use the byte at I to find what address P(I) should be mapped
     ' to.
        C = CH(I)
     'We use -1 to flag pointers already moved.
        CH(I) = -1
     'If C = -1 we skip the loop and increment I until we find a pointer
     ' not already moved.
        Do While C >= 0
       'We go to IND(C) to get the destination address for P(I).
          J = IND(C)
       'We swap the current pointer for the one at that address.
          TMP = P(I)
          P(I) = P(J)
          P(J) = TMP
       'Now we determine where to map the pointer we just displaced.
       'We get the byte value from its key.
          NextC = CH(J)
       'We flag it as moved.
          CH(J) = -1
       'Once we've used each address we increment it unless we've hit R.
          If J < R Then IND(C) = J + 1 Else Exit Do
       'We set C to the byte of the key of the displaced pointer; ready
       ' to loop!
          C = NextC
        Loop
     'If a series of displacements circles back to a pointer already moved,
     ' we
     'increment I until we find a pointer not yet moved or until we end at
     ' R.
      Next I
    Else
      DeepInsertS B, P, L, 1 + R - L, D
    End If
  Wend
End Sub

Type PILE
  L As Long
  R As Long
  D As Integer
End Type

Dim STACK() As PILE
Dim StackPtr As Integer

Sub DeepInsertS(B() As Byte, P() As Long, L As Long, N As Long, D As Integer)
  Dim LP As Long
  Dim RP As Long
  Dim TMP As Long
  Dim I As Long
  Dim J As Long
  
  For RP = L + 1 To L + N - 1
    TMP = P(RP)
    For LP = RP To L + 1 Step -1
      I = TMP + D
      J = P(LP - 1) + D
      Do While B(I) = B(J)
        If B(I) = 0 Or B(J) = 0 Then Exit Do
        I = I + 1
        J = J + 1
      Loop
      If CInt(B(I)) - CInt(B(J)) < 0 Then P(LP) = P(LP - 1) Else Exit For
    Next LP
    P(LP) = TMP
  Next RP
End Sub


Submitted By : Nayan Patel  (Member Since : 5/26/2004 12:23:06 PM)

Job Description : He is the moderator of this site and currently working as an independent consultant. He works with VB.net/ASP.net, SQL Server and other MS technologies. He is MCSD.net, MCDBA and MCSE. In his free time he likes to watch funny movies and doing oil painting.
View all (893) submissions by this author  (Birth Date : 7/14/1981 )


Home   |  Comment   |  Contact Us   |  Privacy Policy   |  Terms & Conditions   |  BlogsZappySys

© 2008 BinaryWorld LLC. All rights reserved.