I have an array of n integers that can only assume*log n*possible values (and any value). For example, in`S = [349,12,12,283,349,283,283,12]`

, there are only 3 different numbers`(log 8 = 3)`

.

I have to sort this array in less than`O(nlogn)`

time. Which algorithm should I use? Maybe Radix Sort with Counting Sort? What about its analysis?

Since it is a given that there will be only`log(n)`

unique elements, you could get the sorted list in`O(n)`

time using the algorithm below:

- create a mapping of how many distinct items are there in the list, and keep the count of each (basically a dictionary/hashmap of key, count)
- This is a single iteration over the input list, so
`O(n)`

steps.

- This is a single iteration over the input list, so
- Sort the above list of tuples of size
`log(n)`

based on their key.- Say we use merge sort, then the time complexity of merge sort is
`k*log(k)`

, where`k`

is size of the input. - Replacing
`k`

with`log(n)`

, we get complexity of this step as`O(log(n)*log(log(n)))`

. - Since in terms of complexity,
`O(log(n)*log(log(n)))`

<`O(n)`

, so overall complexity till this step is`O(n) + O(log(n)*log(log(n)))`

, which is equivalent to`O(n)`

again.

- Say we use merge sort, then the time complexity of merge sort is
- Iterate over the sorted keys, and generate the sorted list using a single for loop, wherein each value is repeated by its count. There will be max
`O(n)`

iterations here.

So overall, the algorithm above will run in O(n) time.

Radix Sort complexity is`O(dn)`

with d as the number of digits in a number.

The algorithm runs in linear time only when d is constant! In your case d = 3log(n) and your algorithm will run in`O(nlog(n))`

.

I'm honestly not sure how to solve this problem in linear time. Is there any other piece of information regarding the nature of the numbers I'm wondering if there is any other piece of information missing about the nature of numbers...

Well, here's a simple implementation of an MSD radix sort for DNA. It's written in D because that's the language that I use most and therefore am least likely to make silly mistakes in, but it could easily be translated to some other language. It's in-place but requires 2 * seq.length passes through the array.

```
void radixSort(string[] seqs, size_t base = 0) {
if(seqs.length == 0)
return;
size_t TPos = seqs.length, APos = 0;
size_t i = 0;
while(i < TPos) {
if(seqs[i][base] == 'A') {
swap(seqs[i], seqs[APos++]);
i++;
}
else if(seqs[i][base] == 'T') {
swap(seqs[i], seqs[--TPos]);
} else i++;
}
i = APos;
size_t CPos = APos;
while(i < TPos) {
if(seqs[i][base] == 'C') {
swap(seqs[i], seqs[CPos++]);
}
i++;
}
if(base < seqs[0].length - 1) {
radixSort(seqs[0..APos], base + 1);
radixSort(seqs[APos..CPos], base + 1);
radixSort(seqs[CPos..TPos], base + 1);
radixSort(seqs[TPos..seqs.length], base + 1);
}
}
```

Obviously, this is kind of specific to DNA, as opposed to being general, but it should be fast.

Edit: I got curious whether this code actually works, so I tested/debugged it while waiting for my own**bioinformatics**code to run. The version above now is actually tested and works. For 10 million sequences of 5 bases each, it's about 3x faster than an optimized**introsort**.

Let's look at en example with two-digit decimal numbers:

`49, 25, 19, 27, 87, 67, 22, 90, 47, 91`

Sorting by the first digit yields

`19, 25, 27, 22, 49, 47, 67, 87, 90, 91`

Next, you sort by the second digit, yielding

`90, 91, 22, 25, 27, 47, 67, 87, 19, 49`

Seems wrong, doesn't it? Or isn't this what you are doing? Maybe you can show us the code if I got you wrong.

If you are doing the second bucket sort on all groups with the same first digit(s), your algorithm would be equivalent to the recursive version. It would be stable as well. The only difference is that you'd do the bucket sorts breadth-first instead of depth-first.

**UPDATE**

Check this answer :sort O(nlogn)

- Count the number of each element in the list (using a hashtable) (time: O(n)).
- De-duplicate the list (time: O(n)).
- Sort the now-de-duped items (time: O(log n * log log n)).
- Build a list with the right number of copies of each item (time: O(n)).

Overall, this is O(n) and easy to implement.

Here's some python that implements this:

```
def duplicates_sort(xs):
keys = collections.Counter(xs)
result = []
for k in sorted(keys):
result.extend([k] * keys[k])
return result
```