April 13th, 2026

Finding a duplicated item in an array of `N` integers in the range 1 to `N` − 1

Raymond Chen

A colleague told me that there was an O(N) algorithm for finding a duplicated item in an array of N integers in the range 1 to N − 1. There must be a duplicate due to the pigeonhole principle. There might be more than one duplicated value; you merely have to find any duplicate.¹

The backstory behind this puzzle is that my colleague had thought this problem was solvable in O(N log N), presumably by sorting the array and then scanning for the duplicate. They posed this as an interview question, and the interviewee found an even better linear-time algorithm!

My solution is to interpret the array as a linked list of 1-based indices, and borrow the sign bit of each integer as a flag to indicate that the slot has been visited. We start at index 1 and follow the indices until we either reach a slot whose sign bit has already been set (the index of that slot is our duplicate), or we return to the index we started from (a cycle). If we find a cycle, then we move to the next index which does not have the sign bit set, and repeat. At the end, you can restore the original values by clearing the sign bits.²
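Here is a minimal C++ sketch of the sign-bit approach, not a definitive implementation. The function name `find_dupe_sign_bit` is hypothetical, negation stands in for "setting the sign bit", and the code assumes the input really does contain N ≥ 2 integers in the range 1 to N − 1. It does no validation.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper. Assumes `values` holds N >= 2 integers, each in 1..N-1.
// The array is treated as 1-based: slot i lives at values[i - 1].
// A negative entry means "this slot has been visited".
int find_dupe_sign_bit(std::vector<int>& values) {
    const int n = static_cast<int>(values.size());
    int dupe = 0;
    for (int start = 1; start <= n && dupe == 0; ++start) {
        if (values[start - 1] < 0) continue;      // visited by an earlier chain
        int i = start;
        values[i - 1] = -values[i - 1];           // mark the starting slot
        for (;;) {
            const int next = -values[i - 1];      // slot i is marked, so undo the sign
            if (next == start) break;             // returned to the start: a plain cycle
            if (values[next - 1] < 0) {           // already visited: two slots point here
                dupe = next;
                break;
            }
            values[next - 1] = -values[next - 1]; // mark and keep following
            i = next;
        }
    }
    for (int& v : values) v = std::abs(v);        // restore the original values
    return dupe;
}
```

Each step either marks a fresh slot or ends the current chain, so the total work is linear. The chain that starts at index N can never return to its start, so under the stated assumptions the search finds a duplicate by then at the latest.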

I figured that modifying the values was acceptable given that the O(N log N) solution also modifies the array. At least my version restores the original values when it’s done!

But it turns out the interview candidate found an even better O(N) algorithm, one that doesn’t modify the array at all.

Again, view the array values as indices. You are looking for two nodes that point to the same destination. You already know that no array entry has the value N, so the entry at index N cannot be part of a cycle. Therefore, the chain that starts at N must eventually join an existing cycle, and that join point is a duplicate. Start at index N and use Floyd’s cycle detector algorithm to find the start of the cycle in O(N) time.
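The idea above can be sketched in C++ roughly as follows. This is an illustration under assumptions, not the candidate's actual code: the name `find_dupe_floyd` and the `next` lambda are made up for the example, the array is treated as 1-based, and the input is assumed to hold N ≥ 2 integers in the range 1 to N − 1.

```cpp
#include <vector>

// Hypothetical helper. Assumes `values` holds N >= 2 integers, each in 1..N-1.
// Index i (1-based) points to index values[i - 1]. The array is never modified.
int find_dupe_floyd(const std::vector<int>& values) {
    const int n = static_cast<int>(values.size());
    auto next = [&](int i) { return values[i - 1]; };

    // Phase 1: starting from index N, advance the tortoise one step and the
    // hare two steps at a time until they meet somewhere inside the cycle.
    int tortoise = next(n);
    int hare = next(next(n));
    while (tortoise != hare) {
        tortoise = next(tortoise);
        hare = next(next(hare));
    }

    // Phase 2: restart the tortoise at index N and advance both one step at
    // a time. They meet at the start of the cycle, which is the index that
    // two different slots point to, in other words the duplicated value.
    tortoise = n;
    while (tortoise != hare) {
        tortoise = next(tortoise);
        hare = next(hare);
    }
    return tortoise;
}
```

For example, with the array `{3, 1, 3, 2}` the chain from index 4 visits 2, 1, 3, 3, … and the function returns 3.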

¹ If you constrain the problem further to say that there is exactly one duplicate, then you can find the duplicate by summing all the values and then subtracting N(N−1)/2.
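A small C++ sketch of the summation trick in footnote 1, assuming that every value from 1 to N − 1 appears once and exactly one of them appears a second time. The name `find_single_dupe_by_sum` is hypothetical, and a 64-bit accumulator is used so that the sum does not overflow for large arrays.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper. Assumes the values 1..N-1 each appear once,
// plus exactly one extra copy of one of them.
long long find_single_dupe_by_sum(const std::vector<int>& values) {
    const long long n = static_cast<long long>(values.size());
    const long long total = std::accumulate(values.begin(), values.end(), 0LL);
    return total - n * (n - 1) / 2;  // 1 + 2 + ... + (N-1) = N(N-1)/2
}
```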

² I’m pulling a fast one. This is really O(N) space because I’m using the sign bit as a convenient “initially zero” flag bit.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

8 comments

Turtlefight April 16, 2026 · Edited
I probably would’ve applied the pigeonhole principle literally.
Move each number in the array to the array index corresponding to its value.
If that index already contained that value then it’s a dupe.
Should also be O(N), but it does modify the array.
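```
#include <utility>
#include <vector>

// Takes a copy; index 0 is a safe slot because every value is in 1..N-1.
int find_any_dupe(std::vector<int> values) {
    // Send values[0] to its home index until that index already holds it.
    while (values[0] != values[values[0]])
        std::swap(values[0], values[values[0]]);
    return values[0];
}
```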
- Wayne Sebbens April 20, 2026 · Edited
  
  > If that index already contained that value then it’s a dupe.
  
  Unless the value at that index was already the index (e.g. an original array of `[4, 1, 0, 2, 2]` will say there’s a dupe at `values[1]`). Unless specifically handled, any approach that treats the array as a linked list will also have this issue, as it creates a cycle without the cycle needing to be a duplicate.
  
  While not _the_ most space-efficient, having a second array to hold each checked value doesn’t have the issue (or modify the array), and with a little work can be used on any object (via hashcode) instead of just numerics (i.e. `HashSet` in dotnet).
  
  - Turtlefight April 20, 2026 · Edited
    
    > Unless the value at that index was already the index
    That’s true – my implementation, Floyd’s cycle detector, and the linked list solution all require a “safe” starting point.
    (an index number that is not a valid value for the array)
    
    But luckily the definition of the problem provides one such index:
    >[Create an] algorithm for finding a duplicated item in an array of N integers in the range 1 to N − 1.
    
    Here index 0 is the safe starting point due to the array only including numbers > 0.
    (One could also subtract 1 from all values, then the range of values would be from 0 to N – 2, and the safe starting point index would be N – 1)
    
P V April 14, 2026

You left out the most important piece: did the interview candidate get hired?
Joshua Hudson April 13, 2026

I immediately generated radix sort -> find all adjacent duplicates, for linear runtime and no extra space.

Radix sort in constant space is real; but the code is almost impossible to comprehend. I’ve also only proven it exists for one bit at a time not larger sizes.
Peter Cooper Jr. April 13, 2026

I feel like I’m missing something. I would have approached this exercise as:
Make a separate array of length N-1 with values set to 0. Iterate through the input array and use its value as an index to the new array. If the value is 0 then increment it and continue to the next elements, but if the value is 1 then you’ve found your duplicate. Is that not O(N) time and O(N-1) space, since you’re just iterating through the input array once and adding to memory an array of length N-1?

Chasing cycles is clever, sure, and it wouldn’t shock me if it ended up being faster for some real-world examples where one needs to do this sort of thing, especially in some memory-constrained environments where you wouldn’t want to allocate the additional array. Did the original problem description say that you couldn’t allocate additional memory in order to solve the problem, or something like that?

- LB April 17, 2026
  
  N-1 means you have an extra unused element though, since the numbers are from 1 to N-1, there’s no zero.
