Addendum to "Is this the simplest (and most surprising) sorting algorithm ever?"

Paper on arXiv
Internet reactions to the algorithm
There is at least one person who likes it:

"What I find beautiful here is that such a combination of short, concise, counter-intuitive and **correct** exists."

Q: Is the paper a joke/parody/prank?
Q: Sorry to see that you have a mental breakdown, but this is what people call a "crackpot" paper.

A: It's meant to be funny, but not a "joke" as in "April fool" kind of joke. It is not, for example, a prank where you submit some rubbish to conferences to test if it is a junk conference; or that I deliberately write something complicated to trick you into believing it is something big and important and by the end you realise it is just bubble sort. (I know what bubble sort is. As it turns out, it is you lot who don't know what bubble sort is. Read on...)

And for those people who thought that I meant this as a serious paper: have you checked the name I gave to the algorithm? I even wrote "there is nothing good about this algorithm" in the paper. Why would any serious paper include such a sentence? Look, no one is claiming that the paper is "serious research", or is some sort of major discovery, or that this algorithm is better than the existing ones, other than perhaps simplicity and surprising-ness, and even that is very debatable of course. As someone commented, "the whole paper is just a fun little note". Why so serious?

It's difficult to explain what sort of "funny" it is if you don't get it, but I guess it's mostly the surprise that a seemingly innocent algorithm behaves in an unexpected way. Most people who read (and understood correctly) the paper, said they experienced the following sequence of thoughts: from "of course it works", to "wait, this couldn't possibly work", to questioning their sanity and their ability to dry-run simple algorithms, to finally understanding how it works. If you haven't gone through all that, you are most likely not yet getting it.

Having said that, I do consider this a legitimate piece of work... albeit about something very unimportant. (This "doing unimportant things seriously" is another funny aspect of the paper, I guess.) The intended audience was really the arXiv cs.DS readers only, who I hope would find the "surprise" and "beauty" to be a little bit of fun. What I didn't realise, and maybe I should have, is that because it is about "sorting", a topic that every beginner is exposed to, it will get the attention outside of cs.DS, and now every programmer or CS student think the paper is meant to be serious, and either think (i) some guy wrote something stupid, or (ii) "I can do research too if this is what research is."

And no, I haven't gone insane (yet), although if I see any more mockeries like "Hey this guy proved P=NP man!!" I may soon be.

Before you read any further, you should probably read again the direction of the comparison in the algorithm:

for i = 1 to n
  for j = 1 to n
    if A[i] < A[j] then
      swap A[i] and A[j]

(Yes, I claim that this direction sorts it in increasing order.)

Let's see who is the crazy one...

Q: This is just (a worse/unoptimized version of) bubble sort.

Q: This is just the most natural idea: check every pair to see if they are out of order, and swap if they are. Of course it sorts. What else do you expect?

A: There are so many people misremembering what bubble sort is, that I thought I need to write a separate page about it. It explains that, if you have the following code in mind (or anything of that general shape that swaps A[i] and A[j], such as the one in my paper), it is not bubble sort:

for i = 1 to n
  for j = i+1 to n
    if A[i] > A[j] then
      swap A[i] and A[j]

Or you may be thinking of this one, which is indeed an unoptimized version of bubble sort, but again is not the one in the paper (even though they look almost the same):

for i = 1 to n-1
  for j = 1 to n-1
    if A[j] > A[j+1] then
      swap A[j] and A[j+1]

But if you don't want to read all that, never mind. The name is not important. Whatever you think that "my" algorithm is equivalent to, ask yourself: does it sort in increasing or decreasing order? (Again, have you looked at the inequality sign?) And no, it is not just "swap each out-of-order pair". Convince yourself that, given an already sorted input like A = [1, 2, 3, 4], this algorithm will move them like this:

[1 2 3 4] -> [2 1 3 4] -> [3 1 2 4] -> [4 1 2 3] -> [1 4 2 3] -> [1 2 4 3] -> [1 2 3 4]

Q: This is BS, no sane sorting algorithm behaves like that. It stays at [1 2 3 4], never moving anything, just like bubble sort. I ran it in my head.

Q: This is BS, I noticed the opposite inequality sign. If instead the input is [4 3 2 1] to begin with, then it will stay at [4 3 2 1], never moving anything, just like bubble sort.

A: Don't believe your head. Type it out and run it. See for yourself here.

Q: This is just bubble/selection/exchange/... sort with an extended loop range. This gives extra comparisons that won't make any difference. There is nothing surprising about it.

A: You skipped the previous questions, didn't you?

Again, please look at the inequality sign, and again, does your bubble/selection/exchange/... sort destroys an already sorted input like that?

And yes, I'm claiming that those extra "spurious" steps will change the results. The following two algorithms give exactly opposite results:

for i = 1 to n
  for j = i+1 to n
    if A[i] > A[j] then
      swap A[i] and A[j]

for i = 1 to n
  for j = 1 to n
    if A[i] > A[j] then
      swap A[i] and A[j]

Yes, I know it is difficult to believe (hence "surprising"). Code it up and try yourself. I didn't believe it myself initially.

When presented with these two algorithms, I guess most people's reactions fall into two camps:

That the second algorithm just makes some extra comparisons that would not result in any swaps since they are already in the right order, hence the result is the same; or
It will jumble up the whole thing so it won't sort at all.

I was in the second camp. But neither camp is right.

I explained this in the paper, if anybody cared to read it before commenting.

Q: But if you change the j loop to be 1 to i-1 instead, then the output would NOT be different. So this IS just insertion sort with extra unnecessary comparisons that have no effect.

A: Yes, now THAT is (almost) correct. (Except that some of those extra comparisons do lead to swaps, resulting in the silly movement of the maximum in the first iteration of i.) Which was stated in the paper as the "improvement" that I don't like. And also if you change that, the comparison is not that counterintuitive (or maybe not at all) because j is always less than i.

In summary, if you begin with having the j loop from 1 to n ("my" algorithm), and narrow it down to just 1 to i-1, then it becomes insertion sort; if you narrow it down to i+1 to n instead, it becomes exchange sort / selection sort (they are essentially the same thing) but you need to reverse the inequality sign. So you could say this is a "unification" of insertion sort and selection sort. (BUT NOT BUBBLE SORT!)

Or to view it the other way round:

----------------------     ----------------------
| INSERTION SORT     |     | SELECTION SORT     |
| ...                |     | ...                |
|   for j = 1 to i-1 |     |   for j = i+1 to n |
|     if A[i] < A[j] |     |     if A[i] > A[j] |
|       ...          |     |       ...          |
----------------------     ----------------------
               \                 /
                \               /
                 \             /
             -----------------------
             | CAN'TBELIEVESORT    |
             | ...                 |
             |   for j = 1 to n    |
             |     if A[i] ?? A[j] |
             |       ...           |
             ----------------------

The comparison directions in insertion sort and selection sort are by themselves not surprising, since in the first case j is always smaller than i, and in the second case j is always larger than i. So both of them only swap real inversions. But when you "merge" them together like above, it is not obvious (to me) what that ?? operator should be (if it could be made to work at all). Turns out < would do, even though it would swap some non-inversions to inversions.

You could also more "faithfully" merge the two algorithms by doing this in place of the ??:

if (i > j && A[i] < A[j]) || (i < j && A[i] > A[j])

but then it is not surprising and not "beautiful".

And indeed the algorithm is just insertion sort (the insertion sort invariant is still satisfied, despite the silly movements in the first iteration). So maybe it's not surprising after all. But it seems most people who say it is not surprising, say so because they think it is "bubble sort" or "selection sort", or that the algorithm "find the minimum in A[i..n] and put it in A[i]" which is totally not what it is doing.

It's not that interesting / it should not be a paper.

Q: What's the big deal about it? You have a proof? Yeah great.

The proof is of course not the big deal. It's the least important part of the paper. Maybe I should not have mentioned that I have a proof in the abstract. Or maybe even remove the proof from the paper altogether.

Some comments made by novices are like "I think the value of the paper is that he gives a formal proof..." which made me feel bemused and a bit sorry at the same time.

So what's the big deal about it? It's fun!

Q: Bubble/insertion/selection/... sort is simpler. It only makes half as many comparisons.

A: It seems a lot of people mix up "simplicity" with "number of steps". Of course it makes (more than) twice the number of steps than any other normal O(n^2) time algorithms. That's not what "simplicity" means.

You can argue whether it is simpler than any of the bubble/insertion/selection/... sort, but it certainly isn't more complex, and certainly not because "it takes more steps".

Q: So this algorithm is just stupid with all these extra swaps! Why would anyone use it?

A: Yes, it is stupid. No one says it is clever. And no, please don't use it, please.

Q: How can such rubbish be published then?

A: Depends on whether depositing things on arXiv counts as "publish". (More below.)

Also because it is, er, dare I say, "beautiful"? Some people's immediate reaction was "Why is someone writing a paper about an O(n^2) sorting algorithm??" as if somehow n^2 is the dividing line between what is worth publishing and what is not. Mathematical "beauty" or "elegance" is probably unteachable and maybe subjective, but it certainly is not about "how fast/practical it is".

Q: This was the first sorting algorithm I came up with when I was 12. That was 40 years ago. I guess anything can be written up as a paper these days.

First, are you absolutely sure you invented "my" version all those years ago, and not all the other versions above (such as exchange sort) which are far more natural? Or are you still not seeing that those "minor" differences in the loop ranges are actually crucial?

And if you did come up with "my" version, were you not bothered about the direction of the inequality? That it would mess up an already sorted sequence [1,2,3,4]? That the worst case is not [4,3,2,1]? That it could make 7 swaps to a 4-element array that could have at most 6 inversions? Were you just satisifed that "it works", so moved on without figuring out why? If you did analyse it, or found all of that to be natural and obvious and required no analysis, then well done. Maybe let us know so we can attribute this to you. I may or may not be the first person to write about it (this way), but I am certainly not the first person to "discover" it.

Q: OK, it is simple and mildly surprising, but I don't think it is "simplest" or "most surprising".

A: These are subjective of course. Some people have given very well-justified arguments that some other sorting algorithms are simpler. But I think the speed at which people dismiss this algorithm, is itself an argument that it is extremely simple.

Similarly, in terms of how surprising it is, just look at how 90% of the Internet misunderstood it. If you think it is trivial without noticing the inequality sign, then surely it should be very surprising once you realised it?

Although, I think I will call it "the most misunderstood sorting algorithm" instead, if I ever write a revised version of the paper.

Q: Is it really this bad? I can see that it is counterintuitive. Do other people not?

A: It's bad, I'm afraid. Although many who misunderstood this may be students, or programmers who left college/university a long time ago, or academics in CS-"related" but not core areas, there are some who have CS PhD degrees at high-ranking universities, who couldn't bother to read beyond the 4 lines (or notice the inequality sign) and then proceeded to write something on Twitter that they might subsequently regret.

Then again, those can be seen as a "compliment" that this algorithm is indeed very simple and very counterintuitive.

Q: The algorithm is not new.

Q: People have quickly found that it appeared numerous times on the Internet. As an academic you should have done your literature search more properly.

Q: This is the sort of thing beginners accidentally come up with. I'm sure it appeared numerous times in first-year students' work. It's not worth writing about.

A: First, thanks to those who dug out all those prior appearances of it in stackoverflow, github and elsewhere. (See the Hacker News link at the responses page.) I checked TAOCP, wikipedia, a few textbooks, and googled a few search terms like "surprising sorting algorithm". I thought about just typing the code directly into google, but I thought nothing meaningful would come up. You guys proved me wrong. Yes, I knew this surely must have been stumbled upon numerous times. But I didn't know how to search for it.

I didn't say the algorithm was new. The word "new" does not appear even once in the entire paper. Of course you can argue that by writing it like a research paper I am implying that it is new, since research should be about new things. Well, as far as I'm aware, the proof and the best-case / worst-case inputs analysis have not appeared before.

I think some random coder or student stumbling upon it (not knowing what they are doing) shouldn't count as a "discovery". So far (more than a year after I wrote about it) no one has yet found the algorithm previously described in any academic paper or textbooks. Sure, no one would be bothered to write about it seriously as "research result" or even treating it like a proper algorithm in textbooks. But I thought it would be a good textbook exercise, actually, so I half-expected it would appear as an exercise somewhere. But then again, due to how counterintuitive it is, I thought once someone made it as an exercise somewhere, it should have become much more widely known, to the point that I (and everyone else) would have seen it and used it in our own exercises etc, and we wouldn't have 90% of the Internet misunderstanding it. Like someone commented, this would be a FAANG-type interview question.

Q: This should not be a paper. It has no research value.

Q: The paper should have been a blog post. It is not serious academic research to be submitted to arXiv. It is not a place for comedy.

A: It wasn't meant to have any research value in the conventional sense. Although it seems to have produced an unintended one, namely we discovered that so many people don't know what bubble sort is (but thought they do). So in a funny way, the mockeries and ridicule actually justified the existence of the paper.

This is certainly not the kind of paper you would submit to a proper serious research journal (and no one is trying to - hence the writing style). Should arXiv have a "lower standard" than a journal? You can make legitimate arguments both ways, I think. Given that arXiv have minimal level of moderation and no peer-review, in my opinion it should accept any academic research regardless of the "significance" of the work.

Let's face it, often people come up with some minor results, or not even "results", just some "observations", that clearly wouldn't make it to a proper publication. But people feel they should be written up, so they put it on arXiv and be done with it. You may or may not agree arXiv should be used this way, but that is what people do.

Is this "paper" academic enough? In my opinion, it has enough original "observations" in it. (I would challenge anyone to prove the worst-case number of swaps made by the algorithm, and the inputs that attain it, without looking at the paper. I like that result (in Section 4) most actually, I'm slightly disappointed no one talks about it.) Is it research-level? Probably not: most or all of it could probably be proved by an above-average undergraduate, once you tell them what to prove. Is it important? Probably not, but look at how most of the Internet misunderstood the algorithm (or bubble sort). I think the reaction of the Internet already proves that it should be published somewhere in some form that is more than a blog post.

Are there other "funny" but "unimportant" work deposited in arXiv? Now I don't mean to belittle other people's work, but multiple people (here, here) compared it to this paper on an accidentally correct implementation of the Floyd-Warshall algorithm. "Fortunately" for those authors, the average programmer doesn't know Floyd-Warshall so there wasn't anything like the ridicule that my paper received.

Research should be about satisfying someone's curiosity. I find the behaviour of this algorithm curious enough, that I think it should be written up. If you don't find it curious, either you haven't understood it, or we have different academic tastes (which is fine). And I would value curiosity over "new-ness" or "improvement" to existing results. Let's be honest, most academic papers are never read by anyone because they are just not interesting: they make up some artificial "new" problems or solutions that are just minor variations of existing things. Even though they are technically new, no one cares. For all those people who claimed they knew about this algorithm all along, either you haven't actually realised its curious properties, or if you do, then I would argue that "not bothered to write about it" is not the proper way an academic should handle it.

Q: So you are not going to submit it to a proper journal?

A: I do not plan to. At least not until people can read the inequality sign. And maybe someone will discover that it was already published somewhere 30 years ago (in which case you can direct your insults to that author).

Other thoughts

Q: OK, now I see... and this is just mind-boggling. How is that possible? I know it works (I coded it up) and there is a proof, but I just can't get my head round it.

A: For me, I also couldn't get it initially. I was trying to dry-run it in my head, and the result is different from what the program does. It was only when I did the dry-running on paper when I realised, that even though I know that in some steps i is bigger than j, somehow I still subconsciously have i pointing to the "left" element and j pointing to "right" element in my mental picture when it comes to the line comparing A[i] and A[j]. It's surprising how much this "i is to the left of j" idea sticks to one's mind.

I think a good "future research direction" would be along the Computer Science education / cognitive science lines: Why do we find this counterintuitive? Why we can't get our heads around the idea that i > j? Why do so many believe changing the loop range won't make any difference? (Or how do so many people mis-remember bubble sort?)

Q: There was a stackoverflow post about this on almost the exact same day, was that you?

A: No, I have nothing to do with any posts in stackoverflow or Hacker News or anywhere else. I don't know any of those guys, either (AFAIK). It may be just an enormous coincidence.

Just to clarify the timeline more, I submitted the first version to arXiv on Saturday Oct 2 morning, UK time. (I have email receipts to prove this, in case anyone feels it requires "proof".) Then I made a few revisions before the Oct 4 18:00 GMT cut-off time. The final revision was in Oct 3 22:28 GMT, the time shown on arXiv. It became public on Oct 5 00:00 GMT, I believe.

Q: Is that arXiv id 2110.01111 part of the joke?

A: I wish I know... I have no control over the arXiv id, I don't know who the admins are. I was as surprised (and bemused) as anyone when I saw the id. I mean, seriously, 01111? Maybe 01234 would be even more fitting. But 01111 is also sorted (and in honour of the zero-one principle?) Maybe the arXiv admin read it and want to be part of the fun...

Q: You really want us to use that name?

A: Well, time will tell what people decide to call it. I considered a number of other options such as "stupid sort", "unbelievable sort", "simplest sort" and so on but they are all too... bland. Just aren't cheeky enough. Not in tone with the rest of the paper. I also tried to incorporate the words "deceptive" or "deceptively simple" somehow but the phrase "deceptively simple" can mean two opposite things. I wanted to call it "I can't believe it sort", but it is not grammatically correct. (As you know, all sorting algorithms have to be called XXX sort.) So the solution is to stuff an extra "can", which I'm not entirely happy with, especially as "can" appears twice within the space of six words, but that's the best I could come up with.

I noticed two words used quite often by other people to describe it are "cursed" and "perversed", but I'm not going to call my algorithm perverse am I? Some other people just abbreviated it to "ICBIC sort".

Q: What have other people said about the paper?

A: I have collected some reactions (positive and negative) which you can find here.