[This post originates from a twitter thread on the Collatz conjecture.]
Time for some #RandomNumberTheory. The Collatz Conjecture is a famous unsolved mathematical problem which also goes by various other names, such as the ‘3n+1’ conjecture. Since it was introduced by Lothar Collatz in 1937, no-one has been able to prove it either true or false.
➢ While I am hardly about to prove it here, we can use probability to get a feeling of what is going on, and whether it is likely to be true or false. Spoiler alert: It is probably true.
➢ The problem: Starting with a positive integer n, we do one of two things. If n is even, we divide it by 2 and, if it is odd, we replace it with 3n + 1. By repeating this step over and over, we get an infinite sequence of positive integers.
➢ For example, starting at 3, we get the sequence:
|3, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, …|
Once the sequence hits 1, it gets stuck in the infinite loop
|4, 2, 1, 4, 2, 1, …|
Another example, starting from 7:
|7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, …|
➢ The question is, does the sequence always reach 1 (so end up in the infinite 4,2,1 loop)? The Collatz conjecture is that it does, for every possible starting positive integer.
➢ In principle, a sequence could grow without bound, tending to infinity. Or, it could get stuck in an infinite loop different from the 4,2,1 one. The conjecture says that these alternative possibilities do not occur.
➢ It has been checked by computer for all starting values up to 268, and found to be true. But, even a huge number like 268 is nothing in comparison to the infinitude of all positive integers. And a computer check does not help us understand why it might be true.
➢ We can look at the statistical properties of the sequence, so that it can be approximated as a stochastic process. From this we can use the theory of such processes to say what we expect to happen to the sequence. That’s what I will do in this thread.
➢ At each step of a Collatz sequence, there are two possibilities depending on if the number is odd or even. For nice statistics, it would help if this occurred independently at each step. Unfortunately, if n is odd then 3n + 1 is even. This is not independence.
➢ This can be fixed by combining each 3n + 1 step with the n/2 step inevitably following it to give a ‘shortcut’ Collatz sequence:
- If n is even, replace it by n/2.
- If n is odd, replace it by (3n + 1)/2.
E.g., the sequence starting from 7 is:
|7, 11, 17, 26, 13, 20, 10, 5, 8, 4, 2, 1, 2, 1, …|
➢ A bit of modular arithmetic shows that terms of the shortcut sequence are independently odd/even each with probability 1/2. E.g., if n is uniformly distributed modulo 8 then it is even with probability 1/2, and n/2 is uniform mod 4. Similarly, if n is odd (3n + 1)/2 is uniform mod 4.
➢ More generally, if n is uniformly distributed modulo 2M then it is odd/even with probability 1/2, and the next term in the sequence is uniform modulo 2M - 1. Continuing in this way, the first M steps of the sequence are independently odd/even, each with probability 1/2.
➢ For a random starting value, we assume that it continues in this way. Each step is independently of type (3n + 1)/2 or n/2 each with probability 1/2. As the ‘+1’ is small relative to large values of n, we approximate as follows:
➢ Each step of the sequence independently multiplies by 3/2 or 1/2, each with probability 1/2. Taking logarithms, the sequence has independent increments, equal to log(3/2) or log(1/2), each with probability 1/2.
➢ This can be approximated by a Brownian motion scaled to have standard deviation log(3)/2 and mean log(3/4)/2 at each step. This is a Brownian motion with negative drift. Such processes, with probability 1, eventually decrease without bound (i.e., tend to -∞).
➢ This is demonstrated in the plot below.
- Blue lines are base 2 logarithms of shortcut Collatz sequences (minus the initial value, so start from 0). The initial values are chosen randomly up to 1015.
- Red lines are Brownian motion with negative drift equal to log(3/4)/2.
- The black line is just the drift term.
➢ We approximated the logarithm of (shortcut) Collatz sequences by Brownian motion with negative drift. This decreases without bound or, at least, until the values are small enough that the approximation does not work well.
➢ And, we already know by direct calculation that once the values of a Collatz sequence become small, they hit 1.
So, the Collatz conjecture is probably true!
➢ I’m Almost Sure. Goodnight!