CollatzCrypt: Memory efficient breadth-first searches of infinite procedural graphs (Part I)

The entire project is available on GitHub to view and download. It requires only Python to run. If you also have Pygame installed, it is used to draw graphs that supplement the console output.

For my semester project in 2017, I decided to finalize a program I had already been working on for several months as a hobby: An algorithm to travel between any two input numbers, using only the operations from the Collatz Conjecture to construct a path.

My goals for this project:

Demonstrate that a transformation between any two positive integers can be created by applying 4 unique operations in a finite sequence,
Compute the shortest sequence for two input numbers,
Provide an example of how these sequences can be used in cryptography.

In the Collatz conjecture, the appropriate operation is chosen based on whether N is even or odd; N->N/2 is applied only when N is even, N->3N+1 is used otherwise. This ensures that N remains an integer, but also allows only one sequence of operations to occur after the starting point N. In my program, the inverses of these two operations (N->2N, N->(N-1)/3) are also available, and an operation is allowed whenever the result is an integer.

This gives freedom of movement throughout the number line.

Rather than selecting new operations based on a rule regarding the current number, a sequence can be created by an outside program, which has memory (See: the difference between Langton’s Ant and Langton himself). The numbers visited are like intersections in a maze, and the sequence generator acts like a maze solver.

I did not know about graph theory when I created this, but I ended up creating a breadth-first undirected graph search. The graph consists of all numbers which satisfy the Collatz conjecture, and the edges are the 4 operations.

The first version of the program was not so advanced – it searched by iteratively choosing the option which was closest to the target number, but avoiding numbers which had already been used. This was inefficient, and not set up to find the shortest path.

SC325-Early_stacked_solver_long_path_with_upper_bound — A particularly long path (166377 steps) found by the stacked solver. *Horizontal axis is the index of the step taken, Vertical axis is the linear value of the number visited by a step. Values start at 0 at the top of the window and increase with distance*

The quickest path to the target number would never be guessed by this solver, if any of the steps in the quickest path were not the longest steps that could be taken. That is to say, when creating a path from 64 to 256:
64 (*2) -> 128 (*2) -> 256
would always be turned down in favor of:
64 (*3+1) -> 193 -> (a lot of aimless wandering) -> 256

Originally, the stacked solver could become barred from reaching the goal by a wall of numbers that it had already used, and would be forced to walk further and further from the goal until it ran out of memory. (Even with edge prioritization, it turns out depth-first searches of infinitely large graphs often run out of memory. weird.)

I took 3 approaches to improving the performance (and reliability) of this method:
1. Sorting the history of visited numbers and using a sorted search.
2. Putting a limit on the highest value that can be visited, backtracking to find paths which stay below it.
3. Starting at the goal, Generate a pool of related numbers. terminate the depth-first search when it reaches any number in this pool.

I found that the third approach was very effective. Although preparing a large pool naturally required more memory, it required a predictable amount of memory, and consistently reduced the runtime of the stacked solver.

Not to scale - I tested goal pools with sizes comparable to the input numbers, typically thousands or millions of items — A depth-first search which ends upon reaching a value in a pre-generated goal pool

SC336-Stacked_solver_with_goal_pool_and_frequency_graph — A much shorter path is found by creating a large goal pool, and allowing the Stacked Solver to run until it reaches a number present in the pool. *The new graph on the right side of the window is a frequency map of how often different value ranges are visited by the solver; the vertical axes of the two graphs correspond directly.*

In some experiments, it even reduced the runtime of the stacked solver to 0 steps: When the goal pool grows so large that it includes the starting value, the stacked solver doesn’t need to take any steps towards the goal. It is finished as soon as it starts. This led me to create my current version of the program, which doesn’t use a stacked solver at all – paths are found by creating two pools, one at the starting number and one at the goal, and expanding them both until they intersect.

Note: the node near the bottom of the graph has no children because the intersection was found before reaching this node. This behavior is present in my latest version of the program — Finding a shorter path by expanding two pools until they intersect, eliminating the Stacked Solver altogether

This method ensures that the first path found is the shortest possible, or shares this shortest possible length with other paths.

But the memory usage of these pools grows exponentially with the length of the shortest path.

In my next post, I will show how pools can be created with much lower memory usage, while maintaining or increasing performance. I’ll also cover the structure of the pools I used, how I navigated them to determine the path by which the intersecting items were generated, and how a program for graph navigation such as CollatzCrypt can be used to encrypt small payloads using a private key.