
Throwing compute at math with Axplorer

Over the past week, I have been renting cheap GPU instances and trying to use them to solve open problems in mathematics. Without writing a single line of code and without doing any math.

I didn't produce any new results, yet™. On a somewhat related note, I hope that a big NVIDIA GPU falls out of the sky and into my hands, along with unlimited Claude Code credits so I can tackle more problems.

Ramsey numbers

You've walked into a room with 45 people. Can you find a group of 5 people that all know each other, OR a group of 5 people that are all strangers to one another? If there were only 42 people, you wouldn't always be able to find such groups. If there were 46, you always would. But 45? Nobody knows. 44? Also unknown.

The smallest number of people needed for these groups to always exist is called a Ramsey number. In the example above, it's R(5, 5): the specific case of a group of 5 who all know each other or a group of 5 who are all strangers to one another.

Put in a more philosophical way, how much chaos can there be before structure appears?

Here is a famous quote by Paul Erdős:

"Suppose aliens invade the earth and threaten to obliterate it in a year's time
unless human beings can find the Ramsey number for red five and blue five. We
could marshal the world's best minds and fastest computers, and within a year
we could probably calculate the value. If the aliens demanded the Ramsey number
for red six and blue six, however, we would have no choice but to launch a preemptive attack."

In the quote above, the groups are referred to as red and blue. This is because the problem can be (and often is) formulated as a graph coloring problem. Consider a complete graph, where each vertex represents a person and every two vertices are connected by an edge. We color an edge red if the two people know each other, and blue if they don't.

If we can find a way of coloring a complete graph with N vertices such that it doesn't have a red group of size x or a blue group of size y, we have shown that R(x, y) > N.
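To make this concrete, here is a small Python sketch (my own illustration, not part of the actual setup) that counts monochromatic k-cliques in a red/blue coloring. A coloring with a count of zero is exactly the certificate that R(x, y) > N:

```python
from itertools import combinations

def mono_clique_count(n, red_edges, k):
    """Count monochromatic k-cliques in a red/blue coloring of K_n.
    Edges in red_edges are red; all other edges are blue."""
    red = {frozenset(e) for e in red_edges}
    count = 0
    for verts in combinations(range(n), k):
        # Collect the colors of all edges inside this k-vertex subset.
        colors = {frozenset(p) in red for p in combinations(verts, 2)}
        if len(colors) == 1:  # all edges share one color: monochromatic
            count += 1
    return count

# Classic example: color the 5-cycle edges of K_5 red and the diagonals
# blue. No monochromatic triangle exists, certifying R(3, 3) > 5.
cycle_red = [(i, (i + 1) % 5) for i in range(5)]
print(mono_clique_count(5, cycle_red, 3))  # → 0
```

The same counter doubles as a score function: the search is over when it hits zero.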

To contextualize the difficulty of these problems, it took 80 CPU years[1] to improve the upper bound R(5, 5) ≤ 47 to R(5, 5) ≤ 46.

Axplorer

Axplorer[2] is an open-source project by Axiom Math, introduced as a way to "Democratize the search for interesting mathematical constructions". Under the hood, it is the successor of PatternBoost[3], a framework co-developed by François Charton, who is now an Axiom employee.

Suppose you are trying to find an object with specific properties. From the example above, it would be a red/blue coloring of the complete graph with 45 vertices in which there is no red five and no blue five. There are about 10^298 possible colorings: the search space is massive!
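The 10^298 figure follows from K_45 having C(45, 2) = 990 edges, each colored one of two ways, which a couple of lines of Python confirm:

```python
from math import comb, log10

edges = comb(45, 2)               # number of edges in K_45
digits = round(edges * log10(2))  # log10 of 2**edges possible colorings
print(f"{edges} edges, about 10^{digits} colorings")  # → 990 edges, about 10^298 colorings
```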

To tackle such an enormous space, we use axplorer. The idea is roughly, "search locally, learn from the best results, generate smarter starting points, repeat".

                      ┌─────────────────────────────────────────────────────┐
                      │                   During each epoch                 │
                      │                                                     │
┌──────────┐          │   ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│ Generate │          │   │  Train   │    │  Sample  │    │  Search  │      │
│  Initial │─────────────►│  Model   │───►│  Model   │───►│ & Score  │      │
│   Data   │          │   └──────────┘    └──────────┘    └──────────┘      │
└──────────┘          │         ▲                               │           │
                      │         │         ┌──────────┐          │           │
                      │         └─────────│  Select  │◄─────────┘           │
                      │                   │   Best   │                      │
                      │                   └──────────┘                      │
                      └─────────────────────────────────────────────────────┘

The axplorer loop, as shown in the project's README.
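The loop in the diagram can be sketched in Python as follows. All the helper names here are hypothetical stand-ins for illustration, not axplorer's real API:

```python
def axplorer_style_loop(score, random_candidate, local_search,
                        train, sample, epochs=10, pool=100, keep=20):
    """Schematic version of the loop: search locally, keep the best
    results, train a model on them, sample smarter starting points.
    Assumed helper contracts (not axplorer's actual interfaces):
      score(c): lower is better     local_search(c): improved candidate
      train(batch): fitted model    sample(model, n): n new candidates"""
    candidates = [random_candidate() for _ in range(pool)]  # initial data
    best = []
    for _ in range(epochs):
        searched = [local_search(c) for c in candidates]   # search & score
        best = sorted(searched, key=score)[:keep]          # select best
        model = train(best)                                # train model
        candidates = sample(model, pool)                    # sample model
    return min(best, key=score)
```

The key idea is that the model is never asked to solve the problem outright; it only has to propose starting points that local search can finish.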

The approach

Claude wrote the code

I gave Claude high level instructions - "we're going to use axplorer to tackle the Ramsey numbers". The code was written. I did not read it.

Claude is not bulletproof

At some point, the local search process took a graph and explored the space around it by adding or removing edges. This could increase or reduce the score of the graph. It turned out the search process did not record the best score it saw; instead, it recorded the score of wherever the search ended up, which could have been arbitrarily worse.
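Assuming a simple random-walk style of local search (axplorer's internals may well differ), the fix amounts to tracking a best-so-far alongside the walk:

```python
import random

def local_search(start, score, neighbors, steps=1000):
    """Random-walk local search sketch, assuming lower score = better
    (e.g. fewer monochromatic cliques). The bug was returning the score
    of wherever the walk *ended*; the fix tracks the best state seen."""
    current = start
    best, best_score = start, score(start)
    for _ in range(steps):
        current = random.choice(neighbors(current))  # may go up or down
        s = score(current)
        if s < best_score:
            best, best_score = current, s            # record best-so-far
    return best, best_score                          # not the final state!
```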

Making use of idle CPU

As illustrated in the diagram above, one of axplorer's main stages is training a transformer. This is GPU-intensive, leaving the CPU mostly idle! Allocating ~70% of the available cores to local search while the model trained had no measurable impact on training and improved existing constructions "for free".
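A minimal sketch of the idea, with threads for simplicity (a real implementation would more likely use processes, since pure-Python search is GIL-bound; the function names here are my own, not axplorer's):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def overlap_search_with_training(train_fn, search_fn, seeds, cpu_fraction=0.7):
    """Run CPU-bound local search on ~70% of the cores while the
    GPU-bound training call runs in the foreground."""
    workers = max(1, int((os.cpu_count() or 1) * cpu_fraction))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_fn, s) for s in seeds]  # background
        model = train_fn()            # GPU training in the foreground
        improved = [f.result() for f in futures]
    return model, improved
```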

Iterating

As the code ran, I kept an eye on axplorer's output and iterated with Claude.

I am purposefully not detailing how the Ramsey problem was encoded[4] into axplorer. It would not excite me to go over AI-generated code and explain it in my own words. What I am excited about is that with very basic knowledge of the problem space and of ML, I was able to get this close to an interesting mathematical result. Of course, this tiny final gap is the hardest to close: what seems so very close is likely still far away.

The results

These experiments ran on a GPU instance rented on vast.ai for $0.473/hr, plus $0.667/TiB for bandwidth usage.

R(4, 6), N=35

After about 11 hours and 60 epochs, axplorer found graphs with a total of 10 red/blue cliques. Interestingly, I had left it running overnight and it was stuck at 11; as I was about to kill the run, I saw the count decrease in the very last epoch.


R(4, 6), N=36

After ~10 hours, the best graphs found still had ~25 red/blue cliques.


Improvements to Axplorer

I've opened a tiny PR upstream, adding a progress bar to the initial data generation.

Reproducibility

As I shared above, the experiment for R(4, 6), N=35 ran for about 11 hours and 60 epochs. That is super vague! How large was the transformer? How long did training take? What was the train/test loss?

This is all "important" information. In my case, I stopped axplorer a few times, tuned some local search parameters, training steps, etc, so this all varied.

If I had found a new lower bound, I would've been scrambling to reconstruct my steps, going through my console history, logs and instance creation records.

I would like to see axplorer ship with proper observability out of the box: metrics capturing training durations, losses, epochs, everything! This would make it much easier to share exactly what an experiment looked like, including the boring ones.

Perhaps I am biased: working at an observability company, I struggle to use anything that emits fewer than 10 metrics per LOC. Still, I see some quick wins in wiring a handful of them into axplorer.
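Even something as simple as a JSON-lines file with one record per epoch would go a long way. A hypothetical sketch, not an existing axplorer feature:

```python
import json
import time

def log_epoch_metrics(path, epoch, train_loss, test_loss,
                      best_score, duration_s):
    """Append one JSON record per epoch -- enough to reconstruct a run
    later without digging through console history."""
    record = {
        "ts": time.time(),
        "epoch": epoch,
        "train_loss": train_loss,
        "test_loss": test_loss,
        "best_score": best_score,
        "duration_s": duration_s,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

A file like this is trivially greppable and plots in a few lines, which is exactly what you want when reconstructing how a lower bound was found.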

The transformer has its particularities

The underlying "AI", the transformer, is not invisible. It is there.

It would be fantastic if this could all be abstracted! Get the system to realize when it's overfitting. Get the system to run the hyperparameter tuning on its own. I want it to be a magical black box.

Vision for the future

In my first experiment with PatternBoost (which I did the day axplorer was released!), I thought: this tech is magical, I'm going to get Claude to encode the Erdős–Gyárfás conjecture and find a counterexample overnight. Of course, this did not happen. With these Ramsey number experiments, it also did not happen. But I got to know the framework a little better, and I am super excited. I imagine that in the near future we'll be able to sketch a vague approach to a problem, feed it into an LLM that builds the axplorer loop, and let it run until a counterexample is found.

We can even automate it further, and have the LLM find or generate interesting conjectures!

I am hopeful that in a year's time, we'll have a Raspberry Pi running Opus 4.6 with enough VRAM to run axplorer, at a price point cheap enough that anyone can have a little machine chipping away at mathematics on their shelf.

References
