
I’ve been diving into a Google paper titled “Agentic AI and the next intelligence explosion.” In it, they suggest that frontier reasoning models (like DeepSeek-R1 and QwQ-32B) don’t improve simply through increased computation time. Instead, they claim these models simulate a “society of thought”: essentially an internal, spontaneous debate between different cognitive perspectives. While those findings would be groundbreaking if accurate, I suspect the authors might be misinterpreting the mechanism.

My own explanation is a bit more grounded. I view an LLM less like a pinpoint search engine and more like a “weighted mesh” of terms (synonyms, antonyms, etc.) that represent clusters of meaning. In this view, the correlation to meaning provides the “intelligence,” while the weighting scheme handles the logic. This creates a system that is logically precise but remains conceptually unaware (no cognitive capability).

Essentially, the LLM identifies language snippets that correlate to meanings within its training set and uses logic to stitch these snippets together into new or existing combinations. Because it operates on this “mesh” principle, it can easily swap one snippet for an equivalent one (re-wording).

Rather than building from primitive symbols or phonetics, the AI is working at a much higher, more efficient level by assembling nearly complete, pre-formed parts. To use an analogy: it’s the difference between building a vehicle from high-level components (as in Scrapheap Challenge) versus smelting iron ore to start from scratch. It’s similar to the debate in evolutionary biology: the gap between DNA base pairs (CATG) is massive, whereas the evolutionary “gap” between languages like French and German is relatively small.

Your recent post reinforced my thoughts on the importance of sub-assemblies. Traditional metrics like COCOMO assume we are building from basic logic and individual lines of code. However, Claude doesn’t “think” in lines of code; it operates at the level of subroutines, modules, and services. COCOMO isn’t stupid, but the premise of manual, low-level coding is stupid when you have access to such sophisticated high-level modules.

To take that further: a huge portion (say, 80%) of QDA development involves simply configuring high-level interfaces and APIs. This is a statistical matching task that plays perfectly to the strengths of an LLM’s logical processing. The massive productivity gains we see with Claude are real, but they are built upon the massive prior investment in component libraries and existing services.

This brings us back to the core of the LinkedIn thread: the need for “thinking time.” We still must perform the heavy cognitive lifting of defining requirements and design. Claude simply automates the “wiring up” of those high-level components—and it’s likely better at mapping fields and reading documentation than a human is.

That leaves the remaining 20% of the work. This is the territory where no pre-existing libraries exist or where requirements haven’t yet been translated into terms an LLM can parse. In this space, true human cognition is still required; AI won’t bridge that gap, and productivity will remain at “normal” levels.

It reminds me of Bill Joy’s comment from the COCOMO era: he noted that in a typical system, only 5% of the code handles the actual core functions, while the other 95% is dedicated to the display. In modern terms, that 95% is now “Claude territory,” while the 5% remains the “thinking territory.”


Hmmm, interesting. It took me a little while to absorb your line of reasoning and to compare it with the Google paper. I’m not going to claim any great insight, but my thoughts do tend more towards the Google paper than your interpretation. Let me attempt to explain why.

In my coding work, Claude, Codex, Gemini, etc. are indeed mostly acting as “assemblers” of code snippets. An HTML/CSS design here, an API call there, some React at the front-end, some Node for the back-end. There is a generic pattern for web apps, so any AI model can essentially copy and re-combine for my app. So your argument, as it relates to my QDA (Qualitative Data Analysis) app, is pretty well correct. When it comes to a bit of back-and-forth discussion on which database or embedding model to use, there is more conversation, but the concept is still assembly.

For my research work I see it as a little different. Of course Claude sees me working on a Gioia data structure, or on fsQCA analysis of interview data, and can jump in with relevant advice. Even when I go to prepare an ICIS or MISQ paper, the AI model can find similar information and act as an “assembler” for research theory or academic paper writing. But as I go deeper into questioning the logic of my analysis, the model does offer different and unique perspectives.

I went on to read the “societies of thought” paper (https://arxiv.org/pdf/2601.10825) and found it somewhat deeper than the paper you shared. Once again, I am no expert in this space, but I do tend to align with the authors’ perspectives.

This paper is 112 pages long and I cannot claim to have read it fully. I skimmed sections. One thing I noticed was the use of SEM as their analysis method, which is conventional but (I think) will miss complex causal relationships. Hence why I am focussing on QCA these days. But beneath the SEM analysis I do believe there is something more complex going on.

As I said, I am no expert and cannot tell which explanation of the observed phenomenon is most accurate. I have stated my bias but will hold off on any conclusion until I can find a better judge.

On that note, I attended a lecture last week from a retired professor called Peter Seddon, who was talking about his love of Inference to the Best Explanation Theory of theory building (or IBET). Are you familiar with it?