Software 3.0
What is needed for the LLM operating system

Greg Csegzi
18 December 2025

In his YC talk on 'Software 3.0', Andrej Karpathy made a remark that resonated: LLMs will provide the next layer of computational abstraction. [1]


We have moved up one abstraction layer at a time:
assembly → higher-level languages → service abstractions (AWS, Vercel, etc.). [2]


We will now take another step in that direction, enabling even more people to build economically valuable processes.

With every step, we shortened the loop between developers and users.

In my experience, a key strength of the Palantir platform is that it takes yet another step: structured data transformations become understandable and modifiable by business experts. It uses neither punch cards, nor assembly, nor Spark, but business-friendly, readable transformations. This means that changes and fixes to the business logic do not have to go through a weeks- or months-long prioritization and execution process via IT; business users can simply make them themselves.

This is now also becoming possible for unstructured data transformations.

The goal is the same: closing the loop and allowing business experts to be in control of their data processing without having to go through IT to implement changes to APIs, pipelines, databases, etc.


So what is required for LLM as compute to work at scale?

We have gotten used to the phenomenal properties of “traditional compute” (i.e., CPUs and GPUs carrying out strictly prescribed routines):

  • Speed: trillions of operations per second on a laptop

  • Price: on the order of $10^-7 for those trillion operations (see the back-of-envelope sketch after this list)

  • Integration: communication, caching, memory, networking, storage - all of it just works at scale

  • Programmability: the abstraction layers above make it possible for ~50M software engineers to develop useful solutions
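
For the speed and price bullets, the arithmetic is roughly as follows. This is a minimal sketch in Python; the laptop throughput, accelerator throughput, and hourly rental price are illustrative assumptions, not measurements or quoted prices.

    # Back-of-envelope figures for "traditional compute".
    # All inputs are illustrative assumptions, not measured or quoted prices.

    LAPTOP_OPS_PER_SEC = 5e12    # assumption: a few trillion ops/sec on a laptop GPU
    CLOUD_OPS_PER_SEC = 1e15     # assumption: ~1 PFLOP/s on a datacenter accelerator
    CLOUD_PRICE_PER_HOUR = 2.00  # assumption: ~$2/hour to rent that accelerator

    trillion_ops = 1e12
    seconds_needed = trillion_ops / CLOUD_OPS_PER_SEC      # 0.001 s
    cost = seconds_needed * CLOUD_PRICE_PER_HOUR / 3600    # ~$5.6e-07

    print(f"Laptop speed: ~{LAPTOP_OPS_PER_SEC:.0e} ops/sec")
    print(f"Cost of a trillion operations: ~${cost:.1e}")  # on the order of $10^-7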


For LLMs today, we are still in the room-sized mainframe era.

Take, for example, the OpenAI API on the highest tier for a small model like gpt-5-nano:

  • Speed: ~500 requests per second - about a 10,000,000,000th (10 zeros) of traditional compute [3]

  • Price: ~$1 (assuming ~10k output tokens per request) - for the same trillion operations, the cost would be about 100,000,000,000,000,000x higher (17 zeros); see the back-of-envelope sketch after this list

  • Integration: needs to be carefully built and managed to avoid losing state, losing results, sending large amounts of data across networks, etc.

  • Programmability: more limited for what we would consider programs - though with huge potential for uptake; let’s break it down:

    • ChatGPT / Gemini serve hundreds of millions of users with one-off queries - but these are ad-hoc queries, not automated, “programmed” workflows

    • Zapier / N8N are closer to automations, but they assume sequential execution of single-threaded workflows

    • Large-scale parallelised workflows mostly live in custom solutions built inside the enterprise, where there is still a strong dependency on IT to integrate APIs and set up the right frameworks before business users see any benefit
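
The “zeros” in the speed and price bullets above fall out of a simple ratio, sketched below. It reuses the traditional-compute figures from earlier and treats one LLM request as one logical operation (the simplification footnote [3] refers to); the per-token price is an assumption for a small hosted model, not quoted pricing.

    # Back-of-envelope gap between LLM-as-compute and traditional compute.
    # All inputs are illustrative assumptions; one request is treated as one
    # logical operation (the simplification acknowledged in footnote [3]).
    import math

    TRAD_OPS_PER_SEC = 1e12              # trillions of operations per second
    TRAD_COST_PER_TRILLION_OPS = 1e-7    # ~$10^-7 per trillion operations

    LLM_REQS_PER_SEC = 500               # high-tier API throughput
    TOKENS_PER_REQ = 10_000              # ~10k output tokens per request
    PRICE_PER_M_OUTPUT_TOKENS = 0.40     # assumption: ~$0.40 per 1M output tokens
    cost_per_req = TOKENS_PER_REQ / 1e6 * PRICE_PER_M_OUTPUT_TOKENS   # ~$0.004

    speed_ratio = TRAD_OPS_PER_SEC / LLM_REQS_PER_SEC                 # ~2e9
    print(f"Speed: ~1/{speed_ratio:.0e} of traditional throughput")   # roughly the "10 zeros" above

    llm_cost_per_trillion = 1e12 * cost_per_req                       # ~$4e9
    price_ratio = llm_cost_per_trillion / TRAD_COST_PER_TRILLION_OPS  # ~4e16
    print(f"Price: ~10^{math.log10(price_ratio):.0f}x more expensive")  # roughly the "17 zeros" above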


Clearly, we are a long way from Compute 3.0 being possible, and many breakthroughs are needed to close these many orders of magnitude of difference before LLM as logic compute can be competitive with our existing architectures.


At Parsewise Labs, we are building towards this vision. We already support our customers with unstructured-data workflows that require ultimate reliability. That reliability, in turn, requires the most atomic LLM operations, orchestrated across workflows running at large scale. This is made possible by the Parsewise Data Engine, which abstracts away the peripheral complexity and is therefore programmable by business users, who can run data transformations at scale independently.
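
To make “atomic LLM operations orchestrated at scale” a little more concrete, here is a minimal sketch of the general pattern: many small, independent LLM calls fanned out in parallel, each retried on failure so that no state or results are lost. It is an illustrative sketch only, with placeholder names and a stubbed-out client; it is not the Parsewise Data Engine.

    # Minimal sketch: fan out many atomic LLM operations in parallel with retries.
    # `call_llm` is a stub standing in for whichever provider client is used.
    import asyncio

    async def call_llm(prompt: str) -> str:
        """One atomic LLM operation: a single small, focused request."""
        await asyncio.sleep(0.1)   # stands in for network + inference latency
        return f"result for: {prompt[:40]}"

    async def run_atomic_op(prompt: str, retries: int = 3) -> str:
        """Retry with exponential backoff so one transient failure
        does not lose the results of the whole workflow."""
        for attempt in range(retries):
            try:
                return await call_llm(prompt)
            except Exception:
                await asyncio.sleep(2 ** attempt)
        raise RuntimeError(f"operation failed after {retries} attempts")

    async def run_workflow(documents: list[str], concurrency: int = 50) -> list[str]:
        """Process every document in parallel, bounded by a concurrency limit."""
        semaphore = asyncio.Semaphore(concurrency)

        async def bounded(doc: str) -> str:
            async with semaphore:
                return await run_atomic_op(f"Extract the key fields from: {doc}")

        return await asyncio.gather(*(bounded(d) for d in documents))

    if __name__ == "__main__":
        docs = [f"document {i}" for i in range(200)]
        results = asyncio.run(run_workflow(docs))
        print(f"processed {len(results)} documents")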

We believe that the programmers of the Software 3.0 era are business experts and leaders. We are empowering them.



Footnotes:

[1] https://www.youtube.com/watch?v=LCEmiRjPEtQ

[2] In his talk, Andrej describes the progression: computer code → specialized model weights → prompts

[3] N.B. This comparison may seem mismatched because LLM requests require lots of operations under the hood; however, custom hardware purpose-built for LLMs is much more efficient and has a radically different scaling profile compared to CPUs


© Parsewise Inc. 2025. All rights reserved.