From Seats to Sorties: Why the Pentagon Should Buy Software the Way It Buys (Some) Weapon Systems
The government figured out outcomes-based contracting 25 years ago. It should apply it to software.
Author’s note: I spent a chunk of my PhD studying optimization models for defense logistics, and I’ve wanted to write this piece for some time. Watching the SaaS bloodbath kick off in January, it seemed like time to start the conversation and connect the dots between defense procurement history and AI market dynamics. You gotta be a special kind of nerd to enjoy this one, but this shift will impact all contracting in the coming years. The second installment is coming next week.
“We are bending the cost curve.”
That’s Boeing, talking about the C-17 Globemaster III, a plane it hasn’t manufactured since 2015 but still sustains under a $23.8 billion performance-based logistics contract. The Air Force doesn’t buy C-17 spare parts. It doesn’t buy repair actions. It buys readiness. The contract specifies a mission capable rate, a cost per flying hour, and maintenance man-hours per flying hour. Boeing figures out how to deliver. If parts last longer, if predictive maintenance catches failures before they happen, Boeing keeps the margin. If readiness drops, Boeing eats the cost.
This arrangement has been running since 1998. The fleet consistently beats its 82.5% mission capable rate target (87%+ and climbing). When the Air Force needed to evacuate 124,000 people from Kabul in a matter of days, the C-17s delivered. That’s not a PowerPoint metric; it’s a real-world stress test of the readiness the PBL purchased, and it passed.
Now look at how the same government buys software.
Per seat. Per license. Per module. Annual renewal regardless of whether anyone actually logs in. A GSA audit turned up 37,000 WinZip licenses for 13,000 employees. The Department of Labor was paying for 380 Microsoft 365 accounts with zero users. Five cybersecurity tools covering 20,000+ seats each for a department with fewer than 15,000 people. The federal government spends roughly $6 billion a year on software licenses, and a meaningful chunk of that buys shelf-ware nobody touches.
37,000 copies of WinZip. In 2024.
This isn’t a story about waste, though the waste is spectacular. It’s about pricing models, incentive alignment, and a question I can’t believe nobody in the building is asking louder: why do we buy aircraft readiness on outcomes but buy software by the seat?
What PBL Gets Right, and What It Hides
The C-17 program deserves its reputation. But it also deserves a more careful look than it usually gets, because the people who want to replicate this model for software need to understand what they’re actually replicating. Including the parts that don’t make it into the briefing slides.
The incentive structure is elegant. Under a traditional spares-and-support contract, the vendor makes more money when things break. More parts shipped. More repair actions billed. More overtime. Under PBL, the vendor makes more money when things don’t break. Boeing invests in predictive analytics and digital tools not because some program manager mandated innovation, but because every prevented failure improves their bottom line. The Stryker vehicle PBL hit 97% operational availability in Iraq. T700 engine PBLs cut overhaul times by 80% across the Apache and Black Hawk fleets. Rolls-Royce built the commercial precursor, “Power by the Hour,” where airlines pay per engine flying hour, not per wrench turn.
It works. The data is clear. The model has legs.
But.
Boeing has a monopoly on C-17 sustainment. There is no alternative provider. The production tooling was cut up and scrapped years ago. The “combined program office” at Robins AFB sounds like partnership, and maybe it is, but it’s also embedding. Boeing has made itself irreplaceable. The Air Force calls this a “decade-long sole source sustainment strategy.” That’s either a testament to Boeing’s performance or evidence that PBL can create lock-in so deep that competition becomes impossible.
It’s probably both. Welcome to defense procurement.
And here’s the part that should temper the PBL evangelists: a Pentagon audit found that between 2018 and 2022, within the PBL framework, Boeing overcharged the Air Force by nearly $1 million on C-17 spare parts. Including an 8,000% markup on soap dispensers. Eight thousand percent. On soap dispensers. Outcomes-based contracting aligns incentives at the fleet level. It does not magically prevent defense contractors from playing games at the component level.
If you think it does, I have some competitively priced soap dispensers to sell you.
Why Nobody Has Done This for Software
This is where most analysis stops: “bureaucratic inertia” or “the contracting workforce isn’t trained for this.” Both true, neither sufficient. And treating them as root causes rather than symptoms obscures the more interesting question.
The real answer is that outcomes-based procurement is harder for software than for hardware, in ways that aren’t just organizational but epistemic. Being honest about why is the only path to figuring out whether it’s solvable.
The measurement problem. A C-17 either flies or it doesn’t. A maintenance crew knows, a sensor confirms. Binary, verifiable, collected by systems neither party fully controls. Now try defining the equivalent for a data analytics platform. “Decisions supported per day”? Depends on how many decisions the commander needed to make, which depends on operational tempo, which the software vendor doesn’t control. “Time from data ingestion to actionable intelligence”? Depends on the quality of incoming data, the training of the analyst, the specificity of the intel requirement.
The causal chain between software input and mission outcome runs through humans, organizational culture, and a half-dozen other systems. Attribution gets ugly fast.
Here’s what an outcomes-based software contract could look like: the Air Force buys intelligence fusion from Vendor X, not by the seat but by the product. 200 finished all-source intel products per week, average time-to-decision under 4 hours, verified by an independent measurement layer. Vendor X figures out the mix of AI agents, human analysts, and compute to hit that target. If they deliver with 5 people instead of 50, they keep the margin. If they miss, they eat the cost. Same structure as the C-17 PBL, applied to software.
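To make the incentive mechanics concrete, here is a toy sketch of how the payment calculation in that hypothetical Vendor X contract might work. Every number, threshold, and cap below is illustrative, not drawn from any actual contract; the structure just mirrors the PBL logic of capped incentives and capped penalties around a performance target.

```python
def outcome_payment(products_delivered: int,
                    avg_decision_hours: float,
                    base_fee: float = 1_000_000.0,
                    product_target: int = 200,
                    decision_target_hours: float = 4.0,
                    incentive_cap: float = 0.15,
                    penalty_cap: float = 0.20) -> float:
    """Weekly payment under a hypothetical outcomes-based contract.

    The vendor earns the base fee for hitting both targets, a capped
    incentive for beating them, and a capped penalty for missing them.
    """
    # Performance ratios: values above 1.0 mean the target was beaten.
    throughput = products_delivered / product_target
    timeliness = decision_target_hours / avg_decision_hours

    # Score on the worse of the two dimensions, so the vendor
    # can't trade timeliness for volume (or vice versa).
    score = min(throughput, timeliness)

    if score >= 1.0:
        adjustment = min(score - 1.0, incentive_cap)
    else:
        adjustment = max(score - 1.0, -penalty_cap)

    return base_fee * (1.0 + adjustment)


# Vendor beats both targets: 210 products, 3.5-hour average.
print(outcome_payment(210, 3.5))   # base fee plus a 5% incentive
# Vendor misses badly: 150 products, 6-hour average.
print(outcome_payment(150, 6.0))   # base fee minus the 20% penalty cap
```

The design choice worth noticing is the `min()` on the two ratios: score the vendor on their weakest dimension and the gaming space shrinks, which is exactly the component-level discipline the soap-dispenser episode shows a fleet-level metric alone doesn’t buy you.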
That sounds clean on paper. In practice, defining those metrics is where the whole thing stalls. Not because nobody’s smart enough, but because the causal distance between software and mission outcome is just longer than the distance between an aircraft engine and a sortie. The C-17 PBL works partly because the physics are legible: parts degrade, sensors measure, readiness is observable. Software’s contribution to intelligence outcomes is mediated by human judgment, organizational workflow, data quality, and a dozen other variables nobody controls end-to-end. That doesn’t make measurement impossible, but it does make it a research problem, not just a contracting problem.
DoD’s own Software Acquisition Pathway recommends DORA metrics (deployment frequency, lead time for changes, mean time to recovery, change failure rate) for software contract incentive fees. Good engineering metrics, but not outcome metrics. They tell you how fast your dev team ships code. They tell you nothing about whether that code makes the warfighter more effective. CMU’s Software Engineering Institute flagged this gap directly: DORA metrics are survey-based, not system-measured, and “not well suited to meet DoD needs, which are characterized by complex products, multiple subcontractors, and converging DevSecOps pipelines.”
That’s worth sitting with. The metric DoD’s own acquisition pathway recommends for incentive fees has been flagged by the institution that arguably knows more about defense software engineering than anyone as unsuited to the task. It persists because it’s measurable and available, and the alternatives require work nobody has done yet.
The data access problem. This one should scare you. The F-35 PBL was supposed to happen. Lockheed pitched it, the concept was sound, and in November 2023, DoD walked away. Data quality issues, inability to verify Lockheed’s performance claims without access to proprietary technical data. Congress had already restricted the deal in the 2022 NDAA, requiring proof it would improve on the status quo before any ink could dry.
The F-35 failure reveals a paradox at the heart of outcomes-based software contracting: the vendor delivering the capability is usually the only entity with enough visibility into the system to measure whether it’s working. A C-17 mission capable rate is measured by Air Force personnel using Air Force systems. A Palantir deployment’s effectiveness? Measured largely within Palantir’s own environment.
The Army just consolidated 75 Palantir contracts into a single $10 billion enterprise agreement. Over 100,000 users across the service.
The platform is useful, nobody disputes that. But the contract structure is still input-based: volume discounts on commercial software, not payment for intelligence outcomes delivered.
$10 billion. Per seat.
Salesforce’s recent $5.6B modernization award deserves similar scrutiny. The announcement touts the company’s agentic capabilities, but it’s unclear which outcomes, if any, contract performance is actually tied to.
Both are exceptional companies, and both should be rewarded for performance.
The money problem. Boeing can invest in predictive maintenance tools because the C-17 PBL runs on multi-year procurement authority. Ten-year contract, upfront investments recouped over time. Software is overwhelmingly purchased with Operations and Maintenance funds on annual cycles. A software vendor on a one-year O&M contract has zero rational incentive to invest in capability improvements that pay off over years. The annual budget cycle turns every software deal into a short-term transaction, no matter what the SOW says about “partnerships” and “innovation.”
I’ve seen the SOWs. The word “partnership” appears a lot. The incentive structure says otherwise.
The SaaS Reckoning
All of which might read as a niche procurement concern, interesting to policy people but irrelevant to the broader market. Except the commercial software industry just ran a live experiment in what happens when pricing models lose contact with value, and the results are clarifying.
Traders are calling it the “SaaSpocalypse.” The iShares Expanded Tech Software ETF: down roughly 25% year to date. Hedge funds: $24 billion in short positions against software stocks. Salesforce, ServiceNow, Intuit, Snowflake, all getting hammered. The immediate trigger was a series of AI product launches showing autonomous agents performing work that enterprises currently pay SaaS vendors to help humans do.
The deeper issue is the same one the government has. The per-seat model that defined SaaS for two decades was always a proxy for value. AI is exposing how crude that proxy was. When ten AI agents do the work of a hundred analysts, you don’t need a hundred seats. Revenue craters while output stays the same or increases. The pricing model was measuring the wrong thing (headcount rather than work product) and the gap between price and value just got large enough for Wall Street to notice.
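The arithmetic behind that repricing is simple enough to sketch. All figures below are illustrative, not any vendor’s actual numbers; the point is the divergence between the two ratios at the end.

```python
# Illustrative only: how per-seat revenue responds when AI agents
# replace human users while total work product stays constant or grows.

seat_price = 1_200        # annual price per seat (hypothetical)
work_per_human = 100      # units of output per analyst per year
work_per_agent = 1_000    # units of output per AI agent per year

# Before: 100 analysts, each with a seat.
humans_before = 100
revenue_before = humans_before * seat_price      # 120,000
output_before = humans_before * work_per_human   # 10,000 units

# After: 10 agents do the bulk of the work; only 10 human seats
# remain to supervise them.
humans_after = 10
agents = 10
revenue_after = humans_after * seat_price        # 12,000
output_after = humans_after * work_per_human + agents * work_per_agent

print(revenue_after / revenue_before)   # 0.1 -- revenue down 90%
print(output_after / output_before)     # 1.1 -- output slightly up
```

Revenue falls 90% while output rises 10%, under the same pricing model. That gap between the two ratios is the mispricing Wall Street is shorting.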
Jensen Huang called the selloff “the most illogical thing in the world.” He may be right that markets are overshooting. But the insight underneath the panic is sound: paying for access to a tool is a different thing than paying for the work the tool produces. The first is an input, the second is an outcome, and the commercial market is repricing around that distinction in real time.
Put differently: SaaS companies sold nouns (platforms, dashboards, seats). AI sells verbs (analyzing, deciding, executing). When the verb gets commoditized, the noun that enabled it stops being worth what it used to be. The government is still buying nouns. Expensive ones. With annual renewals.
The government should be thinking about the same migration. With more deliberation and, ideally, less carnage.
The Policy Beachhead
The good news: the policy infrastructure is materializing. The FY2024 NDAA created an “Anything-as-a-Service” pilot (XaaS) where DoD pays for capability metered by actual usage rather than by license. The May 2025 implementing memo targets SaaS, Data-as-a-Service, and Space-as-a-Service, initially routed to SOCOM, CYBERCOM, and TRANSCOM. Real progress.
But it’s consumption-based, not outcomes-based. Paying per API call or compute hour is better than paying for licenses nobody uses. It still doesn’t tie payment to mission results. You can burn through enormous amounts of cloud compute and generate nothing of operational value. XaaS fixes the waste problem. It doesn’t fix the alignment problem.
There’s a more interesting policy signal worth watching. The SWIFT program, also launched in May 2025, uses AI and LLMs to grant rapid provisional Authorities to Operate for software, replacing chunks of the manual RMF review process. This matters because it shows AI can accelerate the government’s own assessment and oversight machinery. If AI can evaluate software security faster than human reviewers, it can presumably also evaluate software performance against contracted outcomes.
The bottleneck in outcomes-based contracting has always been the government’s ability to define what it’s buying and verify what it received. AI might break that bottleneck. Which would be poetic: AI making it possible for the government to buy AI on outcomes.
What Would Actually Have to Change
I got sucked into the SBIR vortex back in ‘22 and learned a painful lesson about the distance between a good policy idea and actual reform. So let me be concrete about the four things that would have to change for outcomes-based software procurement to be more than a conference panel topic.
Measurement has to come before contracting. The C-17 PBL works partly because the Air Force spent decades defining and collecting readiness metrics before it signed a PBL. Mission capable rate, cost per flying hour, maintenance man-hours per flying hour: these weren’t invented for the contract. They existed as operational metrics the contract could reference. Defense software has no equivalent foundation. Before DoD can buy software on outcomes, someone has to do the unglamorous work of defining what “intelligence throughput” or “cyber response effectiveness” looks like in a measurable, verifiable, not-easy-to-game form. That’s a pre-acquisition problem, not an acquisition problem. It requires program offices and operators doing disciplined work before the KO ever touches it.
Architecture has to enable independent measurement. The F-35 PBL died because Lockheed owned the data needed to verify performance. That’s not a negotiation failure, it’s an architecture failure. If the system is designed so only the vendor can see whether it’s working, outcomes-based contracting can’t happen. The alternative is building the measurement layer into the architecture from the start: shared telemetry, open interfaces, independent instrumentation sitting above vendor-specific implementations. This isn’t competitively neutral. It favors open architectures over proprietary stacks, and vendors who accept transparency over vendors who depend on information asymmetry. That’s a feature, not a bug.
Congress has to fix the money. Multi-year procurement authority exists for weapon systems and has been critical to making PBL viable. There is no equivalent for software. Creating an “IT PBL” contracting vehicle (not a pilot, a permanent authority for multi-year, performance-incentivized software services) would let vendors make the kind of upfront AI investments that only make economic sense on a longer horizon. The XaaS pilot’s flexibility is a start. It’s not enough.
The workforce has to level up. Contracting officers know how to buy seats and licenses. They don’t know how to buy “20% faster intelligence fusion.” Every industry exec I’ve talked to identifies this as the binding constraint: not policy authority, but technical expertise in the acquisition workforce. This won’t change until outcomes-based software contracting is treated as a strategic priority, not an innovation experiment.
Could we do better for the warfighter and the taxpayer? Unequivocally yes.
So What
The C-17 PBL proves outcomes-based contracting can work at enormous scale, the F-35 proves it can fail, and the SaaS repricing proves per-seat models are fragile when technology changes the relationship between tools and outcomes. The commercial market learned that this year. It cost investors $24 billion.
If AI does even a fraction of what its proponents claim, agencies will keep buying seats while AI reduces the number of humans sitting in them. Licenses will renew. Mission outcomes will increasingly be delivered by agents and automation that the licensing model doesn’t capture, doesn’t incentivize, and doesn’t measure. The gap between what the government pays and what it gets will widen until the budget math becomes impossible to ignore.
The C-17 model isn’t a blueprint for software procurement. I’ve spent this whole piece explaining why the differences are real. But it is proof of something that matters: DoD has, in at least one domain, figured out how to pay for what works instead of what ships.
The task now is figuring out what that principle looks like when the thing you’re buying isn’t an aircraft, but the intelligence that tells you where to fly it.
That’s the hard part. It’s also the part that matters most. And right now, nobody’s working on it hard enough.


