<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Neil, CEO of Portainer.io]]></title><description><![CDATA[Enterprise grade Kubernetes, Docker & Podman management. Empower your team to build, operate, and scale containerized environments with precision. Faster, safer, and in full alignment with enterprise standards across data centre, cloud, and edge.]]></description><link>https://insights.portainer.io</link><image><url>https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg</url><title>Neil, CEO of Portainer.io</title><link>https://insights.portainer.io</link></image><generator>Substack</generator><lastBuildDate>Sun, 19 Apr 2026 21:38:28 GMT</lastBuildDate><atom:link href="https://insights.portainer.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Portainer.io]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[portainerio@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[portainerio@substack.com]]></itunes:email><itunes:name><![CDATA[Neil, CEO of Portainer.io]]></itunes:name></itunes:owner><itunes:author><![CDATA[Neil, CEO of Portainer.io]]></itunes:author><googleplay:owner><![CDATA[portainerio@substack.com]]></googleplay:owner><googleplay:email><![CDATA[portainerio@substack.com]]></googleplay:email><googleplay:author><![CDATA[Neil, CEO of Portainer.io]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Knowledge Says “I Can Build This.” Experience Says Something Else.]]></title><description><![CDATA[This is a conversation we have had more than once with enterprises who were, by every measure, a strong fit for Portainer.]]></description><link>https://insights.portainer.io/p/knowledge-says-i-can-build-this-experience</link><guid isPermaLink="false">https://insights.portainer.io/p/knowledge-says-i-can-build-this-experience</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 14 Apr 2026 16:39:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a conversation we have had more than once with enterprises who were, by every measure, a strong fit for Portainer. Good use case, right scale, genuine pain, budget available. And then something shifts, and we find ourselves telling them to stop the evaluation.</p><p>Not because the product failed. Not because the relationship broke down. Because the organizational decisions they were making meant Portainer could no longer do (beyond maybe the first year) what it is designed to do for them.</p><p>This post is about that pattern, and why we think being direct about it matters.</p><h2>What Portainer is actually for</h2><p>Portainer is not primarily a feature story. Customers do not buy it because it has a specific capability that nothing else has. 
They buy it because it makes getting up and running with container and Kubernetes operations genuinely straightforward, because ongoing operations stay manageable without constant specialist intervention, and because the organization does not need to hire and retain a highly experienced platform engineering team just to keep the lights on.</p><p>That value proposition only works in one direction. It works when an organization has decided, consciously or by default, that it does not want to build and carry that complexity internally. The moment that decision reverses, the value proposition reverses with it.</p><p>A technical evaluation in that context is a waste of everyone&#8217;s time. You are not evaluating whether Portainer can do what you need. You are evaluating it against a set of organizational assumptions that no longer apply.</p><h2>The hiring signal</h2><p>We have learned to watch for one signal in particular: the specialist hire.</p><p>An organization comes to us, the conversations go well, there is genuine alignment on the problem and the approach. Then they hire one or two Kubernetes specialists, and the entire frame shifts. The new team members, understandably, want to build. That is what specialists do. Their value to the organization is demonstrated through the platform they construct, not through the platform they procure. The incentive structure pushes toward assembly.</p><p>We do not blame them for it. Knowledge says &#8220;I can build this.&#8221; That is often true. Experience says &#8220;we have seen what happens when you do&#8221;... and that is a different kind of true.</p><h2>What typically happens next</h2><p>The build goes well at first. The team is energized, the architecture is clean, the early results are promising. Twelve to eighteen months later, the picture looks different.</p><p>The platform needs patching. Upstream CNCF dependencies have moved. The two engineers who designed it are now the only people who fully understand it, and they are spending an increasing proportion of their time keeping it running rather than improving it. Internal customers are raising tickets. Security updates are getting deferred because there is always something more urgent. The team that was hired to accelerate the organization&#8217;s Kubernetes capability is now, in practice, a support function for a bespoke platform that nobody else can operate.</p><p>We have seen resignations follow. We have seen shadow IT re-emerge because internal customers lost confidence in the platform team&#8217;s bandwidth. We have seen organizations that started this journey with two strong engineers end up with two exhausted ones.</p><p>Our managed services team has picked up a number of enterprises at exactly this point. Not at the start, when the build decision felt right, but 18 to 24 months in, when the CIO went looking for a fix. The fix, in each case, was to outsource operations to people who do this for a living. And part of that outsource was ripping out the assembled jigsaw of tools the internal team had built, replacing it with a platform we can run optimally and predictably. The cost of that transition, in time, disruption, and goodwill, is rarely something those organizations had modeled when the original build decision was made.</p><p>None of this is inevitable. Some teams build well and sustain it. 
But it requires ongoing investment that organizations routinely underestimate at the point of the decision.</p><h2>When we recommend not proceeding</h2><p>If an organization has already committed to building its own platform, we will say so directly: Portainer is probably not the right call right now. Not because we do not believe in the product, but because the value it delivers will not land in that environment. Portainer reduces the need for specialist platform engineering. If you have just hired specialist platform engineers and their mandate is to build, that value does not compute.</p><p>The organizations Portainer is built for have made a different decision. They want Kubernetes operational capability without the overhead of constructing and maintaining the scaffolding themselves. They want their engineering teams focused on the systems and applications that matter to the business, not on keeping a bespoke control plane alive. They want to get started without a six-month architecture project, and they want ongoing operations to stay sane as the environment grows.</p><p>If that is the decision the organization has made, Portainer will deliver. If the decision has gone the other way, we would rather say so than waste months on an evaluation that was never going to land.</p><h2>The door</h2><p>We are not precious about this. Organizations change direction. The build-it team hits the wall, priorities shift, leadership asks hard questions about platform overhead, and the conversation becomes relevant again. When that happens, we are easy to find.</p><p>We would rather be honest now and useful later than the other way around.</p>]]></content:encoded></item><item><title><![CDATA[Is Windows a Generational Technology?]]></title><description><![CDATA[I grew up on Windows.]]></description><link>https://insights.portainer.io/p/is-windows-a-generational-technology</link><guid isPermaLink="false">https://insights.portainer.io/p/is-windows-a-generational-technology</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 07 Apr 2026 15:04:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I grew up on Windows. So did most people my age. We learned it in school, gamed on it at home, and walked into our first jobs already knowing how to navigate it. It felt familiar in the way that only something deeply habitual can. Anything else felt alien&#8230; and honestly, still does. I use my iPhone for limited things, and anything outside of Windows feels like I am working against the grain rather than with it. Call me old if you want. I like what I like.</p><p>My kids did not grow up that way.</p><p>They are Gen Z, nineteen and twenty two. Their first computing device was a Chromebook&#8230; and then very quickly, a phone or tablet. Chrome OS was unremarkable to them because it was just the web, which is where everything lived anyway. When they gamed, it was on a PlayStation or Xbox, not a PC. Windows never got those leisure hours. So when they encounter it now, it does not feel like home. It feels strange, foreign, like I feel when someone hands me a device I did not grow up with and expects me to just get on with it.</p><p>In between us sits the millennial generation, and they are the most interesting part of this story. 
They learned on Windows, but came of age just as smartphones and web apps arrived. They adapted. Comfortable on both, native to neither. As they moved into roles with IT influence and purchasing decisions, they did not evangelize Windows the way my generation did. They were fine with whatever worked. That indifference quietly eroded the familiarity loop that had kept Windows self-reinforcing for decades, and set the table for the generation that followed to reject it more completely.</p><p>Gen Z is in the workforce now. What they want is a tablet running a tablet OS (or a PC running an OS that feels tablet-like), connected to web-based tools, behaving the way their phone does. They do not want Office 365&#8230; they are already highly proficient in Google Docs and Sheets, and see no reason to change. A file system feels like an unnecessary abstraction. Their baseline expectation of &#8220;good&#8221; is a mobile experience, and Windows does not meet that bar.</p><p>So was Windows dominance always generational? Not technical, not even economic&#8230; generational?</p><p>The network effect that gave Windows its grip was not really about the software. It was about who learned it first, then taught it to the next person, who got hired somewhere that already ran it, who then bought it at home because it matched what they used at work. A familiarity loop. For two or three decades, that loop was self-reinforcing, because the people doing the hiring, the IT purchasing, and the school curriculum decisions all came from the same Windows-native cohort.</p><p>That loop is breaking.</p><p>The cohort in the workforce now did not come through it. And Microsoft had one real shot at intercepting this transition&#8230; mobile. Windows Phone was the bridge that could have planted the flag in the next generation&#8217;s formative years. It failed, and not just commercially. It failed to capture a single classroom, a single pocket, a single habitual moment for the generation that was going to matter most.</p><p>Familiarity is the deepest competitive moat in consumer technology. Microsoft built that moat over decades. They just forgot to fill it for the next generation&#8230; and mobile was the moment they needed to.</p><p>And I do not think this is just about Windows.</p><p>I spent my formative IT years building computers from parts, hand-installing everything, learning what every POST beep meant. When virtualization arrived in the early 2000s, we jumped on it. We honed entire careers around VMs as the primary deployment mechanism&#8230; and built IT landscapes of genuine complexity on top of them. The tooling, the practices, the operational discipline required to manage that at scale took decades to develop and refine.</p><p>The millennial IT generation inherited what we built. They supported it, understood it, but also looked for ways to improve on it. They jumped onto containerization early and brought a fresh perspective with them.</p><p>The Gen Z IT person is container native. They know no prior world. To them, the idea of hand-installing an application is as foreign as a file system and a command prompt is to my kids. They work at a higher level of abstraction, and they are very good at it.</p><p>Each generation stands on what the previous one built, and then moves forward from there. That is how progress works.</p><p>So are VMs and installable software generational, just like Windows? Is the future of IT purely containers, web, and serverless? My gut says yes. 
And if history is any guide, the generation after Gen Z will not even remember there was a debate.</p>]]></content:encoded></item><item><title><![CDATA[D2K - a Portainer SkunkWorks Project]]></title><description><![CDATA[Docker, but inside a Kubernetes Namespace...]]></description><link>https://insights.portainer.io/p/d2k-a-portainer-skunkworks-project</link><guid isPermaLink="false">https://insights.portainer.io/p/d2k-a-portainer-skunkworks-project</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 31 Mar 2026 21:24:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/iOv098SfV7s" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Platform engineers and developers want different things from infrastructure, and that tension is real. Platform teams want Kubernetes: reliable, scalable, production-grade clusters with proper resource governance, namespace isolation, and RBAC. Developers want Docker: fast, familiar, zero learning curve. They want to run docker run, <code>docker logs</code>, and <code>docker exec</code> and get on with their work.</p><p>d2k makes both sides happy at the same time.</p><p>The model is straightforward. A platform engineer builds and operates a Kubernetes cluster the way they would anyway: with proper resource limits, network policies, and production-grade reliability. They slice that cluster into namespaces and deploy a tiny d2k pod into each one. Each developer gets pointed at their namespace&#8217;s d2k endpoint and, from their perspective, they have a Docker host. They connect with the Docker CLI, with Portainer, or with any Docker SDK client. They run containers, view logs, exec into shells, watch events, and check stats. None of that requires them to know what a Deployment is.</p><p>The developer&#8217;s &#8220;Docker host&#8221; is a synthetic construct sitting on top of Kubernetes infrastructure that far exceeds what any laptop can provide. For teams with remote developers, particularly those working from locations where hardware constraints make local development challenging, this pattern is compelling. All compute happens centrally. The developer connects to their namespace endpoint and works as if the environment is local. The platform team retains full control and visibility over what&#8217;s actually running.</p><p>d2k translates the Docker API surface into Kubernetes operations scoped to a single namespace. <code>docker run -p 80:8080</code> creates a Deployment and a LoadBalancer Service. <code>docker volume create</code> creates a PersistentVolumeClaim. <code>docker logs --follow</code> streams from the pod log API. <code>docker exec -it mycontainer /bin/bash</code> opens a shell in the pod via SPDY. <code>docker stats</code> pulls real metrics from the cluster&#8217;s metrics-server. <code>docker events</code> replays Kubernetes event history and streams live resource changes. </p><p>That said, d2k is different from KubeDock, which is the closest comparable project. KubeDock also exposes a Docker API backed by Kubernetes, but its design target is CI pipelines and testcontainers: short-lived ephemeral containers in a Tekton or similar pipeline context. It uses Pods directly (not Deployments), implements port-forwarding for local access, and handles volumes by copying content in via init containers. It is optimised for fast container turnover in automated test runs. 
d2k targets interactive developer use: longer-lived workloads, Portainer UI integration, WebSocket console access, real stats, and event streaming. The two tools solve adjacent problems for different audiences.</p><p>d2k is a Portainer Skunkworks project, MIT licensed, with multi-arch images (amd64 and arm64) published on every commit. The code is at <a href="https://github.com/portainer/d2k">github.com/portainer/d2k</a>.</p><p>You can see a demo of d2k in use here: </p><div id="youtube2-iOv098SfV7s" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;iOv098SfV7s&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/iOv098SfV7s?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div>]]></content:encoded></item><item><title><![CDATA[Container Platform Operational Readiness Assessment]]></title><description><![CDATA[Most container platform failures are not technology failures]]></description><link>https://insights.portainer.io/p/container-platform-operational-readiness</link><guid isPermaLink="false">https://insights.portainer.io/p/container-platform-operational-readiness</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Mon, 30 Mar 2026 21:24:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When an organisation tells us their container platform is unstable, or their team is constantly firefighting, or they fear making platform changes because of the unknown blast radius... the technology is rarely the culprit. What is actually happening is that the organisation adopted containers at a pace that outstripped their operational readiness.</p><p>The same pattern runs in reverse. We also see organisations that have invested heavily in platform engineering, SRE practices, and sophisticated tooling... to run two or three non-critical applications that do not justify any of it. The ROI is negative, not because the technology is wrong, but because the required maturity level was never honestly defined.</p><p>Both are maturity problems, not technology problems.</p><p>We have been working through this manually with customers for a while, using a slide deck and a consultant to guide the conversation. It worked. It did not scale. So we rebuilt it as a self-service web tool and published it this week.</p><p>The assessment covers four dimensions: the technical skills your team actually has, how your IT structure supports containerization in practice, whether your application portfolio is ready for containers, and the tooling and platform infrastructure you have deployed. Strength in one area does not compensate for a gap in another.</p><p>It ends with a question most container initiatives never formally answer: what level of maturity does your business actually need? If your containers run non-critical internal tooling, you do not need an SRE team and a multi-cluster GitOps platform. 
If revenue-critical services depend on your container platform, you cannot afford to operate at an opportunistic maturity level.</p><p>The indicators of a gap are almost always present before the outages and the cost blowouts. Engineers spending more time fixing unexpected issues than running the system. Fear to perform required platform upgrades. Monitoring costs spiraling. These are not bad luck. They are predictable consequences of a specific maturity gap.</p><p>No account required, results are immediate, and a printable report is available at the end. Run it, and share it with whoever owns the container platform decision in your organisation.</p><p><a href="https://maturityassessment.portainer.io">https://maturityassessment.portainer.io</a></p><p></p>]]></content:encoded></item><item><title><![CDATA[When “Highly Available” Isn’t Available Enough: Kubernetes at the Industrial Edge]]></title><description><![CDATA[&#8220;If a tree falls in the woods, and no one was around to hear it, did it make a sound?&#8221;&#8230; If a node fails and your application restarts a few seconds later, is that a failure?]]></description><link>https://insights.portainer.io/p/when-highly-available-isnt-available</link><guid isPermaLink="false">https://insights.portainer.io/p/when-highly-available-isnt-available</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Fri, 20 Feb 2026 19:59:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#8220;If a tree falls in the woods, and no one was around to hear it, did it make a sound?&#8221;&#8230; If a node fails and your application restarts a few seconds later, is that a failure?</p><p>In most enterprise IT systems, the honest answer is no. A restart window measured in seconds is acceptable. Retries handle transient errors. Users refresh dashboards. Systems self-heal. Kubernetes was built precisely for this kind of environment.</p><p>On a manufacturing line, that same interruption can scrap product, damage tooling, or trigger safety interlocks. The tolerance model is different. In some environments, the goal is not rapid recovery. It is uninterrupted continuity.</p><p>That distinction changes how we think about Kubernetes at the industrial edge.</p><h2>What Kubernetes Actually Provides</h2><p>Kubernetes is a distributed control system built around reconciliation. You declare desired state. The control plane continuously works to ensure that running state converges toward that declaration.</p><ul><li><p>If a container crashes, it is restarted.</p></li><li><p>If a node fails, workloads are rescheduled.</p></li><li><p>If a replica disappears, it is recreated.</p></li></ul><p>This behavior is made possible through consensus. The control plane maintains authority via quorum, typically backed by etcd. A majority must survive for the cluster to remain authoritative. This prevents split brain and ensures consistency across the distributed system.</p><p>Within those design assumptions, Kubernetes is exceptionally robust.</p><p>But it assumes that restart is acceptable and that shared authority is tolerable. 
Industrial systems do not always share those assumptions.</p><h2>Replicas, Service Discovery, and Shared Fate</h2><p>A technically literate reader will reasonably ask whether the problem is already solved by scaling replicas. If three instances of an application are running, and one fails, the remaining two continue serving traffic. From a workload perspective, that provides redundancy. For many enterprise IT services, that model is both effective and sufficient.</p><p>The limitation, however, does not sit at the process layer. It sits at the authority and infrastructure layer.</p><p>Those replicas exist inside a shared control domain. They depend on the same etcd quorum, the same API server, the same scheduler, and the same cluster networking fabric. Even at the service discovery layer, they rely on shared components such as CoreDNS and dynamic internal routing. In a stable data center environment, these abstractions simplify operations and provide powerful flexibility. They are part of what makes Kubernetes attractive.</p><p>At the industrial edge, the environmental assumptions change. Cabinets may sit on different power circuits. Network links may traverse industrial switches subject to segmentation or interference. Hardware may operate in temperature and vibration ranges rarely encountered in cloud environments. Under these conditions, shared infrastructure subsystems represent correlated failure domains.</p><p>If the control plane becomes unavailable, the authority to reconcile state is affected for all replicas simultaneously. If etcd experiences issues, the entire cluster&#8217;s ability to maintain consistency is impaired. If cluster-wide DNS or networking degrades, service-to-service communication across all replicas is impacted at once. None of this implies fragility in Kubernetes itself. It reflects the reality that replicas inside a single cluster remain coupled through shared authority and shared infrastructure layers.</p><p>In enterprise IT, this coupling is generally acceptable because the cluster is treated as a reliable abstraction boundary. In industrial systems, where eliminating correlated failure is often a primary design objective, concentrating authority and service discovery into a single distributed cluster may be viewed as an unnecessary aggregation of risk. The architectural conversation therefore shifts from how many replicas are running to where the boundary of authority and failure containment should actually be drawn.</p><h2>The A+B+C Redundancy Model</h2><p>Industrial and aerospace engineering provide a useful contrast. Commercial aircraft do not depend on a distributed cluster maintaining quorum in order to remain airborne. Instead, they implement triple modular redundancy. Three independent control systems operate simultaneously, each capable of full function. Their outputs are compared through arbitration logic, and divergence is resolved through voting mechanisms. The defining characteristic of this model is independence rather than internal scaling.</p><p>This philosophy is increasingly applied to industrial edge computing.</p><p>Rather than constructing a single multi-node Kubernetes cluster and scaling replicas within it, the system is designed as three discrete operating units, commonly described as A, B, and C. Each unit is fully capable of running the complete application stack required by the plant. 
Each unit has its own compute boundary, its own networking boundary, and its own authority domain.</p><p>Availability is achieved through architectural redundancy across independent systems rather than through rescheduling within a shared cluster.</p><p>In practical terms, this can be implemented in two common ways.</p><p>One approach uses three standalone Docker (or Podman) hosts. Each host runs the identical containerized application stack, but there is no shared cluster quorum, no cross-node scheduler, and no shared control-plane state. The three hosts are treated as a logical deployment group, ensuring that the same application version is deployed consistently to Units A, B, and C. An external load balancer or supervisory controller sits in front of these hosts, performing continuous health checks and directing inbound traffic from the plant to whichever units are healthy. If Unit A fails completely due to hardware, power, or software fault, traffic is withdrawn from it automatically while Units B and C continue operating without interruption.</p><p>A second approach uses three single-node Kubernetes clusters rather than plain Docker hosts. Each unit runs its own independent Kubernetes instance, providing declarative deployment, pod lifecycle management, namespaces, and RBAC locally. Crucially, there is no shared etcd across units and no cross-node quorum to maintain. Each cluster is sovereign. The same manifests are applied independently to each cluster, typically via automation tooling that treats A, B, and C as coordinated but separate deployment targets.</p><p>Inbound access from the plant is again mediated by an external arbitration layer, typically a load balancer with active health checks. This component continuously evaluates the health of each operating unit and routes traffic accordingly. As long as at least one unit remains healthy, the application remains available to the plant. Failures are isolated to individual authority domains rather than propagating through a shared cluster fabric.</p><p>The distinction is subtle but significant. Replica scaling within a cluster provides redundancy at the process layer. Independent operating units provide redundancy at the authority layer. In environments where correlated infrastructure failure and shared control-plane dependencies are the primary concern, isolating authority domains can reduce systemic risk in ways that additional replicas inside a single distributed cluster cannot.</p><p>At the industrial edge, the design question is not simply how many pods to scale. It is where the boundary of shared authority should sit, and whether that boundary aligns with the physical and operational realities of the plant floor.</p><h2>From Redundancy to Fleet Management</h2><p>Designing the system as three independent operating units solves the correlated failure problem, but it introduces a new one. Once you move from a single cluster to discrete authority domains, you now have a fleet to manage.</p><p>Deploying the same application consistently to Units A, B, and C is straightforward for one work cell. The complexity emerges when that pattern is repeated across dozens or hundreds of cells, plants, or remote sites. 
You need a way to ensure that versions remain aligned, configuration drift is controlled, and updates can be rolled out predictably, without reintroducing a shared runtime dependency.</p><p>This is where fleet management becomes critical.</p><p>Using Portainer&#8217;s Edge compute capabilities, each standalone Docker host or single-node Kubernetes cluster can be registered as an independently managed endpoint. These endpoints can be organized into logical deployment groups that reflect the A+B+C redundancy sets. Application definitions can then be targeted to these groups, ensuring consistent deployment across each discrete unit while preserving their operational independence.</p><p>The important distinction is that management is centralized at the control level, not at the data or runtime level. Each operating unit remains sovereign. If connectivity to the management plane is lost, workloads continue running locally. Redundancy is preserved because execution authority does not depend on a shared cluster quorum.</p><p>In large industrial estates, this separation between runtime independence and centralized governance allows the A+B+C model to scale without reintroducing the very shared failure domains it was designed to eliminate.</p>]]></content:encoded></item><item><title><![CDATA[Do We Actually Want DevOps in the Factory, Or Does It Just Sound Good?]]></title><description><![CDATA[I keep hearing the phrase &#8220;we need DevOps on the factory floor,&#8221; and every time it comes up in conversation I find myself wanting to slow things down rather than speed them up, because before we adopt anything, especially something that originated in enterprise IT, we should be very clear about what it actually means in practice.]]></description><link>https://insights.portainer.io/p/do-we-actually-want-devops-in-the</link><guid isPermaLink="false">https://insights.portainer.io/p/do-we-actually-want-devops-in-the</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Thu, 19 Feb 2026 19:58:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I keep hearing the phrase &#8220;we need DevOps on the factory floor,&#8221; and every time it comes up in conversation I find myself wanting to slow things down rather than speed them up, because before we adopt anything, especially something that originated in enterprise IT, we should be very clear about what it actually means in practice. We need to be asking whether we are solving the right problem, or whether we are importing terminology that feels modern without fully understanding the operational implications.</p><p>In IT, DevOps was never primarily about Kubernetes, CI pipelines, or dashboards, even though those tools often get bundled into the story. At its core, DevOps is an ownership model. The team that builds the software owns it in production. They are responsible for availability, performance, resilience, and support. Not as a final escalation point after three handoffs, but as the first call when something breaks.</p><p>That model makes a lot of sense in organizations that genuinely write and evolve their own business-critical systems. 
If you build the payment engine, the booking system, or the analytics platform, then it is entirely reasonable that you are accountable when it fails. You have the source code, you understand the design decisions, and you can instrument and iterate on it as needed.</p><p>Now compare that with what is actually running in most industrial environments.</p><h2>The Industrial Reality: Vendor Software and Ownership Gaps</h2><p>In that environment, what exactly does DevOps mean?</p><p>Walk through a modern plant and you are far more likely to encounter commercial software than internally developed applications. You will see systems such as Ignition, Litmus Edge, Softing dataFEED, MaintainX, or Tatsoft FrameworX running on servers or edge devices. These are not applications written by an in-house development team that lives inside the codebase. They are vendor products with their own release cycles, their own support models, and their own architectural constraints.</p><p>If Ignition has a module issue, are your engineers stepping through the vendor&#8217;s core code and issuing patches, or are they opening a support ticket? If a Litmus upgrade introduces unexpected behavior, are you rolling back via a Git commit that changes your own code, or are you coordinating with the vendor and planning a controlled downgrade? In most plants, the answer is obvious. You do not own the source, and you are not rewriting core functionality at 2 a.m. to restore production.</p><p>That distinction matters, because DevOps assumes ownership of the application lifecycle, not just ownership of the server it runs on.</p><p>What most factories actually need is not DevOps in the purist sense, but disciplined deployment automation. They need repeatability, consistency across sites, reduced configuration drift, and a way to make changes without introducing human error. That is a very real requirement, especially as Industry 4 modernization projects scale from one pilot line to multiple facilities across regions.</p><p>Deployment automation says we want to install and configure vendor software in a consistent way, we want environments that look the same in Plant A and Plant B, and we want documented, auditable change processes. It does not say that we are going to adopt a full &#8220;you build it, you run it&#8221; culture for code we did not write.</p><p>Even GitOps, which often gets presented as the natural extension of DevOps, needs to be examined through this lens. Tools such as Argo CD and Flux provide a clean model in which the desired state lives in Git and the runtime environment continuously reconciles itself to match. In a SaaS company optimizing for speed, that is elegant. A commit lands, the cluster updates, and the system converges automatically.</p><p>Now place that model inside a factory that runs twenty-four hours a day and has tightly scheduled maintenance windows. An automatic reconciliation that upgrades a critical service mid-shift may be technically correct, but operationally unacceptable. In many industrial environments, changes need to align with planned downtime, line changeovers, or explicit management approval. The concept of continuous, autonomous deployment can clash directly with the risk tolerance of the plant manager.</p><p>That does not mean automation is wrong. It means the trigger model matters. You might still define desired state in version control and use automated pipelines to execute changes, but you gate execution behind an external approval that aligns with operational reality. 
Automation, yes. Blind reconciliation, probably not.</p><h2>Being Honest About Which World You&#8217;re In</h2><p>There are, of course, industrial contexts where a true DevOps model makes sense. If your organization builds its own data pipelines, custom dashboards, proprietary optimization logic, or machine learning models that drive production decisions, then the team that creates those assets should absolutely own them in production. In that scenario, DevOps is not theater. It is accountability.</p><p>The key is being honest about which world you are in.</p><p>If your plant primarily deploys vendor products and integrates them at the configuration level, then what you are solving for is lifecycle management and deployment consistency. If you are building and evolving core logic that differentiates your operation, then DevOps becomes relevant as an organizational model.</p><p>Before we declare that &#8220;the factory needs DevOps,&#8221; it is worth asking a few direct questions. Do we actually write and maintain the software that runs our production environment? Who is accountable when it fails at 2 a.m.? How much autonomous change is the plant willing to tolerate in the name of agility?</p><p>Modernization in Industry 4 is not about copying enterprise IT patterns wholesale. It is about applying the right operational constructs to the realities of production environments. As engineers, we owe it to ourselves to separate buzzwords from mechanisms, and to design systems that respect both uptime and ownership, rather than assuming that what worked in a cloud-native startup will automatically translate to the factory floor.</p>]]></content:encoded></item><item><title><![CDATA[Disaster Recovery Patterns for Kubernetes-Based Applications]]></title><description><![CDATA[This is a question we are often asked: What is the best approach to delivering a high SLA for application availability?]]></description><link>https://insights.portainer.io/p/disaster-recovery-patterns-for-kubernetes</link><guid isPermaLink="false">https://insights.portainer.io/p/disaster-recovery-patterns-for-kubernetes</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 10 Feb 2026 15:35:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a question we are often asked: What is the best approach to delivering a high SLA for application availability? In fact, we are actually asked &#8220;how do we implement DR in Kubernetes&#8221;, but what&#8217;s generally meant is application availability.</p><p>In order to come up with an answer to this question, organizations must first decide whether they want to hide failure from applications through infrastructure abstraction (as was popularized with VMware Fault Tolerance/High Availability), or whether they want applications to be explicitly resilient to failure across locations (as is recommended by the Cloud Providers with their multi-region architectures).</p><p>Kubernetes, as a fundamental technology, supports both approaches, but it does not neutralize the trade-offs between them. Each model places complexity in a different layer of the stack, carries different operational risks, and scales in very different ways as environments grow. 
This document outlines two common patterns used to provide disaster recovery for Kubernetes workloads, and examines their characteristics, dependencies, and limitations.</p><p>The first pattern focuses on transparent multi-site failover using a single stretched Kubernetes cluster. The second focuses on application-level resilience using multiple independent clusters with external traffic management.</p><h2>Option 1: Transparent Multi-Site Failover Using a Stretched Kubernetes Cluster</h2><h3>Architectural Overview</h3><p>In this model, a Kubernetes cluster is treated as a single logical system spanning multiple physical locations. A primary and secondary site host an equal number of control-plane nodes and as many worker nodes as needed to run x% of the application (x depends on the desire for active/active or active/passive distribution of load). A third location acts as a witness to maintain quorum for the control plane and hosts a single &#8220;tie-break&#8221; control-plane node. From the perspective of applications and operators, there is one cluster, one API endpoint, and one set of workloads.</p><p>The objective is to ensure that a site failure does not require application awareness or intervention. Workloads should continue running, or restart automatically, without changes to application configuration or client access patterns.</p><h3>Typical Technical Characteristics</h3><p>A stretched cluster architecture usually relies on several tightly coupled infrastructure components:</p><p>The Kubernetes control plane is distributed across sites, with etcd members placed in each location to maintain quorum. This requires predictable latency and highly reliable connectivity between sites, as etcd is sensitive to both delay and packet loss.</p><p>Layer 2 networking is extended between locations so that pod IPs, service IPs, and node subnets remain consistent regardless of where workloads are running. This often involves stretched VLANs or overlay networks that span data centers.</p><p>Persistent storage (if used) is replicated between sites, commonly using synchronous or near-synchronous replication. From Kubernetes&#8217; point of view, a persistent volume must remain accessible (and using a consistent identity, e.g. an IP address or FQDN) regardless of which site a pod is scheduled in.</p><p>Ingress and egress traffic is typically handled by externalized load balancers or network appliances capable of redirecting traffic without changing service endpoints. These components must also be highly available and aware of site health. These devices normally front the presentation of the application, which is exposed as a &#8220;hostport&#8221; to ensure that traffic is only directed to worker nodes that actually host pods.</p><h3>Operational Implications</h3><p>The primary advantage of this approach is transparency. Applications generally do not need to be modified, and failover can be fast if the underlying infrastructure behaves as expected. This makes the model attractive for legacy applications or commercial software that cannot easily be changed.</p><p>The trade-offs are mostly operational. The cluster becomes extremely sensitive to network instability, especially as the distance between sites increases. A transient network issue can impact the control plane, storage replication, or both, even if application workloads themselves are healthy. 
Often, timeouts are increased to accommodate network disruptions, but these same timeouts then directly delay failover when a real site failure occurs.</p><p>The blast radius of failure is also large. Because the cluster is a single failure domain, misconfiguration, failed upgrades, or control plane instability can affect all sites simultaneously. Maintenance operations such as upgrades, certificate rotation, or network changes must be planned and executed with extreme care.</p><p>Cost and complexity tend to rise over time. Stretched networking, replicated storage, and specialized load-balancing infrastructure are not only expensive to deploy but also expensive to operate and troubleshoot. This model is typically viable only within metro distances and well-controlled network environments.</p><h2>Option 2: Application-Level Resilience Using Multiple Independent Clusters</h2><h3>Architectural Overview</h3><p>In this model, each site runs its own independent Kubernetes cluster, with no shared control plane, networking, or storage. Clusters are treated as isolated failure domains rather than extensions of a single system.</p><p>Applications are deployed concurrently across two or more clusters. Rather than relying on Kubernetes to fail workloads over between sites, availability is managed externally through traffic routing, and data consistency is managed explicitly within the application and its data layer (e.g. with DB replication).</p><h3>Typical Technical Characteristics</h3><p>Each Kubernetes cluster operates independently, with its own control plane, networking, and storage stack. There is no requirement for low-latency connectivity or stretched Layer 2 networks between clusters, beyond what the application itself needs for data replication or coordination.</p><p>Applications are deployed into isolated subnets or network segments per cluster. This reduces coupling and ensures that failures remain local to a single environment.</p><p>A geo-distributed load balancer or DNS-based traffic manager sits in front of the application. It continuously performs health checks against application endpoints and routes traffic only to healthy backends. If an entire cluster becomes unavailable, traffic is simply directed elsewhere.</p><p>Stateful components handle consistency at the application or data layer. This may involve database replication, leader election, quorum-based writes, eventual consistency models, or application-specific reconciliation mechanisms, depending on the workload&#8217;s requirements.</p><h3>Operational Implications</h3><p>This approach shifts complexity away from infrastructure and into application design. Applications must tolerate concurrent execution across sites and handle partial failure gracefully. For some legacy workloads, this can require significant redesign or may not be feasible at all.</p><p>However, the operational characteristics are more predictable. Each cluster can be operated, upgraded, and even deliberately taken offline without directly impacting others. Failure testing is simpler because disaster scenarios can be exercised by shutting down entire clusters rather than simulating partial infrastructure faults. This configuration facilitates &#8220;blue/green&#8221; deployment modes, meaning the value extends far beyond failure prevention.</p><p>The blast radius of failures is smaller by design. A control plane issue, storage problem, or misconfiguration affects only one cluster. 
Scaling to additional regions or sites becomes a repeatable pattern rather than an architectural redesign. Taken to its extreme, a pool of single node clusters could in fact deliver higher availability than a single multi-node cluster.</p><p>This model aligns well with cloud-native principles and long-term scalability, particularly for organizations operating across wide geographic regions or hybrid environments.</p><h3>Comparative Considerations</h3><p>While both approaches can provide disaster recovery, and in fact fault tolerance, they optimize for very different outcomes.</p><p>Stretched clusters prioritize application transparency at the cost of infrastructure complexity and operational risk. They work best when sites are close together, networks are highly reliable, and application change is not an option.</p><p>Application-level resilience prioritizes isolation and scalability, but requires explicit ownership of failure handling within the application stack. It is generally better suited to modern applications, geographically distributed deployments, and organizations willing to invest in resilience as a design principle rather than an infrastructure feature.</p><p>Importantly, Kubernetes itself does not remove the need to choose. It enables both patterns, but the long-term sustainability of each depends on factors outside Kubernetes, including network topology, storage architecture, application design, and organizational maturity.</p><h3>So, what&#8217;s the right answer?</h3><p>Disaster recovery in Kubernetes is not a single problem with a single solution, and so therefore there is no one &#8220;right&#8221; answer. Transparent failover and application-level resilience represent fundamentally different philosophies about how systems should behave under failure.</p><p>The first attempts to hide failure through infrastructure abstraction. The second assumes failure is inevitable and designs applications to survive it. Kubernetes can support both, but it does not make their trade-offs disappear. Kubernetes&#8217; natural affinity is to the second option.</p><p>Selecting the right approach requires an honest assessment of application constraints, operational tolerance for complexity, geographic distribution, and the organization&#8217;s ability to design and operate for failure rather than against it.</p>]]></content:encoded></item><item><title><![CDATA[ShadowAI is already everywhere. Enterprises just haven’t admitted it yet.]]></title><description><![CDATA[This line of thinking crystallized for me in an investor group meeting I was in yesterday.]]></description><link>https://insights.portainer.io/p/shadowai-is-already-everywhere-enterprises</link><guid isPermaLink="false">https://insights.portainer.io/p/shadowai-is-already-everywhere-enterprises</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Fri, 30 Jan 2026 02:05:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This line of thinking crystallized for me in an investor group meeting I was in yesterday. One of the companies pitching was building AI compliance and monitoring software. 
Their entire business exists to help enterprises understand how employees are using LLMs at work, and more importantly, what data is leaving the organization when they do.</p><p>What caught my attention wasn&#8217;t the product. It was the problem they were describing.</p><p>In a surprising number of enterprises, there is no corporate AI subscription at all. Instead, staff are using free tiers or paid, often reimbursed, personal accounts to get their work done. Copying and pasting from internal systems, documents, tickets, and codebases into public LLMs that the organization has zero visibility into and zero contractual relationship with.</p><p>This is Shadow IT again, just with a new name. ShadowAI.</p><p>Against that backdrop, it&#8217;s not surprising that more and more organizations are rolling out formal AI usage policies. On the surface, these read like common sense. Don&#8217;t paste customer data. Don&#8217;t paste internal documents. Don&#8217;t paste source code. The subtext, however, is far more revealing. Enterprises are trying to draw a hard boundary around what is allowed to leave the building and end up in the hands of global AI players.</p><p>Once those policies exist, enforcement tends to follow quickly. Security teams get involved. Browser extensions are flagged. New tools appear, like the one I mentioned earlier, whose sole purpose is to detect when employees use public LLMs and paste something in that they probably shouldn&#8217;t. In some environments, the final move is the blunt one. Block the LLM at DNS and move on.</p><p>The &#8220;do not train on my data&#8221; checkbox, which many corporate policies hang off, doesn&#8217;t materially change the risk calculation. From an enterprise perspective, it&#8217;s still an external promise they can&#8217;t independently verify. Once data leaves the organization, control is gone. Auditability is gone. Legal certainty becomes fuzzy very quickly.</p><p>So we end up in an awkward place. Individual leaders and workers are convinced LLMs will change how knowledge work is done, while the organization struggles to justify the cost of an enterprise-wide AI subscription. Rock, meet hard place. Hence, ShadowAI.</p><p>What&#8217;s interesting is that the time, effort, and cost being poured into controlling AI usage could quite easily be redirected into equipping staff with an enterprise-sanctioned LLM. But here we are.</p><p>At the same time, security teams increasingly see public LLMs as a data exfiltration path with a conversational interface. The emergence of MCP servers arguably makes this tension worse, or better, depending on which side of the fence you sit on.</p><p>If you follow that tension to its logical conclusion, the outcome probably isn&#8217;t &#8220;no AI at work.&#8221; The more likely outcome is &#8220;AI, but inside the fence.&#8221;</p><p>Which points to decentralized or privately hosted LLMs.</p><p>Instead of sending prompts to a shared public model, the model runs within the enterprise boundary. Models are shared by industry, fine-tuned locally, wired into internal systems, and never exposed to the public internet. At that point, LLMs stop being a novelty and start looking like a core business application. Something owned and operated by IT, running on the right hardware, sized for the organization, and justified like any other enterprise system.</p><p>The conversation shifts from model size to economics. 
What does it cost to self-host a model that is &#8220;good enough&#8221; to be used across the business, versus the cost of an enterprise LLM subscription, versus simply accepting that ShadowAI is an unmanaged but tolerated risk?</p><p>The more interesting question is timing. How long until self-hosted LLMs become the default way LLMs are consumed in an enterprise or business context?</p><p>My suspicion is sooner than most expect. Regulated industries will lead. One or two high-profile data leakage incidents will accelerate things. Once enterprises realize they can get LLM capability without punching a hole in their trust boundary, the decision becomes fairly obvious.</p><p>Public LLMs don&#8217;t disappear in this world. They just get repositioned. Great consumer tools. Great learning aids. Less likely to be where serious enterprise work happens.</p><p>So when I see enterprises tightening AI policies, blocking endpoints, and deploying detection tooling, I don&#8217;t see resistance to AI. I see early signals of where AI is actually heading.</p><p>When organizations start treating something as infrastructure, it usually means it&#8217;s here to stay.</p>]]></content:encoded></item><item><title><![CDATA[The CNCF is right, just not about everybody]]></title><description><![CDATA[I spent some time reading the latest CNCF annual survey this week.]]></description><link>https://insights.portainer.io/p/the-cncf-is-right-just-not-about</link><guid isPermaLink="false">https://insights.portainer.io/p/the-cncf-is-right-just-not-about</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Mon, 26 Jan 2026 21:59:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent some time reading the latest CNCF <a href="https://www.cncf.io/wp-content/uploads/2026/01/CNCF_Annual_Survey_Report_final.pdf">annual survey</a> this week. It&#8217;s a solid report, well put together, and full of useful data. But as I was reading it, I couldn&#8217;t drop a recurring internal dialog&#8230; this is a bit like asking farmers if they own tractors.</p><p>If you&#8217;re responding to a CNCF survey, you&#8217;re already in the cloud native ecosystem. You&#8217;ve already accepted containers. You&#8217;re probably already running Kubernetes somewhere. You might even enjoy it. So when the headline says that most container users are running Kubernetes in production, my reaction isn&#8217;t surprise. It&#8217;s more a quiet &#8220;of course they are.&#8221;</p><p>That doesn&#8217;t make the data wrong. It just means we need to be honest about what it actually represents.</p><p>What this survey really shows is not where the enterprise market is, but where the cloud native conversation currently ends. And that distinction matters more than we tend to admit.</p><p>Inside the CNCF bubble, Kubernetes is now described as &#8220;boring,&#8221; and that&#8217;s meant as praise. Stable. Mature. Predictable. And to be fair, that&#8217;s largely true if you look purely at the core orchestration layer. Kubernetes itself has grown up.</p><p>But read between the lines, and the same survey quietly tells a very different story.</p><p>The biggest challenges organizations report are no longer technical. They&#8217;re cultural. 
Alignment between teams, ways of working, ownership boundaries, and expectations. If your organization still thinks in terms of packaged enterprise software, regulated releases, change advisory boards, and a generally risk-averse view of operations, that&#8217;s usually where Kubernetes starts to feel awkward rather than empowering.</p><p>GitOps is another good example. It&#8217;s often talked about as if it&#8217;s table stakes, or even a prerequisite. But the data tells a more grounded story. Even within this already self-selected audience, GitOps adoption is effectively zero at the early stages and only really shows up at the most mature end of the spectrum. That tells you something important. GitOps isn&#8217;t a starting point. It&#8217;s an outcome.</p><p>Then there&#8217;s AI, which is where things get especially interesting.</p><p>There&#8217;s a lot of noise right now about Kubernetes becoming the AI platform. And again, inside this ecosystem, that&#8217;s broadly true. But if you look at how organizations are actually using AI, the picture is far less dramatic. Most aren&#8217;t training models. Most are consuming them from a cloud service provider. Only a small minority are operating anything close to what you&#8217;d call large self-hosted models.</p><p>What&#8217;s really happening is that AI workloads are entering organizations through vendors, internal experiments, and packaged platforms. And those workloads come with very real infrastructure consequences. Cost. Scaling. Governance. Operational strain. The survey even calls out concerns around machine-driven usage putting stress on already fragile infrastructure.</p><p>That&#8217;s not a future problem. It&#8217;s already happening.</p><p>And this is where the survey becomes more interesting for what it *doesn&#8217;t* show.</p><p>If this is the reality inside the CNCF bubble, imagine what it looks like outside it.</p><p>Most enterprises don&#8217;t think of themselves as cloud native. They don&#8217;t attend KubeCon (remember, only 17,000 people attend KubeCon, likely representing around 5,000 organizations). They don&#8217;t want to assemble an ecosystem of CNCF projects. They run commercial software, internal line-of-business applications, and systems that need to be stable long before they need to be elegant.</p><p>But Kubernetes is still arriving anyway.</p><p>It&#8217;s arriving because vendors are shipping on it. Because infrastructure teams are standardizing on it. Because AI platforms assume it. And slowly, often without a formal decision, it becomes part of the furniture.</p><p>That&#8217;s where I think the real gap is.</p><p>The CNCF survey shows what Kubernetes looks like after an organization has already crossed the line and invested heavily in skills, tooling, and cultural change. What it doesn&#8217;t show is the much larger group of organizations that are about to inherit Kubernetes without wanting to become cloud native purists or platform engineering shops.</p><p>Those organizations don&#8217;t want a platform journey. They want Kubernetes to behave like infrastructure. Something that can be operated, secured, audited, supported, and if needed, handed over or unwound without drama.</p><p>They want fewer moving parts, not more. Fewer bespoke workflows, not endless YAML. And above all, they want optionality. 
The ability to bring things in-house, outsource them, or change direction without rewriting the entire operating model.</p><p>So my takeaway from the survey wasn&#8217;t &#8220;Kubernetes has won.&#8221; That battle is already over.</p><p>My takeaway was that the next phase isn&#8217;t about adoption. It&#8217;s about containment.</p><p>How do you stop Kubernetes from turning into a fragile dependency that only a small group understands? How do you let teams use it productively without forcing everyone to become an expert? How do you say yes to modern workloads, including AI, without quietly signing up for years of internal platform debt?</p><p>Those are the questions most enterprises are about to start asking, even if they don&#8217;t realize it yet.</p><p>And interestingly, the CNCF data already hints at the answers. Standardization beats novelty. Operational maturity beats cleverness. And boring, well-governed infrastructure turns out to be a competitive advantage.</p><p>Which, honestly, is a much more useful place for this industry to be than another year of arguing about which tool is &#8220;best.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[When containers meet reality: why the Portainer Industrial App Portal had to exist]]></title><description><![CDATA[What happens after you successfully containerise an application?]]></description><link>https://insights.portainer.io/p/when-containers-meet-reality-why</link><guid isPermaLink="false">https://insights.portainer.io/p/when-containers-meet-reality-why</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Thu, 22 Jan 2026 23:19:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What happens <em>after</em> you successfully containerise an application?</p><p>Not in a lab. Not on a laptop. But across plants, regions, security zones, and teams that were never meant to be Docker experts.</p><p>This was the question at the heart of a recent episode of <em>Problem Solvers: In the Trenches</em>, a podcast produced by <strong>Industry 4.0 Solutions</strong> and hosted by <strong>Walker Reynolds</strong>. The session featured <strong>Matt Parris</strong>, Director of Quality Test Systems (Industry 4.0) at <strong>GE Appliances</strong>, and explored a problem many industrial organisations are quietly running into: containers scale faster than the operational models used to manage them.</p><p>Matt is not a theorist. He is an end user, operating containerised infrastructure in real manufacturing environments, at real scale, with real consequences when things go wrong. He is also a Portainer customer, and critically, one of the practitioners who co-designed what is now the <strong>Industrial App Portal</strong> in collaboration with <strong>Portainer</strong>.</p><p>This blog captures the core insight from that conversation. If you want the full technical depth, nuance, and real-world examples, we strongly recommend watching the podcast itself. The discussion goes far deeper than any written summary can.</p><h3>Containers solved deployment, then exposed everything else</h3><p>Containers earned their place in industrial environments for good reasons. 
They standardised application packaging, reduced dependency conflicts, and made it possible to redeploy software consistently across very different hardware and operating contexts. For manufacturing teams trying to escape brittle, hand-crafted bare-metal installs, Docker was a genuine breakthrough.</p><p>But as Matt explained during the podcast, containers did not eliminate complexity. They <strong>relocated</strong> it.</p><p>Once applications moved into images and Compose files, the challenge shifted to configuration sprawl. Environment variables, certificates, secrets, location-specific parameters, version mismatches, and upgrade paths multiplied rapidly. The application itself was stable, but the way it was deployed was anything but.</p><p>This is where many container strategies quietly stall. The proof of concept works. A handful of edge devices are manageable. Then scale arrives, and suddenly no one can confidently answer simple questions like which version of an application is running where, why behaviour differs between plants, who is allowed to deploy or upgrade software, or what breaks when a change is made.</p><h3>Edge orchestration helped, until it didn&#8217;t</h3><p>Portainer&#8217;s Edge Agent solved an important part of this problem early on. By allowing devices to initiate outbound connections to a central control plane, Edge made large-scale management feasible without fragile inbound networking or SSH sprawl. Devices could be onboarded in bulk, grouped, and updated centrally.</p><p>For many teams, this was transformational.</p><p>But as discussed on the podcast, Edge groups eventually become a cognitive tax. When applications, versions, variants, and locations are all managed implicitly through group membership, reasoning about system state becomes harder over time, not easier. The system works, but understanding <em>why</em> it works becomes increasingly fragile.</p><p>This is the point where most organisations realise they are missing something more fundamental.</p><h3>The missing role: platform administration</h3><p>One of the most important insights from the conversation was the identification of a role that rarely exists explicitly in OT environments: the <strong>platform administrator</strong>.</p><p>This is not an application developer, and not an application user. It is the role responsible for defining <em>how</em> applications are deployed safely and consistently. In Matt&#8217;s words, this means creating <strong>application recipes</strong>.</p><p>An application recipe defines which versions are supported, which configurations are valid, what inputs are required, and what defaults should be enforced. It turns deployment from an engineering task into an operational workflow.</p><p>Without this layer, organisations rely on tribal knowledge and undocumented conventions. That works until people leave, plants are added, or responsibility shifts to a new team. At scale, it always breaks.</p><h3>Four dimensions that don&#8217;t scale quietly</h3><p>As deployments grow, application management becomes a four-dimensional problem: the application itself, its configuration variants, the location where it runs, and the version lifecycle over time.</p><p>Traditional container tooling can technically handle all of this, but only by pushing complexity onto the operator. 
Edge groups and stacks multiply, relationships become implicit, and operational confidence erodes.</p><p>This is the problem the Industrial App Portal was designed to address.</p><h3>Why the Industrial App Portal exists</h3><p>The Industrial App Portal did not start as a product roadmap item. It emerged directly from customer conversations and was grounded in the operational reality shared by GE Appliances.</p><p>The core idea is simple: <strong>separate power from usability</strong>.</p><p>The App Portal sits above existing Portainer Servers, acting as a unifying layer. Portainer Servers connect upward to it, just as Edge Agents connect upward to Portainer. Nothing is replaced. Complexity is reorganised.</p><p>From the user&#8217;s perspective, two complementary views emerge. An application-centric view makes it easy to see where an application is deployed, which versions are in use, and where upgrades are required. A device-centric view shows how individual machines are configured and what workloads they are running. Both views are navigable through a shared hierarchy that reflects how manufacturing environments are actually organised.</p><p>Deployment becomes a guided process. Select the application. Choose a supported variant and version. Target the device or group. Provide required inputs. Deploy. Behind the scenes, the App Portal manages edge stacks, group associations, and lifecycle consistency automatically.</p><p>This is not about reducing capability. It is about reducing <em>cognitive load</em>.</p><h3>Built with customers, not for slide decks</h3><p>What makes the Industrial App Portal different is not just what it does, but how it came to be built. Throughout the podcast, Matt described iterative design discussions, UI mockups, workflow testing, and rapid feedback loops with the Portainer engineering team.</p><p>This was not a theoretical exercise. It was a customer bringing real problems to the table, and a product team willing to reshape the experience around them.</p><p>That collaboration is visible in the result.</p><h3>Watch the full podcast</h3><p>This blog only scratches the surface. The full <em>Problem Solvers: In the Trenches</em> episode dives deeply into edge orchestration, operating system realities, immutable infrastructure, and what actually breaks when container strategies hit scale.</p><p>If you are responsible for deploying or operating containerised applications in industrial or distributed environments, it is well worth your time.</p><p>&#128073; <strong><a href="https://www.youtube.com/watch?v=4FX-bhSeR0s">Watch the full </a></strong><em><strong><a href="https://www.youtube.com/watch?v=4FX-bhSeR0s">Problem Solvers: In the Trenches</a></strong></em><strong><a href="https://www.youtube.com/watch?v=4FX-bhSeR0s"> podcast</a></strong>, produced by Industry 4.0 Solutions and hosted by Walker Reynolds, to hear directly from Matt Parris on how the Industrial App Portal was shaped by real-world manufacturing needs.</p><p>Sometimes the most important product ideas do not come from roadmaps. 
They come from customers who have already lived the problem.</p>]]></content:encoded></item><item><title><![CDATA[Why “operator control plane” is becoming the missing layer in container operations]]></title><description><![CDATA[For a long time, container conversations have been framed around developers.]]></description><link>https://insights.portainer.io/p/why-operator-control-plane-is-becoming</link><guid isPermaLink="false">https://insights.portainer.io/p/why-operator-control-plane-is-becoming</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Thu, 08 Jan 2026 20:39:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For a long time, container conversations have been framed around developers. Developer experience, developer velocity, developer tooling. That framing made sense when containers were new, and Kubernetes&#8217; early adoption was driven by development teams seeking to break free from the limitations of legacy virtualization platforms.</p><p>That is no longer the dominant reality.</p><p>Today, a growing share of container environments is operated by small teams, often IT generalists or operations engineers, running a mix of Docker and Kubernetes across multiple locations. These environments are not greenfield. They are not cloud-native purity plays. They are not staffed with deep platform engineering teams. They are production systems that the business expects to work quietly, predictably, and without heroics.</p><p>This is where the idea of an operator control plane starts to matter.</p><h3>From &#8220;clusters&#8221; to &#8220;fleets&#8221;.</h3><p>The moment an organization has more than one container environment, the unit of management changes. It stops being &#8220;the cluster&#8221; and becomes &#8220;the fleet.&#8221;</p><p>That fleet might include on-premises Kubernetes, cloud Kubernetes, standalone Docker hosts, edge nodes, or air-gapped systems. Connectivity may be partial. Ownership may be split across teams. Some environments may be modern, others legacy, and most are business critical.</p><p>What operators are trying to answer in this world is not &#8220;how do I configure Kubernetes,&#8221; but questions like:</p><ol><li><p>How do we apply consistent access control everywhere?</p></li><li><p>How do we deploy applications safely without giving everyone cluster-admin?</p></li><li><p>How do we see what is running, who changed it, and whether it drifted?</p></li><li><p>How do we operate all of this without doubling the size of the team?</p></li></ol><p>Those are control plane questions, not orchestration questions.</p><h3>Kubernetes is not the control plane</h3><p>Kubernetes is very good at orchestrating containers inside a single environment. It is not very good at letting operators manage a fleet of clusters, and operators often find themselves managing many environments, especially when those environments differ in shape, connectivity, or maturity.</p><p>This is why so many organizations end up with a sprawl of tools. One for cluster access. One for GitOps. One for secrets. One for policy. One for visibility. 
Each tool solves a real problem, but together they create a new one: a disjointed operational experience, creating cracks in security, quality, performance, uptime, manageability, and operational overhead that grows faster than the business value being delivered.</p><p>The irony is that the more &#8220;cloud-native&#8221; the toolchain becomes, the more specialized the team required to run it. For organizations without that luxury, complexity is not a badge of sophistication. It is a liability.</p><h3>What an operator control plane actually does</h3><p>An operator control plane sits above individual container environments and focuses on how humans operate them at scale.</p><p>It treats Docker and Kubernetes as execution substrates, not as user interfaces. It centralizes access control, visibility, application delivery, and governance in a way that reflects how real teams work.</p><p>In practice, this means a few key things.</p><p>First, the control plane understands fleets. Operators manage groups of environments, not one cluster at a time. Policies, access rules, and deployment patterns apply consistently across that fleet.</p><p>Second, it is designed for delegation. Teams can deploy and manage what they are responsible for without being handed global administrative access. Guardrails are built into the workflow, not bolted on afterward.</p><p>Third, it reduces cognitive load. Operators should not need to remember which tool to use for which environment, or which bespoke process applies where. The control plane provides a consistent operational surface.</p><p>Finally, it acknowledges constraints. Edge, air-gapped, and intermittently connected environments are first-class citizens, not edge cases. The control plane works with reality rather than assuming perfect connectivity and unlimited staff.</p><h3>Why this matters to the business</h3><p>From a business perspective, the value of an operator control plane is not theoretical. It shows up in fewer outages caused by configuration drift, fewer security exceptions created just to get work done, and fewer bespoke processes that only one person understands.</p><p>Most importantly, it caps operational overhead. As the fleet grows, the team does not have to grow at the same rate. That is the difference between containers being an enabler and containers becoming a tax.</p><p>This is also why these conversations increasingly originate from IT leadership rather than engineering. The question being asked is not &#8220;what is the most powerful tool,&#8221; but &#8220;what is the most sustainable way to run this over the next five years.&#8221;</p><h3>Where Portainer fits</h3><p>This is the context in which Portainer.io makes sense.</p><p>Portainer was not built to replace Kubernetes, and it was not built to turn every operator into a platform engineer. It was built to act as an operator control plane across fleets of container environments, spanning Docker and Kubernetes, with a focus on visibility, access control, application delivery, and governance.</p><p>That is why it resonates most strongly with overloaded teams, distributed environments, and organizations that care about reducing operational overhead and ensuring configuration consistency, without inheriting an entire cloud-native toolchain on day one.</p><p>In other words, Portainer aligns with how containers are actually being operated today, not how the ecosystem wishes they were.</p><h3>The shift happening now</h3><p>The industry narrative is slowly catching up to this reality. 
Containers are no longer new. Kubernetes is no longer exotic. The hard part is operating them consistently at scale, with limited people, limited tolerance for risk, and a business that just wants things to work.</p><p>The shift toward operator control planes is not about dumbing things down. It is about recognizing that operational excellence is a requirement in its own right, and that fleets of container environments need a different abstraction layer than individual clusters.</p><p>That shift is already underway. The only question is whether organizations acknowledge it early, or discover it the hard way.</p>]]></content:encoded></item><item><title><![CDATA[GitOps Control Models, Part 2: Flux CD vs Portainer]]></title><description><![CDATA[A follow on to Part 1... ArgoCD and Portainer]]></description><link>https://insights.portainer.io/p/gitops-control-models-part-2-flux</link><guid isPermaLink="false">https://insights.portainer.io/p/gitops-control-models-part-2-flux</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Mon, 29 Dec 2025 05:25:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>(edit 30th Dec after a correction re hub+spoke)</p><p>In Kubernetes environments, GitOps typically means using Git as the source of truth for deployment configuration and relying on automation to apply and maintain that state in clusters. GitOps tools differ primarily in how desired state is represented, when it is enforced, and where enforcement logic runs.</p><p><strong><a href="https://substack.com/home/post/p-182368765">Part 1</a></strong> of this series examined the architectural differences between <strong>Argo CD</strong> and Portainer GitOps, focusing on continuous in-cluster reconciliation versus deployment-time enforcement.</p><p>This <strong>Part 2</strong> extends that comparison by introducing <strong>Flux CD</strong>, and examining how its cluster-native GitOps model compares to Portainer GitOps at the implementation level. The focus remains on reconciliation behavior, deployment and coordination models, and the resulting operational impact.</p><h3>How Flux CD implements and deploys GitOps</h3><p>Flux CD is a Kubernetes-native GitOps toolkit implemented as a set of Kubernetes controllers that continuously reconcile cluster resources against Git-defined desired state. These controllers may run per cluster, or centrally in a hub cluster when using hub and spoke deployments.</p><p>Like Argo CD, Flux compares declared resource definitions in Git with live cluster state and applies changes to bring the system back into alignment. This applies both to changes introduced through Git and to manual changes made directly in the cluster.</p><p>Where Flux differs is not in the reconciliation mechanism, but in how authority is expressed.</p><p>In Flux, reconciliation is always performed by Kubernetes controllers acting on declarative Git-defined state. Controllers may be distributed per cluster or centralized in a hub, but Flux does not introduce a persistent operational control plane, environment inventory, or imperative management layer outside Git and Kubernetes resources. 
Authentication, targeting, and coordination are still expressed through Kubernetes primitives and Git configuration rather than a centralized runtime authority.</p><p>Because reconciliation is continuous, Git remains authoritative at all times. Configuration drift is detected and corrected automatically by reconciliation loops, regardless of whether those loops are running inside each cluster or centrally as part of a hub and spoke deployment.</p><h3>Deployment and coordination model</h3><p>Flux supports two common deployment patterns, and the difference matters.</p><p>In the default per cluster model, each Kubernetes cluster runs its own Flux controllers, authenticates to Git, and continuously reconciles its own desired state. This is the pattern most people mean when they say Flux is cluster native: it keeps enforcement authority inside the cluster boundary.</p><p>Flux can also be deployed in a hub and spoke model. In this pattern, Flux controllers run centrally in a hub cluster and reconcile workloads into multiple spoke clusters by connecting to their Kubernetes APIs using kubeconfigs stored in the hub. Coordination can be scaled further using Flux Operator constructs such as ResourceSets, which generate per cluster Kustomizations and HelmReleases from shared definitions targeting multiple clusters.</p><p>Across both patterns, the defining characteristic remains unchanged. Flux is still a continuous reconciliation system where desired state is expressed declaratively in Git and enforced by Kubernetes controllers. Centralizing where the controllers run does not turn Flux into an operational control plane; it centralizes reconciliation execution while keeping the operating model Git first and controller driven.</p><h3>How Portainer GitOps implements and deploys GitOps</h3><p>Portainer implements GitOps by treating Git as the authoritative source for application deployment state and enforcing that state during deployment events.</p><p>Git change detection runs centrally within the Portainer management plane on an administrator-defined schedule. Managed clusters do not poll Git repositories and do not participate in change detection.</p><p>When a deployment or update event occurs, Portainer retrieves resource definitions from Git and applies them to the target environment through the Kubernetes API. Any divergence from Git-defined state present at that moment is overwritten as part of the deployment process. Between deployment events, Portainer does not observe or reconcile live cluster state.</p><p>Portainer GitOps does not require GitOps controllers to run inside managed clusters. GitOps functionality is provided as part of the existing Portainer deployment, and enabling GitOps does not increase the runtime footprint of managed clusters.</p><p>This model centralizes reconciliation logic and minimizes background activity within clusters.</p><h3>Reconciliation behavior and load placement</h3><p>Flux CD continuously reconciles live cluster state against Git. Controllers perform ongoing comparison and correction regardless of whether new Git changes occur. This provides immediate drift detection and enforcement, at the cost of constant reconciliation activity within every cluster.</p><p>Portainer GitOps reconciles application state only during deployment and update events. Git repositories are monitored centrally, and clusters incur no reconciliation overhead unless an update is applied. 
Enforcement is deterministic at deployment time rather than continuous.</p><p>The distinction is not whether Git can overwrite drift. Both tools can. The distinction is <strong>when enforcement happens and where the reconciliation work runs</strong>.</p><h3>Security and operational impact</h3><p>Flux treats Git as the primary source of operational authority, with reconciliation performed by Kubernetes controllers acting on declarative state. Depending on deployment model, Git credentials and reconciliation logic may reside per cluster or centrally in a hub, but enforcement remains expressed through Kubernetes-native resources rather than a persistent runtime control plane. Operational consistency across clusters is achieved through Git repository structure, conventions, and automation, not through centralized operational governance.</p><p>Portainer centralizes Git access, change detection, and enforcement logic. Managed clusters do not require Git credentials and are engaged only when updates are applied. This reduces cluster exposure and background activity, trading continuous enforcement for predictable, event-driven operations.</p><h3>When to choose Argo CD, Flux CD, or Portainer</h3><p>At a high level, the architectural differences between the three approaches can be summarized by where reconciliation runs and when enforcement occurs.</p><p><strong>Argo CD</strong> enforces desired state through continuous reconciliation loops running inside Kubernetes. Reconciliation logic operates continuously regardless of deployment activity, with cluster state observed and corrected in real time.</p><p><strong>Flux CD</strong> also enforces desired state through continuous reconciliation loops, but does so using Kubernetes-native controllers driven entirely by declarative Git state. Reconciliation may run per cluster or centrally via hub-and-spoke deployments, but Flux does not introduce a persistent operational control plane. Coordination across clusters is achieved through Git structure, Kubernetes resources, and automation rather than centralized runtime governance.</p><p><strong>Portainer GitOps</strong> enforces desired state deterministically during deployment events. Git change detection and reconciliation logic run centrally in the Portainer management plane, and clusters are involved only when updates are applied. Between deployments, cluster state is not continuously observed or corrected.</p><p>All three tools can overwrite configuration drift. The defining differences are when enforcement happens, where reconciliation work runs, and how much operational authority is centralized versus distributed.</p><p>Those differences define the architectural trade-offs between Argo CD, Flux CD, and Portainer GitOps.</p>]]></content:encoded></item><item><title><![CDATA[My 2025 market wrap...]]></title><description><![CDATA[2025 is all but done... 
whilst you enjoy your break, have a read of my market summary of the year wrapped..]]></description><link>https://insights.portainer.io/p/my-2025-market-wrap</link><guid isPermaLink="false">https://insights.portainer.io/p/my-2025-market-wrap</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Wed, 24 Dec 2025 02:14:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After a year of conversations with customers, community members, and operators across a wide range of environments, a few consistent themes kept surfacing.</p><p>The first is that home users and enterprises are no longer adjacent audiences with subtly different needs; they are fundamentally different users with different goals. Home users, including NAS users and those running home automation or personal services, want something deliberately simple and low-friction. As Portainer has evolved into a fleet-wide operations and governance platform, the gap has widened. Tools built for enterprise gravity inevitably become the wrong shape for personal or home use, and pretending otherwise helps no one.</p><p>At the same time, Kubernetes has crossed the adoption chasm. It is everywhere now, at least in name. Yet outside the early innovators and cloud natives, there is still a striking lack of understanding about what Kubernetes actually entails operationally. Many teams still treat it as a drop-in replacement for earlier platforms, underestimating the breadth of tooling required to run it reliably, securely, and with genuine high availability. Spinning up a cluster or two is the easy part. The operational envelope is where most organisations find themselves navigating unfamiliar territory.</p><p>Docker&#8217;s role in all of this continues to diminish. The engine still exists, but it now sits firmly in the category of legacy technology. The ecosystem has moved on, and the capability gap between Docker and Kubernetes is so wide that direct comparison no longer makes sense. For enterprises, Docker is no longer a viable platform choice; it is only a historical stepping stone.</p><p>Yet despite Kubernetes becoming mainstream, the cultural posture around it has not fully caught up. There remains a persistent strain of elitism, often from platform engineers who mistake complexity for capability. In trying to demonstrate technical sophistication, they push operational and cognitive load &#8220;left&#8221; onto internal users who neither want nor need it. The result is friction, shadow platforms, and a gradual erosion of trust. Platforms exist to absorb complexity, not to redistribute it.</p><p>This misunderstanding feeds directly into how Internal Developer Platforms are perceived. IDPs are still widely treated as a silver bullet, a way to compensate for a poorly designed Kubernetes foundation. When the underlying platform is fragile or over-engineered, wrapping it in portals and workflows does not fix the problem. It simply obscures it. Two wrongs do not make a right.</p><p>Meanwhile, the Kubernetes tooling ecosystem continues to fragment. Each new problem spawns a new tool, often excellent in isolation but rarely designed as part of a coherent whole. 
Planning a Kubernetes platform now means navigating an ever-expanding constellation of narrowly scoped solutions, each with its own lifecycle, pricing model, and operational overhead. Complexity and cost creep in quietly, one &#8220;small&#8221; addition at a time.</p><p>All of this is amplified by the state of the CNCF landscape itself. It keeps growing, but without meaningful consolidation, refinement, or deprecation. What was once a map is now closer to a catalogue, impressive in scale but increasingly unmanageable for anyone outside the inner circle. For newcomers and pragmatic operators, it is less a guide and more a source of hesitation.</p><p>Taken together, what we have seen in 2025 points to a single underlying theme. The industry is still optimising for technical possibility rather than operational reality. Until we design platforms around the people who actually have to run them, rather than those who enjoy building them, the gap between promise and practice will continue to widen. Let&#8217;s see if 2026 is the year this change occurs.</p>]]></content:encoded></item><item><title><![CDATA[Argo CD vs Portainer GitOps: An Implementation-Level Comparison]]></title><description><![CDATA[Part 1 of a 2-part blog]]></description><link>https://insights.portainer.io/p/argo-cd-vs-portainer-gitops-an-implementation</link><guid isPermaLink="false">https://insights.portainer.io/p/argo-cd-vs-portainer-gitops-an-implementation</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Mon, 22 Dec 2025 22:06:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>What GitOps means in Kubernetes</h3><p>In Kubernetes environments, GitOps typically means using Git as the source of truth for deployment configuration and relying on automation to apply and maintain that state in clusters. GitOps tools differ primarily in how desired state is represented, when it is enforced, and where enforcement logic runs.</p><p>This document compares <strong>Argo CD</strong> and <strong>Portainer GitOps</strong> at the implementation level, focusing on application definition, reconciliation behavior, deployment model, and operational impact.</p><h3>How Argo CD implements and deploys GitOps</h3><p>Argo CD is a standalone product installed directly into Kubernetes as a set of controllers and services that continuously reconcile cluster resources against Git-defined desired state.</p><p>Argo CD operates entirely inside Kubernetes. Its controllers monitor Git repositories and compare declared resource definitions with live cluster state. When differences are detected, Argo CD updates cluster resources to bring them back into alignment with Git. This applies to changes introduced through Git as well as manual changes made directly in the cluster.</p><p>Because reconciliation is continuous, Git remains authoritative at all times. Configuration drift is detected independently of deployment events and corrected automatically based on policy. Clusters participate directly in this process through ongoing watch and comparison activity, even when no Git changes occur.</p><h3>Deployment models</h3><p>For smaller environments, Argo CD is commonly installed into the same cluster it manages. 
In this model, reconciliation and enforcement happen locally within that cluster, and architectural complexity is minimal.</p><p>In larger environments, Argo CD is often deployed into a dedicated management cluster. The control plane runs centrally, while multiple workload clusters are managed remotely.</p><p>In security- or network-constrained environments, Argo CD supports an agent-based model in which a lightweight component runs inside each workload cluster and communicates with the central Argo instance. Reconciliation still occurs inside Kubernetes, but network access patterns and credential exposure are reduced.</p><p>Across all models, the defining characteristic is unchanged: Argo CD enforces Git-defined desired state continuously using in-cluster reconciliation.</p><h3>How Portainer GitOps implements and deploys GitOps</h3><p>Portainer implements GitOps by treating Git as the authoritative source for application deployment state and enforcing that state during deployment events.</p><p>Git change detection runs centrally within the Portainer management plane on an administrator-defined schedule. This cadence is configurable, allowing teams to control how frequently repositories are checked and to balance responsiveness against network and Git service load.</p><p>A key architectural distinction is that <strong>Git polling and change detection do not involve managed clusters</strong>. Clusters are engaged only when an update is applied.</p><p>When a deployment event occurs, Portainer takes the resource definitions from Git and submits them to the Kubernetes API to update the target environment. Any divergence from Git-defined state present at that moment is overwritten as part of the deployment process. Between deployment events, Portainer does not observe or reconcile live cluster state.</p><p>Portainer GitOps does not require a separate GitOps control plane or additional cluster-resident components. GitOps functionality is provided as part of the existing Portainer deployment. Enabling GitOps does not change the runtime footprint of managed clusters.</p><p>This model centralizes reconciliation logic and minimizes background activity on clusters.</p><h3>How applications are defined</h3><h3>Argo CD</h3><p>In Argo CD, applications are defined as Kubernetes custom resources called Applications.</p><p>An Application specifies the Git repository, path, revision, destination cluster, and namespace, along with synchronization behavior such as automatic updates, pruning, and drift handling. Because Applications are Kubernetes-native objects, they can be versioned, templated, and generated programmatically using Kubernetes APIs.</p><p>This makes application definition part of the Kubernetes control plane and enables large-scale, multi-cluster GitOps management with strong guarantees.</p><h3>Portainer GitOps</h3><p>In Portainer GitOps, applications are defined as deployment objects that reference Git repositories.</p><p>A deployment associates a repository with a stack or Kubernetes application and defines how repository contents are applied to the target environment. Portainer tracks the deployed revision and uses Git-defined resources as the authoritative input during redeployment.</p><p>There is no separate GitOps-specific abstraction. 
The deployment itself is the unit of management, keeping application definitions close to deployment artifacts such as manifests or compose files.</p><h3>Reconciliation behavior and load placement</h3><p>Argo CD continuously reconciles live cluster state against Git. Controllers running inside Kubernetes perform ongoing comparison and correction, regardless of whether new Git changes occur. This provides immediate drift detection and enforcement, at the cost of continuous cluster-side reconciliation activity.</p><p>Portainer GitOps reconciles application state only during deployment events. Git repositories are checked centrally, and clusters incur no reconciliation overhead unless an update is applied. Enforcement is deterministic at deployment time rather than continuous.</p><p>The distinction is not whether Git can overwrite drift. Both tools can. The distinction is <strong>when enforcement happens and where the work runs</strong>.</p><h3>Security and operational impact</h3><p>Argo CD provides strong enforcement and fine-grained control, but because it operates continuously inside clusters, misconfiguration can affect large portions of the platform. It assumes GitOps is platform infrastructure owned by Kubernetes specialists.</p><p>Portainer centralizes control and limits cluster-side activity. GitOps enforcement is predictable, and cluster exposure is reduced. This trades continuous enforcement for simpler operations and lower background load.</p><h3>When each tool fits</h3><p>Argo CD fits environments where Kubernetes is the primary platform and GitOps is treated as a continuously enforced control system, with clusters actively participating in reconciliation.</p><p>Portainer GitOps fits environments where GitOps is used to standardize deployments across diverse or constrained environments, with enforcement occurring at deployment time and reconciliation load centralized in the management plane.</p><h3>Final technical distinction</h3><p>Argo CD enforces desired state continuously through in-cluster reconciliation loops.</p><p>Portainer GitOps enforces desired state deterministically during deployment events, with Git change detection centralized and clusters involved only when updates are applied.</p><p>That difference in enforcement timing, deployment model, and load placement defines the architectural tradeoff between the two tools.</p><p>If you are interested in how Flux CD changes things, read part 2 <a href="https://substack.com/home/post/p-182368765">here</a></p>]]></content:encoded></item><item><title><![CDATA[Docker v29, and the fall-out..]]></title><description><![CDATA[When Docker Engine v29 landed, it raised the minimum supported API version to 1.44.]]></description><link>https://insights.portainer.io/p/docker-v29-and-the-fall-out</link><guid isPermaLink="false">https://insights.portainer.io/p/docker-v29-and-the-fall-out</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 16 Dec 2025 00:24:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When Docker Engine v29 landed, it raised the minimum supported API version to 1.44. 
That single change broke a massive chunk of the Docker ecosystem, including ours.</p><p>Older versions of Portainer (anything before 2.33.5) were hardcoded to use API versions up to 1.41. Docker 29 rejected those connections outright. Environments wouldn&#8217;t load, requests failed, and unless users were watching logs closely, it looked like everything just stopped working. We shipped fixes fast: 2.33.5 in the LTS track, 2.36.0 in STS. We added proper API negotiation and moved on. But not every project was in a position to do the same.</p><p><strong>Traefik</strong> broke in the same way. The Docker provider was pinned to API v1.24. Anyone running Traefik behind Docker 29 saw errors flood the logs. Fixed in v3.6.1 &#8212; after they added proper negotiation logic.</p><p><strong>CapRover</strong> had a similar issue. Its backend used docker-modem pegged at API v1.43. One version short. Docker 29 rejected it, and the dashboard refused to start. They patched it in v1.14.1.</p><p><strong>Watchtower</strong> didn&#8217;t make it. The official image still uses a Docker client pinned to API v1.25. Once Docker 29 hit, Watchtower broke. Users saw infinite loops of version errors. No patch. No update. It&#8217;s been unmaintained for over two years. Some moved to forks, others hacked around it with DOCKER_API_VERSION. But in terms of official support, it&#8217;s done.</p><p><strong>Dockge</strong> also broke. Lightweight Compose dashboard, still actively used, but no patch for Docker 29. Anyone running the latest engine found the backend refused to connect. Still broken as of now.</p><p><strong>CasaOS</strong>? Same story. Their App Store manager was compiled against Docker API v1.43. Just under the line. Once Docker 29 rolled out, it couldn&#8217;t talk to the daemon anymore. No apps, no dashboards. Community floated workarounds, but nothing&#8217;s landed upstream.</p><p><strong>Yacht</strong>, billed as a modern Portainer alternative, is likely broken too. It uses dockerode, the same client CapRover had to patch. But Yacht hasn&#8217;t been actively maintained since mid-2023. No fix has been posted. If it&#8217;s running against Docker 29, odds are it fails for the same reason: embedded client too old, no negotiation, hard stop.</p><p><strong>Swarmpit</strong> never had a chance. That project is already archived. It uses the Docker API heavily to monitor containers and services in Swarm. With the daemon now requiring API v1.44, anything compiled against older clients just fails. No updates are coming; the project&#8217;s been dead for a while. Docker 29 didn&#8217;t just break it; it buried it.</p><p><strong>LazyDocker</strong> was affected too. Its Go-based Docker client couldn&#8217;t negotiate high enough, so connections failed. But the maintainers responded; v0.24.2 shipped with API 1.52 support. It works again.</p><p><strong>Testcontainers (Java)</strong> broke hard in CI pipelines. Its docker-java dependency defaulted to API 1.32. Docker 29 rejected it. Everything failed. They fixed it in v2.0.2, but until then, people were scrambling to patch over it.</p><p><strong>JetBrains IDEs</strong>, same deal. Their Docker plugin started throwing 400s. The embedded client wasn&#8217;t compliant. Later builds fixed it, but for a while, users had to either downgrade Docker or override daemon settings to keep things functional.</p><p><strong>cAdvisor</strong>? Also broke. Older builds used outdated Docker clients. Metrics collection just stopped working. 
Fixed in v0.53.0, but anyone relying on older builds or embedded versions in other stacks saw their dashboards go blank.</p><p>The pattern here is obvious: Docker 29 didn&#8217;t just enforce a new API floor. It exposed which tools are actually maintained and which ones are just... still running because Docker used to be lenient.</p><p>That leniency is gone now.</p><p>Everything that assumed the daemon would always tolerate legacy clients is dead in the water. A bunch of &#8220;working&#8221; tools turned out to be unmaintained, unmonitored, or abandoned. Docker 29 forced them into the open.</p><p>It also made the separation clear. The tools that adapted (Traefik, Portainer, CapRover, LazyDocker, Testcontainers) survived because they&#8217;re maintained. The rest (Watchtower, Dockge, CasaOS, Yacht, Swarmpit) effectively got decommissioned.</p><p>This wasn&#8217;t just a version bump. It was a cleanup. If your tool talks to Docker and isn&#8217;t being updated, it no longer works. Simple as that.</p>]]></content:encoded></item><item><title><![CDATA[The Hidden Risk in Your Infrastructure: Why Enterprises Need a Secure Container Management Platform Now]]></title><description><![CDATA[There is a very real problem occurring in IT right now, and it is one that few leaders are prepared to confront.]]></description><link>https://insights.portainer.io/p/the-hidden-risk-in-your-infrastructure</link><guid isPermaLink="false">https://insights.portainer.io/p/the-hidden-risk-in-your-infrastructure</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 09 Dec 2025 00:36:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There is a very real problem occurring in IT right now, and it is one that few leaders are prepared to confront. Across almost every enterprise, containers are proliferating at an alarming rate, entirely outside of central oversight. Anyone with even a small amount of technical knowledge can, and often does, spin up their own Docker, Podman, or Kubernetes environment. Some do it for experimentation, others for development, and some even for what they call &#8220;productive business use.&#8221; Most of the time, that means data analytics.</p><p>On the surface, it seems harmless enough. A small test app here, a containerised script there. But underneath, it represents a systemic and growing risk that most CIOs and CISOs are completely blind to.</p><p>The problem begins with the simplest truth: none of these environments are secured beyond the defaults. Defaults are designed for convenience, not protection. A personal container setup is rarely hardened, rarely audited, and almost never monitored. Since it is usually created for personal use, the individual running it sees no reason to lock it down.</p><p>IT is generally unaware of their existence. These systems sit on developer workstations, virtual machines, or small &#8220;personal cloud&#8221; servers (sitting under a desk), often running under personal credentials. There are no tickets, no approvals, no network entries in the CMDB, and no monitoring agents installed. They simply do not exist in the official landscape of corporate assets.</p><p>Because of the way container networking works, these environments are nearly invisible. 
All of the network traffic appears to come from the host system&#8217;s IP address, which means it looks exactly like normal workstation traffic. The containers could be making outbound connections to anywhere, and to corporate network monitoring tools, it just appears as another laptop browsing the internet or querying APIs.</p><p>This invisibility creates the perfect storm for malicious exploitation. Typo-squatting of popular container images is rife on public registries. A developer can accidentally pull a malicious image whose name differs by only a single character from a legitimate one. Done well, the malicious image behaves exactly as expected, running the intended application while silently harvesting credentials or data in the background.</p><p>Trend Micro documented one of the most notorious examples of this, involving two containers named &#8220;alpine&#8221; and &#8220;alpine2.&#8221; The second, masquerading as a legitimate base image, contained a cryptominer and was propagated by a companion image that scanned the internet for open Docker daemons to infect. Aqua Security later found similar impersonations of official language images, such as &#8220;openjdk&#8221; and &#8220;golang,&#8221; that also embedded miners. And in an academic study by Virginia Tech and the University of Delaware, researchers demonstrated at scale how easily typo-squatted image names are downloaded and executed by mistake. Small spelling errors, it turns out, can open very large doors.</p><p>In environments without zero-trust networking, these containers operate with near unrestricted access to the enterprise network. From the container&#8217;s perspective, everything reachable by the host is reachable by it as well. That means databases, internal APIs, and management interfaces may all be within easy reach of unvetted, unmanaged software.</p><p>It takes only a few keystrokes for a developer to deploy a reverse tunnel container such as inlets (https://inlets.dev/) or even a Cloudflare reverse proxy, exposing an internal-only service to the public internet. In many cases this is done innocently, simply to share a proof of concept or a demo with a colleague or a client. Yet the outcome is the same: an internal service, now exposed, without authentication or inspection.</p><p>And if any of those containers are compromised, the leap from container to host is almost instantaneous. A weakly configured host, such as one running with an overly permissive socket or mount, can be compromised in seconds. From there, the attacker inherits the same credentials, network access, and privileges as the user who owns the host.</p><p>All of this happens in complete obscurity. The CISO is unaware, the CIO is unaware, and the IT operations team is often only focused on the officially sanctioned container platforms. Ironically, those sanctioned environments are often the very reason shadow platforms exist. When governance rules are too strict, or the user experience too painful, people take matters into their own hands. They install Docker, Podman, or Kubernetes locally and build what they need to get their work done. The result is a parallel, unmanaged container ecosystem living quietly inside the enterprise.</p><h3>Why this keeps happening</h3><p>This is not a problem that can be solved through prohibition. You cannot simply ban non-IT-sanctioned deployments of Docker, Podman, or Kubernetes. Doing so will not eliminate the behaviour; it will just push it further underground. 
The only sustainable path forward is one that combines visibility, control, and cultural understanding.</p><p>And this is where the real issue lies. These &#8220;shadow&#8221; environments don&#8217;t exist because people are reckless. They exist because the centrally provided container environments are often not fit for purpose. The user experience is designed not by and for the end users, but by the central engineering team. The UX is often too cumbersome and technically complex, the onboarding process too slow, or the guardrails too restrictive. Developers and analysts want to get work done, not submit tickets and wait days for an environment that still doesn&#8217;t meet their needs. This is further compounded by the near unlimited examples of &#8220;docker run&#8221; or &#8220;docker compose&#8221; commands that come up on any Google search for how to run a modern app.</p><p>To change that dynamic, IT must deliver a central platform that people want to use, something that feels flexible and intuitive while still being safe. You absolutely want a policy that prohibits the uncontrolled deployment of container platforms, but that policy only works if the sanctioned alternative actually meets user expectations. Either it needs to offer a secure, fully managed play-space for experimentation, or it needs to make decentralised deployments possible within clearly defined, baseline-secure parameters.</p><p>This is precisely where Portainer helps. It gives IT the tools to deliver both. Its user interface is designed for ease of use, providing a self-service experience that developers enjoy, while still allowing IT to define guardrails and enforce policy. And because Portainer can centrally manage decentralised Docker, Podman, or Kubernetes runtimes, it provides the visibility and control needed to bring shadow environments back under management, without killing the flexibility that made them appealing in the first place.</p><p>The hard fact is this&#8230; shadow container environments are not a technical failure; they are a usability failure. The fix is not to clamp down harder, but to offer something better. Portainer bridges that divide, allowing IT to empower users with freedom and speed, while ensuring the business remains secure, visible, and in control.</p>]]></content:encoded></item><item><title><![CDATA[7 Steps to create value from Kubernetes, fast..]]></title><description><![CDATA[I&#8217;ve advised countless organizations 6-9 months into their Kubernetes projects.]]></description><link>https://insights.portainer.io/p/7-steps-to-create-value-from-kubernetes</link><guid isPermaLink="false">https://insights.portainer.io/p/7-steps-to-create-value-from-kubernetes</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 09 Dec 2025 00:33:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve advised countless organizations 6-9 months into their Kubernetes projects. It&#8217;s stalled, complicated &amp; not delivering value. Here are 7 steps I&#8217;d take if they brought me in at the start. <br><br>Most large-scale modernisation efforts stall. <br><br>Here are a few common reasons. 
<br><br>+ They treated Kubernetes like &#8216;1 project&#8217;<br>+ The scope was too ambitious <br>+ No proof of early value <br>+ Skill curve shock <br>+ Tooling overload<br><br>The smartest teams don&#8217;t go big first. They start small, prove value fast, and build a rhythm that lasts.<br><br>Here&#8217;s what that looks like in practice&#8230; <br><br>1. Start Small<br>Focus on one problem, one environment. Big, abstract initiatives fail because no one can see progress. Starting small lets you prove value fast and build momentum. Choose something important enough to matter but contained enough to control. <br><br>2. Solve Real Problems<br>Abstract goals sound strategic, but they don&#8217;t drive change. Real problems do. Pick something your team actually complains about, such as a deployment bottleneck, poor visibility, or manual patching. When you fix a pain that people feel, adoption follows naturally.<br><br>3. Use the Team You Already Have<br>External help can be useful, but outsourcing early wins kills internal capability. The team that&#8217;ll run the system should own the pilot from day one; their learning becomes your foundation for scale.<br><br>4. Don&#8217;t Wait for Perfect Infrastructure<br>Perfect conditions never come. Waiting for ideal tooling or a full redesign only delays progress. Start where you are, with the clusters, networks, and workflows already in place. Improvement inside constraints builds confidence, and confidence builds momentum.<br><br>5. Prove Value Quickly<br>Long programs lose momentum. Quick wins earn trust. Set a 90-day goal with measurable outcomes that matter to both engineers and leadership.<br><br>6. Document and Replicate What Works<br>Without documentation, every success becomes a one-off. Record how you configured access, networking, and monitoring. Turn that into a repeatable pattern before you scale.<br><br>7. Build Rhythm, Not Revolution<br>Modernisation isn&#8217;t a one-off project. It&#8217;s a rhythm. Prove, document, improve, repeat until it becomes how you operate, not what you chase.<br><br>That&#8217;s how transformation sticks.<br><br>The incremental value compounds and leads to organisational momentum. <br><br>Have you done a smaller &#8216;problem solving&#8217; Kubernetes project? <br><br>How did it go? <br><br>Neil <br><br>If you want to see what this looks like in the field, I unpack it in Kubernetes Without Illusions: stories from real teams who&#8217;ve done it, scars and all. 
Available in my bio.</p>]]></content:encoded></item><item><title><![CDATA[Dunno to Done!!]]></title><description><![CDATA[Most Kubernetes business cases begin with two straightforward questions.]]></description><link>https://insights.portainer.io/p/dunno-to-done</link><guid isPermaLink="false">https://insights.portainer.io/p/dunno-to-done</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Tue, 09 Dec 2025 00:31:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most Kubernetes business cases begin with two straightforward questions.<br> &#8220;How much will it cost?&#8221; and &#8220;How long will it take?&#8221;<br><br>The problem is, the answers are almost never simple.<br> &#8220;How much will it cost?&#8221; - &#8220;Dunno, we&#8217;ll tell you when we&#8217;re done building.&#8221;<br> &#8220;How long will it take?&#8221; - &#8220;Dunno, we&#8217;ll tell you when we&#8217;re done building.&#8221;<br><br>When the economy&#8217;s strong, that kind of uncertainty gets a free pass. Risk feels manageable, optimism rules the room, and &#8220;we&#8217;ll figure it out&#8221; sounds like innovation. But when things tighten, what matters most isn&#8217;t potential upside, it&#8217;s cost certainty.<br><br>That&#8217;s exactly where Portainer changes the game. It gets your project live faster, on a predictable timeline, with a fixed, transparent licensing model that&#8217;s easy to understand and capped for enterprise use. No mystery costs, no endless build cycles, no surprises.</p>]]></content:encoded></item><item><title><![CDATA[Thinking of adopting VMware’s vSphere Kubernetes Service? Here is why many organisations regret it, and what they should consider instead.]]></title><description><![CDATA[Every technology decision carries consequences that only become visible once the work begins.]]></description><link>https://insights.portainer.io/p/thinking-of-adopting-vmwares-vsphere</link><guid isPermaLink="false">https://insights.portainer.io/p/thinking-of-adopting-vmwares-vsphere</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Mon, 08 Dec 2025 22:20:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every technology decision carries consequences that only become visible once the work begins. This is especially true for teams considering VMware&#8217;s vSphere Kubernetes Service (formerly known as Tanzu). On paper it appears to align neatly with existing VMware estates. In practice the experience often feels far heavier than expected.</p><p>The platform aims to bring Kubernetes into the VMware ecosystem at enterprise scale. That focus is also the root of its friction. vSphere, NSX, vSAN and the Kubernetes Supervisor are tightly bound together, which means even simple deployments demand careful planning across networking, storage and security. 
Lean IT teams discover quickly that the platform assumes deep VMware expertise, and any gaps in that knowledge create delays and extra cost.</p><p>The Broadcom-era packaging changes have amplified these issues. vSphere Kubernetes Service is no longer a standalone option. It is now embedded inside the VMware Cloud Foundation (version 9) subscription and is tied to NSX as the networking layer and vSAN as the storage foundation for supported production clusters. Organisations that once used more flexible designs now find themselves pushed into a prescriptive architecture that increases both the footprint and the licensing bill.</p><p>Operationally, the load continues to grow. Running Kubernetes through the vSphere Supervisor adds abstractions, lifecycle components and dependencies that must all be maintained. NSX requires constant attention, vSAN adds its own tuning requirements and the cluster class model introduces a steep learning curve. The platform is capable, although it asks for more time, more training and more specialised engineering than many organisations possess.</p><p>This creates a sense of lock-in that only becomes clear once the platform is in place. Adopting vSphere Kubernetes Service means adopting the full VMware stack across compute, storage, networking and lifecycle operations. Replacing any part of that stack becomes difficult, and the long-term cost of the ecosystem rises sharply. Smaller organisations feel this most because they must carry the overhead without the scale that usually justifies it.</p><p>This is why many teams pause their plans and look for a modern path that delivers Kubernetes without the weight. Portainer, with its integrated support for Talos / Kubernetes, offers that path. Portainer provides an intuitive control plane that makes Kubernetes clear and manageable without diluting capability. Talos delivers an immutable, API-driven operating system designed for Kubernetes, which removes configuration drift, reduces maintenance effort and eliminates entire categories of operational risk.</p><p>Together, they give organisations a platform that is easy to deploy, predictable to operate and far lighter to run. They avoid prescriptive storage and networking choices. They reduce the number of tools an engineer must learn. They support clusters in the data centre, at the edge or in the cloud with the same consistent experience. Most importantly, they align with how real IT teams work, rather than expecting those teams to mimic hyperscale engineering practices.</p><p>The question is not whether vSphere Kubernetes Service is powerful. The question is whether the cost, the operational friction and the architectural lock-in fit the organisation you have today. For many teams the answer is no. They need something faster to adopt, easier to live with and more aligned with their constraints.</p><p>Portainer with Talos delivers that. It offers a clear, capable and future-ready Kubernetes foundation without the burden that has caused so many organisations to step back from the VMware path.</p>]]></content:encoded></item><item><title><![CDATA[Considering Nutanix Kubernetes Platform? 
Here is what real users discover, and why many choose a different path...]]></title><description><![CDATA[Nutanix Kubernetes Platform often appeals to organisations already invested in the Nutanix ecosystem.]]></description><link>https://insights.portainer.io/p/considering-nutanix-kubernetes-platform</link><guid isPermaLink="false">https://insights.portainer.io/p/considering-nutanix-kubernetes-platform</guid><dc:creator><![CDATA[Neil, CEO of Portainer.io]]></dc:creator><pubDate>Mon, 08 Dec 2025 22:19:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6pM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafaa643d-f399-4bac-89df-6ce67bb5f618_3024x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Nutanix Kubernetes Platform often appeals to organisations already invested in the Nutanix ecosystem. The promise is straightforward. Extend the familiar Nutanix stack into Kubernetes, keep a single vendor relationship, and gain a unified hybrid or edge story. It sounds compelling. Yet once teams begin deploying NKP, the reality tends to diverge sharply from the expectations set at the planning stage.</p><p>NKP positions itself as an integrated extension of the Nutanix platform. In practice it behaves like a full Kubernetes distribution assembled from upstream CNCF components. Users consistently report that the installer handles the initial deployment, although from that point onward the operational responsibility shifts directly back to the customer. Upgrades, configuration, tuning, monitoring, GitOps pipelines, observability tooling, and all day two operations fall squarely on the platform engineering team. Nutanix supplies the cluster. You own everything above it.</p><p>This becomes immediately visible for teams without deep Kubernetes expertise. They discover that NKP does not abstract Kubernetes. It exposes it. ArgoCD, Prometheus, Grafana, FluentBit, EFK and other bundled tools are installed, but they are not fully lifecycle managed in the way customers assume. Nutanix curates a version and can upgrade the base packages, but the responsibility for architecture, alerting, dashboards, retention policies, secrets, GitOps promotion patterns, CNI decisions and operational performance belongs entirely to the customer. Once you customise any component, you are effectively outside the tested upgrade path.</p><p>Deployment experiences reflect this. Teams with strong Kubernetes skills can navigate the rough edges. Teams without them stall. Installation and bootstrap may appear scripted, although day two operations demand attention to cluster drift, component compatibility, custom resource interactions and the behaviours of the CNCF ecosystem. Several users reported that they dramatically underestimated the engineering time required to keep NKP stable and performant.</p><p>Documentation and support amplify this challenge. Nutanix has a strong reputation for its core HCI stack, but NKP is newer and still maturing. Users report documentation gaps, scattered guidance and a support experience that expects a baseline Kubernetes knowledge they may not have. The result is predictable. Troubleshooting expands. Time-to-resolution stretches. The platform demands more engineering energy than the buyer planned for.</p><p>Resource and management overhead round out the picture. NKP inherits the footprint of a full Kubernetes platform, with all the monitoring, tuning and lifecycle work that follows. 
Some early adopters noted performance inconsistencies or behaviours that reflect a product still finding its rhythm. Others highlighted the surprise of discovering how much operational weight followed them after installation.</p><p>The pattern is consistent. NKP may integrate with Nutanix, although it does not remove the complexity of Kubernetes or the maintenance overhead of the CNCF ecosystem. It introduces them. For organisations that chose Nutanix because it reduced operational burden elsewhere, this is an unexpected and often unwelcome shift.</p><p>This is why many teams step back and explore alternatives that deliver Kubernetes without dragging a full open-source toolchain into the daily workflow. Portainer and Talos offer that alternative. Portainer provides an intuitive control plane that shrinks the operational surface rather than expanding it. Talos delivers an immutable, API-driven operating system that removes host-level maintenance and configuration drift entirely.</p><p>Together they create a Kubernetes platform that is clear, consistent and manageable for lean IT teams. They allow organisations to modernise without inheriting the toolchain sprawl and deep-skills dependency that shadow NKP deployments.</p><p>NKP promises integration. Portainer with Talos delivers operational clarity. For many organisations, that difference determines whether their Kubernetes adoption succeeds or stalls.</p>]]></content:encoded></item></channel></rss>