Cocoon Just Went Live: Decentralized, Privacy-First AI Inference for Developers
Updated on November 30, 2025
Cocoon decentralized AI inference network visualization
Cocoon is now live. The name stands for Confidential Compute Open Network: a TON-based network that connects apps to third-party GPUs for AI inference, with a heavy focus on privacy.
If you are building AI features and you care about cost and user data, Cocoon is worth a look. This post covers what it is, what it offers, and where to start if you want to read the code and understand how it works.
Whether you’re building AI agent systems or working with no-code AI workflows, compute becomes a real constraint once you have real usage. Privacy can be a constraint too.
What is Cocoon? The Decentralized AI Marketplace
Cocoon is a decentralized AI computing network built on The Open Network (TON) blockchain. Telegram positions it as a marketplace for GPU computing power, so developers can buy inference without running their own fleet.
→ Cocoon Official Website

The pitch is simple: instead of sending requests to a centralized provider, you can run models inside trusted execution environments. The goal is that the GPU operator can run the job, but cannot read your inputs or outputs.
In the Cocoon ecosystem:
- App developers plug into low-cost AI compute.
- GPU owners mine TON by powering the network.
- Users enjoy AI services with full privacy and confidentiality.
Why Developers Should Choose Cocoon
Cocoon is meant to be used from apps and backends. You send inference requests through the network and pay providers in TON.
Here is what that means in practice:
1. Maximum Privacy and Confidentiality
Hardware providers on the network process requests inside confidential virtual machines. Those environments are tied to image verification and smart contracts, so apps can check what is running.
User data is encrypted. The intent is that the GPU provider running the workload cannot access or extract the underlying data.
If your app handles sensitive text, documents, or user messages, that isolation matters. It is also a different trust model from typical cloud inference.
2. Low-Cost, Dynamic Compute Access
Inference gets expensive fast. With Cocoon, you buy compute on a marketplace where the price is set dynamically per request.
As an app developer, you pay GPU providers in TON for inference. Payments run on the TON blockchain.
This model can be helpful if you are a solo builder and want to test an idea without setting up infrastructure first.
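To make the marketplace idea concrete, here is a minimal sketch of how a client might pick a provider from a list of per-request quotes. The `Quote` structure, provider names, and prices are illustrative assumptions; Cocoon's actual API and pricing mechanics may differ.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    """A hypothetical per-request price quote from one GPU provider."""
    provider: str
    price_ton_per_request: float
    available: bool

def pick_provider(quotes: list[Quote]) -> Quote:
    """Return the cheapest provider currently accepting requests."""
    candidates = [q for q in quotes if q.available]
    if not candidates:
        raise RuntimeError("no providers available")
    return min(candidates, key=lambda q: q.price_ton_per_request)

quotes = [
    Quote("gpu-a", 0.0009, True),
    Quote("gpu-b", 0.0007, True),
    Quote("gpu-c", 0.0005, False),  # offline, so it is skipped
]
best = pick_provider(quotes)
print(best.provider)  # gpu-b
```

The point of the sketch is only the shape of the decision: price discovery happens per request, so the cheapest available node can win each time.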
3. Built for Decentralized Scale
As usage grows, requests can be spread across many providers instead of a single cluster. That is the main scaling idea here.
It is similar in spirit to Massively Decomposed Agentic Processes (MDAPs): break work up, then run it across many nodes.
Getting Started: The Cocoon GitHub Repo
If you want the details, start with the official repository. Even if you do not plan to run a worker, reading through the build scripts and docs will give you a clear picture of how Cocoon is put together today.
→ Cocoon GitHub Repository

The repository, TelegramMessenger/cocoon on GitHub, is licensed under Apache-2.0 and is mostly C++, CMake, and Python. It includes instructions for building and verifying the worker distribution from source.
If you care about reproducible builds and want to verify the confidential VM images, the repo includes scripts to rebuild the worker distribution from source. You do not need to do this just to follow along, but it is useful if you want to validate what is running.
Reproducible Build Instructions (Source Verification)
To reproduce the worker distribution from source, you can use the following scripts contained in the repository:
```shell
# 1. Build the VM image (reproducible)
./scripts/build-image prod

# 2. Generate the distribution
./scripts/prepare-worker-dist ../cocoon-worker-dist

# 3. Verify the TDX image matches the published release
cd ../cocoon-worker-dist
sha256sum images/prod/{OVMF.fd,image.vmlinuz,image.initrd,image.cmdline}
# Compare with the published checksums
```
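The final `sha256sum` comparison can also be scripted. A minimal Python sketch, assuming you saved the published checksums in a `sha256sum`-style file (one `<hex digest>  <filename>` pair per line); the filenames and paths here are illustrative:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 to avoid loading it all into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(published: Path, image_dir: Path) -> bool:
    """Compare local images against a sha256sum-style checksum file.

    Each line of `published` is expected to be '<hex digest>  <filename>'.
    Returns True only if every file matches its published digest.
    """
    ok = True
    for line in published.read_text().splitlines():
        digest, name = line.split()
        matches = sha256_of(image_dir / name) == digest
        ok &= matches
        print(f"{name}: {'OK' if matches else 'MISMATCH'}")
    return ok
```

This is just a convenience wrapper around the same check the shell one-liner performs; `sha256sum -c` gives you the equivalent behavior without any Python.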
You can also generate model images in a similar way; the output filename includes the model hash and commit:
```shell
# 1. Generate a model tar file; the full filename includes the hash and commit
./scripts/build-model Qwen/Qwen3-0.6B
# Compare with the published model name
```
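A model tar can be checked the same way. This sketch assumes, purely for illustration, that the filename ends in a hex SHA-256 digest (`<model>-<sha256>.tar`); the real Cocoon naming layout, and where the commit appears, may differ, so check the repo's `build-model` script for the authoritative format.

```python
import hashlib
import re
from pathlib import Path

# Assumed naming convention for illustration only: "<model>-<sha256 hex>.tar".
NAME_RE = re.compile(r"-([0-9a-f]{64})\.tar$")

def verify_model_tar(path: Path) -> bool:
    """Check a model tar against the digest embedded in its (assumed) filename."""
    m = NAME_RE.search(path.name)
    if not m:
        raise ValueError(f"no sha256 digest found in filename: {path.name}")
    expected = m.group(1)
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == expected
```

Embedding the digest in the artifact name means anyone who downloads the tar can verify it without fetching a separate checksum file.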
If you like working from the terminal, tools like Warp’s AI Agent can help when you are running Docker and shell scripts.
Upcoming Developer Tools
The team also mentions more integration tooling on the way, including:
- A streamlined Docker-based solution for deploying your own client instance.
- A lightweight client library that will let mobile and desktop apps plug directly into Cocoon.
Where Cocoon Fits
Cocoon is one more option if you want inference without sending raw user data to a centralized provider. If the confidential VM approach holds up in practice, it can be a reasonable middle ground between managed cloud inference and fully self-hosted.
When choosing infrastructure for your AI projects, consider how Cocoon compares to other approaches:
| Aspect | Cocoon | Centralized Cloud (AWS, GCP) | Self-Hosted |
|---|---|---|---|
| Privacy | Full encryption, confidential VMs | Provider has access | Full control |
| Cost Model | Dynamic marketplace pricing | Fixed pricing tiers | Hardware + maintenance |
| Scalability | Decentralized, spread across providers | Managed scaling | Manual scaling |
| Setup effort | Moderate (API integration) | Low (managed services) | High (infrastructure) |
If you are evaluating different AI agent frameworks, Cocoon is one possible place to run inference without handing plaintext user data to a provider.
If you want a simple mental model, think of the GPU provider like a courier carrying a locked box. They can deliver it and prove they handled it, but they cannot open it and read what is inside.
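The analogy can be made concrete with a toy example. This sketch uses a stdlib one-time pad purely to illustrate the trust model; real confidential VMs rely on hardware attestation and authenticated encryption, not anything like this:

```python
import hashlib
import secrets

def lock(plaintext: bytes, key: bytes) -> bytes:
    """One-time-pad 'locked box': XOR with a key the courier never sees."""
    assert len(key) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

def unlock(ciphertext: bytes, key: bytes) -> bytes:
    return lock(ciphertext, key)  # XOR is its own inverse

message = b"user prompt: summarize my medical notes"
key = secrets.token_bytes(len(message))

box = lock(message, key)
# The courier (GPU provider) can prove it handled the box...
receipt = hashlib.sha256(box).hexdigest()
# ...but without the key, the ciphertext tells it nothing useful.
assert unlock(box, key) == message
```

The receipt stands in for attestation: the provider can demonstrate it processed exactly this box, while the contents stay opaque to it.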
Launching an AI tool? Here is a list of AI directories you can submit to, plus a quick basic SEO guide for getting the post-launch stuff right.