I built OpenSwarm because I wanted an autonomous “AI dev team” that can actually plug into my real workflow instead of running toy tasks.
OpenSwarm orchestrates multiple Claude Code CLI instances as agents to work on real Linear issues. It:
• pulls issues from Linear and runs a Worker/Reviewer/Test/Documenter pipeline
• uses LanceDB + multilingual-e5 embeddings for long‑term memory and context reuse
• builds a simple code knowledge graph for impact analysis
• exposes everything through a Discord bot (status, dispatch, scheduling, logs)
• can auto‑iterate on existing PRs and monitor long‑running jobs
Right now it’s powering my own solo dev workflow (trading infra, LLM tools, other projects). It’s still early, so there are rough edges and a lot of TODOs around safety, scaling, and better task decomposition.
I’d love feedback on:
• what feels missing for this to be useful to other teams
• failure modes you’d be worried about in autonomous code agents
• ideas for better memory/knowledge graph use in real‑world repos
Repo: https://github.com/Intrect-io/OpenSwarm
Happy to answer questions and hear brutal feedback.
Comments URL: https://news.ycombinator.com/item?id=47160980
Points: 8
# Comments: 0
Salesforce reported a solid year-end earnings and then pulled out all the stops to ward off more talk of the death of its business to AI.
I've been building ZSE (Z Server Engine) for the past few weeks — an open-source LLM inference engine focused on two things nobody has fully solved together: memory efficiency and fast cold starts.
The problem I was trying to solve:
Running a 32B model normally requires ~64 GB VRAM. Most developers don't have that. And even when quantization helps with memory, cold starts with bitsandbytes NF4 take 2+ minutes on first load and 45–120 seconds on warm restarts — which kills serverless and autoscaling use cases.
What ZSE does differently:
Fits 32B in 19.3 GB VRAM (70% reduction vs FP16) — runs on a single A100-40GB
Fits 7B in 5.2 GB VRAM (63% reduction) — runs on consumer GPUs
Native .zse pre-quantized format with memory-mapped weights: 3.9s cold start for 7B, 21.4s for 32B — vs 45s and 120s with bitsandbytes, ~30s for vLLM
All benchmarks verified on Modal A100-80GB (Feb 2026)
It ships with:
OpenAI-compatible API server (drop-in replacement)
Interactive CLI (zse serve, zse chat, zse convert, zse hardware)
Web dashboard with real-time GPU monitoring
Continuous batching (3.45× throughput)
GGUF support via llama.cpp
CPU fallback — works without a GPU
Rate limiting, audit logging, API key auth
Install:
-----
pip install zllm-zse
zse serve Qwen/Qwen2.5-7B-Instruct
For fast cold starts (one-time conversion):
-----
zse convert Qwen/Qwen2.5-Coder-7B-Instruct -o qwen-7b.zse
zse serve qwen-7b.zse # 3.9s every time
The cold start improvement comes from the .zse format storing pre-quantized weights as memory-mapped safetensors — no quantization step at load time, no weight conversion, just mmap + GPU transfer. On NVMe SSDs this gets under 4 seconds for 7B. On spinning HDDs it'll be slower.
All code is real — no mock implementations. Built at Zyora Labs. Apache 2.0.
Happy to answer questions about the quantization approach, the .zse format design, or the memory efficiency techniques.
Comments URL: https://news.ycombinator.com/item?id=47160526
Points: 20
# Comments: 1
Gushwork has raised $9 million in a seed round led by SIG and Lightspeed. The startup has seen early customer traction from AI search tools like ChatGPT.
Seattle-based Vercept developed complex agentic tools, including a computer-use agent that could complete tasks inside applications like a person with a laptop would.
The Drop store, which was acquired by gaming gear giant Corsair in 2023, was a haven for mechanical keyboard enthusiasts and audiophiles to discover and buy hard-to-find gear - sometimes at surprisingly good prices. The company will cease sales after March 25th at 11:59PM PT, which is also the cut off to redeem Drop Rewards. […]
New York Attorney General Letitia James is suing Valve for "illegally promoting gambling" through the loot box systems it has built for video games like Counter-Strike 2, Team Fortress 2, and Dota 2, according to a press release. The attorney general seeks to "permanently stop Valve from promoting gambling features in its games, disgorge all […]
"The demand for tokens in the world has gone completely exponential," Nvidia CEO Jensen Huang said about the company's earnings.