TechBrief - Latest Technology News

A daily reference for news summaries and short analyses from reputable sources.

Latest news

Meta is actually keeping its VR metaverse running, for now

Meta is reversing its plans to shut down its VR metaverse - sort of. On Monday, the company announced that it would be shutting down the VR version of its 3D social platform Horizon Worlds on June 15th in favor of a new focus on the mobile version of the app. But in a Wednesday […]

DoorDash launches a new ‘Tasks’ app that pays couriers to submit videos to train AI

Delivery couriers will be able to earn money by completing activities like filming everyday tasks or recording themselves speaking in another language.

ChatGPT’s ‘Adult Mode’ Could Spark a New Era of Intimate Surveillance

OpenAI plans to allow sexting with ChatGPT. A human-AI interaction expert warns of a privacy nightmare.

Launch HN: Canary (YC W26) – AI QA that understands your code

Hey HN! We're Aakash and Viswesh, and we're building Canary (https://www.runcanary.ai). We build AI agents that read your codebase, figure out what a pull request actually changed, and generate and execute tests for every affected user workflow.

Aakash and I previously built AI coding tools at Windsurf, Cognition, and Google. AI tools were making every team faster at shipping, but nobody was testing real user behavior before merge. PRs got bigger, reviews still happened in file diffs, and changes that looked clean broke checkout, auth, and billing in production. We saw it firsthand. We started Canary to close that gap. Here's how it works:

Canary starts by connecting to your codebase and working out how your app is built: routes, controllers, validation logic. When you push a PR, Canary reads the diff, infers the intent behind the changes, then generates and runs tests against your preview app, checking real user flows end to end. It comments directly on the PR with test results and recordings, showing what changed and flagging anything that doesn't behave as expected. You can also trigger specific user workflow tests via a PR comment.
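As a rough illustration of the "read the diff, find the affected workflows" step, here is a minimal static sketch. It is not Canary's method (the post describes AI agents doing this); the path patterns, the workflow names, and the `WORKFLOW_MAP` table are all hypothetical, and a plain glob lookup stands in for any real code understanding.

```python
from fnmatch import fnmatch

# Hypothetical map from source-path patterns to the user workflows they affect.
WORKFLOW_MAP = {
    "app/routes/checkout/*": ["checkout"],
    "app/controllers/auth_*": ["login", "signup"],
    "app/billing/*": ["checkout", "invoice"],
}


def changed_files(diff_text):
    """Return the paths touched by a unified diff, read from its '+++ b/' lines."""
    prefix = "+++ b/"
    return [
        line[len(prefix):].strip()
        for line in diff_text.splitlines()
        if line.startswith(prefix)
    ]


def affected_workflows(diff_text, workflow_map=WORKFLOW_MAP):
    """Return the sorted workflows whose path pattern matches a changed file."""
    hits = set()
    for path in changed_files(diff_text):
        for pattern, workflows in workflow_map.items():
            if fnmatch(path, pattern):
                hits.update(workflows)
    return sorted(hits)


# Invented sample diff, for illustration only.
SAMPLE_DIFF = """\
diff --git a/app/billing/invoice.py b/app/billing/invoice.py
--- a/app/billing/invoice.py
+++ b/app/billing/invoice.py
@@ -1,3 +1,3 @@
-total = subtotal
+total = subtotal + tax
"""

print(affected_workflows(SAMPLE_DIFF))
```

A lookup table like this only catches direct path matches; second-order effects, such as a shared helper that changes checkout behavior, need actual analysis of the code.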

Beyond PR testing, tests generated from the PR can be moved into regression suites. You can also create tests by just prompting what you want tested in plain English. Canary generates a full test suite from your codebase, schedules it, and runs it continuously. One of our construction tech customers had an invoicing flow where the amount due drifted from the original proposal total by ~$1,600. Canary caught the regression in their invoice flow before release.
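The invoicing example is the kind of invariant a regression test can pin down. The sketch below assumes a simple rule, that the amount due equals the proposal total minus payments received; the function names and the dollar figures are hypothetical, chosen only to reproduce a drift of about $1,600 like the one mentioned above.

```python
from decimal import Decimal


def expected_amount_due(proposal_total, payments):
    """Assumed rule: amount due is the proposal total minus payments received."""
    return proposal_total - sum(payments, Decimal("0"))


def check_invoice(proposal_total, invoice_amount_due, payments,
                  tolerance=Decimal("0.01")):
    """Return (ok, drift), where drift is how far the invoice is from the rule."""
    drift = abs(invoice_amount_due - expected_amount_due(proposal_total, payments))
    return drift <= tolerance, drift


# Hypothetical figures: a 10,000.00 proposal, no payments, invoice shows 8,400.00.
ok, drift = check_invoice(Decimal("10000.00"), Decimal("8400.00"), [])
print(ok, drift)
```

`Decimal` is used instead of `float` so that currency comparisons are not affected by binary rounding.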

This isn't something a single family of foundation models can do on its own. QA spans too many modalities (source code, DOM/ARIA, device emulators, visual verification, screen-recording analysis, network/console logs, live browser state, and so on) for any single model to specialize in all of them. You also need custom browser fleets, user sessions, ephemeral environments, on-device farms, and data seeding to run the tests reliably. On top of that, catching second-order effects of code changes requires a specialized harness that breaks the application in multiple ways, across different types of users, that a normal happy-path testing flow wouldn't.

To measure how well our purpose-built QA agent works, we published QA-Bench v0, the first benchmark for code verification. Given a real PR, can an AI model identify every affected user workflow and produce relevant tests? We tested our purpose-built QA agent against GPT 5.4, Claude Code (Opus 4.6), and Sonnet 4.6 across 35 real PRs on Grafana, Mattermost, Cal.com, and Apache Superset on three dimensions: Relevance, Coverage, and Coherence. Coverage is where the gap was largest: Canary leads by 11 points over GPT 5.4, 18 over Claude Code, and 26 over Sonnet 4.6. For full methodology and per-repo breakdowns, give our benchmark report a read: https://www.runcanary.ai/blog/qa-bench-v0
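The post does not define how Coverage is scored; the linked report is the authority on that. One plausible reading, shown here purely as an assumption, is the share of affected user workflows that end up with at least one generated test. The workflow names are invented.

```python
def coverage(affected_workflows, tested_workflows):
    """Fraction of affected workflows that have at least one test (assumed definition)."""
    affected = set(affected_workflows)
    if not affected:
        return 1.0  # nothing to cover, so nothing was missed
    return len(affected & set(tested_workflows)) / len(affected)


# Hypothetical PR: four workflows affected, tests written for three workflows,
# one of which ("profile") was not actually affected by the change.
score = coverage(
    ["login", "checkout", "invoice", "signup"],
    ["login", "checkout", "profile"],
)
print(score)
```

Under this reading, tests for unaffected workflows do not raise the score; a separate dimension such as Relevance would have to account for them.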

You can check out the product demo here: https://youtu.be/NeD9g1do_BU

We'd love feedback from anyone working on code verification or thinking about how to measure this differently.


Comments URL: https://news.ycombinator.com/item?id=47441629 (3 points, 0 comments)

After 25 years, Valve reworks Counter-Strike's reload system

Full-magazine reloads throw out muscle memory in favor of "higher stakes" decisions.

Show HN: Three new Kitten TTS models – smallest less than 25MB

Kitten TTS (https://github.com/KittenML/KittenTTS) is an open-source series of tiny and expressive text-to-speech models for on-device applications. We had a thread last year here: https://news.ycombinator.com/item?id=44807868.

Today we're releasing three new models with 80M, 40M and 14M parameters.

The largest model (80M) has the highest quality. The 14M variant reaches new SOTA in expressivity among similarly sized models, despite being <25MB in size. This release is a major upgrade from the previous one and supports English text-to-speech applications in eight voices: four male and four female.

Here's a short demo: https://www.youtube.com/watch?v=ge3u5qblqZA.

Most models are quantized to int8 + fp16, and they use ONNX for runtime. Our models are designed to run anywhere, e.g. Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release aims to bridge the gap between on-device and cloud models for TTS applications. A multilingual model release is coming soon.
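A back-of-the-envelope check shows how the parameter counts and quantization relate to the quoted file size. This sketch counts weight bytes only, takes 1 MB as 10^6 bytes, and ignores ONNX graph and metadata overhead; the exact int8/fp16 split per model is not given in the post, so the two dtypes are shown as bounds rather than as the actual layout.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"int8": 1, "fp16": 2, "fp32": 4}


def weight_size_mb(num_params, dtype):
    """Approximate weight storage in MB (10^6 bytes), excluding file overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


for params in (14_000_000, 40_000_000, 80_000_000):
    sizes = {dtype: weight_size_mb(params, dtype) for dtype in ("int8", "fp16")}
    print(f"{params // 1_000_000}M params: "
          f"int8 ~{sizes['int8']:.0f} MB, fp16 ~{sizes['fp16']:.0f} MB")
```

For the 14M model this gives roughly 14 MB at pure int8 and 28 MB at pure fp16, so a file under 25MB is consistent with most weights being stored as int8.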

On-device AI is bottlenecked by one thing: a lack of tiny models that actually perform. Our goal is to open-source more models to run production-ready voice agents and apps entirely on-device.

We would love your feedback!


Comments URL: https://news.ycombinator.com/item?id=47441546 (26 points, 3 comments)

Lina Khan was right

In 2021, the virtual world was the future of the internet. The pandemic had sequestered everyone indoors, heightening the appeal of digital communities. Facebook rebranded to Meta - a sign of the tech giant's investment in and commitment to the metaverse as the future of the internet. Despite losing billions in VR, Meta released an […]

Waymo hits 170 million miles while avoiding serious mayhem

Waymo says its autonomous vehicles have now traveled over 170 million miles while continuing to avoid serious crashes and injuries at a rate much better than human drivers. The company updated its online safety hub to reflect the new driving figures. But despite these successes, some safety advocates are raising questions about how the company […]

Harlowe has a cheaper solution for lighting 360-degree shoots

Companies like Bushman already sell omnidirectional camera lights for 360-degree shoots, but they start at over $300, which is more than half the price of popular 360-degree cameras from companies like DJI and Insta360. Harlowe, a company known for lighting accessories that cater to influencers and amateurs, has released a $95 alternative called the Omni […]

Study pinpoints when bow and arrow came to North America

Radiocarbon results suggest a single origin and rapid diffusion through cultural transition networks.

Categories

General: gadgets, software, security, AI, startups