TechBrief: The Most Up-to-Date Technology News


A daily source for news summaries and short analyses from reputable sources.

Latest News

SIM (YC X25) Is Hiring the Best Engineers in San Francisco

Article URL: https://www.ycombinator.com/companies/sim/jobs/Rj8TVRM-software-engineer-platform

Comments URL: https://news.ycombinator.com/item?id=47128740

Points: 0

# Comments: 0

Will Trump’s DOJ actually take on Ticketmaster?

In mid-February, the Department of Justice lost its head antitrust enforcer, just weeks before it was scheduled to argue one of the year's biggest anti-monopoly cases in court. Antitrust Division chief Gail Slater announced her departure suddenly, via a post on her personal X account. But to those who follow the agency closely, it […]

Ex-Apple team launches Acme Weather, a new take on weather forecasting

The team that sold its last app, Dark Sky, to Apple is back with Acme Weather, which offers alternative forecasts, rainbow and sunset alerts, and more.

Anthropic accuses DeepSeek and other Chinese firms of using Claude to train their AI

Anthropic claims DeepSeek and two other Chinese AI companies misused its Claude AI model in an attempt to improve their own products. In an announcement on Monday, Anthropic says the "industrial-scale campaigns" involved the creation of around 24,000 fraudulent accounts and more than 16 million exchanges with Claude, as reported earlier by The Wall Street […]

"Car Wash" test with 53 models

"I Want to Wash My Car. The Car Wash Is 50 Meters Away. Should I Walk or Drive?" This question has been making the rounds as a simple AI logic test so I wanted to see how it holds up across a broad set of models. Ran 53 models (leading open-source, open-weight, proprietary) with no system prompt, forced choice between drive and walk, with a reasoning field.

On a single run, only 11 out of 53 got it right (42 said walk). But a single run doesn't prove much, so I reran every model 10 times. Same prompt, no cache, clean slate.

The results got worse. Of the 11 that passed the single run, only 5 could do it consistently. GPT-5 managed 7/10. GPT-5.1, GPT-5.2, Claude Sonnet 4.5, every Llama and Mistral model scored 0/10 across all 10 runs.
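
The repeated-run scoring described above could be sketched roughly like this. It is a minimal illustration, not the author's actual harness: `score_model`, the two stand-in model functions, and the rule that "consistent" means every run was correct are all assumptions, since the post does not state its exact threshold.

```python
from collections import Counter
from typing import Callable, Dict

# The post counts "drive" as the right answer (the 42 models that said "walk" failed).
CORRECT = "drive"


def score_model(ask: Callable[[], str], runs: int = 10) -> Dict[str, object]:
    """Call a model `runs` times and tally its forced-choice answers."""
    answers = Counter(ask().strip().lower() for _ in range(runs))
    passes = answers[CORRECT]
    return {
        "passes": passes,
        "runs": runs,
        # Assumption: "consistent" means every single run was correct.
        "consistent": passes == runs,
    }


# Hypothetical stand-ins for real model API calls.
def always_walk() -> str:
    return "walk"


def always_drive() -> str:
    return "drive"


if __name__ == "__main__":
    models = {"model_a": always_walk, "model_b": always_drive}
    for name, ask in models.items():
        result = score_model(ask)
        print(f"{name}: {result['passes']}/{result['runs']} consistent={result['consistent']}")
```

Swapping the stand-ins for real API calls (fresh request each time, no cache) would give the per-model pass counts that the post reports as figures like 7/10 or 0/10.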

People kept saying humans would fail this too, so I got a human baseline through Rapidata (10k people, same forced choice): 71.5% said drive. Most models perform below that.

All reasoning traces (ran via Opper, my startup), full model breakdown, human baseline data, and raw JSON files are in the writeup for anyone who wants to dig in or run their own analysis.


Comments URL: https://news.ycombinator.com/item?id=47128138

Points: 54

# Comments: 43

Billions of dollars later and still nobody knows what an Xbox is

The last few years of Xbox have been expensive. Under Phil Spencer's leadership, Microsoft has spent billions of dollars in an attempt to build an ambitious future for gaming that looks a lot like Netflix. And while its subscription service Game Pass started out as a good deal for gamers (although now not so much), […]

UNIX99, a UNIX-like OS for the TI-99/4A

Article URL: https://forums.atariage.com/topic/380883-unix99-a-unix-like-os-for-the-ti-994a/

Comments URL: https://news.ycombinator.com/item?id=47127986

Points: 54

# Comments: 7

Anthropic accuses Chinese AI labs of mining Claude as US debates AI chip exports

Anthropic accuses DeepSeek, Moonshot, and MiniMax of using 24,000 fake accounts to distill Claude’s AI capabilities, as U.S. officials debate export controls aimed at slowing China’s AI progress.

Uber’s new autonomous vehicle division is about survival and opportunity

Uber Autonomous Solutions will see the company taking on all the tasks associated with operating a robotaxi, self-driving truck, or sidewalk delivery robot business.

Uncanny Valley: AI Researchers’ Resignations, Bots Hiring Humans, Evie Magazine’s Party

This episode of Uncanny Valley covers the people resigning from AI companies and the humans getting hired by AI agents. Plus, we attend a soiree thrown by a conservative women's magazine.

Categories

General: gadgets, software, security, AI, startups