SIM (YC X25) Is Hiring the Best Engineers in San Francisco
Article URL: https://www.ycombinator.com/companies/sim/jobs/Rj8TVRM-software-engineer-platform
Comments URL: https://news.ycombinator.com/item?id=47128740
Points: 0
# Comments: 0
In mid-February, the Department of Justice lost its head antitrust enforcer - just weeks before it was scheduled to argue one of the year's biggest anti-monopoly cases in court. Antitrust Division chief Gail Slater announced her departure suddenly, via a post on her personal X account. But to those who follow the agency closely, it […]
The team that sold its last app, Dark Sky, to Apple is back with Acme Weather, which offers alternative forecasts, rainbow and sunset alerts, and more.
Anthropic claims DeepSeek and two other Chinese AI companies misused its Claude AI model in an attempt to improve their own products. In an announcement on Monday, Anthropic says the "industrial-scale campaigns" involved the creation of around 24,000 fraudulent accounts and more than 16 million exchanges with Claude, as reported earlier by The Wall Street […]
"I Want to Wash My Car. The Car Wash Is 50 Meters Away. Should I Walk or Drive?" This question has been making the rounds as a simple AI logic test so I wanted to see how it holds up across a broad set of models. Ran 53 models (leading open-source, open-weight, proprietary) with no system prompt, forced choice between drive and walk, with a reasoning field.
On a single run, only 11 out of 53 got it right (42 said walk). But a single run doesn't prove much, so I reran every model 10 times. Same prompt, no cache, clean slate.
The results got worse. Of the 11 that passed the single run, only 5 could do it consistently. GPT-5 managed 7/10. GPT-5.1, GPT-5.2, Claude Sonnet 4.5, and every Llama and Mistral model scored 0/10.
People kept saying humans would fail this too, so I got a human baseline through Rapidata (10k people, same forced choice): 71.5% said drive. Most models perform below that.
All reasoning traces (ran via Opper, my startup), full model breakdown, human baseline data, and raw JSON files are in the writeup for anyone who wants to dig in or run their own analysis.
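For readers who want to redo the tallies, here is a minimal sketch of the scoring arithmetic described above. It is not the code from the writeup. The function names are hypothetical, "consistent" is assumed to mean 10 out of 10 correct, and the order of answers in the sample lists is made up. Only the counts (7/10, 0/10, 11 of 53, 71.5%) come from the post.

```python
# Share of human respondents who answered "drive" (71.5%, from the post).
HUMAN_BASELINE = 0.715


def pass_rate(answers, correct="drive"):
    """Return the fraction of runs in which the model picked the correct option."""
    if not answers:
        raise ValueError("need at least one run")
    return sum(1 for a in answers if a == correct) / len(answers)


def classify(answers, baseline=HUMAN_BASELINE):
    """Summarize one model's repeated runs.

    Assumption: "consistent" means every run was correct.
    """
    rate = pass_rate(answers)
    return {
        "pass_rate": rate,
        "consistent": rate == 1.0,
        "beats_humans": rate > baseline,
    }


# Illustrative inputs matching counts quoted in the post (answer order is assumed).
seven_of_ten = ["drive"] * 7 + ["walk"] * 3   # e.g. the 7/10 result
zero_of_ten = ["walk"] * 10                   # e.g. the 0/10 results

# Single-run share: 11 of 53 models answered "drive".
single_run_share = 11 / 53

if __name__ == "__main__":
    print(classify(seven_of_ten))
    print(classify(zero_of_ten))
    print(f"single-run pass share: {single_run_share:.1%}")
```

Under these assumptions, a 7/10 model lands just below the 71.5% human baseline, which is why the comparison depends on repeated runs and not on a single answer.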
Comments URL: https://news.ycombinator.com/item?id=47128138
Points: 54
# Comments: 43
The last few years of Xbox have been expensive. Under Phil Spencer's leadership, Microsoft has spent billions of dollars in an attempt to build an ambitious future for gaming that looks a lot like Netflix. And while its subscription service Game Pass started out as a good deal for gamers (although now not so much), […]
Article URL: https://forums.atariage.com/topic/380883-unix99-a-unix-like-os-for-the-ti-994a/
Comments URL: https://news.ycombinator.com/item?id=47127986
Points: 54
# Comments: 7
Anthropic accuses DeepSeek, Moonshot, and MiniMax of using 24,000 fake accounts to distill Claude’s AI capabilities, as U.S. officials debate export controls aimed at slowing China’s AI progress.
Uber Autonomous Solutions will see the company taking on all the tasks associated with operating a robotaxi, self-driving truck, or sidewalk delivery robot business.
This episode of Uncanny Valley covers the people resigning from AI companies and the humans getting hired by AI agents. Plus, we attend a soiree thrown by a conservative women's magazine.