Oprah Winfrey's video podcast, Book Club, and Favorite Things are headed to Amazon, according to reports from The New York Times and Variety. Starting in July, The Oprah Podcast will get new episodes twice per week, instead of once, debuting across Amazon Prime Video, Amazon Music, Audible, and Fire TV channels. The show will still […]
China has ordered Meta to unwind its multibillion-dollar Manus acquisition, dealing a potential setback to Zuckerberg’s push into AI agents.
Within hours of an armed gunman's attempt to enter the White House Correspondents' Dinner, attended by top administration officials and hundreds of journalists, President Donald Trump did what he does best: use the assassination attempt to defend his ballroom project. During a White House press conference just hours after he and several cabinet members were […]
The phone could go into mass production in 2028, an analyst says.
The American technology giant provides water and energy monitoring and utility meters to hundreds of millions of homes and businesses.
This burgeoning wearable tech category lets you talk to an AI assistant, listen to music, or check out a display screen right from the comfort of your very own face.
Scored 65.2% vs. Google's official 47.8% and the existing top closed-source model Junie CLI's 64.3%.
Since there have been a lot of reports of deliberate cheating on TerminalBench 2.0 lately (https://debugml.github.io/cheating-agents/), I would also like to clarify a few things:
1. Absolutely no {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever.
2. The CLI agent was run in a leaderboard-compliant way (no modification of resources or timeouts).
3. The full TerminalBench run was done using the fully open-source version of the agent; there is no difference between what is on GitHub and what was run.
I was originally going to wait for it to land on the leaderboard, but it has been 8 days and the maintainers have unfortunately not responded (there is a large backlog of pull requests on their HF), so I decided to post anyway.
HF PR: https://huggingface.co/datasets/harborframework/terminal-ben...
It is astounding how much the harness matters, based on this and other experiments I have done.
Comments URL: https://news.ycombinator.com/item?id=47920787
Points: 42
# Comments: 13
The smart lighting company is having a busy month. After releasing its first solar-powered lights, a cordless table lamp, and an updated LED light wall over the past few weeks, Govee has announced a new multicolor ceiling light. Available starting today through the company's online store and Amazon for $249.99, the Govee Ceiling Light Ultra […]
The wide foldable phone that Samsung is reportedly developing is expected to arrive later this year, and now we may have some idea of what it will look like. Leaker and journalist Sonny Dickson has shared images online of what he says are dummy units of Samsung's upcoming Galaxy foldables, including the "Z Fold 8 […]