TechBrief: Latest Technology News

A daily digest of news summaries and short analyses from reputable sources.

Latest News

Show HN: Robust LLM Extractor for Websites in TypeScript

We've been building data pipelines that scrape websites and extract structured data for a while now. If you've done this, you know the drill: you write CSS selectors, the site changes its layout, everything breaks at 2am, and you spend your morning rewriting parsers.

LLMs seemed like the obvious fix — just throw the HTML at GPT and ask for JSON. Except in practice, it's more painful than that:

- Raw HTML is full of nav bars, footers, and tracking junk that eats your token budget. A typical product page is 80% noise.
- LLMs return malformed JSON more often than you'd expect, especially with nested arrays and complex schemas. One bad bracket and your pipeline crashes.
- Relative URLs, markdown-escaped links, tracking parameters: the "small" URL issues compound fast when you're processing thousands of pages.
- You end up writing the same boilerplate: HTML cleanup → markdown conversion → LLM call → JSON parsing → error recovery → schema validation. Over and over.
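To make the URL point above concrete, here is a minimal sketch of that kind of cleanup, using only the standard `URL` class. The `cleanUrl` helper, the `TRACKING_PARAMS` set, and the choice of which parameters count as tracking are all assumptions made for illustration; none of this is the Lightfeed Extractor API.

```typescript
// Hypothetical helper for illustration only; not part of @lightfeed/extractor.
// Assumed list of tracking parameters (besides the "utm_" prefix).
const TRACKING_PARAMS = new Set<string>(["gclid", "fbclid", "ref"]);

function cleanUrl(href: string, baseUrl: string): string | null {
  // Undo markdown escaping such as "\_" or "\(" that can leak into links.
  const unescaped = href.replace(/\\([_()\[\]*~])/g, "$1");

  let url: URL;
  try {
    // Resolves relative paths against the page URL.
    url = new URL(unescaped, baseUrl);
  } catch {
    return null; // href or base could not be parsed as a URL
  }

  // Collect keys first, then delete, so we never mutate while iterating.
  const toDelete: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      toDelete.push(key);
    }
  });
  for (const key of toDelete) {
    url.searchParams.delete(key);
  }
  return url.toString();
}
```

For example, `cleanUrl("/products/blue\\_shirt?utm_source=x&id=7", "https://example.com/shop/")` resolves the path against the base, removes the escaping backslash, and drops `utm_source` while keeping `id`.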

We got tired of rebuilding this stack for every project, so we extracted it into a library.

Lightfeed Extractor is a TypeScript library that handles the full pipeline from raw HTML to validated, structured data:

- Converts HTML to LLM-ready markdown with main content extraction (strips nav, headers, footers), optional image inclusion, and URL cleaning
- Works with any LangChain-compatible LLM (OpenAI, Gemini, Claude, Ollama, etc.)
- Uses Zod schemas for type-safe extraction with real validation
- Recovers partial data from malformed LLM output instead of failing entirely: if 19 out of 20 products parsed correctly, you get those 19
- Built-in browser automation via Playwright (local, serverless, or remote) with anti-bot patches
- Pairs with our browser agent (@lightfeed/browser-agent) for AI-driven page navigation before extraction
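The partial-recovery idea in the list above can be sketched without any dependencies. This is not how the library implements it: the post says it uses Zod schemas, whereas the hand-written `isProduct` guard, the `Product` shape, and `recoverValid` below are hypothetical stand-ins. The sketch also covers only the case where the JSON parses but some items fail validation; repairing broken brackets is not shown.

```typescript
// Hypothetical example shape; the real schema would be whatever you define.
interface Product {
  name: string;
  price: number;
}

// Hand-written type guard standing in for a schema check.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price)
  );
}

// Keep the items that validate and count the ones that do not,
// instead of rejecting the whole array because of one bad entry.
function recoverValid(rawJson: string): { items: Product[]; rejected: number } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return { items: [], rejected: 0 }; // unparseable output: nothing to recover here
  }
  if (!Array.isArray(parsed)) return { items: [], rejected: 0 };

  const items: Product[] = parsed.filter(isProduct);
  return { items, rejected: parsed.length - items.length };
}
```

Validating per item rather than per response is what lets one bad product cost you one row instead of the whole page.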

We use this ourselves in production at Lightfeed, and it's been solid enough that we decided to open-source it.

GitHub: https://github.com/lightfeed/extractor

npm: npm install @lightfeed/extractor

Apache 2.0 licensed.

Happy to answer questions or hear feedback.


Comments URL: https://news.ycombinator.com/item?id=47526486

Points: 6

# Comments: 0

The least surprising chapter of the Manus story is what’s happening right now

Did anyone think there would not be a reckoning over this tie-up?

Intel and LG Display may have beaten Apple and Qualcomm with the best laptop battery life ever

One of the coolest laptops we saw at CES in January was the new Dell XPS 16, with a unique 1-120Hz variable refresh rate display that can sip power when you don't need the screen to stay speedy. Just how little power might it consume? Notebookcheck has tested a version of the laptop with that […]

False claims in a widely-cited paper

Article URL: https://statmodeling.stat.columbia.edu/2026/03/24/false-claims-in-a-published-no-corrections-no-consequences-welcome-to-the-business-school/

Comments URL: https://news.ycombinator.com/item?id=47525378

Points: 190

# Comments: 63

Mercor competitor Deccan AI raises $25M, sources experts from India

Deccan AI concentrates its workforce in India to manage quality in a fast-growing but fragmented AI training market.

Delve did the security compliance on LiteLLM, an AI project hit by malware

LiteLLM maintains an open source AI project, used by millions, that was infected by credential-harvesting malware.

Woman who never stopped updating her lost dog's chip reunites with him after 11y

Article URL: https://www.cbc.ca/radio/asithappens/11-year-dog-reunion-9.7140780

Comments URL: https://news.ycombinator.com/item?id=47524719

Points: 121

# Comments: 69

Show HN: A plain-text cognitive architecture for Claude Code

Article URL: https://lab.puga.com.br/cog/

Comments URL: https://news.ycombinator.com/item?id=47524704

Points: 52

# Comments: 19

"Disregard That" Attacks

Article URL: https://calpaterson.com/disregard.html

Comments URL: https://news.ycombinator.com/item?id=47524519

Points: 34

# Comments: 15

The best deals we’ve found from Amazon’s Big Spring Sale (so far)

Amazon loves to manufacture an event. March is historically a dry spell for deals; however, with Amazon’s third annual Big Spring Sale, which starts today and runs through March 31st, the retail behemoth is hoping to lure in would-be shoppers with the promise of steep(ish) savings and discounts on more seasonal, spring-centric items to hold […]

Categories

General: gadgets, software, security, AI, startups