Introducing Uninews: A Universal News Scraper in Rust

The internet is overflowing with news, but extracting clean, readable content from articles is tedious. Whether you’re aggregating news for personal reading, research, or AI training, you will want to automate it. Enter Uninews, a lightweight Rust-based news scraper that extracts an article’s main content and converts it into Markdown.

Uninews on crates.io

Uninews repo on github.com

What is Uninews?

Uninews is a universal news scraper that downloads an article from a given URL, cleans up the HTML, and formats the content into Markdown using OpenAI’s GPT-4o via CloudLLM. The final output is a structured JSON response containing:

  • Title of the article
  • Markdown-formatted content
  • Featured image URL

When used as a command-line tool, Uninews simply outputs the extracted Markdown, making it easy to read or integrate into your workflow.
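To make the shape of that output concrete, here is a std-only sketch of a struct holding the three fields, plus the error string that the library example further down reads as `post.error`. The struct name `ScrapedPost`, the field name `featured_image_url`, the `to_json` helper, and the JSON keys are assumptions for illustration, not the crate's documented schema; a real project would more likely derive `serde::Serialize`.

```rust
/// Hypothetical mirror of the scraper's result; names are assumptions.
#[derive(Debug, Default)]
struct ScrapedPost {
    title: String,
    content: String,            // Markdown body
    featured_image_url: String, // empty if no image was found
    error: String,              // empty on success
}

/// Escapes a string and wraps it in quotes for use as a JSON string value.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl ScrapedPost {
    /// Serializes the post as a flat JSON object.
    fn to_json(&self) -> String {
        format!(
            "{{\"title\":{},\"content\":{},\"featured_image_url\":{},\"error\":{}}}",
            json_escape(&self.title),
            json_escape(&self.content),
            json_escape(&self.featured_image_url),
            json_escape(&self.error)
        )
    }
}

fn main() {
    let post = ScrapedPost {
        title: "Example headline".to_string(),
        content: "# Example headline\n\nFirst paragraph.".to_string(),
        featured_image_url: "https://example.com/image.jpg".to_string(),
        ..Default::default()
    };
    println!("{}", post.to_json());
}
```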

Key Features

Smart Content Extraction: Targets the <article> tag to get the main content, falling back to <body> when no <article> tag is present.

Clean Markdown Conversion: Uses GPT-4o (via CloudLLM) to generate clean, structured Markdown from raw HTML.

Reusable Rust Library: The universal_scrape function can be integrated into any Rust project.

Multilingual Support: Specify a language for the output, defaulting to English.
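The article-then-body fallback and the English default can be sketched with plain string search. This is a deliberately naive illustration, not the crate's code: `inner_html`, `extract_main_html`, and `build_prompt` are hypothetical names, the search is case-sensitive and ignores nested or malformed tags, and the prompt wording is an assumption. A real implementation would more likely use an HTML parser such as the `scraper` crate.

```rust
/// Returns the inner HTML of the first `<tag ...>...</tag>` pair, if any.
/// Deliberately naive: case-sensitive, and nested tags of the same name
/// are not handled.
fn inner_html<'a>(html: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let mut search_from = 0;
    while let Some(rel) = html[search_from..].find(&open) {
        let after_name = search_from + rel + open.len();
        // Only accept `<article>` or `<article ...>`, not e.g. `<articles>`.
        match html[after_name..].chars().next() {
            Some(c) if c == '>' || c.is_whitespace() => {
                let gt = after_name + html[after_name..].find('>')?;
                let end = gt + 1 + html[gt + 1..].find(&close)?;
                return Some(&html[gt + 1..end]);
            }
            _ => search_from = after_name,
        }
    }
    None
}

/// Prefers <article>, falls back to <body>, then to the whole document.
fn extract_main_html(html: &str) -> &str {
    inner_html(html, "article")
        .or_else(|| inner_html(html, "body"))
        .unwrap_or(html)
}

/// Hypothetical prompt builder; the crate's actual prompt is not shown here.
/// An empty or missing language falls back to English.
fn build_prompt(html: &str, language: Option<&str>) -> String {
    let language = language
        .filter(|l| !l.trim().is_empty())
        .unwrap_or("english");
    format!(
        "Convert the following HTML article into clean Markdown, written in {}:\n\n{}",
        language,
        extract_main_html(html)
    )
}

fn main() {
    let page = "<html><body><nav>Menu</nav><article><h1>Headline</h1><p>Story text.</p></article></body></html>";
    println!("{}", build_prompt(page, None));
    println!("{}", build_prompt(page, Some("spanish")));
}
```

Trimming the page down to the article before sending it to the model keeps navigation, footers, and ads out of the prompt, which shortens the input and gives the model less unrelated text to convert.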

Installation

You need Rust and Cargo installed to get started.

Install via Cargo

cargo install uninews

Or Build from Source

git clone https://github.com/gubatron/uninews.git
cd uninews
make build
make install

Running Uninews

Before running Uninews, set your OpenAI API key:

export OPEN_AI_SECRET=sk-xxxxxxxxxxxxxxxxxxxxxxxxxx

Then, scrape a news article:

uninews https://example.com/news-article

You can also specify the output language:

uninews -l spanish https://example.com/news-article
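If you drive Uninews from your own Rust tooling, failing early with a clear message when the key is missing is friendlier than a failed API call. A minimal std-only sketch, assuming only that the key lives in the `OPEN_AI_SECRET` environment variable exported above; the helpers `check_api_key` and `require_api_key` are hypothetical.

```rust
use std::env;

/// Hypothetical check: returns the key or a human-readable error.
/// Never print the key itself.
fn check_api_key(value: Option<String>) -> Result<String, String> {
    match value {
        Some(key) if !key.trim().is_empty() => Ok(key),
        Some(_) => Err("OPEN_AI_SECRET is set but empty".to_string()),
        None => Err("OPEN_AI_SECRET is not set; export it before running uninews".to_string()),
    }
}

/// Reads the variable from the environment; `env::var` fails if the
/// variable is missing or not valid Unicode.
fn require_api_key() -> Result<String, String> {
    check_api_key(env::var("OPEN_AI_SECRET").ok())
}

fn main() {
    match require_api_key() {
        Ok(_) => println!("OPEN_AI_SECRET found"),
        Err(msg) => eprintln!("{}", msg),
    }
}
```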

Command-line Options

Usage: uninews [OPTIONS] <URL>

Arguments:
  <URL>  The URL of the news article to scrape

Options:
  -l, --language <LANGUAGE>  Output language (default: English)
  -h, --help                 Print help
  -V, --version              Print version
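The usage text above maps onto a small argument parser. The crate may well use a library such as `clap` for this; the hand-rolled std-only version below is a hypothetical sketch that mirrors only the documented behavior (one positional `<URL>` and an optional `-l`/`--language` value with a lowercase `english` default, matching the library example) and leaves out `-h` and `-V`. `CliArgs` and `parse_args` are illustrative names.

```rust
use std::env;

/// Parsed command line, mirroring `uninews [OPTIONS] <URL>`.
#[derive(Debug, PartialEq)]
struct CliArgs {
    url: String,
    language: String,
}

/// Parses the arguments that follow the program name.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut url: Option<String> = None;
    let mut language = String::from("english"); // documented default
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg == "-l" || arg == "--language" {
            // The option consumes the next argument as its value.
            language = iter
                .next()
                .ok_or_else(|| format!("{} requires a value", arg))?;
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {}", arg));
        } else if url.is_none() {
            url = Some(arg);
        } else {
            return Err(format!("unexpected extra argument: {}", arg));
        }
    }
    let url = url.ok_or("missing required argument <URL>")?;
    Ok(CliArgs { url, language })
}

fn main() {
    match parse_args(env::args().skip(1)) {
        Ok(cli) => println!("Would scrape {} (output language: {})", cli.url, cli.language),
        Err(msg) => eprintln!("error: {}\nUsage: uninews [OPTIONS] <URL>", msg),
    }
}
```

Taking any `IntoIterator<Item = String>` rather than reading `env::args()` directly keeps the parser easy to exercise with hand-built argument lists.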

Integrating Uninews in Your Rust Project

Uninews can be used as a library to scrape news articles programmatically:

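use uninews::universal_scrape;

// Assumes an async runtime such as tokio (with its "macros" and
// "rt-multi-thread" features) is listed in Cargo.toml.
#[tokio::main]
async fn main() {
    // Scrape and convert a news article into Markdown
    let post = universal_scrape("https://example.com/news", "english").await;

    // Failures are reported through the `error` field, not a Result
    if !post.error.is_empty() {
        eprintln!("Error: {}", post.error);
        return;
    }

    println!("{}\n\n{}", post.title, post.content);
}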

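Make sure the OpenAI API key is available before calling universal_scrape, either by exporting OPEN_AI_SECRET in your shell or by setting it from your program (here my_open_ai_secret holds your key):

std::env::set_var("OPEN_AI_SECRET", my_open_ai_secret);

In the Rust 2024 edition, std::env::set_var must be wrapped in an unsafe block because modifying the environment can race with other threads reading it; call it before any threads are spawned, or simply export the variable in your shell.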
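Because universal_scrape signals failure through the `error` field instead of returning a `Result`, it can be convenient to convert its output into a `Result` so the `?` operator works in your own code. The sketch below uses a local stand-in struct, `ScrapeOutput` (hypothetical), with the `title`, `content`, and `error` fields used in the example above, so it compiles without the crate. With the real crate you would write the same conversion against `uninews::Post`, assuming its fields are public as the example's field access suggests.

```rust
/// Stand-in for the library's result type (only the fields used above).
struct ScrapeOutput {
    title: String,
    content: String,
    error: String,
}

/// A successfully scraped article.
#[derive(Debug, PartialEq)]
struct Article {
    title: String,
    markdown: String,
}

/// Converts the "empty error string means success" convention into a Result.
fn into_result(out: ScrapeOutput) -> Result<Article, String> {
    if out.error.is_empty() {
        Ok(Article { title: out.title, markdown: out.content })
    } else {
        Err(out.error)
    }
}

fn main() {
    let ok = ScrapeOutput { title: "Headline".into(), content: "Body".into(), error: String::new() };
    let failed = ScrapeOutput { title: String::new(), content: String::new(), error: "simulated failure".into() };
    for out in [ok, failed] {
        match into_result(out) {
            Ok(article) => println!("{}\n\n{}", article.title, article.markdown),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
}
```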

Why Use Uninews?

🚀 Fast: Written in Rust, so the scraper’s own overhead is small; most of the running time is spent waiting on the network and the GPT-4o call.

🛠 Easy to Use: Simple CLI and library interface.

📖 Readable Output: Well-formatted Markdown conversion.

🔄 Reusable: Works as both a command-line tool and a Rust library.

License

Uninews is open-source and licensed under the MIT License.

Copyright (c) 2025 Ángel León.
