The News Hub
About
The News Hub is an AI-driven news aggregation and analysis platform. It acts as a centralized system for collecting, processing, and interactively exploring global news content. A dual-source ingestion pipeline (Finlight API plus a LangGraph/Crawl4AI scraping engine), orchestrated via Airflow, feeds articles into an unsupervised clustering engine (PCA + K-Means + GPT-4o labelling) that automatically surfaces trending themes. A RAG pipeline over ChromaDB then enables natural language querying through AskHub, synthesising grounded answers from the live news corpus in real time.
Technologies Used
Real-Time News Aggregation
The system continuously ingests global news and aggregates it into an interactive dashboard, allowing users to filter content by source and topic.

Hot Topics, Trends and Insights
Users can explore trends and insights over any chosen time period. Topics are generated dynamically from article content by the unsupervised clustering engine (PCA + K-Means), with GPT-4o labelling each cluster.
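To illustrate how articles can be grouped into topics, here is a minimal sketch of the K-Means step using only the Python standard library. It rests on several assumptions: plain term-count vectors stand in for whatever article representation the project actually uses, the PCA reduction and GPT-4o labelling steps are omitted, centroids are seeded with a simple farthest-point rule to keep the run deterministic, and all function names (`vectorize`, `kmeans`, and so on) are hypothetical rather than taken from the codebase.

```python
from collections import Counter


def vectorize(texts):
    """Turn each text into a term-count vector over a shared vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Farthest-point seeding: start at the first vector, then repeatedly
    add the vector that is farthest from its nearest chosen centroid."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iterations=20):
    """Return one cluster index per input vector."""
    centroids = init_centroids(vectors, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # Converged: no vector changed cluster.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Made-up headlines, used only to exercise the sketch.
headlines = [
    "central bank raises interest rates",
    "bank holds interest rates steady",
    "team wins football cup final",
    "football team loses cup final",
]
labels = kmeans(vectorize(headlines), k=2)
for headline, label in zip(headlines, labels):
    print(label, headline)
```

On these four headlines the two finance stories land in one cluster and the two football stories in the other. In a real pipeline the vectors would more likely be dense embeddings reduced with PCA before clustering, and each cluster would then be passed to a language model for a human-readable label.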

AskHub - A RAG System for Exploring the News
AskHub is a Retrieval-Augmented Generation (RAG) system that answers questions about the news from a knowledge base of ingested articles. It retrieves the most relevant articles through semantic search over ChromaDB, then passes them to a language model that generates an answer grounded in that content.
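The retrieve-then-generate flow can be sketched in a few lines. This is an illustrative stand-in, not AskHub's implementation: a bag-of-words cosine similarity replaces real embeddings and the ChromaDB vector store, the final model call is left out, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, articles, top_k=2):
    """Return the top_k articles most similar to the question."""
    query = embed(question)
    ranked = sorted(
        articles,
        key=lambda article: cosine(query, embed(article)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from the passages only."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Made-up articles, used only to exercise the sketch.
articles = [
    "Central bank raises interest rates again",
    "Football team wins the cup final",
    "New phone model released this week",
]
question = "What happened to interest rates?"
passages = retrieve(question, articles, top_k=1)
print(build_prompt(question, passages))
```

The prompt built here would then be sent to a language model. Restricting the model to the retrieved passages is what keeps the answer grounded in the news corpus rather than in the model's own recall.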
