
The News Hub

Real-Time News Engine and Insights

About

The News Hub is an AI-driven news aggregation and analysis platform. It provides a centralized system for collecting, processing, and interactively exploring global news content. A dual-source ingestion pipeline (Finlight API + LangGraph/Crawl4AI scraping engine), orchestrated via Airflow, ingests articles and feeds them into an unsupervised clustering engine (PCA + K-Means + GPT-4o labelling) that automatically surfaces trending themes. A RAG pipeline over ChromaDB then enables natural-language querying through AskHub, synthesising grounded answers from the live news corpus in real time.
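The ordering of these stages can be sketched with the standard library's `graphlib`, used here as a stand-in for how Airflow resolves task dependencies. The task names below are hypothetical and are not taken from the project's actual DAG; in Airflow itself the same edges would be declared between tasks, for example with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
PIPELINE: dict[str, set[str]] = {
    "fetch_finlight": set(),
    "scrape_crawl4ai": set(),
    "merge_articles": {"fetch_finlight", "scrape_crawl4ai"},
    "embed_articles": {"merge_articles"},
    "cluster_topics": {"embed_articles"},        # PCA + K-Means
    "label_topics_gpt4o": {"cluster_topics"},    # GPT-4o names each cluster
    "index_chromadb": {"embed_articles"},        # feeds the AskHub RAG pipeline
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print(task)
```

Both ingestion tasks have no dependencies, so an orchestrator is free to run them in parallel; only the merge step has to wait for both.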

Technologies Used

Python, Airflow, Sklearn, Docker, ChromaDB, MongoDB, Langchain, FastAPI, OpenAI, Next.js

Real-Time News Aggregation

The system continuously ingests global news and aggregates it into an interactive dashboard, allowing users to filter content by source and topic.
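A minimal sketch of merging the two ingestion feeds and applying the dashboard's source and topic filters. It assumes a hypothetical `Article` record and assumes that duplicates across feeds are detected by normalized URL. Neither the schema nor the deduplication rule is documented for the project.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Article:
    # Hypothetical record shape; the project's real schema is not shown on this page.
    url: str
    title: str
    source: str  # publisher the article came from
    topic: str   # topic label assigned downstream


def normalize_url(url: str) -> str:
    """Ignore scheme, "www.", query string, fragment and trailing slash when comparing URLs."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return (host + parts.path).rstrip("/")


def aggregate(*feeds: list[Article]) -> list[Article]:
    """Merge feeds in order, keeping the first article seen for each normalized URL."""
    merged: dict[str, Article] = {}
    for feed in feeds:
        for article in feed:
            merged.setdefault(normalize_url(article.url), article)
    return list(merged.values())


def filter_articles(articles, sources=None, topics=None):
    """Keep articles matching the selected sources and topics (None means no filter)."""
    return [
        a for a in articles
        if (sources is None or a.source in sources)
        and (topics is None or a.topic in topics)
    ]


if __name__ == "__main__":
    api_feed = [Article("https://www.example.com/a/", "Rates hold", "source-a", "Economy")]
    scraped = [
        Article("http://example.com/a?ref=feed", "Rates hold", "source-a", "Economy"),
        Article("https://news.example.org/b", "New chip", "source-b", "Tech"),
    ]
    merged = aggregate(api_feed, scraped)
    print(len(merged))  # 2: the first two URLs normalize to the same key
    print([a.title for a in filter_articles(merged, topics={"Tech"})])
```

Passing the API feed first means its copy wins when both sources return the same story; reversing the argument order would prefer the scraped version instead.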


Hot Topics, Trends and Insights

Users can explore trends and insights over any chosen time period. Topics are generated dynamically from the content of the articles by the clustering engine (PCA + K-Means, with GPT-4o labelling the resulting clusters).
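The period-scoped clustering step might look roughly like the sketch below. It assumes each article already has a timestamp and a reduced vector (the output of the PCA step). K-Means (Lloyd's algorithm) is written in plain Python so the example is self-contained; in the project's stack this role would be filled by scikit-learn's `KMeans`. The GPT-4o labelling of each cluster is omitted. Function names, `k`, and the sample data are hypothetical.

```python
import math
import random
from datetime import datetime


def in_period(items, start: datetime, end: datetime):
    """Keep the vectors of (timestamp, vector) pairs whose timestamp is in [start, end)."""
    return [vec for ts, vec in items if start <= ts < end]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    items = [
        (datetime(2024, 5, 1), (0.0, 0.1)),
        (datetime(2024, 5, 2), (0.2, 0.0)),
        (datetime(2024, 5, 3), (5.0, 5.1)),
        (datetime(2024, 5, 4), (5.2, 4.9)),
        (datetime(2024, 6, 1), (9.0, 9.0)),  # outside the May window below
    ]
    may = in_period(items, datetime(2024, 5, 1), datetime(2024, 6, 1))
    labels, _ = kmeans(may, k=2)
    print(labels)
```

Filtering by period before clustering means the topics reflect only the selected window, so the same corpus can yield different hot topics for different date ranges.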


AskHub - A RAG System for Exploring the News

AskHub is a Retrieval-Augmented Generation (RAG) system that answers questions about the news from a knowledge base of ingested articles stored in ChromaDB. For each question, it uses semantic search to retrieve the most relevant articles, then has a language model synthesise an answer grounded in those articles.
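The retrieve-then-generate flow can be illustrated with a small stdlib sketch. The vectors are toy values standing in for real embeddings. Cosine similarity is used here as a common choice, although the distance metric configured in ChromaDB may differ. In the project, ChromaDB performs the search and an LLM (OpenAI appears in the stack) receives the prompt; here the sketch stops at building that prompt. Function names and prompt wording are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus, k=2):
    """Return the k (text, vector) entries most similar to the query vector."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble a grounded prompt: numbered sources first, then the question."""
    context = "\n".join(f"[{i}] {text}" for i, (text, _vec) in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    corpus = [
        ("Central bank holds rates", (1.0, 0.0, 0.1)),
        ("New GPU launched", (0.0, 1.0, 0.0)),
        ("Inflation cools", (0.9, 0.1, 0.2)),
    ]
    top = retrieve((1.0, 0.0, 0.0), corpus, k=2)
    print(build_prompt("What is happening with interest rates?", top))
```

Numbering the retrieved passages lets the model cite its sources, which makes it easier to check that an answer is grounded in the retrieved news rather than in the model's prior knowledge.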
