The Evolution of Big Data and Hadoop

2003

  • October: Google publishes the Google File System (GFS) paper, introducing a scalable distributed file system designed for large data-intensive applications.

2004

  • December: Google publishes the MapReduce paper, describing a programming model in which a computation is expressed as a map function and a reduce function that the framework runs in parallel across a cluster (a sketch follows below).
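
As a concrete illustration of the model (not part of the original paper), the canonical word-count job written against Hadoop's Java MapReduce API is sketched below; the class name and the command-line input/output paths are purely illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts that the framework has grouped by word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework splits the input across mapper tasks, shuffles and sorts the (word, count) pairs by key, and hands each key's values to a reducer; data locality and fault tolerance are handled by the framework rather than by the application code.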

2005

  • April: Doug Cutting and Mike Cafarella begin work on what will become Hadoop as part of the Apache Nutch web-search project, inspired by Google's GFS and MapReduce papers.

2006

  • April 1: The first official release of Apache Hadoop (version 0.1.0) is made available.
  • November: Google publishes the Bigtable paper, describing a distributed storage system for managing structured data at scale.

2008

  • February 19: Yahoo! launches the world's largest Hadoop production application, the Yahoo! Search Webmap, running on a Linux cluster with over 10,000 cores.

2010

  • May: Facebook announces that it has the largest Hadoop cluster in the world, storing 21 petabytes of data.
  • June: Google publishes the Dremel paper, introducing an interactive analysis system for large datasets.

2012

  • June: Facebook's Hadoop cluster grows to 100 petabytes, highlighting the scalability of Hadoop.
  • October: Apache Hadoop YARN (Yet Another Resource Negotiator) is introduced, splitting resource management and job scheduling out of the MapReduce engine so that multiple processing frameworks can share a single Hadoop cluster.

2013

  • March: Cloudera releases Impala, a high-performance, low-latency SQL query engine for Hadoop.
  • July: Apache Tez is released, providing a framework for building high-performance batch and interactive data processing applications.

2014

  • February: Apache Spark becomes a top-level Apache project, offering a fast, general-purpose cluster-computing engine that works directly with data stored in HDFS (see the sketch below).
  • March: Cloudera signs cooperation agreements with Teradata and Red Hat to enhance its big data solutions.
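
A minimal sketch of what that compatibility means in practice: a Spark job can read the same HDFS files that MapReduce jobs write, on the same cluster. The HDFS path and class name below are illustrative, not taken from any of the milestones above, and a Spark 2.x+ deployment is assumed.

```java
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class SparkOnHadoopData {
  public static void main(String[] args) {
    // Spark can obtain its executors from YARN and read directly from HDFS,
    // so it runs side by side with existing MapReduce jobs on the same cluster.
    SparkSession spark = SparkSession.builder()
        .appName("spark-on-hadoop-data")
        .getOrCreate();

    // Illustrative HDFS path, e.g. the output of an earlier MapReduce job.
    Dataset<String> lines = spark.read().textFile("hdfs:///data/clickstream/part-*");

    long errors = lines
        .filter((FilterFunction<String>) line -> line.contains("ERROR"))
        .count();

    System.out.println("Lines containing ERROR: " + errors);
    spark.stop();
  }
}
```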

2016

  • August: Apache Hadoop 2.7.3 is released, a maintenance release in the 2.7 line containing bug fixes and stability improvements.

2017

  • December: Apache Hadoop 3.0.0 is released, introducing erasure coding in HDFS, support for more than two NameNodes, and other major features.

2018

  • April: Apache Hadoop 3.1.0 is released, adding first-class GPU and FPGA scheduling in YARN and a YARN service framework for long-running applications.

2019

  • January: Apache Hadoop 3.2.0 is released, providing new features and improvements for better performance and stability.

2020

  • July: Apache Hadoop 3.3.0 is released, adding support for ARM architectures and the Java 11 runtime, along with many other improvements.

2021

  • August: Apache Hadoop 3.3.1 is released, including various bug fixes and enhancements.

2022

  • July: Apache Hadoop 3.3.3 is released, providing stability improvements and minor feature updates.

2023

  • June: Apache Hadoop 3.3.6 is released, featuring critical security updates and performance enhancements.

2024

  • March: Apache Hadoop 3.4.0 is released, introducing new features and improvements for modern data processing needs.

This timeline highlights significant milestones in the development of Big Data and Hadoop technologies, reflecting their growth and adaptation to an ever-changing data landscape.