Apache™ Hadoop™ is an open source framework from the Apache Software Foundation. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage, and it is well suited to analyzing large unstructured datasets and uncovering relationships within them. Data processing in Hadoop is distributed across a cluster of computers using a simple programming model. For a complete reference on Hadoop, see hadoop.apache.org.

The core Hadoop project contains the following components:

  • Hadoop Common provides the common utilities that support the other Hadoop subprojects.
  • Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.
  • Hadoop MapReduce is the software framework for distributed processing of large unstructured datasets across a Hadoop cluster of computers (a word-count sketch follows this list).
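
As a concrete illustration of the MapReduce model, the sketch below is the classic word-count job written against the org.apache.hadoop.mapreduce API. It is a minimal example, not part of any Microsoft package: the class name is illustrative, and the input and output paths are supplied as command-line arguments and are assumed to be HDFS directories.

    // WordCount.java - minimal MapReduce word-count sketch.
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every token in each input line.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // combine locally before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }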

The Hadoop-based services for Microsoft Windows include the core components and the following related Hadoop projects:

  • Hive is a data warehouse infrastructure that provides data summarization and ad hoc querying (a query sketch follows this list).
  • Pig is a high-level data-flow language and execution framework for parallel computation.
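
To show what ad hoc querying with Hive can look like from client code, the sketch below submits a HiveQL query over JDBC using the standard org.apache.hive.jdbc.HiveDriver. The endpoint (localhost:10000), the default database, and the weblogs table are assumptions for illustration only; the actual values depend on how the cluster is configured.

    // HiveQuerySketch.java - run an ad hoc HiveQL query over JDBC.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuerySketch {
      public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement();
             // Summarize a hypothetical "weblogs" table by HTTP status code.
             ResultSet rs = stmt.executeQuery(
                 "SELECT status, COUNT(*) FROM weblogs GROUP BY status")) {
          while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
          }
        }
      }
    }

Hive translates the query into MapReduce jobs that run across the cluster, so the client only issues SQL-like statements and reads back the summarized results.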

Microsoft provides Hadoop-based services packages for Windows and for Windows Azure. To get started with one of the packages, click a link below: