Friday, November 25, 2011

It's a lot of HADOOPapla these Days!

Hadoop is a fault-tolerant distributed system for data storage and processing. It is highly scalable and is useful for data (often unstructured data) that falls outside what is best stored in an RDBMS. Its scalability comes from a self-healing, high-bandwidth clustered storage layer, HDFS (Hadoop Distributed File System), and a fault-tolerant distributed processing framework known as MapReduce.
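
To make the MapReduce idea concrete, here is a minimal single-process sketch of the programming model in plain Python. It counts words in a couple of made-up web-log lines. This is not Hadoop's API (Hadoop jobs are usually written against its Java API), and the function names are hypothetical. On a real cluster, the framework runs many map tasks in parallel, ideally on the nodes that hold the relevant HDFS blocks. It then shuffles the intermediate pairs to reducers by key and re-runs any task that fails.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical in-process illustration of map -> shuffle -> reduce; not Hadoop's API.


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # A mapper sees one input record and emits (key, value) pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # The framework groups all intermediate values by key between the phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # A reducer sees one key together with all of its values.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    logs = ["GET /index GET /cart", "GET /index POST /cart"]  # made-up sample input
    print(word_count(logs))
```

Each mapper only sees one record at a time, and each reducer only sees one key's values. Because of that independence, the framework is free to spread both phases across many machines, which is what lets the model scale.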

Recently, a Director of IT at JP Morgan Chase (JPMC) said that Hadoop allows the bank to store data it never stored before, including web logs, transaction data and social media data. While enterprise-wide security concerns about Hadoop persist, it is slowly growing its footprint. At JPMC, for example, it is used for fraud detection and IT risk management, and aggregated forms of this data also feed data mining and other advanced analytics tools. eBay, for its part, uses the HBase database on top of Hadoop.


HBase is an open source, non-relational, distributed database modeled after Google's BigTable. It is written in Java, runs on top of HDFS, and provides BigTable-like capabilities for Hadoop, which gives it a fault-tolerant way to store large quantities of sparse data. eBay uses it to power a new search engine for its auction site, code-named Cassini. eBay handles 2 billion site views among its 97 million active buyers and sellers, and has dedicated over 100 engineers to this project.
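
The phrase "sparse data" is easier to picture with the data model that BigTable and HBase share. A table is a sorted map from (row key, column family:qualifier, timestamp) to a value, and only cells that are actually written take up space. Below is a toy, in-memory Python sketch of that model. The class, its methods and the row keys are hypothetical and are not HBase's client API. Real HBase also persists data on HDFS, splits tables into regions and does much more, none of which is modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical toy model of a BigTable/HBase-style table; not the HBase client API.
CellKey = Tuple[str, str]  # (row key, "family:qualifier")


class SparseTable:
    """Sparse, versioned map: only cells that were written consume storage."""

    def __init__(self) -> None:
        # Each written cell maps timestamp -> value; unwritten cells have no entry.
        self._cells: Dict[CellKey, Dict[int, bytes]] = defaultdict(dict)

    def put(self, row: str, column: str, value: bytes, ts: int) -> None:
        # Writing with a new timestamp adds a version rather than overwriting.
        self._cells[(row, column)][ts] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        # dict.get avoids creating an empty entry for a missing cell.
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]  # the newest version wins

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[str, str, Optional[bytes]]]:
        # Visits cells in lexicographic (row, column) order. Real stores keep keys
        # sorted so a row range can be read directly; this toy version just filters.
        for row, column in sorted(self._cells):
            if start_row <= row < stop_row:
                yield row, column, self.get(row, column)

    def cell_count(self) -> int:
        return len(self._cells)


if __name__ == "__main__":
    table = SparseTable()
    # Hypothetical auction rows: each row stores only the columns it needs.
    table.put("item#1001", "bid:max", b"25.00", ts=1)
    table.put("item#1001", "bid:max", b"27.50", ts=2)
    table.put("item#2002", "meta:title", b"Vintage camera", ts=1)
    print(table.get("item#1001", "bid:max"))
    print(list(table.scan("item#1000", "item#2000")))
```

The row "item#2002" has no "bid:max" cell at all, so asking for it returns None rather than reading an empty slot. Three writes across two rows produce only two stored cells. This is the sense in which such a store handles sparse data cheaply.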


In light of these developments, it was no surprise that Oracle stepped into this space in a big way with the Oracle Big Data Appliance.
