Friday, November 25, 2011

It's a lot of HADOOPapla these Days!

Hadoop is a fault-tolerant distributed system for storing and processing data. It is highly scalable and useful for data (often unstructured) beyond what is best stored in an RDBMS. The scalability comes from self-healing, high-bandwidth clustered storage, known as HDFS (Hadoop Distributed File System), and from fault-tolerant distributed processing, known as MapReduce.
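To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop at all: the log format, the function names (map_phase, shuffle, reduce_phase, run_job), and the sample lines are all made up for illustration. It only shows the map, shuffle, and reduce steps that a real cluster would spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (status_code, 1) for one log line.

    The "ip path status" layout is a hypothetical format, not a real standard.
    """
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up sample data, standing in for Web logs stored in HDFS.
logs = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /login 500",
    "10.0.0.1 /cart 200",
]
print(run_job(logs))  # {'200': 2, '500': 1}
```

Because each call to map_phase looks at one line only, and each call to reduce_phase looks at one key only, the work can be split across machines and re-run on another node if one fails, which is where the fault tolerance comes from.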

Recently, the Director of IT at JP Morgan Chase (JPMC) said that Hadoop allows them to store data that they never stored before, including Web logs, transaction (TX) data, and social media data. While enterprise-wide security concerns still prevail for Hadoop, its footprint is slowly growing. At JPMC, it is being used for fraud detection and IT risk management. Aggregated forms of such data also feed data mining and other advanced analytics tools. eBay uses HBase, a database for Hadoop.


HBase is an open source, non-relational, distributed database modeled after Google's BigTable. It is written in Java, runs on top of HDFS, and provides BigTable-like capabilities for Hadoop. Hence, it provides a fault-tolerant way of storing large quantities of sparse data. eBay is using it to build a new search engine for its auctions, code-named Cassini. eBay handles 2 billion site views among its 97 million active buyers and sellers, and it has dedicated over 100 engineers to this project.
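For readers wondering what "sparse data" means in a BigTable-style store, the toy model below may help. It is an in-memory sketch in plain Python, not the HBase client API: the SparseTable class, its methods, and the row and column names are all hypothetical. The point is that each row stores only the columns it actually has, and each cell can keep several timestamped versions.

```python
from collections import defaultdict


class SparseTable:
    """Toy BigTable-like model: row key -> column -> {timestamp: value}."""

    def __init__(self):
        # Only cells that were written take up space; absent columns cost nothing.
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value, timestamp):
        """Store one version of a cell, with the column given as 'family:qualifier'."""
        self._rows[row_key].setdefault(column, {})[timestamp] = value

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def cell_count(self):
        """Count the stored cell versions across all rows."""
        return sum(
            len(versions)
            for columns in self._rows.values()
            for versions in columns.values()
        )


# Made-up auction-style rows; the two rows do not share any columns.
table = SparseTable()
table.put("item#1", "info:title", "Vintage camera", timestamp=1)
table.put("item#1", "info:title", "Vintage film camera", timestamp=2)
table.put("item#2", "bids:high", "42.00", timestamp=1)

print(table.get("item#1", "info:title"))  # Vintage film camera
print(table.get("item#2", "info:title"))  # None
```

In a relational table, item#2 would need a NULL in every column it does not use. Here it simply has no entry, which is why this layout suits wide tables where most cells are empty.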


In light of these developments, it was no surprise that Oracle stepped into this space in a big way with the Oracle Big Data Appliance.

Tuesday, November 22, 2011

BIWA SIG TechCast: Oracle R Enterprise, Nov 30, noon EST


Subject: Webinar: Using Oracle R Enterprise -- Nov 30, noon EST
Date: November 22, 2011 11:55:15 AM EST

Analytic friends: Happy Thanksgiving!!

Oracle BIWA SIG is hosting a webinar next week that I think might interest many of you.  Here's the info about it.


==========================================
Webinar: Using R within Oracle -- Nov 30, noon EST
==========================================
Oracle now supports the R open source statistical programming language. Come to this webinar to learn more about using R within an Oracle environment.

-- URL for TechCast: https://stbeehive.oracle.com/bconf/confDetails?confID=334B:3BF0:owch:38893C00F42F38A1E0404498C8A6612B0004075AECF7&guest=true&confKey=608880
-- Web Conference ID: 303397
-- Web Conference Key: 608880
-- Dialup: 1-866-682-4770, ID 5548204, passcode 1234

After a steady rise in the past few years, in 2010 the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other (http://www.rexeranalytics.com/Data-Miner-Survey-Results-2010.html).

Several analytic tool vendors have added R-integration to their software. However, Oracle is the largest company to throw their weight behind R. On October 3, Oracle unveiled their integration of R: Oracle R Enterprise (http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html) as part of their Oracle Big Data Appliance announcement (http://www.oracle.com/us/corporate/press/512001).

Oracle R Enterprise allows users to perform statistical analysis with advanced visualization on data stored in Oracle Database. Oracle R Enterprise enables scalable R solutions, while facilitating production deployment of R scripts and Hadoop-based solutions, as well as integration of R results with Oracle BI Publisher and OBIEE dashboards.

This TechCast introduces the various Oracle R Enterprise components and features, along with R script demonstrations that interface with Oracle Database.

TechCast presenter: Mark Hornick, Senior Manager, Oracle Advanced Analytics Development.
This TechCast is part of the ongoing TechCast series coordinated by Oracle BIWA: The BI, Warehousing and Analytics SIG (http://www.oracleBIWA.org).