Sunday, November 1, 2009

Book Review - OWB 11g - Getting Started (Bob Griesemer)

One of the privileges of being in the BIDW field for long enough is getting to do book reviews. Currently, I am looking at the book from Packt Publishing titled "Oracle Warehouse Builder 11g Getting Started." The author is Bob Griesemer.

This book has 9 interesting chapters; however, I am jumping to Ch5 for now, as it covers Extract, Transform and Load Basics. Someone with a weaker ETL background could read this first. The interesting section of this chapter is "To Stage or not to Stage." The author talks about key considerations in ETL, such as:

> For faster movement of data, the amount of source data, the degree of manipulation of source data and the nature of the source are important; AND

> Handling failures due to connectivity and handling changes in the source data.

The author presents two options: extracting data directly into staging without doing any computation in the source system, or extracting along with some manipulation at the source. He brings up a good point that creating a performance load on the source system may not be desirable compared to loading the ETL or the staging database. Another good point the author raises is that staging can even be the flat files extracted from the source, organised in certain folders, so that OWB can use them without taking space in the DW. I think this is an interesting way to look at staging.

The use of flat files is not a new concept. When fetching data from legacy systems such as mainframes that do not provide direct connectivity to ETL tools, we often use flat files. This way the DW team does not have to become an expert on the structures of the legacy system. Likewise, the Oracle Business Intelligence Applications support the concept of Universal adapters to interface with flat files when the source system is not directly supported with pre-built ETL connectors. However, the onus of change management of the source may be a little more involved.
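To make the "flat files as staging" idea concrete, here is a minimal sketch in Python rather than OWB itself. It assumes a hypothetical folder layout of one sub-folder per source system containing CSV extracts; the function name `iter_staged_rows` and the layout are my own illustration, not something from the book or from OWB.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_staged_rows(staging_root: Path) -> Iterator[Tuple[str, str, Dict[str, str]]]:
    """Yield (source_system, file_name, row) for every CSV extract under staging_root.

    Assumed (hypothetical) layout: staging_root/<source_system>/<extract>.csv
    """
    # Each sub-folder is treated as one source system, visited in name order.
    for source_dir in sorted(p for p in staging_root.iterdir() if p.is_dir()):
        for extract in sorted(source_dir.glob("*.csv")):
            # newline="" lets the csv module handle line endings itself.
            with extract.open(newline="") as handle:
                for row in csv.DictReader(handle):
                    yield source_dir.name, extract.name, row
```

The load step only needs to know the folder convention and the column headers, which is the point made above: the DW team does not have to understand the internal structures of the legacy system.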

Saturday, October 31, 2009

De-Mystifying the "Cloud"



Cloud Computing is drawing attention from almost all CIOs. Is it another fad, or is the Cloud "real"? Since the field is evolving, let us look at some of the details here.

In general, Cloud Computing allows users and organizations to use computing power on demand, just like water or electricity from the grid. Users can request computing power, storage, enterprise applications or even databases from Cloud services providers. These providers act as data centers, except that the data centers are no longer privately owned by each company. A simpler analogy would be a single-family home vs. a condominium with multiple tenants who share common facilities like the hallway, elevator, pool and club house. The users of the Cloud pay only for the computing or storage they use. Thus, companies no longer need to create their own data centers or buy servers and massive storage arrays.

Cloud Computing is not really something obscure; a lot of us are actually using it in our daily lives, perhaps without realizing it. For instance, Gmail, Twitter, Skype and YouTube are all built on Cloud Computing technology. All these collaborative applications deliver services to PCs and mobile devices. The users do not have to worry about the "location" of the computers and storage providing these services. Hence, a key concept of Cloud Computing is Virtualization. Virtualization allows a single powerful computer to function as multiple virtual servers running a diverse range of operating systems and different applications. This is the key concept, as physical computers are not dedicated to a single user or a group of related users, say from a small company.

Are all organizations ready to embrace the public Cloud? There is also a concept of the Internal Cloud, where the Cloud infrastructure sits inside the corporate firewall. Big Blue (IBM) is a promoter of Internal or Private Clouds. In this approach, typically a large company would create an internal cloud to meet the diverse software application needs of its user base. No single server or SAN is dedicated to a unique application, and user groups or departments use computing resources on demand from this internal cloud.

If you are an individual user or a small to mid-sized company, how can you get a first-hand feel for Cloud Computing? You can test drive services like Amazon Elastic Compute Cloud (EC2). Developers can buy computing power and storage as they need it. IBM provides Cloudburst as a hardware and software package that allows organizations to build their private cloud. In other words, it is a "cloud in a box" approach. Companies often invest in storage space for data backup. EMC provides Decho (digital echo), a cloud for data backup and storage. Microsoft's cloud play is called
Azure and is comparable to Amazon's EC2.

Cloud Computing is not an altogether new IT concept. We have known SaaS (Software as a Service), Grid computing, remote hosting (web servers, FTP sites etc.) and different kinds of utility computing. "Cloud computing is really a culmination of many technologies such as grid computing, utility computing, SOA, Web 2.0, and other technologies." While a precise definition is still being debated, the Open Cloud Manifesto was signed in 2009 by companies like IBM, EMC, Boeing etc. The goal is to put CIOs and IT leaders at ease and to make it easier for end customers and developers to plan the transition to cloud services.

The term Cloud Computing comes from the diagrams that computer architects draw to show the flow of information. When the source of information in such a diagram is external or unknown, a cloud is often used to depict it. Will Cloud Computing be a disruptive technology? It seems to be evolving that way. Today Google Docs seems to be going after Microsoft Office, and Salesforce.com is giving Larry Ellison's Oracle CRM a run for its money. Likewise, Mozy.com is competing with EMC's remote storage capability, and Amazon EC2 may hit the sales of Intel servers for private data centers.

Is Cloud Computing a silver bullet? On Aug 31, Google's Gmail service was down for a while, which brought up the question of the risks associated with over-indulgence in the Cloud computing model. Let us see how it plays out over the next few months and as we get into 2010!

[ This blog posting was inspired by the Wall Street Journal Article titled How Well Do You Know... the Cloud?]

Wednesday, October 14, 2009

OOW Wed

The keynote sessions have started, and the CEO of Infosys, S. (Kris) Gopalakrishnan, is now speaking on innovation and how he creates a culture of innovation in his company. Each department has to come up with at least 2 new innovations for a productivity boost every year. Every building at Infosys looks different, some even like a flying saucer. Thus the culture of innovation has to be ubiquitous.

Kris is explaining the need for simplifying organizational complexity. They operate in 27 cities, but airline booking is done from one city (Bangalore). Anyone can email Kris (Ask Kris) directly; he gets about 600-800 emails a week, and he either responds directly or the answers are posted on the company blog.

Kris says we often use 2000-year-old learning techniques... "Learning through collaboration and personalization delivered at their own pace" is most effective. Infosys uses web-based training delivery internally to achieve this.

ICICI Bank of India is powered by an Infosys solution and is an example of a branchless bank, an example of IT-led banking. Today Infosys gets 62% of its revenues from North America, 25% from Europe and the rest from the rest of the world. However, the company envisions that in the longer run the split will be 1/3 from each market, as most of the growth for IT services is in the rest of the world, in places like India, China and Latin America.

The next event is Larry's keynote, and as usual what will be the new announcement? He started talking about open source Linux. He will talk about Exadata V2, a tool using data mining for discovering problems proactively.

Oracle VM is being used by companies like Dell, BT etc. 65% of Linux users are running Oracle DB on Oracle Enterprise Linux, per a survey by HP.

Larry is now talking about the SUN Oracle DB Machine (Exadata). However, the machine he talked about on Sunday for the TPC-C benchmark is a SPARC Solaris machine running Oracle DB. Larry is presenting customer testimonials from Exadata V1, where the performance improvements are measured against the older configurations in their organizations.

Larry is explaining that fast machines like Teradata and Netezza are mainly for Data Warehousing only and not for OLTP or transactional systems. The use of Flash Memory allows high-speed random I/O.

The memory hierarchy in Exadata V2 allows high-speed random I/O, as there is no need to spin disks for a seek on solid-state memory. The Exadata is meant to be fault tolerant, with redundancy at every level.

Flash is slower than DRAM, but it is used to keep most of the database itself in Flash memory. OLTP compression allows a database 3X the size to fit in the same space. Thus a full 15TB database can reside in a single rack of Exadata, theoretically. The rack can do 1 million I/Os per second.

Exadata is compared to IBM: 8 IBM DS8300 Turbo systems (76 racks) with a cost of about $10.7 million. The capacity of two Exadata racks is compared here. Larry is now introducing Hybrid Columnar Compression, for Data Warehousing applications only.

The in-memory parallel query execution feature is used to compare with Parcel and other in-memory databases. The grid architecture helps to make the Exadata faster, allowing multiple CPUs to be used. The Flash and the compression help to improve the speed.

Porting applications to the DB machine: "runs existing applications unchanged." However, we saw in the customer panel of V1 customers that they had to drop the indexes and then add some of them back. However, partitioning remains key in DW workloads. Larry is now showing the price without software costs like the DB cost per node and the Exadata software or Compression (OLTP).

So far no new announcements from Larry today....Arnold is too funny so I am not blogging any more... just Tweeting...

shyamvaran #OOW09 Who needs the $10m more than State of CA & Arnold at this time..may be will buy him the new house if he makes 2 much fun of his wife!
less than 5 seconds ago from web

shyamvaran RT @eyesonopen Here comes arnold schwarzenegger. Maybe state of CA wants to grab that 10 million dollar prize. He's actually hilarious.
2 minutes ago from web

debralilley Very funny, cell phones, action, food supplements, much funnier than I expected = Arnie is great #oow09
3 minutes ago from web

shyamvaran #OOW09 Arnold could not have done it - Terminator without the Technology, or his body building with the technology for Training and food
3 minutes ago from web

OOW Tuesday

Highlights include the keynote by Michael Dell.
Release of Oracle BI Apps 7.9.6.1 with support for DW databases like DB2 and Teradata.

My session was at 5:30PM

Monday, October 12, 2009

OOW Monday - OTN Night

The OTN night started at 7PM in the Tent, with good food as usual, and the drinks! Entertainment on the far side of the tent was Middle Eastern dancers... On the near side (towards 3rd St) was the Trivia sponsored by Blackberry. I was drafted to be on the stage with 5 other contestants to fight it out for a Blackberry, in front of the crowd!

It was a tie with 300 points each; the other contestant was from Canada. The questions were in three different categories: about Oracle, about RIM/Blackberry and about San Francisco.

So what was my winning strategy? Let me explain it in Exadata terms. The storage index is a new concept: it is not a pointer to a row or block but rather a negative index, i.e. it tells you where not to look for data. Likewise, I won with a similar strategy today. The last question was about San Francisco and neither one of us knew the answer. The other contestant answered and lost the point on the tie breaker, and I won... so my "negative" strategy of not answering won me the Blackberry!!
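For the curious, the "negative index" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: that each storage region keeps the minimum and maximum value of a column, and that a region is skipped when the value being searched for falls outside that range. The names `Region`, `build_regions` and `find_equal` are hypothetical and not part of any Oracle API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Region:
    rows: List[int]  # column values stored in this region
    low: int         # smallest value in the region
    high: int        # largest value in the region


def build_regions(values: Sequence[int], region_size: int) -> List[Region]:
    """Split values into fixed-size regions and record min/max for each."""
    regions = []
    for start in range(0, len(values), region_size):
        chunk = list(values[start:start + region_size])
        regions.append(Region(chunk, min(chunk), max(chunk)))
    return regions


def find_equal(regions: Sequence[Region], target: int) -> Tuple[List[int], int]:
    """Return (matching values, number of regions actually scanned)."""
    matches: List[int] = []
    scanned = 0
    for region in regions:
        # The summary says the target cannot be here, so do not look at all.
        if target < region.low or target > region.high:
            continue
        scanned += 1
        matches.extend(v for v in region.rows if v == target)
    return matches, scanned
```

The summary never says where a row is; it only rules regions out, so the saving comes entirely from the regions that are never read.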

OOW Monday Evening

I spent a good part of the day in the Exhibit Hall (Moscone West and Moscone South). Some of the things that caught my attention were:

1) Pretty large presence of Salesforce.com (I think it is their first time at OOW)
2) Focus on Oracle on the Cloud; Amazon is also present
3) A real Sumo wrestler in the booth on Application Security - "You don't know who you are up against?"


Steve Miranda is talking now about Oracle Applications, including Applications Unlimited. The customer Smucker is also expected. Janet Foutty of Deloitte will be on the stage.

Deloitte is the sponsor of this session. "Business Led and Technology Enabled" is the mantra. Tie back every penny to shareholder value. Janet is talking about the Value maps that Deloitte uses to map business problems to Oracle products. Janet has a financial services background. Clients are focussed on cost cutting, efficiency and re-tooling their IT.

What's next in the minds of clients? Improved Agility is one item. Next is Information Integration. According to Janet, Exadata enables new ways to slice and dice data that were not possible a few years ago.

Upgrading software is good hygiene, so customers should upgrade, per Janet. However, an upgrade is not a casual decision; business and IT should co-own it. Janet believes that cloud and SaaS are truly a flex point in IT history.

Steve Miranda is now going over the different application releases across the product suites. Rapid Planning is a new VCP (Value Chain Planning) product and it will be demoed. You can drill down to Sales and Operations Planning (S&OP) to see the detailed budget that has come from Hyperion via AIA.

Rapid Planning can help to align demand fulfillment with supply. The Planner can set up alternates and launch the plan. Integration with Web 2.0 is available, such as IM and meeting requests using WebCenter. Overall, it can help to solve the business planning problem without the use of 'spreadmarts'.

OOW 2009 Ann Livermore of HP

According to Ann, 40% of Oracle deployments are on HP!!! Ann will tell how HP and Oracle work together to unleash business potential. Now that EDS is part of HP, it is a big services company as well. Businesses spend 70% of IT budgets on routine operations and 30% on innovation. HP's goal is to help businesses reverse this trend.

HP has 3400 Oracle-related professionals. I wonder if that includes those that came with EDS. HP's goal is to help manage the Information explosion. Now Ann is talking about BI solutions. HP is number 1 in the number of servers for BI deployments, and a lot of these are on the Oracle software stack. She mentioned Neoview, the HP DW appliance.