Case study – Finance

Fortune 500 Financial Company

The Client: One of the largest growing banks, expanding its base across most of the eastern and central United States through recent mergers.

Challenge:
Install, support, and maintain DB2 and HANA databases for a new SAP project. The first goal of the project is to implement ECC and BW, along with 6 other components, mainly on DB2 databases, and eventually to expand to BPC, LRM, and Bank Analyzer hosted on HANA databases.


Challenges and Resolutions during the course of SAP project (Development and Support):

Designing, developing and implementing Snap database backups:
To speed up recovery during database outages, we set up snap backups and tested the snaps on proxy servers. I worked with the storage team to automate the database write suspend/resume and the handover to the storage team for taking the snaps. I set up a schedule through ESP jobs to mount the file systems taken from the production snap on the proxy servers and instantiate the database there, to confirm that the snap copies are recoverable. The same copies can serve as the alternate DR recovery plan if the primary recovery does not work. The primary method for recovering the databases at the DR site is storage replication. I was involved in implementing and testing database startup at the DR site and led the yearly DR testing on behalf of the DBA team.
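The suspend/snap/resume handover described above can be sketched roughly as follows. This is a minimal illustration, not the scripts used on the project: the `run` callable and the `take_snapshot` hook are hypothetical stand-ins for the real command runner and the storage team's tooling, and the runner is assumed to execute every command in the same DB2 command line session so the connection persists between calls.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical runner: executes one command line in a shared DB2 CLP session.
Runner = Callable[[str], None]


@contextmanager
def write_suspended(db_name: str, run: Runner) -> Iterator[None]:
    """Suspend database writes for the duration of the block."""
    run(f"db2 connect to {db_name}")
    run("db2 set write suspend for database")
    try:
        yield
    finally:
        # Always resume, even if the snapshot step fails, so the
        # database is never left with writes suspended.
        run("db2 set write resume for database")
        run("db2 connect reset")


def snap_backup(db_name: str, run: Runner, take_snapshot: Callable[[], None]) -> None:
    """Suspend writes, hand over to storage for the snap, then resume."""
    with write_suspended(db_name, run):
        take_snapshot()  # hypothetical hook into the storage tooling


def verify_on_proxy(db_name: str, run: Runner) -> None:
    """On the proxy server, after the snapped file systems are mounted."""
    run("db2start")
    run(f"db2inidb {db_name} as snapshot")
```

Passing `print` as the runner gives a dry run of the command sequence, for example `snap_backup("PRD", print, lambda: None)`, where `PRD` is a placeholder database name.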
Backup automation: Tested and implemented NetWorker API backups. Previously, backups were taken to disk; this implementation saved many terabytes of storage. It also cut the time for database refresh activities in half, because the backup no longer has to be copied to the target server, which saves about 10 to 15 hours of DBA work per refresh.
Avoiding downtime during Reorgs: Getting downtime for offline reorgs is a challenge, so I implemented DB6CONV and moved the tables online while enabling compression (row and adaptive) on large tables. This saved more than 3 TB for both ECC and BW, and the organic growth of the database slowed by a factor of 4.
Table Virtualization: Because of the nature of SAP applications, the database holds thousands of empty tables, depending on which functionality is in use. I implemented table virtualization, in which all empty tables are dropped from the database but still exist in the ABAP dictionary and are created automatically when required. This saved very little space in the database, but it cleared the system catalog entries for all 70k tables in ECC and around 30k in BW, which improved the performance of DBA operations that no longer have to go through every table in the database.
Database Upgrades: Performed database version upgrades to migrate all databases from V9.7 to 10.5 and from 10.1 to 10.5. Performed fix pack upgrades on a regular basis to mitigate security vulnerabilities, prioritized by analyzing the CVSS scores of the security APARs. Currently working on automating DB2 upgrades using HP DMA.
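As an illustration of CVSS-based prioritization, the sketch below maps scores to the CVSS v3.x qualitative severity bands and selects the APARs at or above a cutoff. The `Apar` record, the APAR identifiers in the usage note, and the 7.0 default threshold are assumptions made for the example; the actual criteria used on the project are not stated.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Apar:
    apar_id: str       # hypothetical identifier
    cvss_score: float  # CVSS base score, 0.0 to 10.0


def severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def needs_fix_pack(apars: Iterable[Apar], threshold: float = 7.0) -> List[Apar]:
    """Return APARs at or above the threshold, highest score first."""
    selected = [a for a in apars if a.cvss_score >= threshold]
    return sorted(selected, key=lambda a: a.cvss_score, reverse=True)
```

For example, given `[Apar("A1", 5.3), Apar("A2", 9.8), Apar("A3", 7.5)]`, `needs_fix_pack` returns A2 followed by A3, and A1 is left for the regular fix pack cycle.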
Team Lead: Led a group of 6 onshore and 5 offshore team members, delegating work and serving as the primary contact for all priority 1 and 2 calls. Conducted weekly team meetings to review progress, discuss incidents, and plan proactive measures to prevent similar incidents in the future. Took part in all planning meetings as the DBA team's primary contact for work.
Major Issues: 1. DB2 crash on BW database:
DB2 crashed, which delayed reporting from a BW database. The initial finding was that it failed due to latch errors, but we were able to restart the database without any issues. We involved IBM, but they could not find the root cause because generation of the core file was terminated due to insufficient space in the dump directory. We increased the size of the dump directory to twice the memory on the system. The next time we hit the same issue, IBM was able to find the root cause and provide a special build with the fix.
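The sizing rule above (dump space equal to twice the system memory) lends itself to a simple preflight check. The following is a sketch under stated assumptions: the function names are hypothetical, the dump directory path is supplied by the caller, and `physical_ram_bytes` relies on `os.sysconf` names that are available on Linux but not on every Unix platform.

```python
import os
import shutil


def required_dump_bytes(ram_bytes: int, factor: int = 2) -> int:
    """Space needed for a full core dump: a multiple of physical memory."""
    return ram_bytes * factor


def dump_dir_has_room(dump_dir: str, ram_bytes: int, factor: int = 2) -> bool:
    """True if the file system holding dump_dir has enough free space."""
    free = shutil.disk_usage(dump_dir).free
    return free >= required_dump_bytes(ram_bytes, factor)


def physical_ram_bytes() -> int:
    """Physical memory in bytes (assumes a Linux-style sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

A monitoring job could call `dump_dir_has_room(path, physical_ram_bytes())` and raise an alert when it returns `False`, so that a truncated core file is caught before the next crash rather than after it.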
2. Npages=0:
After InfoCube compression and auto runstats, npages for the fact table became 0, resulting in negative timerons for the queries running against it. Because I observed this pattern immediately after auto runstats, which uses automatic sampling, I ran a complete runstats and the table returned to normal. Since IBM could not find the root cause at first, as an immediate workaround I scheduled a script that runs every hour to find fact tables with npages=0 and run a full runstats on them, which saved us in the short term. In parallel, I worked with IBM for many months to reproduce the issue in the lower environments, and it finally turned out to be a bug in runstats, which IBM fixed with a special build.
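The hourly workaround could look something like the sketch below. It is not the original script: the catalog query, the `TableStats` record, and especially the `/BIC/` and `/BI0/` name prefixes used to recognize BW fact tables are assumptions for illustration, and running the query and the generated commands against DB2 is left to the caller.

```python
from typing import Iterable, List, NamedTuple

# Catalog query whose rows feed suspect_fact_tables(); executing it is
# left to the caller (for example through the DB2 command line).
FIND_SQL = (
    "SELECT TABSCHEMA, TABNAME, NPAGES FROM SYSCAT.TABLES "
    "WHERE TYPE = 'T' AND NPAGES = 0"
)

# Assumed naming convention for BW InfoCube fact tables.
FACT_PREFIXES = ("/BIC/F", "/BIC/E", "/BI0/F", "/BI0/E")


class TableStats(NamedTuple):
    schema: str
    name: str
    npages: int


def suspect_fact_tables(rows: Iterable[TableStats]) -> List[TableStats]:
    """Keep fact tables whose catalog statistics report zero pages."""
    return [
        r for r in rows
        if r.npages == 0 and r.name.startswith(FACT_PREFIXES)
    ]


def full_runstats_command(table: TableStats) -> str:
    """Build a RUNSTATS command with no sampling clause (full statistics)."""
    # Names are double-quoted because BW table names contain slashes.
    return (
        f'RUNSTATS ON TABLE "{table.schema.strip()}"."{table.name}" '
        "WITH DISTRIBUTION AND DETAILED INDEXES ALL"
    )
```

A genuinely empty fact table also reports npages=0, so this sketch may run RUNSTATS on a few tables that do not need it; for an hourly stopgap that is harmless, though a stricter filter is possible if the real row count is known.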