Saturday, April 6, 2013

Data Guard Configuration with Oracle DGMGRL


About Data Guard

Efficient business operations, quality customer service, compliance with government regulations, and safeguarding corporate information assets all require high levels of data protection and data availability. Thus it is no surprise that data protection and data availability are among the top priorities for enterprises of all sizes and industries.

A set of questions one needs to ask him/herself is what If a disaster impacts ongoing business transactions, How soon can those systems be back to business again, How much Data is affordable to loose, is it acceptable? The only answer for all these questions to position a proper "Business Continuity Plan (BCP)" in place so that it helps business to grow inline with its consumer expectations. 
Technically, this is where Oracle's Data Guard plays a vital role that ensures superior reliability, and rock solid performance. DataGuard is Oracle's complete Disaster Recovery Solution which can reliably deliver aggressive recovery point (RPO - data protection) and recovery time (RTO - data availability) objectives.  

A Data Guard configuration includes a production database, referred to as the primary database,  and up to 30 standby databases. Primary and standby databases connect over TCP/IP using Oracle Net Services. There are no restrictions on where the databases are located provided that they can communicate with each other. A standby database is initially created from a backup copy of the primary database. Data Guard automatically synchronises the primary database and all of its standby databases by transmitting primary database redo - the information used by Oracle to recover transactions - and applying it to the standby database.

Over years, Oracle has done wonderful job on automating most of the process or human driven activities positioning it is the most robust and comprehensive DR Solution in the market. This article explains  configuration of Data Guard using the utility called DGMGRL.


Conventions used

PRIMARY - PRODUCTION
STANDBY - DR
PROD - PRIMARY Database Instance
PRODDR - STANDBY Database Instance
RDBMS Version - 11.2.0.1 or higher
PLATFORM - Any Platform

What this article covers

  • Configuration of Data Guard
  • Monitor Log Transport and Log Apply Services
  • Switchover
  • Data Guard mode transitions
  • Data Guard Standby transitions
  • Data Guard Fast Start Fail Over


Configuration of Data Guard


Step (1) Oracle network configuration
Configure listener for both PRIMARY & STANDBY as given below:

On PRIMARY

listener

DGMGRL by default expects listener service name to be the combination of "(db_unique_name)_DGB.(db_domain)", so make sure that the expectation is met, otherwise, it leads to "ora-12514" errors.

PRODLSNR=

  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.50)(PORT=1530))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))
SID_LIST_PRODLSNR=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=PROD_DGB.com)
      (ORACLE_HOME=/oracle/app/product/11.2.0/dbhome)
      (SID_NAME=PROD))
    (SID_DESC=
      (SID_NAME=plsextproc)
      (ORACLE_HOME=/oracle/app/product/11.2.0/dbhome)
      (PROGRAM=extproc)))

tnsnames to connect to STANDBY

PRODDR_DGB.com =

  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.51)(PORT = 1531))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = PRODDR_DGB.com)
    )
  )


On STANDBY

listener

PRODDRLSNR=

  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.51)(PORT=1531))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))
SID_LIST_PRODDRLSNR=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=PRODDR_DGB.com)
      (ORACLE_HOME=/oracle/app/product/11.2.0/dbhome)
      (SID_NAME=PRODDR))
    (SID_DESC=
      (SID_NAME=plsextproc)
      (ORACLE_HOME=/oracle/app/product/11.2.0/dbhome)
      (PROGRAM=extproc)))

tnsnames to connecting back to PRIMARY

PROD_DGB.com =

  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.50)(PORT = 1530))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = PROD_DGB.com)
    )
  )

Check the connectivity back and forth

Step(2) Parameters Configuration
Configure parameters as required below:

On PRIMARY 
SQL>ALTER SYSTEM SET dg_broker_start=TRUE;
SQL>ALTER SYSTEM SET log_archive_config='dg_config=(PROD,PRODDR)';
SQL>ALTER SYSTEM SET service_names='PROD_DGB.com';

Create pfile on PRIMARY from spfile, and transfer it to STANDBY Site

On STANDBY
Make following changes in pfile(init.ora), and bring the instance up using pfile, please also make sure that you have "audit_file_dest" physically on STANDBY Site 

control_files='/prod/standby.ctl'
db_unique_name=PRODDR

Now, create spfile from pfile, restart the instance, and make following parameter changes as required,

SQL>ALTER SYSTEM SET dg_broker_start=TRUE;
SQL>ALTER SYSTEM SET log_archive_config='dg_config=(PROD,PRODDR)';
SQL>ALTER SYSTEM SET service_names='PRODDR_DGB.com';

At this stage, you only have the instance up on STANDBY

Step(3) : Transfer Password File
Transfer the password file of PRIMARY to STANDBY Site, to $ORACLE_HOME/dbs, and rename it to orapw$ORACLE_SID (i.e., orapwPRODDR)

Step(4) :  Restore Backup
On PRIMARY, Login to databases Recovery Manager (RMAN) to both TARGET(PRIMARY) Database, and  AUXILIARY(STANDBY) as shown below, and duplicate the PRIMARY Database to STANDBY Database

$rman target / auxiliary sys/<password>@PRODDR_DGB.com
RMAN>duplicate target database for standby from active database nofilenamecheck;

Step(5) : Add STANDBY Redo Log files 
On STANDBY and PRIMARY, Add Standby redologs as shown below:

SQL>ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 52428800;
SQL>ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 52428800;
SQL>ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 52428800;
SQL>ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 52428800;

At this stage, you have the STANDBY database mounted by the instance with the latest backup of PRIMARY 

Step(6) Complete the Configuration with DGMGRL

On PRIMARY
Login to PRIMARY Database with DGMGRL
$dgmgrl sys/<password>

Create Data Guard Configuration
DGMGRL> CREATE CONFIGURATION 'PRODDG' AS PRIMARY DATABASE IS 'PROD' CONNECT IDENTIFIER IS 'PROD_DGB.com';
Configuration "PRODDG" created with primary database "PROD"

Add STANDBY Databases to Configuration
DGMGRL> ADD DATABASE 'PRODDR' AS CONNECT IDENTIFIER IS 'PRODDR_DGB.com';
Database "PRODDR" added

Modify Connect Identifier details according to our specifications

DGMGRL> EDIT DATABASE 'PRODDR' SET PROPERTY 'StaticConnectIdentifier'='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.51)(PORT=1531))(CONNECT_DATA=(SERVICE_NAME=PRODDR_DGB.com)(INSTANCE_NAME=PRODDR)(SERVER=DEDICATED)))';
Property "StaticConnectIdentifier" updated
DGMGRL> EDIT DATABASE 'PROD' SET PROPERTY 'StaticConnectIdentifier'='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.50)(PORT=1530))(CONNECT_DATA=(SERVICE_NAME=PROD_DGB.com)(INSTANCE_NAME=PROD)(SERVER=DEDICATED)))';
Property "StaticConnectIdentifier" updated

Enable the Configuration
DGMGRL> ENABLE CONFIGURATION;
Enabled.

You can check the configuration as below, and the expected output at this stage
DGMGRL> SHOW CONFIGURATION;
Configuration - PRODDG

  Protection Mode: MaxPerformance
  Databases:
    PROD   - Primary database
    PRODDR - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS


You can check each database status in configuration as below
For PRIMARY
DGMGRL> SHOW DATABASE 'PROD';
Database - PROD

  Role:            PRIMARY
  Intended State:  TRANSPORT-ON
  Instance(s):
    PROD

Database Status:
SUCCESS

For STANDBY
DGMGRL> SHOW DATABASE 'PRODDR';
Database - PRODDR

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    PRODDR

Database Status:
SUCCESS

Step(7) Start Apply Process
Make sure that STANDBY has started the Managed Recovery Process, otherwise, please start it as advised below

On STANDBY, on MOUNT stage
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

If ACTIVE DataGuard is licensed and applicable to your environment, open the database in READ ONLY mode while the recovery can still continue,
SQL>ALTER DATABASE OPEN READ ONLY;
Database altered.

Monitor log transport and apply services


PRIMARY(PROD)
STANDBY(PRODDR)
SQL>ARCHIVE LOG LIST
SQL>ARCHIVE LOG LIST

SQL>SQL>ALTER SYSTEM SWITCH LOGFILE;
SQL>SELECT SEQUENCE#,PROCESS,STATUS,BLOCK# FROM V$MANAGED_STANDBY;
SQL>ARCHIVE LOG LIST

SQL> ARCHIVE LOG LIST

Find it same as PRIMARY Current archive log sequence number

Find the archive log applied with YES on STANDBY
SQL>SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE APPLIED='YES';
Find Oracle updating the same archive log sequence number to V$ARCHIVED_LOG APPLIED Status with “YES”
SQL>SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE APPLIED='YES';


Switchover with DGMGRL (Role Transitions)


DGMGRL> SWITCHOVER to 'PRODDR';
Performing switchover NOW, please wait...
New primary database "PRODDR" is opening...
Operation requires shutdown of instance "PROD" on database "PROD"
Shutting down instance "PROD"...
ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.
Operation requires startup of instance "PROD" on database "PROD"
Starting instance "PROD"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "PRODDR"


An Example of Data Guard Mode transitions using DGMGRL

From Maximum Performance to Maximum Availability
Change the Log Transport Mode from ASYNC to SYNC.
DGMGRL> EDIT DATABASE 'PRODDR' SET PROPERTY 'LogXptMode'='SYNC';

Property "LogXptMode" updated


Change Data Guard mode from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
Succeeded.


An Example on Standby Transitions (from Snapshot Standby to Physical Standby, and vice versa)

From Physical Standby to Snapshot Standby
DGMGRL> CONVERT DATABASE 'PRODDR' TO SNAPSHOT STANDBY;
Converting database "PRODDR" to a Snapshot Standby database, please wait...
Database "PRODDR" converted successfully

From Snapshot Standby to Physical Standby

DGMGRL> CONVERT DATABASE 'PRODDR' TO PHYSICAL STANDBY;
Converting database "PRODDR" to a Physical Standby database, please wait...
Operation requires shutdown of instance "PRODDR" on database "PRODDR"
Shutting down instance "PRODDR"...
Database closed.
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "PRODDR" on database "PRODDR"
Starting instance "PRODDR"...
ORACLE instance started.
Database mounted.
Continuing to convert database "PRODDR" ...
Operation requires shutdown of instance "PRODDR" on database "PRODDR"
Shutting down instance "PRODDR"...
ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.
Operation requires startup of instance "PRODDR" on database "PRODDR"
Starting instance "PRODDR"...
ORACLE instance started.
Database mounted.
Database "PRODDR" converted successfully



Enabling FSFO (Fast Start Fail Over)

FSFO is an automatic Fail Over procedure that is triggered in the event of a Disaster (Default Attributes or User Configurable Triggers), FSFO was introduced with 10gR2 and has seen tremendous improvements over its major version upgrade i.e., 11g. 

FSFO was only limited to MAA(Maximum available architecture) in its introductory release, but it can now also be deployed with Maximum Performance Protection Mode starting from 11gR1.
There are primarily two main components in FSFO architecture as explained below.

Data Guard Broker

Broker is the utility that monitors and manages the complete Data Guard configuration. It always maintains the state information of its databases (PRIMARY and STANDBYs). Some of functions of Broker are,
  • It automatically sets Data Guard related database initialization parameters on instance start and role transitions
  • It automatically starts log transport and apply services for standbys, and automates many of the administrative tasks associated with maintaining a Data Guard configuration. 
  • FSFO is a feature of Broker which records information about the failover target, how long to wait after a failure before triggering a failover, and other FSFO specific properties. 


FSFO Observer
The observer is the third party in an otherwise typical primary/standby Data Guard configuration.  It is actually a low-footprint OCI client built into the DGMGRL CLI (Data Guard Broker Command Line Interface) and, like any other client, may be run on a different hardware platform than the database servers.  Its primary job is to perform a failover when conditions permit it to do so without violating the data durability constraints set by the DBA.  Only the observer can initiate FSFO failover.  It's secondary job is to automatically reinstate a failed primary as a standby if that feature is enabled (the default).  The observer is the key element that separates Data Guard failover from its pre-FSFO role as the plan of last resort to its leading role in a robust high availability solution. 

Note: the FSFO observer version must match the database version.  Oracle Database 11g observers are incompatible with 10g databases and vice-versa. 

Requirements for FSFO

=>The Data Guard configuration must be in either MaxAvailability or MaxPerformance protection mode.
DGMGRL> SHOW CONFIGURATION;
Configuration - PRODDG

  Protection Mode: MaxPerformance

  Databases:
    PROD   - Primary database
    PRODDR - Physical standby database

Fast-Start Failover: DISABLED


Configuration Status:

SUCCESS

=>The LogXptMode property for both the primary database and the fast-start failover target standby database must be set to ASYNC if the configuration protection mode is set to MaxPerformance mode.
DGMGRL> SHOW DATABASE VERBOSE 'PROD';
Database - PROD

  Role:            PRIMARY

  Intended State:  TRANSPORT-ON
  Instance(s):
    PROD

  Properties:

    DGConnectIdentifier             = 'PROD_DGB.com'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Database Status:
SUCCESS



DGMGRL> SHOW DATABASE VERBOSE 'PRODDR';
Database - PRODDR

  Role:            PHYSICAL STANDBY

  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: ON
  Instance(s):
    PRODDR

  Properties:

    DGConnectIdentifier             = 'PRODDR_DGB.com'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------

Database Status:

SUCCESS



=>The primary database and the fast-start failover target standby database must both have flashback enabled.
ON PRIMARY
SQL> SELECT DATABASE_ROLE,FLASHBACK_ON FROM V$DATABASE;
DATABASE_ROLE FLASHBACK_ON
---------------- ------------------
PRIMARY NO

SQL> ALTER DATABASE FLASHBACK ON;

Database altered.

SQL> SELECT DATABASE_ROLE,FLASHBACK_ON FROM V$DATABASE;

DATABASE_ROLE FLASHBACK_ON
---------------- ------------------
PRIMARY YES

ON STANDBY
SQL> SELECT DATABASE_ROLE,FLASHBACK_ON FROM V$DATABASE;
DATABASE_ROLE FLASHBACK_ON
---------------- ------------------
PHYSICAL STANDBY NO

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.

SQL> ALTER DATABASE FLASHBACK ON;
Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

SQL> SELECT DATABASE_ROLE,FLASHBACK_ON FROM V$DATABASE;
DATABASE_ROLE FLASHBACK_ON
---------------- ------------------
PHYSICAL STANDBY YES

=>Start the Observer
DGMGRL> START OBSERVER;
Observer started

Session hangs, please leave it open to monitor its feedback on failover attempts. It is strongly recommended that Observer should not be started on PRIMARY machine, can be running on STANDBY or any other OCI client machine which is of same Database Version.

To verify the observer status, and check the node on which it is running.



SQL> COL FS_FAILOVER_OBSERVER_HOST FOR A15 
SELECT FS_FAILOVER_OBSERVER_HOST,FS_FAILOVER_CURRENT_TARGET,FS_FAILOVER_STATUS,DATAGUARD_BROKER from v$databaseSQL> 
  2  ;

FS_FAILOVER_OBS FS_FAILOVER_CURRENT_TARGET     FS_FAILOVER_STATUS     DATAGUAR

--------------- ------------------------------ ---------------------- --------
macdb2.com PRODDR       TARGET UNDER LAG LIMIT ENABLED

=>Set the primary database FastStartFailoverTarget property to the DB_UNIQUE_NAME value of the desired target standby database and the desired target standby database FastStartFailoverTarget property to the DB_UNIQUE_NAME value of the primary database.
DGMGRL> EDIT DATABASE 'PROD' SET PROPERTY FastStartFailoverTarget = 'PRODDR';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE 'PRODDR' SET PROPERTY FastStartFailoverTarget = 'PROD';
Property "faststartfailovertarget" updated


=>The LogXptMode property for both the primary database and the fast-start failover target standby database must be set to SYNC if the configuration protection mode is set to MaxAvailability mode.

=>Enable FSFO

DGMGRL> ENABLE FAST_START FAILOVER;
Enabled.


=>Verify the status of FSFO
DGMGRL> SHOW CONFIGURATION;
Configuration - PRODDG

  Protection Mode: MaxPerformance

  Databases:
    PROD   - Primary database
    PRODDR - (*) Physical standby database

Fast-Start Failover: ENABLED


Configuration Status:

SUCCESS



Conditions for FSFO Failover

By default, the observer will initiate failover to the target standby if and only if ALL of the following are true:
  • observer is running 
  • observer and the standby both lose contact with the primary
  • Note: if the observer loses contact with the primary, but the standby does not, the observer can determine that the primary is still up via the standby.
  • observer is still in contact with the standby
  • durability constraints are met
  • failover threshold timeout has elapsed
User configurable failover conditions (11g and later)
Oracle Database 11g Rel 1 introduced user configurable failover conditions that can trigger the observer to initiate failover immediately.
  • Health conditions 
  • Broker can be configured to initiate failover on any of the following conditions.  Conditions shown in blue are enabled by default. 
  • Datafile Offline (due to IO errors) 
  • Corrupted Controlfile
  • Corrupted Dictionary
  • Inaccessible Logfile (due to IO errors) 
  • Stuck Archiver
  • Oracle errors (ORA-NNNNN), You can also specify a list of ORA- errors that will initiate FSFO failover.  The list is empty by default. 
  • Application initiated, Applications can initiate FSFO failover directly using the DBMS_DG.INITIATE_FS_FAILOVER procedure with an optional message text that will be displayed in the observer log and the primary's alert log. 

Testing FSFO

"SHUTDOWN ABORT" or Instance Crash is a classic example how FSFO takes on Failing Over to DR. 

ON PRIMARY,  issue "SHUTDOWN ABORT" or kill one of the Oracle's core background Processes such as SMON/PMON/DBWR etc.. Monitor the session which has been previously opened to start the Observer, following information is logged


Observer started




04:21:01.21  Tuesday, April 09, 2013

Initiating Fast-Start Failover to database "PRODDR"...

Performing failover NOW, please wait...

Failover succeeded, new primary is "PRODDR"

04:21:08.48  Tuesday, April 09, 2013

Now, let's bring the PRIMARY Instance up which has been crashed in our testing. It fails to open with the error below

SQL> STARTUP
ORACLE instance started.

Total System Global Area  435224576 bytes
Fixed Size    1337044 bytes
Variable Size  201328940 bytes
Database Buffers  226492416 bytes
Redo Buffers    6066176 bytes
Database mounted.
ORA-16649: possible failover to another database prevents this database from
being opened

But as soon as it is available, Data Guard Broker (Observer in this case) initiaties reinstatement for database "PROD", which in turn becomes the PHYSICAL STANDBY. Below information is logged in Observer session

04:23:43.97  Tuesday, April 09, 2013
Initiating reinstatement for database "PROD"...
Reinstating database "PROD", please wait...
Operation requires shutdown of instance "PROD" on database "PROD"
Shutting down instance "PROD"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "PROD" on database "PROD"
Starting instance "PROD"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "PROD" ...
Reinstatement of database "PROD" succeeded
04:24:33.22  Tuesday, April 09, 2013

At this stage, PRODDR is under PRIMARY role and PROD is under STANDBY role, you can now do a switchover to shift their roles as required

DGMGRL> SWITCHOVER TO 'PROD';
Performing switchover NOW, please wait...
New primary database "PROD" is opening...
Operation requires shutdown of instance "PRODDR" on database "PRODDR"
Shutting down instance "PRODDR"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "PRODDR" on database "PRODDR"
Starting instance "PRODDR"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "PROD"



Please leave your feedback in the comments section or you can mail me on the article as I continue to improve, it is highly appreciated. 



2 comments:

  1. Hi Naga,

    This is sudharshan Reddy, ur article cover entire datagaurd stuff. i have follow this steps and i am able to create a standby database and done switch over.

    Thank you very much...

    Please update the same for RAC environment.

    i dont have idea on RAC datagaurd switch over. could plaese give me steps.

    ReplyDelete
    Replies
    1. Hi Sudharshan, Thanks for your valuable feedback.. I will sure do it.. Thanks again!!!

      Delete

replica sets, Replication with mongoDB

Before we get into details of mongoDB replica sets, let's understand what is replication and why is it needed? Replication is a config...