Introduction to Global Data Replication & Recovery
Nexsan Global Disaster Replication & Recovery (GDR) provides network-based data replication and site recovery over any IP LAN, SAN or WAN, ensuring data and workflow continuity for any business or organization.
Nexsan GDR creates a duplicate data environment for mission-critical information, replicating data changes made to volumes at one location to matching volumes at a remote location. Volumes can be replicated individually or in groups to maintain data consistency for larger applications such as databases.
Nexsan GDR supports both planned and unplanned (disaster) failover, as well as recovery after a disaster.
GDR Key Features and Benefits
Replicate data to a remote site to provide a disaster recovery method.
Maintain exact duplicates of live data at two locations to ensure full data recovery and achieve zero RPO (Recovery Point Objective).
Replicate data changes to the remote site over links of any bandwidth, avoiding the cost of high-speed connections.
Active-Active Replication
Replicate data between two production sites so that each site can recover from the other.
Replicate multiple data centers to one remote site.
Failover and Fallback Processes
Wizard-based management guides users through failover and fallback processes to minimize RTO (Recovery Time Objective).
Planned Failover and Fallback
Minimize service interruption during site migration or full-site maintenance for 24/7 business continuity.
Use Any Type of Disk Storage
Storage systems at the local and remote sites do not need to be the same brand or type. Select the storage type that best fits your requirements (FC or SCSI systems using FC, SCSI, SATA or ATA drives).
Server and Operating System Independent
Replication is performed by the Nexsan iSeries across the entire network, so there is no need to purchase data replication software or licenses for each server.
Support an unlimited number of volumes, servers and storage capacity at no additional cost, for a significant return on investment.
Select only the volumes you wish to protect to conserve bandwidth costs and use storage capacity efficiently.
Consistency groups ensure that related volumes are replicated at the same time to maintain data integrity for databases and other multi-volume applications.
Administer multi-site global data replication and recovery from a central network management application.
A disaster is an event that shuts down a network, leaving its data inaccessible. A disaster can be caused by several different elements, such as:
Good disaster recovery solutions safeguard against all of these types of disasters and enable networks to recover from the disaster and continue operating normally.
Every well-planned network has built-in safeguards against localized failures. A disaster recovery solution safeguards against a site-wide failure.
High Availability safeguards against technological failures. Some examples of high availability are:
- Dual power supplies safeguard against power supply failure.
- Data mirroring safeguards against disk failure.
- Host clustering safeguards against application failures.
Data Protection safeguards against human failure. Some examples of data protection are:
- Data backups safeguard against data loss.
- Volume snapshots safeguard against data corruption.
Some examples of a disaster are:
- Natural, man-made or regional disasters that cause data loss.
Nexsan's GDR solution is based on primary and secondary sites, and supports one secondary site with one or more primary sites:
- Primary (local): the site that is replicated.
- Secondary (remote): the site that holds the replicated data.
Different topologies influence disaster recovery solutions.
Nexsan GDR offers the following topologies:
- Active/Passive: 1 primary site / 1 secondary site
- Active/Active: 1 primary/secondary site / 1 secondary/primary site
- Star Formation: more than one primary site / 1 secondary site
The primary site replicates the data to the secondary site; the secondary site does not pull data from the primary site.
In an active/active topology, both sites run production workloads and each site's data is replicated to the other.
In an active/passive topology, volumes are replicated from the primary site and stored at the secondary site. The secondary site acts only as a data repository and does not automatically take over network functions in the event of a disaster; the takeover must be initiated by the customer.
In Star Formation topology there is one secondary site and more than one primary site. This topology is suited to a multi-branch company in which each primary site (branch) replicates its data to the secondary site (headquarters).
Figure 1-1. Star Formation Topology
The building blocks of Nexsan GDR are:
- Primary and Secondary Sites
- Disaster Recovery (GDR) Pairs
A Journal Volume is required for the administrative journaling functions of GDR, including maintaining a block-level mapping table of all changes made to the GDR pair volumes. Journal volumes can be expanded to accommodate growing journaling needs. One Journal Volume is required per Nexsan iSeries, and only for asynchronous replication.
The journal size is influenced by:
- The amount of data changed during the initial sync process.
- The amount of data changed between consecutive replications, across all GDR pairs.
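As a rough planning aid, the two journal size factors above can be turned into an estimate. The sketch below is a hypothetical sizing model, not a Nexsan formula: it assumes a constant change rate per GDR pair, conservatively adds the changes expected during initial sync to those expected between two consecutive replications, and applies an assumed 1.5x safety margin. The names `PairLoad` and `estimate_journal_gib` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PairLoad:
    """Hypothetical description of one GDR pair's write load."""
    name: str
    change_rate_gib_per_hour: float  # changed data written to the primary volume


def estimate_journal_gib(
    pairs: Iterable[PairLoad],
    initial_sync_hours: float,
    replication_interval_hours: float,
    headroom: float = 1.5,
) -> float:
    """Conservative journal size estimate in GiB (assumed model).

    Adds the changes expected during the initial sync to the changes expected
    between two consecutive replications, summed over all GDR pairs, and
    applies a safety margin. Assumes a constant change rate.
    """
    total_rate = sum(p.change_rate_gib_per_hour for p in pairs)
    during_initial_sync = total_rate * initial_sync_hours
    between_replications = total_rate * replication_interval_hours
    return (during_initial_sync + between_replications) * headroom


pairs = [PairLoad("db", 2.0), PairLoad("logs", 0.5)]
print(estimate_journal_gib(pairs, initial_sync_hours=10, replication_interval_hours=1))  # 41.25
```

Real workloads are bursty, so a measured peak change rate is a safer input than an average.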
On the Nexsan iSeries, primary storage is virtualized into volumes. These virtual volumes are the basic building blocks of the GDR solution.
Each virtual volume can be replicated to a dedicated virtual volume at the secondary site. Together, the primary and secondary volumes form a GDR Pair.
A consistency group (CG) applies a given set of replication parameters to a group of GDR pairs. Point-in-Time (PIT) snapshots of data changes are created at specified times. PIT snapshots guarantee that all pairs within the CG are replicated and merged at the same point in time, enabling the application to recover.
Some applications create or require more than one volume with inter-volume dependencies; for example, databases and email applications often keep data and logs on separate volumes. Such volumes need identical replication parameters, and consistency groups ensure that they are replicated together.
GDR pairs can also be configured as standalone pairs that are not part of a consistency group.
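The key property of a consistency group is that every member pair is captured at one shared point in time. The following is a minimal, hypothetical model of that idea (the classes `GdrPair`, `Pit` and `ConsistencyGroup` are illustrative and do not reflect the iSeries internals): one `take_pit` call collects the pending changes of all member pairs under a single timestamp, so the secondary site can merge them as one unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GdrPair:
    name: str
    # Blocks changed on the primary volume since the last PIT (block address -> data).
    pending: Dict[int, bytes] = field(default_factory=dict)


@dataclass
class Pit:
    timestamp: int
    changes: Dict[str, Dict[int, bytes]]  # pair name -> changed blocks


@dataclass
class ConsistencyGroup:
    name: str
    pairs: List[GdrPair]

    def take_pit(self, timestamp: int) -> Pit:
        """Capture every member pair at the same point in time.

        This single-threaded sketch assumes writes are paused while the loop
        runs; a real array must freeze I/O across all member volumes atomically.
        """
        changes: Dict[str, Dict[int, bytes]] = {}
        for pair in self.pairs:
            changes[pair.name] = dict(pair.pending)
            pair.pending.clear()
        return Pit(timestamp, changes)


db = GdrPair("db", {1: b"row"})
log = GdrPair("log", {7: b"entry"})
pit = ConsistencyGroup("erp", [db, log]).take_pit(timestamp=100)
print(pit.changes)
```

A standalone pair would simply get its own PIT on its own schedule, with no guarantee of alignment with any other pair.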
Figure 1-4. GDR pairs within a Consistency Group
GDR Initial Synchronization Options
If your primary volumes contain data, it is necessary to perform an initial volume synchronization action to replicate data to the secondary site.
There are three initial synchronization policies:
1. No Sync: No initial volume copy is needed.
If your primary volume is new and contains no data, there is nothing to synchronize between the primary and secondary volumes, so no initial synchronization is needed.
If you plan to copy a large amount of data to a newly created pair after replication has started, the journal might fill up too quickly and the pair can fall out of sync. Nexsan recommends copying the data to the primary volume before you create the pair, and then using initial sync (which does not use the journal).
2. Online: Copies volumes over the network.
There are two ways to use Online Initial Sync:
- When there is sufficient bandwidth, you can synchronize the data directly over the network.
- When there is insufficient bandwidth, you can configure both the primary and secondary Nexsan iSeries locally and perform the online initial sync over the local LAN. After the initial sync is done, you can transport the secondary Nexsan iSeries to the remote site. This method is described in Using Online Init Sync with Low Bandwidth Lines.
For Asynchronous pairs:
A snapshot is taken of the primary volume. You must define this snapshot volume. The snapshot of the primary volume is replicated to the secondary site in real time.
3. Offline: Copies volumes offline to external media (disk/tape). The steps for performing an offline copy are described in Offline Copy.
For Asynchronous pairs only:
A snapshot is taken of the primary volume. When using the Wizard, you must define this snapshot volume when creating the GDR pair. The snapshot of the primary volume is copied to a separate tape/disk, which is physically transported to the secondary site and then copied to the secondary volume.
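Whether online sync is practical depends mostly on how long the copy would take over the available link. The sketch below is a hypothetical decision helper, not part of the GDR Wizard: the 70% link efficiency and the `max_sync_hours` window are assumptions you would replace with your own figures.

```python
def estimate_transfer_hours(data_gib: float, link_mbit_per_s: float, efficiency: float = 0.7) -> float:
    """Hours needed to copy data_gib over a link.

    `efficiency` is an assumed fraction of the nominal link rate that is usable.
    """
    bits = data_gib * 1024**3 * 8
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return bits / usable_bits_per_s / 3600


def choose_initial_sync(data_gib: float, link_mbit_per_s: float, max_sync_hours: float) -> str:
    """Pick one of the three initial sync policies described above."""
    if data_gib == 0:
        return "No Sync"  # new, empty primary volume
    if estimate_transfer_hours(data_gib, link_mbit_per_s) <= max_sync_hours:
        return "Online"  # enough bandwidth to sync directly over the network
    # Too slow for the WAN: sync online over a local LAN and ship the secondary
    # Nexsan iSeries, or use an offline copy (asynchronous pairs only).
    return "Online (local LAN, then ship) or Offline"


print(choose_initial_sync(1024, 1000, max_sync_hours=24))  # Online (about 3.5 hours)
print(choose_initial_sync(1024, 100, max_sync_hours=24))   # about 35 hours, too slow
```

With 1 TiB of data, a 1 Gbit/s link finishes in roughly 3.5 hours, while a 100 Mbit/s link needs about 35 hours under these assumptions.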
The method you choose for replicating your data depends on several factors.
The most important factors for data replication are:
The GDR solution offers you two methods of replication:
1. Synchronous: recommended for high bandwidth and low latency
2. Asynchronous: recommended for low bandwidth and high latency
A site can use a combination of both synchronous and asynchronous replication methods. When deciding which method is most appropriate for each GDR pair, consider the following:
- How often does the data change?
- Which data shares interdependencies with other data and, therefore, should be replicated together?
- What impact do small changes in a given chunk of data have on business operations, that is, what is your data loss tolerance?
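The bandwidth and latency guidance above can be expressed as a simple rule of thumb. This is a hypothetical helper: the 5 ms round-trip threshold is an illustrative assumption, not a Nexsan limit, and in practice the data loss tolerance questions above should also weigh in.

```python
def recommend_method(
    rtt_ms: float,
    link_mbit_per_s: float,
    peak_write_mbit_per_s: float,
    max_sync_rtt_ms: float = 5.0,
) -> str:
    """Suggest a replication method for one GDR pair or consistency group.

    Synchronous replication waits for the remote write on every I/O, so the
    link must carry the peak write rate and the round trip must be short.
    The default threshold is an illustrative assumption.
    """
    fast_enough = link_mbit_per_s >= peak_write_mbit_per_s
    close_enough = rtt_ms <= max_sync_rtt_ms
    return "synchronous" if fast_enough and close_enough else "asynchronous"


print(recommend_method(rtt_ms=1, link_mbit_per_s=1000, peak_write_mbit_per_s=200))   # synchronous
print(recommend_method(rtt_ms=40, link_mbit_per_s=1000, peak_write_mbit_per_s=200))  # asynchronous
```

Pairs in the same consistency group should use the same method, so evaluate the combined peak write rate of the whole group.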
Synchronous replication means that every write operation is written to the primary and secondary volumes before sending back a write acknowledge to the server. Synchronous replication provides zero data loss.
If you need zero data loss, use synchronous replication. However, synchronous replication is only efficient over a high-bandwidth, low-latency network.
A synchronously replicated GDR pair functions similarly to a regular mirrored volume over iSCSI, with each volume as a child. However, if the primary volume fails, the volume must be manually failed over to the secondary volume in order for the secondary volume to take over regular volume functions.
Figure 1-5. Synchronous Replication Method
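The ordering that makes synchronous replication lossless is "write both copies, then acknowledge". The following minimal in-memory model is a hypothetical illustration (`Volume` and `SyncPair` are not Nexsan classes): if the remote write raises, the exception propagates and the server never receives an acknowledgment.

```python
from typing import Dict


class Volume:
    """In-memory stand-in for a virtual volume (block address -> data)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


class SyncPair:
    """Hypothetical model of a synchronously replicated GDR pair."""

    def __init__(self, primary: Volume, secondary: Volume) -> None:
        self.primary = primary
        self.secondary = secondary

    def write(self, lba: int, data: bytes) -> str:
        self.primary.write(lba, data)
        # In a real deployment this is a network round trip to the remote site.
        # If it raises, no acknowledgment reaches the server.
        self.secondary.write(lba, data)
        return "ACK"  # sent only after both copies hold the data


primary, secondary = Volume(), Volume()
print(SyncPair(primary, secondary).write(5, b"x"))  # ACK
```

Because every write waits for the remote copy, the link round-trip time is added to each write's latency, which is why this method suits low-latency links.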
With asynchronous replication, every write operation is written to the primary volume. However, instead of being replicated immediately to the secondary site, the data is recorded in the primary journal and a write acknowledgment is then sent back to the server.
Because the acknowledgment does not wait for the remote site, the data at the remote site might not be synchronized with the latest data on the primary volume.
Use asynchronous replication when site connectivity has limited bandwidth and high latency.
Figure 1-6. First Step of Asynchronous Replication Method
Point-in-Time (PIT) snapshots of data changes are created at specified times, and the PIT data is replicated to the secondary site journal. At the secondary site, after the entire PIT has been transferred, the PIT is merged with the secondary volume. If network conditions interfere with the transfer of a PIT, the secondary volume is not affected, because the merge does not begin until the complete PIT has been successfully transferred. The volume copy from the previous successful transfer remains intact.
Figure 1-7. Second Step of Asynchronous Replication Method
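The two steps of asynchronous replication can be sketched with a small in-memory model. This is a hypothetical illustration, not the iSeries implementation: `AsyncPair` and its methods are invented for the example, and the `delivered` set simulates which blocks of a PIT reached the secondary journal. The important behavior is that an incomplete PIT is never merged, so the secondary volume keeps its last consistent copy.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AsyncPair:
    """Hypothetical model of an asynchronously replicated GDR pair."""
    primary: Dict[int, bytes] = field(default_factory=dict)
    journal: Dict[int, bytes] = field(default_factory=dict)  # changes since the last PIT
    secondary: Dict[int, bytes] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> str:
        # Step 1: write to the primary volume, record the change in the primary
        # journal and acknowledge immediately, without waiting for the remote site.
        self.primary[lba] = data
        self.journal[lba] = data
        return "ACK"

    def take_pit(self) -> Dict[int, bytes]:
        # Freeze the journaled changes into a Point-in-Time set.
        pit = dict(self.journal)
        self.journal.clear()
        return pit

    def transfer_and_merge(self, pit: Dict[int, bytes], delivered: Set[int]) -> bool:
        # Step 2: `delivered` simulates which blocks reached the secondary journal.
        received = {lba: pit[lba] for lba in delivered if lba in pit}
        if received.keys() != pit.keys():
            # Incomplete PIT: do not merge, so the secondary volume keeps the copy
            # from the last successful transfer. The caller can resend the PIT.
            return False
        self.secondary.update(received)
        return True


pair = AsyncPair()
pair.write(1, b"a")
pair.write(2, b"b")
pit = pair.take_pit()
print(pair.transfer_and_merge(pit, delivered={1}))     # False, secondary unchanged
print(pair.transfer_and_merge(pit, delivered={1, 2}))  # True, PIT merged
```

Writes made after `take_pit` accumulate in the journal for the next PIT, which is why the remote copy can lag behind the primary.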
Figure 1-8. GDR Workflow Diagram
1. Volumes marked for initial synchronization are replicated to the secondary site.
2. Routine GDR replication follows, for each standalone GDR pair or consistency group.
3. Disaster (Planned/Unplanned): the primary site goes down due to scheduled maintenance or a disaster.
4. The primary site is manually failed over to the secondary site, and all GDR volumes are exposed to their designated hosts. At this stage the secondary site acts as the primary site. After a planned failover, changes at the secondary site are replicated to the primary site.
5. The secondary site continues to function as the primary site while the primary site is repaired.
6. The secondary site is manually failed back to the primary site when the primary site is repaired or scheduled maintenance is finished.
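The GDR workflow is a fixed sequence of states, and it can help to see it as a small state machine that rejects out-of-order operations (for example, a fallback before the primary site is repaired). The state and event names below are hypothetical labels for the workflow steps, not identifiers used by Nexsan software.

```python
from typing import Iterable

# Hypothetical state names for the GDR workflow (Figure 1-8).
TRANSITIONS = {
    ("initial_sync", "sync_complete"): "replicating",
    ("replicating", "primary_down"): "primary_down",
    ("primary_down", "failover"): "running_on_secondary",
    ("running_on_secondary", "primary_repaired"): "ready_for_fallback",
    ("ready_for_fallback", "fallback"): "replicating",
}


def next_state(state: str, event: str) -> str:
    """Return the next workflow state, rejecting events that are out of order."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def run(events: Iterable[str], state: str = "initial_sync") -> str:
    for event in events:
        state = next_state(state, event)
    return state


print(run(["sync_complete", "primary_down", "failover", "primary_repaired", "fallback"]))
# replicating
```

Both failover and fallback are manual steps in GDR, so in this model they are explicit events rather than automatic transitions.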
Planned Failover & Fallback
Planned failover can be used when a site needs to go down for maintenance or relocation. The unique feature of a planned failover is that it keeps tracking data changes, which eliminates the need for a full initial data resynchronization when the primary site comes back up.
The secondary site continues to keep a journal volume, which is instrumental in quickly restoring the primary site.
Because this journal is not being pushed to another site, it can grow quickly depending on volume activity. Monitor the journal volume to make sure it is not approaching its load threshold; a journal volume can be resized if necessary.
Primary Site Operations
1. Deactivate the necessary applications at the primary site.
2. Once all write operations are stopped and all caches are flushed, PITs of all GDR volumes are taken and replicated to the secondary site.
3. When all data has been transferred, you can switch to the secondary site.
4. The primary site is now available for maintenance.
Secondary Site Operations
1. At the secondary site, the PITs are merged with the secondary volumes to provide the most up-to-date copies of the data.
2. Once all of the data has been replicated, the secondary volumes are exposed to the appropriate hosts.
Fallback
1. Deactivate the necessary applications at the secondary site.
2. Once all write operations are stopped and all caches are flushed, PITs of all GDR volumes are replicated to the primary site.
3. When all data has been transferred, you can switch back to the primary site.
4. At the primary site, the PITs are merged with the primary volumes.
An initial resynchronization of all data is not necessary. Only the PIT data needs to be merged with the primary site data.
5. Once all of the data has been replicated, you can reactivate all applications.
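Planned failover and planned fallback follow the same order of operations, just in opposite directions. The sketch below is a hypothetical model (`Site` and `planned_switchover` are illustrative names) that assumes every PIT transfer succeeds and does not model cache flushing. It shows why no full resynchronization is needed: only the changes tracked since the last switch are merged at the target.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    """Hypothetical site model: volume name -> {block address: data}."""
    name: str
    volumes: Dict[str, Dict[int, bytes]]
    pending: Dict[str, Dict[int, bytes]] = field(default_factory=dict)  # not yet replicated
    apps_active: bool = False
    exposed: bool = False


def planned_switchover(source: Site, target: Site) -> List[str]:
    """Move production from `source` to `target` (planned failover or fallback).

    Assumes every PIT transfer succeeds; cache flushing is not modeled.
    """
    log = []
    source.apps_active = False  # stop all write operations at the source
    log.append(f"deactivated applications at {source.name}")
    pits = {vol: dict(changes) for vol, changes in source.pending.items()}
    source.pending.clear()
    log.append(f"took PITs of {len(pits)} GDR volume(s)")
    for vol, changes in pits.items():
        # Only the PIT data is merged; no full resynchronization is needed.
        target.volumes.setdefault(vol, {}).update(changes)
    log.append(f"merged PITs at {target.name}")
    source.exposed, target.exposed = False, True
    target.apps_active = True
    log.append(f"switched production to {target.name}")
    return log


# Fallback: the secondary ran production during maintenance and tracked one change.
secondary = Site("secondary", {"db": {1: b"a", 2: b"b"}}, pending={"db": {2: b"b"}},
                 apps_active=True, exposed=True)
primary = Site("primary", {"db": {1: b"a"}})
for line in planned_switchover(secondary, primary):
    print(line)
```

Calling the same function with the sites swapped models the planned failover that preceded the maintenance window.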
Disaster Failover, Recovery & Fallback
In an unplanned failover, there is nothing to do at the primary site immediately; begin working directly at the secondary site.
Secondary Site Operations
1. Switch the secondary volumes to primary volumes.
2. Expose the secondary volumes to the configured hosts and activate any applications that need to be activated.
Note that the secondary site does not continue to keep a journal volume of data changes. The secondary site functions as a regular site without GDR capabilities.
When using asynchronous replication, data that had not yet been replicated to the secondary site when the disaster occurred may be lost. With the Nexsan GDR solution, you can keep this data loss to a minimum, or eliminate it completely, by using optimal replication policies.
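The worst-case data loss window for asynchronous replication follows from how PITs work: a write made just after a PIT is taken is protected only once the next PIT has been completely transferred. The sketch below encodes that reasoning under simplifying assumptions (a fixed PIT interval, each PIT finishing its transfer before the next is taken, and a constant write rate); the function names are hypothetical.

```python
def worst_case_loss_window_s(pit_interval_s: float, pit_transfer_s: float) -> float:
    """Age of the oldest write that could be lost if the primary site fails.

    A write made just after a PIT is taken is only protected once the next
    PIT (taken pit_interval_s later) has been completely transferred.
    Assumes each PIT finishes transferring before the next one is taken.
    """
    if pit_transfer_s > pit_interval_s:
        raise ValueError("PITs would queue up; this simple model does not apply")
    return pit_interval_s + pit_transfer_s


def data_at_risk_mib(write_rate_mib_s: float, pit_interval_s: float, pit_transfer_s: float) -> float:
    """Upper bound on unreplicated data, assuming a constant write rate."""
    return write_rate_mib_s * worst_case_loss_window_s(pit_interval_s, pit_transfer_s)


print(worst_case_loss_window_s(300, 60))  # 360
print(data_at_risk_mib(2, 300, 60))       # 720
```

Shortening the PIT interval, or using synchronous replication for the most critical pairs, shrinks this window; that is the practical meaning of choosing optimal replication policies.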
Recovery after a Disaster
After performing a failover, you must complete the following steps to restore your system:
1. Use the Recover Wizard to recover your local site.
2. Replicate (initial sync and replicate).
Depending on the severity of the disaster, the initial fallback configuration required at the primary site will vary. Nexsan’s Recover Wizard will guide you through recreating the GDR pairs on the primary Nexsan iSeries. For more information, see Recovery Wizard in Chapter 4.
- If the equipment and virtual volumes are still intact, no initial reconfiguration is needed at the primary site.
- If the site suffered hardware or software damage, the primary volumes, volume hierarchies, targets, GDR pairs and consistency groups must be reconfigured from the ground up.
- If volume names have changed, you must update the volume pair relations at the secondary site.
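The three cases above combine into a short recovery checklist. This hypothetical helper (`recovery_actions` is not a Nexsan tool) only orders the steps described in this section; the Recover Wizard remains the authoritative guide.

```python
from typing import List


def recovery_actions(site_damaged: bool, volume_names_changed: bool) -> List[str]:
    """Hypothetical checklist builder for recovering the primary site after a disaster."""
    actions: List[str] = []
    if site_damaged:
        # Hardware or software damage: rebuild the configuration from the ground up.
        actions.append(
            "Reconfigure primary volumes, volume hierarchies, targets, "
            "GDR pairs and consistency groups"
        )
    if volume_names_changed:
        actions.append("Update volume pair relations at the secondary site")
    # Always needed once both sites are configured.
    actions.append("Initial sync from the secondary site to the primary site (online or offline)")
    actions.append("Fail back to the primary site and reactivate applications")
    return actions


for step in recovery_actions(site_damaged=True, volume_names_changed=False):
    print(step)
```

When the equipment and virtual volumes survived intact, the list reduces to the initial sync and the fallback.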
Once both sites are properly configured, you can begin initial synchronization of all volume pairs from the secondary site to the restored primary site, using either online or offline copy. In essence, the secondary site temporarily acts as a primary site, replicating data to the primary site. For more information, see Replication in Chapter 4.
Once the initial sync is complete, you can deactivate applications at the secondary site and fail back to the primary site. Once all write operations are stopped and all caches are flushed, PITs are taken of all GDR volumes and replicated to the primary site. When all PITs have been transferred, you can switch back to the primary site. For more information, see Fallback in Chapter 4.
You can now reactivate all applications in the primary site.