VVR replication stopped with the status shown below. The enabled, attached and consistent flags are a good sign: they mean we are in a state from which replication can be recovered and resumed quickly.
# vradmin -g data_dg repstatus data_rvg
Replicated Data Set: data_rvg
Primary:
Host name: primary-vvr-host
RVG name: data_rvg
DG name: data_dg
RVG state: enabled for I/O
Data volumes: 1
VSets: 0
SRL name: srl_volume
SRL size: 24.54 G
Total secondaries: 1
Secondary:
Host name: secondary-vvr-host
RVG name: data_rvg
DG name: data_dg
Data status: consistent, behind
Replication status: resync paused due to network disconnection (dcm resynchronization)
Current mode: asynchronous
Logging to: DCM (contains 4567890 Kbytes) (SRL protection logging)
# vxprint -Pl
Disk group: data_dg
Rlink: rlk_secondary-vvr-host
info: timeout=500 rid=0.1076
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=ACTIVE
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=data_rvg
remote_host=secondary-vvr-host IP_addr=10.50.60.70 port=4145
remote_dg=data_dg
remote_dg_dgid=1397347157.35.secondary-host
remote_rvg_version=unknown
remote_rlink=rlk_primary-vvr-host
remote_rlink_rid=0.1078
local_host=primary-vvr-host IP_addr=10.20.30.58 port=4145
protocol: TCP/IP
flags: write enabled attached consistent disconnected asynchronous dcm_logging
The status above was seen on both Primary and Secondary.
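The flags line is what tells us whether a quick resume is possible, so it can help to summarize it in a script. The following is a minimal sketch, assuming the `flags:` line has the layout shown in this note; `rlink_health` is a hypothetical helper name, not a VVR command.

```shell
# Sketch: summarize an rlink from the "flags:" line of `vxprint -Pl` (read on stdin).
# rlink_health is a hypothetical helper, not part of VVR.
rlink_health() {
  # Keep only the flag words from the first "flags:" line.
  flags=$(sed -n 's/^[[:space:]]*flags:[[:space:]]*//p' | head -n 1)
  # A detached rlink is reported as a failure.
  case " $flags " in
    *" attached "*) ;;
    *) echo "detached"; return 1 ;;
  esac
  # Match whole words, so "consistent" does not match "inconsistent".
  case " $flags " in
    *" inconsistent "*) state=inconsistent ;;
    *" consistent "*)   state=consistent ;;
    *)                  state=unknown ;;
  esac
  case " $flags " in
    *" disconnected "*) link=disconnected ;;
    *" connected "*)    link=connected ;;
    *)                  link=unknown ;;
  esac
  echo "attached $state $link"
}
# Possible usage on a VVR host: vxprint -Pl | rlink_health
```

For the output in this note, the helper would print "attached consistent disconnected", which is the recoverable state described at the top.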
Possible Cause
A network disconnect between the Primary and Secondary sites at some point in time.
Solution
CHECK VVR HEALTH
--Which ports is VVR using?
# /sbin/vrport
heartbeat=4145
vradmind=8199
vxrsyncd=8989
data=Anonymous-Ports
--What is the daemon status?
# /etc/init.d/vras-vradmind.sh status
VxVM VVR V-5-4-0 VRAS daemon running: [ OK ]
VxVM VVR V-5-4-0 vvr_stats running: [ OK ]
# /usr/sbin/vxstart_vvr status
VxVM VVR INFO V-5-2-3935 Using following ports:
heartbeat: 4145
vradmind: 8199
vxrsyncd: 8989
data: Anonymous-Ports
To change, see vrport(1M) command
VxVM VVR vxnetd INFO V-5-1-15103 Cannot create IPv6 socket Address family not supported by protocol
VxVM VVR V-5-2-5942 Starting Communication daemon: [ OK ]
--Are the ports up and listening on both the primary and secondary sites?
# netstat -nap|grep -Ew "4145|8199|8989"
tcp 0 0 0.0.0.0:8199 0.0.0.0:* LISTEN 2080/vradmind
tcp 0 0 0.0.0.0:4145 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8989 0.0.0.0:* LISTEN 2161/in.vxrsyncd
tcp 0 0 10.20.30.67:8199 10.20.30.57:47574 ESTABLISHED 2080/vradmind
udp 0 0 0.0.0.0:4145 0.0.0.0:* -
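The listener check can be scripted rather than read by eye. This is a rough sketch, assuming the Linux `netstat -nap` column layout shown above; `missing_listeners` is a hypothetical helper name, and the port list is whatever `vrport` reports on your hosts.

```shell
# Sketch: print the expected ports that have NO TCP listener,
# reading `netstat -nap` style output on stdin.
# missing_listeners is a hypothetical helper, not part of VVR.
missing_listeners() {
  awk -v want="$*" '
    # Column 4 is the local address, column 6 the TCP state.
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, a, ":")      # port is the last ":"-separated field
      seen[a[n]] = 1
    }
    END {
      m = split(want, w, " ")
      for (i = 1; i <= m; i++)
        if (!(w[i] in seen)) print w[i]
    }'
}
# Possible usage: netstat -nap | missing_listeners 4145 8199 8989
```

Empty output means every listed port has a listener; run it on both the primary and the secondary.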
--Can the primary ping the secondary's remote_host (the name used for replication), and vice versa?
# vxprint -Pl|grep remote_host # to find remote_host
# ping <remote_host>
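To avoid copying the name by hand, the remote_host value can be pulled out of the rlink record. A small sketch, assuming the `remote_host=... IP_addr=... port=...` layout shown in the `vxprint -Pl` output of this note; `rlink_remote_hosts` is a hypothetical helper name.

```shell
# Sketch: print the remote_host value of every rlink record read on stdin.
# rlink_remote_hosts is a hypothetical helper, not part of VVR.
rlink_remote_hosts() {
  # Capture everything after "remote_host=" up to the next whitespace.
  sed -n 's/^[[:space:]]*remote_host=\([^[:space:]]*\).*/\1/p'
}
# Possible usage on a VVR host:
#   vxprint -Pl | rlink_remote_hosts | while read -r h; do ping -c 3 "$h"; done
```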
--Are the ports reachable from primary -> secondary and from secondary -> primary?
# nc -zvw3 <remote_host> 4145
# nc -zvw3 <remote_host> 8199
# nc -zvw3 <remote_host> 8989
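The three probes can be wrapped in a loop so the same check is easy to repeat from each side. A minimal sketch, assuming `nc` supports `-z` and `-w` as used above; `check_vvr_ports` and the `PROBE` override are hypothetical names introduced here for illustration.

```shell
# Sketch: probe each given TCP port on a remote host and report the result.
# check_vvr_ports and PROBE are hypothetical; PROBE defaults to "nc -zw3"
# and is called as: $PROBE <host> <port>.
check_vvr_ports() {
  host=$1
  shift
  rc=0
  for port in "$@"; do
    # PROBE is left unquoted on purpose so "nc -zw3" splits into words.
    if ${PROBE:-nc -zw3} "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port reachable"
    else
      echo "$host:$port UNREACHABLE"
      rc=1
    fi
  done
  return $rc
}
# Possible usage, run once on each node against the other node's remote_host:
#   check_vvr_ports secondary-vvr-host 4145 8199 8989
```

The function returns non-zero if any port fails, so it can gate the restart steps that follow.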
FIX OF PROBLEM
--If the VVR health checks above are not good, fix those issues first.
--If health is good, stop VVR on the secondary node
# /usr/sbin/vxstart_vvr stop
--Wait 60 seconds or so, then stop VVR on the Primary node with the same command
--Start VVR on the secondary node
# /usr/sbin/vxstart_vvr start
--Wait 60 seconds, then start VVR on the Primary node with the same command
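The order of the stop/start steps matters, so it may be worth writing it down as a script. The sketch below only prints the commands by default. It assumes both nodes are reachable over `ssh` from an admin host, which is not something this note states; `restart_vvr`, `run` and `DRY_RUN` are hypothetical names.

```shell
# Sketch: the stop/start order from the steps above, as a dry run.
# ASSUMPTION: both nodes accept ssh from where this runs.
# run, restart_vvr and DRY_RUN are hypothetical names, not VVR tooling.
run() {
  # Print the command unless DRY_RUN=0 is set explicitly.
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

restart_vvr() {
  secondary=$1
  primary=$2
  pause=${3:-60}
  run ssh "$secondary" /usr/sbin/vxstart_vvr stop    # stop secondary first
  run sleep "$pause"
  run ssh "$primary" /usr/sbin/vxstart_vvr stop
  run ssh "$secondary" /usr/sbin/vxstart_vvr start   # start secondary first
  run sleep "$pause"
  run ssh "$primary" /usr/sbin/vxstart_vvr start
}
# Possible usage (prints six lines, executes nothing):
#   restart_vvr secondary-vvr-host primary-vvr-host 60
```

Review the printed plan before setting `DRY_RUN=0`; running the commands by hand on each node, as described above, is equally valid.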
--Check whether the status is now connected; the flags should include connected and resync_started, something like below:
# vxprint -Pl |grep flags
flags: write enabled attached inconsistent cant_sync connected asynchronous dcm_logging resync_started
# vradmin -g data_dg repstatus data_rvg|tail -5
Data status: inconsistent
Replication status: resync in progress (dcm resynchronization)
Current mode: asynchronous
Logging to: DCM (contains 14461920 Kbytes) (SRL protection logging)
Timestamp Information: N/A
--You may need to resync the RVG
# vradmin -g data_dg resync data_rvg
--If all is good, tcpdump will show data being transferred to the remote site (replace ethX with the NIC used for replication)
# tcpdump -i ethX port 4145
--And the link status will look something like this:
# vxrlink -g data_dg status rlk_secondary-vvr-host -i 1
VxVM VVR vxrlink INFO V-5-1-12887 DCM is in use on rlink rlk_secondary-vvr-host. DCM contains 14298240 Kbytes (1%) of the Data Volume(s).
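To see whether the resync is actually moving, compare the DCM size between two samples. A small sketch, assuming the "contains N Kbytes" wording shown in the outputs above and that the figure shrinks as the resync proceeds, as the two samples in this note suggest; `dcm_kbytes` and `dcm_drained` are hypothetical helper names.

```shell
# Sketch: track DCM resync progress from status output.
# dcm_kbytes and dcm_drained are hypothetical helpers, not part of VVR.

# Extract N from the first line containing "contains N Kbytes" (stdin).
dcm_kbytes() {
  sed -n 's/.*[Cc]ontains \([0-9][0-9]*\) Kbytes.*/\1/p' | head -n 1
}

# dcm_drained BEFORE AFTER: Kbytes cleared from the DCM between two samples.
dcm_drained() {
  echo $(( $1 - $2 ))
}
# Possible usage on the primary:
#   before=$(vradmin -g data_dg repstatus data_rvg | dcm_kbytes)
#   sleep 60
#   after=$(vradmin -g data_dg repstatus data_rvg | dcm_kbytes)
#   dcm_drained "$before" "$after"
```

With the two figures in this note (14461920 and 14298240 Kbytes), the difference is 163680 Kbytes. A result that stays at zero over several samples would suggest the resync is stalled.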