Friday, May 30, 2014

Puppet agent: SSL_connect Error - Certificate verify failed or CRL is not yet valid

What to do if the Puppet agent is reporting any of the following errors:


  • SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [CRL is not yet valid for /CN=puppetmaster]
  • Failed to generate additional resources using 'eval_generate': SSL_connect
  • Could not retrieve catalog from remote server: SSL_connect 
  • Could not send report: SSL_connect
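
These errors generally point to either a certificate problem or clock skew between the agent and the master. A quick way to see whether the CRL timestamps and the agent's clock actually disagree is sketched below, assuming the default ssldir /etc/puppetlabs/puppet/ssl (adjust the path if yours differs):

openssl crl -in /etc/puppetlabs/puppet/ssl/crl.pem -noout -lastupdate -nextupdate
date -u

If 'date -u' falls outside the lastUpdate/nextUpdate window, jump straight to Try 3.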


Try 1: Verify the Puppet configuration

- Compare /etc/puppetlabs/puppet/puppet.conf of the host with the same file on a working system and fix any discrepancies (a quick way to do this is sketched below)
- Run the Puppet agent
 puppet agent -tv
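
A minimal sketch of the comparison, assuming SSH access to a known-good agent (working-host is a placeholder):

diff /etc/puppetlabs/puppet/puppet.conf <(ssh working-host cat /etc/puppetlabs/puppet/puppet.conf)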

Try 2: Recreate the certificate
- Delete the certificate from the host

find /etc/puppetlabs/puppet/ssl -type f
find /etc/puppetlabs/puppet/ssl -type f -exec rm -f {} \;

- Delete the certificate from the Puppet CA server

 puppet cert clean host-fqdn-name

- Run puppet agent

puppet agent -t

- Sign the client certificate on the Puppet CA server

puppet cert sign host-fqdn-name
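
To confirm the new certificate was actually issued, the CA's certificate list can be checked (same pre-Puppet-6 'puppet cert' tooling as above), and then the agent run again on the host:

 puppet cert list --all | grep host-fqdn-name
 puppet agent -tv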


Try 3: The most interesting one! Sync your host's time with the same NTP time source as your Puppet master

service ntp stop
ntpdate -s time.nist.gov 
service ntp start
puppet agent -t
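
If you just want to check the clock offset first without stepping the clock, ntpdate can query in read-only mode (any reachable NTP server works; time.nist.gov is simply the one used above):

ntpdate -q time.nist.gov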


Wednesday, May 21, 2014

A Linux system error after reboot - eth0: no link during initialization

Scenario

A working system was rebooted and did not come back on the network. Running 'service network restart' on the console displayed the messages below.

# service network restart
May 18 23:30:17 bergson kernel: eth1: no link during initialization.
May 18 23:40:43 bergson kernel: eth0: no link during initialization.

Observation

>> ethtool on eth0 and eth1 shows

# ethtool eth0
Speed: Unknown! (6555)
Duplex: Unknown! (255)
Link detected:no

# ethtool eth1
Speed: Unknown! (6555)
Duplex: Unknown! (255)
Link detected:no

>> ifconfig shows both NICs are UP

 ifconfig eth0|grep UP
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 ifconfig eth1|grep UP
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

>> There are other NICs (eth2, eth3); they are supposed to be down, and they are down.

Solution

>> The two UP and two DOWN NICs have the following configuration files. eth2 and eth3 were not in use, were not configured, and were set to ONBOOT=no.

/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1
/etc/sysconfig/network-scripts/ifcfg-eth2
/etc/sysconfig/network-scripts/ifcfg-eth3

>> The NIC config files did not have the HWADDR attribute set.

HWADDR=xx:xx:xx:xx:xx:xx

>> So on reboot, whichever NIC gets scanned first becomes eth0, the next one eth1, and so on. In this case the names were shuffled: eth2 came up as eth0, eth3 as eth1, and the NICs that had been eth0 and eth1 took the remaining names. So the NICs that were supposed to be up were down, and the NICs that were supposed to be down were up!

>> Add HWADDR to each config file and reboot. This way, a NIC with a particular MAC address will always get the same device name, say eth0. A minimal sketch is below.
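
The permanent MAC address of each interface can be read from the running system and then pinned in the matching ifcfg file. A minimal sketch, with placeholder MAC and IP values:

# Read the MAC address of the interface
cat /sys/class/net/eth0/address

# /etc/sysconfig/network-scripts/ifcfg-eth0 - pin the name to the MAC
DEVICE=eth0
HWADDR=00:11:22:33:44:55
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.10
NETMASK=255.255.255.0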


Did it help you?



Wednesday, May 14, 2014

Veritas replication - Volume in RVG is in DETACHED DETACH state


This may happen in the following cases:

- Either one of the disks used in the disk group has failed
- Or one of the disks has lost all of its paths at the DMP level

In this case, the volume has been detached from the RVG and its status will look something like this:

# vxprint | grep DETACHED
v  product_vol    product_rvg  DETACHED DETACH   2306777088 SELECT  -      fsgen

Solution

>> Check /var/log/messages and the 'vxdisk list' output to find which disk is causing the problem (see the sketch below)

>> Resolve the underlying disk issue
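
A couple of commands that usually make the failed disk obvious (disk group and volume names follow the example above):

# vxdisk -o alldgs list
# vxprint -g product_dg -ht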

>> Stop replication on the problematic system/site

# vradmin -g product_dg stoprep product_rvg secondary-vvr-host


>> Remove the 'DETACHED DETACH' volume from the RVG (if this fails, proceed to the next steps)

# vradmin -g product_dg -f delvol product_rvg product_vol

>> Remove the DCM log from the removed volume

# vxassist -g product_dg remove log product_vol

>> If VVR is running under VCS, offline the RVGLogOwner and RVGSharedGroup service groups on all nodes

# hagrp -offline RVGLogOwner -sys node-a
# hagrp -offline RVGLogOwner -sys node-b
# hagrp -offline RVGSharedGroup -sys node-a
# hagrp -offline RVGSharedGroup -sys node-b
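
The group states can be confirmed afterwards, for example (group names as above):

# hagrp -state RVGLogOwner
# hagrp -state RVGSharedGroup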

>> Else, stop the RVG and the VVR daemon

# vxrvg -g product_dg stop product_rvg
# /etc/init.d/vras-vradmind.sh stop


>> Start the volume

# vxvol -g product_dg -f start product_vol


>> On the secondary, add the DCM log back

# vxassist -g product_dg addlog product_vol logtype=dcm nlog=1
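
A quick check that the new DCM log is attached (a sketch; the added log plex should show up in the volume's vxprint output):

# vxprint -g product_dg -ht product_vol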


>> Add the volume to the RVG on secondary

# vradmin -g product_dg addvol product_rvg product_vol


>> On the primary, start the autoresynchronization (unfortunately, you need to sync the whole volume!)

# vradmin -g product_dg -a startrep product_rvg secondary-vvr-host
# vradmin -g product_dg  pauserep product_rvg


>> Restrict the bandwidth limit if you are replicating across data centers

# vradmin -g product_dg set product_rvg secondary-vvr-host bandwidth_limit=50mbps


>> Resume replication

# vradmin -g product_dg  resumerep product_rvg


>> Monitor replication status

# vxrlink -g product_dg status -i 5 rlk_secondary-vvr-host
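
An overall view of the replication state is also available via repstatus (same names as above):

# vradmin -g product_dg repstatus product_rvg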

References:
- Symantec tech note
- Description of important fields displayed by the vradmin repstatus command




Friday, May 9, 2014

Veritas Global Cluster Sample Configuration with VVR and CFS resource

// This is a sample configuration - main.cf
// A typical Veritas Global Cluster configuration
// with CVM, CFS and VVR global resources,
// plus vsftpd and an IP as local parallel resources

include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"


cluster dr_cluster_name (
UserNames = { admin = 0 }
// Below is DR ClusterIP
ClusterAddress = "10.10.10.50"
Administrators = { admin }
UseFence = SCSI3
)

remotecluster prod_cluster_name (
// Below is production ClusterIP
ClusterAddress = "20.20.20.200"
)

heartbeat Icmp (
ClusterList = { prod_cluster_name }
// Below is production clusterIP
Arguments @prod_cluster_name = { "20.20.20.200" }
)

system prod_node_a (
)

system prod_node_b (
)


group ClusterService (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
AutoStartList = { prod_node_a, prod_node_b }
OnlineRetryLimit = 3
OnlineRetryInterval = 120
)

Application wac (
StartProgram = "/opt/VRTSvcs/bin/wacstart"
StopProgram = "/opt/VRTSvcs/bin/wacstop"
MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
RestartLimit = 3
)

IP gcoip (
Device = bond0
// Below is DR ClusterIP
Address = "10.10.10.50"
NetMask = "255.255.255.0"
)

NIC gconic (
Device = bond0
)

gcoip requires gconic
wac requires gcoip


group RVGLogownerGrp (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
AutoStartList = { prod_node_a, prod_node_b }
OnlineRetryLimit = 2
)

IP vvr_ip (
Device = bond0
// Below is IP used for VVR replication
Address = "10.10.10.100"
NetMask = "255.255.255.0"
)

NIC vvr_nic (
Device = bond0
// Default gateway of production nodes
NetworkHosts = { "10.10.10.1" }
)

RVGLogowner vvr_logowner (
RVG = application_data_rvg
DiskGroup = application_dg
)

requires group RVGSharedGrp online local firm
vvr_ip requires vvr_nic
vvr_logowner requires vvr_ip


group RVGSharedGrp (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
Parallel = 1
AutoStartList = { prod_node_a, prod_node_b }
)

CVMVolDg cfsdg (
CVMDiskGroup = application_dg
CVMActivation = sw
)

RVGShared application_cfs_rvg (
RVG = application_data_rvg
DiskGroup = application_dg
)

requires group cvm online local firm
application_cfs_rvg requires cfsdg


group cfs_global_group (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
Parallel = 1
ClusterList = { dr_cluster_name = 1, prod_cluster_name = 2 }
AutoStartList = { prod_node_a, prod_node_b }
OnlineRetryLimit = 3
// Authority = 1 should be set only on the production site on which VVR is desired to be primary
Authority = 1
)

CFSMount cfs_fs (
Critical = 0
MountPoint = "/export"
BlockDevice = "/dev/vx/dsk/application_dg/application_vol"
NodeList = { prod_node_a, prod_node_b }
)

RVGSharedPri application_vvr_sharedpri (
RvgResourceName = application_cfs_rvg
OnlineRetryLimit = 0
)

requires group RVGSharedGrp online local firm
cfs_fs requires application_vvr_sharedpri


group cvm (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { prod_node_a, prod_node_b }
)

CFSfsckd vxfsckd (
)

CVMCluster cvm_clus (
CVMClustName = dr_cluster_name
CVMNodeId = { prod_node_a = 0, prod_node_b = 1 }
CVMTransport = gab
CVMTimeout = 300
)

CVMVxconfigd cvm_vxconfigd (
Critical = 0
CVMVxconfigdArgs = { syslog }
)


group vsftpd_ip_grp (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
Parallel = 1
AutoStartList = { prod_node_a, prod_node_b }
)

Application vsftpdd_service (
StartProgram = "/etc/init.d/vsftpdd start"
StopProgram = "/etc/init.d/vsftpdd stop"
PidFiles = { "/var/run/vsftpdd/vsftpdd.pid" }
)

IP vsftpd_ip (
Device @prod_node_a = "bond0"
Device @prod_node_b = "bond0"
Address @prod_node_a = "10.10.10.150"
Address @prod_node_b = "10.10.10.250"
NetMask = "255.255.255.0"
)

vsftpdd_service requires vsftpd_ip
// END
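
Before loading a configuration like this, it can be sanity-checked with hacf from the VCS config directory (standard path shown; adjust if yours differs):

# cd /etc/VRTSvcs/conf/config
# hacf -verify .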

Reference: Veritas Storage Foundation and High Availability Solutions Replication Administrator's Guide