Friday, May 30, 2014

Puppet agent: SSL_connect Error - Certificate verify failed or CRL is not yet valid

What to do if the Puppet agent is reporting any of the following errors:


  • SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [CRL is not yet valid for /CN=puppetmaster]
  • Failed to generate additional resources using 'eval_generate': SSL_connect
  • Could not retrieve catalog from remote server: SSL_connect 
  • Could not send report: SSL_connect
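
These errors generally point to either a certificate problem or clock skew between the agent and the master. A quick way to see whether the CRL timestamps and the agent's clock actually disagree is sketched below, assuming the default ssldir /etc/puppetlabs/puppet/ssl (adjust the path if yours differs):

openssl crl -in /etc/puppetlabs/puppet/ssl/crl.pem -noout -lastupdate -nextupdate
date -u

If 'date -u' falls outside the lastUpdate/nextUpdate window, jump straight to Try 3.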


Try 1: Verify the Puppet configuration

- Compare /etc/puppetlabs/puppet/puppet.conf of the host with the same file on a working system and fix any discrepancies (a quick way to do this is sketched below)
- Run the Puppet agent
 puppet agent -tv
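
A minimal sketch of the comparison, assuming SSH access to a known-good agent (working-host is a placeholder):

diff /etc/puppetlabs/puppet/puppet.conf <(ssh working-host cat /etc/puppetlabs/puppet/puppet.conf)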

Try 2: Recreate the certificate
- Delete the certificate from the host

find /etc/puppetlabs/puppet/ssl -type f
find /etc/puppetlabs/puppet/ssl -type f -exec rm -f {} \;

- Delete the certificate from the Puppet CA server

 puppet cert clean host-fqdn-name

- Run puppet agent

puppet agent -t

- Sign the client certificate on the Puppet CA server

puppet cert sign host-fqdn-name
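
To confirm the new certificate was actually issued, the CA's certificate list can be checked (same pre-Puppet-6 'puppet cert' tooling as above), and then the agent run again on the host:

 puppet cert list --all | grep host-fqdn-name
 puppet agent -tv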


Try 3: The most interesting one! Sync your host's time with the same NTP time source as your Puppet master

service ntp stop
ntpdate -s time.nist.gov 
service ntp start
puppet agent -t
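
If you just want to check the clock offset first without stepping the clock, ntpdate can query in read-only mode (any reachable NTP server works; time.nist.gov is simply the one used above):

ntpdate -q time.nist.gov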


Wednesday, May 21, 2014

A Linux system error after reboot - eth0: no link during initialization

Scenario

A working system was rebooted and did not come back on the network. Running 'service network restart' on the console displayed the messages below.

# service network restart
May 18 23:30:17 bergson kernel: eth1: no link during initialization.
May 18 23:40:43 bergson kernel: eth0: no link during initialization.

Observation

>> ethtool on eth0 and eth1 shows

# ethtool eth0
Speed: Unknown! (6555)
Duplex: Unknown! (255)
Link detected:no

# ethtool eth1
Speed: Unknown! (6555)
Duplex: Unknown! (255)
Link detected:no

>> ifconfig shows both NICs are UP

 ifconfig eth0|grep UP
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 ifconfig eth1|grep UP
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

>> There are other NICs (eth2, eth3); they are supposed to be down, and they are down.

Solution

>> The two UP and two DOWN NICs have the following configuration files. eth2 and eth3 were not in use, were not configured, and were set to ONBOOT=no.

/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1
/etc/sysconfig/network-scripts/ifcfg-eth2
/etc/sysconfig/network-scripts/ifcfg-eth3

>> The NIC config files did not have the HWADDR attribute set.

HWADDR=xx:xx:xx:xx:xx:xx

>> So on reboot, whichever NIC gets scanned first becomes eth0, the next one eth1, and so on. In this case the names were shuffled: eth2 came up as eth0, eth3 as eth1, and the NICs that had been eth0 and eth1 took the remaining names. So the NICs that were supposed to be up were down, and the NICs that were supposed to be down were up!

>> Add HWADDR to each config file and reboot. This way, a NIC with a particular MAC address will always get the same device name, say eth0. A minimal sketch is below.
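
The permanent MAC address of each interface can be read from the running system and then pinned in the matching ifcfg file. A minimal sketch, with placeholder MAC and IP values:

# Read the MAC address of the interface
cat /sys/class/net/eth0/address

# /etc/sysconfig/network-scripts/ifcfg-eth0 - pin the name to the MAC
DEVICE=eth0
HWADDR=00:11:22:33:44:55
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.10
NETMASK=255.255.255.0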


Did it help you?



Wednesday, May 14, 2014

Veritas replication - Volume in RVG is in DETACHED DETACH state


This may happen in the following cases:

- Either one of the disks used in the disk group has failed
- Or one of the disks has lost all of its paths at the DMP level

In this case, the volume has been detached from the RVG and its status will look something like this:

# vxprint | grep DETACHED
v  product_vol    product_rvg  DETACHED DETACH   2306777088 SELECT  -      fsgen

Solution

>> Check /var/log/messages and the 'vxdisk list' output to find which disk is causing the problem (see the sketch below)

>> Resolve the underlying disk issue
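
A couple of commands that usually make the failed disk obvious (disk group and volume names follow the example above):

# vxdisk -o alldgs list
# vxprint -g product_dg -ht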

>> Stop replication on the problematic system/site

# vradmin -g product_dg stoprep product_rvg secondary-vvr-host


>> Remove the 'DETACHED DETACH' volume from the RVG (if this fails, proceed to the next steps)

# vradmin -g product_dg -f delvol product_rvg product_vol

>> Remove the DCM log from the removed volume

# vxassist -g product_dg remove log product_vol

>> If VVR is running under VCS, offline the RVGLogOwner and RVGSharedGroup service groups on all nodes

# hagrp -offline RVGLogOwner -sys node-a
# hagrp -offline RVGLogOwner -sys node-b
# hagrp -offline RVGSharedGroup -sys node-a
# hagrp -offline RVGSharedGroup -sys node-b
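
The group states can be confirmed afterwards, for example (group names as above):

# hagrp -state RVGLogOwner
# hagrp -state RVGSharedGroup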

>> Else, stop the RVG and the VVR daemon

# vxrvg -g product_dg stop product_rvg
# /etc/init.d/vras-vradmind.sh stop


>> Start the volume

# vxvol -g product_dg -f start product_vol


>> On the secondary, add the DCM log back

# vxassist -g product_dg addlog product_vol logtype=dcm nlog=1
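
A quick check that the new DCM log is attached (a sketch; the added log plex should show up in the volume's vxprint output):

# vxprint -g product_dg -ht product_vol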


>> Add the volume to the RVG on secondary

# vradmin -g product_dg addvol product_rvg product_vol


>> On the primary, start the autoresynchronization (unfortunately, you need to sync the whole volume!)

# vradmin -g product_dg -a startrep product_rvg secondary-vvr-host
# vradmin -g product_dg  pauserep product_rvg


>> Restrict the bandwidth limit if you are replicating across data centers

# vradmin -g product_dg set product_rvg secondary-vvr-host bandwidth_limit=50mbps


>> Resume replication

# vradmin -g product_dg  resumerep product_rvg


>> Monitor replication status

# vxrlink -g product_dg status -i 5 rlk_secondary-vvr-host
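
An overall view of the replication state is also available via repstatus (same names as above):

# vradmin -g product_dg repstatus product_rvg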

References:
- Symantec tech note
- Description of important fields displayed by the vradmin repstatus command




Friday, May 9, 2014

Veritas Global Cluster Sample Configuration with VVR and CFS resource

// This is a sample configuration - main.cf
// A typical Veritas Global Cluster configuration
// with CVM, CFS and VVR global resources,
// plus vsftpd and an IP as local parallel resources

include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"


cluster dr_cluster_name (
UserNames = { admin = 0 }
// Below is DR ClusterIP
ClusterAddress = "10.10.10.50"
Administrators = { admin }
UseFence = SCSI3
)

remotecluster prod_cluster_name (
// Below is production ClusterIP
ClusterAddress = "20.20.20.200"
)

heartbeat Icmp (
ClusterList = { prod_cluster_name }
// Below is production clusterIP
Arguments @prod_cluster_name = { "20.20.20.200" }
)

system prod_node_a (
)

system prod_node_b (
)


group ClusterService (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
AutoStartList = { prod_node_a, prod_node_b }
OnlineRetryLimit = 3
OnlineRetryInterval = 120
)

Application wac (
StartProgram = "/opt/VRTSvcs/bin/wacstart"
StopProgram = "/opt/VRTSvcs/bin/wacstop"
MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
RestartLimit = 3
)

IP gcoip (
Device = bond0
// Below is DR ClusterIP
Address = "10.10.10.50"
NetMask = "255.255.255.0"
)

NIC gconic (
Device = bond0
)

gcoip requires gconic
wac requires gcoip


group RVGLogownerGrp (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
AutoStartList = { prod_node_a, prod_node_b }
OnlineRetryLimit = 2
)

IP vvr_ip (
Device = bond0
// Below is IP used for VVR replication
Address = "10.10.10.100"
NetMask = "255.255.255.0"
)

NIC vvr_nic (
Device = bond0
// Default gateway of production nodes
NetworkHosts = { "10.10.10.1" }
)

RVGLogowner vvr_logowner (
RVG = application_data_rvg
DiskGroup = application_dg
)

requires group RVGSharedGrp online local firm
vvr_ip requires vvr_nic
vvr_logowner requires vvr_ip


group RVGSharedGrp (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
Parallel = 1
AutoStartList = { prod_node_a, prod_node_b }
)

CVMVolDg cfsdg (
CVMDiskGroup = application_dg
CVMActivation = sw
)

RVGShared application_cfs_rvg (
RVG = application_data_rvg
DiskGroup = application_dg
)

requires group cvm online local firm
application_cfs_rvg requires cfsdg


group cfs_global_group (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
Parallel = 1
ClusterList = { dr_cluster_name = 1, prod_cluster_name = 2 }
AutoStartList = { prod_node_a, prod_node_b }
OnlineRetryLimit = 3
// Authority = 1 should be set only on the production site on which VVR is desired to be primary
Authority = 1
)

CFSMount cfs_fs (
Critical = 0
MountPoint = "/export"
BlockDevice = "/dev/vx/dsk/application_dg/application_vol"
NodeList = { prod_node_a, prod_node_b }
)

RVGSharedPri application_vvr_sharedpri (
RvgResourceName = application_cfs_rvg
OnlineRetryLimit = 0
)

requires group RVGSharedGrp online local firm
cfs_fs requires application_vvr_sharedpri


group cvm (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { prod_node_a, prod_node_b }
)

CFSfsckd vxfsckd (
)

CVMCluster cvm_clus (
CVMClustName = dr_cluster_name
CVMNodeId = { prod_node_a = 0, prod_node_b = 1 }
CVMTransport = gab
CVMTimeout = 300
)

CVMVxconfigd cvm_vxconfigd (
Critical = 0
CVMVxconfigdArgs = { syslog }
)


group vsftpd_ip_grp (
SystemList = { prod_node_a = 0, prod_node_b = 1 }
Parallel = 1
AutoStartList = { prod_node_a, prod_node_b }
)

Application vsftpdd_service (
StartProgram = "/etc/init.d/vsftpdd start"
StopProgram = "/etc/init.d/vsftpdd stop"
PidFiles = { "/var/run/vsftpdd/vsftpdd.pid" }
)

IP vsftpd_ip (
Device @prod_node_a = "bond0"
Device @prod_node_b = "bond0"
Address @prod_node_a = "10.10.10.150"
Address @prod_node_b = "10.10.10.250"
NetMask = "255.255.255.0"
)

vsftpdd_service requires vsftpd_ip
// END
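
Before loading a configuration like this, it can be sanity-checked with hacf from the VCS config directory (standard path shown; adjust if yours differs):

# cd /etc/VRTSvcs/conf/config
# hacf -verify .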

Reference: Veritas Storage Foundation and High Availability Solutions Replication Administrator's Guide