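Troubleshooting a dual-supervisor redundancy fault on a Cisco 7609

A customer recently reported that dual-supervisor redundancy on a Cisco 7609 router was not working properly and asked for an on-site visit. On site, the 7609 turned out to be fitted with two SUP720-3B supervisor engines, in slots 5 and 6, with the engine in slot 5 currently active.

First, check the current redundancy state of the two supervisors on 7609-1:

    7609-1#show mod
    Mod Ports Card Type                              Model              Serial No.
      1   24  CEF720 24 port 1000mb SFP              WS-X6724-SFP       SALxxxxxxxx
      2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SALxxxxxxxx
      5    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       SALxxxxxxxx
      6    2  Supervisor Engine 720 (Cold)           WS-SUP720-3B       SALxxxxxxxx

    Mod MAC addresses                       Hw    Fw           Sw           Status
      1  0019.56f3.91bc to 0019.56f3.91d3   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
      2  001a.6cd7.ed40 to 001a.6cd7.ed6f   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
      5  0016.c85e.2ae8 to 0016.c85e.2aeb   5.2   8.4(2)       12.2(18)SXF7 Ok
      6  0013.c43a.dc74 to 0013.c43a.dc77   4.5   8.1(3)       12.2(17d)SXB Ok

The device is configured for SSO redundancy:

    7609-1#show run
    ………
    no ip domain-lookup
    ………
    redundancy
     mode sso
     main-cpu
      auto-sync running-config
    ………

The engine in slot 6 runs IOS 12.2(17d)SXB, which does not support SSO, so the two supervisors are actually operating in RPR mode. In RPR mode, if the active supervisor fails, the redundant engine has to go through a complete boot before it takes over as the active supervisor.

Force a manual switchover:

    7609-1#redundancy force-switchover

The restart took about 2 minutes. The result:

    7609-1#show mod
    Mod Ports Card Type                              Model              Serial No.
      1   24  CEF720 24 port 1000mb SFP              WS-X6724-SFP       SALxxxxxxxx
      2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SALxxxxxxxx
      5    2  Supervisor Engine 720 (Cold)           WS-SUP720-3B       SALxxxxxxxx
      6    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       SALxxxxxxxx

    Mod MAC addresses                       Hw    Fw           Sw           Status
      1  0019.56f3.91bc to 0019.56f3.91d3   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
      2  001a.6cd7.ed40 to 001a.6cd7.ed6f   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
      5  0016.c85e.2ae8 to 0016.c85e.2aeb   5.2   8.4(2)       12.2(18)SXF7 Ok
      6  0013.c43a.dc74 to 0013.c43a.dc77   4.5   8.1(3)       12.2(17d)SXB Ok

In SSO mode the redundant engine does not need a full restart and can take over from a failed engine within a few seconds, so the supervisors should run in SSO mode whenever possible. To get there, the IOS image on the slot 6 engine has to be upgraded to 12.2(18)SXF7.

On the slot 5 engine the IOS image is stored on the internal sup-bootdisk, which has a capacity of 512 MB:

    7609-1#dir sup-bootdisk:
    Directory of sup-bootdisk:/
        1  -rw-    81764868   Jan 8 2007 14:32:36 -08:00  s72033-ipservicesk9_wan-mz.122-18.SXF7.bin
    512024576 bytes total (429957120 bytes free)

The corresponding sup-bootdisk on the slot 6 engine is only 64 MB, while s72033-ipservicesk9_wan-mz.122-18.SXF7.bin is larger than 80 MB (81764868 bytes). The image was therefore copied to the external disk0 of the slot 6 engine, which has a capacity of 512 MB.

    7609-1#copy sup-bootdisk: slavedisk0:
    Source filename [s72033-ipservicesk9_wan-mz.122-18.SXF7.bin]
    Destination filename [s72033-ipservicesk9_wan-mz.122-18.SXF7.bin]
    Copy in progress...
    81764868 bytes copied in 220.980 secs (370010 bytes/sec)

Force a switchover again (the active supervisor at this point is the engine in slot 6):

    7609-1#redundancy force-switchover

After the switchover the engine in slot 5 becomes active, but the engine in slot 6 fails to restart normally and drops into rommon. Viewed from the active supervisor (slot 5), the state is:

    7609-1#show mod
    Mod Ports Card Type                              Model              Serial No.
      1   24  CEF720 24 port 1000mb SFP              WS-X6724-SFP       SALxxxxxxxx
      2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SALxxxxxxxx
      5    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       SALxxxxxxxx
      6    0  Supervisor-Other                       Unknown            Unknown

    Mod MAC addresses                       Hw    Fw           Sw           Status
      1  0019.56f3.91bc to 0019.56f3.91d3   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
      2  001a.6cd7.ed40 to 001a.6cd7.ed6f   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
      5  0016.c85e.2ae8 to 0016.c85e.2aeb   5.2   8.4(2)       12.2(18)SXF7 Ok
      6  0000.0000.0000 to 0000.0000.0000   0.0   Unknown      Unknown      Unknown

Back on slot 6, specifying the boot file by hand in rommon mode is enough to bring the engine up normally:

    rommon 1 > boot disk0:/s72033-ipservicesk9_wan-mz.122-18.SXF7.bin

The cause is that the configuration of 7609-1 names the internal sup-bootdisk as the boot location:

    ……..
    hostname 7609-1
    boot system flash sup-bootdisk:
    logging buffered 40960 debugging
    ……..

The sup-bootdisk (64 MB) on the slot 6 engine is empty, and its IOS image sits on the external disk0. When the slot 6 engine restarts it therefore cannot find an IOS image to boot and falls into rommon mode.

Once the boot file is specified by hand, the slot 6 engine boots normally and, when it has finished booting, runs in proper SSO mode with the slot 5 engine. The state is then:

    7609-1#show mod
    Mod Ports Card Type                              Model              Serial No.
      1   24  CEF720 24 port 1000mb SFP              WS-X6724-SFP       SALxxxxxxxx
      2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SALxxxxxxxx
      5    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       SALxxxxxxxx
      6    2  Supervisor Engine 720 (Hot)            WS-SUP720-3B       SALxxxxxxxx

In this state both engines are fully booted, and the redundant engine can take over within a few seconds if the active supervisor fails. However, because the IOS images are stored in different locations on the two engines, the slot 5 → slot 6 → slot 5 switchover test completes normally as far as the switchovers themselves go, but slot 6 ends up in ROMMON mode each time it reloads and can only be brought back up by manual intervention:

    rommon 1 > boot disk0:/s72033-ipservicesk9_wan-mz.122-18.SXF7.bin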
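
The software mismatch that kept the two supervisors out of SSO at the start of this case is visible in the Sw column of `show mod`, so it can be checked mechanically before a switchover test. The following is only a minimal sketch: the sample text is modeled on the output quoted in this write-up, the regular expression assumes that exact row layout, and the helper names `software_versions` and `supervisor_mismatch` are hypothetical, not part of any Cisco tool.

```python
import re

# Sample modeled on the lower half of the `show mod` output in this write-up.
SHOW_MOD_STATUS = """\
Mod MAC addresses                       Hw    Fw           Sw           Status
  1  0019.56f3.91bc to 0019.56f3.91d3   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
  2  001a.6cd7.ed40 to 001a.6cd7.ed6f   2.5   12.2(14r)S5  12.2(18)SXF7 Ok
  5  0016.c85e.2ae8 to 0016.c85e.2aeb   5.2   8.4(2)       12.2(18)SXF7 Ok
  6  0013.c43a.dc74 to 0013.c43a.dc77   4.5   8.1(3)       12.2(17d)SXB Ok
"""

# Assumed row layout: slot, MAC range, Hw, Fw, Sw, Status.
ROW = re.compile(
    r"\s*(?P<slot>\d+)\s+"
    r"[0-9a-f.]{14}\s+to\s+[0-9a-f.]{14}\s+"
    r"(?P<hw>\S+)\s+(?P<fw>\S+)\s+(?P<sw>\S+)\s+(?P<status>\S+)\s*$",
    re.IGNORECASE,
)


def software_versions(show_mod_text):
    """Map slot number to the Sw column; lines that do not look like rows are skipped."""
    versions = {}
    for line in show_mod_text.splitlines():
        match = ROW.match(line)
        if match:
            versions[int(match["slot"])] = match["sw"]
    return versions


def supervisor_mismatch(show_mod_text, sup_slots=(5, 6)):
    """Return {slot: version} for the supervisor slots if their versions differ, else {}."""
    versions = software_versions(show_mod_text)
    found = {slot: versions[slot] for slot in sup_slots if slot in versions}
    return found if len(set(found.values())) > 1 else {}


if __name__ == "__main__":
    print(supervisor_mismatch(SHOW_MOD_STATUS))
```

The supervisor slots default to 5 and 6 because that is where they sit on this chassis. An empty result only means the Sw strings match. It says nothing about where each engine will look for its image at boot, which is the second problem described above.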