InsertPictureHereORACLE数据库日常巡检及常见故障解决2018-06-20目录•1、日常巡检1.1、检查数据库基本状况1.2、检查Oracle相关资源的使用情况1.3、检查Oracle数据库备份结果1.4、检查Oracle数据库性能1.5、检查数据库cpu、I/O、内存性能•2、常见故障处理2.1、表空间满2.2、归档日志满2.3、会话超过最大连接数2.4、密码过期2.5、监听问题2.6、delete删除数据恢复•3、故障排查1.1、检查数据库基本状况检查数据库实例状态selectinstance_name,host_name,startup_time,status,database_statusfromv$instance;数据库有三种状态:1、startupnomount;#读取init.ora文件,启动instance,即启动SGA和后台进程,这种启动只需要init.ora文件。2、alterdatabasemount;#打开控制文件,确认数据文件和联机日志文件的位置,但此时不对数据文件和日志文件进行校验检查。3、alterdatabaseopen;#访问数据库中的数据。1.1、检查数据库基本状况检查在线日志selectgroup#,status,type,memberfromv$logfile;selectgroup#,thread#,bytes,members,statusfromv$log;1.1、检查数据库基本状况在线日志是循环使用的,两个文件切换不能太过频繁,正常切换频率为=15分钟切换一次。查看每个小时切换次数:SELECTSUBSTR(TO_CHAR(first_time,'MM/DD/RRHH:MI:SS'),1,5)DAY,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'00',1,0))H00,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'01',1,0))H01,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'02',1,0))H02,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'03',1,0))H03,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'04',1,0))H04,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'05',1,0))H05,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'06',1,0))H06,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'07',1,0))H07,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'08',1,0))H08,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'09',1,0))H09,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'10',1,0))H10,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'11',1,0))H11,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'12',1,0))H12,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'13',1,0))H13,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'14',1,0))H14,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'15',1,0))H15,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'16',1,0))H16,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'17',1,0))H17,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'18',1,0))H18,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'19',1,0))H19,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'20',1,0))H20,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'21',1,0))H21,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'22',1,0))H22,SUM(DECODE(SUBSTR(TO_CHAR(first_time,'MM/DD/RRHH24:MI:SS'),10,2),'23',1,0))H23,COUNT(*)TOTALFROMv$log_historyawherefirst_time(sysdate-30)GROUPBYSUBSTR(TO_CHAR(first_time,'MM/DD/RRHH:MI:SS'),1,5)ORDERBYSUBSTR(TO_CHAR(first_time,'MM/DD/RRHH:MI:SS'),1,5)1.1、检查数据库基本状况在线日志切换频率过高处理办法:1、创建新的日志组alterdatabaseaddlogfilegroup4('目录/redo04.log')size500M;2、切换日志到新建的日志组altersystemswitchlogfile;3、删除旧日志组alterdatabasedroplogfilegroup3;注意:删除的日志文件组不能处于current状态,须要运行一次手动日志切换,将该日志文件组的状态改动为inactive1.1、检查数据库基本状况检查监听状态:lsnrctlstatus(start/stop)1.1、检查数据库基本状况检查无效对象selectowner,object_name,object_typefromdba_objectswherestatus!='VALID'andowner!='SYS'andowner!='SYSTEM';如果有记录返回,则说明存在无效对象。若这些对象与应用相关,那么需要重新编译生成这个对象。检查失效的索引:selectindex_name,table_name,tablespace_name,statusfromdba_indexeswhereowner='CTAIS2'andstatus'VALID';注:分区表上的索引status为N/A是正常的,如有失效索引则对该索引做rebuild,如:alterindexINDEX_NAMErebuildtablespaceTABLESPACE_NAME;1.1、检查数据库基本状况检查oraclejob是否有失败selectjob,what,last_date,next_date,failures,brokenfromdba_jobs;由于数据库的大多任务都是通过job来调用过程执行的,所以需要保证job的正常执行。1、其中broken=Y表示此job已经停止自动执行。2、如果next_date时间小于当前时间(sysdate),说明此job运行卡住,需要人工处理,恢复到正常状态。Job运行卡住处理第一步:查看正在运行的jobsSQLselect*fromdba_jobs_running;然后确定你要停止的job,这个数据字典对应的job就是下面要执行的过程的第一个参数。第二步:确定要停掉的job的SID,SERIAL#和系统进程idselectb.SID,b.SERIAL#,c.SPID,‘altersystemkillsession’‘’||b.sid||‘,’||b.serial#||‘’‘immediate;’fromdba_jobs_runninga,v$sessionb,v$processcwherea.sid=b.sidandb.PADDR=c.ADDR第三步:调用dbms_job的broken函数将要停止的job干掉SQLEXECDBMS_JOB.BROKEN(18,TRUE);PL/SQLproceduresuccessfullycompletedSQLcommit;此时这个job还是运行的,可以通过dba_jobs_running查看第四步:使用altersystemkillsession'SID,SERIAL#';杀掉会话。如果要杀很久,直接使用操作系统命令kill-9spid1.2、检查Oracle相关资源的使用情况检查会话连接数selectcount(*)fromv$session;查看数据库最大会话数限制showparameterprocesses说明:如果会话数接近或者等最大限制值时,就需要调整最大会话数或者分析为什么会话数多的原因,找到根本原因解决才是最好的方法,不能一味的调大限制数。1.2、检查Oracle相关资源的使用情况修改会话数(管理员账号登录,执行以下命令):sqlplus/assysdbaSQLaltersystemsetprocesses=1000scope=spfile;修改后重启数据库:SQLshutdownimmediate;#关闭数据库SQLstartup;#启动数据库1.2、检查Oracle相关资源的使用情况检查系统磁盘空间:df-h说明:如果文件系统的剩余空间过小或增长较快,需对其进行确认并删除不用的文件以释放空间,或者添加磁盘。1.2、检查Oracle相关资源的使用情况检查表空间使用情况selecttbs_used_info.tablespace_name,tbs_used_info.alloc_mb,tbs_used_info.used_mb,tbs_used_info.max_mb,tbs_used_info.free_of_max_mb,tbs_used_info.used_of_max||'%'used_of_max_pctfrom(selecta.tablespace_name,round(a.bytes_alloc/1024/1024)alloc_mb,round((a.bytes_alloc-nvl(b.bytes_free,0))/1024/1024)used_mb,round((a.bytes_alloc-nvl(