俊达MySQL线上常见故障剖析各种故障•应用获取不到连接池•数据库响应慢•SQL慢•服务器load高•SWAP•表不见了•MySQLcrash•主机Hung•…观察你的系统•MySQL–活动进程(Processlist)–日志文件(slowlog,alertlog,generalquerylog,binlog)–Statusvariables(com_select,com_insert,.etc)–InnoDB(物理读、逻辑读、innodbstatus)–参数配置–Stacktrace(plussourcecode)•SQL–执行计划,explain•OS–内存,SWAP,/proc/meminfo–CPU,load,ps–IO(磁盘、网络)•Iostat•Profile–Oprofile–gprofCase1:XXX系统报连接池满iostatorzdbaslowlog•What’sinslowlog?Mk-query-digest•mk-query-digest全面分析slowlogexplain•查看执行计划–选择了不好的索引哪些SQL在执行•Slowlog–Setgloballong_query_time=0•Generallog•Binlog–ForDML,mysqlbinlogbinlog解析•Processlist–Ifsomequeryisreallyslow•Tcpdump–Tcpdump+mk-query-digestCase2:很多MySQL线程都卡住了•ProcesslistId:1842782User:provideHost:192.168.0.1:59068db:provideCommand:QueryTime:2326State:WaitingfortableInfo:updatetable_xxxxsetsold=sold+1,money=money+39800,Gmt_create=now()wherexxxx_id=1andday='2011-10-0700:00:00Id:1657130User:provideHost:192.168.0.2:40093db:provideCommand:QueryTime:184551State:SendingdataInfo:selectxxxx_id,sum(sold)assoldfromtable_xxxxwherexxxx_idin(selectxxxx_idfromtable_xxxxwhereGmt_create='2011-10-0508:59:00')groupbyxxxx_id1044systemuserConnect27406FlushingtablesFLUSHTABLES•Pstack#00x0000003b4380ab99inpthread_cond_wait@@GLIBC_2.3.2()#10x00000000005aac4ainwait_for_refresh()#20x00000000005b2857inopen_table()#30x00000000005b312finopen_tables()#40x00000000005b3440inopen_and_lock_tables()#50x00000000005817a4inmysql_execute_command()#60x0000000000586516inmysql_parse()#70x0000000000586a65indispatch_command()#80x0000000000588923inhandle_one_connection()#90x0000003b438064a7instart_thread()from/lib64/libpthread.so.0•Processlist分析–谁是因,谁是果?•Systemuserexecuteflushtables–Systemuser是谁,mysql主从复制(iothread,sqlthread)–Binlog•谁最先执行了flushtables–人工执行?–App?没有权限–定时任务,备份•Xtrabackup会执行flushtableswithreadlock,不记录到binlog•Mysqldump理论上不会执行flushtables,但如果有bug呢(=35157)Case3:服务器load高•调查问题–SQL层面未见明显异常–业务没有变动,没有发布–调用量没有明显变化•Iostat–r/s,w/s–await,svctm–avgrq-sz•Blktrace,btt•IO调度算法–cfq-deadlineCase4:DDLlosttable•alert.log大量报错–持续10几分钟后,Tablelost。•几百个进程都block在”openingtables”,这些表都不是DDL的那个表丢表时的alert.log1108032:15:02InnoDB:Warning:problemsrenaming'feel_23/#sql-2635_23da0d'to'feel_23/feed_send_1476',24998iterationsInnoDB:Warning:tablespace'./feel_23/#sql-2635_23da0d.ibd'hasi/oopsstoppedforalongtime249981108032:15:02InnoDB:Warning:problemsrenaming'feel_xx/#sql-2635_23da0d'to'feel_xx/feed_send_xxxx',24999iterationsInnoDB:Warning:tablespace'./feel_23/#sql-2635_23da0d.ibd'hasi/oopsstoppedforalongtime249991108032:15:02InnoDB:Warning:problemsrenaming'feel_xx/#sql-2635_23da0d'to'feel_xx/feed_send_xxxx',25000iterationsInnoDB:Warning:tablespace'./feel_23/#sql-2635_23da0d.ibd'hasi/oopsstoppedforalongtime250001108032:15:02InnoDB:Warning:problemsrenaming'feel_xx/#sql-2635_23da0d'to'feel_xx/feed_send_xxxx',25001iterations1108032:15:02[ERROR]Cannotfindoropentablefeel_23/feed_send_1476fromtheinternaldatadictionaryofInnoDBthoughthe.frmfileforthetableexists.MaybeyouhavedeletedandrecreatedInnoDBdatafilesbuthaveforgottentodeletethecorresponding.frmfilesofInnoDBtables,oryouhavemoved.frmfilestoanotherdatabase?or,thetablecontainsindexesthatthisversionoftheenginedoesn'tsupport.See()from/lib64/libc.so.6#10x00002aaab2e595fbinos_thread_sleep()#20x00002aaab2e18838infil_mutex_enter_and_prepare_for_io()#30x00002aaab2e18aa5infil_io()#40x00002aaab2df5b63inbuf_flush_buffered_writes()#50x00002aaab2df6048inbuf_flush_batch()#60x00002aaab2ea13d8insrv_master_thread()#70x000000364b6064a7instart_thread()from/lib64/libpthread.so.0#80x000000364aad3c2dinclone()from/lib64/libc.so.6Pstack–altertable#00x000000364aacced2inselect()from/lib64/libc.so.6#10x00002aaab2e595fbinos_thread_sleep()#20x00002aaab2e1a3e2infil_rename_tablespace()#30x00002aaab2e0672bindict_table_rename_in_cache()#40x00002aaab2e86af5inrow_rename_table_for_mysql()#50x00002aaab2e316dbinha_innodb::rename_table()#60x00000000006bea6cinmysql_rename_table()#70x00000000006c77ffinmysql_alter_table()#80x00000000005c6a8einmysql_execute_command()#90x00000000005cd371inmysql_parse()#100x00000000005cd773indispatch_command()#110x00000000005cea04indo_command()#120x00000000005bf0d7inhandle_one_connection()retry:(fil_rename_tablespace)...Somelinesommittedhere(printwarning,...)space-stop_ios=TRUE;if(node-n_pending0||node-n_pending_flushes0){/*Therearependingi/o'sorflushes,sleepforawhileandretry*/mutex_exit(&fil_system-mutex);os_thread_sleep(20000);gotoretry;}elseif(node-modification_counternode-flush_counter){/*Flushthespace*/mutex_exit(&fil_system-mutex);os_thread_sleep(20000);fil_flush(id);gotoretry;}elseif(node-open){/*Closethefile*/fil_node_close_file(node,fil_system);}space-stop_ios=FALSE;mutex_exit(&fil_system-mutex);retry:(fil_mutex_enter_and_prepare_for_io)mutex_enter(&fil_system-mutex);if(space_id==0||space_id=SRV_LOG_SPACE_FIRST_ID){return;}if(fil_system-n_openfil_system-max_n_open){retu