我爱电脑技术论坛's Archiver

小迷糊 发表于 2008-6-16 08:57

高级复制中mlog$表的HWM过高导致性能问题

针对高级复制中由于mlog$表的HWM过高导致的性能问题进行了详细的讲解,更多内容请参考下文:
:v8PqN)mi z*m
!i&~8K6BzMS q6P 某系统升级后,报告事务端的数据库反应非常慢,用户大量投诉。 $Q;C#yVD-C a4o0K
le_ KS/R
现场检查系统后,发现CPU占用到了80%(平时该系统正常时只有10%~20%)。wio也非常高(说明磁盘读写严重)。一段时间后,现场将statspack报告发回来了:
Oai3B:|gn"^F
Oyvz,mH1Vu'ix
.G a\.G6W4qi LLZ STATSPACK report for
W$e1E;^i
!r0gI Id"y^U&wh
JjbChR!B DB Name DB Id Instance Inst Num Release Cluster Host &e$d+V8`~&`$h)^
0c c#ht4^Q T `4Q*t
------------ ----------- ------------ -------- ----------- ------- ------------ ?*kntS"Q#C

.x7z8KL*ke OLTPDB 3781951398 OLTPDB 1 9.2.0.5.0 NO jnaip21 2f&Kn,~ G#BL LY
1K4v F I|b]

3_Q7d ea4k Q N Snap Id Snap Time Sessions Curs/Sess Comment 5q2j?^:|V_;@0`
M _ t {X?
------- ------------------ -------- --------- -------------------
N ].qm1y-N:\
[-L4LN.R(b1o3D Begin Snap: 3 27-Sep-04 13:15:58 32 83.4
u Y/k O8Q.XV bD
%~|hM,m*EVM End Snap: 4 27-Sep-04 14:10:36 37 83.4 G1M)~TA z2L1}'G2D1R
qF8Q s\nU3U$B ~
Elapsed: 54.63 (mins) @;vn(D4| j

a-vm$pQDd
t)tf\W;L,L j+^?e Cache Sizes (end)
_}5C;L'J x Y.gIjdE t1[aF
~~~~~~~~~~~~~~~~~
ak9N$e Cu qOOHD 2bgdek/Ae x4|$oru
Buffer Cache: 720M Std Block Size: 8K Ox?zNR%x
w u[5Da1}[
Shared Pool Size: 112M Log Buffer: 512K g `V+P:Kg

:ODC#aY4@ ? j5?DF.{F!f/H
Load Profile
2H H7f Y,XdU
*WG8u+lw]P6j8T\m ~~~~~~~~~~~~ Per Second Per Transaction kQTz })Uy,\ A

&HgQx\(PMs --------------- ---------------
8H`x2JQ8J
e!xp/v3W)[k Redo size: 11,763.97 7,400.17
L:D;d%J9?v}
Au S;F&^I%Lmy Logical reads: 8,251.68 5,190.75
Sw*yM B
eB ^VP_&KU6vI)}H Block changes: 83.17 52.32
A6oVA4L x$w
Z)|x9`6\4a![+q Y1t Physical reads: 2,488.23 1,565.23 ~.qqP7r
O*Jm9?Yl4soE
Physical writes: 6.15 3.87 XV L%V@'V4P(qf

0A#EI,n0|6x9Rw-BC User calls: 411.88 259.09 x$q:F(E/z"tI
.vpxy/m
Parses: 56.57 35.59 p0Y X AX{1^!y/w7S

~~;BQ+Wsh w q Hard parses: 0.29 0.18 ~/oX0`8s)D
:R4hm.|&V!C
Sorts: 14.59 9.18 %vT XX]R9c
R-AgVFGD a0Oo.u
Logons: 0.12 0.08 7M1V*T3L1Q3O.N

Tc1?yHvFeP Executes: 80.04 50.35
J1e I\#y %fC:`GP ]6d Y
Transactions: 1.59 TL!b'g\%QBd
uj-h o7|cIi

,|-K W!L8G$u % Blocks changed per Read: 1.01 Recursive Call %: 28.14
j:s!W o1Sx
~| V a\h]!['gt@.r Rollback per transaction %: 0.17 Rows per Sort: 1818.01 c%r6c4m\YF
\-gkH%lbD:]

KH$p[9~zk Instance Efficiency Percentages (Target 100%) ` M3V!@,{

(Ma:W}PC!D eo"I ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
x2Pd f7U @~+CU A0Zk1r
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Et3s\RA ]-S TP_&`#z
Buffer Hit %: 69.88 In-memory Sort %: 99.99 $a? x.dY E7f0^B
a(k5M TO6GC&J@
Library Hit %: 99.78 Soft Parse %: 99.50 d)K%|!]x3n

gc:` }8M6]%c Execute to Parse %: 29.32 Latch Hit %: 99.98
fYMoQV0@ AV *mYS7To/YB1e-N
Parse CPU to Parse Elapsd %: 96.72 % Non-Parse CPU: 99.02 {4g2i*U+u{%af

w%I}2o[
m m-s C2k Shared Pool Statistics Begin End
X L:`Q8kX J#b v8}r#Us]5I}
------ ------
q(m5fM#tWV5y H-Pl(A*\o\
Memory Usage %: 93.60 93.71
P2euX(vt 'z#T5^#?bpo+V&u-N/M?
% SQL with executions>1: 20.28 19.05
;OlCa+NRrB
S1qFr LYL % Memory for SQL w/exec>1: 25.59 25.88
S@2k1ow_JX :Z8S)BCCu

rQN'X o*yN6_ Top 5 Timed Events $@9q3lB]"]

2pz5k,dA ~~~~~~~~~~~~~~~~~~ % Total b!j2XjAE2a7QP
|6_3|~|/I6K^
Event Waits Time (s) Ela Time !cW5l?p

pk!_~.QE -------------------------------------------- ------------ ----------- -------- Ge:oVf#en(@'w0y
O!gYg-ogEl}
db file scattered read 599,637 3,886 63.79 Y9B {I-p}!`U'L5m P
6Tk#Rt~}&M}9l
CPU time 1,142 18.74
L Kh@p)o\eJa
gY:cMY ^ db file sequential read 238,983 629 10.33 5uS.Qyy!_C Q/pB
,E%I}#T:u#I@
enqueue 263 366 6.00
Ds|@#cWF
Z*M,ZFt log file sync 5,154 23 .37 Y(w1o2c!R"q

g,Ek [ ]o ------------------------------------------------------------- vzXN_M

$y;x(Y&x6TK1I)q)D2vFL Wait Events for DB: OLTPDB Instance: OLTPDB Snaps: 3 -4
y\,|+\$k L+d
&d }k[*Rd -> s - second #P:~0I5@J{O\P
c3TR$[3eBZQ]
-> cs - centisecond - 100th of a second
JZ&B(I_,\Tp ]1Hip UF
-> ms - millisecond - 1000th of a second
"z_3E0qd9eT oAnk
v!MKbx:Y!{ -> us - microsecond - 1000000th of a second s%{8k1d-^S#Ws#xn

xh2Sq HV -> ordered by wait time desc, waits desc (idle events last) 1K9L*D9SBI/U%xmK

&p+v7jXgb;o.`
%h0Im6EJ0IZ9r(V#Z Avg
`&J7@8bac L.wH/x"D
Total Wait wait Waits
'cpP_vf
2v6A5aix(k-Q Event Waits Timeouts Time (s) (ms) /txn
qN-i e p
d/F%T3?Iom ---------------------------- ------------ ---------- ---------- ------ --------

小迷糊 发表于 2008-6-16 08:57

db file scattered read 599,637 0 3,886 6 115.1
*tF O$W7a z5O4c#`
u6c%s L_,}"~ db file sequential read 238,983 0 629 3 45.9
N2Z3a)x%N0~$`,\#u\ -@'u At"usPm k
enqueue 263 4 366 1390 0.1 F%UNSD5~ j?jw

!yI%F&M3? p`Cf log file sync 5,154 0 23 4 1.0
gA e:C6] |8I/a4V!^4I4IVo['d
log file parallel write 15,143 8,060 20 1 2.9 d&Y"D#BZ9N/m
8k nS P$Kr;h-y
db file parallel write 1,008 0 12 12 0.2 Q^9lOI(S
0T p%M3]#q,N;X%p)u
control file parallel write 1,065 0 4 4 0.2
wc1P z,J RI V] .P(`1eh6I2`sDow0p
buffer busy waits 3,195 0 4 1 0.6 3p`P2fy.Qn1p2Q

RXT"w8y(iD L:IK:K
G/e9M~.Iqd ... ... Klb@ u*z1B%l
8`2tV(EO9Nzc

$J i%fjtk ` c&u"b q!W%V.mKb
SQL ordered by Gets for DB: OLTPDB Instance: OLTPDB Snaps: 3 -4
;phW| M XQ*z DZV6n;Mc)a!Gh k
-> End Buffer Gets Threshold: 10000
,R"s#r7^e|!W
Yd"`1Tt&O2y/_;[ -> Note that resources reported for PL/SQL includes the resources used by u9x[0B%h1LX xD9@+b

0q+_w7Eb4p+~.| all SQL statements called within the PL/SQL code. As individual SQL
f(~-Q2tgZ*]a
1Q#|;YT h9?`%n statements are also reported, it is possible and valid for the summed ^"in HM!i N%R
Uu(r:qo8c'I)` ?p
total % to exceed 100
ar,E*UZ]F&nm2Qp
HYB ~+nEOy F!jc sg1j?,TQSy
CPU Elapsd L}(l`2j0r2^i]

"E Q2oV h Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value .O%}y$W,s8fh

k] K?t!J` --------------- ------------ -------------- ------ -------- --------- ----------
7K4u,@\#J+g~&k j2|ebB;~1g2k
6,509,469 1,060 6,141.0 24.1 66.35 64.61 2833840857
1X3~jE8}&a9o {tfk@:t\
Module: JDBC Thin Client G Yx[ w.]
:sr dT+nF9Q E
select lib.code, lib.name, lib.info, leaf.leafdirindex, tp.tpname, from t_commend com, t_leafinfo leaf, t_lib lib,t_tpinfo tp where %_ YQ5Q:bD^
F%f/B1X%z

8xi1z3JZgN,}G 2,258,385 343 6,584.2 8.3 65.41 1289.82 3307794804
b9h#SzYx(D
;lb k e_y_ Module: oracle@qdaip12 (TNS V1-V3)
N'i~I;Uo+pM(g(Y E/y` g:jsuK4kO4}
SELECT /*+ */ "A2"."PHONENUMBER","A2"."TGROUPID","A2"."LOOPNO
\:J8nRX3G4k@a:D
&e.t/s?Q8LO R ","A2"."TCODE" FROM "RU"."T_LOOP" "A2", (SELECT /*+ */
_'c|wQ/TNT:k
Fp/En%j5a O DISTINCT "A3"."LOOPNO" "LOOPNO","A3"."TGROUPID" "TGROUPID" B+JRA1R.V-?m+U"@

N:GajW5PPTr4? FROM "RU"."MLOG$_T_LOOP" "A3" WHERE "A3"."SNAPTIME$$">:1 U XEunh

.}7z[BtR AND "A3"."DMLTYPE$$"<>'D') "A1" WHERE "A2"."LOOPNO"="A1"."LOOPNO E-qJcxSMwI
T$JI)nFR

_;KK,}IblT 2,251,917 343 6,565.4 8.3 66.51 1306.09 1606105459 1j iv5Mn+{:jT
Y r*y \ ^u
Module: oracle@qdaip12 (TNS V1-V3) 6]s,U9C/h yJUqc

Wwb.h Rf&f SELECT /*+ */ DISTINCT "A1"."LOOPNO","A1"."TGROUPID" FROM "RI kI'x p$|:Wde
i.Izm}__
NG"."MLOG$_T_LOOP" "A1" WHERE (("A1"."LOOPNO","A1"."TGROU E f's$u*e&P5Z,i

6P*a-~/]SR Y{ PID")<>ALL (SELECT "A2"."LOOPNO","A2"."TGROUPID" FROM "RU". A uxZb O0}5t
9Hj!cJVM!\m#w
"T_LOOP" "A2" WHERE "A1"."LOOPNO"="A2"."LOOPNO" AND "A1"."TO 3v| y fQu,PD
t:@%PeL~8}8z
NEGROUPID"="A2"."TGROUPID")) AND "A1"."SNAPTIME$$">:1 AND "A1
$Ky~1R/U2| \^4I;M'r B7x

RV|'b/U3O{&i i:X 2,249,895 343 6,559.5 8.3 66.34 1098.20 4039622144 F k9_5L;OKg
L-uE(t3D8f d,N V
delete from "RU"."MLOG$_T_LOOP" where snaptime$$ <= :1 Bhj1nU#D#uli

q!o3[sW 1a3k c#t lC

Q7M/Ru$j:kV ... ...
D[C,B+Qn h"r @u
2v];E+n;~BB$dz9`
o$Cl-Xv)c#T `y9m,p Z3M+}
Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value
*o%{:z5[Xc f9B
5s$?7f)nn3q'I --------------- ------------ -------------- ------ -------- --------- ---------- 0`rD1[ t1n pz i d
"ED#F.G+N
2,129,843 343 6,209.5 26.1 66.34 1098.20 4039622144 X:j6B9XL0T$Vn{

)W9@,LZP'y-yn/P delete from "RU"."MLOG$_T_LOOP" where snaptime$$ <= :1 u1a7M2U0v3Po
A {jB"x!ki(be8r0H"m w

R^5l}s"nZ d 2,129,004 343 6,207.0 26.1 65.41 1289.82 3307794804 6A~[9a$@c3G}
v Eei5U
Module: oracle@qdaip12 (TNS V1-V3) FX j$w I r3k|9u q8~

0||.Z,EO*J_)|i SELECT /*+ */ "A2"."PHONENUMBER","A2"."TGROUPID","A2"."LOOPNO
D}4bH^_ $wU5t(Xr5L
","A2"."TCODE" FROM "RU"."T_LOOP" "A2", (SELECT /*+ */
Fru%d!GnC|
/p}8F%~\ DISTINCT "A3"."LOOPNO" "LOOPNO","A3"."TGROUPID" "TGROUPID" "U+l.NF;F"GC

0\(Kb9P7N,Z5V;I+H_ FROM "RU"."MLOG$_T_LOOP" "A3" WHERE "A3"."SNAPTIME$$">:1
5b8ST`6pC
0t@ksia AND "A3"."DMLTYPE$$"<>'D') "A1" WHERE "A2"."LOOPNO"="A1"."LOOPNO Q0gw)i;A$eA

eUQX2smy
(a hVo'e2jZ KR 2,123,001 343 6,189.5 26.0 66.51 1306.09 1606105459 K \1^2zoH2?KD
WXV9vRU/kW
Module: oracle@qdaip12 (TNS V1-V3) 9X ` tfYt v
)y!OuN7L-C2I,u1G
SELECT /*+ */ DISTINCT "A1"."LOOPNO","A1"."TGROUPID" FROM "RI
j ?~f$G9|'N
#ui3|'g~+^ NG"."MLOG$_T_LOOP" "A1" WHERE (("A1"."LOOPNO","A1"."TGROU
p:I6ZY#f/i|D;_y 6x6Zh`2d8g1f5Y
PID")<>ALL (SELECT "A2"."LOOPNO","A2"."TGROUPID" FROM "RU".
$t.IQ R@#[5@~ Rd"Op'fvwWE
"T_LOOP" "A2" WHERE "A1"."LOOPNO"="A2"."LOOPNO" AND "A1"."TO
;l cT hv%]w U
kg{\G1mnM NEGROUPID"="A2"."TGROUPID")) AND "A1"."SNAPTIME$$">:1 AND "A1
6`{ F l(v;SZ%J;`0V Z
KAO)[W%?8t0vj` q L M\A1Ub8Ws
1,724,081 278 6,201.7 21.1 54.05 887.05 2153620124 Eh [&JKp
~E:hQH
update "RU"."MLOG$_T_LOOP" set snaptime$$ = :1 where snap
}i6XCp fB4qF*tN.@1}!G
time$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS') w5L1c9zU

0_^1R(]iUBo1D*[ P5p m%wd&s0e7G
n l4R-i} H#v
关注statspack中以下信息:Buffer Hit非常低,只有69.88%;在Top Event的等待事件中,db file scattered read的等待最高,占到了63.79%。很明显,这是由于数据文件扫描导致的buffer cache下降非常严重。而就肯定是由于存在全表扫描(全表扫描到的数据块不会被cache在buffer cache,而是会已最快的速度被置换出内存,这就是为什么全表扫描会导致buffer hit下降)语句所致。 P^2i{!xTxS9h

%hUL4L.l{ 2O5m:o6k;[ F sC sM
再看top sql中的内容,我们发现,大多数top sql都涉及到一个奇特的表:MLOG$_T_LOOP。这张表的使用是不会出现在应用程序中。因此相关语句都是被系统调用的。那么这张表是做什么用的呢? /R{!TPhRW$F

/@%H G'Tqj9y J zRa q#oEL
我们的系统为了将业务处理与业务查询分离,通过高级复制将业务处理端的某些表同步到业务查询端去。而mlog$_表就是记录最后一次刷新后,被同步表的数据变化的日志表,当查询端(即快照端)的刷新作业(我们设置10秒钟间隔运行)通过mlog$_读取到源表的数据变化,更新查询端相应的表。而业务处理端检测到所有快照端都更新过这些数据后,就会将这些数据删除。此外,mlog表上是没有任何索引,因此对mlog表的查询肯定是全表扫描。但是由于我们的刷新作业10秒钟更新一次,而10秒钟所产生的数据变化量非常小,在正确情况下,这样的全表扫描是不会影响到系统性能的。
M.c$lj vei2O3b
G3b5w*dgq 4td;v#v$a;v-p
我查了一下MLOG$_T_LOOP,发现它记录的数据确实非常少(一般只有几条数据),但是对它查询钟,physical reads却达到了6200多,这显然不正常。说明这张表的高水位(HWM,高水位记录的是表曾经达到的最大数据块,就像洪水过后留下的水位线。而对表的扫描并不是以当前的实际水位为基准的,而是以高水位为基准的,也就是说会扫描表曾经占用过的所有数据块)非常高。
^-[+aHTVOo"kr
tDNk0X]h c Z Q&Cc"a ^.z-z
通过现场了解,我们得知现场在做升级时,并没有停止同步刷新作业,而是将刷新作业间隔延长到2个小时。升级过程中,两边的数据库都没有停止。但是,期间在业务处理端做了一些数据处理,特别是表T_LOOP有大量的数据更新。 IC$SE{ ]z7s

&J:S0L+p5\In _uzwm
问题很明确了!由于升级过程中T_LOOP有大量的数据更新,而快照没有被刷新。因此mlog$_t_loop中保存了两个小时内的所有数据变化记录,导致高水位上升!而刷新作业再次恢复到10秒间隔后,尽管mlog$_t_loop中的数据量下降了,但是它的高水位并没有下降! 4{C-W-p `9xb7Q2mM

&A{:Dds2Y!P#K ,{m#R$e^E
于是安排现场将业务处理端的客户连接断开,然后停止快照端的刷新作业,手工刷新一次同步后,检查业务处理端的mlog$表,确保这些表为空,然后truncate mlog$表,恢复同步作业,最后恢复业务处理端的客户连接。经过一段时间观察后,数据库恢复正常。

页: [1]
   

Powered by Discuz! Archiver 6.1.0  © 2001-2007 Comsenz Inc.