问题概述

在Goldengate运行中,目标端的TAIL文件可能出现因为IO的问题、迁移的操作问题等导致损坏或丢失,这个时候,goldendate目标端应用进程将无法正常运行。

这是模拟目标端的TAIL文件丢失故障问题的处理CASE!

解决方案

1. 整个CASE处理思路:

1)执行ETROLLOVER,抽取、传送进程前滚一个文件;
2)调整传送、应用进程读文件的位置Seqno;

2. 环境:

GoldenGate为11+版本;
抽取进程:ext_rwq
传送进程:pm_rwq
应用进程:rep_rwq

3. 模拟问题:应用端TAIL文件损坏(丢失)

GGSCI (serv-node2) 8> stop rep_rwq
 
Sending STOP request to REPLICAT REP_RWQ ...
Request processed.
 
 
[oracle@serv-node2 dirdat]$ ls -lt
total 92
-rw-r----- 1 oracle oinstall  2664 Feb 15 23:24 RP000000017
-rw-r----- 1 oracle oinstall  1539 Feb 15 23:24 RP000000016
-rw-r----- 1 oracle oinstall  1539 Feb 15 01:47 RP000000015
-rw-r----- 1 oracle oinstall  2428 Feb 15 01:14 RP000000016.DEL
-rw-r----- 1 oracle oinstall 38128 Feb 14 00:51 RP000000014
-rw-r----- 1 oracle oinstall 36604 Feb  6 16:59 RP000000012.DEL
 
[oracle@serv-node2 dirdat]$ mv RP000000017 RP000000017.DEL
 
[oracle@serv-node2 dirdat]$ ls -lt
total 92
-rw-r----- 1 oracle oinstall  2664 Feb 15 23:24 RP000000017.DEL  --》改名,模拟丢失
-rw-r----- 1 oracle oinstall  1539 Feb 15 23:24 RP000000016
-rw-r----- 1 oracle oinstall  1539 Feb 15 01:47 RP000000015
-rw-r----- 1 oracle oinstall  2428 Feb 15 01:14 RP000000016.DEL
-rw-r----- 1 oracle oinstall 38128 Feb 14 00:51 RP000000014
-rw-r----- 1 oracle oinstall 36604 Feb  6 16:59 RP000000012.DEL

4. 启动进程,确认问题:

GGSCI (serv-node2) 9> start rep_rwq
 
Sending START request to MANAGER ...
REPLICAT REP_RWQ starting
 
 
GGSCI (serv-node2) 12> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
REPLICAT    ABENDED     REP_RWQ     00:00:00      00:00:22   ---》状态异常,为ABENDED
 
GGSCI (serv-node2) 13> view report rep_rwq
… …
 
2023-02-16 00:42:21  ERROR   OGG-01091  Unable to open file "/ogg_19.1/dirdat/RP000000017" (error 2, No such file or directory). ---》ERROR:找不到问题,进程异常ABENDED。
 
***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************
 
 
2023-02-16 00:42:21  INFO    OGG-02333  Reading /ogg_19.1/dirdat/RP000000017, current RBA 0, 0 records, m_file_seqno = 17, m_file_rba = 2,664.
 
Report at 2023-02-16 00:42:21 (activity since 2023-02-16 00:42:19)
 
No records were replicated.

5. 源端操作:重启一下传送进程

GGSCI (serv-node1) 17> stop pm_rwq
 
Sending STOP request to EXTRACT PM_RWQ ...
Request processed.
 
 
GGSCI (serv-node1) 18> start pm_rwq
 
Sending START request to MANAGER ...
EXTRACT PM_RWQ starting
 
 
 
GGSCI (serv-node1) 25> info pm_rwq detail
 
EXTRACT    PM_RWQ    Last Started 2023-02-16 00:44   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:01 ago)
Process ID           10138
Log Read Checkpoint  File /ogg_19.1/dirdat/ET000000035
                     First Record  RBA 2271
 
  Target Extract Trails:
 
  Trail Name                                       Seqno        RBA     Max MB Trail Type
 
  ./dirdat/RP                                         17       1539        500 RMTTRAIL  ---》写的文件为RP000000017

6. 目标端操作:重启一下应用进程

GGSCI (serv-node2) 16> start rep_rwq
 
Sending START request to MANAGER ...
REPLICAT REP_RWQ starting
 
 
GGSCI (serv-node2) 17> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
REPLICAT    RUNNING     REP_RWQ     00:00:00      00:00:00
 
 
GGSCI (serv-node2) 26> info rep_rwq detail
 
REPLICAT   REP_RWQ   Last Started 2023-02-16 00:47   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:07 ago)
Process ID           18500
Log Read Checkpoint  File /ogg_19.1/dirdat/RP000000018 ----》读的文件为RP000000018,和传送进程写的文件为RP000000017,不一致,无法进行同步。
                     First Record  RBA 0
 
Current Log BSN value: (requires database login)
 
Last Committed Transaction CSN value: (requires database login)
 
  Extract Source                          Begin             End

7. 开始处理问题:

大原则:
1)抽取、传送进程前滚一个文件;
2)调整传送、应用进程读文件的位置;

源端调整:

GGSCI (serv-node1) 26> info ext_rwq detail
 
EXTRACT    EXT_RWQ   Last Started 2023-02-14 00:25   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
Process ID           1612
Log Read Checkpoint  Oracle Redo Logs
                     2023-02-16 00:51:36  Seqno 306, RBA 14950912
                     SCN 0.41097717 (41097717)
 
  Target Extract Trails:
 
  Trail Name                                       Seqno        RBA     Max MB Trail Type
 
  ./dirdat/ET                                         39       2563         50 EXTTRAIL  
 
 
 
GGSCI (serv-node1) 28> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT_RWQ     00:00:00      00:00:06    
EXTRACT     RUNNING     PM_RWQ      00:00:00      00:00:05    
 
GGSCI (serv-node1) 29> stop PM_RWQ
 
Sending STOP request to EXTRACT PM_RWQ ...
Request processed.
 
 
GGSCI (serv-node1) 30> stop EXT_RWQ
 
Sending STOP request to EXTRACT EXT_RWQ ...
Request processed.
 
 
--抽取进程前滚一个文件
GGSCI (serv-node1) 31> alter EXTRACT EXT_RWQ etrollover
 
2023-02-16 00:53:21  INFO    OGG-01520  Rollover performed.  For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file;  it will not happen automatically.
EXTRACT altered.
 
 
 
--传送进程前滚一个文件
GGSCI (serv-node1) 34> alter extract pm_rwq etrollover
 
2023-02-16 00:56:42  INFO    OGG-01520  Rollover performed.  For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file;  it will not happen automatically.
EXTRACT altered.
 
 
 
GGSCI (serv-node1) 36> info ext_rwq detail
 
EXTRACT    EXT_RWQ   Initialized   2023-02-14 00:25   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:05:36 ago)
Log Read Checkpoint  Oracle Redo Logs
                     2023-02-16 00:51:36  Seqno 306, RBA 14950912
                     SCN 0.41097717 (41097717)
 
  Target Extract Trails:
 
  Trail Name                                       Seqno        RBA     Max MB Trail Type
 
  ./dirdat/ET                                         40          0         50 EXTTRAIL  ---》当前抽取进程写文件的位置:Seqno =40
 
 
调整传送进程读文件的位置,和抽取进程写文件的位置:Seqno =40一致。
 
GGSCI (serv-node1) 37>  alter pm_rwq extseqno 40,extrba 0
EXTRACT altered.
 
 
 
GGSCI (serv-node1) 38> info pm_rwq detail
 
EXTRACT    PM_RWQ    Initialized   2023-02-16 00:58   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
Log Read Checkpoint  File /ogg_19.1/dirdat/ET000000040
                     First Record  RBA 0
 
  Target Extract Trails:
 
  Trail Name                                       Seqno        RBA     Max MB Trail Type
 
  ./dirdat/RP                                         18          0        500 RMTTRAIL  ---》当前传送进程写文件的位置为:Seqno=18

目标端调整:

调整应用进程读文件的位置,和传送进程写文件的位置:Seqno =18一致。
 
 
GGSCI (serv-node2) 32> alter replicat rep_rwq,extseqno 18,extrba 0
 
2023-02-16 01:01:57  INFO    OGG-06594  Replicat REP_RWQ has been altered. Even the start up position might be updated, duplicate suppression remains active in next startup. To override duplicate suppression, start REP_RWQ with NOFILTERDUPTRANSACTIONS option.
 
REPLICAT altered.

数据库层面,INSERT一行数据,

SQL> insert into jobs values('EEE-1','EEE-1, test 20230215',3000,9000);
 
1 row created.

然后,启动各个OGG进程:

GGSCI (serv-node1) 39> start ext_rwq
 
Sending START request to MANAGER ...
EXTRACT EXT_RWQ starting
 
 
GGSCI (serv-node1) 40> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT_RWQ     00:00:00      00:00:00    
EXTRACT     STOPPED     PM_RWQ      00:00:00      00:04:32    
 
 
GGSCI (serv-node1) 41> start pm_rwq
 
Sending START request to MANAGER ...
EXTRACT PM_RWQ starting
 
 
GGSCI (serv-node1) 42> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT_RWQ     00:00:00      00:00:00    
EXTRACT     RUNNING     PM_RWQ      00:00:00      00:04:43    
 
 
GGSCI (serv-node1) 43> stats pm_rwq,total
 
Sending STATS request to EXTRACT PM_RWQ ...
 
Start of Statistics at 2023-02-16 01:03:52.
 
Output to ./dirdat/RP:
 
Extracting from RUANWQ.JOBS to RUANWQ.JOBS:
 
*** Total statistics since 2023-02-16 01:03:37 ***
        Total inserts                                      1.00 ---》确认完成传送
        Total updates                                      0.00
        Total deletes                                      0.00
        Total upserts                                      0.00
        Total discards                                     0.00
        Total operations                                   1.00
 
End of Statistics.

目标端:

GGSCI (serv-node2) 33> start rep_rwq
 
Sending START request to MANAGER ...
REPLICAT REP_RWQ starting
 
 
GGSCI (serv-node2) 34> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
REPLICAT    RUNNING     REP_RWQ     00:00:00      00:02:12    
 
 
GGSCI (serv-node2) 35> info all
 
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
 
MANAGER     RUNNING                                           
REPLICAT    RUNNING     REP_RWQ     00:00:00      00:02:13    
 
 
GGSCI (serv-node2) 36> stats rep_rwq total
 
Sending STATS request to REPLICAT REP_RWQ ...
 
Start of Statistics at 2023-02-16 01:05:26.
 
Replicating from RUANWQ.JOBS to STOCK_PDB.RUANWQ.JOBS:
 
*** Total statistics since 2023-02-16 01:04:13 ***
        Total inserts                                      1.00 -----》确认完成应用
        Total updates                                      0.00
        Total deletes                                      0.00
        Total upserts                                      0.00
        Total discards                                     0.00
        Total operations                                   1.00
 
End of Statistics.

8. 确认数据恢复同步:

最终确认数据库层面,对应的一行数据库应用到目的库!

SQL> select * from jobs;
 
JOB_ID     JOB_TITLE                           MIN_SALARY MAX_SALARY
---------- ----------------------------------- ---------- ----------
……
EEE-1      EEE-1, test 20230215                      3000       9000
……
阿里云国内75折 回扣 微信号:monov8
阿里云国际,腾讯云国际,低至75折。AWS 93折 免费开户实名账号 代冲值 优惠多多 微信号:monov8 飞机:@monov6
标签: go