码迷,mamicode.com
首页 > 其他好文 > 详细

Fatal NI connect error 12170------单实例

时间:2016-10-14 18:06:23      阅读:164      评论:0      收藏:0      [点我收藏+]

标签:oracle   linux   error   different   operation   


In our one of the 11gR2 Production Database, there were so many entries w.r.t Fatal NI connect error 12170 TNS-12535: TNS:operation timed out + TNS-00505: Operation timed out (with different ports), found in both RAC Instances Alert Log file.

Problem:


Fatal NI connect error 12170.


  VERSION INFORMATION:

        TNS for Linux: Version 11.2.0.3.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
  Time: 20-AUG-2014 14:17:36
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out

    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out

    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.xxx.xxx)(PORT=44326))

Reason:


The same error message was repeating during whole day almost for every application server.

I’ve finally found what was causing the problem. Our database is behind a firewall. Firewall has an “idle session timeout” value. If a connection remains idle for more than the “idle session timeout” value it drops the connections. 

Application developers usually configure their connection pools to remain alive for a long time, which is more than the “idle session timeout” value of the firewall. Not all the connections in the pool are used. Some stays idle. After sometime the firewall drops them and I get those operation timed out problems.


Solution / Workaround: 


1) Add the following line to the sqlnet.ora file on the server.


SQLNET.EXPIRE_TIME=10

In this configuration database will probe the application servers every 10 minutes to see if they are up. In fact this is a mechanism to determine stale connections by database. Because it sends packets to application servers every ten minutes, the connections are detected as active by firewalls and they are not broken.


**In an installation that includes GRID, this parameter should be set in the RDBMS_HOME/network/admin/sqlnet.ora file. This would be the default location for sqlnet.ora file parameters referenced by the instance. 

2) One way to minimize the impact is by using the parameter SQLNET.INBOUND_CONNECT_TIMEOUT (default to 60 seconds on 10gR2 and 11g) but, sometimes, this value is not adequate.

Oracle also mention the occurrence of this error if you use DB Console or Enterprise Manager to monitor your databases and the em agent will try to connect to the target database repeatedly and, statistically, some will fail (frequency will depend on how busy your system is).

Most of the time (certainly for DB Console and Enterprise Manager Agent) the application will try to connect again and it will succeed.

To fix the problem you could increase the value of SQLNET.INBOUND_CONNECT_TIMEOUT (in Seconds) in the sqlnet.ora / CONNECT_TIMEOUT_<DB_Name> (in Minutes) in the listener.ora file located on the server side.

3) If you already have a value you have considered adequate, you might want to add the following line on your listener.ora file:

DIAG_ADR_ENABLED_<listener_name>=OFF  (in listener.ora)

DIAG_ADR_ENABLED=OFF in sqlnet.ora

This line will hide the error on the alert.log file and the error will be posted on the $ORACLE_HOME/network/log/sqlnet.log file


环境:oracle 11.2.0.4 x86-64


  1. 原因

这个类型错误提示是一种正常的Oracle工作机制。当客户端进程Client Process与服务器进程Server Process建立联系之后,两者就形成了“同生共死”的关系(专有连接模式)。除非客户端主动发起中断或者Server Process被异常kill。

在实际运行环境中,这种理想状态常常被打破。如果Client Process只是保持连接,不执行语句,会话就处于idle状态。这种连接很容易被诸如防火墙等网络层面设备切断。

在Oracle11gR2中,如果长期没有连接动作的Server Process被外力切断,Oracle就会自动将信息作为提示错误写入到alert log中,作为一种提示。在11R1版本中,这种信息是会写入到sqlnet.log中。


2.问题解决措施

归纳MOS和网络中的各种方法,大体有两重策略,分别为使用DCD和禁用ADR。

DCD全称Dead Connection Detection,是一种基于主动测探方式检查Oracle僵尸客户端进程Client Process的策略。配置DCD的关键是设置sqlnet.expire_time参数在SQL Net体系下,Oracle会依据这个时间间隔给所有的Client Process发送网络通信包,用来确定Client是否存活。

正是借助这个包通信,可以让防火墙认为这个网络连接还是处在active状态,不会进行强制断开动作。类似的机制还有Linux上的tcp keep live机制,也是使用类似的策略进行检查。

[oracle@localhost admin]$ cat sqlnet.ora

sqlnet.expire_time=10


另一种方式也是Oracle推荐的,就是关闭11g的ADR机制。ADR(Automatic Diagnostic Repository)是Oracle进行自动诊断、自动提醒的工具组件。Oracle认为如果用户不需要在SQL Net组件中应用ADR,可以再sqlnet.ora中进行配置关闭。

[oracle@localhost admin]$ cat sqlnet.ora

sqlnet.expire_time=10

DIAG_ADR_ENABLED = OFF



---------------单实例数据库----------------------------------------------------------------

总体解决办法

1)sqlnet.ora文件如下参数,如果没有sqlnet.ora文件,就创建该文件

SQLNET.EXPIRE_TIME=10

DIAG_ADR_ENABLED = OFF

SQLNET.INBOUND_CONNECT_TIMEOUT=0

2)listener.ora文件添加如下参数

DIAG_ADR_ENABLED_LISTENER = OFF

INBOUND_CONNECT_TIMEOUT_LISTENER = 0 


如果添加以上参数还不能生效,可以试着添加如下参数,在sqlnet.ora文件中添加以下参数:

SQLNET.RECV_TIMEOUT=0

SQLNET.SEND_TIMEOUT=0










本文出自 “旅游人生” 博客,请务必保留此出处http://felixgzf.blog.51cto.com/4108324/1861925

Fatal NI connect error 12170------单实例

标签:oracle   linux   error   different   operation   

原文地址:http://felixgzf.blog.51cto.com/4108324/1861925

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!