Tracking the Number of Files a Linux Process Has Open, and How to Deal with the Limit

Source: 岁月联盟  Editor: exp  Date: 2012-04-07

While investigating the "too many open files" issue in ccps on the O2 project, a number of questions came up, so I ran the following test.

1. How to check the number of files a process currently has open (this number fluctuates in real time)

The steps below use ccps as the example.

1) Get the PID (process ID) of the program with ps -ef | grep ccps:

    [root@vvmocmp1 ccps]# ps -ef | grep ccps
    root 5661 1 0 20:33 pts/2 00:00:00 /bin/sh /opt/OC/ccps/jboss-4.2.3.GA/bin/run.sh -c all -g ccpsgroup -b 0.0.0.0
    root 5685 5661 94 20:33 pts/2 00:00:17 /usr/java/jdk1.6.0_13/bin/java -Dprogram.name=run.sh -server -Dcom.sun.management.jmxremote -Djava.awt.headless=true -Xms1024m -Xmx1024m -XX:PermSize=64m -XX:MaxPermSize=256m -Djava.util.logging.config.file=/opt/OC/ccps/jboss-4.2.3.GA/server/all/ccps/applicationContext/logging.properties -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n -Djava.net.preferIPv4Stack=true -Djava.endorsed.dirs=/opt/OC/ccps/jboss-4.2.3.GA/lib/endorsed -classpath /opt/OC/ccps/jboss-4.2.3.GA/server/all/ccps/applicationContext:/opt/OC/ccps/jboss-4.2.3.GA/bin/run.jar:/usr/java/jdk1.6.0_13/lib/tools.jar org.jboss.Main -c all -g ccpsgroup -b 0.0.0.0

From this output, the ccps process (the java process) is owned by root and its PID is 5685. PID 5661 is the run.sh wrapper shell that started it.

2) Use that PID to get the number of files the process has open at this moment:

    ls -l /proc/5685/fd/ | wc -l

(Note: ls -l also prints a leading "total" line, so the result is one more than the actual number of descriptors.)

(Note: many sources online say that lsof -p pid can be used to query the number of files a process has open, but in my test it was not accurate. The count under /proc is the one that reflects the real state: as soon as it exceeds the ulimit -n value, the "too many open files" error appears immediately. lsof -p also lists entries that are not file descriptors, such as the current directory, the program text, and memory-mapped libraries, so its line count runs higher.)

The root user is subject to the ulimit restriction as well. In the current session, ulimit -n shows the limit for the current user; the default is 1024.

2. Check the current user's per-process limit on open files (ulimit -n, default 1024). ulimit -a shows it too:

    [root@vvmocmp1 ~]# ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 1024
    max locked memory       (kbytes, -l) 32
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 143359
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

3. Change the user's maximum number of open files

Add the following to /etc/security/limits.conf:

    root soft nofile 500
    root hard nofile 500

Explanation: the purpose of this experiment is to make root's ccps process hit "too many open files", so the value is deliberately set very low, at 500. Because ccps is owned by root, the limit is set for the root user. After making the change, log in to the server again as root. The change is then in effect, which can be confirmed with ulimit -n.

4. Start ccps from the session that was re-logged in at step 3 (this is required). ccps will then fail as soon as the number of files it has open at the same time exceeds 500.

    [root@vvmocmp1 ccps]# tail -f ./log/ccps.log
        at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:270)
        at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:224)
        at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
    Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
        ... 13 more
    [12-22 20:38:20,564] WARN [DefaultQuartzScheduler_Worker-8] SystemResourceMonitor.updateDiskInfoMap(826) | Can't execute "/bin/df -lP /var/opt/OC/ccps/cdr/local/" command
    java.io.IOException: Cannot run program "/bin/df": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
        at java.lang.Runtime.exec(Runtime.java:593)
        at java.lang.Runtime.exec(Runtime.java:431)
        at java.lang.Runtime.exec(Runtime.java:328)
        at com.hp.opencall.ccps.ccs.performance.SystemResourceMonitor.updateDiskInfoMap(SystemResourceMonitor.java:824)

At this point the count from ls -l was around 500, whereas lsof -p had already reported more than 800 without any error occurring at that figure.

    [root@vvmocmp1 ccps]# ls -l /proc/5685/fd/ | wc -l
    489
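The counting and the limit check above can be combined into a small helper. The following is only a sketch: the function names are made up for illustration, and it assumes a Linux system where /proc/<pid>/limits is available. It uses ls -A instead of ls -l so that no "total" line is counted.

```shell
#!/bin/sh
# Sketch: count a process's open file descriptors and read its soft limit.
# The function names are illustrative and do not come from the article.

# Print the number of descriptors currently open in process $1.
# When piped, ls -A prints one entry per line and no "total" line,
# so the result is the exact descriptor count.
count_fds() {
    ls -A "/proc/$1/fd" | wc -l
}

# Print the soft "Max open files" limit of process $1.
# Assumes /proc/<pid>/limits exists on this kernel.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Print "<open> / <limit>" for process $1.
fd_report() {
    echo "$(count_fds "$1") / $(fd_soft_limit "$1")"
}
```

For the process in this article, fd_report 5685 would print the current descriptor count next to the limit that process is actually running under. Calling it in a loop with sleep is one way to follow the fluctuation.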
Note: once the test is finished, be sure to remove the two lines added to /etc/security/limits.conf.
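For a one-off test, an alternative that leaves nothing to clean up is to lower the limit for a single shell rather than in /etc/security/limits.conf. This is not the method used in this article. It is a minimal sketch with a hypothetical helper name, run_with_fd_limit, and it relies on ulimit -n affecting only the shell that calls it and the processes started from that shell.

```shell
#!/bin/sh
# Sketch: run one command under a lowered open-files limit.
# run_with_fd_limit is a hypothetical helper, not part of the article.
# Usage: run_with_fd_limit LIMIT COMMAND [ARGS...]
run_with_fd_limit() {
    limit=$1
    shift
    (
        # Without -S or -H, ulimit -n sets both the soft and hard limit.
        # The parentheses start a subshell, so the calling shell keeps
        # its own limit.
        ulimit -n "$limit" || exit 1
        "$@"
    )
}
```

Something like run_with_fd_limit 500 /opt/OC/ccps/jboss-4.2.3.GA/bin/run.sh -c all -g ccpsgroup -b 0.0.0.0 would then start ccps under a limit of 500 while the login session and limits.conf stay unchanged.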
Author: RogerZhuo