1.4.1_02 JVM receives either SEGV/BUS/EMT signal and dumps core in NIS/NFS environment

Posted By:   RAJIV_KONKIMALLA
Posted On:   Thursday, February 24, 2005 11:31 AM


Our application uses the JDMK 5.0 (uses RMIConnector server) / JDK 1.4.1_02 combination. We never had any problems until we mounted our application in an NIS/NFS environment.


Very often Java (the JDMK agent) crashes, producing a core dump ("core" file) and an "hs_err_pid " file. The latter file clearly shows that the crash is caused by unexpected signals 10 (SIGBUS), 7 (SIGEMT), or 11 (SIGSEGV). The environment is as follows:


* Java agents are installed on SunOS, Linux, and HP-UX boxes under NIS/NFS ids. Depending on the circumstances, they crash sometimes at the same time and sometimes at different times.


* Most of the time, they crash while trying to write to or read from some local file. Sometimes they crash because of unknown JDMK problems.


Please view the following "hs_err_pid " file:
----------------------------------------------------------


			
An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 10 occurred at PC=0xFEB12D9C
Function=[Unknown. Nearest: ZIP_Lock+0x48]
Library=/nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/libzip.so

Current Java thread:
at java.util.zip.ZipFile.getEntry(Native Method)
at java.util.zip.ZipFile.getEntry(ZipFile.java:146)
- locked (a java.util.jar.JarFile)
at java.util.jar.JarFile.getEntry(JarFile.java:184)
at java.util.jar.JarFile.getJarEntry(JarFile.java:171)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:669)
at sun.misc.URLClassPath.getResource(URLClassPath.java:156)
at java.net.URLClassLoader$1.run(URLClassLoader.java:190)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:186)
at java.lang.ClassLoader.loadClass(ClassLoader.java:299)
- locked (a sun.misc.Launcher$AppClassLoader)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:265)
- locked (a sun.misc.Launcher$AppClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:255)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:315)
- locked (a sun.misc.Launcher$AppClassLoader)
at org.apache.xerces.dom.CharacterDataImpl.setNodeValue(CharacterDataImpl.java:172)
at com.tec.itv.util.GenerateXml.setOnlyABValue(GenerateXml.java:226)
at com.tec.itv.service.agentapp.itvmbeans.AB.updateAttributeInfo(AB.java:577)
- locked (a com.tec.itv.service.agentapp.itvmbeans.AB)
at com.tec.itv.service.agentapp.itvmbeans.AB.taskExecute(AB.java:532)
at com.tec.itv.service.agentapp.itvmbeans.AB.invoke(AB.java:282)
at com.sun.jdmk.DynamicMetaDataImpl.invoke(DynamicMetaDataImpl.java:336)
at com.sun.jdmk.MetaDataImpl.invoke(MetaDataImpl.java:498)
at com.sun.jdmk.DefaultMBeanInterceptor.invoke(DefaultMBeanInterceptor.java:540)
at com.sun.jdmk.MBeanServerImpl.invoke(MBeanServerImpl.java:723)
at com.tec.itv.service.agentapp.ItvExecuteTask.callMbean(ItvExecuteTask.java:521)
at com.tec.itv.service.agentapp.ItvExecuteTask.formAttributeInfo(ItvExecuteTask.java:275)
at com.tec.itv.service.agentapp.ItvExecuteTask.executeScheduledTask(ItvExecuteTask.java:139)
at com.tec.itv.service.agentapp.ItvExecuteTask.run(ItvExecuteTask.java:86)
at java.lang.Thread.run(Thread.java:536)

Dynamic libraries:
0x10000 /nishome1/nisgr2/itv/jre/sunos-sparc/bin/java
0xff370000 /usr/lib/libthread.so.1
0xff3a0000 /usr/lib/libdl.so.1
0xff280000 /usr/lib/libc.so.1
0xff360000 /usr/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1
0xfec00000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/client/libjvm.so
0xff230000 /usr/lib/libCrun.so.1
0xff200000 /usr/lib/libsocket.so.1
0xff100000 /usr/lib/libnsl.so.1
0xff1d0000 /usr/lib/libm.so.1
0xff260000 /usr/lib/libw.so.1
0xff0e0000 /usr/lib/libmp.so.2
0xff0c0000 /usr/lib/librt.so.1
0xff0a0000 /usr/lib/libaio.so.1
0xff080000 /usr/lib/libmd5.so.1
0xfebe0000 /usr/platform/SUNW,Sun-Blade-1000/lib/libmd5_psr.so.1
0xfebb0000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/native_threads/libhpi.so
0xfeb80000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/libverify.so
0xfeb30000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/libjava.so
0xfeb10000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/libzip.so
0xfad50000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/libnet.so
0xfd010000 /nishome1/nisgr2/itv/jre/sunos-sparc/lib/sparc/librmi.so

Local Time = Fri Feb 18 15:30:31 2005
Elapsed Time = 1435
#
# The exception above was detected in native code outside the VM
#
# Java VM: Java HotSpot(TM) Client VM (1.4.1_02-b06 mixed mode)
#
------------------------------------------------------------



I am not sure whether I need to concentrate on the patch level, JDMK, or Java. This has become a very critical issue for us. I would really appreciate your help with this strange scenario.


Re: 1.4.1_02 JVM receives either SEGV/BUS/EMT signal and dumps core in NIS/NFS environment

Posted By:   Christopher_Koenigsberg  
Posted On:   Saturday, February 26, 2005 09:00 AM

The errors appear to involve "ZIP_Lock", which makes me suspicious immediately.


The biggest problem with remote filesystems is, and has been for many years, their locking semantics. Remote files simply cannot behave exactly like local files, especially over a less than completely reliable protocol like NFS. They will have occasional "glitches" that throw off your application if it expects local-filesystem semantics and is not prepared to deal with transient failures, where you have to explicitly go back and retry an operation that failed because of a temporary problem that has since gone away.
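To make the "go back and retry" idea concrete, here is a minimal sketch of a retry wrapper. The class name TransientRetry and the method withRetry are made up for illustration, and the code assumes a newer JDK than 1.4.1 (generics and java.util.concurrent). Note that this only helps with failures that surface as a Java IOException; a signal such as SIGBUS raised in native code, like the one in the log above, kills the JVM and cannot be caught this way.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of JDMK or the JDK.
final class TransientRetry {

    /**
     * Runs op and retries it when it throws IOException, up to maxAttempts
     * attempts in total, pausing pauseMillis between attempts.
     * Any other exception is passed straight through to the caller.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Possibly a transient remote-filesystem problem: remember it and retry.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // Every attempt failed; report the most recent I/O error.
        throw last;
    }
}
```

You would wrap only the individual file operations that touch the NFS mount, and keep the attempt count small so that a real outage still shows up as an error quickly.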


So I would suggest you move to local filesystems for running your applications. Do some kind of periodic synchronization copying to/from the NFS mounted files, but do your locking etc. just on local files.
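As a rough sketch of that synchronization step, something like the following could copy a file from the NFS mount into a local directory before the application opens or locks it. LocalStaging and stageLocally are hypothetical names, and the code assumes a JDK with java.nio.file (Java 7 or later), which the 1.4.1 JVM in the original post does not have.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of JDMK or the JDK.
final class LocalStaging {

    /**
     * Copies source (for example a file on an NFS mount) into localDir and
     * returns the path of the local copy. An existing copy is replaced.
     */
    static Path stageLocally(Path source, Path localDir) throws IOException {
        Files.createDirectories(localDir);
        Path target = localDir.resolve(source.getFileName());
        // Copy into a temporary file in the same directory first, then move it
        // into place, so a reader of target does not see a partially copied file
        // on typical local filesystems where the move is a rename.
        Path tmp = Files.createTempFile(localDir, "stage-", ".tmp");
        try {
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the copy failed.
            Files.deleteIfExists(tmp);
        }
        return target;
    }
}
```

The application then reads, writes, and locks only the returned local path, and a separate periodic job copies results back to the NFS mount if they need to be shared.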



For instance, my experience in years past was with mail systems. Mail servers just cannot be reliable if they try to deliver to a remote-mounted filesystem, especially because of locking, and this is reflected in their documentation. (I recall that the "Procmail" distribution had an extensive suite of tests for the locking semantics of the filesystem you were going to use for local delivery, and there was a long discussion of these problems.)

