Help getting accurate metadata for files on network drives
Posted By:   Travis_T
Posted On:   Monday, October 4, 2010 06:28 PM


I need to read thousands of very small files that are addressable through the file system API (the File class). The files may live on a local drive or on a Windows shared drive. When the files are on a remote system, they may be changed by a process running on that system. Much of the time I can cache the information read from the files, but if a file changes or is deleted I have to detect the change and reload it or evict it from my in-memory cache. Everything works great on a local drive, but the File API doesn't work on a Windows shared network drive.


Basically, File.lastModified(), File.exists(), File.canRead(), File.isFile(), etc. all return inaccurate results when the base path is on a mapped network drive, the files are changed by a remote process, and my Java process accesses them immediately afterward.


For example:

  • the file existed,

  • my program loaded and cached it,

  • then a remote process changed the file,

  • yet when I call file.lastModified() immediately after the change, it still reports the old lastModified date.

The net result is that I can't detect the change and end up serving stale data. This is even worse when the file is deleted but File.exists() still reports true. If I put in an artificial wait using Thread.sleep() in a while loop, the correct lastModified and exists values are eventually reported. However, that doesn't help.


I've tried a lot of different ways to get Java to report the correct information. So far, the only reliable one is to open the file and read the first byte, catching an exception if necessary. This seems to force the file metadata to be updated, and Java then correctly reports the lastModified date (or throws an exception if the file no longer exists). However, opening a stream and reading a single byte pretty much invalidates any reason to cache the file data, because most of the overhead is in opening and closing the streams.
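That read-one-byte workaround can be sketched roughly as follows; FileProbe and probeLastModified are hypothetical names used for illustration, not code from the original post:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class FileProbe {

    /**
     * Attempts to read the first byte of the file, which appears to force
     * the OS to refresh its cached metadata for the path. Returns the
     * (hopefully now accurate) lastModified value in millis, or -1 if the
     * file is gone or unreadable.
     */
    static long probeLastModified(File file) {
        try (FileInputStream in = new FileInputStream(file)) {
            in.read(); // one byte (or an immediate EOF) is enough to probe
            return file.lastModified();
        } catch (IOException e) {
            return -1L; // deleted or inaccessible
        }
    }
}
```

As the post notes, this defeats much of the point of caching, since the open/close of the stream dominates the cost for very small files.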


Is there any way to force Java or Windows to update the information about a file? My guess is that Windows is caching shared-drive file information and handing Java stale data, but it would be really nice to be able to force a refresh of a file's info.


Example:

    // Assume this is already behind code that did file.exists() or file.canRead()
    long instanceFileModifyTime = fileObject.lastModified();
    MyObject cachedResult = INSTANCE_CACHE.get(cacheKey);

    if ((null != cachedResult) && (instanceFileModifyTime == cachedResult.getLoadTime())) {
        result = cachedResult;
    } else {
        // Open IO and load the data
        ....
    }
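As an aside, NIO.2 (Java 7+) exposes the same metadata through Files.readAttributes. Whether it sees fresher data than java.io.File on an SMB share depends on the same OS-level cache, so this is not a confirmed fix, but unlike File.lastModified() (which silently returns 0 on failure) it throws NoSuchFileException when the file is gone. A sketch under that caveat, with NioMetadata as a hypothetical name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class NioMetadata {

    /**
     * Reads lastModified via NIO.2. Throws NoSuchFileException (a subclass
     * of IOException) if the path no longer exists, rather than returning
     * a stale or zero value. May still be served from the OS metadata cache
     * on a mapped network drive.
     */
    static long lastModifiedMillis(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        return attrs.lastModifiedTime().toMillis();
    }
}
```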

Re: Help getting accurate metadata for files on network drives

Posted By:   Travis_T
Posted On:   Thursday, October 7, 2010 06:40 PM

As an FYI, I also posted on the Java forums, got a few replies, and posted a few things myself: http://forums.oracle.com/forums/thread.jspa?messageID=6498985#6498985

Re: Help getting accurate metadata for files on network drives

Posted By:   Travis_T
Posted On:   Thursday, October 7, 2010 06:30 PM

Just as an FYI, I know that the whole thing is fraught with issues top to bottom, and quite frankly my check-in comments state how embarrassed I was to have to work on such a solution. We have explored many, many different ways around the issue, SQL being the first, but when it comes down to it we have to display data from a legacy system whose owners refuse to provide any other interface to the data. Our hands are tied.


At any rate, adding the cache provided huge performance increases, until we found that changes weren't immediately reflected. Refreshing the screen typically clears it up within a few seconds, but that is also not desirable. I added a little segment of code that resolved the staleness but essentially invalidated any reason to use a cache: before returning a cached result, we attempt to read the first byte of the file. If that succeeds and the last-modified date still matches the cached value, we return the cached result instead of re-reading and parsing the whole file. I was just hoping to find some way to force the OS and the underlying network filesystem to update their metadata without having to actually open an input stream to the file.
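The freshness check described here can be sketched as below; ProbingCache, Entry, and load are hypothetical names for illustration, and the one-byte read is the metadata probe described in the thread:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProbingCache {

    // Cached content paired with the lastModified value seen at load time.
    static final class Entry {
        final String data;
        final long loadTime;
        Entry(String data, long loadTime) { this.data = data; this.loadTime = loadTime; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    /** Returns cached data if the file is provably unchanged, else reloads it. */
    String get(File file) throws IOException {
        Entry cached = cache.get(file.getPath());
        if (cached != null) {
            try (FileInputStream in = new FileInputStream(file)) {
                in.read(); // probe: forces the OS to refresh its cached metadata
            }
            if (file.lastModified() == cached.loadTime) {
                return cached.data; // still fresh: skip the full re-read and parse
            }
        }
        String data = load(file); // full read + parse
        cache.put(file.getPath(), new Entry(data, file.lastModified()));
        return data;
    }

    private String load(File file) throws IOException {
        // placeholder for the real read-and-parse step
        return new String(java.nio.file.Files.readAllBytes(file.toPath()));
    }
}
```

As noted, the probe's stream open/close costs roughly as much as the cache saves for very small files, which is exactly the trade-off the post complains about.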
