When recursively descending a directory tree, how does one avoid an infinite loop in the case that a directory entry is actually a symbolic link to a directory higher up in the tree? I'm using the listFiles() method for File objects to list the files in a directory, and then isDirectory() to determine whether I need to recurse.

Tim Rohaly

Symbolic links are defined at the file system level; they're a bit tricky, because what you're doing when you make a link is fooling applications into thinking that the file system has a different structure than it actually has.

For example, suppose you had a directory called topgun that contained a directory called middleearth, which in turn contained a symbolic link called bottomman pointing back to topgun. Then the directory hierarchy might look as follows:

/home/topgun
  +--  middleearth
          +--  bottomman -> /home/topgun
If you did a cd /home/topgun/middleearth/bottomman and then an ls, you would see middleearth listed, because bottomman resolves to /home/topgun. This does cause problems when you try to recurse, and in particular isDirectory() will, and should, report that bottomman is a directory. Unfortunately, the File class doesn't have a method to check whether a path is a link.

To distinguish a directory from a link to a directory, you will need to compare its Absolute Path with its Canonical Path. The absolute path is the fully qualified path name exactly as it was given, with any symbolic links left in place, whereas the canonical path follows the symbolic link to its target. Consider the following program:

import java.io.*;

public class Paths {

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);

        System.out.println("Absolute: " + file.getAbsolutePath());
        System.out.println("Canonical: " + file.getCanonicalPath());
    }
}
If you run this program using the command
    java Paths /home/topgun/middleearth/bottomman
you will see the following output:
Absolute: /home/topgun/middleearth/bottomman
Canonical: /home/topgun
Notice that when you examine the canonical path you can't tell the difference between the symbolic link and the directory it points to, whereas with the absolute path you can.
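The comparison described above can be wrapped in a small helper. This is only a sketch, assuming a Unix file system; the class name LinkCheck and the method isLink are made up for illustration and are not part of java.io. It canonicalizes the parent directory first, so that a link somewhere higher up in the path does not make an ordinary directory look like a link.

```java
import java.io.File;
import java.io.IOException;

class LinkCheck {

    // Hypothetical helper: reports whether the last component of the path
    // appears to be a symbolic link, by comparing absolute and canonical forms.
    static boolean isLink(File file) throws IOException {
        File parent = file.getAbsoluteFile().getParentFile();
        // Resolve links in the parent first, so only the final name can differ.
        File probe = (parent == null)
                ? file.getAbsoluteFile()
                : new File(parent.getCanonicalFile(), file.getName());
        return !probe.getCanonicalPath().equals(probe.getAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + (isLink(new File(arg)) ? " is a link" : " is not a link"));
        }
    }
}
```

With the example hierarchy, isLink should report true for /home/topgun/middleearth/bottomman and false for /home/topgun/middleearth.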

Note that the form of the canonical path is system-dependent - but then again, so is the ability to create a cyclic hierarchy. This answer applies to Unix.

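A solution to your problem might be to keep a record of the directories you have visited as you traverse the tree, and only descend into branches you haven't been in before. This can be done using a Hashtable. First check whether the canonical path of a File is found in the table - if not, insert the path and then recurse. If it is found, you can end your recursion. Use the canonical path as the key rather than the absolute path: each trip around the loop makes the absolute path longer (.../bottomman/middleearth/bottomman/...), so it never repeats, while the canonical path always comes back to /home/topgun.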
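Here is a minimal sketch of that traversal, not a definitive implementation. The class name SafeWalk and the counter directoriesEntered are invented for the example, and a HashSet stands in for the Hashtable since only the keys matter. The set is keyed on the canonical path, because that is the form that repeats when a link leads back up the tree.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class SafeWalk {

    // Canonical paths of the directories that have already been entered.
    private final Set<String> visited = new HashSet<String>();

    // Number of directories actually listed (illustrative counter).
    int directoriesEntered = 0;

    void walk(File dir) throws IOException {
        // add() returns false if the path was already in the set,
        // which means a link has led back to a directory seen earlier.
        if (!visited.add(dir.getCanonicalPath())) {
            return;
        }
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        directoriesEntered++;
        for (File entry : entries) {
            if (entry.isDirectory()) {
                walk(entry);
            } else {
                System.out.println(entry.getPath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        new SafeWalk().walk(new File(args.length > 0 ? args[0] : "."));
    }
}
```

Run against /home/topgun from the example, this would list topgun and middleearth once each, then stop at bottomman because its canonical path is already in the set. A side effect of this design is that a directory reachable through several links is only visited by the first route found.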