Interesting problem 'bout directory crawling
3 posts in topic

Posted By:   dipankar_datta
Posted On:   Monday, March 13, 2006 04:41 PM



Hi all,

I am working on a project where I have to crawl through millions of images on a hard disk and classify them based on some algorithms.

Let's say the directory structure where the images are kept is as follows:

			
-root - dirA
|
dirB-dirD
|
dirE



Of course, in reality there can be any number of subdirectories under root.

Each of these subdirectories stores thousands of images. Given the root folder, my code tries to extract the directory structure (a requirement for the classification algorithm).

The problem is that, to find all the directories, it calls isDirectory() on every entry (images and subdirectories alike). Given the huge number of files present, finding even 15-20 subdirectories takes an excessive amount of time.

I was using Apache FileUtils and IOUtils to do this, but internally they call isDirectory() on each file anyway. I discarded them and want to go ahead with my own optimized code, but I can't think of a solution.

If anybody has a better idea, please help.


Re: Interesting problem 'bout directory crawling

Posted By:   Almagest_FUTT  
Posted On:   Tuesday, March 14, 2006 01:31 AM

Maybe you could make use of OS functions. On *NIX, for instance, a call to:
find [directory] -type d
should be quite fast. I suppose you'd be able to find something similar for Win32.
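
A rough sketch of how that find call might be driven from Java, in case it helps. The class and method names (FindDirs, listDirectories) are made up for illustration, and it assumes a *NIX box with find on the PATH, a reasonably recent JDK (7+ for the try-with-resources and redirectError), and directory names that contain no newlines. Whether it actually beats the per-file isDirectory() loop would need measuring on the real data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the thread): delegates the directory
// search to the OS by running `find <root> -type d`.
class FindDirs {

    // Returns one path per directory under root, including root itself.
    // Assumes no directory name contains a newline character.
    static List<String> listDirectories(String root)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("find", root, "-type", "d");
        // Pass find's error messages straight through to our stderr so
        // they neither mix into the results nor fill an unread pipe.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        List<String> dirs = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                dirs.add(line); // find prints one directory per line
            }
        }

        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("find exited with status " + exit);
        }
        return dirs;
    }
}
```

Calling FindDirs.listDirectories("/path/to/root") would then give the directory list to feed into the classification step, with the images themselves never touched from the Java side.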

Re: Interesting problem 'bout directory crawling

Posted By:   WarnerJan_Veldhuis  
Posted On:   Monday, March 13, 2006 05:38 PM

Hmmm. From the 1.4 docs for length(): "Returns the length of the file denoted by this abstract pathname. The return value is unspecified if this pathname denotes a directory." What the hell do they mean by "unspecified"??? Back to the drawing board....

Re: Interesting problem 'bout directory crawling

Posted By:   WarnerJan_Veldhuis  
Posted On:   Monday, March 13, 2006 05:36 PM

What does length() return? If length() == 0 then you might have a directory, but you might also have an empty file, so you could call isDirectory() only when length() == 0.... Just my 2c
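
Since length() is unspecified for directories, the length() == 0 shortcut can't be relied on. For anyone reading this on a much newer JDK than the 1.4 discussed here: java.nio.file (Java 7 and later) has Files.walkFileTree, which hands the visitor each entry's attributes as it walks, so there is no explicit isDirectory() call in user code. A minimal sketch, with DirCollector and collectDirectories being made-up names; whether the walker saves any system calls depends on the platform and file system, so treat the speedup as something to measure, not a given.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the thread): collects every directory
// under root, including root itself, using the Java 7+ NIO file walker.
class DirCollector {

    static List<Path> collectDirectories(Path root) throws IOException {
        final List<Path> dirs = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir,
                                                     BasicFileAttributes attrs) {
                // Called once per directory, before its entries are visited.
                dirs.add(dir);
                return FileVisitResult.CONTINUE;
            }
            // visitFile is left at its default (CONTINUE): plain files are
            // still enumerated by the walker but otherwise ignored here.
            // visitFileFailed is also left at its default, which rethrows,
            // so an unreadable entry aborts the whole walk.
        });
        return dirs;
    }
}
```

Usage would be along the lines of DirCollector.collectDirectories(Paths.get("/path/to/root")). Symbolic links are not followed by default.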