Size of a document

Posted By:   Alfred_Sniff
Posted On:   Tuesday, May 3, 2005 05:30 AM


Hi all,



I've got some problems while indexing my documents. I have a Word file of 12.5 MB and 2200 pages, and Lucene indexes it without any problem. On the other hand, I have a Word file of 25 MB and 1000 pages (lots of images), and this time I get an error: OutOfMemoryError: Java heap space. Is there another limit on file size by default?



I have an index with max_field_length set to Integer.MAX_VALUE, but I don't know whether that is the cause.



I also have a second problem. I have a small PDF that cannot be indexed, and it is not encrypted. When I copy and paste its text into a .txt file, that file is indexed without any problem. Where is the problem? Does it come from PDFBox?



Thanks all


Best Regards


Re: Size of a document

Posted By:   Otis_Gospodnetic  
Posted On:   Wednesday, May 4, 2005 12:38 AM

Your problems most likely come from your Word and PDF document parsers, not from Lucene.
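
One way to check this is to run the text extraction step on its own, with no IndexWriter involved, and see whether it fails or returns nothing. Below is a minimal Java sketch of that idea. It assumes the parser call (whatever Word or PDF library is in use) can be wrapped in a `Supplier<String>`. The names `ParserCheck`, `Result`, and `tryExtract` are made up for illustration and are not part of Lucene or PDFBox.

```java
import java.util.function.Supplier;

public class ParserCheck {

    /** Outcome of running one extraction step in isolation (hypothetical helper type). */
    public static final class Result {
        public final boolean ok;
        public final int textLength;
        public final String error;

        Result(boolean ok, int textLength, String error) {
            this.ok = ok;
            this.textLength = textLength;
            this.error = error;
        }
    }

    /**
     * Runs only the parser, without any indexing, and reports what happened.
     * The extractor is an assumption: wrap your real Word/PDF parser call in it.
     */
    public static Result tryExtract(Supplier<String> extractor) {
        try {
            String text = extractor.get();
            if (text == null || text.trim().isEmpty()) {
                // The parser ran but produced nothing to index.
                return new Result(false, 0, "parser returned no text");
            }
            return new Result(true, text.length(), null);
        } catch (OutOfMemoryError e) {
            // The heap was exhausted inside the parser, before indexing started.
            return new Result(false, 0, "OutOfMemoryError in parser: " + e.getMessage());
        } catch (RuntimeException e) {
            return new Result(false, 0, e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Placeholder extractor; replace the lambda with the real parser call.
        Result r = tryExtract(() -> "text a real parser would return");
        System.out.println("ok=" + r.ok + " length=" + r.textLength + " error=" + r.error);

        // Shows how much heap the JVM may use at most, which is what -Xmx controls.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxHeapMb);
    }
}
```

If the extraction step alone runs out of memory on the 25 MB file, the index settings are not the cause, and starting the JVM with a larger heap (for example `java -Xmx512m ...`) may help. If it returns no text for the PDF, the parser is the place to look.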