The OpenGrok indexer is throwing FileNotFoundException on some directories while indexing.

OpenGrok Indexer. FileNotFoundException while indexing directories [CLOSED]

tarangchikhalia commented on May 23, 2024

OpenGrok Indexer. FileNotFoundException while indexing directories

from opengrok.

Comments (8)

vladak commented on May 23, 2024

How is the indexer run? Was this an initial or an incremental reindex? Is the directory in question part of some repository?

tarangchikhalia commented on May 23, 2024

This is an incremental reindex. The directory is part of a repository that is copied from the remote server to the OpenGrok server (no SCM), but I have also seen this error in many Git repositories.

vladak commented on May 23, 2024

Can you raise the indexer log level to FINER or higher and post the logs around the log entries that start with "Starting file collection" and such, for a case that encounters the directory problem? That line and any subsequent lines that contain DefaultIndexChangedListener would help.

tarangchikhalia commented on May 23, 2024

Here are the logs with FINEST settings.

Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase logIgnoredUid
FINEST: ignoring deleted document for '/<project>/version.json' at 20240106111117766                                                                                                                          
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.DefaultIndexChangedListener fileRemove                                                                                                                                           
FINE: Remove: '/<project>/version.json'                                                                                                                                                                       
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.DefaultIndexChangedListener fileRemoved                                                                                                                                          
FINER: Removed: '/<project>/version.json'                                                                                                                                                                     
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.util.Statistics logIt                                                                                                                                                                  
INFO: Done file collection for directory '/<project>' (took 15 ms)                                                                                                                                            
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase update                                                                                                                                                             
INFO: Starting indexing of directory '/<project>'                                                                                                                                                             
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4                                                                                                                                             
WARNING: ERROR addFile(): '/var/opt/opengrok/<dir_path>'                                                                                                                               
java.io.FileNotFoundException: /var/opt/opengrok/<dir_path> (Is a directory)                                                                                                           
        at java.base/java.io.FileInputStream.open0(Native Method)                                                                                                                                                                   
        at java.base/java.io.FileInputStream.open(FileInputStream.java:219)                                                                                                                                                         
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)                                                                                                                                                       
        at org.opengrok.indexer.index.IndexDatabase.getAnalyzerFor(IndexDatabase.java:1217)                                                                                                                                         
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1129)                                                                                                                                                
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1781)                                                                                                                                 
        at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$59(Collectors.java:1304)                                                                                                                               
        at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:575)                                                                                                                                
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)                                                                                                                                        
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)                                                                                                                                 
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)                                                                                                                                          
        at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)                                                                                                                                           
        at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)                                                                                                                                          
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)                                                                                                                                                
        at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)                                                                                                                                              
        at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)                                                                                                                                                
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)                                                                                                                                    
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)                                                                                                                              
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:575)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$5(IndexDatabase.java:1770)
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4
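
The top frames of the trace show FileInputStream being opened on a path that is a directory. That behavior can be reproduced outside OpenGrok: the FileInputStream constructor is documented to throw FileNotFoundException when the path is a directory rather than a regular file. The following is a minimal sketch, not OpenGrok code; the class and method names are made up for illustration, and the exact exception message text depends on the platform.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo class, not part of OpenGrok.
class DirectoryOpenDemo {

    /** Returns true if opening the path as a stream fails with FileNotFoundException. */
    static boolean openFailsWithFileNotFound(File path) {
        try (FileInputStream in = new FileInputStream(path)) {
            return false; // the path could be opened for reading
        } catch (FileNotFoundException e) {
            // Message text is platform specific, so it is only printed, not checked.
            System.out.println("FileNotFoundException: " + e.getMessage());
            return true;
        } catch (IOException e) {
            // Only reachable if closing the stream fails.
            throw new UncheckedIOException(e);
        }
    }

    /** Creates an empty temporary directory to try the open on. */
    static File createTempDirectory() {
        try {
            return Files.createTempDirectory("fnfe-demo").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = createTempDirectory();
        System.out.println("open failed: " + openFailsWithFileNotFound(dir));
        dir.delete();
    }
}
```

So the exception itself only says that a directory path ended up where a regular file was expected; it does not say how the path got there.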

vladak commented on May 23, 2024

Can you also provide the line that contains "Starting file collection"?

vladak commented on May 23, 2024

I went through the related code in IndexDatabase, and for the initial reindex I don't see how there could be an entry in the IndexDownArgs that corresponds to a directory. The recursive indexDown() function, which is executed when reindexing from scratch (or when history based reindex is off for some reason), traverses the directory tree like this:

opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/index/IndexDatabase.java

Lines 1629 to 1641 in b238394

    
    for (File file : files) {
        String path = parent + File.separator + file.getName();
        if (!accept(dir, file, ret)) {
            handleSymlink(path, ret);
        } else {
            if (file.isDirectory()) {
                indexDown(file, path, args, progress);
            } else {
                processFile(args, file, path);
                progress.increment();
            }
        }
    }

The accept() call detects any allowed symlinks. The isDirectory() call follows symlinks, so even if the file is a forbidden symlink, it will still be processed in the else branch as a directory, i.e. indexDown() will recursively descend into that directory. The IndexDownArgs is modified (within this code path) only in the processFile() method, and this method is only ever called for non-directory entries.
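
The point that isDirectory() follows symlinks can be checked in isolation. The sketch below is not OpenGrok code; the class and method names are hypothetical, and it assumes a file system and user that allow creating symbolic links. It creates a symlink to a directory and compares java.io.File.isDirectory() with the NOFOLLOW_LINKS variant from java.nio.file.Files.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical demo class, not part of OpenGrok.
class SymlinkDirectoryDemo {

    /** Creates a temp tree with a directory "target" and a symlink "link" to it; returns the link. */
    static Path createLinkToDirectory() {
        try {
            Path base = Files.createTempDirectory("symlink-demo");
            Path target = Files.createDirectory(base.resolve("target"));
            return Files.createSymbolicLink(base.resolve("link"), target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path link = createLinkToDirectory();
        File asFile = link.toFile();
        // File.isDirectory() resolves the link and reports on the target.
        System.out.println("File.isDirectory(): " + asFile.isDirectory());
        // The entry itself is a symbolic link ...
        System.out.println("Files.isSymbolicLink(): " + Files.isSymbolicLink(link));
        // ... and is not a directory when the link is not followed.
        System.out.println("Files.isDirectory(NOFOLLOW_LINKS): "
                + Files.isDirectory(link, LinkOption.NOFOLLOW_LINKS));
    }
}
```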

The IndexDownArgs is further modified in processTrailingTerms() from within update(); however, that only happens for pre-existing index documents.

The history based reindex (which is always non-initial), done in indexDownUsingHistory(), is a different story. There, the accept() call that identifies allowed symlinks is not used, so it could happen that processFileIncremental(), which is the workhorse for this indexing mode, actually adds an IndexDownArgs entry that is a directory. For Git specifically, I don't think there is a way the Git file tree traversal could contain directories (since in Git a directory can be added to the Git index only if it is non-empty); however, if the entry is a symlink pointing to a directory, that is possible.

That's why I asked about the "Starting file collection" log entry, so that I can see in which indexing mode this happens.
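
To illustrate the suspected failure mode, here is a sketch of the kind of guard that would keep such entries out of a work list. It is purely hypothetical and is not OpenGrok's implementation or its fix: HistoryEntryFilter, keepRegularFiles() and createSampleEntries() are invented names, and the sample tree stands in for paths reported by a changeset. It assumes symbolic links can be created on the test file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenGrok.
class HistoryEntryFilter {

    /** Keeps only the paths that resolve (following symlinks) to regular files. */
    static List<Path> keepRegularFiles(List<Path> candidates) {
        List<Path> result = new ArrayList<>();
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                result.add(candidate);
            } else {
                // Directories and symlinks to directories end up here.
                System.out.println("skipping non-regular entry: " + candidate);
            }
        }
        return result;
    }

    /** Builds a sample tree: a regular file, a directory, and a symlink to that directory. */
    static List<Path> createSampleEntries() {
        try {
            Path base = Files.createTempDirectory("history-demo");
            Path file = Files.createFile(base.resolve("file.txt"));
            Path dir = Files.createDirectory(base.resolve("subdir"));
            Path link = Files.createSymbolicLink(base.resolve("link-to-subdir"), dir);
            return List.of(file, dir, link);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("kept: " + keepRegularFiles(createSampleEntries()));
    }
}
```

Only the regular file survives the filter; the symlink to the directory is dropped because Files.isRegularFile() follows links by default.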

tarangchikhalia commented on May 23, 2024

Sorry for the delay. The project that was encountering this issue isn't showing it now. I am trying to reproduce it in a test environment.

vladak commented on May 23, 2024

> Sorry for the delay. The project that was encountering this issue isn't showing it now. I am trying to reproduce it in a test environment.

It definitely depends on the changes made since the last reindex. For history based reindex, that would be the file trees in the newly added changesets.
