Comments (8)
How is the indexer run? Was this an initial or an incremental reindex? Is the directory in question part of some repository?
from opengrok.
This is an incremental reindex. The directory is part of a repository that is copied from the remote server to the OpenGrok server (no SCM), but I have seen this error in many Git repositories.
from opengrok.
Can you raise the indexer log level to `FINER` or higher and post the logs around the log entries that start with `Starting file collection` and such for a case which encounters the directory problem? This line and any subsequent lines that contain `DefaultIndexChangedListener` would help.
from opengrok.
Here are the logs with FINEST settings.
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase logIgnoredUid
FINEST: ignoring deleted document for '/<project>/version.json' at 20240106111117766
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.DefaultIndexChangedListener fileRemove
FINE: Remove: '/<project>/version.json'
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.DefaultIndexChangedListener fileRemoved
FINER: Removed: '/<project>/version.json'
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done file collection for directory '/<project>' (took 15 ms)
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase update
INFO: Starting indexing of directory '/<project>'
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4
WARNING: ERROR addFile(): '/var/opt/opengrok/<dir_path>'
java.io.FileNotFoundException: /var/opt/opengrok/<dir_path> (Is a directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at org.opengrok.indexer.index.IndexDatabase.getAnalyzerFor(IndexDatabase.java:1217)
    at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1129)
    at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1781)
    at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$59(Collectors.java:1304)
    at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:575)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:575)
    at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$5(IndexDatabase.java:1770)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4
from opengrok.
Can you also provide the line that contains `Starting file collection`?
from opengrok.
I went through the related code in `IndexDatabase`, and for the initial reindex I don't see a way there can be an entry in the `IndexDownArgs` that would correspond to a directory. The `indexDown()` recursive function, which is executed when reindexing from scratch (or when history based reindex is off for some reason), traverses the directory tree like this:
opengrok/opengrok-indexer/src/main/java/org/opengrok/indexer/index/IndexDatabase.java
Lines 1629 to 1641 in b238394
The `accept()` call detects any allowed symlinks. The `isDirectory()` call follows symlinks, so even if the `file` is a forbidden symlink, it will still be processed in the else branch as a directory, i.e. `indexDown()` will recursively descend into that directory. The `IndexDownArgs` is modified (within this code path) only in the `processFile()` method, and this method is always called for non-directory entries.
The `IndexDownArgs` is further modified in `processTrailingTerms()` from within `update()`; however, that only happens for pre-existing index documents.
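To illustrate the traversal shape described above, here is a minimal, self-contained sketch. It is not the OpenGrok code: the class `IndexDownSketch`, the method `collect()` and the plain `List<File>` work list are hypothetical stand-ins for `indexDown()`, `processFile()` and `IndexDownArgs`, and the `accept()` filtering is left out entirely. It only shows that `File.isDirectory()` follows symlinks, so a symlink to a directory is descended into rather than queued as a file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class IndexDownSketch {
    /**
     * Hypothetical stand-in for indexDown(): walks the tree and queues only
     * non-directory entries. No symlink filtering or cycle detection is done.
     */
    static void collect(File dir, List<File> work) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // File.isDirectory() follows symlinks, so a symlink that
                // points to a directory also takes this branch.
                collect(entry, work);
            } else {
                // Only non-directory entries ever reach the work list.
                work.add(entry);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("indexdown");
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.write(sub.resolve("a.txt"), "a".getBytes());
        // Symlink to a directory; creating it may need privileges on some platforms.
        Files.createSymbolicLink(root.resolve("link"), sub);

        List<File> work = new ArrayList<>();
        collect(root.toFile(), work);
        for (File f : work) {
            System.out.println(f + " isDirectory=" + f.isDirectory());
        }
    }
}
```

With this shape, the only way a directory could end up in the work list is through a code path that bypasses the `isDirectory()` branch.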
The history based reindex (which is always non-initial), done in `indexDownUsingHistory()`, is a different story. There the `accept()` call that identifies allowed symlinks is not used, so it could happen that `processFileIncremental()`, which is the workhorse for this indexing mode, actually adds an `IndexDownArgs` entry that is a directory. For Git specifically, I don't think there is a way the Git file tree traversal could contain directories (since in Git a directory can be added to the Git index only if non-empty); however, if the entry is a symlink pointing to a directory, that is possible.
That's why I asked about the `Starting file collection` log entry, so that I can see for which indexing mode this happens.
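The symlink hypothesis can be checked outside of OpenGrok with a small standalone sketch. The class `DirectoryEntryCheck` and its methods are hypothetical and only illustrate standard JDK behavior: opening a path that resolves to a directory with `FileInputStream` throws `FileNotFoundException` (on Linux the message ends with "(Is a directory)", as in the stack trace above), while `Files.isRegularFile()`, which follows symlinks by default, would flag such an entry up front. Whether a guard like this is the right fix for the indexer is an assumption, not something established in this thread.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DirectoryEntryCheck {
    /** Returns true if the path can be opened for reading as a file stream. */
    static boolean canOpen(Path path) throws IOException {
        try (InputStream in = new FileInputStream(path.toFile())) {
            return true;
        } catch (FileNotFoundException e) {
            // Thrown for missing files and also when the path resolves
            // to a directory (directly or through a symlink).
            return false;
        }
    }

    /** Hypothetical guard: accept only regular files, after following symlinks. */
    static boolean isIndexable(Path path) {
        return Files.isRegularFile(path);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("entrycheck");
        Path dir = Files.createDirectory(root.resolve("dir"));
        Path file = Files.write(root.resolve("file.txt"), "x".getBytes());
        // Symlink pointing to a directory, the case suspected in the thread.
        Path link = Files.createSymbolicLink(root.resolve("link"), dir);

        for (Path p : new Path[] {file, dir, link}) {
            System.out.println(p.getFileName()
                    + " canOpen=" + canOpen(p)
                    + " isIndexable=" + isIndexable(p));
        }
    }
}
```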
from opengrok.
Sorry for the delay. The project that was encountering this issue isn't showing it now. I am trying to reproduce it in a test environment.
from opengrok.
> Sorry for the delay. The project that was encountering this issue isn't showing it now. I am trying to reproduce it in a test environment.

It definitely depends on the changes done since the last reindex. For history based reindex that would be the file trees in the newly added changesets.
from opengrok.