Comments (12)
I can confirm this issue: it reproduces with other files in the Usergrid project, namely:
lmcgibbn@LMC-032857 /usr/local/incubator-usergrid(master) $ find . -name "test.html"
./sdks/html5-javascript/examples/persistence/test.html
./sdks/html5-javascript/examples/test/test.html
./sdks/html5-javascript/tests/test.html
However, test.html shows up only once in the DRAT output:
/usr/local/drat/deploy/data/jobs/rat/1407484852545/input/roles.html
/usr/local/drat/deploy/data/jobs/rat/1407484852545/input/shell.html
/usr/local/drat/deploy/data/jobs/rat/1407484852545/input/test.html
/usr/local/drat/deploy/data/jobs/rat/1407484852545/input/users-activities.html
/usr/local/drat/deploy/data/jobs/rat/1407484852545/input/users-feed.html
/usr/local/drat/deploy/data/jobs/rat/1407484852545/input/users-graph.html
I am pretty positive that this is a bug in RAT and not DRAT... we most likely need to file it over there.
How about we discuss it here first?
from drat.
Sounds good, thanks Lewis. One thing to realize is that DRAT partitions jobs by MIME type and into different subdirectories of 100 files each (the size is configurable). That said, are you sure the other copies of, e.g., index.html or test.html aren't simply in some other DRAT job directory for RAT? For example, you could run a File Manager query and see how many files come back for test.html.
from drat.
Based on the above, I'll fire FileMgr back up and try to get to the bottom of this. Thanks for the comments.
from drat.
@lewismc did you get a chance to check this?
from drat.
Hi @chrismattmann, no, I didn't get a chance to check this out. However, I will update this issue once I run DRAT over the HTrace codebase (http://incubator.apache.org/projects/htrace.html). We are just making sure that the software grant from Cloudera checks out OK; then I assume the podling will continue with setting up the codebase at Apache.
I'll update once I use DRAT next.
from drat.
OK, thanks @lewismc!
from drat.
@chrismattmann I've just been able to verify this bug.
from drat.
If you look at
https://github.com/apache/oodt/blob/trunk/profile/src/main/java/org/apache/oodt/profile/handlers/lightweight/package.html
you will see that there is no license header.
If you then look at
https://github.com/apache/oodt/blob/trunk/xmlquery/src/main/java/org/apache/oodt/xmlquery/package.html
you will also notice no license header.
Within the DRAT report I posted for OODT 0.8.1 RC#1, you will see the following:
Unapproved licenses:

/usr/local/drat/deploy/data/jobs/rat/1422589552487/input/async.html
/usr/local/drat/deploy/data/jobs/rat/1422589552487/input/images.html
/usr/local/drat/deploy/data/jobs/rat/1422589552487/input/large.html
/usr/local/drat/deploy/data/jobs/rat/1422589552487/input/package.html
/usr/local/drat/deploy/data/jobs/rat/1422589552487/input/prerendered.html
/usr/local/drat/deploy/data/jobs/rat/1422589552487/input/simple.html
This is incorrect: there should be two package.html entries, one for each of the files linked above.
from drat.
I don't know where this bug actually resides to be honest. I have a feeling it is possibly within RAT itself...
from drat.
The same is true for
/usr/local/drat/deploy/data/jobs/rat/1422589552402/input/build.xml
which actually corresponds to both of the following files:
lmcgibbn@LMC-032857 /usr/local/oodt_trunk(master) $ find . -name "build.xml"
./tools/pdi_plugin/build/build.xml
./tools/pdi_plugin/build.xml
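To check a whole checkout at once rather than one file name at a time, something like the following could list every file name that occurs more than once. It is only a sketch: `duplicate_basenames` is a hypothetical helper, not part of DRAT or RAT.

```python
import os
from collections import defaultdict


def duplicate_basenames(root):
    """Map each file name that occurs more than once under root to its paths."""
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            by_name[name].append(os.path.join(dirpath, name))
    # Keep only the names shared by two or more files.
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    # Scans the current directory, like `find . -name ...` above.
    for name, paths in sorted(duplicate_basenames(".").items()):
        print(name)
        for path in paths:
            print("   ", path)
```

Every name it prints should appear that many times in the DRAT report; a name that appears only once there is another instance of this bug.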
from drat.
So the File Manager catalog currently contains multiple files that have the same name, which is correct. I then checked the MimePartitioner. It looks up all files by type, for the following types:
mimeTypes = ['x-java-source', 'x-c', 'javascript', 'xml', 'html', 'css', 'x-json', 'x-sh', 'x-fortran', 'csv', 'tab-separated-values', 'x-tex', 'x-asm', 'x-diff', 'x-python']
For each of those types, it queries Solr, gets the filename and file location, joins them together, and sets the metadata field InputFiles to a multi-valued set for each RAT PGE task.
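Roughly, the join-and-partition step described above could look like the sketch below. It is not the actual MimePartitioner code: the function name, the record keys `location` and `filename`, and the default chunk size of 100 (taken from the partition size mentioned earlier in this thread) are assumptions for illustration, and the Solr query itself is left out.

```python
import os


def build_input_files(records, chunk_size=100):
    """Yield lists of full paths, at most chunk_size paths per list.

    Each record is a dict with the hypothetical keys 'location' (directory)
    and 'filename'; the real Solr field names may differ.
    """
    # Join directory and file name into one full path per catalogued file.
    paths = [os.path.join(r["location"], r["filename"]) for r in records]
    # Split the paths into consecutive chunks, one chunk per RAT PGE task.
    for start in range(0, len(paths), chunk_size):
        yield paths[start:start + chunk_size]
```

Note that two files named test.html in different directories stay distinct at this stage, because each InputFiles entry is a full path.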
The RAT PGE task should write a script file in $DRAT_HOME/data/jobs/rat//sciPgeExeScript_RatCodeAudit. An example of which is:
[usc-secure-wireless-062-114:jobs/rat/1425608188401] mattmann% more sciPgeExeScript_RatCodeAudit
#!sh
export PATH=$HOME/bin/:${PATH}
shopt -s expand_aliases
alias rat="java -jar /Users/mattmann/drat/deploy/rat/lib/apache-rat-0.9.jar"
echo "Creating working dirs"
mkdir /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/input ; mkdir /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/output; mkdir /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/logs
echo "Staging input to /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/input"
cp -R `python -c "print ' '.join('/Users/mattmann/drat/src/pge/src/main/resources/bin/mime_partitioner/mime_rat_partitioner.py,/Users/mattmann/drat/src/pge/src/main/resources/bin/rat_aggregator/rat_aggregator.py,/Users/mattmann/drat/src/pge/target/classes/bin/mime_partitioner/mime_rat_partitioner.py,/Users/mattmann/drat/src/pge/target/classes/bin/rat_aggregator/rat_aggregator.py'.split(','))"` /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/input
echo "Running Apache RAT on /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/input"
rat /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/input > /Users/mattmann/drat/deploy/data/jobs/rat/1425608188401/output/rat_x-python_1425608188401.log
[usc-secure-wireless-062-114:jobs/rat/1425608188401] mattmann%
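There is the error, @lewismc: it's in the copy step of the RAT PGE job. Every input file is copied into a single flat input directory, so files that share a name overwrite one another and only one copy is left for RAT to audit. Looks like we need to check whether files have the same name and, if so, give them distinct names, maybe just by replacing '/' with '_' in the path. Let me think.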
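A minimal sketch of the '/' to '_' idea follows. It is not necessarily what was eventually implemented, and `staged_name`, `plan_staging`, and their parameters are hypothetical names. The idea is to derive each staged file name from the file's path relative to the source root, so that same-named files land on different names in the flat input directory.

```python
import os


def staged_name(path, root):
    """Return a flat file name that encodes path relative to root."""
    rel = os.path.relpath(path, root)
    return rel.replace(os.sep, "_")


def plan_staging(paths, root, input_dir):
    """Map each source path to its destination inside input_dir.

    Raises ValueError if two sources would still land on the same name.
    """
    plan = {}
    taken = set()
    for path in paths:
        dest = os.path.join(input_dir, staged_name(path, root))
        if dest in taken:
            raise ValueError("staged name collision: %s" % dest)
        taken.add(dest)
        plan[path] = dest
    return plan
```

With this, the two build.xml files above would be staged as tools_pdi_plugin_build_build.xml and tools_pdi_plugin_build.xml. One caveat: the mapping is not collision-free in general, since a/b_c.html and a_b/c.html both flatten to a_b_c.html, which is why the sketch raises an error instead of silently overwriting.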
from drat.
Fixed and tested in #34.
from drat.