openpreserve / hawarp
HAdoop-based Web Archive Record Processing
License: Apache License 2.0
The tools must support reading uncompressed ARC files. JWAT can already detect whether a file is compressed, so this should be easy to implement.
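A minimal sketch of the idea, using plain java.io streams instead of JWAT's own detection API (its exact methods are not shown here, so they are not assumed). The helper peeks at the first two bytes, pushes them back, and wraps the stream in a GZIPInputStream only when they match the gzip magic number 0x1f 0x8b. `ArcInputOpener` is a hypothetical name, not part of hawarp.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical helper: returns a stream of uncompressed ARC bytes whether the
 * input is gzip-compressed (.arc.gz) or plain (.arc).
 * Detection uses the gzip magic number, not JWAT's own detection API.
 */
public final class ArcInputOpener {

    private static final int GZIP_MAGIC_1 = 0x1f;
    private static final int GZIP_MAGIC_2 = 0x8b;

    private ArcInputOpener() {}

    /** Peeks at up to two bytes, pushes them back, and reports whether they are the gzip magic number. */
    static boolean isGzipped(PushbackInputStream in) throws IOException {
        byte[] header = new byte[2];
        int n = 0;
        while (n < header.length) {
            int b = in.read();
            if (b < 0) {
                break; // stream is shorter than the magic number
            }
            header[n++] = (byte) b;
        }
        if (n > 0) {
            in.unread(header, 0, n); // restore the peeked bytes for the real reader
        }
        return n == 2
                && (header[0] & 0xff) == GZIP_MAGIC_1
                && (header[1] & 0xff) == GZIP_MAGIC_2;
    }

    /** Wraps raw input so callers always read uncompressed ARC bytes. */
    public static InputStream open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(new BufferedInputStream(raw), 2);
        return isGzipped(in) ? new GZIPInputStream(in) : in;
    }
}
```

Pushing the peeked bytes back, rather than reopening the file, keeps this usable on streams that cannot be rewound. A record-level reader (JWAT's included) may prefer to keep gzip member boundaries instead of decompressing into one flat stream; this sketch only illustrates the detection step.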
Not all of the command-line applications support processing a local directory. This is needed to make it easy to compare local and cluster performance.
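A minimal sketch of the local-directory side, assuming a hypothetical `LocalInputCollector` helper and an extension convention (.arc, .arc.gz, .warc, .warc.gz) that the actual tools may not use. It gathers the archive files under a directory tree so the same per-file logic could run in a single JVM and be timed against a cluster run.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: collect ARC/WARC files from a local directory tree. */
public final class LocalInputCollector {

    private LocalInputCollector() {}

    /** Assumed naming convention for archive files; adjust to what the tools actually accept. */
    static boolean isArchiveFile(String name) {
        String n = name.toLowerCase(Locale.ROOT);
        return n.endsWith(".arc") || n.endsWith(".arc.gz")
                || n.endsWith(".warc") || n.endsWith(".warc.gz");
    }

    /** Returns all archive files below root, sorted for a deterministic processing order. */
    public static List<File> collect(File root) {
        List<File> result = new ArrayList<>();
        walk(root, result);
        Collections.sort(result);
        return result;
    }

    /** Recursive depth-first walk; unreadable directories are skipped. */
    private static void walk(File f, List<File> out) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return; // I/O error or permission problem
            }
            for (File c : children) {
                walk(c, out);
            }
        } else if (f.isFile() && isArchiveFile(f.getName())) {
            out.add(f);
        }
    }
}
```

Sorting the list gives the same processing order on every run, which makes repeated local timings easier to compare.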
After a clean build, the jar files cannot be run.
First
mvn clean install
mvn assembly:single
then
$ hadoop jar ./target/tomar-prepare-inputdata-1.0-jar-with-dependencies.jar
Exception in thread "main" java.lang.ClassNotFoundException: eu.scape_project.up2ti.Unpack2TempIdentify
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:340)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
and
$ java -jar ./target/tomar-prepare-inputdata-1.0-jar-with-dependencies.jar
Error: Could not find or load main class eu.scape_project.up2ti.Unpack2TempIdentify
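Both errors point at the same thing: `hadoop jar` (through `RunJar`, as the stack trace shows) and `java -jar` each try to load the Main-Class named in the jar's manifest, eu.scape_project.up2ti.Unpack2TempIdentify, and cannot find it. One way to confirm whether that class is actually packaged is to compare the manifest entry with the jar contents. `MainClassCheck` is a hypothetical diagnostic built only on java.util.jar; it is not part of hawarp.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical diagnostic: does the Main-Class named in a jar's manifest exist in that jar? */
public final class MainClassCheck {

    private MainClassCheck() {}

    /** Returns the Main-Class attribute, or null if there is no manifest or no Main-Class. */
    public static String mainClassOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** True if the jar declares a Main-Class and contains the matching .class entry. */
    public static boolean mainClassPresent(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return false;
            }
            String main = mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            if (main == null) {
                return false;
            }
            // eu.example.Foo -> eu/example/Foo.class
            String entry = main.trim().replace('.', '/') + ".class";
            return jf.getEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args[0]);
        System.out.println("Main-Class: " + mainClassOf(jar));
        System.out.println(mainClassPresent(jar) ? "class found in jar" : "class NOT found in jar");
    }
}
```

Run against the assembled jar, e.g. `java MainClassCheck ./target/tomar-prepare-inputdata-1.0-jar-with-dependencies.jar`. If it reports the class as missing, a natural next step is to check which main class the assembly configuration declares for this module.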
I am trying to build 45cc6bc on Ubuntu 12.04 64-Bit with Oracle JDK 1.7.0_55 and Hadoop 2.0.0-cdh4.6.0, but I get a failed assertion from ArcMigratorTest:
Running eu.scape_project.arc2warc.ArcMigratorTest
INFO ArcMigratorTest:67 - Temporary directory: /tmp/1398245267321-0
INFO ArcMigrator:92 - File processed: /tmp/1398245267321-0/1398245267330jYXFharc.gz
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.328 sec <<< FAILURE!
Results:
Failed tests: testWarcCreator(eu.scape_project.arc2warc.ArcMigratorTest): expected:<[287]> but was:<[490]>