
docker-aut's People

Contributors

ianmilligan1, ruebot, samfritz, sepastian


docker-aut's Issues

Build error

On OS X 10.11.3:

ianmilligan1@Ians-MBP:~/dropbox/git/warcbase_workshop_vagrant$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'ubuntu/trusty64' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
==> default: Loading metadata for box 'ubuntu/trusty64'
    default: URL: https://atlas.hashicorp.com/ubuntu/trusty64
==> default: Adding box 'ubuntu/trusty64' (v20160314.0.2) for provider: virtualbox
    default: Downloading: https://atlas.hashicorp.com/ubuntu/boxes/trusty64/versions/20160314.0.2/providers/virtualbox.box
==> default: Successfully added box 'ubuntu/trusty64' (v20160314.0.2) for 'virtualbox'!
==> default: Importing base box 'ubuntu/trusty64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'ubuntu/trusty64' is up to date...
==> default: Setting the name of the VM: Warcbase workshop VM
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 8080 (guest) => 9000 (host) (adapter 1)
    default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
The guest machine entered an invalid state while waiting for it
to boot. Valid states are 'starting, running'. The machine is in the
'poweroff' state. Please verify everything is configured
properly and try again.

If the provider you're using has a GUI that comes with it,
it is often helpful to open that and watch the machine, since the
GUI often has more helpful error messages than Vagrant can retrieve.
For example, if you're using VirtualBox, run `vagrant up` while the
VirtualBox GUI is open.

The primary issue for this error is that the provider you're using
is not properly configured. This is very rarely a Vagrant issue.

Will look into this.
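One thing worth trying while debugging, as the error message itself suggests: boot with the VirtualBox GUI visible. A minimal Vagrantfile provider block for that might look like the following (the memory value is a guess; an under-provisioned VM is one common cause of the machine dropping straight to the 'poweroff' state):

```ruby
Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    # Boot with the VirtualBox GUI visible so boot-time errors can be seen.
    vb.gui = true
    # Hypothetical value, not taken from this repo's Vagrantfile.
    vb.memory = 4096
  end
end
```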

Spark Notebook - 0.11.0

I'm working on creating a 0.11.0 version, and looking at the documentation we have, there are no Spark Notebook examples. It appears to be all Spark Shell. Should I remove Spark Notebook from the build process and the README instructions?

Add Azure Provider

Well, guess I should do this now. 😉

Update the Vagrantfile to support Azure provisioning, once we get up and running.
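A rough sketch of what the provider block might look like with the vagrant-azure plugin; the box name, environment-variable names, and VM size here are assumptions, not anything tested against this repo:

```ruby
Vagrant.configure("2") do |config|
  # Placeholder box used by the vagrant-azure plugin.
  config.vm.box = "azure"
  config.vm.provider :azure do |azure, override|
    # Credentials pulled from the environment (hypothetical variable names).
    azure.tenant_id       = ENV["AZURE_TENANT_ID"]
    azure.client_id       = ENV["AZURE_CLIENT_ID"]
    azure.client_secret   = ENV["AZURE_CLIENT_SECRET"]
    azure.subscription_id = ENV["AZURE_SUBSCRIPTION_ID"]
    azure.vm_size         = "Standard_D2_v2"
  end
end
```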

warcbase won't build

I've jumped through a lot of hoops trying to get warcbase to build as part of the vagrant build, and it just doesn't want to happen.

You can shell in (vagrant ssh) after the vagrant build and run cd /home/vagrant/project/warcbase && sudo mvn clean package appassembler:assemble -DskipTests, and it builds fine.

See: lintool/warcbase#206

unable to run docker image

Unable to get the Docker container running. It throws the following error:

docker run --rm -it aut
...
:: problems summary ::
:::: WARNINGS
		[NOT FOUND  ] com.thoughtworks.paranamer#paranamer;2.8!paranamer.jar(bundle) (0ms)

	==== local-m2-cache: tried

	  file:/root/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar

		::::::::::::::::::::::::::::::::::::::::::::::

		::              FAILED DOWNLOADS            ::

		:: ^ see resolution messages for details  ^ ::

		::::::::::::::::::::::::::::::::::::::::::::::

		:: com.thoughtworks.paranamer#paranamer;2.8!paranamer.jar(bundle)

		::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: com.thoughtworks.paranamer#paranamer;2.8!paranamer.jar(bundle)]
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1083)
	at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
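One possible workaround (an untested sketch, not a confirmed fix): pre-seed the jar into the image's local Maven repository during the Docker build, so the resolver finds it at the exact path it is probing (local-m2-cache above):

```dockerfile
# Hypothetical Dockerfile fragment: pre-fetch paranamer 2.8 from Maven
# Central into the local m2 repository the Ivy resolver checks first.
RUN mkdir -p /root/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8 \
 && curl -fL \
      -o /root/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar \
      https://repo1.maven.org/maven2/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar
```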

Mac OS: build fails with out-of-memory error

Describe the bug

On Mac OS, docker build -t aut . fails with java.lang.OutOfMemoryError: Java heap space.

On Linux, the build succeeds.

To Reproduce

On Mac OS, run docker build -t aut .

Expected behavior

Build the Docker image.

Screenshots

n/a

Desktop/Laptop (please complete the following information):

$ uname -a
Darwin C02F37HLML7H 21.3.0 Darwin Kernel Version 21.3.0: Wed Jan  5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_X86_64 x86_64

Smartphone (please complete the following information):

n/a

Additional context

See the log.txt file attached.
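Two things likely worth checking here (both are assumptions, not confirmed fixes): the memory limit of the Docker Desktop VM on Mac OS (Preferences > Resources), which caps what any JVM inside the build can allocate, and the Maven heap used during the build. The latter could be sketched as:

```dockerfile
# Hypothetical fragment: raise the Maven heap for the build step.
# The -Xmx value is a guess; tune it to the Docker Desktop VM limit.
ENV MAVEN_OPTS="-Xmx4g"
RUN cd /aut && mvn clean install
```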

Update to use 0.10.0 release

This also requires using a newer version of Spark Notebook, which uses a different way to load external libraries; the :cp command is no longer available.
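For reference, newer Spark Notebook releases load external libraries through the notebook's customDeps metadata rather than :cp. A sketch of what that might look like (the coordinate string is illustrative, and the exact metadata shape should be checked against the Spark Notebook docs):

```json
{
  "customDeps": [
    "org.warcbase % warcbase-core % 0.10.0"
  ]
}
```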

org.apache.hadoop#hadoop-core;0.20.2-cdh3u4: not found

:: problems summary ::
:::: WARNINGS
		module not found: org.apache.hadoop#hadoop-core;0.20.2-cdh3u4

	==== local-m2-cache: tried

	  file:/root/.m2/repository/org/apache/hadoop/hadoop-core/0.20.2-cdh3u4/hadoop-core-0.20.2-cdh3u4.pom

	  -- artifact org.apache.hadoop#hadoop-core;0.20.2-cdh3u4!hadoop-core.jar:

	  file:/root/.m2/repository/org/apache/hadoop/hadoop-core/0.20.2-cdh3u4/hadoop-core-0.20.2-cdh3u4.jar

	==== local-ivy-cache: tried

	  /root/.ivy2/local/org.apache.hadoop/hadoop-core/0.20.2-cdh3u4/ivys/ivy.xml

	  -- artifact org.apache.hadoop#hadoop-core;0.20.2-cdh3u4!hadoop-core.jar:

	  /root/.ivy2/local/org.apache.hadoop/hadoop-core/0.20.2-cdh3u4/jars/hadoop-core.jar

	==== central: tried

	  https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.20.2-cdh3u4/hadoop-core-0.20.2-cdh3u4.pom

	  -- artifact org.apache.hadoop#hadoop-core;0.20.2-cdh3u4!hadoop-core.jar:

	  https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.20.2-cdh3u4/hadoop-core-0.20.2-cdh3u4.jar

	==== spark-packages: tried

	  http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-core/0.20.2-cdh3u4/hadoop-core-0.20.2-cdh3u4.pom

	  -- artifact org.apache.hadoop#hadoop-core;0.20.2-cdh3u4!hadoop-core.jar:

	  http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-core/0.20.2-cdh3u4/hadoop-core-0.20.2-cdh3u4.jar

		::::::::::::::::::::::::::::::::::::::::::::::

		::          UNRESOLVED DEPENDENCIES         ::

		::::::::::::::::::::::::::::::::::::::::::::::

		:: org.apache.hadoop#hadoop-core;0.20.2-cdh3u4: not found

		::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hadoop#hadoop-core;0.20.2-cdh3u4: not found]
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1083)
	at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I'm working on a 0.11.0 Docker build, but ran into this. @ianmilligan1 @lintool are you fine with me cutting a 0.11.1 release that resolves the issue?

N.B. At this point I'd prefer to build the Docker image with --packages as opposed to --jars, because it surfaces a lot of dependency issues that I fear have remained hidden for a long time.
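One way to keep the --packages approach while sidestepping the dead Cloudera artifact (an assumption, not a verified fix for this image) is spark-shell's --exclude-packages flag. The package coordinate below is illustrative:

```shell
# Skip the unresolvable CDH dependency so --packages resolution can
# proceed against Maven Central. Echoed here as a sketch of the command.
PACKAGES="io.archivesunleashed:aut:0.11.1"   # illustrative coordinate
EXCLUDES="org.apache.hadoop:hadoop-core"
echo /spark/bin/spark-shell --packages "$PACKAGES" --exclude-packages "$EXCLUDES"
```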

aut build fails on master

Working on updating everything here, and I noticed aut is failing to build on the master branch during the Docker build process.

Here is the output of the error:

2017-12-07 23:14:13,556 [main-ScalaTest-running-CountableRDDTest] INFO  SparkUI - Stopped Spark web UI at http://172.17.0.2:4040
2017-12-07 23:14:13,558 [dispatcher-event-loop-2] INFO  MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
2017-12-07 23:14:13,562 [main-ScalaTest-running-CountableRDDTest] INFO  MemoryStore - MemoryStore cleared
2017-12-07 23:14:13,562 [main-ScalaTest-running-CountableRDDTest] INFO  BlockManager - BlockManager stopped
2017-12-07 23:14:13,564 [main-ScalaTest-running-CountableRDDTest] INFO  BlockManagerMaster - BlockManagerMaster stopped
2017-12-07 23:14:13,571 [dispatcher-event-loop-1] INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
2017-12-07 23:14:13,573 [main-ScalaTest-running-CountableRDDTest] INFO  SparkContext - Successfully stopped SparkContext
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.711 sec - in io.archivesunleashed.spark.rdd.CountableRDDTest
Running io.archivesunleashed.io.ArcRecordWritableTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.231 sec - in io.archivesunleashed.io.ArcRecordWritableTest
Running io.archivesunleashed.io.GenericArchiveRecordWritableTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.382 sec - in io.archivesunleashed.io.GenericArchiveRecordWritableTest
Running io.archivesunleashed.io.WarcRecordWritableTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.344 sec - in io.archivesunleashed.io.WarcRecordWritableTest
Running io.archivesunleashed.ingest.WacArcLoaderTest
2017-12-07 23:14:14,679 [main] INFO  WacArcLoaderTest - 300 records read!
2017-12-07 23:14:14,860 [main] INFO  WacArcLoaderTest - 300 records read!
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.322 sec - in io.archivesunleashed.ingest.WacArcLoaderTest
Running io.archivesunleashed.ingest.WacWarcLoaderTest
2017-12-07 23:14:15,246 [main] INFO  WacWarcLoaderTest - 822 records read!
2017-12-07 23:14:15,623 [main] INFO  WacWarcLoaderTest - 822 records read!
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.762 sec - in io.archivesunleashed.ingest.WacWarcLoaderTest
Running io.archivesunleashed.mapreduce.WacWarcInputFormatTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.244 sec - in io.archivesunleashed.mapreduce.WacWarcInputFormatTest
Running io.archivesunleashed.mapreduce.WacArcInputFormatTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec - in io.archivesunleashed.mapreduce.WacArcInputFormatTest
Running io.archivesunleashed.mapreduce.WacGenericInputFormatTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.351 sec - in io.archivesunleashed.mapreduce.WacGenericInputFormatTest
2017-12-07 23:14:16,340 [Thread-1] INFO  ShutdownHookManager - Shutdown hook called
2017-12-07 23:14:16,341 [Thread-1] INFO  ShutdownHookManager - Deleting directory /tmp/spark-40f43281-67db-4a4e-843c-8cbe042ff68e

Results :

Tests in error: 
  ExtractPopularImagesTest.run:32->org$scalatest$BeforeAndAfter$$super$run:32->FunSuite.org$scalatest$FunSuiteLike$$super$run:1560->FunSuite.runTests:1560->runTest:32->org$scalatest$BeforeAndAfter$$super$runTest:32->FunSuite.withFixture:1560->FunSuite.newAssertionFailedException:1560 ? TestFailed

Tests run: 75, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:02 min
[INFO] Finished at: 2017-12-07T23:14:16+00:00
[INFO] Final Memory: 70M/554M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project aut: There are test failures.
[ERROR] 
[ERROR] Please refer to /aut/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
The command '/bin/sh -c git clone https://github.com/archivesunleashed/aut.git /aut     && cd /aut && mvn clean install' returned a non-zero code: 1

Unable to run docker-aut:0.18.0

I'm unable to run the Docker container for version 0.18.0.
Running docker run --rm -it archivesunleashed/docker-aut:0.18.0 results in the following error:

		::::::::::::::::::::::::::::::::::::::::::::::

		::          UNRESOLVED DEPENDENCIES         ::

		::::::::::::::::::::::::::::::::::::::::::::::

		:: com.github.archivesunleashed.tika#tika-parsers;1.22: not found

		:: com.github.netarchivesuite#language-detector;language-detector-0.6a: not found

		::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.github.archivesunleashed.tika#tika-parsers;1.22: not found, unresolved dependency: com.github.netarchivesuite#language-detector;language-detector-0.6a: not found]
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1306)
	at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
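Those com.github.* coordinates look like JitPack artifacts; one possible fix (an assumption, not verified against this image) is pointing the resolver at the JitPack repository explicitly via spark-shell's --repositories flag:

```shell
# Hypothetical invocation: add JitPack as a resolver for the
# com.github.* artifacts. Package coordinate is illustrative.
docker run --rm -it archivesunleashed/docker-aut:0.18.0 \
  /spark/bin/spark-shell \
  --repositories https://jitpack.io \
  --packages "io.archivesunleashed:aut:0.18.0"
```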

Spark Notebook crashes when loading warcbase

The Spark Notebook works on http://127.0.0.1:9000/# as directed in the walkthrough, but when you load the fatjar the browser hangs. The terminal displays the following errors, and we can't continue.

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.net.URI.<init>(URI.java:588)
    at akka.actor.ActorPathExtractor$.unapply(Address.scala:154)
    at akka.remote.RemoteActorRefProvider.resolveActorRefWithLocalAddress(RemoteActorRefProvider.scala:347)
    at akka.remote.transport.AkkaPduProtobufCodec$.decodeMessage(AkkaPduCodec.scala:191)
    at akka.remote.EndpointReader.akka$remote$EndpointReader$$tryDecodeMessageAndAck(Endpoint.scala:993)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:926)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:411)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Uncaught error from thread [Remote-akka.remote.default-remote-dispatcher-7] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.jar.Attributes.read(Attributes.java:394)
    at java.util.jar.Manifest.read(Manifest.java:199)
    at java.util.jar.Manifest.<init>(Manifest.java:69)
    at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)
    at java.util.jar.JarFile.getManifest(JarFile.java:180)
    at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:944)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at scala.concurrent.Future$class.foreach(Future.scala:204)
    at scala.concurrent.impl.Promise$DefaultPromise.foreach(Promise.scala:153)
    at akka.remote.transport.netty.NettyTransport$.gracefulClose(NettyTransport.scala:222)
    at akka.remote.transport.netty.TcpAssociationHandle.disassociate(TcpSupport.scala:94)
    at akka.remote.transport.ProtocolStateActor$$anonfun$1.applyOrElse(AkkaProtocolTransport.scala:516)
    at akka.remote.transport.ProtocolStateActor$$anonfun$1.applyOrElse(AkkaProtocolTransport.scala:480)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at akka.actor.FSM$class.terminate(FSM.scala:672)
    at akka.actor.FSM$class.applyState(FSM.scala:617)
    at akka.remote.transport.ProtocolStateActor.applyState(AkkaProtocolTransport.scala:269)
    at akka.actor.FSM$class.processEvent(FSM.scala:609)
    at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:269)
    at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:598)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:592)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
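Since this is a GC-overhead-limit OutOfMemoryError, one thing to try (a guess, not a confirmed fix) is giving the Spark Notebook JVM a larger heap before loading the fatjar:

```shell
# Hypothetical: raise the notebook JVM heap before starting it.
# Whether Spark Notebook honors JAVA_OPTS should be checked in its docs.
export JAVA_OPTS="-Xmx4g"
./bin/spark-notebook
```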

Update dockerhub image to 0.90.5

The README.md references /aut/target/aut-0.90.5-SNAPSHOT-fatjar.jar:

docker run --rm -it \
  archivesunleashed/docker-aut \
  /spark/bin/pyspark \
  --py-files /aut/target/aut.zip \
  --jars /aut/target/aut-0.90.5-SNAPSHOT-fatjar.jar

but the Docker image on Docker Hub, archivesunleashed/docker-aut:latest, contains aut-0.90.3-SNAPSHOT-fatjar.jar:

$ docker pull archivesunleashed/docker-aut:latest
Using default tag: latest
latest: Pulling from archivesunleashed/docker-aut
Digest: sha256:cbaabbd3bf2783ec3af1956fefb44ce20e10b6c6321cd5c837dd52e3128a2012
Status: Downloaded newer image for archivesunleashed/docker-aut:latest
docker.io/archivesunleashed/docker-aut:latest
$ docker run --rm -it archivesunleashed/docker-aut:latest ls /aut/target
:
aut-0.90.3-SNAPSHOT-fatjar.jar
:

Push the most recent build of archivesunleashed/docker-aut to Docker Hub.
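The release steps would presumably be along these lines (tag name and registry login assumed, not taken from this repo's release process):

```shell
# Build and publish the current tree as the :latest image.
docker build -t archivesunleashed/docker-aut:latest .
docker push archivesunleashed/docker-aut:latest
```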
