hdp-tez's Introduction

Apache Tez

Apache Tez is a generic data-processing pipeline engine, envisioned as a low-level engine for higher-level abstractions such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive.

At its heart, Tez is very simple and has just two components:

  • The data-processing pipeline engine, into which one can plug input, processing, and output implementations to perform arbitrary data processing. Every 'task' in Tez has the following:
      • Input to consume key/value pairs from.
      • Processor to process them.
      • Output to collect the processed key/value pairs.
  • A master for the data-processing application, with which one can assemble the arbitrary data-processing 'tasks' described above into a task-DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
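The task model described above can be illustrated with a minimal, self-contained sketch. The interfaces `Input`, `Processor`, and `Output`, the `Buffer` class, and `TaskDagSketch` are hypothetical names invented for this example; they are not the Tez API, and the "DAG" here is only a linear chain of two tasks run in a single process, with no YARN ApplicationMaster involved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interfaces for illustration only; these are NOT the Tez API.
interface Input {
    Iterable<Map.Entry<String, Integer>> pairs();
}

interface Output {
    void collect(String key, Integer value);
}

interface Processor {
    void run(Input in, Output out);
}

// In-memory buffer that acts as the Output of one task and the Input of the next.
class Buffer implements Input, Output {
    private final List<Map.Entry<String, Integer>> data = new ArrayList<>();

    @Override
    public void collect(String key, Integer value) {
        data.add(new SimpleEntry<>(key, value));
    }

    @Override
    public Iterable<Map.Entry<String, Integer>> pairs() {
        return data;
    }
}

class TaskDagSketch {
    // Runs a linear chain of tasks: each task's output becomes the next task's input.
    static Buffer runChain(Buffer source, List<Processor> processors) {
        Buffer current = source;
        for (Processor p : processors) {
            Buffer next = new Buffer();
            p.run(current, next);
            current = next;
        }
        return current;
    }

    // Counts words with two chained tasks and returns the totals in key order.
    static Map<String, Integer> wordCount(List<String> words) {
        Buffer source = new Buffer();
        for (String w : words) {
            source.collect(w, 1);
        }
        // Task 1: normalise every key to lower case.
        Processor normalise = (in, out) -> {
            for (Map.Entry<String, Integer> e : in.pairs()) {
                out.collect(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
            }
        };
        // Task 2: sum the values for each key.
        Processor sum = (in, out) -> {
            Map<String, Integer> totals = new TreeMap<>();
            for (Map.Entry<String, Integer> e : in.pairs()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
            totals.forEach(out::collect);
        };
        Buffer result = runChain(source, Arrays.asList(normalise, sum));
        Map<String, Integer> asMap = new TreeMap<>();
        for (Map.Entry<String, Integer> e : result.pairs()) {
            asMap.put(e.getKey(), e.getValue());
        }
        return asMap;
    }

    public static void main(String[] args) {
        // prints {tez=2, yarn=1}
        System.out.println(wordCount(Arrays.asList("Tez", "tez", "YARN")));
    }
}
```

Each `Processor` sees only an `Input` and an `Output`, so the same processing logic can be rewired into a different chain without changes. That separation is the point of the sketch; scheduling, fault tolerance, and data movement between machines are left out entirely.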

hdp-tez's People

Contributors

afsanjar, yussufsh

hdp-tez's Issues

Hadoop version value is hardcoded in Tez pom.xml

The pom.xml in Tez hardcodes the hadoop.version value as 2.7.1-SNAPSHOT:

<hadoop.version>2.7.1-SNAPSHOT</hadoop.version>

On ppc, this causes components that use Tez as a dependency to fail, because the hadoop.version value cannot be set to the ppc version of Hadoop.

For example, the Oozie build fails for this reason while resolving the Hadoop dependency pulled in through Tez:
[ERROR] Failed to execute goal on project oozie-sharelib-pig: Could not resolve dependencies for project org.apache.oozie:oozie-sharelib-pig:jar:4.2.0: Failed to collect dependencies at org.apache.tez:tez-mapreduce:jar:0.7.1-SNAPSHOT -> org.apache.tez:tez-api:jar:0.7.1-SNAPSHOT -> org.apache.hadoop:hadoop-aws:jar:2.7.1-SNAPSHOT: Failed to read artifact descriptor for org.apache.hadoop:hadoop-aws:jar:2.7.1-SNAPSHOT: Could not transfer artifact org.apache.hadoop:hadoop-aws:pom:2.7.1-SNAPSHOT from/to apache.snapshots.repo (https://repository.apache.org/content/groups/snapshots)

Tez-dag: TestVertexImpl fails with random test cases intermittently

The following test cases fail intermittently on Power, in no fixed order.

TestVertexImpl.testVertexWithOneToOneSplitWhileRunning:3842->initAllVertices:2401 expected:<INITED> but was:<NEW>
TestVertexImpl.testVertexTaskAttemptOutputFailure:3322 expected:<OUTPUT_WRITE_ERROR> but was:<UNKNOWN_ERROR>
TestVertexImpl.testVertexVMErrorReport:3922->initAllVertices:2401 expected: but was:
TestVertexImpl.testInputInitializerEventsMultipleSources:4190->startVertex:2413->startVertex:2426 expected: but was:
TestVertexImpl.testInputInitializerEventsAtNew:4393->startVertex:2413->startVertex:2431 expected: but was:
TestVertexImpl.testVertexWithOneToOneSplit:3780 expected: but was:

**Failure details:**
testVertexWithOneToOneSplitWhileRunning(org.apache.tez.dag.app.dag.impl.TestVertexImpl)  Time elapsed: 0.05 sec  <<< FAILURE!
java.lang.AssertionError: expected:<INITED> but was:<NEW>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.tez.dag.app.dag.impl.TestVertexImpl.initAllVertices(TestVertexImpl.java:2401)
        at org.apache.tez.dag.app.dag.impl.TestVertexImpl.testVertexWithOneToOneSplitWhileRunning(TestVertexImpl.java:3842)

testVertexTaskAttemptOutputFailure(org.apache.tez.dag.app.dag.impl.TestVertexImpl)  Time elapsed: 0.043 sec  <<< FAILURE!
java.lang.AssertionError: expected:<OUTPUT_WRITE_ERROR> but was:<UNKNOWN_ERROR>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.tez.dag.app.dag.impl.TestVertexImpl.testVertexTaskAttemptOutputFailure(TestVertexImpl.java:3322)
