Hive JDBC Driver

This project is an alternative to the JDBC driver that is bundled with the Apache Hive project. The desire to build it grew out of my experience maintaining the Hive JDBC "uber jar" project, which attempted to produce a smaller, more complete standalone driver jar by crafting an alternative Maven pom file. While that effort mostly succeeded in creating a slightly smaller jar, I felt that more could be done to improve the Hive JDBC experience.

As I started building out this project I realized that I wanted to deviate significantly from the existing Apache implementation. As a result, this project does not attempt to be URL-compatible or even feature-compatible with the existing Apache Driver. One obvious consequence is that existing JDBC connection strings/URLs that work with the Apache Driver WILL NOT WORK with this driver without modification. I've provided a mapping for existing URL properties, as well as plenty of examples.

Another significant deviation from the Apache implementation is the absence of Hadoop or Hive dependencies and their transitive dependency graphs. The only bridge to Hive in this driver is the Thrift Interface Description Language (IDL) file and the Java bindings it generates. All necessary code was rewritten from the ground up with an emphasis on eliminating external dependencies. This has the clear benefit of significantly reducing jar sizes and reducing opportunities for class conflicts! See size comparison below:

Note: the standalone jar for Hive 1.2.x does not contain all necessary dependencies, so its size is not an accurate representation of the real size.

Areas of Focus

The following are broad areas where I have attempted to expand or improve on the existing Hive Driver:

  • Jar Size - focused on creating a smaller, more portable jar
  • Dependency Graph - because JDBC drivers are often embedded in other applications it is important to limit the number of external dependencies that are shaded into the final jar. Shaded dependencies are often the source of size bloat and classloader conflicts. Every effort has been made to limit the number of external dependencies.
  • Logging - logging inside Hadoop dependencies and Hive is often a confusing mix of logging frameworks. This driver works to provide clearer logging through the Log4j 2 API.
  • JDBC Compatibility - it is doubtful that Hive will ever allow true JDBC specification compatibility... the underlying datastore simply doesn't yet (and may never) provide many of the required concepts. Having said that, there are plenty of methods and interfaces within the JDBC spec that have not been implemented by the Apache Driver that could have been. I've attempted to rectify that.
  • Documentation - the existing Hive documentation can be difficult to follow. For example, there doesn't seem to be a good single point of reference for all supported URL parameters. Instead the complete picture of options must be gleaned from a handful of examples and sources. This makes setting up connections difficult.
  • Simplification - the existing driver supports concepts like "embedded mode" which adds complexity to connection logic and requires server side dependencies. If you need "embedded mode", this driver is not for you.
  • External Configuration - in my experience you often need to add Java VM options (-Dsome_config) to get Hive JDBC working or to enable debugging. This is especially prevalent when dealing with Kerberos. This driver moves some of those common configuration flags to URL properties.
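To make the External Configuration point concrete, here is a minimal sketch of translating connection properties into the JVM system properties that would otherwise be passed as -D flags. The system property names (java.security.krb5.conf, sun.security.krb5.debug) are real JVM settings; the URL property names krb5.conf and krb5.debug, and the class and method names, are hypothetical and are not taken from this driver's actual property list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class ExternalConfigSketch {

    // Real JVM system properties that are commonly supplied as -D flags for Kerberos.
    static final String KRB5_CONF = "java.security.krb5.conf";
    static final String KRB5_DEBUG = "sun.security.krb5.debug";

    /**
     * Maps connection properties to JVM system properties.
     * The keys "krb5.conf" and "krb5.debug" are hypothetical names chosen for
     * illustration; consult the driver's own property mapping for the real ones.
     */
    static Map<String, String> toSystemProperties(Properties urlProps) {
        Map<String, String> out = new LinkedHashMap<>();
        String conf = urlProps.getProperty("krb5.conf");
        if (conf != null) {
            out.put(KRB5_CONF, conf);
        }
        String debug = urlProps.getProperty("krb5.debug");
        if (debug != null) {
            out.put(KRB5_DEBUG, debug);
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("krb5.conf", "/etc/krb5.conf");
        props.setProperty("krb5.debug", "true");

        // Equivalent to launching the JVM with
        // -Djava.security.krb5.conf=/etc/krb5.conf -Dsun.security.krb5.debug=true
        toSystemProperties(props).forEach(System::setProperty);

        System.out.println(System.getProperty(KRB5_CONF));
    }
}
```

With this kind of mapping, an application that embeds the driver would only need to change its connection properties rather than its JVM launch arguments.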

Current State

This project is pre-alpha and should be considered experimental at this point. Currently it is built against the Hortonworks repos, but it will soon be switched to follow the Apache released versions more closely.
