gis-tools-for-hadoop's Issues

Geodesic Length/Area calculations

I wanted to confirm the availability and functionality of the geodesicArea function in the Geometry Engine library. It seems to point to a planar 2D function, which is throwing off our calculations.

#50

This older issue suggests that the functionality is missing from the library, and I didn't see anything that followed up with an update.

The Java API documentation indicates it is available:
https://developers.arcgis.com/java/api-reference/index.html?com/esri/core/geometry/GeometryEngine.html
static double geodesicArea(Geometry geometry, SpatialReference spatialReference, AreaUnit areaUnit)

We also tested the geodesic line functionality, and it did not produce results that matched our reference function.

trip-discovery sample not working - geometry library not loaded

Yes, I run the sample using run-it.sh.
I edited the file TripCellDriver.java as below:

if (args.length != 5) {
    System.out.println("Invalid Arguments");
    print_usage();
    // throw new IllegalArgumentException();
}
System.out.println("Start Arguments");
int size = args.length;
for (int i = 0; i < size; i++) {
    System.out.println(String.valueOf(i) + " * " + args[i]);
}

Output on the terminal is:

Start Arguments
0 * TripCellDriver
1 * -libjars
2 * ../../lib/esri-geometry-api.jar,../../lib/spatial-sdk-hadoop.jar
3 * 15
4 * 1000
5 * /user/cloudera/trip/data/sample-study-area.json
6 * /user/cloudera/trip/data/sample-vehicle-positions.csv
7 * /user/cloudera/trip/inter
End Arguments

It was giving an error that the arguments were in the wrong place, so I adjusted the arguments accordingly and got the exception below:

Error: java.lang.ClassNotFoundException: com.esri.core.geometry.SpatialReference
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at TripCellReducer.setup(TripCellReducer.java:138)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

trip-discovery build.xml fails to compile

I am seeing an error while trying to compile the trip-discovery jar. Am I doing something wrong?

/usr/local/ant/bin/ant
Buildfile: /Users/harshil/EDH/gis-tools-for-hadoop/samples/trip-discovery/build.xml

BUILD FAILED
/Users/harshil/EDH/gis-tools-for-hadoop/samples/trip-discovery/build.xml:5: Problem: failed to create task or type antlib:org.apache.maven.artifact.ant:dependencies
Cause: The name is undefined.
Action: Check the spelling.
Action: Check that any custom tasks/types have been declared.
Action: Check that any <presetdef>/<macrodef> declarations have taken place.
No types or tasks have been defined in this namespace yet

This appears to be an antlib declaration.
Action: Check that the implementing library exists in one of:
        -/usr/local/ant/lib
        -/Users/harshil/.ant/lib
        -a directory added on the command line with the -lib argument


Total time: 0 seconds

Adding the following lines to build.xml fixes it:

<path id="maven-ant-tasks.classpath" path="../lib/maven-ant-tasks-2.1.3.jar" />
  <typedef resource="org/apache/maven/artifact/ant/antlib.xml"
          uri="antlib:org.apache.maven.artifact.ant"
          classpathref="maven-ant-tasks.classpath" />

[edited 02/02 to display the output and XML correctly -- Randall]

BinaryTypeInfo Error

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. binaryTypeInfo

I receive the above error while running the following code:

CREATE TABLE mtkellr2.counties (Area string, Perimeter string, State string, County string, Name string, BoundaryShape binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

I am using Hive 0.8.1

Need GP tool for executing mapreduce jobs using HCatalog

Justification

  • Oozie has far too complicated a setup for simply executing a jar file
  • HCatalog is now part of Hive, which comes standard with most Hadoop distributions
  • Simple REST interface
% curl -s -d user.name=ctdean \
   -d jar=wordcount.jar \
   -d class=org.myorg.WordCount \
   -d libjars=transform.jar \
    -d arg=wordcount/input \
    -d arg=wordcount/output \
    'http://localhost:50111/templeton/v1/mapreduce/jar'

See https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar.

Creating Counties Table

Hello,

When creating the table for the California counties:

CREATE TABLE counties (Area string, Perimeter string, State string, County string, Name string, BoundaryShape binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

It notes that a column is called BoundaryShape, but when I convert the JSON file to a feature class it does not have that column, so I am trying to figure out where this information comes from to find the boundaries for each of the polygons.

I am trying to do a similar project with the US states and am trying to figure out what the equivalent of the BoundaryShape field would be.
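A hedged note on where that column comes from: the Esri JsonSerde maps each feature's geometry attribute to whichever binary column the DDL declares, so the name BoundaryShape itself is arbitrary. A minimal sketch for a US states table (table and column names here are illustrative, not from the sample):

CREATE EXTERNAL TABLE states (State string, Name string, StateShape binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- the geometry of each Enclosed JSON feature lands in the binary column (StateShape here)
SELECT Name FROM states WHERE ST_Contains(StateShape, ST_Point(-122.42, 37.78));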

Error when trying to retrieve date values

Hi
I am finding difficulty when it comes to retrieving date values in Hive.
My query is

create external table test1(DISPLAYSCALE int, CREATED_DATE date, LAST_EDITED_DATE date)
> ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
> STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

When I try to use select * from test1 limit 5, I get this error:
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to org.apache.hadoop.io.Text

As per the JSON, the data type for CREATED_DATE and LAST_EDITED_DATE is esriFieldTypeDate, and the values are in a format like 2013-11-20 09:39:25.000001.
So I used the date data type while creating the table, copied the data to HDFS using the unenclosed JSON, and used a select * query to retrieve the columns, but I get the above error.
To get the values, we create the same table with the string data type instead of date, and then we are able to get the values.

Can you suggest a solution for this problem? This may seem silly, but I am pretty new to programming.

Thanks
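A sketch of that string-based workaround with the conversion done at query time (the table name test1_str is hypothetical; CAST to timestamp is a standard Hive built-in and fits the yyyy-MM-dd HH:mm:ss.ffffff values shown above):

create external table test1_str (DISPLAYSCALE int, CREATED_DATE string, LAST_EDITED_DATE string)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- convert when reading; malformed values become NULL instead of failing the fetch
select DISPLAYSCALE,
       cast(CREATED_DATE as timestamp),
       cast(LAST_EDITED_DATE as timestamp)
from test1_str limit 5;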

hadoop fs -mkdir as root user

Hi,

I am trying to get started on big data and am following the sample.

I tried hadoop fs -mkdir earthquake-demo to no avail as the root user. I then switched (su) to the hdfs user and managed to create the folder under /user/hdfs/earthquake-demo.

Then comes the problem of trying to put a root-owned file (a local copy of the counties data) into HDFS.

How should I start?

Connect to Cloudera VM (Linux) from ArcMap

I have a Cloudera VM (Linux) installed locally and ArcMap installed on my machine. Can we connect to the Hadoop cluster from ArcMap?
In other words, ArcMap is installed on the host machine and the Cloudera VM (Linux) is installed as the guest.

Issue with Aggregation Sample

Was looking for a little bit of help here: I am working on the Taxi Demo aggregation sample using the Hortonworks Sandbox on Ambari and have been able to get up to step 8, even though step 7 didn't work as quickly as expected. I was originally using the command line to clone the GIS tools for later use, but have now switched to the GUI interface on Ambari to load data and run SQL queries through the Hive editor. If I navigate to local files I can see the Esri tools for Hadoop directory, as the tutorial suggests, but the data that I loaded only worked in step 7 after I put it up on HDFS. Anyway, I am running into the following error when just trying to create the temporary function: create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin'; The error comes back with the following: H110 Unable to submit statement. Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask [ERROR_STATUS].

I don't really understand the architecture of local vs. HDFS since this is all new to me, but I would like to finish the tutorial. Any ideas?

Thanks in advance!
Jason
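A hedged note appended here: this FunctionTask failure is commonly seen when the spatial jars were never added to the Hive session, so adding them before registering the function is worth trying. A sketch (the jar paths are assumptions about where the repository was cloned):

add jar /usr/local/gis-tools-for-hadoop/samples/lib/esri-geometry-api.jar;
add jar /usr/local/gis-tools-for-hadoop/samples/lib/spatial-sdk-hadoop.jar;
create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin';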

Hadoop GIS stores geometry incorrectly in its binary form

When applied to a table with N records, the call ST_GeomFromGeoJSON generates the first geometry repeated N times. ST_GeomFromText has exactly the same behaviour.
In my tables I ended up storing all the geometry fields as GeoJSON and instantiating geometric objects on the fly whenever I needed them. If this is a bug, perhaps Esri could fix it, or just remove the functions they don't support from the dataset?

There is a detailed explanation of this problem on Stack Overflow:

How to Load Spatial Data using the Hadoop GIS framework
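A sketch of the reporter's workaround: rather than materializing binary geometries into a table (where the repetition shows up), keep the geometry as a plain GeoJSON string column and construct the geometry per row at query time (table and column names here are hypothetical):

-- geom_geojson is a string column holding the raw GeoJSON
select id, ST_Area(ST_GeomFromGeoJSON(geom_geojson))
from parcels;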

Error while creating table

For the example below:
https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive

I am getting this error while using Hive 0.13.1, but it works in Hive 0.12.
CREATE EXTERNAL TABLE IF NOT EXISTS counties (Area string, Perimeter string, State string, County string, Name string, BoundaryShape binary)
> ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
> STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. binaryTypeInfo

I have read that Hive 0.13.1 is much faster and takes up less memory compared to Hive 0.12.

Can't get point-in-polygon-aggregation-hive to run in HW 0.11.0

Running the run-sample query in our HW 0.11.0 cluster fails with the error shown below.

Running this query works fine.

SELECT counties.name
from counties
where ST_Contains(boundaryshape, ST_Point(-122.419678,37.781569));

I can use the JSON boundaries in any query that does not include a join. Any ideas?

Error report

Execution log at: /tmp/mxe5138/.log
java.lang.ClassNotFoundException: com.esri.hadoop.hive.serde.JsonSerde
Continuing ...
java.lang.ClassNotFoundException: com.esri.json.hadoop.EnclosedJsonInputFormat
Continuing ...
2014-01-06 01:08:53 Starting to launch local task to process map join; maximum memory = 4261937152
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRowInspectorFromTable(FetchOperator.java:230)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:598)
at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:406)
at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:290)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

    at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:634)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:406)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:290)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID:
Stage-4
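A hedged note: the ClassNotFoundException lines and the NullPointerException come from a local map-join child task that cannot see the session jars. Two standard Hive session-level things are worth trying (the jar paths are assumptions):

add jar /usr/lib/hive/lib/esri-geometry-api.jar;
add jar /usr/lib/hive/lib/spatial-sdk-hadoop.jar;
-- or disable the local map-join conversion that spawns the failing child JVM
set hive.auto.convert.join=false;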


Copy to HDFS not working

Here is the error detail:

(screenshot)

I have Windows 7 as the host machine and a Cloudera VM (Linux) as the guest. ArcMap is installed on the host machine. Please find the screenshot below, taken while accessing http://192.168.56.101:50070/ through a browser on the same machine as ArcMap.

(screenshot: https://cloud.githubusercontent.com/assets/3725274/4897881/9f46552a-640a-11e4-9dfb-066a77ff5b32.jpg)

Error with ST_LineString when running query below

In the Hadoop YARN log for a container I am seeing these errors:

2016-07-12 20:10:55,516 [ERROR] [TezChild] |hive.ST_LineString|: Internal error - ST_LineString: java.lang.NullPointerException.
2016-07-12 20:10:55,517 [ERROR] [TezChild] |hive.ST_SetSRID|: Invalid arguments - one or more arguments are null.
2016-07-12 20:10:55,517 [ERROR] [TezChild] |hive.ST_GeodesicLengthWGS84|: Invalid arguments - one or more arguments are null.

The query I'm running is:

select
PreQuery.name,
sum(case when PreQuery.Geode < 10.0 then 1 else 0 end) 10mCount,
sum(case when PreQuery.Geode < 50.0 then 1 else 0 end) 50mCount,
sum(case when PreQuery.Geode < 1000.0 then 1 else 0 end) 100mCount
from
( select
a.name,
ST_GeodesicLengthWGS84( ST_SetSRID( ST_LineString(a.lat, a.lon, b.lat, b.lon),4326)) as Geode
from a, b) PreQuery
GROUP BY
PreQuery.name
ORDER by
1000mCount desc

When I run this on a few thousand records it works fine, but when I run it on over 54k I see these problems.

Any ideas why?
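A hedged guess given those log lines: at 54k+ rows, some lat/lon values are probably NULL, so ST_LineString returns NULL and ST_SetSRID / ST_GeodesicLengthWGS84 then complain about null arguments. A null-guarded version of the inner query (a sketch over the same tables):

select
  a.name,
  ST_GeodesicLengthWGS84(ST_SetSRID(ST_LineString(a.lat, a.lon, b.lat, b.lon), 4326)) as Geode
from a, b
where a.lat is not null and a.lon is not null
  and b.lat is not null and b.lon is not null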

Error for curve geometry while querying in hive

Hi.
Thank you for your support so far. We often run into issues, since we are just starting to transfer huge datasets, like pan-India buildings and roads, into HDFS.

I mentioned buildings and roads because there are curve geometries present in both datasets, such as curved rings and curved paths.

I understand that the current geometry API doesn't support complex geometry and deals only with simple point, line, and polygon geometry.

Our organisation right now is in the pilot stage of transferring chunks of datasets into HDFS and most of the datasets have features that deal with complex geometry.

  1. Will there be a future release that will support such complex geometry?
  2. Or can you suggest an alternative route we can take, because we are stuck on this issue.

I have attached a screenshot with the error

select count(*) from plannedroute;

(screenshot)

Thank you for your help so far; I would highly appreciate it if someone could get us past this issue or suggest a workaround.
Thanks

Copy from HDFS not working

Here are the error details:
(screenshot)

I have Windows 7 as the host machine and a Cloudera VM (Linux) as the guest. ArcMap is installed on the host machine. Please find the screenshot below, taken while accessing
http://quickstart.cloudera:50070/dfshealth.html#tab-overview (192.168.126.128:50070/) through a browser on the same machine as ArcMap.
(screenshot)

Also tried solving the problem with the help of "Copy to HDFS not working #16", but I am still facing the same problem.
(screenshot)

Please point me to right direction.

Error while querying integer fields in Hive for data transferred using the Hadoop toolbox

I have created a table in Hive for the data transferred using the GIS toolbox, and I get the error shown below in Hive for int data types.

hive> describe bhuj2;
OK
bldgsubroad string from deserializer
sublocality string from deserializer
cityname string from deserializer
statename string from deserializer
bldgsize string from deserializer
tag string from deserializer
pincode int from deserializer
numberofflats int from deserializer
numberofshops int from deserializer
bldg_type string from deserializer
cableoperatorname string from deserializer
Time taken: 0.061 seconds, Fetched: 11 row(s)
hive> select pincode,numberofflats from bhuj2 limit 10;
OK
org.codehaus.jackson.JsonParseException: Current token (VALUE_NULL) not numeric, can not use numeric value accessors
at [Source: java.io.StringReader@7da96006; line: 1, column: 325]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1432)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserBase._parseNumericValue(JsonParserBase.java:763)
at org.codehaus.jackson.impl.JsonParserBase.getIntValue(JsonParserBase.java:618)
at com.esri.hadoop.hive.serde.JsonSerde.setRowFieldFromParser(Unknown Source)
at com.esri.hadoop.hive.serde.JsonSerde.deserialize(Unknown Source)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:647)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
370427 0
370427 0
370427 0
370427 0
370427 0
370001 0
370001 0
370001 0
370427 0
370427 0
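A hedged workaround, mirroring the date issue earlier on this page: the JsonParseException (Current token VALUE_NULL) suggests some records carry JSON nulls in the int fields, so declaring those fields as string in the DDL and casting at query time may sidestep the numeric accessor. A sketch (bhuj2_str is a hypothetical string-typed variant of the table):

select cast(pincode as int),
       cast(numberofflats as int),
       cast(numberofshops as int)
from bhuj2_str limit 10;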

Tutorial not working

Hi guys,

I am trying to run your tutorials, but I don't know what's wrong. First off, I am new, a real beginner with things like Hadoop.

I used this, exactly as written there: https://github.com/Esri/gis-tools-for-hadoop/wiki/GIS-Tools-for-Hadoop-for-Beginners

Now the samples... I tried https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive, but when I type the last query (select, join, group, order...) I get this:

Warning: Map Join MAPJOIN[18][bigTable=earthquakes] in task 'Stage-2:MAPRED' is a cross product
15/05/03 00:48:43 WARN conf.Configuration: file:/tmp/root/hive_2015-05-03_00-48-28_169_4942355857717156539-1/-local-10008/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/05/03 00:48:43 WARN conf.Configuration: file:/tmp/root/hive_2015-05-03_00-48-28_169_4942355857717156539-1/-local-10008/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.

so I don't have any results...

Another sample example is Feature Class to HDFS (https://github.com/Esri/gis-tools-for-hadoop/wiki/Getting-a-Feature-Class-into-HDFS). I am at steps 6-7. I should run that DROP TABLE in Cygwin, right? I did, and the result was OK. Then I wrote describe formatted input_ex and nothing happens. What's wrong with that? I am going step by step like a child. It could be a problem between keyboard and chair (me), but I am doing everything based on your tutorial.

EDIT: Feature Class to HDFS is working now. I had just forgotten the ; at the end of describe formatted input_ex. I didn't know it should be there; this ; is missing in the tutorial.
Thanks for the advice :)

Error while converting json to features for ST_Buffer

Hi,
My objective is to create a buffer of 1000 m for enode. I am able to obtain JSON using the Copy From HDFS tool, but I am unable to create a feature class using the JSON To Features tool.
I used the following query:

> CREATE external table enodebuffer (shape binary);
> INSERT OVERWRITE TABLE enodebuffer
> SELECT ST_Buffer(enodeaggregate.shape, 100) from enodeaggregate;

I am able to obtain the count and retrieve the columns in Hive. Copy From HDFS to obtain a JSON file also works successfully.

When I convert the JSON to features using the GitHub tool, it fails with "Failed to execute json to features", and I get the error description shown in the image below.
(screenshot)

I tried the same query creating a table with multiple columns alongside the shape column, but I get the same error as shown above.

Any info would be greatly appreciated

Thanks

Multipolygon JSON import

I got a JSON file with country boundaries as polygons. Some countries have just one shape (polygon) and other countries have more, like this:

{"type":"Feature","id":"AZE","properties":{"name":"Azerbaijan"},"geometry":{"type":"MultiPolygon","coordinates":[[[[45.001987,39.740004],[45.298145,39.471751],[45.739978,39.473999],[45.735379,39.319719],[46.143623,38.741201],[45.457722,38.874139],[44.952688,39.335765],[44.79399,39.713003],[45.001987,39.740004]]],[[[47.373315,41.219732],[47.815666,41.151416],[47.987283,41.405819],[48.584353,41.80887],[49.110264,41.282287],[49.618915,40.572924],[50.08483,40.526157],[50.392821,40.256561],[49.569202,40.176101],[49.395259,39.399482],[49.223228,39.049219],[48.856532,38.815486],[48.883249,38.320245],[48.634375,38.270378],[48.010744,38.794015],[48.355529,39.288765],[48.060095,39.582235],[47.685079,39.508364],[46.50572,38.770605],[46.483499,39.464155],[46.034534,39.628021],[45.610012,39.899994],[45.891907,40.218476],[45.359175,40.561504],[45.560351,40.81229],[45.179496,40.985354],[44.97248,41.248129],[45.217426,41.411452],[45.962601,41.123873],[46.501637,41.064445],[46.637908,41.181673],[46.145432,41.722802],[46.404951,41.860675],[46.686071,41.827137],[47.373315,41.219732]]]]}},
{"type":"Feature","id":"BDI","properties":{"name":"Burundi"},"geometry":{"type":"Polygon","coordinates":[[[29.339998,-4.499983],[29.276384,-3.293907],[29.024926,-2.839258],[29.632176,-2.917858],[29.938359,-2.348487],[30.469696,-2.413858],[30.527677,-2.807632],[30.743013,-3.034285],[30.752263,-3.35933],[30.50556,-3.568567],[30.116333,-4.090138],[29.753512,-4.452389],[29.339998,-4.499983]]]}},

When creating this table:

CREATE EXTERNAL TABLE boundaries (Id STRING, Name STRING, BoundaryShape binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/esri/Test/Tables/boundaries';

I get the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.esri.hadoop.hive.serde.JsonSerde.

I want to create a table with Id STRING, Name STRING, BoundaryShape BINARY (which should contain just one polygon) fields. How can I load multipolygon data into this BoundaryShape field?

Thanks in advance.
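One hedged observation: the sample lines above are GeoJSON features, not Esri JSON, so the GeoJSON SerDe and input format from the same framework may be the better fit; also, "Cannot validate serde" usually means the SerDe jar was not added to the session. A sketch combining both:

add jar /usr/lib/hive/lib/spatial-sdk-hadoop.jar;  -- path is an assumption

CREATE EXTERNAL TABLE boundaries (Id STRING, Name STRING, BoundaryShape binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.GeoJsonSerDe'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedGeoJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/esri/Test/Tables/boundaries';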

Error while creating a JSON file using the Hadoop toolbox

I receive the error shown below when I try to create a JSON file for a feature class.

Executing: FeaturesToJSON RILGDB.Building D:\hadoop\rilgdbbuilding11.json UNENCLOSED_JSON NOT_FORMATTED
Start Time: Tue May 10 15:27:29 2016
Running script FeaturesToJSON...

Traceback (most recent call last):
File "", line 365, in execute
File "D:\GIS tools for hadoop\geoprocessing-tools-for-hadoop-master\geoprocessing-tools-for-hadoop-master\JSONUtil.py", line 413, in ConvertFC2JSONUnenclosed
attributes_json[field_list[i]] = (attr if type(attr) != datetime.datetime else unicode(attr))
RuntimeError

Failed to execute (FeaturesToJSON).
Failed at Tue May 10 15:28:45 2016 (Elapsed Time: 1 minutes 16 seconds)

Hadoop Tools won't work

I have set up Hadoop on the server 19.106.11.231.

When I use the "Copy To HDFS" tool, ArcGIS shows "Failed to execute (CopyToHDFS)."

"Executing: CopyToHDFS G:\t2.json 19.106.11.231 50070 hadoop /user/t2.json CREATE
Start Time: Mon Oct 13 16:57:41 2014
Running script CopyToHDFS...
Unexpected error : [Errno 11004] getaddrinfo failed
Traceback (most recent call last):
File "", line 184, in execute
File "D:\Python27\ArcGIS10.2\lib\site-packages\webhdfs\webhdfs.py", line 91, in copyToHDFS
fileUploadClient.request('PUT', redirect_path, open(source_path, "rb"), headers={})
File "D:\Python27\ArcGIS10.2\Lib\httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "D:\Python27\ArcGIS10.2\Lib\httplib.py", line 992, in _send_request
self.endheaders(body)
File "D:\Python27\ArcGIS10.2\Lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "D:\Python27\ArcGIS10.2\Lib\httplib.py", line 814, in _send_output
self.send(msg)
File "D:\Python27\ArcGIS10.2\Lib\httplib.py", line 776, in send
self.connect()
File "D:\Python27\ArcGIS10.2\Lib\httplib.py", line 757, in connect
self.timeout, self.source_address)
File "D:\Python27\ArcGIS10.2\Lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed

Completed script CopyToHDFS...
Failed to execute (CopyToHDFS).
Failed at Mon Oct 13 16:57:44 2014 (Elapsed Time: 2.62 seconds)"

Can anyone help me with this error?
//thx

Map doesn't complete running aggregation example in Hortonworks sandbox

Migrated from Issue #25, from @TikoS

Anyway, I was trying the aggregation sample (taxi demo), and when I tried the aggregation (step 9) it ran for a very long time. The progress bar (map %) went like this:

0%, after 30 minutes 0%, after another 30 minutes 89%, and the same three times (Map only); Reduce stayed at 0%. Is this okay? I had to turn off the laptop I was using for it, so I don't know.

But then I tried it again, skipped step 9, used steps 10 and 11, and got this:

Could not find job application_1430678691120_0002. The job might not be running yet.

Job job_1430678691120_0002 could not be found: {"RemoteException":{"exception":"NotFoundException","message":"java.lang.Exception: job, job_1430678691120_0002, is not found","javaClassName":"org.apache.hadoop.yarn.webapp.NotFoundException"}} (error 404)

Steps 1-8 worked well ;) I don't know if step 9 produced an error message, because I had to turn it off,
but I will try it again... (it's a really long process; it was running for more than 1 hour, still at 89% Map, then again 30%). I will try it and write up the results.

Hadoop HDF and stuff... It would be great to implement that in these tools.


BTW: Taxi demo sample... I am at step 9 again. I changed the value from 0.01 to 1 to make it faster, BUT it is still slow. Or is that okay? I am just asking because I don't know. Isn't it weird?

hive> FROM (SELECT ST_Bin(1, ST_Point(dropoff_longitude, dropoff_latitude)) bin_id, * FROM taxi_demo) bins
    > SELECT ST_BinEnvelope(1, bin_id) shape,
    > COUNT(*) count
    > GROUP BY bin_id;
Query ID = root_20150507120909_e000001c-4259-48dd-8e98-3684d0e94566
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1431013082714_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1431013082714_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1431013082714_0001
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 3
2015-05-07 12:11:05,397 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:12:09,782 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:13:25,554 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:14:37,769 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:15:38,096 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:16:42,504 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:17:25,371 Stage-1 map = 89%,  reduce = 0%
2015-05-07 12:18:11,890 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:19:12,073 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:20:12,697 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:21:26,323 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:22:26,650 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:23:33,421 Stage-1 map = 11%,  reduce = 0%
2015-05-07 12:23:35,272 Stage-1 map = 56%,  reduce = 0%
2015-05-07 12:24:09,535 Stage-1 map = 89%,  reduce = 0%
2015-05-07 12:24:41,054 Stage-1 map = 67%,  reduce = 0%
2015-05-07 12:25:28,244 Stage-1 map = 44%,  reduce = 0%
2015-05-07 12:26:22,278 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:28:36,400 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:29:46,988 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:30:47,851 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:31:48,892 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:33:13,617 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:34:14,299 Stage-1 map = 0%,  reduce = 0%

EDIT:
I guess that my last run wasn't good, because it has now ended with this:

2015-05-07 12:56:44,267 Stage-1 map = 0%,  reduce = 0%
2015-05-07 13:07:46,126 Stage-1 map = 89%,  reduce = 0%
java.io.IOException: Job status not available
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:329)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:598)
        at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:288)
        at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Ended Job = job_1431013082714_0001 with exception 'java.io.IOException(Job status not available )'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
hive>

earthquakes.csv has different schema than sample expects

The "create table earthquakes" instructions given at https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive no longer align with the schema of the data located at https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/data/earthquake-data.

(I'm guessing that the earthquake data is occasionally pulled from a USGS or similar source, and they changed their column definitions?)

I had to insert an additional column "unknown" of type double in front of the Magnitude column.

For example, the instructions provide the following schema:

(earthquake_date STRING, latitude DOUBLE, longitude DOUBLE, magnitude DOUBLE)

and a random sample line from the file (the unknown column is 80.0 and the magnitude is 6.5):

1930/12/06 07:03:28.00,53.0,-172.0,80.0,6.5,ML,0,,,,AK,

The schema that I used:

(earthquake_date STRING, latitude DOUBLE, longitude DOUBLE, unknown DOUBLE, magnitude DOUBLE)
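Spelled out as full DDL, a sketch (the delimiter clause is an assumption based on the sample's comma-separated data, mirroring similar tables elsewhere on this page):

CREATE TABLE earthquakes (earthquake_date STRING, latitude DOUBLE, longitude DOUBLE,
                          unknown DOUBLE, magnitude DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';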

If corrected, this works:

hive> select min(magnitude), max(magnitude) from earthquakes;
OK
5.0 9.1

If magnitude still points to the wrong column, you will see:

hive> select min(magnitude), max(magnitude) from earthquakes;
OK
-5.0    700.0

Error creating UDF functions

Hi all,

I'm very interested in carrying out the Uber tutorial, as it would be quite useful for completing my university degree dissertation, and I hope you can help me. I tried all the other samples but didn't succeed in performing them following that documentation step by step. This could be because Apache Hive doesn't support geospatial functions right out of the box.

I'm using the Hortonworks HDP 2.1 sandbox as recommended in the GIS Tools for Hadoop documentation, and I already executed the following script to set up Java and install Maven:

rm /usr/bin/java
rm /usr/bin/javac
rm /usr/bin/javadoc
rm /usr/bin/javaws
ln -s /usr/java/jdk1.7.0_45/bin/java /usr/bin/java
ln -s /usr/java/jdk1.7.0_45/bin/javac /usr/bin/javac
ln -s /usr/java/jdk1.7.0_45/bin/javadoc /usr/bin/javadoc
ln -s /usr/java/jdk1.7.0_45/bin/javaws /usr/bin/javaws
echo "export JAVA_HOME=/usr/java/jdk1.7.0_45/" >> /etc/profile

wget http://mirror.symnds.com/software/Apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
tar -zxvf apache-maven-3.1.1-bin.tar.gz
mv apache-maven-3.1.1 /usr/lib/maven
export PATH=$PATH:/usr/lib/maven/bin

After that, I built the framework and API with Maven and added the jar files into Hive. However, I got the error below when creating the UDF functions ST_LineString, ST_Intersects, and ST_Point:

hive> create temporary function ST_LineString as 'com.esri.hadoop.hive.ST_LineString';
java.lang.NoClassDefFoundError: com/esri/core/geometry/Geometry
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:313)
at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:181)
at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:81)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: com.esri.core.geometry.Geometry
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 23 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. com/esri/core/geometry/Geometry

I already checked that ST_Point.java is within the spatial-sdk-hive-1.1.1-SNAPSHOT.jar file and can't find the reason for the error. Might it be a jar version or a Maven build issue?

Please, could you help me?

Thanks in advance.
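A hedged reading of the stack: NoClassDefFoundError: com/esri/core/geometry/Geometry points at the geometry API jar being absent from the session, not the Hive UDF jar. Adding both jars in the same session before registering the functions is worth trying; a sketch (the jar paths are assumptions about the Maven build output):

add jar /root/geometry-api-java/target/esri-geometry-api.jar;
add jar /root/spatial-framework-for-hadoop/hive/target/spatial-sdk-hive-1.1.1-SNAPSHOT.jar;
create temporary function ST_LineString as 'com.esri.hadoop.hive.ST_LineString';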

Tutorial Aggregating CSV Data (Spatial Binning) Issue

I am using Cloudera CDH 5.3.3 with Hive 0.13.1 (/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/jars/hive-common-0.13.1-cdh5.3.3.jar) on a 4-node cluster VM environment.
I followed the Aggregating CSV Data tutorial (https://github.com/Esri/gis-tools-for-hadoop/wiki/Aggregating-CSV-Data-%28Spatial-Binning%29) and everything appeared to be working. However, after step 11, I ran a simple SELECT query and it returned nothing, even though the files existed in the Hive folder. As a result, when I followed step 12 to get the resulting taxi_agg table into ArcMap, the JSON file was empty, with no data.

Here are some screenshots:

  1. Table taxi_demo location: (screenshot)
  2. Files in the HDFS folder for the Hive taxi_demo table: (screenshot)
  3. Data in the taxi_demo table Hive folder: (screenshot)
  4. Hive query returns nothing: (screenshot)

Please help.

Thanks,

Tom

Can't get ArcGIS and Hadoop to work together

We have a clean install of Ubuntu 12.04.3 as our OS (started from scratch). Hadoop 1.2.1 has been installed per the instructions from the Apache Hadoop site. A pseudo-distributed installation has been configured and tested. We are able to see both the job tracker and name node sites, as well as run the test example given in the installation instructions for Apache Hadoop.

We have downloaded the ArcGIS tools for Hadoop, but when we try to use any of these tools we get errors. We would like to know how to install and configure the GIS tools for Hadoop. I can send screenshots showing the error; please let me know how to get this working.

The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path

I'm working with the latest version of HortonWorks sandbox, version 2.1. It uses Hive 0.13.0.

I created a database table

create table test2 (id INT, latitude DOUBLE, longitude DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
load data local inpath 'spatialdata.txt' into table test2;

and import a small data set contained in a text file, spatialdata.txt.

It has the following data:

id, longitude, latitude
1, 0.0, 0.0
2, 0.0, 4.0
3, 4.0, 4.0
4, 4.0, 0.0
5, 0.0, 0.0

I went through the Hive query interface and added the two jars:

(screenshot)

I ran the query and got the following results:

OK
converting to local hdfs://sandbox.hortonworks.com:8020/user/hue/esri-geometry-api.jar
Added /tmp/55573121-b937-4ea7-8e63-0811c2b29f50_resources/esri-geometry-api.jar to class path
Added resource: /tmp/55573121-b937-4ea7-8e63-0811c2b29f50_resources/esri-geometry-api.jar
converting to local hdfs://sandbox.hortonworks.com:8020/user/hue/spatial-sdk-hive-1.0.3-SNAPSHOT.jar
Added /tmp/55573121-b937-4ea7-8e63-0811c2b29f50_resources/spatial-sdk-hive-1.0.3-SNAPSHOT.jar to class path
Added resource: /tmp/55573121-b937-4ea7-8e63-0811c2b29f50_resources/spatial-sdk-hive-1.0.3-SNAPSHOT.jar
FAILED: SemanticException [Error 10014]: Line 1:76 Wrong arguments 'latitude': The UDF implementation class 'com.esri.hadoop.hive.ST_Point' is not present in the class path

From what I can tell, my small example follows many of the on-line ones. I'm not sure why it can't find the function.
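A hedged sketch of the session-level alternative: add the jars and register the function explicitly in the same Hive session before the query (the HDFS paths below match the ones in the log above):

add jar hdfs://sandbox.hortonworks.com:8020/user/hue/esri-geometry-api.jar;
add jar hdfs://sandbox.hortonworks.com:8020/user/hue/spatial-sdk-hive-1.0.3-SNAPSHOT.jar;
create temporary function ST_Point as 'com.esri.hadoop.hive.ST_Point';

select ST_Point(longitude, latitude) from test2 limit 1;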

Looking for advice to optimize ST_Intersects

Hi, for a class project I decided to look into this toolset. The plan was to bring in roads data (OSM and a gov't roads dataset) to the cluster and compare road lengths of each dataset. I built a fishnet in ArcGIS for my study area, uploaded to the cluster and am intersecting/summing the road distance for each roads dataset. The goal is to show on a map the areas in the study area that have more/less road distance.

Here is my schema in Hive:

CREATE TABLE bc_fishnet_05dd (OID string, geom binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

CREATE TABLE bc_osm (osm_id string, type string, geom binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

and here is the query just for OSM:

create temporary function ST_Intersects as 'com.esri.hadoop.hive.ST_Intersects';
create temporary function ST_Intersection as 'com.esri.hadoop.hive.ST_Intersection';
create temporary function ST_SetSRID as 'com.esri.hadoop.hive.ST_SetSRID';
create temporary function ST_GeodesicLengthWGS84 as 'com.esri.hadoop.hive.ST_GeodesicLengthWGS84';

insert overwrite table bc_osm_by_fishnet
select f.oid, sum(ST_GeodesicLengthWGS84(ST_SetSRID(ST_Intersection(f.geom, n.geom), 4326)))
from bc_fishnet_05dd f, bc_osm n
where type not in ('rest_area', 'cycleway' ,'footway', 'path', 'pedestrian', 'rest_area', 'steps', 'track', 'trail', 'service')
and ST_Intersects(n.geom, f.geom)
group by f.oid;

It takes a substantial amount of time, even when reducing the fishnet polygons to only the ones where there are roads (count = 10k; my OSM count is 150K or so).

So, I guess the question is: is there a way to optimize this? This isn't really a large amount of data; ArcGIS can handle it fine.

Hoping to get advice or else my project will be more about evaluation of different spatial tools out there, not the actual results.

Thanks
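A hedged tuning note rather than an answer: with a 10k-cell fishnet against 150k roads, the implicit cross join is the main cost, and broadcasting the smaller table as a map-side join is the usual first knob in Hive. A sketch (both settings are standard Hive options; the size threshold is an assumption sized to let the fishnet table qualify):

set hive.auto.convert.join=true;                  -- convert to a map-side (broadcast) join
set hive.mapjoin.smalltable.filesize=250000000;   -- raise the small-table threshold (assumed value)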

'Unexpected error: [Errno 11004] getaddrinfo failed' error while migrating a GDB feature class to HDFS using ArcGIS Tools

We are facing an issue while migrating a GDB feature class to Hadoop HDFS using the GIS Tools for Hadoop geoprocessing tools.
Following are the details of the system environment being used:

ArcGIS Client : 10.3.1/10.2.2
Hadoop version : hadoop 2.4.1
Python version : python 2.7.5
ArcSDE: 10.2.2
RDBMS: Oracle 11.2.0.4
ClusterInfo: MasterNode(Nos.1),Secondary Node(Nos.1),DataNodes(Nos.8)

The following steps were followed to install and configure the ArcGIS tools for the Hadoop environment:

a) Added the 'geoprocessing tools for hadoop' downloaded from the GitHub weblink 'https://github.com/Esri/gis-tools-for-hadoop'.

b) Enabled webhdfs in HDFS by editing hdfs-site.xml in /opt/hadoop/etc/hadoop/hdfs-site.xml.

c) Added the jars 'spatial-sdk-hadoop.jar' and 'esri-geometry-api.jar' to the /opt/hadoop-2.4.1/share/hadoop/tools/lib location of our Hadoop master node.

d) Browsed to the ArcGIS geoprocessing toolbox containing the Python scripts for Hadoop using ArcCatalog 10.3.1.

e) The above step enables the Hadoop tools for ArcGIS; converted the feature class into a JSON file using the 'Features To JSON' tool in the Hadoop toolbox.

f) The 'Copy To HDFS' scripting tool in the Hadoop toolbox of ArcGIS was used to copy the JSON files to HDFS.

g) Got the error message 'Unexpected error: [Errno 11004] getaddrinfo failed'.

Error message after running tool:

Start Time: Wed Mar 09 18:43:44 2016
Running script CopyToHDFS...
Unexpected error : [Errno 11004] getaddrinfo failed
Traceback (most recent call last):
File "", line 184, in execute
File "D:\GIS tools for hadoop\geoprocessing-tools-for-hadoop-master\geoprocessing-tools-for-hadoop-master\webhdfs\webhdfs.py", line 91, in copyToHDFS
fileUploadClient.request('PUT', redirect_path, open(source_path, "rb"), headers={})
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 1007, in _send_request
self.endheaders(body)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 969, in endheaders
self._send_output(message_body)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 829, in send_output
self.send(msg)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 791, in send
self.connect()
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 772, in connect
self.timeout, self.source_address)
File "C:\Python27\ArcGIS10.2\Lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed

We followed all the guidelines and steps specified in following weblinks and references:

https://esri.github.io/gis-tools-for-hadoop/
https://github.com/Esri/gis-tools-for-hadoop/wiki

Please provide a resolution.

Computing Area on projection using SRID

I am wondering how to compute area on the underlying projection (using an SRID), similar to the function ST_GeodesicLengthWGS84. Can somebody guide me on how to do this?

Currently I am using this code:

SELECT
  ST_Area(ST_GeomFromText(A.area_wkt, 4326)),
  ST_Length(ST_GeomFromText(A.line_wkt, 4326)),
  ST_GeodesicLengthWGS84(ST_GeomFromText(A.line_wkt, 4326))
FROM A

Best,
Christoph

The 'point-in-polygon-aggregation-hive' is failing with ClassCastException: org.apache.hadoop.io.Text incompatible with java.lang.String

Hive - Version 0.9.0

I downloaded the jars and created the tables as mentioned in the sample. But when I run any query on the counties table, it fails.

Here is the Hive create part:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS counties (Area string, Perimeter string, State string, County string, Name string, BoundaryShape binary)
> ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
> STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION '/TEST/HIVE/SPATIAL/data/counties';
OK
Time taken: 0.287 seconds
hive> select * from counties limit 5;
OK
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.io.Text incompatible with java.lang.String
Time taken: 0.45 seconds

Exception:

2013-08-05 20:09:24,847 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.io.Text incompatible with java.lang.String
at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaStringObjectInspector.getPrimitiveJavaObject(JavaStringObjectInspector.java:40)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:263)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:349)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:219)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Not sure what I am missing. Any pointers are appreciated.

Thanks,

Troubles installing Geoprocessing Tools in ArcMap

Hi all,
I'm having trouble installing the Geoprocessing Tools for Hadoop on ArcMap 10.3. I am running Hadoop and Hive on a Hortonworks Sandbox 2.1 VM as the guest. ArcMap is installed on a Windows 8 host machine.

Following the instructions given (https://github.com/Esri/geoprocessing-tools-for-hadoop), I get errors on the imported Python scripts, as shown below:
(screenshot)

Also, trying to import the Sample Model toolbox from the point-in-polygon-mr sample (https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-mr/gp), I get a similar error:
(screenshot)

Could you, please, help me?
Thanks in advance.

Job Not Completing

I am trying to count how many points are within each zip code in the United States and it takes over 21 hours to complete.

If I use a very basic JSON, such as states, it runs within 3 minutes, but any time I try to use polygons that are far more advanced, the job takes substantially longer.

Is there any reasoning behind this?

Enhancement request: magic operators <-> and <#>

I just completed a clunky nearest neighbor implementation using Hive and PySpark for a 1B+ record data set that needs to be queried against a global city lat/lon list. I realized how much cooler it would have been if the <-> and <#> operators were available and my 60 lines of code would become a single query. Any plans for putting these operators in a forthcoming release?
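For reference, a hedged sketch of what the clunky version looks like without such operators: nearest neighbor per record via a windowed cross join (the tables records and cities are hypothetical; assumes an ST_Distance UDF is registered):

select id, city
from (
  select r.id, c.city,
         row_number() over (partition by r.id
                            order by ST_Distance(r.geom, c.geom)) as rn
  from records r cross join cities c
) ranked
where rn = 1;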

GP tools point-in-polygon-aggregation-mr not working

I followed your tutorial at gis-tools-for-hadoop / samples / point-in-polygon-aggregation-mr / gp / README.md, but the tool is not working and shows red X's on the GP tools from the Hadoop Tools toolbox. As suggested in the same location, I double-clicked the rectangle in the model to reset the location of the tool.

After my double-click, it shows an error dialog with "Unable to execute the selected tool." I tried right click -> Edit on "Run Sample Application", but nothing happens.

I'm using ArcGIS Desktop 10.2 with an Advanced (ArcInfo) license.

Reduce job runs forever when attempting to process earthquake sample against US Zip5 shapefile

I'm using Hive to do reverse geocoding (RG). I have the earthquake sample working. I then attempt to do the RG on the earthquake data using the Zip5 shape file for the entire U.S. The mapping step runs in seconds. Then the reduce step runs to over 90% completion in a few seconds, but never finishes from there. Hoping to find out why.

I converted the shape file using ArcMap, per the instructions here. That worked fine. I also selected just state = 'CA' to make it somewhat manageable, so the RG I'm doing is just against the CA table (but the same problem happens with the whole country).

At a minimum, is there some type of log setting I can set to verbose and then check a log to see what the code is doing?

All assistance appreciated!

ST_X and ST_Y don't work as expected

I want to use the ST_X() and ST_Y() geometry functions in order to get the x,y coordinates of a point and build a string from them.
If I use them within a select, like this:

select ST_X(geom), ST_Y(geom) from tweets where geom is not null limit 10; 

They work as expected:

2.444087        41.653078
1.108611        41.190833
0.622647        41.630436
2.212209        41.405437
2.132334        41.372308
0.42043 49.337155
0.42043 49.337155
2.007902        41.318285
2.183024        41.405962
2.20352 41.45851

In order to build the string, I make the query like this:

select concat(ST_Y(geom), "," , ST_X(geom)) from tweets where geom is not null limit 10;

And as a result, it overrides the second call (ST_X) with the results of the first call (ST_Y):

41.653078,41.653078
41.190833,41.190833
41.630436,41.630436
41.405437,41.405437
41.372308,41.372308
49.337155,49.337155
49.337155,49.337155
41.318285,41.318285
41.405962,41.405962
41.45851,41.45851

If I reverse the call, like this:

select concat(ST_X(geom), "," , ST_Y(geom)) from tweets where geom is not null limit 10;

the same thing applies, and I have the ST_X coordinate repeated in place of ST_Y:

2.444087,2.444087
1.108611,1.108611
0.622647,0.622647
2.212209,2.212209
2.132334,2.132334
0.42043,0.42043
0.42043,0.42043
2.007902,2.007902
2.183024,2.183024
2.20352,2.20352

Could this be a bug?
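A hedged workaround while this is investigated: casting each coordinate to string inside its own branch may force a copy before any shared result object gets reused (unverified against this particular bug; cast and concat are standard Hive built-ins):

select concat(cast(ST_Y(geom) as string), ",", cast(ST_X(geom) as string))
from tweets where geom is not null limit 10;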

Installing GIS Tools for Hadoop

Do you have any detailed instructions for installing the GIS Tools for Hadoop? For those of us who are not familiar with building open source projects, the instructions are pretty vague.

Thanks!
Sandy

Multi-char deserialize error (GeoJSON or Esri JSON)

  1. Data file (UTF-8 encoding): (screenshot)
  2. Create table:
    ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
    STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    or
    ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.GeoJsonSerDe'
    STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedGeoJsonInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
  3. The query always returns garbled characters:
    hive (default)> select County from u_qqmap.counties limit 1;
    OK
    ?????????
