
oap-project / velox


This project forked from facebookincubator/velox

15.0 15.0 45.0 68.93 MB

A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Home Page: https://facebookincubator.github.io/velox/

License: Apache License 2.0

CMake 1.12% Makefile 0.03% Python 0.61% C 0.84% Shell 0.20% C++ 96.63% Batchfile 0.01% LLVM 0.02% Yacc 0.04% Dockerfile 0.01% Thrift 0.13% Cuda 0.35% JavaScript 0.01% CSS 0.01%

velox's People

Contributors

aditi-pandit, ahornby, assignuser, bikramsingh91, chadaustin, duanmeng, funrollloops, gggrace14, huamengjiang, jinchengchenghh, jkself, kagamiori, karteekmurthys, kevinwilfong, kewang1024, kgpai, laithsakka, majetideepak, mbasmanova, pedroerp, pramodsatya, r-barnes, rui-mo, tanjialiang, usurai, xiaoxmeng, yingsu00, yuhta, zacw7, zzhao0


velox's Issues

The CHAR type of ORC is not supported

Bug description

Is the CHAR type of ORC currently not supported?

Gluten: 1.1.0

System information

Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: CHAR not supported yet.
Retriable: False
Context: Split [Hive: hdfs://sparkHa/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_100.db/customer/000085_0 0 - 886854] Task Gluten_Stage_2_TID_2067
Top-Level Context: Same as context.
Function: kind
File: /root/src/oap-project/gluten/ep/build-velox/build/velox_ep/velox/dwio/dwrf/common/FileMetadata.cpp
Line: 105
Stack trace:
0 _ZN8facebook5velox7process10StackTraceC1Ei
1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
3 _ZNK8facebook5velox4dwrf11TypeWrapper4kindEv
4 _ZN8facebook5velox4dwrf10ReaderBase11convertTypeERKNS1_13FooterWrapperEjb
5 _ZN8facebook5velox4dwrf10ReaderBase11convertTypeERKNS1_13FooterWrapperEjb
6 _ZN8facebook5velox4dwrf10ReaderBaseC2ERNS0_6memory10MemoryPoolESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS9_EESt10shared_ptrINS8_10encryption16DecrypterFactoryEEmmNS8_10FileFormatEb
7 _ZN8facebook5velox4dwrf10DwrfReaderC1ERKNS0_4dwio6common13ReaderOptionsESt10unique_ptrINS4_13BufferedInputESt14default_deleteIS9_EE
8 _ZN8facebook5velox4dwrf10DwrfReader6createESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
9 _ZN8facebook5velox4dwrf16OrcReaderFactory12createReaderESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
10 _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitERKSt10shared_ptrINS2_15HiveTableHandleEERKNS0_4dwio6common13ReaderOptionsESt10unique_ptrINSA_13BufferedInputESt14default_deleteISF_EES4_INS0_6common14MetadataFilterEERNSA_17RuntimeStatisticsE
11 _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
12 _ZN8facebook5velox4exec9TableScan9getOutputEv
13 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
14 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
15 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
16 _ZN6gluten24WholeStageResultIterator4nextEv
17 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
18 0x00007fc899018427
at io.glutenproject.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
at io.glutenproject.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:65)
at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
... 20 more

Relevant logs

No response

TPC-H Q15 outputs unstable query results

Bug description

When I use Velox as the backend execution engine to run TPC-H, the output of Q15 is unstable. Sometimes the query prints one result, and sometimes the result is empty (the original report attached a screenshot of the empty output).

System information

Velox System Info v0.0.2
Commit: 0fd70ff
CMake Version: 3.27.2
System: Linux-5.15.0-spr.bkc.pc.16.4.24.x86_64
Arch: x86_64
C++ Compiler: /opt/rh/gcc-toolset-11/root/usr/bin/c++
C++ Compiler Version: 11.2.1
C Compiler: /opt/rh/gcc-toolset-11/root/usr/bin/cc
C Compiler Version: 11.2.1
CMake Prefix Path: /usr/local;/usr;/;/home/weiqiang/.local/lib/python3.8/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt


Relevant logs

No response

Aggregate on VARBINARY type fails

Bug description

Running the Spark query below, which aggregates on a VARBINARY column, fails with:

Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Unknown input type for min aggregation VARBINARY
Retriable: False
Expression: false
Function: operator()
File: /home/gayangya/Work/Git/OSS/velox/velox/functions/prestosql/aggregates/MinMaxAggregates.cpp
Line: 516
Stack trace:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BinaryType, BooleanType, ByteType, DateType, Decimal, DecimalType, DoubleType, FloatType, IntegerType, LongType, ShortType, StringType, StructField, StructType, TimestampType}

import java.sql.{Date, Timestamp}

implicit class StringToDate(s: String) {
  def date: Date = Date.valueOf(s)
}

implicit class StringToTs(s: String) {
  def ts: Timestamp = Timestamp.valueOf(s)
}

val rows =
  Seq(
    Row(
      "sparkSQL",
      "Spark SQL".getBytes),
    Row(
      "parquet",
      "Parquet".getBytes),
    Row(
      "sparkML",
      "SparkML".getBytes)
  )

val schema = StructType(List(
  StructField("StringCol", StringType, true),
  StructField("BinaryCol", BinaryType, false)).toArray)

val rdd = sc.parallelize(rows)

spark.createDataFrame(rdd, schema).write.format("parquet").save("/tmp/spark1/datatest1")

spark.read.format("parquet").load("/tmp/spark1/datatest1").createOrReplaceTempView("test")
val df = sql("SELECT min(BinaryCol) FROM test")
df.collect

Expected behavior:

The Spark SQL query runs successfully.
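For reference, here is a minimal sketch of what a min over binary values could compute. It assumes an unsigned byte-wise lexicographic ordering, which is an assumption about the expected semantics and not taken from the Velox or Spark sources. The names `binaryLess` and `minBinary` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Unsigned byte-wise "less than" for two binary values.
// Assumption: this is the ordering min(BinaryCol) is expected to use.
inline bool binaryLess(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
      });
}

// Hypothetical helper: smallest value in a non-empty column of binary values.
inline std::string minBinary(const std::vector<std::string>& values) {
  assert(!values.empty());
  return *std::min_element(values.begin(), values.end(), binaryLess);
}
```

Under that ordering, the three rows written by the repro ("Spark SQL", "Parquet", "SparkML") would yield "Parquet", because the byte for 'P' sorts before the byte for 'S'.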

System information

ubuntu 20.04

Relevant logs

Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Unknown input type for min aggregation VARBINARY
Retriable: False
Expression: false
Function: operator()
File: /home/gayangya/Work/Git/OSS/velox/velox/functions/prestosql/aggregates/MinMaxAggregates.cpp
Line: 516
Stack trace:

This also causes the Gluten unit test "aggregate push down - different data types" in GlutenParquetV2AggregatePushDownSuite to fail, which is tracked by another issue: apache/incubator-gluten#2169

Incompatibility between Spark and Velox Data Types Causing Runtime Failures

Bug description

When utilizing Velox to read data from Spark, we've observed that certain data types are not represented identically between Spark and Parquet files. This discrepancy results in a runtime error when the data returned by the Parquet reader differs from what Spark anticipates. We've identified the following types as problematic:

  1. u8 -> i16
  2. u16 -> i32
  3. u32 -> i64
  4. u64 -> decimal(20, 0)
  5. DateType ignores rebaseMode conf
  6. TimestampType ignores rebaseMode conf
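The first four entries in the list can be sketched as a small lookup, together with a check of why u64 needs 20 decimal digits. This is only an illustration of the mapping listed above. The enum `UnsignedWidth` and both functions are hypothetical names, not part of Velox or Gluten.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical enum for the unsigned Parquet integer widths named in the list.
enum class UnsignedWidth { U8, U16, U32, U64 };

// Returns the type the list says Spark expects for each unsigned width.
inline std::string sparkTypeFor(UnsignedWidth w) {
  switch (w) {
    case UnsignedWidth::U8:
      return "i16";
    case UnsignedWidth::U16:
      return "i32";
    case UnsignedWidth::U32:
      return "i64";
    case UnsignedWidth::U64:
      return "decimal(20, 0)";
  }
  assert(false && "unknown width");
  return "";
}

// Number of decimal digits in the largest value of an unsigned integer type.
template <typename T>
constexpr int maxDecimalDigits() {
  int digits = 0;
  for (T v = std::numeric_limits<T>::max(); v > 0; v /= 10) {
    ++digits;
  }
  return digits;
}

// The largest u64 value has 20 digits, so no signed 64-bit type can hold it.
static_assert(maxDecimalDigits<uint64_t>() == 20, "u64 needs 20 digits");
// The largest u32 value has 10 digits and fits in a signed 64-bit integer.
static_assert(maxDecimalDigits<uint32_t>() == 10, "u32 needs 10 digits");
```

Each unsigned width is widened to the next signed type that can hold its full range, and u64 has no such integer type, which is why it maps to a decimal.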

For instance, while reading columns through Velox, Gluten creates a Velox scan node based on the format expected by Spark. However, due to the incompatible data representation, an error arises as exemplified by the following log:

E0607 16:44:56.883301 11636 Exceptions.h:68] Line: /root/Velox/velox/vector/ComplexVector.h:68, Function:RowVector, Expression: child->type()->kindEquals(type->childAt(i)) Got type BIGINT for field `n0_0` at position 0, but expected DECIMAL(20,0)., Source: RUNTIME, ErrorCode: INVALID_STATE
23/06/07 16:44:56 ERROR TaskResources: Task 4 failed by error: 
java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Got type BIGINT for field `n0_0` at position 0, but expected DECIMAL(20,0).
Retriable: False
Expression: child->type()->kindEquals(type->childAt(i))
Context: Split [file file:///root/Gluten/backends-velox/target/scala-2.12/test-classes/data-type-validation-data/type3/primitive_types_parquet_file.parquet 0 - 7295] Task gluten task 1
Top-Level Context: Same as context.
Function: RowVector
File: /root/Velox/velox/vector/ComplexVector.h
Line: 68
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::RowVector::RowVector(facebook::velox::memory::MemoryPool*, std::shared_ptr<facebook::velox::Type const>, boost::intrusive_ptr<facebook::velox::Buffer>, unsigned long, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >, std::optional<int>)
# 5  void __gnu_cxx::new_allocator<facebook::velox::RowVector>::construct<facebook::velox::RowVector, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(facebook::velox::RowVector*, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 6  void std::allocator_traits<std::allocator<facebook::velox::RowVector> >::construct<facebook::velox::RowVector, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(std::allocator<facebook::velox::RowVector>&, facebook::velox::RowVector*, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 7  std::_Sp_counted_ptr_inplace<facebook::velox::RowVector, std::allocator<facebook::velox::RowVector>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(std::allocator<facebook::velox::RowVector>, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 8  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::RowVector, std::allocator<facebook::velox::RowVector>, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(facebook::velox::RowVector*&, std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::RowVector> >, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 9  std::__shared_ptr<facebook::velox::RowVector, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<facebook::velox::RowVector>, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::RowVector> >, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 10 std::shared_ptr<facebook::velox::RowVector>::shared_ptr<std::allocator<facebook::velox::RowVector>, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::RowVector> >, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 11 std::shared_ptr<facebook::velox::RowVector> std::allocate_shared<facebook::velox::RowVector, std::allocator<facebook::velox::RowVector>, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(std::allocator<facebook::velox::RowVector> const&, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 12 std::shared_ptr<facebook::velox::RowVector> std::make_shared<facebook::velox::RowVector, facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&>(facebook::velox::memory::MemoryPool*&, std::shared_ptr<facebook::velox::RowType const> const&, boost::intrusive_ptr<facebook::velox::Buffer>&&, int&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 13 facebook::velox::connector::hive::HiveDataSource::next(unsigned long, folly::SemiFuture<folly::Unit>&)
# 14 facebook::velox::exec::TableScan::getOutput()
# 15 facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
# 16 facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)
# 17 facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)
# 18 gluten::WholeStageResultIterator::next()
# 19 gluten::ResultIterator::getNext()
# 20 gluten::ResultIterator::hasNext()
# 21 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 22 0x00007fc7dd020848

    at io.glutenproject.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
    at io.glutenproject.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:45)
    at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
    at io.glutenproject.backendsapi.glutendata.GlutenIteratorApi$$anon$2.hasNext(GlutenIteratorApi.scala:240)
    at io.glutenproject.vectorized.CloseableColumnBatchIterator.hasNext(CloseableColumnBatchIterator.scala:41)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:400)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:897)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:897)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:366)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:330)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

This error states that a BIGINT type was returned for the field n0_0 at position 0, while a DECIMAL(20,0) was expected.

System information

Velox System Info v0.0.2
Commit: 3fa47086d3f81927a21d4da2cabff40bfd73331c
CMake Version: 3.25.2
System: Linux-5.4.0-1109-azure
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/us

Relevant logs

No response

Fix the failed unit tests on the `update` branch

Bug description

The following tests FAILED:
217 - velox_type_test (Failed)
225 - velox_expression_test (Failed)
227 - velox_dwio_common_test (Failed)
262 - velox_dwrf_e2e_filter_test (Failed)
282 - velox_functions_test (Failed)
284 - velox_functions_spark_aggregates_test (Failed)
288 - velox_hive_connector_test (Failed)
292 - velox_exec_test (Failed)

System information

n/a

Relevant logs

No response

`hive.s3.endpoint` is not set correctly by gluten

Bug description

I tested native write to Alibaba OSS with Gluten + Velox. The Spark configuration is:

spark.hadoop.fs.s3a.endpoint: https://oss-cn-hangzhou.aliyuncs.com
spark.hadoop.fs.s3a.access.key: <access-key>
spark.hadoop.fs.s3a.secret.key: <secret-key>
spark.hadoop.fs.s3a.path.style.access: false
spark.hadoop.fs.s3a.connection.ssl.enabled: true

However, the AWS C++ SDK still tries to reach s3.us-east-1.amazonaws.com instead of the configured endpoint.
Attached log: aws_sdk_2023-12-18-08.log
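The expected behavior can be sketched as a translation step that forwards the Spark S3A endpoint to the `hive.s3.endpoint` key from the issue title. This is a hypothetical illustration, not Gluten's actual code. The `Conf` alias and both functions are made-up names, and only the two key strings come from this report.

```cpp
#include <cassert>
#include <map>
#include <string>

using Conf = std::map<std::string, std::string>;

// Hypothetical translation step: copy the Spark S3A endpoint, if present,
// into the connector key named in the issue title.
inline Conf toVeloxS3Conf(const Conf& sparkConf) {
  Conf veloxConf;
  auto it = sparkConf.find("spark.hadoop.fs.s3a.endpoint");
  if (it != sparkConf.end()) {
    veloxConf["hive.s3.endpoint"] = it->second;
  }
  return veloxConf;
}

// Fails fast if a configured endpoint was dropped during translation.
inline void checkEndpointForwarded(const Conf& sparkConf, const Conf& veloxConf) {
  if (sparkConf.count("spark.hadoop.fs.s3a.endpoint") > 0) {
    assert(veloxConf.count("hive.s3.endpoint") > 0);
  }
}
```

If the endpoint were dropped somewhere along this path, the SDK would presumably fall back to its default regional endpoint, which would match the symptom described above.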

System information

Velox System Info v0.0.2
Commit: c7389f1
CMake Version: 3.22.1
System: Linux-6.5.11-linuxkit
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

gluten commit is fcdd30e582df12275c66275e3a1dc956a193a324

Gluten Build fails with libvelox_hive_connector.a missing after latest upstream sync with velox.

Problem description

The Gluten C++ build checks whether libvelox_hive_connector.a is present in the expected path and fails if it is missing. This happens because the hive connector is now built as an OBJECT library rather than a static library. We need to remove the OBJECT label from the CMakeLists.txt here - https://github.com/oap-project/velox/blob/175870db403ba9159d588fc772a5c936c27d0910/velox/connectors/hive/CMakeLists.txt#L16C24-L16C30

Upstream Velox uses an OBJECT library for Presto integration. Without the OBJECT label on velox_hive_connector, the build of presto_server fails with errors such as undefined reference to
HiveTableHandle::HiveTableHandle()

Gluten, however, needs a static library to generate a combined shared object later.

System information

Velox System Info v0.0.2
Commit: 175870d
CMake Version: 3.25.1
System: Linux-6.3.11-amd64
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 12.2.0
C Compiler: /usr/bin/cc
C Compiler Version: 12.2.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

CMake log

CMake Error at velox/CMakeLists.txt:68 (message):
  Velox library not exists:
  /velox/connectors/hive/libvelox_hive_connector.a
Call Stack (most recent call first):
  velox/CMakeLists.txt:111 (add_velox_dependency)
  velox/CMakeLists.txt:305 (add_velox_dependencies)

Crash caused by TimestampColumnReader

Bug description

Crash:


#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fbf5a802547, pid=2230, tid=0x00007fbf21bd3700
#
# JRE version: OpenJDK Runtime Environment (8.0_382-b05) (build 1.8.0_382-b05)
# Java VM: OpenJDK 64-Bit Server VM (25.382-b05 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libvelox.so+0x3667547]  void facebook::velox::parquet::PageReader::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<__int128, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::dwio::common::SelectiveIntegerColumnReader>, true> >(facebook::velox::dwio::common::ColumnVisitor<__int128, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::dwio::common::SelectiveIntegerColumnReader>, true>&)+0x487
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/vsts/work/1/s/core/hs_err_pid2230.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Exception in thread "Thread-23" java.io.EOFException
  at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:3111)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1620)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
  at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:839)
  at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:828)
  at java.lang.Thread.run(Thread.java:750)

Repro:
spark.read.format("parquet").load("00000000000000000003.checkpoint.parquet").collect()
File is uploaded to https://github.com/zhli1142015/velox/blob/check-point-parquet/00000000000000000003.checkpoint.parquet.

Our investigation suggests that this is caused by the following override missing from TimestampColumnReader. Adding it should fix the issue.

  bool hasBulkPath() const override {
    return false;
  }
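To illustrate why a missing override can matter, here is a minimal sketch with hypothetical stand-in classes. These are not the real Velox reader classes. The sketch assumes only that a base reader advertises a bulk path by default and that a derived reader must opt out explicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a base integer reader; not the real Velox class.
class IntegerReaderSketch {
 public:
  virtual ~IntegerReaderSketch() = default;

  // Default: the bulk decoding path is allowed.
  virtual bool hasBulkPath() const {
    return true;
  }

  // Picks the decoding path based on what the concrete reader advertises.
  std::string chosenPath() const {
    return hasBulkPath() ? "bulk" : "row-by-row";
  }
};

// Hypothetical timestamp reader that opts out of the bulk path.
class TimestampReaderSketch : public IntegerReaderSketch {
 public:
  bool hasBulkPath() const override {
    return false;
  }
};

// Usage check: dispatch goes through a base-class reference.
inline void demoPathSelection() {
  TimestampReaderSketch ts;
  const IntegerReaderSketch& reader = ts;
  assert(reader.chosenPath() == "row-by-row");
}
```

Without the override, the derived reader inherits the default and is sent down the bulk path, which in this sketch stands for a code path that does not handle its value layout.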

System information

Velox System Info v0.0.2
Commit: 6c92156
CMake Version: 3.16.3
System: Linux-5.10.102.1-microsoft-standard-WSL2
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

branch-1.1: Failed to get metadata for S3 object

Bug description

I built Gluten + Velox from branch-1.1 and submitted a TPC-H query using spark-shell, with the data stored in S3. However, the following error occurred during execution:

Reason: Failed to get metadata for S3 object due to: 'Unknown error'. Path:'s3://xxxxxxx/user/hive/warehouse/tpch_orc.db/customer/part-00027-31ef1f3c-5b27-4f6c-aef4-7f77f7749873-c000.snappy.orc', SDK Error Type:100, HTTP Status Code:400, S3 Service:'AmazonS3', Message:'No response body.', RequestID:'KC5WQZ78QWKQ9BFX'"

The same query runs normally with the Gluten v1.0.0 tag.

@majetideepak

System information

System info for the branch-1.1 build:

Velox System Info v0.0.2
Commit: facebookincubator@bbd65c4
CMake Version: 3.16.3
System: Linux-5.15.0-91-generic
Arch: x86_64
C++ Compiler: /bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Running on AWS EKS.

Relevant logs

"2023-12-05T07:12:37.689576121Z stdout F 23/12/05 07:12:37 ERROR TaskResources: Task 8 failed by error: ",
"2023-12-05T07:12:37.689606328Z stdout F io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError",
"2023-12-05T07:12:37.689628682Z stdout F Error Source: RUNTIME",
"2023-12-05T07:12:37.689632451Z stdout F Error Code: INVALID_STATE",
"2023-12-05T07:12:37.689636372Z stdout F Reason: Failed to get metadata for S3 object due to: 'Unknown error'. Path:'s3://xxxxxx/user/hive/warehouse/tpch_orc.db/customer/part-00027-31ef1f3c-5b27-4f6c-aef4-7f77f7749873-c000.snappy.orc', SDK Error Type:100, HTTP Status Code:400, S3 Service:'AmazonS3', Message:'No response body.', RequestID:'KC5WQZ78QWKQ9BFH'",
"2023-12-05T07:12:37.689639435Z stdout F Retriable: False",
"2023-12-05T07:12:37.689643198Z stdout F Context: Split [Hive: s3a://xxxxxx/user/hive/warehouse/tpch_orc.db/customer/part-00027-31ef1f3c-5b27-4f6c-aef4-7f77f7749873-c000.snappy.orc 0 - 121746056] Task Gluten_Stage_0_TID_8",
"2023-12-05T07:12:37.689646437Z stdout F Top-Level Context: Same as context.",
"2023-12-05T07:12:37.689649292Z stdout F Function: initialize",
"2023-12-05T07:12:37.689652406Z stdout F File: ../../velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp",
"2023-12-05T07:12:37.689655045Z stdout F Line: 93", 
"2023-12-05T07:12:37.689657984Z stdout F Stack trace:",
"2023-12-05T07:12:37.689661375Z stdout F # 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)",
"2023-12-05T07:12:37.689670744Z stdout F # 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)", 
"2023-12-05T07:12:37.68967352Z stdout F # 2  facebook::velox::(anonymous namespace)::S3ReadFile::initialize()",
"2023-12-05T07:12:37.689677103Z stdout F # 3  facebook::velox::filesystems::S3FileSystem::openFileForRead(std::basic_string_view<char, std::char_traits<char> >, facebook::velox::filesystems::FileOptions const&)",
"2023-12-05T07:12:37.689680232Z stdout F # 4  facebook::velox::FileHandleGenerator::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)",
"2023-12-05T07:12:37.689682935Z stdout F # 5  facebook::velox::CachedFactory<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<facebook::velox::FileHandle>, facebook::velox::FileHandleGenerator>::generate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)", 
"2023-12-05T07:12:37.689686275Z stdout F # 6  facebook::velox::connector::hive::HiveDataSource::addSplit(std::shared_ptr<facebook::velox::connector::ConnectorSplit>)",
"2023-12-05T07:12:37.68970488Z stdout F # 7  facebook::velox::exec::TableScan::getOutput()",
"2023-12-05T07:12:37.689707926Z stdout F # 8  facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)",
"2023-12-05T07:12:37.689710953Z stdout F # 9  facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)",
"2023-12-05T07:12:37.689713812Z stdout F # 10 facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)",
"2023-12-05T07:12:37.689716972Z stdout F # 11 gluten::WholeStageResultIterator::next()",
"2023-12-05T07:12:37.689719966Z stdout F # 12 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext",
