Chendi Xue

I am a Linux software engineer, currently working on Spark, Arrow, Kubernetes, Ceph, C/C++, etc.

Apache Arrow enabling HDFS Parquet support

20 Aug 2019 » Spark, Arrow

For the basics of building Arrow from source, refer to my earlier post: link

Install double-conversion (a dependency of Arrow C++)

git clone https://github.com/google/double-conversion.git
cd double-conversion
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=ON ..
make
make install
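If the Arrow configure step later cannot find double-conversion, it helps to confirm where `make install` actually put the shared library. Below is a minimal sketch, assuming CMake's default install prefix `/usr/local` and a library directory of either `lib` or `lib64` (which one depends on the distribution). The `find_double_conversion` helper is hypothetical and not part of double-conversion or Arrow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of libdouble-conversion.so under an
# install prefix, checking <prefix>/lib first and then <prefix>/lib64.
# The prefix defaults to /usr/local, CMake's default CMAKE_INSTALL_PREFIX on Linux.
find_double_conversion() {
  prefix="${1:-/usr/local}"
  for dir in "$prefix/lib" "$prefix/lib64"; do
    if [ -e "$dir/libdouble-conversion.so" ]; then
      echo "$dir/libdouble-conversion.so"
      return 0
    fi
  done
  echo "libdouble-conversion.so not found under $prefix" >&2
  return 1
}
```

For example, `find_double_conversion && sudo ldconfig` reports the library path and then refreshes the dynamic linker cache; on many distributions a freshly installed library under `/usr/local/lib` is not picked up at run time until `ldconfig` has been run.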

Build Apache Arrow and Gandiva (with Parquet and HDFS enabled)

mkdir -p arrow/cpp/build
cd arrow/cpp/build && rm -rf ./*   # wipe the old CMake cache; rm only runs if the cd succeeded
cmake -DARROW_GANDIVA_JAVA=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON -DARROW_HDFS=ON -DARROW_BOOST_USE_SHARED=ON ..
# Note: even when the cmake step above reports success, CMakeFiles/CMakeError.log may still contain errors.
# These are not fatal; the file also records configure-time checks that are expected to fail.
# So if the build fails, look for the cause in the terminal output, not in CMakeFiles/CMakeError.log.
make
make install
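Since the useful error messages end up in the terminal rather than in CMakeFiles/CMakeError.log, it can be convenient to keep a copy of that terminal output. A minimal sketch, assuming bash: `run_logged` mirrors a command's combined stdout/stderr to both the terminal and a log file while still returning the command's own exit status, and `first_build_error` prints the first line that looks like a CMake or compiler error. Both helper names and the log file names below are hypothetical, and the error patterns are only a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, showing its combined stdout/stderr on the
# terminal and saving it to a log file. pipefail is set inside a subshell so the
# function returns the command's exit status (not tee's) without changing the caller's shell options.
run_logged() {
  local log="$1"
  shift
  ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}

# Hypothetical helper: print the first line (with its line number) that looks like
# a configure error ("CMake Error ...") or a compiler/linker error ("...: error: ...").
first_build_error() {
  grep -n -m 1 -E 'CMake Error|error:' "$1"
}
```

For example, run `run_logged cmake-configure.log cmake -DARROW_GANDIVA_JAVA=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON -DARROW_HDFS=ON -DARROW_BOOST_USE_SHARED=ON ..` followed by `run_logged make.log make`; if a step fails, `first_build_error make.log` (or `cmake-configure.log`) points at the first line worth reading.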