Hadoop building from source
git clone https://github.com/apache/hadoop.git cd hadoop git checkout rel/release-3.2.0 # only build binary for hadoop mvn clean install -Pdist -DskipTests -Dtar # build binary and native library such as libhdfs.so for hadoop # mvn clean install -Pdist,native -DskipTests -Dtar
Using mvn install, compiled hadoop jar will be put into /root/.m2/repository/ for other projects depends on hadoop to use. Check below pic, now, we have hadoop-3.2.0 installed and can be used by continuelly spark compiling.
To run hadoop, binary path is /hadoop/hadoop-dist/target/hadoop-3.2.0/
git clone https://github.com/apache/spark.git cd spark # check spark supported hadoop version grep \<hadoop\.version\> -r pom.xml <hadoop.version>2.7.4</hadoop.version> <hadoop.version>3.2.0</hadoop.version> # so we should build spark specifying hadoop version as 3.2 ./build/mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.0 -DskipTests clean package # we can use package here since there is no following package depends on spark, other wise, also use install here.
Specify SPARK_HOME to spark path
use -o option when executing mvn to avoid online package update or installation.
./build/mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.0 -DskipTests clean package -o
pre install requirements to mvn repo – /root/.m2/repository to avoid slow mvn downloading.