Spark
1. Introduction
2. Installation
# Check the installed Java version (Spark 2.4 runs on Java 8)
java -version
# If Java 8 is not installed
sudo apt install --no-install-recommends openjdk-8-jre-headless -y
# Download Spark from https://spark.apache.org/downloads.html
# (older releases such as 2.4.0 are kept at https://archive.apache.org/dist/spark/)
wget http://apache.40b.nl/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
gunzip -c spark-2.4.0-bin-hadoop2.7.tgz | tar xvf -
rm spark-2.4.0-bin-hadoop2.7.tgz
# Install Spark
sudo mv spark-2.4.0-bin-hadoop2.7/ /usr/local/spark
echo "# For spark!" >> ~/.bashrc
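echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> ~/.bashrc
# Single quotes stop $PATH from being expanded now, so it is expanded each time .bashrc runs
echo 'export PATH=$PATH:/usr/local/spark/bin' >> ~/.bashrc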
source ~/.bashrc
3. Testing
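A quick way to check the installation is to confirm that Java 8 and the Spark launch scripts are both reachable. The script below is a minimal sketch, assuming the layout from the installation steps (Java 8, `/usr/local/spark/bin` on the `PATH`); `java_major_version` and `check_install` are hypothetical helper names. Note that `java -version` writes to stderr, and that Java 8 reports itself as `1.8`.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report "1.8.0_xxx"; Java 9+ report "11.0.2", "17", etc.
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: 1.8 -> 8
    return int(first)


def check_install():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if shutil.which("java") is None:
        problems.append("java not found on PATH")
    else:
        # `java -version` prints to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        major = java_major_version(result.stderr)
        if major != 8:
            problems.append(f"expected Java 8 for Spark 2.4, found {major}")
    # Assumption: these launch scripts live in /usr/local/spark/bin.
    for tool in ("spark-submit", "spark-shell", "pyspark"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH (is /usr/local/spark/bin on PATH?)")
    return problems


if __name__ == "__main__":
    issues = check_install()
    print("\n".join(issues) if issues else "Java 8 and Spark launch scripts found")
```

Once both checks pass, Spark's bundled example `run-example SparkPi 10` should finish with a line beginning `Pi is roughly`, and `pyspark` should open an interactive Python shell.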
4. PySpark
4.1 Setup
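The Spark download already contains PySpark under `$SPARK_HOME/python`, with its py4j dependency as a zip in `$SPARK_HOME/python/lib`. One way to use it from a plain Python interpreter is to put both on `sys.path`, which is roughly what the `findspark` package automates; installing the `pyspark` package with pip is an alternative. A minimal sketch, assuming Spark lives in `/usr/local/spark`; `spark_python_paths` and `add_spark_to_path` are hypothetical names.

```python
import glob
import os
import sys

# Assumption: Spark was unpacked to /usr/local/spark during installation.
DEFAULT_SPARK_HOME = "/usr/local/spark"


def spark_python_paths(spark_home):
    """Return the sys.path entries needed to import the pyspark bundled with Spark."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version changes between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_path(spark_home=DEFAULT_SPARK_HOME):
    """Prepend Spark's Python directories to sys.path, honouring an existing SPARK_HOME."""
    os.environ.setdefault("SPARK_HOME", spark_home)
    # Insert in reverse so the final order matches spark_python_paths().
    for path in reversed(spark_python_paths(os.environ["SPARK_HOME"])):
        if path not in sys.path:
            sys.path.insert(0, path)


if __name__ == "__main__":
    add_spark_to_path()
    from pyspark.sql import SparkSession

    # local[*] runs Spark inside this process with one worker thread per core.
    spark = SparkSession.builder.master("local[*]").appName("setup-check").getOrCreate()
    print(spark.version)
    spark.stop()
```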
4.2 Create DataFrame
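`SparkSession.createDataFrame` builds a DataFrame from local Python data; its schema argument can be a list of column names (types are then inferred), a `StructType`, or a datatype string. A minimal sketch, assuming `pyspark` is importable; the sample rows and the `create_people_df` helper are hypothetical.

```python
# Hypothetical sample data: each tuple becomes one row.
ROWS = [("Alice", 34), ("Bob", 45), ("Carol", 29)]
# Datatype string accepted by createDataFrame in place of a StructType.
SCHEMA = "name: string, age: int"


def create_people_df(spark, rows=ROWS, schema=SCHEMA):
    """Build a DataFrame from local rows with an explicit schema."""
    return spark.createDataFrame(rows, schema)


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("create-df").getOrCreate()
    df = create_people_df(spark)
    df.printSchema()  # name as string, age as integer
    df.filter(df.age > 30).show()  # keeps Alice and Bob

    # With only column names, Spark infers the types; Python ints become long.
    spark.createDataFrame(ROWS, ["name", "age"]).printSchema()
    spark.stop()
```

Passing an explicit schema avoids the inference pass and pins `age` to a 32-bit integer instead of the inferred `long`.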
5. Docker Image