Install Spark and Hadoop on Windows 10

Install Spark to run locally

Versions

java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)
sbt console
Welcome to Scala 2.12.13 (OpenJDK 64-Bit Server VM, Java 1.8.0_292).

Install Spark

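Download Spark from https://spark.apache.org/downloads.html and extract the archive under C:\Spark, so that the install folder is C:\Spark\spark-3.2.0-bin-hadoop2.7 (Spark 3.2.0 pre-built for Hadoop 2.7).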

Add a system environment variable SPARK_HOME with the value C:\Spark\spark-3.2.0-bin-hadoop2.7, then add %SPARK_HOME%\bin to the system PATH.

Install Hadoop

Download the Windows Hadoop binaries (winutils) from https://github.com/cdarlint/winutils. Copy the hadoop-2.7.7 folder to C:\Hadoop\hadoop-2.7.7. Add a system environment variable HADOOP_HOME with the value C:\Hadoop\hadoop-2.7.7, then add %HADOOP_HOME%\bin to the system PATH.
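
To confirm that SPARK_HOME, HADOOP_HOME and the two PATH entries took effect, a small check can be run from a freshly opened terminal. This is a minimal sketch in Python, which is not part of the original setup and assumes a Python 3 interpreter is available. The names check_home and CHECKS are hypothetical, and the file lists are assumptions about what a working install contains: spark-submit.cmd in Spark's bin folder, and winutils.exe plus hadoop.dll in the winutils bin folder.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence

# Assumed marker files per variable; adjust if your folders differ.
CHECKS = {
    "SPARK_HOME": ["spark-submit.cmd"],
    "HADOOP_HOME": ["winutils.exe", "hadoop.dll"],
}


def _norm(path: str) -> str:
    """Normalise a path so PATH entries compare equal regardless of case or trailing slashes."""
    return os.path.normcase(os.path.normpath(path))


def check_home(var_name: str,
               required_files: Sequence[str],
               env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems for one *_HOME variable; an empty list means it looks fine."""
    env = os.environ if env is None else env
    home = env.get(var_name)
    if not home:
        return [f"{var_name} is not set"]

    problems = []
    bin_dir = Path(home) / "bin"

    # Each expected file must exist inside <HOME>\bin.
    for name in required_files:
        if not (bin_dir / name).is_file():
            problems.append(f"missing {bin_dir / name}")

    # <HOME>\bin must be one of the PATH entries.
    path_entries = [_norm(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if _norm(str(bin_dir)) not in path_entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for var, files in CHECKS.items():
        issues = check_home(var, files)
        print(var, "OK" if not issues else issues)
```

Environment variables set through the Windows dialog are only visible to processes started afterwards, so run the check in a new terminal rather than one that was already open.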

Spark Structured Streaming with Kafka

https://spark.apache.org/docs/3.1.1/structured-streaming-kafka-integration.html
https://medium.com/expedia-group-tech/apache-spark-structured-streaming-checkpoints-and-triggers-4-of-6-b6f15d5cfd8d
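
The two links cover the Kafka source and the checkpoint and trigger settings. As a rough illustration of how they fit together, here is a minimal sketch that reads a Kafka topic and writes it out as Parquet. It uses PySpark purely for brevity (an assumption; the notes above use Scala through sbt, and the Scala API offers the same chain of calls). The function names, the broker address localhost:9092, the topic name events, the output folders and the 10 second trigger are all hypothetical. The default versions in kafka_package mirror the Scala 2.12 and Spark 3.2.0 versions recorded above.

```python
def kafka_package(scala_binary: str = "2.12", spark_version: str = "3.2.0") -> str:
    """Maven coordinate of the Kafka source, which is not bundled with the Spark download."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary}:{spark_version}"


def start_kafka_to_parquet(spark, servers, topic, out_dir, checkpoint_dir):
    """Start a streaming query that copies a Kafka topic into Parquet files.

    `spark` is an existing SparkSession; the returned object is a StreamingQuery.
    """
    source = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .load()
    )
    # Kafka delivers key and value as binary; cast them to strings for readability.
    decoded = source.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    return (
        decoded.writeStream
        .format("parquet")
        .option("path", out_dir)
        # The checkpoint folder stores offsets so a restart resumes where it stopped.
        .option("checkpointLocation", checkpoint_dir)
        # Hypothetical interval: start a micro-batch every 10 seconds.
        .trigger(processingTime="10 seconds")
        .start()
    )


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("kafka-to-parquet")
        .config("spark.jars.packages", kafka_package())
        .getOrCreate()
    )
    query = start_kafka_to_parquet(
        spark, "localhost:9092", "events",
        "C:/tmp/events/out", "C:/tmp/events/checkpoint",
    )
    query.awaitTermination()
```

Calling main() needs a reachable Kafka broker and the pyspark package. The Parquet files it produces in the output folder can then be opened with a Parquet viewer.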

Parquet viewer

https://github.com/mukunku/ParquetViewer

Access Azure Data Lake Storage Gen2

https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html
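
For orientation, the sketch below shows the shape of the two strings involved in account-key access: the abfss:// URI and the Spark configuration key that holds the storage account key. It is written in Python as an assumption, and the function names, the container name raw and the account name mystorage are hypothetical. It only illustrates the naming scheme. It also assumes the ABFS driver (hadoop-azure) is on the classpath, which to my knowledge ships with Hadoop 3.2 and later, so the Hadoop 2.7 build used above may not be enough on its own.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Name of the Spark/Hadoop setting that carries the storage account key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


def read_parquet_from_adls(spark, container, account, account_key, path):
    """Read a Parquet folder from ADLS Gen2 using an existing SparkSession."""
    spark.conf.set(account_key_conf(account), account_key)
    return spark.read.parquet(abfss_uri(container, account, path))


if __name__ == "__main__":
    print(abfss_uri("raw", "mystorage", "events/2021"))
    print(account_key_conf("mystorage"))
```

Keep the account key out of source control, for example by reading it from an environment variable before passing it in.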