Scala up and running

Installation

Java

Installed AdoptOpenJDK 11. Other distributions failed at runtime with HTTPS issues (apparently missing the local certificate store that ships with the JDK).

Scala

Installed sbt from the official site. To check the Scala version used by the build:

sbt scalaVersion

New project

Contents of build.sbt for a project with a subproject:

ThisBuild / version      := "0.1.0"
ThisBuild / scalaVersion := "2.13.8"
ThisBuild / organization := "gs"
ThisBuild / scalacOptions ++= Seq("-unchecked", "-deprecation") // so sbt compile shows deprecation warnings details

// add `% "provided"` to not include the dependency in the compiled JAR, as it is assumed that Databricks will already have this library on the classpath available when running in the cluster.
lazy val KafkaStreamProcessing = (project in file("."))
  .settings(
    name := "SparkProcessing",

    // spark 3.2 with hadoop 2.7 installed
    libraryDependencies ++= List("org.apache.spark" %% "spark-core" % "3.2.0",
                                "org.apache.spark" %% "spark-sql" % "3.2.0",
                                "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.2.0",
                                "org.apache.spark" %% "spark-avro" % "3.2.0", 
                                "org.apache.hadoop" % "hadoop-common" % "3.3.1",
                                "org.apache.hadoop" % "hadoop-azure" % "3.3.1"),
    
    libraryDependencies += "io.confluent" % "kafka-schema-registry-client" % "7.0.0" from "https://packages.confluent.io/maven/io/confluent/kafka-schema-registry-client/7.0.0/kafka-schema-registry-client-7.0.0.jar",


    libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.8" % Test,
    libraryDependencies += "com.typesafe" % "config" % "1.4.1", // for typed config

    assembly / assemblyJarName := s"${name.value}.jar", // this is for sbt-assembly plugin
    // https://github.com/sbt/sbt-assembly#merge-strategy
    assembly / assemblyMergeStrategy := { 
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _ => MergeStrategy.first
    }
  )

project/plugins.sbt

// sbt-assembly plugin to package in a single JAR
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

AppConfig.scala file

package bkr.data.spark

import com.typesafe.config.ConfigFactory

object AppConfig {
    
    private val environment: String = {
        val env = sys.env.get("env") // this returns Option[String]
        env.getOrElse("local") // to get the actual value from the Option[String], and return "ENV" if no value found
    }

    private val appConfig = ConfigFactory.load(s"app.$environment") // loads app.<env>.conf from resources; load() takes a basename and appends the .conf extension itself

    def apply() = appConfig
}
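
As a minimal usage sketch, assuming a hypothetical src/main/resources/app.local.conf such as:

app {
  name = "SparkProcessing"
}

config values can then be read anywhere through the factory:

val appName = AppConfig().getString("app.name") // "SparkProcessing"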

App.scala file

package bkr.scala.upAndRunning

object Application extends App {
    if (args.length == 0) throw new Exception("No arguments specified")

    val url = args(0)

    val response = _root_.scala.io.Source.fromURL(url).mkString // _root_ prefix needed: the scala segment in the package name shadows the top-level scala package

    println(response)
}

AppConfigTests.scala

import org.scalatest.funsuite._

class AppConfigTests extends AnyFunSuite {
  test("Hello should start with H") {
    assert("Hello".startsWith("H"))
  }
}
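
A sketch of a test that exercises AppConfig itself, assuming the hypothetical app.local.conf above is also visible on the test classpath and the env variable is not set:

import org.scalatest.funsuite._
import bkr.data.spark.AppConfig

class AppConfigLoadTests extends AnyFunSuite {
  test("AppConfig falls back to the local config when env is not set") {
    assert(AppConfig().getString("app.name") == "SparkProcessing")
  }
}
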
sbt "run arg0Value arg1Value"
java -jar .\app.jar arg0Value

About sbt-assembly plugin

See the Package in a single JAR file section below.

Project folders structure

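The standard layout sbt expects (matching the files above):

build.sbt
project/
  plugins.sbt
src/
  main/
    scala/       // application code (AppConfig.scala, App.scala)
    resources/   // config files (app.local.conf)
  test/
    scala/       // tests (AppConfigTests.scala)
target/          // build output, ignored by git
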
.gitignore

target/
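
Other entries commonly ignored in sbt projects, where they apply (.bsp/ is generated by sbt 1.4+, .idea/ by IntelliJ):

project/target/
project/project/
.bsp/
.idea/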

Sbt project dependency graph

Add plugin

// project/plugins.sbt
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

Then

sbt dependencyTree

Sbt commands cheatsheet

https://www.scala-sbt.org/1.x/docs/Command-Line-Reference.html

To enter the sbt shell, run

sbt

Alternatively, all of the commands can be run from a regular terminal by prepending sbt; e.g., the compile command becomes

sbt compile

Open scala console

console
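
The console starts a Scala REPL with the project classes on the classpath, so project code can be tried directly, e.g. with the AppConfig object above:

scala> bkr.data.spark.AppConfig()   // loads app.local.conf when the env variable is not set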

To exit the console

:q

Compile

sbt compile

Apply changes made to build.sbt

sbt reload

Run/Compile tests

sbt test
sbt test:compile

List all projects and subprojects

sbt projects

Compile subproject

The subproject name is not what appears as name in the settings, but the lazy val identifier declared in the build file (e.g. KafkaStreamProcessing above, not SparkProcessing). Or just run sbt projects to get the names.

sbt [SUBPROJECT_NAME]/compile
// e.g.
sbt helloCore/compile

Run the app

sbt run

Create a zip package with sbt dist and run

This produces a ZIP file containing all the JAR files needed to run your application, in the target/universal folder. To run the application, unzip the file on the target server and then run the script in the bin directory. The script is named after your application and comes in two versions: a bash shell script and a Windows .bat script.
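
Note that dist is not a built-in sbt command: it comes from the sbt-native-packager plugin (bundled with Play). For a plain sbt project, a sketch of what to add, with a plugin version that was current at the time of writing:

// project/plugins.sbt
addSbtPlugin("com.github.sbt" % "sbt-native-packager" % "1.9.9")

// build.sbt: add .enablePlugins(JavaAppPackaging) to the project definition above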

sbt dist

Then unzip it, e.g. into a publish folder, and in another terminal (cmd/PowerShell) run the .bat script in the bin folder

cd publish
.\bin\hello

A different configuration file can be specified for a production environment, from the command line:

./bin/hello -Dconfig.file=/full/path/to/conf/application.prod.conf

Package in a single JAR file

https://github.com/sbt/sbt-assembly

Added the sbt-assembly plugin in project/plugins.sbt as shown earlier (any .sbt file under project/ works, e.g. project/assembly.sbt), and configured it in build.sbt. To create the fat JAR, run

sbt assembly

With the build above, the JAR is created in target/scala-2.13/SparkProcessing.jar (the folder comes from scalaVersion, the file name from assemblyJarName). To run it:

java -jar SparkProcessing.jar

Dockerize the app

Dockerfile

FROM hseeberger/scala-sbt:8u302_1.5.5_2.13.6
WORKDIR /app
COPY . ./
RUN sbt compile
## Ideally, build the JAR and deploy it to a different runtime image... but just using this one for the moment
ENTRYPOINT ["sbt", "run"]