# Scala up and running
## Installation

### Java

Installed AdoptOpenJDK 11. Other distributions failed at runtime with HTTPS issues (apparently missing the local certificate store that is installed with Java).
### Scala

Installed sbt from the official site. To print the Scala version used by the build:

```bash
sbt scalaVersion
```
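Note that `sbt scalaVersion` prints the Scala version of the build; to check the version of sbt itself:

```bash
sbt --version    # version of the sbt launcher
sbt sbtVersion   # sbt version used by the current build
```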
## New project

- Create the project root folder.
- In the root folder, create a `build.sbt` file.
- Create a `[root]/src/main/scala/bkr/data/spark/App.scala` file, for a package named `bkr.data.spark`.
Content for `build.sbt`, for a project with a subproject:
```scala
ThisBuild / version := "0.1.0"
ThisBuild / scalaVersion := "2.13.8"
ThisBuild / organization := "gs"
ThisBuild / scalacOptions ++= Seq("-unchecked", "-deprecation") // so sbt compile shows deprecation warning details

// add `% "provided"` to a dependency to not include it in the compiled JAR, when it is assumed
// that Databricks will already have that library available on the classpath when running in the cluster
lazy val KafkaStreamProcessing = (project in file("."))
  .settings(
    name := "SparkProcessing",
    // spark 3.2 with hadoop 2.7 installed
    libraryDependencies ++= List(
      "org.apache.spark" %% "spark-core" % "3.2.0",
      "org.apache.spark" %% "spark-sql" % "3.2.0",
      "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.2.0",
      "org.apache.spark" %% "spark-avro" % "3.2.0",
      "org.apache.hadoop" % "hadoop-common" % "3.3.1",
      "org.apache.hadoop" % "hadoop-azure" % "3.3.1"
    ),
    libraryDependencies += "io.confluent" % "kafka-schema-registry-client" % "7.0.0" from "https://packages.confluent.io/maven/io/confluent/kafka-schema-registry-client/7.0.0/kafka-schema-registry-client-7.0.0.jar",
    // scalatest 3.1+ is required for AnyFunSuite, used in the tests below
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.10" % Test,
    libraryDependencies += "com.typesafe" % "config" % "1.4.1", // for typed config
    assembly / assemblyJarName := s"${name.value}.jar", // this is for the sbt-assembly plugin
    // https://github.com/sbt/sbt-assembly#merge-strategy
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _ => MergeStrategy.first
    }
  )
```
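For example (a sketch, assuming the Databricks cluster runtime already provides Spark), the Spark dependencies could be marked `provided` so that sbt-assembly leaves them out of the fat JAR:

```scala
// Sketch: exclude Spark from the assembled JAR; the cluster supplies these classes at runtime
libraryDependencies ++= List(
  "org.apache.spark" %% "spark-core" % "3.2.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.2.0" % "provided"
)
```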
`project/plugins.sbt` file:

```scala
// sbt-assembly plugin to package everything in a single JAR
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
```
`AppConfig.scala` file:

```scala
package bkr.data.spark

import com.typesafe.config.ConfigFactory

object AppConfig {
  private val environment: String = {
    val env = sys.env.get("env") // this returns Option[String]
    env.getOrElse("local")       // unwrap the Option[String], defaulting to "local" if the variable is not set
  }

  // loads from `resources/app.<environment>.conf`, e.g. `resources/app.local.conf`
  private val appConfig = ConfigFactory.load(s"app.$environment.conf")

  def apply() = appConfig
}
```
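A usage sketch (the `kafka.bootstrapServers` key and the `app.local.conf` content are hypothetical examples, not from the original):

```scala
// resources/app.local.conf (hypothetical):
//   kafka {
//     bootstrapServers = "localhost:9092"
//   }
val bootstrapServers = AppConfig().getString("kafka.bootstrapServers")
```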
`App.scala` file:

```scala
package bkr.data.spark

object Application extends App {
  if (args.length == 0) throw new Exception("No arguments specified")
  val url = args(0)
  val response = scala.io.Source.fromURL(url).mkString
  println(response)
}
```
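As a side note, the `Source` above is never closed; a minimal sketch of a safer variant using `scala.util.Using` (Scala 2.13+), not in the original:

```scala
import scala.util.{Failure, Success, Using}

// Using closes the Source for us and wraps any failure in a Try
Using(scala.io.Source.fromURL(url))(_.mkString) match {
  case Success(body) => println(body)
  case Failure(ex)   => Console.err.println(s"Request failed: ${ex.getMessage}")
}
```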
`AppConfigTests.scala` file:

```scala
import org.scalatest.funsuite._

class AppConfigTests extends AnyFunSuite {
  test("Hello should start with H") {
    assert("Hello".startsWith("H"))
  }
}
```
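To run only this suite instead of the whole test set:

```bash
sbt "testOnly *AppConfigTests"
```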
To pass arguments:

```bash
sbt "run arg0Value arg1Value"
java -jar .\app.jar arg0Value
```
## About the sbt-assembly plugin

See *Package in a single JAR file* later on.
## Project folder structure

- root/
  - build.sbt
  - project/
    - plugins.sbt
  - src/
    - main/
      - resources/
        - app.local.conf
      - scala/bkr/data/spark/
        - App.scala
        - AppConfig.scala
    - test/
      - resources/
        - app.local.conf
      - scala/bkr/data/spark/
        - AppConfigTests.scala
`.gitignore`:

```
target/
```
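A slightly fuller ignore list commonly used for sbt projects (a suggestion, not from the original setup):

```
target/
project/target/
.bsp/
.idea/
.metals/
.bloop/
```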
## Sbt project dependency graph

Add the plugin:

```scala
// project/plugins.sbt
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
```

Then:

```bash
sbt dependencyTree
```
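On sbt 1.4+ the dependency-tree functionality ships with sbt itself, so instead of the plugin above it can be enabled with:

```scala
// project/plugins.sbt (sbt 1.4+ only)
addDependencyTreePlugin
```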
## Sbt commands cheatsheet

https://www.scala-sbt.org/1.x/docs/Command-Line-Reference.html

To enter the sbt shell, run:

```bash
sbt
```

Alternatively, all of the commands below can be run from a regular terminal by prepending `sbt`; e.g., the command `compile` becomes:

```bash
sbt compile
```
### Open the Scala console

```
console
```

To exit the console:

```
:q
```
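Inside the console, expressions are evaluated interactively, e.g.:

```scala
scala> List(1, 2, 3).map(_ * 2)
val res0: List[Int] = List(2, 4, 6)
```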
### Compile

```bash
sbt compile
```
### Apply changes made to build.sbt

```bash
sbt reload
```
### Run/compile tests

```bash
sbt test
sbt test:compile
```
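Prefixing a command with `~` re-runs it on every source change, e.g.:

```bash
sbt ~test
```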
### List all projects and subprojects

```bash
sbt projects
```
### Compile a subproject

The subproject name is not what appears as `name` in the settings, but the variable name declared in the build file. Or just run `projects` to get the names.

```bash
sbt [SUBPROJECT_NAME]/compile
# e.g.
sbt helloCore/compile
```
### Run the app

```bash
sbt run
```
## Create a ZIP package with sbt dist, and run it

This produces a ZIP file containing all the JAR files needed to run the application, in the `target/universal` folder (the `dist` task is provided by Play's sbt plugin, built on sbt-native-packager). To run the application, unzip the file on the target server and run the script in the `bin` directory. The name of the script is the application name, and it comes in two versions: a bash shell script and a Windows `.bat` script.

```bash
sbt dist
```
Then unzip, e.g. into a `publish` folder, and in another terminal (cmd/PowerShell) run the `.bat` script in the `bin` folder:

```powershell
cd publish
.\bin\hello
```
A different configuration file can be specified for a production environment from the command line:

```bash
/bin/hello -Dconfig.file=/full/path/to/conf/application.prod.conf
```
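With Typesafe Config, individual keys can also be overridden through JVM system properties, since `ConfigFactory.load` applies them as overrides (the key below is a hypothetical example):

```bash
/bin/hello -Dkafka.bootstrapServers=broker:9092
```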
## Package in a single JAR file

https://github.com/sbt/sbt-assembly

Add the sbt-assembly plugin in the `project/plugins.sbt` file and configure it in `build.sbt`, as shown above.
To create the fat JAR file, run:

```bash
sbt assembly
```

The JAR is created under `root/target/scala-2.13/`, named per the `assemblyJarName` setting (`SparkProcessing.jar` for the build above). To run it:

```bash
java -jar SparkProcessing.jar
```
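If sbt-assembly cannot determine a unique main class, it can be set explicitly in `build.sbt` (a sketch, using the `Application` object from above):

```scala
assembly / mainClass := Some("bkr.data.spark.Application")
```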
## Dockerize the app

`Dockerfile`:

```dockerfile
FROM hseeberger/scala-sbt:8u302_1.5.5_2.13.6
WORKDIR /app
COPY . ./
RUN sbt compile
# Ideally, build the JAR and deploy it to a slimmer runtime image, but this one will do for the moment
ENTRYPOINT ["sbt", "run"]
```
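Build and run sketch (the image tag is a hypothetical example):

```bash
docker build -t spark-processing .
docker run --rm spark-processing
```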