Introduction to Spark 3 with Scala: Lab Setup Instructions (Linux: Your Environment)
Below are the standard requirements for this course. If you have any questions or issues, please contact us.
Important Note: Student lab files are required on each computer used for the course. The links for these are not in this lab setup, and you should receive them separately.
Other notes:
- It’s a good idea to keep downloaded software install files on the machines during the class in case of problems that require a re-install.
- Cloning a setup is generally not a problem. If it is, we’ll mention it in the software section (for example, much of the IBM/RAD-WAS software can be problematic in this regard).
Linux: Hardware and classroom setup
Each student and the instructor shall have a work environment that fulfills the listed requirements.
- RAM: 8GB recommended
- Disk Space: Free disk space for software installs (5GB is sufficient)
- Operating System: Linux
- We assume that you know how to set up and administer your Linux system.
- We can briefly review a setup once it's done, but we do not have the resources to set up or troubleshoot your Linux installations.
- Note that the setup is relatively standard, with standard software packages.
- Any relatively recent Linux system you are comfortable with should work.
- It must have the required software and equivalent environment setup.
- Again, we do not have the resources to provide setup or troubleshooting support for other Linux variants. We'll do our best to help if you have questions/problems, but may not have the expertise.
- When installing, consider the following choices.
- Root Password:
- You can use any password you like, as long as whoever needs it (e.g. the instructor or system manager) knows what it is.
- For example, set to password123 if you need something easily accessible
- User Creation:
- Make sure to create a student user that is easily used - tailor this to your environment
- e.g. Create a user student with password of password123
- You can use a different user/password as long as students/instructor are aware of what it is, and can use it where needed.
- Note: For specific environments (e.g. running as a virtual machine under another environment) you may need to do specific setup.
- We assume that you know what you need to do for this, and can't support these many possible environments.
- Recommended: Internet access
- It's best to provide internet access to the student machines.
- If this is not feasible for your environment, please contact us to ensure that everything works.
- Required: Adobe Acrobat Reader
- Required: Either the Firefox browser (https://www.mozilla.org/en-US/firefox/new/) or the Chrome browser (https://www.google.com/chrome/).
- Required: An editor for editing lab files (e.g. Java files, or maven POM files).
- If NOT using an IDE, then the editor should be as capable as possible for your environment.
- For example, vim is a more capable editor than nano or vi.
- If using an IDE (e.g. Eclipse or IntelliJ), which one you use is not important, as long as it's easily available.
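The RAM and disk figures in the list above can be checked from a terminal. The following is a minimal sketch and not part of the official setup: the function names are made up for illustration, it assumes /proc/meminfo and a POSIX-style df are available, and the 7000 MiB RAM threshold is an assumption chosen because the kernel reports slightly less than the installed 8GB.

```shell
#!/usr/bin/env bash
# Report RAM and free disk space against the figures in the list above.
# Function names and the RAM threshold are illustrative assumptions.

# Total RAM in MiB, read from /proc/meminfo (the value there is in KiB).
total_ram_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Free space in MiB on the filesystem that holds the given directory.
free_disk_mib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Print OK or LOW depending on whether a value reaches a minimum.
check_minimum() {
  local label="$1" value="$2" minimum="$3"
  if [ "$value" -ge "$minimum" ] 2>/dev/null; then
    echo "OK: $label is $value MiB"
  else
    echo "LOW: $label is $value MiB (want at least $minimum MiB)"
  fi
}

# 8GB RAM is recommended; 7000 MiB is an assumed, slightly relaxed threshold.
check_minimum "RAM" "$(total_ram_mib)" 7000
# 5GB of free disk space is listed as sufficient (5 * 1024 MiB).
check_minimum "Free disk under $HOME" "$(free_disk_mib "$HOME")" 5120
```

A LOW line is only a warning; the script reports values and does not stop or change anything.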
Lab Files: Each student and instructor must have lab files installed (links to these files are generally sent separately via e-mail).
- Extract the lab files to a location conveniently accessible to the student (generally the student’s home directory - e.g. /home/student)
- Make sure that students/instructor know where they are and can freely access them.
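As an illustration of the extraction step, here is a hedged sketch of a small helper. The lab file archive name and format are not given in this document, so the helper name extract_labs, the archive path in the usage comment, and the handled extensions (.zip, .tar.gz, .tgz) are all assumptions; adjust them to the files you actually receive.

```shell
#!/usr/bin/env bash
# Extract a lab archive into a destination directory.
# extract_labs is a hypothetical helper; the supported formats are assumptions.

extract_labs() {
  local archive="$1" dest="$2"
  if [ ! -f "$archive" ]; then
    echo "Archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
    *.zip)          unzip -q "$archive" -d "$dest" ;;
    *)              echo "Unrecognized archive type: $archive" >&2
                    return 1 ;;
  esac
}

# Example call (hypothetical archive name, extracting into the home directory):
# extract_labs "$HOME/Downloads/spark-labs-scala.zip" "$HOME"
```

Run the helper as the student user so that the extracted files are owned by, and accessible to, that account.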
Other instructor requirements for the classroom
- Capability to display presentation slides or code examples.
- For virtual environments: Generally some type of screen sharing capability.
- For physical in-person classes:
- Projector or large screen TV capable of 1280x800 or higher resolution. Instructor must be able to use this to project slides.
- Whiteboard (preferred) or flip charts with markers.
Install Java Development Kit – JDK 11 (11.0.x)
- Note that any JDK 11 version should work fine. Other close (later) Java versions may work, but have not been tested. Please contact us if you have an issue with using Java 11.
- Removing existing Java and installing Java 11:
- Many recent versions of Linux come pre-installed with Java 11. If you already have Java 11 installed, you can skip this step and go on to "Find Java install location".
Check if you have Java 11 by opening a terminal window, and typing the following. If you see some variation of the output that indicates you have Java 11 installed, then skip this step.
$ java -version
openjdk version "11.0.13" 2021-10-19 LTS
Otherwise, uninstall the existing Java and install the latest version of Java 11. On our Linux system (which uses yum), we did this as follows.
$ sudo yum -y remove 'java*'
$ sudo yum -y install java-11-openjdk-devel
$ sudo alternatives --config java    # select the Java 11 option (usually option '2'), then press Enter to save
$ sudo alternatives --config javac   # select the Java 11 option (usually option '2'), then press Enter to save
- Continue here whether or not you had to install Java 11.
- Find Java Install location.
- It can be found as follows (sample output from our system shown).
$ readlink -f $(which java)
/usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/java
- On our installation, it was under /usr/lib/jvm/java-11-openjdk-nnnn (nnnn depends on version).
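- Edit and save the student user's .bash_profile to set the JAVA_HOME environment variable to your Java install location. The change takes effect at the next login (or after running source ~/.bash_profile in the current shell). In our install, the line looked like this: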
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64
- Open a terminal window and test the install as follows. Only the first line of the expected output is shown.
$ java -version
openjdk version "11.0.13" 2021-10-19 LTS
$ javac -version
javac 11.0.13
- If this all works, you are done.
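The checks in this section can also be scripted. The sketch below is one possible way to do it, not an official part of the setup: the function names are invented for illustration, and it assumes a java -version line of the form shown above and a JDK layout where the java binary sits in bin/ directly under the install directory.

```shell
#!/usr/bin/env bash
# Sanity checks for the Java 11 install described above.
# Function names are illustrative assumptions.

# Pull the major version out of a `java -version` line,
# e.g. 'openjdk version "11.0.13" 2021-10-19 LTS' gives 11.
java_major_from_line() {
  echo "$1" | sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p'
}

# Derive a JAVA_HOME candidate from the resolved path of the java binary
# by dropping the trailing /bin/java.
java_home_from_binary() {
  dirname "$(dirname "$1")"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it before filtering.
  version_line=$(java -version 2>&1 | grep -m 1 'version "')
  if [ "$(java_major_from_line "$version_line")" = "11" ]; then
    echo "OK: $version_line"
  else
    echo "Java 11 is not the active version: $version_line"
  fi
  resolved=$(readlink -f "$(command -v java)")
  echo "JAVA_HOME candidate: $(java_home_from_binary "$resolved")"
else
  echo "java was not found on the PATH"
fi

# JAVA_HOME must be set and must point at a JDK (one that has bin/javac).
if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/javac" ]; then
  echo "OK: JAVA_HOME=$JAVA_HOME contains bin/javac"
else
  echo "JAVA_HOME is unset or does not contain bin/javac"
fi
```

The script only prints its findings; it does not modify .bash_profile or the alternatives configuration.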
Spark-Shell Environment Setup
- Edit/save student user's .bash_profile to set the following environment variables
- Important Note: The settings below assume that the lab setup was extracted to $HOME (which should point to the student home directory). If the lab setup was extracted elsewhere, then make sure to set SPARK_LABS to the location consistent with your environment.
export SPARK_LABS=$HOME/spark-labs-scala
export SPARK_HOME=$SPARK_LABS/spark
export KAFKA_HOME=$SPARK_LABS/kafka
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$KAFKA_HOME/bin
- This will set up the environment appropriately to run the Spark labs.
- Test the setup by opening a terminal window (logged in as the student user) and running spark-shell. We illustrate this below, with sample output.
$ spark-shell
... Warnings and logging omitted ...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://mycomputer2574:4040
Spark context available as 'sc' (master = local[*], app id = local-1655209447467).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.5)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
- If you see the above, then you're all done. If you see errors and do not reach the scala> prompt, the setup is incomplete: recheck the environment variables above, and contact us if the problem persists.
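If spark-shell does not start, a quick pre-flight check of the environment variables from this section can narrow down the cause. This is a minimal sketch under stated assumptions: check_dir is a made-up helper name, and the script only tests that the directories exist and that spark-shell resolves on the PATH; it does not start Spark.

```shell
#!/usr/bin/env bash
# Pre-flight check for the Spark lab environment variables set above.
# check_dir is a hypothetical helper used only for this illustration.

# Report whether the directory a variable points to exists.
check_dir() {
  local name="$1" dir="$2"
  if [ -n "$dir" ] && [ -d "$dir" ]; then
    echo "OK: $name=$dir"
  else
    echo "MISSING: $name=$dir"
  fi
}

check_dir SPARK_LABS "${SPARK_LABS:-}"
check_dir SPARK_HOME "${SPARK_HOME:-}"
check_dir KAFKA_HOME "${KAFKA_HOME:-}"

# spark-shell should resolve through the PATH entry added in .bash_profile.
if command -v spark-shell >/dev/null 2>&1; then
  echo "OK: spark-shell found at $(command -v spark-shell)"
else
  echo "MISSING: spark-shell is not on the PATH"
fi
```

Run it as the student user in a fresh login shell so that it sees the same .bash_profile settings that spark-shell will use.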