Introduction to Spark 3 with Scala: Lab Setup Instructions (Windows OS: Java 11)
Below are the standard requirements for this course. If you have any questions or issues, please contact us.
Important Note: Student lab files are required on each computer used for the course. The links for these are not in this lab setup, and you should receive them separately.
Other notes:
- It’s a good idea to keep downloaded software install files on the machines during the class in case of problems that require a re-install.
- Cloning a setup is generally not a problem. If it is, we’ll mention it in the software section (for example, much of the IBM/RAD-WAS software can be problematic in this regard).
Hardware and classroom setup.
Each student and the instructor shall have a workstation that fulfills the listed requirements.
- Required: Intel-compatible processor (with reasonably recent hardware).
- Memory: 8 GB minimum recommended
- Disk Space: Free disk space for software installs (generally minimum 2GB)
- Operating System: Windows (any modern version, e.g. Windows 10; labs have not been tested on Windows 8 variants)
- Required: Zip utility. A good free one is 7-zip
- Required: Adobe Acrobat Reader
- Required: One of Firefox browser (https://www.mozilla.org/en-US/firefox/new/) or Chrome browser (https://www.google.com/chrome/). Edge browser is not sufficient.
- Recommended: Internet access
- Recommended: Class machines networked together - allows students to access a shared network directory.
Install 7-zip
We’ve found that there are sometimes problems using the built-in Windows archive/zip utility, generally because it can’t handle long path lengths. Use 7-Zip, which we’ve found very reliable, to extract the labs and any software zips.
- You can try the direct download link for the 64-bit installer: https://www.7-zip.org/a/7z2301-x64.exe
- If that doesn’t work, go to the home page: https://www.7-zip.org
- Near the top of the page, find the download link for your bitness (probably 64 bit), and download the installer.
- Execute the installer, and take all the defaults.
- You can now extract zip files by right-clicking them and selecting 7-Zip | Extract ...
Lab Files: Each student and instructor must have lab files installed (links to these files are generally sent separately via e-mail).
- Extract the lab files to a location conveniently accessible to the student (e.g. C:\ )
- We recommend using a utility like 7-Zip rather than the Windows built-in extractor.
- If you use a folder other than C:\, make sure that students know where the lab files are.
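The long-path problem mentioned in the 7-Zip section can be checked after extraction. What follows is a minimal Java sketch, not part of the lab files: it needs the JDK from the Java install section, and the class name LongestPath and the 260-character threshold (the classic Windows MAX_PATH limit) are assumptions chosen for illustration. It walks a folder and prints the longest absolute path it finds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Illustrative helper (hypothetical name), not part of the lab files.
final class LongestPath {
    // Classic Windows MAX_PATH limit; assumed here as the warning threshold.
    static final int ASSUMED_LIMIT = 260;

    // Returns the longest absolute path found under root (root itself is included).
    static Optional<Path> longest(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .map(Path::toAbsolutePath)
                .max(Comparator.comparingInt((Path p) -> p.toString().length()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the extraction folder as the first argument; defaults to the current folder.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        longest(root).ifPresent(p -> {
            int length = p.toString().length();
            System.out.println(length + " characters: " + p);
            if (length >= ASSUMED_LIMIT) {
                System.out.println("Warning: at or above the assumed "
                    + ASSUMED_LIMIT + "-character limit.");
            }
        });
    }
}
```

With JDK 11, a single source file like this can be run directly from a command prompt, for example: java LongestPath.java C:\spark-labs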
Other instructor requirements for the classroom
- Projector or large screen TV capable of 1280x800 or higher resolution. Instructor must be able to use this to project slides.
- Whiteboard (preferred) or flip charts with markers.
Install Java Development Kit – JDK 11 (11.0.24)
- Note that any JDK 11 version should work fine. Other close (later) Java versions (e.g. Java 12 or 13) should be fine also, but have not been tested.
- Download:
- From https://www.oracle.com/java/technologies/javase/jdk11-archive-downloads.html download the Windows x64 Installer file
- File name is something like: jdk-11.0.24_windows-x64_bin
- Run the installer and take all defaults.
- Create or modify environment variables: add a JAVA_HOME environment variable, and modify your path to include the JDK bin folder. On Windows, set the following.
- JAVA_HOME:
- Navigate to the System Properties widget > click the Advanced tab > click the Environment Variables button
- In the bottom half of the dialog, click New to add a new System variable
- Variable name: JAVA_HOME (this is case-sensitive)
- Variable value: C:\Program Files\Java\jdk-11.0.24 (or adjust to the actual path for your JDK version and where you installed the JDK – please double-check this path – probably best to copy and paste it)
- Click OK
- Path:
- Find this existing entry in the bottom half of the Environment Variables dialog, and click Edit
- Click in the Variable value field and move your cursor all the way to the left (pressing Home on your keyboard should do this quickly for you)
- Check whether the value below is already present, or add it at the beginning if necessary (make sure you get all of this, including the trailing semicolon, with no spaces):
%JAVA_HOME%\bin;
- Click OK repeatedly (likely in 3 different dialogs) until all the dialogs close.
- Open a new terminal prompt (one opened before the changes will not see the new variables), type the command below, and press Enter
javac -version
- You should get a message that tells you the version (e.g. javac 11.0.24). If the command is not found, recheck the JAVA_HOME value and the Path entry set above.
- Close the terminal prompt. You’re done installing Java.
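Beyond javac -version, the JAVA_HOME and Path settings can be checked from Java itself. The sketch below is a hedged illustration, not part of the course material: the class name JavaSetupCheck and its helper methods are hypothetical, and it assumes Windows has already expanded %JAVA_HOME% in the Path value that the process sees.

```java
import java.io.File;
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name), not part of the lab files.
final class JavaSetupCheck {
    // Extracts the major version from strings such as "11.0.24" or the older "1.8.0_301".
    static int majorVersion(String version) {
        String[] parts = version.split("[^0-9]+");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // Trims spaces and trailing slashes/backslashes so entries compare cleanly.
    static String normalize(String entry) {
        return entry.trim().replaceAll("[\\\\/]+$", "");
    }

    // True if the separator-delimited path value contains the entry (case-insensitive).
    static boolean pathContains(String pathValue, String entry, String separator) {
        String wanted = normalize(entry);
        return Arrays.stream(pathValue.split(Pattern.quote(separator)))
            .map(JavaSetupCheck::normalize)
            .anyMatch(wanted::equalsIgnoreCase);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("Running on Java " + version
            + " (major " + majorVersion(version) + ")");

        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.trim().isEmpty()) {
            System.out.println("JAVA_HOME is not set");
            return;
        }
        // Assumption: %JAVA_HOME% appears already expanded in the process's path.
        String path = System.getenv("PATH");
        String binDir = normalize(javaHome) + File.separator + "bin";
        boolean onPath = path != null && pathContains(path, binDir, File.pathSeparator);
        System.out.println(binDir + (onPath ? " is on the path" : " is NOT on the path"));
    }
}
```

Run it from a newly opened command prompt with: java JavaSetupCheck.java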
Environment Setup
These instructions assume you’ve extracted the lab setup to C:\, creating a C:\spark-labs folder with the lab files. If you’ve extracted it elsewhere, change the instructions accordingly.
NOTE: If you’ve extracted the lab folder to a folder with a path name longer than C:\spark-labs, you must copy the spark-labs\kafka folder to c:\ (so there is a c:\kafka folder) or to something with a similarly short path name. Otherwise you may break the Kafka install.
- Create the following environment variables
- set SPARK_HOME=c:\spark-labs\spark
- set HADOOP_HOME=c:\spark-labs\winutils
- set KAFKA_HOME=C:\spark-labs\kafka (or the correct path, if you copied the kafka folder to a different short path as described above)
- Add the following to your path
- %SPARK_HOME%\bin
- %SPARK_HOME%\sbin (that’s sbin)
- %KAFKA_HOME%\bin\windows
- Add the Visual C++ Redistributable for needed DLLs
- Test the install of the VC++ components
- Open a command shell in C:\spark-labs (not a PowerShell) and run the following command
C:\spark-labs>winutils\bin\winutils.exe
- You should NOT get a Windows dialog about a missing DLL. If you do, recheck the Visual C++ Redistributable install.
- Test/Initialize the spark-shell install and the OS path
- Open a command shell in C:\spark-labs (not a PowerShell)
- Run the following command
C:\spark-labs> spark-shell
- The spark shell should come up with a scala> prompt.
- Exit the spark shell by typing Ctrl-D (pressing Ctrl key and d key at same time). It should exit cleanly.
- Run the following command in the same command prompt (You CAN'T copy paste all of it from here as noted below).
- It should download some needed jars, then start up cleanly. We do this to make sure the jars are stored locally in case students have issues using the internet in class.
- Exit the spark shell by typing Ctrl-D.
- You’re done testing this.
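If one of the tests above fails, it can help to confirm the environment variables themselves. The following is a minimal Java sketch under stated assumptions: the class name LabEnvCheck is hypothetical, the variable names come from the steps above, and the winutils.exe location under HADOOP_HOME is inferred from the winutils test command rather than stated anywhere in the course material.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustrative helper (hypothetical name), not part of the lab files.
final class LabEnvCheck {
    // Variable names taken from the Environment Setup steps.
    static final List<String> REQUIRED =
        Arrays.asList("SPARK_HOME", "HADOOP_HOME", "KAFKA_HOME");

    // Returns the names whose lookup yields null or a blank value.
    static List<String> missing(List<String> names, Function<String, String> lookup) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String value = lookup.apply(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(REQUIRED, System::getenv);
        if (!unset.isEmpty()) {
            System.out.println("Not set: " + unset);
            return;
        }
        // Locations the setup steps rely on; the winutils.exe location is an inference.
        Path[] expected = {
            Paths.get(System.getenv("SPARK_HOME"), "bin"),
            Paths.get(System.getenv("SPARK_HOME"), "sbin"),
            Paths.get(System.getenv("HADOOP_HOME"), "bin", "winutils.exe"),
            Paths.get(System.getenv("KAFKA_HOME"), "bin", "windows")
        };
        for (Path p : expected) {
            System.out.println((Files.exists(p) ? "found   " : "MISSING ") + p);
        }
    }
}
```

Run it from a newly opened command prompt with: java LabEnvCheck.java. Any line marked MISSING points at a variable whose value does not match where the files were extracted.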
Install Cygwin
- Go to install page https://www.cygwin.com/install.html
- This also includes the download link - which we give here: https://www.cygwin.com/setup-x86_64.exe
- Execute the installer, and you will eventually get to Select packages dialog where you can select packages to install (most of Cygwin is not installed in the default install)
- In the Select packages dialog, make sure the “View” drop down in the upper left is set to Category
- Expand the Net category, and find the nc item.
- Click the down arrow associated with the nc item, and select the entry that looks like a version number (e.g. 1.107-4)
- Click Next through all remaining dialogs. When you reach the dialog that asks about shortcuts, choose to create them on the Start menu and the desktop.
Test Cygwin
- To test, start Cygwin from the desktop shortcut.
- Once the window comes up, execute the following commands.
cd C:
ls
- You should get a directory listing of C:\
Install Notepad++