Introduction to Spark 2 with Python: Lab Setup Instructions (Windows OS: Java 8, Python 3.6, Spark 2.4.6)
Below are the standard requirements for this course. If you have any questions or issues, please contact us.
Important Note: Student lab files are required on each computer used for the course. The links for these are not in this lab setup, and you should receive them separately.
Other notes:
- It’s a good idea to keep downloaded software install files on the machines during the class in case of problems that require a re-install.
- Cloning a setup is generally not a problem. If it is, we’ll mention it in the software section (for example, much of the IBM/RAD-WAS software can be problematic in this regard).
Hardware and classroom setup.
Each student and the instructor shall have a workstation that fulfills the listed requirements.
- Required: Intel-compatible processor (with reasonably recent hardware).
- Memory: 8GB min recommended
- Disk Space: Free disk space for software installs (generally minimum 2GB)
- Operating System: Windows (any modern version, e.g. Windows 10; the labs have not been tested on Windows 8 variants)
- Required: Zip utility. A good free one is 7-zip
- Required: Adobe Acrobat Reader
- Required: One of Firefox browser (https://www.mozilla.org/en-US/firefox/new/) or Chrome browser (https://www.google.com/chrome/). Edge browser is not sufficient.
- Recommended: Internet access
- Recommended: Class machines networked together - allows students to access a shared network directory.
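The disk space requirement above can be checked quickly before installing anything. The following is a minimal sketch using only the Python standard library; the function names are ours, and the 2 GB figure is taken from the list above. It needs a working Python, so on a fresh machine it only applies once Python is installed.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def free_space_gib(path="."):
    """Return the free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_free_space(path=".", required_gib=2.0):
    """True if the drive holding `path` has at least `required_gib` free."""
    return free_space_gib(path) >= required_gib


if __name__ == "__main__":
    # On a class machine you would normally pass the install drive, e.g. "C:\\".
    print(f"Free space: {free_space_gib():.1f} GiB")
    print("OK" if has_enough_free_space() else "Less than 2 GiB free")
```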
Install 7-zip
We’ve found that the built-in Windows archive/zip utility sometimes has problems, generally because it can’t handle long path lengths. Use 7-zip, which we’ve found very reliable, to extract the labs and any software zips.
- Try the direct download link for the 64-bit installer: https://www.7-zip.org/a/7z2301-x64.exe
- If that doesn’t work, go to the home page https://www.7-zip.org
- Near the top of the page, find the download link for your bitness (probably 64 bit), and download the installer.
- Execute the installer, and take all the defaults.
- You can now extract zip files by right clicking on them, and selecting 7-Zip | Extract ...
Lab Files: Each student and instructor must have lab files installed (links to these files are generally sent separately via e-mail).
- Extract the lab files to a location conveniently accessible to the student (e.g. C:\ )
- We recommend using a utility like 7-zip, not the Windows built-in extractor.
- If you use a folder other than C:\, make sure that students know where the files are.
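To see whether a given archive is likely to run into the long-path problem mentioned in the 7-zip section, you can list the entries whose extracted path would reach the classic Windows limit of 260 characters (MAX_PATH). This is a rough sketch, not part of the official setup: the function names are ours, and the archive name and target folder you pass in are whatever applies to your class.

```python
import zipfile

MAX_PATH = 260  # classic Windows limit; it includes the terminating NUL


def names_exceeding_max_path(names, target_dir, limit=MAX_PATH):
    """Return the entry names whose extracted path would be too long."""
    base = target_dir.rstrip("\\/")
    too_long = []
    for name in names:
        # Build the path as Windows would see it after extraction.
        full_path = base + "\\" + name.replace("/", "\\")
        if len(full_path) >= limit:
            too_long.append(name)
    return too_long


def zip_entries_exceeding_max_path(zip_path, target_dir, limit=MAX_PATH):
    """Same check, reading the entry names from a zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        return names_exceeding_max_path(archive.namelist(), target_dir, limit)
```

An empty result for your lab archive and target folder suggests that path length alone will not cause trouble, whichever extractor is used.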
Other instructor requirements for the classroom
- Projector or large screen TV capable of 1280x800 or higher resolution. Instructor must be able to use this to project slides.
- Whiteboard (preferred) or flip charts with markers.
Install Java Development Kit – JDK 1.8 Update 411
- Note that any relatively recent JDK1.8 version is fine.
- Note that you'll need a free Oracle logon to download the JDK.
- From https://www.oracle.com/java/technologies/javase/javase8u211-later-archive-downloads.html find the latest Java 8 release installer file for your OS
- Windows 64 bit: e.g. jdk-8u411-windows-x64.exe (Almost certainly this is the one you want. 32-bit Windows OS installs are now rare).
- Click the link for the installer file, accept the license agreement, enter your login credentials, and download the installer.
- Run the installer and take all defaults.
- Create or modify environment variables as appropriate for your OS: add an environment variable JAVA_HOME, and modify your path to include the JDK bin folder.
- Open a terminal prompt, type the command below, and press Enter
javac -version
You should get a message that tells you the version. If the command is not found, check that JAVA_HOME is set and that the JDK bin folder is on your path.
- Close the terminal prompt. You’re done installing Java.
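If you are setting up many machines, the javac check above can also be scripted. The sketch below is one possible way to do it, assuming Python is already available on the machine; the function names are ours. It relies on the fact that Java 8 reports its version in the form 1.8.0_NNN, and it avoids anything newer than Python 3.6.

```python
import re
import shutil
import subprocess


def parse_javac_version(output):
    """Pull the version string out of `javac -version` output, or return None."""
    match = re.search(r"javac\s+(\S+)", output)
    return match.group(1) if match else None


def is_java8(version):
    """True for Java 8 version strings, which start with 1.8 (e.g. 1.8.0_411)."""
    return version is not None and version.startswith("1.8.")


def installed_javac_version():
    """Return the version of the javac found on the path, or None if there is none."""
    if shutil.which("javac") is None:
        return None
    result = subprocess.run(
        ["javac", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6
    )
    # JDK 8 writes the version to stderr, newer JDKs to stdout, so check both.
    return parse_javac_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_javac_version()
    if version is None:
        print("javac not found: check JAVA_HOME and the path")
    elif is_java8(version):
        print(f"javac {version} looks fine")
    else:
        print(f"javac {version} found, but this course expects Java 8")
```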
PySpark Environment Setup
- Set the following Environment Variables using the standard Windows dialogs
- PYSPARK_PYTHON=C:\Users\[CurrentUserName]\Python36
- Make sure this is appropriate for your environment - this is where our setup says to put the Python install
- SPARK_LABS=C:\spark-labs-python
- SPARK_HOME=%SPARK_LABS%\spark
- KAFKA_HOME=%SPARK_LABS%\kafka
- HADOOP_HOME=%SPARK_LABS%\winutils
- Add the following to the PATH
- %SPARK_HOME%\bin
- %SPARK_HOME%\sbin
- %KAFKA_HOME%\bin\windows
- Add the Visual C++ Redistributable for needed DLLs
- Test the install of the VC++ components
- Open a command shell in C:\spark-labs-python (not a PowerShell) and run the following command
C:\spark-labs-python> winutils\bin\winutils.exe
- You should NOT get a Windows dialog about a missing DLL. If you do, the Visual C++ Redistributable is not installed correctly.
- Test/Initialize the pyspark install and the OS path
- Open a command shell in C:\spark-labs-python (not a PowerShell)
- Run the following command
C:\spark-labs-python> pyspark
- The shell should come up cleanly with a >>> prompt.
- Exit the spark shell by typing quit()
- Run the following command in the same command prompt (You can NOT copy paste all of it from here as noted below).
- It should do some downloading of needed jars, then start up cleanly. We do this to make sure the jars are stored locally in case students have issues using the internet.
- Exit the spark shell by typing quit()
- You’re done testing this
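If pyspark does not start cleanly, a common cause is a missing or mistyped environment variable or PATH entry. The following sketch checks the variables and PATH entries listed in this section, plus JAVA_HOME from the Java install. It is an illustration only: the function and constant names are ours, and it checks only that the values are present, not that the folders exist.

```python
import os

REQUIRED_VARS = ("JAVA_HOME", "PYSPARK_PYTHON", "SPARK_LABS",
                 "SPARK_HOME", "KAFKA_HOME", "HADOOP_HOME")

# PATH entries from this section, as (variable, subfolder) pairs.
REQUIRED_PATH_ENTRIES = (("SPARK_HOME", "bin"),
                         ("SPARK_HOME", "sbin"),
                         ("KAFKA_HOME", "bin\\windows"))


def _normalize(path):
    """Lower-case and drop trailing separators so comparisons are forgiving."""
    return path.strip().rstrip("\\/").lower()


def find_problems(env, path_separator=";"):
    """Return a list of human-readable problems found in `env` (a mapping)."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    path_entries = {_normalize(entry)
                    for entry in env.get("PATH", "").split(path_separator)
                    if entry.strip()}
    for var, subfolder in REQUIRED_PATH_ENTRIES:
        base = env.get(var)
        if not base:
            continue  # already reported as not set
        if _normalize(base) + "\\" + subfolder not in path_entries:
            problems.append(f"PATH is missing {base}\\{subfolder}")
    return problems


if __name__ == "__main__":
    # Run this in a NEW command shell so it sees the updated variables.
    for problem in find_problems(os.environ) or ["No problems found"]:
        print(problem)
```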
Install Cygwin
- Go to install page https://www.cygwin.com/install.html
- This also includes the download link - which we give here: https://www.cygwin.com/setup-x86_64.exe
- Execute the installer. You will eventually reach the Select packages dialog, where you can select packages to install (most of Cygwin is not installed by default).
- In the Select packages dialog, make sure the “View” drop down in the upper left is set to Category
- Expand the Net category, and find the nc item.
- Click the down arrow associated with the nc item, and select the entry that looks like a version number (e.g. 1.107-4)
- Click Next through all remaining dialogs. When you reach the dialog that asks about shortcuts, choose to create them on the Start menu and the desktop.
Test Cygwin
- To test, start Cygwin from the desktop shortcut.
- Once the window comes up, execute the following commands.
cd C:
ls
- You should get a directory listing of C:\
Install Notepad++
Install Firefox Browser
This specific browser may be required for this course. Other browsers may not have the exact capabilities needed for some labs.
- Download and save the installer file.
- Execute the installer - you can take all the defaults in the installation.
- Once installed, start it and make sure it starts up normally.
Install Chrome Browser
This specific browser may be required for this course. Other browsers may not have the exact capabilities needed for some labs.
- Download and save the installer file.
- Execute the installer - you can take all the defaults in the installation.
- Once installed, start it and make sure it starts up normally.
- Make it the default browser.