When you want to start learning Spark and try writing or running some scripts, you don't need to spend time searching for a way to assemble a Spark environment, and you don't need to worry about lacking a Linux machine or a cloud service account. Just copy the integrated Spark environment onto your own PC, run a one-step setup, and you can use it. You don't even need to install anything on your PC.

This solution is win-spark-env. You can find it on GitHub.

In this article, I will show how I use it.

Download from GitHub

  • Copy the root folder Apache to C:\

Set Up

  • Open the Command Prompt as administrator and execute environment_variable_setup.bat under C:\Apache\Spark3.3\tools
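I have not reproduced the contents of environment_variable_setup.bat here, but a Spark-on-Windows setup typically defines SPARK_HOME, HADOOP_HOME, and a PATH entry for Spark's bin folder. The sketch below models that in Python; the variable names and the hadoop sub-folder are my assumptions, not the script's actual contents.

```python
import ntpath  # Windows-style path joining, works on any OS

def build_env(root):
    """Sketch of the environment variables the setup batch file likely sets.

    The HADOOP_HOME sub-folder name is an assumption; on Windows, Spark
    usually needs winutils.exe under %HADOOP_HOME%\\bin.
    """
    spark_home = ntpath.join(root, "spark-3.3.0-bin-hadoop3")
    return {
        "SPARK_HOME": spark_home,
        "HADOOP_HOME": ntpath.join(root, "hadoop"),  # assumed location
        # Directory to prepend to PATH so spark-submit resolves first
        "PATH_PREPEND": ntpath.join(spark_home, "bin"),
    }

env = build_env(r"C:\Apache\Spark3.3\tools")
```

Running the batch file once as administrator should make these variables available in every new Command Prompt.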

Run your spark script

  • Run the example PySpark script with spark-submit in the Command Prompt.
python C:\Apache\Spark3.3\tools\spark-3.3.0-bin-hadoop3\bin\spark-submit.py C:\Apache\Spark3.3\source\example.py
  • To avoid output folder permission problems, I suggest you open the Command Prompt as administrator.
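If you want to launch the job from a script instead of typing the command, the invocation above can be assembled with the standard library. This is a small sketch; the extra_args hook is hypothetical (the article's example passes no flags).

```python
import subprocess  # used by the commented-out run() call below

# Paths from the article; adjust them if you moved the root folder.
SPARK_SUBMIT = r"C:\Apache\Spark3.3\tools\spark-3.3.0-bin-hadoop3\bin\spark-submit.py"

def build_submit_command(script_path, extra_args=()):
    """Assemble the spark-submit command line shown above.

    extra_args is a hypothetical hook for options such as --master;
    the article's example relies on the environment's defaults.
    """
    return ["python", SPARK_SUBMIT, *extra_args, script_path]

cmd = build_submit_command(r"C:\Apache\Spark3.3\source\example.py")
# subprocess.run(cmd, check=True)  # uncomment to actually launch the job
```

Remember that the launching shell still needs the environment variables from the setup step, so run this from an administrator Command Prompt as well.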

Development IDE

  • Start VSCode as administrator and install the Python extension for VSCode: C:\Apache\Spark3.3\tools\VSCode-win32-x64-1.72.0\Code.exe
  • Import the source folder: C:\Apache\Spark3.3\source
  • Run the example PySpark script in DEBUG mode with spark-submit.
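Debugging through spark-submit needs a launch configuration for the Python extension. The repository may ship its own; if not, a sketch like the following .vscode/launch.json should work (the configuration name is my own, and the paths assume the default root folder):

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "spark-submit example.py",
      "type": "python",
      "request": "launch",
      "program": "C:\\Apache\\Spark3.3\\tools\\spark-3.3.0-bin-hadoop3\\bin\\spark-submit.py",
      "args": ["C:\\Apache\\Spark3.3\\source\\example.py"],
      "console": "integratedTerminal"
    }
  ]
}
```

With this in place, breakpoints set in spark-submit.py's driver-side Python code will be hit when you press F5.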

Example script result

  • When the example script executes successfully, the result file will be created under the output folder.
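A quick way to confirm the run worked is to look at what Spark left in the output folder: Spark writes one part-* data file per partition plus a _SUCCESS marker. The helper below is a sketch; the actual output path and format depend on example.py, which I have not reproduced here.

```python
from pathlib import Path

def find_result_files(output_dir):
    """Return the data files Spark wrote to an output folder.

    Spark names data files part-* and adds an empty _SUCCESS marker
    when the job finishes; only the part-* files hold results.
    """
    return sorted(p.name for p in Path(output_dir).glob("part-*"))
```

An empty list usually means the job failed before writing, or you are looking at the wrong folder.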

Other

You can change the root directory to avoid permission problems.

  1. Update the environment variable paths that were set during setup.
  2. Search all the files under C:\Apache\Spark3.3\source and update the paths to match the new location.
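Step 2 is a simple search-and-replace across the source folder. The sketch below automates it; it assumes the source files are UTF-8 text (binary files under the folder would need to be skipped).

```python
from pathlib import Path

def update_paths(source_dir, old_root, new_root):
    """Replace old_root with new_root in every file under source_dir.

    A sketch of step 2 above; assumes all files are UTF-8 text.
    Returns the names of the files that were changed.
    """
    changed = []
    for path in Path(source_dir).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old_root in text:
            path.write_text(text.replace(old_root, new_root), encoding="utf-8")
            changed.append(path.name)
    return changed
```

For example, `update_paths(r"D:\Spark\Spark3.3\source", r"C:\Apache", r"D:\Spark")` would rewrite every hard-coded C:\Apache path after moving the root folder to D:\Spark.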

Reference

You can find more information on GitHub. If you have any questions, you can submit an Issue there.