When you want to start learning Spark and try writing or running a few scripts, you don't need to spend time searching for a way to set up a Spark environment, and you don't need to worry about not having a Linux machine or a cloud service account. Just copy the integrated Spark environment onto your own PC. After a one-step setup it is ready to use, and you don't even need to install anything on your PC.
This solution is win-spark-env, which you can find on GitHub.
In this article, I will show how I use it.
Download from GitHub
- Copy the root folder Apache to C:\
Set Up
- Open the Command Prompt as administrator and run environment_variable_setup.bat under C:\Apache\Spark3.3\tools
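To confirm that the setup step took effect, a small check like the one below can help. This is only a sketch: the variable names in `EXPECTED_VARS` are assumptions based on what a Spark installation on Windows commonly needs, not a list taken from environment_variable_setup.bat, so adjust them to whatever the batch file actually sets.

```python
import os

# Assumed variable names; the set actually written by
# environment_variable_setup.bat may differ.
EXPECTED_VARS = ["SPARK_HOME", "HADOOP_HOME", "JAVA_HOME"]


def missing_variables(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(EXPECTED_VARS)
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Run it from a newly opened Command Prompt, since a prompt that was already open may not see variables that were saved after it started.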
Run your Spark script
- Run the example PySpark script with spark-submit in the Command Prompt.
- To avoid permission problems with the output folder, open the Command Prompt as administrator.
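If you prefer to launch the job from Python rather than typing the command, the spark-submit call can be wrapped as sketched below. The helper names `build_submit_command` and `submit` are hypothetical, and the sketch assumes that spark-submit is on PATH after the setup step and that a local master (`--master local[*]`) is what you want.

```python
import shutil
import subprocess


def build_submit_command(script_path, master="local[*]", app_args=()):
    """Return the spark-submit argument list for a local run of script_path."""
    return ["spark-submit", "--master", master, str(script_path), *app_args]


def submit(script_path, app_args=()):
    """Run script_path through spark-submit and wait for it to finish."""
    # shutil.which honours PATHEXT on Windows, so it can find spark-submit.cmd.
    launcher = shutil.which("spark-submit")
    if launcher is None:
        raise FileNotFoundError("spark-submit was not found on PATH")
    command = build_submit_command(script_path, app_args=app_args)
    return subprocess.run([launcher, *command[1:]], check=True)
```

Calling `submit(...)` with the full path of a script under C:\Apache\Spark3.3\source would then run it in the same way as the Command Prompt step.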
Development IDE
- Start VSCode as administrator and install the Python extension for VSCode.
C:\Apache\Spark3.3\tools\VSCode-win32-x64-1.72.0\Code.exe
- Import the source folder.
C:\Apache\Spark3.3\source
- Run the example PySpark script in debug mode with spark-submit.
Example script result
- When the example script runs successfully, the result file is created under the output folder.
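A quick way to inspect the result is to list what Spark left in the output folder. The sketch below assumes the example script writes through Spark's DataFrame writer, which normally produces a `_SUCCESS` marker plus one or more `part-*` files. If the script writes its result some other way, the file names will differ. `summarize_output` is a hypothetical helper name.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Return (has_success_marker, sorted part-file names) for an output folder."""
    root = Path(output_dir)
    # Hidden checksum files such as .part-00000.crc do not match "part-*".
    parts = sorted(p.name for p in root.glob("part-*") if p.is_file())
    return (root / "_SUCCESS").exists(), parts
```

Pass it the folder that the example script writes to, and check that the marker is present and at least one part file is listed.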
Other
You can change the root directory to avoid permission problems.
- Change the path in the environment variables that were set.
- Grep all the files under C:\Apache\Spark3.3\source and update the path to the new location.
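The grep-and-update step can be sketched in Python as follows. The function name `replace_root` and the new root `D:\Apache` in the usage note are hypothetical, and the sketch assumes the old root appears in the files exactly as written, with single backslashes and the same letter case.

```python
from pathlib import Path


def replace_root(source_dir, old_root, new_root, dry_run=True):
    """Find files under source_dir that contain old_root.

    With dry_run=False, also rewrite each of them with new_root in its place.
    Returns the sorted list of matching files.
    """
    old = old_root.encode("utf-8")
    new = new_root.encode("utf-8")
    matches = []
    for path in sorted(Path(source_dir).rglob("*")):
        if not path.is_file():
            continue
        # Work on bytes so that line endings are left untouched.
        data = path.read_bytes()
        if old in data:
            matches.append(path)
            if not dry_run:
                path.write_bytes(data.replace(old, new))
    return matches
```

For example, `replace_root(r"C:\Apache\Spark3.3\source", r"C:\Apache", r"D:\Apache")` only reports the files that mention the old root. Passing `dry_run=False` rewrites them. Paths written with forward slashes or doubled backslashes would need a second pass with the matching spelling.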
Reference
You can find more information on GitHub. If you have any questions, you can submit an issue there.