Install Apache Zeppelin on CDH
Apache Zeppelin is a new, incubating, multi-purpose web-based notebook that brings data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark.
The Apache Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC, Markdown and Shell.
Apache Spark integration
Apache Zeppelin provides built-in Apache Spark integration. You don’t need to build a separate module, plugin or library for it.
Apache Zeppelin with Spark integration provides:
- Automatic SparkContext and SQLContext injection
- Runtime jar dependency loading from the local filesystem or a Maven repository
- Job cancellation and progress display
Follow the steps below to install Zeppelin:
Step 1: Download Zeppelin
Step 2: Move Zeppelin to /opt
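$ sudo mkdir -p /opt/zeppelin
$ sudo tar -xvf <zeppelin.tar.gz> -C /opt/zeppelin
If the archive unpacks into a versioned top-level directory, add --strip-components=1 to the tar command so that the bin and conf folders sit directly under /opt/zeppelin.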
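A quick sanity check after extraction can catch a wrong directory layout before the later steps fail. The sketch below assumes only what this guide already uses, namely a bin/zeppelin-daemon.sh script and a conf folder under the Zeppelin home; check_zeppelin_home is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the directory looks like a Zeppelin home.
check_zeppelin_home() {
  home="$1"
  # The later steps rely on bin/zeppelin-daemon.sh and on the conf folder.
  if [ -f "$home/bin/zeppelin-daemon.sh" ] && [ -d "$home/conf" ]; then
    echo "OK: $home looks like a Zeppelin home"
  else
    echo "Expected bin/zeppelin-daemon.sh and conf/ under $home" >&2
    return 1
  fi
}
```

For example, check_zeppelin_home /opt/zeppelin should print the OK line when the archive was unpacked as intended.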
If the general configuration described below is already in place in the ZEPPELIN_HOME/conf folder, skip to Step 3 to change the permissions and then start the Zeppelin daemon.
General Configuration of Zeppelin
1. In the ZEPPELIN_HOME/conf folder, duplicate zeppelin-env.sh.template and rename the copy to zeppelin-env.sh.
$ sudo cp zeppelin-env.sh.template zeppelin-env.sh
2. In the ZEPPELIN_HOME/conf folder, duplicate zeppelin-site.xml.template and rename the copy to zeppelin-site.xml.
$ sudo cp zeppelin-site.xml.template zeppelin-site.xml
3. In the ZEPPELIN_HOME/conf folder, open the zeppelin-env.sh file and add the following lines:
export HADOOP_CONF_DIR=/etc/hadoop/conf
# extra classpath. e.g. set classpath for hive-site.xml
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/conf
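The configuration steps above can be sketched as one repeatable function. This is a minimal sketch, not an official Zeppelin script: setup_zeppelin_conf is a hypothetical helper, the two export values are the ones used in this guide, and the function assumes the template files end with a newline.

```shell
# Hypothetical helper: prepare zeppelin-env.sh and zeppelin-site.xml in a conf folder.
setup_zeppelin_conf() {
  conf_dir="$1"
  for name in zeppelin-env.sh zeppelin-site.xml; do
    # Copy the template only if the target does not exist yet.
    if [ ! -f "$conf_dir/$name" ]; then
      cp "$conf_dir/$name.template" "$conf_dir/$name" || return 1
    fi
  done
  # Append each export only once, so running the function again is harmless.
  for line in \
    'export HADOOP_CONF_DIR=/etc/hadoop/conf' \
    'export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/conf'; do
    grep -qxF "$line" "$conf_dir/zeppelin-env.sh" || echo "$line" >> "$conf_dir/zeppelin-env.sh"
  done
}
```

Called as setup_zeppelin_conf /opt/zeppelin/conf from a shell with write access to that folder, it leaves existing files in place and only adds the missing export lines.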
Note: If the Zeppelin scripts cannot be run because of missing permissions, make all scripts in the bin folder executable with the command in Step 3 below.
Step 3: Change the permission
$ sudo chmod -R 777 /opt/zeppelin
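If the only problem is a missing execute bit on the scripts, a narrower change than a recursive chmod may be enough. The sketch below assumes the scripts live in the bin folder and end in .sh; make_scripts_executable is a hypothetical helper name.

```shell
# Hypothetical helper: add the execute bit to the shell scripts under bin only.
make_scripts_executable() {
  zeppelin_home="$1"
  # Only *.sh files are touched; every other file keeps its current mode.
  find "$zeppelin_home/bin" -type f -name '*.sh' -exec chmod +x {} +
}
```

It would be called as make_scripts_executable /opt/zeppelin from a shell that is allowed to change those files.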
Step 4: Start Zeppelin
$ cd /opt/zeppelin
$ ./bin/zeppelin-daemon.sh start
With the default configuration (SSL disabled), you can access your notebook at http://localhost:8080
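The daemon may need a few seconds before the web UI answers. One way to wait for it from a script is to poll the URL with curl, as in the sketch below. wait_for_url is a hypothetical helper, and the default of 30 attempts with a 2 second pause is an arbitrary assumption.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "Zeppelin is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No answer from $url after $attempts attempts" >&2
  return 1
}
```

After starting the daemon, wait_for_url http://localhost:8080 returns as soon as the server responds, or fails once the attempts are used up.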
Step 5: Open the Zeppelin UI and click on Interpreter
The first time you connect to Zeppelin, you’ll land on the main page, similar to the screen capture below.
All existing notes are listed on the left of the page. Those notes are stored by default in the $ZEPPELIN_HOME/notebook folder.
You can filter them by name using the input text form. You can also create a new note, refresh the list of existing notes (in case you manually copy them into the $ZEPPELIN_HOME/notebook folder) and import a note.
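To see from the command line which notes are stored on disk, the notebook folder can be searched for note files. The file names are an assumption here: the sketch looks for note.json (one per note folder in older releases) and for *.zpln files (used by newer releases), so adjust the patterns to whatever your installation actually writes. list_notes is a hypothetical helper name.

```shell
# Hypothetical helper: list the note files found under a notebook folder.
list_notes() {
  notebook_dir="$1"
  # Match both assumed layouts: <note id>/note.json and <name>_<id>.zpln
  find "$notebook_dir" -type f \( -name 'note.json' -o -name '*.zpln' \) | sort
}
```

For example, list_notes "$ZEPPELIN_HOME/notebook" prints one path per stored note.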
Clicking the Import Note link opens a new dialog. From there you can import a note from local disk or, if you provide the URL, from a remote location.
By default, the name of the imported note is the same as the original note but you can override it by providing a new name.
The Notebook menu offers almost the same features as the note management section on the home page. From the drop-down menu you can:
- Open a selected note
- Filter notes by name
- Create a new note
The Settings menu gives you access to settings and displays information about Zeppelin. The user name is set to anonymous if you use the default Shiro configuration. To set up authentication, see Zeppelin's Shiro authentication documentation.
In the Interpreter menu you can:
- Configure existing interpreter instance
- Add/remove interpreter instances
The Configuration menu displays all the Zeppelin configuration values that are set in the zeppelin-site.xml config file.
Each Zeppelin note is composed of one or more paragraphs. The note can be viewed as a paragraph container.
Each paragraph consists of two sections: a code section, where you put your source code, and a result section, where you see the result of the code execution.
On the top-right corner of each paragraph there are some commands to:
● execute the paragraph code
● hide/show code section
● hide/show result section
● configure the paragraph
To configure the paragraph, just click on the gear icon:
From this dialog, you can (from top to bottom):
● find the paragraph id ( 20150924-163507_134879501 )
● control paragraph width
● move the paragraph 1 level up
● move the paragraph 1 level down
● create a new paragraph
● change paragraph title
● show/hide line number in the code section
● disable the run button for this paragraph
● export the current paragraph as an iframe and open the iframe in a new window
● clear the result section
● delete the current paragraph
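The iframe export opens a URL that combines the note id and the paragraph id. The sketch below builds such a URL under the assumption that it follows the pattern <base>/#/notebook/<note id>/paragraph/<paragraph id>?asIframe, as documented for older Zeppelin releases. Compare it with the URL that the export command actually opens in your version. paragraph_iframe_url is a hypothetical helper and the note id in the usage example is made up.

```shell
# Hypothetical helper: build the iframe URL of a paragraph (assumed URL pattern).
paragraph_iframe_url() {
  base_url="$1"
  note_id="$2"
  paragraph_id="$3"
  printf '%s/#/notebook/%s/paragraph/%s?asIframe\n' "$base_url" "$note_id" "$paragraph_id"
}
```

For example, paragraph_iframe_url http://localhost:8080 2ABCDEFGH 20150924-163507_134879501 prints the URL for the paragraph id shown in the list above inside the made-up note 2ABCDEFGH.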
At the top of the note, you can find a toolbar which exposes command buttons as well as configuration, security and display options.
The note name is displayed on the far left of the toolbar; click on it to reveal the input form and update it.
In the middle of the toolbar you can find the command buttons:
● execute all the paragraphs sequentially, in their display order
● hide/show code section of all paragraphs
● hide/show result section of all paragraphs
● clear the result section of all paragraphs
● clone the current note
● commit the current note content
● delete the note
● schedule the execution of all paragraphs using CRON syntax
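The scheduler field expects Quartz-style cron expressions, which have six or seven fields (seconds first) rather than the five fields of a classic crontab entry. For instance, 0 0/5 * * * ? would run the note every five minutes. The sketch below only checks the number of fields, not their content; both function names are hypothetical.

```shell
# Hypothetical helper: count the whitespace-separated fields of a cron expression.
cron_field_count() {
  # Disable globbing in a subshell so that "*" and "?" are not expanded to file names.
  ( set -f; set -- $1; echo "$#" )
}

# Hypothetical helper: succeeds if the expression has 6 or 7 fields (Quartz shape).
is_quartz_shaped() {
  n=$(cron_field_count "$1")
  [ "$n" -eq 6 ] || [ "$n" -eq 7 ]
}
```

A five-field crontab entry such as */5 * * * * fails this check, which is a common reason for a schedule being rejected.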
On the right of the note toolbar you can find configuration icons:
● display all the keyboard shortcuts
● configure the interpreters binding to the current note
● configure the note permissions
● switch the note display mode between default, simple and report
The list of keyboard shortcuts is missing some entries, for example:
● ctrl + / (comment whole line)
● ctrl + shift + / (in-line comment)