Apache Hadoop is a framework that allows distributed processing of large datasets.
The Hadoop SmartMachine includes the following components:
- hadoop
- hbase
- hcatalog
- hive
- pig
- templeton
To learn more about Hadoop, see the Hadoop Documentation.
Release notes for this image: Hadoop SmartMachine Release Notes
Provisioning a Hadoop SmartMachine
Because Hadoop runs on Java, provision your Hadoop SmartMachine with ample memory. For a stand-alone machine, 4 GB should be enough. If your machine is part of a cluster, use 8 GB or more.
Logging into your Hadoop SmartMachine
The Hadoop SmartMachine is configured with the two standard accounts: root and admin. You can use SmartLogin to log in to either account using the keys in your my.joyentcloud.com account. Both accounts also have generated passwords, which are listed in the Credentials section of the machine's detail page.
Log in to your Hadoop SmartMachine the same way you log in to a standard SmartMachine, as either root or admin.
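A minimal sketch of the two logins, assuming you connect over SSH with the keys that SmartLogin uses. The address 192.0.2.10 is a placeholder from the documentation address range and `login_as` is a hypothetical helper; neither comes from the image. Substitute the public IP shown on the machine's detail page.

```shell
# Placeholder address (documentation range); replace it with your machine's public IP.
MACHINE_IP="192.0.2.10"

# Hypothetical helper: open an SSH session as the given account (root or admin).
login_as() {
  ssh "$1@${MACHINE_IP}"
}
```

Running `login_as root` or `login_as admin` is equivalent to typing `ssh root@192.0.2.10` or `ssh admin@192.0.2.10` directly.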
When you log in to your Hadoop SmartMachine for the first time, it is a good idea to bring the pkgsrc repository up to date and to upgrade the installed packages.
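One way to do this is with pkgin, the binary package manager for pkgsrc. The following is a sketch that assumes pkgin is available on the image and that you run it as root; `update_packages` is a hypothetical wrapper name used only for illustration.

```shell
# Hypothetical wrapper: refresh the package database, then upgrade all
# installed packages. The -y flag answers "yes" to pkgin's prompts.
update_packages() {
  pkgin -y update &&
  pkgin -y full-upgrade
}
```

Call `update_packages` once after the first login. The second command runs only if the repository update succeeds.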
Location of Hadoop Files
| Item | Location |
|---|---|
| executables (hadoop, hbase, pig, etc) | /opt/local/bin |
| shell scripts (start-all.sh, hadoop-create-user.sh, etc) | /opt/local/sbin |
| configuration files | /opt/local/etc/hadoop /opt/local/etc/hbase /opt/local/etc/hcatalog /opt/local/etc/hive /opt/local/etc/pig /opt/local/etc/templeton |
| examples | /opt/local/share/hadoop /opt/local/share/hbase /opt/local/share/hcatalog /opt/local/share/hive /opt/local/base/pig /opt/local/share/templeton |
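As a quick check, the configuration directories listed in the table can be verified from the shell. This is an illustrative sketch only; it uses nothing beyond the paths in the table and reports each directory as found or missing.

```shell
# Report which of the documented configuration directories exist on this machine.
for component in hadoop hbase hcatalog hive pig templeton; do
  dir="/opt/local/etc/${component}"
  if [ -d "$dir" ]; then
    echo "found:   $dir"
  else
    echo "missing: $dir"
  fi
done
```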
Environment Variables
Some of the Hadoop tools require the JAVA_HOME environment variable to be set. It is set automatically when you run java, and it is also set in /opt/local/etc/hadoop/hadoop-env.sh.
If you need to set it yourself, export it in your shell before running the Hadoop tools.
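A minimal sketch of setting the variable by hand. The path below is a hypothetical placeholder, not the location used by this image; replace it with the directory of the JDK installed on your machine.

```shell
# Hypothetical path: substitute the JDK directory on your machine.
JAVA_HOME="/path/to/your/jdk"
export JAVA_HOME

# Show the value that child processes such as hadoop will inherit.
echo "JAVA_HOME=${JAVA_HOME}"
```

To make the setting persistent, the same two lines can go in your shell profile.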
The HADOOP_HOME variable is set by /opt/local/etc/hadoop/hadoop-env.sh, relative to the location of the startup scripts.
Versions Installed
| Component | Version |
|---|---|
| hadoop | |
| hbase | |
| hcatalog | 0.4.0 |
| hive | 0.9.0 |
| pig | |
| templeton | 0.1.0 |
Installed Packages
The Hadoop SmartMachine is based on the SmartMachine Base 1.8.4 image, using the http://pkgsrc.joyent.com/sdc6/2012Q2/x86_64/All repository.
For a detailed list of every package installed with this image, click here.
Documentation
| Component | Documentation |
|---|---|
| hadoop | Hadoop Documentation |
| hbase | The Apache HBase™ Reference Guide |
| hcatalog | HCatalog Documentation |
| hive | Apache Hive Wiki |
| pig | Pig Documentation |
| templeton | Templeton Documentation |