STEP1. Install Java (see the Hadoop install guide)
STEP2. Download Scala
cd /tmp
sudo wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
sudo mkdir /usr/local/scala
sudo tar xvf scala-2.10.4.tgz -C /usr/local/scala/
vim ~/.bashrc and add the Scala environment variables.
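The entries to add might look like the following; the exact path is an assumption based on where the tar step above unpacked the archive, so adjust it to your layout:

```shell
# Assumed install path from the tar step above
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
```

Reload the file with `source ~/.bashrc` before running `scala -version`.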
scala -version
STEP3. Download Spark
su - hduser
cd /usr/local/
sudo wget http://ftp.twaren.net/Unix/Web/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
sudo tar -xzf spark-1.6.1-bin-hadoop2.6.tgz
sudo chown -R hduser:hadoop spark-1.6.1-bin-hadoop2.6 (make sure the user running Spark has permission on the Spark directory, otherwise you will get error messages)
STEP4. Set Environment Parameter
vim ~/.bashrc as follows.
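A minimal sketch of the entries, assuming the Spark directory from the previous step (adjust the path if yours differs):

```shell
# Assumed Spark install path from STEP3
export SPARK_HOME=/usr/local/spark-1.6.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
```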
cd /usr/local/spark-1.6.1-bin-hadoop2.6/
./bin/spark-shell
Run the word-count example in the shell:
val wdsrc = sc.textFile("README.md")
val counts = wdsrc.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://master:9000/user/hduser/wordcount")
master:9000 <= this value should match fs.default.name in the conf file "core-site.xml" of your Hadoop platform.
If you use Cloudera, the default port is 8020; for a plain Hadoop deployment, the default is 9000.
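For intuition, the Spark chain above (flatMap to split lines into words, then reduceByKey to sum the 1s) can be sketched as a plain shell pipeline; the sample input here is made up for illustration:

```shell
# Word count in shell, mirroring flatMap(line => line.split(" ")) + reduceByKey(_ + _):
# split on whitespace, sort so equal words are adjacent, count each run.
printf 'hello spark hello\nhello world\n' \
  | tr -s ' \t' '\n' \
  | grep -v '^$' \
  | sort \
  | uniq -c \
  | sort -rn
```

Each output line is a count followed by a word, the same (word, count) pairs the Spark job writes to HDFS.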
The result on HDFS: