Rename src/main/java to src/main/scala
Modify pom.xml
pom.xml

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>sparkTest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- spark-core_2.12 3.1.2 is built against Scala 2.12, so the Scala library
             version must match it (2.11.x would conflict with the _2.12 artifact) -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.12.10</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.0</version>
        </dependency>
    </dependencies>
</project>
```
Add the Scala SDK
File -> Project Structure (add the Scala SDK under Global Libraries)
After this, Scala files can be created under the scala directory.
wordCount
```scala
package com.example

import org.apache.spark.{SparkConf, SparkContext}

object wordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("MyFirstSparkApplication")
    conf.setMaster("yarn")
    val sc = new SparkContext(conf)

    // args(0) is the HDFS input path, e.g. /user/hadoop/input
    val data = sc.textFile(args(0))

    // Split each line into words, count each word, and collect the result to the driver
    val words = data.flatMap(_.split("\n"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()

    words foreach println
  }
}
```
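Before submitting to YARN it can help to run the same logic locally. Below is a minimal sketch, assuming Spark is available on the local machine and using a hypothetical local file input.txt instead of an HDFS path; note that textFile already returns one element per line, so the extra split on "\n" can be dropped.

```scala
package com.example

import org.apache.spark.{SparkConf, SparkContext}

// Local-mode variant for quick testing; "local[*]" uses all cores of the current machine
object wordCountLocal {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MyFirstSparkApplication-local")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile already yields one element per line, so splitting on spaces is enough
    val counts = sc.textFile("input.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()

    counts foreach println
    sc.stop()
  }
}
```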
Package the jar in IDEA
File -> Project Structure -> Artifacts (add a JAR built from the module with dependencies, then run Build -> Build Artifacts to produce sparkTest.jar)
Run on the cluster
```
spark-submit --master yarn --name wordCount --class com.example.wordCount hdfs:///sparkTest.jar /user/hadoop/input
```
Here /user/hadoop/input is the HDFS input path, which the program receives as args(0); the jar itself has been uploaded to HDFS as /sparkTest.jar.
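When the job runs on YARN in cluster deploy mode, the println output of the collected results goes to the driver's log rather than the local terminal. A common alternative is to write the counts back to HDFS; the sketch below assumes a second argument args(1) naming an output directory, which is not part of the original program.

```scala
package com.example

import org.apache.spark.{SparkConf, SparkContext}

// Variant that writes the word counts to HDFS instead of printing them on the driver.
// args(0) = input path, args(1) = output directory (must not already exist).
object wordCountToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyFirstSparkApplication").setMaster("yarn")
    val sc = new SparkContext(conf)

    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1)) // each partition becomes a part-xxxxx file under args(1)

    sc.stop()
  }
}
```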