How to read an HDFS file in Spark or Scala
Reading a local file with Scala is quite simple:
import scala.io.Source

object ReadLine {
  def main(args: Array[String]): Unit = {
    // read the file named by the first argument line by line
    if (args.length > 0) {
      for (line <- Source.fromFile(args(0)).getLines())
        println(line)
    }
  }
}
But when it comes to reading a file from HDFS, it is trickier.

Spark:
If you want to read from HDFS, I recommend using Spark; there you work through the SparkContext. The code below reads an HDFS file in Spark/Scala:
For example:

val lines = sc.textFile(args(0)) // args(0) should be an HDFS URI such as hdfs:///usr/local/log_data/file1
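Once loaded this way, the lines are an ordinary RDD and support the usual Spark transformations. A minimal sketch, assuming a live SparkContext `sc`; the "ERROR" filter is a hypothetical example, not part of the original:

```scala
// assumes a running SparkContext `sc`, as in the snippet above
val lines = sc.textFile("hdfs:///usr/local/log_data/file1")

// count all lines, then keep only those containing "ERROR" (hypothetical filter)
val total  = lines.count()
val errors = lines.filter(_.contains("ERROR"))
println(s"$total lines total, ${errors.count()} contain ERROR")
```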
No Spark:
If you don't want to use Spark, go with a java.io.BufferedReader (wrapping an InputStreamReader) or the Hadoop FileSystem API directly. For example:
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration())
val path = new Path("/path/to/file/")
val stream = hdfs.open(path)
// readLine() returns null at end of file, so stop there
def readLines = Stream.continually(stream.readLine()).takeWhile(_ != null)
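The BufferedReader route mentioned above can be sketched as follows, wrapping the HDFS input stream; the URL and path are placeholders, as in the example above:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// placeholder HDFS URL and path
val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration())
val stream = hdfs.open(new Path("/path/to/file/"))

// wrap the raw FSDataInputStream in a BufferedReader for line-by-line reads
val reader = new BufferedReader(new InputStreamReader(stream))
try {
  // readLine() returns null at end of file
  Iterator.continually(reader.readLine()).takeWhile(_ != null).foreach(println)
} finally {
  reader.close()
}
```

Closing the reader in a finally block also closes the underlying HDFS stream.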
I hope this showed you how to read an HDFS file in Spark or Scala.