Advanced RDD Actions
reduce() action. The reduce(func) action aggregates the elements of a regular RDD. The function should be commutative (changing the order of the operands does …
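To see why the function passed to reduce() needs to be commutative (and associative), here is a minimal plain-Python sketch. It does not use Spark at all: `partitioned_reduce` is a hypothetical helper that only imitates the idea of reducing each partition separately and then combining the partial results, and the way the data is split into partitions is an assumption made for illustration.

```python
from functools import reduce
from operator import add, sub


def partitioned_reduce(func, partitions):
    """Hypothetical helper: reduce each non-empty partition on its own,
    then combine the partial results with the same function."""
    partials = [reduce(func, part) for part in partitions if part]
    return reduce(func, partials)


# The same six numbers, split into partitions in two different ways.
split_a = [[1, 2, 3], [4, 5, 6]]
split_b = [[1, 2], [3, 4], [5, 6]]

# Addition is commutative and associative: the split does not matter.
sum_a = partitioned_reduce(add, split_a)
sum_b = partitioned_reduce(add, split_b)

# Subtraction is neither: the result depends on how the data was split.
diff_a = partitioned_reduce(sub, split_a)
diff_b = partitioned_reduce(sub, split_b)

print(sum_a, sum_b)    # same value for both splits
print(diff_a, diff_b)  # different values for the two splits
```

In a real job the partitioning is decided by Spark, so a function like subtraction can give a different answer from one run or cluster to the next.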
Tag Archives: Apache spark beginner
PySpark RDD Example
Hello, in this post we will work through two short examples using reduceByKey() and sortByKey(). Rdd = sc.parallelize([(1,2), (3,4), (3,6), (4,5)]) # Apply reduceByKey() operation on …
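As a rough illustration of what these two operations do to the pairs above, here is a plain-Python sketch that runs without a Spark cluster. The helpers `reduce_by_key` and `sort_by_key` are hypothetical stand-ins written for this example, not the PySpark API, and using addition as the merge function is an assumption, since the excerpt is cut off before it shows the function.

```python
from operator import add


def reduce_by_key(func, pairs):
    """Hypothetical stand-in: merge the values of each key with func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


def sort_by_key(pairs, ascending=True):
    """Hypothetical stand-in: order (key, value) pairs by their key."""
    return sorted(pairs, key=lambda kv: kv[0], reverse=not ascending)


# Same pairs as in the excerpt above.
pairs = [(1, 2), (3, 4), (3, 6), (4, 5)]

# Key 3 appears twice, so its values 4 and 6 are merged into one.
reduced = reduce_by_key(add, pairs)

# Order the merged pairs by key, largest key first.
sorted_desc = sort_by_key(reduced, ascending=False)

print(reduced)
print(sorted_desc)
```

With a running SparkContext, the corresponding PySpark calls would be along the lines of `Rdd.reduceByKey(lambda x, y: x + y)` followed by `.sortByKey(ascending=False)` and `.collect()`.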
Introduction to PySpark RDD
In this chapter, we will start with RDDs, which are Spark’s core abstraction for working with data. What is an RDD? RDD = Resilient Distributed Datasets …
Introduction to Big Data analysis with Spark
Hello, we’ll be introducing Spark in this series of articles. Spark applications can be written in several programming languages; we will use Python in this series. Introduction to …