
Advanced RDD Actions


reduce() action
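reduce() is an action that aggregates the elements of an RDD with a function that takes two arguments and returns one value, for example rdd.reduce(lambda x, y: x + y), and sends the single result back to the driver. Spark reduces each partition separately and then combines the partial results, so the function should be commutative and associative. The plain-Python sketch below only models that two-stage evaluation. The helper name and the partition layout are hypothetical stand-ins, not Spark's own code.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def simulate_rdd_reduce(partitions: List[List[T]], func: Callable[[T, T], T]) -> T:
    """Model of RDD.reduce(): reduce each partition, then reduce the partial results.

    `partitions` is a hypothetical stand-in for the partitions of an RDD.
    """
    # Stage 1: every non-empty partition is reduced on its own (on executors in Spark)
    partials = [reduce(func, part) for part in partitions if part]
    if not partials:
        # Reducing an empty RDD is an error in Spark as well
        raise ValueError("cannot reduce an empty collection")
    # Stage 2: the partial results are combined (on the driver in Spark)
    return reduce(func, partials)


# Hypothetical two-partition layout of the numbers 1..5
data = [[1, 2], [3, 4, 5]]
print(simulate_rdd_reduce(data, lambda x, y: x + y))  # 15
print(simulate_rdd_reduce(data, lambda x, y: x * y))  # 120

# Subtraction is not associative, so the result depends on the partitioning
print(simulate_rdd_reduce([[10], [1, 2]], lambda x, y: x - y))  # 11, not 10 - 1 - 2 = 7
```

The subtraction case shows why a non-associative function can give different answers for the same data when it is split differently across partitions.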

 

saveAsTextFile() action
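saveAsTextFile() is an action that writes an RDD to a directory as text, one element per line, using each element's string representation. Spark writes one part file per partition (named like part-00000) and fails if the output directory already exists. The sketch below models that layout with the Python standard library only; the helper name is hypothetical, and in PySpark the call itself is simply rdd.saveAsTextFile("output_dir").

```python
import os
import tempfile
from typing import Iterable, List


def simulate_save_as_text_file(partitions: List[Iterable[object]], path: str) -> None:
    """Model of RDD.saveAsTextFile(): one part-NNNNN file per partition, one element per line."""
    # Like Spark, refuse to write into an existing output directory
    if os.path.exists(path):
        raise FileExistsError(f"Output directory {path} already exists")
    os.makedirs(path)
    for index, part in enumerate(partitions):
        file_name = os.path.join(path, f"part-{index:05d}")
        with open(file_name, "w", encoding="utf-8") as handle:
            for element in part:
                # Each element is written as str(element) followed by a newline
                handle.write(str(element) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "output")
    simulate_save_as_text_file([[(1, 2), (3, 4)], [(3, 6)]], out_dir)
    print(sorted(os.listdir(out_dir)))  # ['part-00000', 'part-00001']
    with open(os.path.join(out_dir, "part-00000"), encoding="utf-8") as handle:
        print(handle.read().splitlines())  # ['(1, 2)', '(3, 4)']
```

If a single output file is wanted, a common approach in PySpark is rdd.coalesce(1).saveAsTextFile(path), at the cost of writing all the data through one task.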

 

countByKey() action
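countByKey() works on pair RDDs of (key, value) tuples: it counts how many elements each key has, ignores the values, and returns the counts to the driver as a dictionary (a collections.defaultdict in PySpark), so it is meant for a small number of distinct keys. The plain-Python sketch below models the computation by counting keys per partition and merging the counts with collections.Counter; the helper name and the partition split are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def simulate_count_by_key(partitions: List[List[Tuple[Hashable, object]]]) -> Counter:
    """Model of RDD.countByKey(): count keys in each partition, then merge the counts."""
    # Per-partition counts; the value in each (key, value) pair is ignored
    partials = [Counter(key for key, _ in part) for part in partitions]
    total: Counter = Counter()
    for partial in partials:
        # Counter.update() adds counts instead of replacing them
        total.update(partial)
    return total


# The same pairs as in the Rdd example, split into two hypothetical partitions
counts = simulate_count_by_key([[(1, 2), (3, 4)], [(3, 6), (4, 5)]])
for k, v in counts.items():
    print("key", k, "has", v, "counts")
```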

 

collectAsMap() action
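collectAsMap() returns the elements of a pair RDD to the driver as a Python dict. A dict holds one value per key, so duplicate keys collapse and the value that comes later in collect() order wins: for the pairs (3, 4) and (3, 6), only 3: 6 survives. The plain-Python sketch below models that behavior; the helper name and the partition split are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def simulate_collect_as_map(partitions: List[List[Tuple[Hashable, object]]]) -> Dict[Hashable, object]:
    """Model of RDD.collectAsMap(): gather all pairs in partition order, then build a dict."""
    # collect(): concatenate the partitions in order
    collected = [pair for part in partitions for pair in part]
    # dict() keeps the last value seen for a repeated key
    return dict(collected)


result = simulate_collect_as_map([[(1, 2), (3, 4)], [(3, 6), (4, 5)]])
print(result)  # {1: 2, 3: 6, 4: 5}
```

Like collect(), collectAsMap() loads the whole result into driver memory, so it suits small pair RDDs only.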

 

Example

Counting by keys

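from pyspark import SparkContext

# Reuse the shell's SparkContext if one exists, otherwise create one
sc = SparkContext.getOrCreate()

# Create a pair RDD of (key, value) tuples
Rdd = sc.parallelize([(1, 2), (3, 4), (3, 6), (4, 5)])

# countByKey() is an action: it returns the number of elements per key to the driver
total = Rdd.countByKey()

# What is the type of total?
print("The type of total is", type(total))

# Iterate over the total and print the output
for k, v in total.items():
    print("key", k, "has", v, "counts")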

 

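Create a base RDD and transform it

The word counting program loads the text into a base RDD, splits the lines into words, counts the words, builds (word, count) pairs, and finally sorts the pairs to show the most frequent words.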

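You can download the text used here (Project Gutenberg ebook #100) from http://www.gutenberg.org/ebooks/100 and save it as 100-0.txt, the file name the code below reads.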

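# Load the text file into an RDD with one element per line (assumes an existing SparkContext named sc)
baseRDD = sc.textFile("100-0.txt")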

# Split the lines of baseRDD into words
splitRDD = baseRDD.flatMap(lambda x: x.split())

# Count the total number of words
print("Total number of words in splitRDD:", splitRDD.count())
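flatMap() is used here rather than map() because each line produces several words: map(lambda x: x.split()) would give one list per line, while flatMap() flattens those lists into a single RDD of words, so count() returns the number of words rather than the number of lines. The plain-Python sketch below models the difference on two made-up lines with itertools.chain; it does not use Spark.

```python
from itertools import chain
from typing import List

# Two made-up lines standing in for the contents of baseRDD
lines: List[str] = ["To be, or not to be", "that is the question"]

# map(): one output element (a list of words) per input line
mapped = [line.split() for line in lines]

# flatMap(): the per-line lists are flattened into one sequence of words
flat_mapped = list(chain.from_iterable(line.split() for line in lines))

print(len(mapped))       # 2  (like baseRDD.map(...).count())
print(len(flat_mapped))  # 10 (like baseRDD.flatMap(...).count())
print(flat_mapped[:3])   # ['To', 'be,', 'or']
```

str.split() with no arguments splits on runs of whitespace only, so punctuation stays attached ("be,") and capitalization is kept; normalizing either would need an extra map() step.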

 

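# Assumed definition of resultRDD: pair each word with 1,
# then sum the 1s per word to get (word, count) tuples
resultRDD = splitRDD.map(lambda w: (w, 1)).reduceByKey(lambda x, y: x + y)

# Print the first 10 (word, count) pairs
for word in resultRDD.take(10):
    print(word)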

# Swap the keys and values
resultRDD_swap = resultRDD.map(lambda x: (x[1], x[0]))

# Sort the keys in descending order
resultRDD_swap_sort = resultRDD_swap.sortByKey(ascending=False)

# Show the top 10 most frequent words and their frequencies
for word in resultRDD_swap_sort.take(10):
    print("{} has {} counts".format(word[1], word[0]))
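Swapping the pairs and calling sortByKey(ascending=False) sorts the whole RDD even though only 10 elements are printed. PySpark also has a takeOrdered(num, key=None) action that returns the first num elements in ascending key order, so resultRDD.takeOrdered(10, key=lambda x: -x[1]) would return the ten most frequent (word, count) pairs without the swap; it selects them with a heap instead of a full sort. The plain-Python sketch below models that top-k selection with heapq.nlargest; the word counts are made up for illustration and the helper name is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Made-up (word, count) pairs standing in for resultRDD
word_counts: Dict[str, int] = {"the": 5, "and": 4, "to": 3, "of": 2, "a": 1}


def top_words(counts: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent (word, count) pairs, highest count first."""
    # nlargest selects the top k with a heap instead of sorting every pair
    return heapq.nlargest(k, counts.items(), key=lambda pair: pair[1])


for word, count in top_words(word_counts, 3):
    print("{} has {} counts".format(word, count))
```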

See you in the next article.
