You have an Apache Spark cluster in Azure HDInsight.
You plan to join a large table and a lookup table.
You need to minimize data transfers during the join operation.
What should you do?
A. Use the reduceByKey function.
B. Use a Broadcast variable.
C. Repartition the data.
D. Use the DISK_ONLY storage level.
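Correct answer: B
Explanation:
A broadcast variable ships a read-only copy of the small lookup table to each executor once. Every partition of the large table can then be joined locally, so the large table is never shuffled across the network. reduceByKey aggregates values and does not join, repartitioning triggers a shuffle of its own, and DISK_ONLY only changes where cached data is stored.
Reference: https://www.dezyre.com/article/top-50-spark-interview-questions-and-answers-for-2017/208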
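
To make the data-transfer argument concrete, here is a minimal sketch in plain Python. It does not use the Spark API. It simulates a shuffle join and a broadcast join and counts how many records each one would have to move. The function names (shuffle_join, broadcast_join), the sample data, and the "records moved" cost model are all assumptions made for illustration. Each partition stands in for one worker. In Spark itself, the equivalent mechanisms are SparkContext.broadcast for RDDs and the broadcast hint in pyspark.sql.functions for DataFrames.

```python
# Plain-Python simulation (NOT Spark code) of shuffle join vs. broadcast join.
# All names and the cost model are hypothetical and chosen for illustration.

def shuffle_join(large_parts, lookup, num_buckets):
    """Re-bucket both tables by key, then join bucket by bucket.

    Every record of both tables is written to a bucket, so every record
    counts as moved.
    """
    moved = 0
    large_buckets = [[] for _ in range(num_buckets)]
    lookup_buckets = [{} for _ in range(num_buckets)]
    for part in large_parts:
        for key, value in part:
            large_buckets[key % num_buckets].append((key, value))
            moved += 1
    for key, name in lookup.items():
        lookup_buckets[key % num_buckets][key] = name
        moved += 1
    joined = [
        (key, value, lookup_buckets[i][key])
        for i in range(num_buckets)
        for key, value in large_buckets[i]
        if key in lookup_buckets[i]
    ]
    return joined, moved


def broadcast_join(large_parts, lookup):
    """Send one copy of the lookup table to each partition and join in place.

    Only the lookup records move; the large table stays where it is.
    """
    moved = len(lookup) * len(large_parts)
    joined = [
        (key, value, lookup[key])
        for part in large_parts
        for key, value in part
        if key in lookup
    ]
    return joined, moved


# Small lookup table: 3 rows.
lookup = {1: "US", 2: "DE", 3: "JP"}

# Large table: 4 partitions of 1000 rows each, with keys 0..4.
large_parts = [
    [(i % 5, float(i)) for i in range(p * 1000, (p + 1) * 1000)]
    for p in range(4)
]

shuffled, shuffle_moved = shuffle_join(large_parts, lookup, num_buckets=4)
broadcasted, broadcast_moved = broadcast_join(large_parts, lookup)

print("same result:", sorted(shuffled) == sorted(broadcasted))
print("records moved by shuffle join:", shuffle_moved)
print("records moved by broadcast join:", broadcast_moved)
```

Both strategies produce the same joined rows, but under this simplified cost model the shuffle join moves every row of the large table while the broadcast join moves only the copies of the lookup table. The advantage holds only while the lookup table is small enough to fit in each executor's memory.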