You need to minimize data transfers during the join ope…

You have an Apache Spark cluster in Azure HDInsight.
You plan to join a large table and a lookup table.
You need to minimize data transfers during the join operation.
What should you do?

A. Use the reduceByKey function.
B. Use a Broadcast variable.
C. Repartition the data.
D. Use the DISK_ONLY storage level.

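Explanation:
The correct answer is B. A broadcast variable sends one read-only copy of the small lookup table to every executor, so each partition of the large table can be joined locally and the large table does not have to be shuffled across the network. reduceByKey and repartitioning both trigger a shuffle, and the DISK_ONLY storage level only controls where cached data is kept, not how much data moves between nodes.
Reference: https://www.dezyre.com/article/top-50-spark-interview-questions-and-answers-for-2017/208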
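The broadcast approach in option B can be pictured as a map-side join. The sketch below is a plain-Python model of that idea, not Spark code: the function name `broadcast_join`, the sample tables, and the partition layout are all hypothetical and chosen only for illustration. It shows that the rows of the large table never leave their partition; only the small lookup table is copied.

```python
# Plain-Python model of a broadcast (map-side) join. This is NOT Spark itself;
# the function name, tables, and partition layout are hypothetical examples.

def broadcast_join(large_partitions, lookup):
    """Join every partition of the large table against a local copy of the lookup table."""
    joined = []
    for partition in large_partitions:
        # Models the one read-only copy of the lookup table held per executor.
        local_lookup = dict(lookup)
        for key, value in partition:
            # Inner-join semantics: rows without a matching key are dropped.
            if key in local_lookup:
                joined.append((key, value, local_lookup[key]))
    return joined


# Large table: (country_code, amount) rows, already split across two partitions.
large_partitions = [
    [("US", 10), ("DE", 20)],
    [("US", 30), ("FR", 40), ("XX", 50)],
]

# Small lookup table: country_code -> country name.
lookup = {"US": "United States", "DE": "Germany", "FR": "France"}

result = broadcast_join(large_partitions, lookup)
```

In PySpark, the same idea is usually expressed with `SparkContext.broadcast(...)` when working with RDDs, or with the `broadcast()` hint from `pyspark.sql.functions` when joining DataFrames. Whether a broadcast is worthwhile depends on the lookup table fitting comfortably in each executor's memory.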


