What is the best property to recommend adding to the Hi…

A company named Fabrikam, Inc. has a Microsoft Azure web app. Billions of users visit the app daily.
The web app logs all user activity by using text files in Azure Blob storage. Each day, approximately 200 GB of
text files are created.
Fabrikam uses the log files from an Apache Hadoop cluster on Azure HDInsight. You need to recommend a solution to optimize the storage of the log files for later Hive use.
What is the best property to recommend adding to the Hive table definition to achieve the goal? More than one
answer choice may achieve the goal. Select the BEST answer.

A.
STORED AS RCFILE

B.
STORED AS GZIP

C.
STORED AS ORC

D.
STORED AS TEXTFILE

Explanation:
The best answer is C, STORED AS ORC. The Optimized Row Columnar (ORC) file format provides a highly efficient way
to store Hive data. It was designed to overcome limitations of the other Hive file formats, and using ORC files
improves performance when Hive is reading, writing, and processing data.
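As an illustration only, the HiveQL below sketches how the text logs might be exposed as an external table and then copied into an ORC-backed table. The table names, column list, field delimiter, and the wasb:// storage location are all hypothetical; none of them come from the question.

```sql
-- Hypothetical staging table over the raw text logs in Azure Blob storage.
-- The column names, tab delimiter, and LOCATION are assumptions for this sketch.
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_raw (
  event_time STRING,
  user_id    STRING,
  url        STRING,
  status     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'wasb://logs@examplestorage.blob.core.windows.net/weblogs/';

-- Managed table with the same columns; STORED AS ORC is the property in question.
CREATE TABLE IF NOT EXISTS weblogs_orc (
  event_time STRING,
  user_id    STRING,
  url        STRING,
  status     INT
)
STORED AS ORC;

-- Rewrite the text data into ORC so later Hive queries read the columnar files.
INSERT OVERWRITE TABLE weblogs_orc
SELECT event_time, user_id, url, status
FROM weblogs_raw;
```

In this sketch the raw text files stay where the web app wrote them, and only the copy that Hive queries is converted to ORC.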
Compared with the RCFile format, for example, the ORC file format has many advantages, such as:
- a single file as the output of each task, which reduces the NameNode’s load
- Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
- light-weight indexes stored within the file
  - skip row groups that don’t pass predicate filtering
  - seek to a given row
- block-mode compression based on data type
  - run-length encoding for integer columns
  - dictionary encoding for string columns
- concurrent reads of the same file using separate RecordReaders
- ability to split files without scanning for markers
- bound the amount of memory needed for reading or writing
- metadata stored using Protocol Buffers, which allows addition and removal of fields
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORCORCFileFormat
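
Some of the features listed above can be controlled from the table definition and the session. The following is a minimal sketch, assuming a hypothetical table named weblogs_orc_tuned with made-up columns. The orc.compress and orc.create.index table properties and the hive.optimize.index.filter setting are described in the Hive documentation, but the defaults and the actual benefit depend on the Hive version and the data.

```sql
-- Hypothetical ORC table with explicit compression and index properties.
CREATE TABLE IF NOT EXISTS weblogs_orc_tuned (
  event_time STRING,
  user_id    STRING,
  url        STRING,
  status     INT
)
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"     = "ZLIB",  -- block-mode compression codec
  "orc.create.index" = "true"   -- keep the light-weight indexes in the file
);

-- Let Hive push the WHERE predicate down to the ORC reader so that
-- row groups that cannot match the filter may be skipped.
SET hive.optimize.index.filter = true;

SELECT url, COUNT(*) AS hits
FROM weblogs_orc_tuned
WHERE status = 500
GROUP BY url;
```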


