If an application stores hourly log files from thousands of instances of a high-traffic
web site, which naming scheme would give optimal performance on S3?
A. Sequential
B. HH-DD-MM-YYYY-log_instanceID
C. YYYY-MM-DD-HH-log_instanceID
D. instanceID_log-HH-DD-MM-YYYY
E. instanceID_log-YYYY-MM-DD-HH
D
B
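The answer is B. Instance IDs always start with the same prefix (i-xxxx; ami-xxxx is the prefix for AMI IDs), so you would need to reverse the ID before putting it at the start of the key.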
The naming convention in B lets the start of the key name change more often, distributing the objects over more partitions. See: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
That's true, B is the correct answer.
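To illustrate the point about the key name changing more often, here is a minimal sketch that counts how many distinct leading characters each scheme produces for one instance over 48 hours. It is only a proxy: how S3 actually partitions keys is internal to the service and is not modeled here. The instance ID, the start date, and the helper names (`key_b`, `key_c`, `distinct_prefixes`) are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta

def key_b(ts, instance_id):
    # Option B: HH-DD-MM-YYYY-log_instanceID (hour comes first)
    return f"{ts:%H-%d-%m-%Y}-log_{instance_id}"

def key_c(ts, instance_id):
    # Option C: YYYY-MM-DD-HH-log_instanceID (year comes first)
    return f"{ts:%Y-%m-%d-%H}-log_{instance_id}"

def distinct_prefixes(keys, length):
    # Number of different values seen in the first `length` characters.
    return len({k[:length] for k in keys})

# Assumption: one instance writing one log per hour for two days.
start = datetime(2015, 1, 1, 0)
hours = [start + timedelta(hours=h) for h in range(48)]
instance_id = "i-1a2b3c4d"  # hypothetical instance ID

keys_b = [key_b(ts, instance_id) for ts in hours]
keys_c = [key_c(ts, instance_id) for ts in hours]

b_count = distinct_prefixes(keys_b, 2)  # leading "HH" cycles through 00..23
c_count = distinct_prefixes(keys_c, 2)  # leading "20" never changes

print("B, distinct 2-char prefixes:", b_count)
print("C, distinct 2-char prefixes:", c_count)
```

Under these assumptions, option B's leading characters cycle through every hour of the day, while option C's stay fixed for the whole year.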
B
D should be the correct answer. The question says “thousands” of instances, which means the instance ID varies far more than the hourly value “HH”. Logging is done hourly, so HH would be the same for many EC2 instances at the same time, and that is not what is suggested below:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
The key name would be more random if the instance ID were the first part of the key name (thousands of instances).
Further, HH varies more than DD, so D looks correct to me.
I reasoned the same way at first, but since an instance ID starts with “i-”, I chose the HH- scheme. If the instance ID were truly random, I would design it like that.
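Both sides of this argument can be checked with a rough sketch that looks at a single hour across many instances. It assumes instance IDs of the form "i-" followed by 8 random hex characters, which is a simplification for illustration, and it only counts distinct key prefixes; it does not model how S3 actually assigns partitions. All names and values below are hypothetical.

```python
import random

def key_b(hour_stamp, instance_id):
    # Option B: HH-DD-MM-YYYY-log_instanceID
    return f"{hour_stamp}-log_{instance_id}"

def key_d(hour_stamp, instance_id):
    # Option D: instanceID_log-HH-DD-MM-YYYY
    return f"{instance_id}_log-{hour_stamp}"

def distinct_prefixes(keys, length):
    # Number of different values seen in the first `length` characters.
    return len({k[:length] for k in keys})

# Assumption: 2000 instances with IDs shaped like "i-" + 8 hex characters.
rng = random.Random(0)
instance_ids = [f"i-{rng.getrandbits(32):08x}" for _ in range(2000)]
hour_stamp = "13-01-01-2015"  # one hourly batch, HH-DD-MM-YYYY

keys_b = [key_b(hour_stamp, i) for i in instance_ids]
keys_d = [key_d(hour_stamp, i) for i in instance_ids]

d_two = distinct_prefixes(keys_d, 2)    # every key starts with "i-"
d_four = distinct_prefixes(keys_d, 4)   # "i-" plus two hex characters
b_two = distinct_prefixes(keys_b, 2)    # every key starts with "13"
b_stamp = distinct_prefixes(keys_b, len(hour_stamp))

print("D, first 2 chars:", d_two, "| first 4 chars:", d_four)
print("B, first 2 chars:", b_two, "| full timestamp prefix:", b_stamp)
```

Within one hour, option B's keys all share the same timestamp prefix, while option D's keys share only the constant "i-" and diverge right after it. Which effect matters more depends on how long a prefix the partitioning considers, which is exactly what the thread disagrees about.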
It's D. The catch is “thousands” of instances, which is surely more than the 24 possible hour values.
So in this case instanceID_log-HH-DD-MM-YYYY
D.
Agree with the choice of D.
The instance ID is unique, and so is the day within a given month and year. With this naming scheme, a new partition is created every 24 hours, and the same name (partition key) is shared only by the 24 records written within that 24-hour period. In other words, each partition contains 24 records, and query performance is optimized, if my understanding is correct.
The answer should be B (D and E provide the same uniqueness, and we can only choose one answer). I believe the point of the question is “hourly”, which is also unique.