Which naming scheme would give optimal performance on S3?

If an application is storing hourly log files from thousands of instances of a high-traffic
website, which naming scheme would give optimal performance on S3?

A. Sequential

B. HH-DD-MM-YYYY-log_instanceID

C. YYYY-MM-DD-HH-log_instanceID

D. instanceID_log-HH-DD-MM-YYYY

E. instanceID_log-YYYY-MM-DD-HH



raysmithvic1978

D

fun4two

The answer is B. Instance IDs always start with the same prefix (i-xxxx), so you need to reverse it.

Rob

That's true, B is the correct answer.

Vishal Joshi

D should be the correct answer. The question says “thousands” of instances, which means the instance ID is more unique and random than the hourly value “HH”. Logging is done hourly, so HH would be the same for many EC2 instances at the same time, and that is not what is suggested below:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

The key name would be more random if the instance ID is the first part of the key name (thousands of instances).

Further, HH is more unique than DD, so D looks correct to me.

Maja

I reasoned the same way at first, but since an instance ID starts with “i-”, I chose the HH- scheme. If the instance ID were truly random, I would design it like that.
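
The two arguments in this thread (hour first versus instance ID first) can be compared with a rough sketch that counts how many distinct leading substrings each scheme produces over one day of logs. This is only a proxy for key distribution, not a model of how S3 actually partitions keys. The instance IDs, the date, and the helper names `make_key` and `distinct_prefixes` are all made up for illustration.

```python
def make_key(scheme, instance_id, hh, dd="05", mm="03", yyyy="2015"):
    """Build an object key for answer options B to E (the date is an arbitrary example)."""
    if scheme == "B":
        return f"{hh}-{dd}-{mm}-{yyyy}-log_{instance_id}"
    if scheme == "C":
        return f"{yyyy}-{mm}-{dd}-{hh}-log_{instance_id}"
    if scheme == "D":
        return f"{instance_id}_log-{hh}-{dd}-{mm}-{yyyy}"
    if scheme == "E":
        return f"{instance_id}_log-{yyyy}-{mm}-{dd}-{hh}"
    raise ValueError(f"unknown scheme: {scheme}")


def distinct_prefixes(keys, length):
    """Count how many different leading substrings of the given length occur."""
    return len({key[:length] for key in keys})


# Hypothetical instance IDs: "i-" followed by 8 hex digits.
# Multiplying by an odd constant modulo 2**32 keeps the 2000 IDs distinct
# while scattering their leading digits.
instance_ids = [f"i-{(n * 2654435761) % 2**32:08x}" for n in range(1, 2001)]
hours = [f"{h:02d}" for h in range(24)]

results = {}
for scheme in "BCDE":
    # One key per instance per hour: a full day of hourly logs.
    keys = [make_key(scheme, iid, hh) for iid in instance_ids for hh in hours]
    results[scheme] = {n: distinct_prefixes(keys, n) for n in (2, 6)}
    print(scheme, results[scheme])
```

Under these assumptions, B yields 24 distinct prefixes at both lengths (one per hour) and C yields one. D and E yield a single prefix at two characters, because every ID starts with “i-”, but many more at six characters. Which prefix length matters in practice is exactly what the commenters disagree on.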

Pits - AWS SA

It's D. The catch is “thousands” of instances, which is surely more than the 24 possible hour values.

So in this case, the answer is instanceID_log-HH-DD-MM-YYYY.

James

D.
I agree with the choice of D.
The instance ID is unique, followed by the day within the same month and year. With this naming scheme, a partition is created every 24 hours, sharing the same name (partition key) only for the 24 records within that 24-hour period. In other words, each partition contains 24 records, and query performance is optimized, if my understanding is correct.

Brian

The answer should be B (D and E provide the same uniqueness, and we can only choose one answer). I believe the intent of the question is “hourly”, which is also unique.