If an application is storing hourly log files from thousands of instances from a high-traffic website, which naming scheme
would give optimal performance on S3?
A. Sequential
B. instanceID_log-HH-DD-MM-YYYY
C. instanceID_log-YYYY-MM-DD-HH
D. HH-DD-MM-YYYY-log_instanceID
E. YYYY-MM-DD-HH-log_instanceID
The correct answer is B or C, but I'd use C,
because instanceID is a more uniformly distributed string than YYYY, HH, or a sequential name.
C
The correct answer is D. HH gives some randomness at the start of the key, unlike the instance ID, whose first character is always 'i' (i-1234567890abcdef0 or i-057c9415db5ca3158).
But with option D the first 17 characters (HH-DD-MM-YYYY-log) will not change for any PUT operation in a given hour, which will put a lot of pressure on a single partition. Therefore option B should be the correct answer: there, only the first 2 characters ('i-' at the start of the instance ID) stay the same over time.
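To make the prefix argument above concrete, here is a rough Python sketch that builds one key per instance for options B to E within a single hour and measures how many leading characters all keys share. The helper names (`fake_instance_ids`, `make_keys`), the date, and the randomly generated instance IDs are made up for illustration; only the two example IDs quoted in this thread are real-looking samples.

```python
import os
import random

def fake_instance_ids(n, seed=0):
    """Hypothetical instance IDs: 'i-' plus 17 hex characters (illustrative only)."""
    rng = random.Random(seed)
    # Start with the two example IDs quoted in the thread, then add random ones.
    ids = ["i-1234567890abcdef0", "i-057c9415db5ca3158"]
    ids += [f"i-{rng.getrandbits(68):017x}" for _ in range(n)]
    return ids

def make_keys(instance_ids, year, month, day, hour):
    """Build one key per instance for each naming scheme (options B to E)."""
    yyyy, mm, dd, hh = f"{year:04d}", f"{month:02d}", f"{day:02d}", f"{hour:02d}"
    schemes = {"B": [], "C": [], "D": [], "E": []}
    for iid in instance_ids:
        schemes["B"].append(f"{iid}_log-{hh}-{dd}-{mm}-{yyyy}")
        schemes["C"].append(f"{iid}_log-{yyyy}-{mm}-{dd}-{hh}")
        schemes["D"].append(f"{hh}-{dd}-{mm}-{yyyy}-log_{iid}")
        schemes["E"].append(f"{yyyy}-{mm}-{dd}-{hh}-log_{iid}")
    return schemes

# All keys written during one (arbitrary) hour: 2024-01-15, 09:00.
keys = make_keys(fake_instance_ids(1000), 2024, 1, 15, 9)

# Length of the prefix shared by every key written in that hour.
shared = {opt: len(os.path.commonprefix(ks)) for opt, ks in keys.items()}
print(shared)  # {'B': 2, 'C': 2, 'D': 20, 'E': 20}
```

For B and C only the leading `i-` is common to every key, while for D and E every key written in that hour starts with the same 17-character timestamp prefix followed by `_i-`.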
It should be D: "hourly log files" 😉
D is correct. HH provides the randomness because it changes hourly.