If an application is storing hourly log files from thousands of instances from a high traffic
web site, which naming scheme would give optimal performance on S3?
A. Sequential
B. HH-DD-MM-YYYY-log_instanceID
C. YYYY-MM-DD-HH-log_instanceID
D. instanceID_log-HH-DD-MM-YYYY
E. instanceID_log-YYYY-MM-DD-HH
Probably D
I agree with D.
Thousands of Instance IDs + Hourly logs seems like the most random sequence option.
Yes, you are correct. Your explanation is correct.
I choose C
I think B is the correct choice
The answer should be B. See http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
B looks correct to me:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
B
Yes, B is the right option. The main reason is the random prefix, which would give higher performance in this case.
A – Doesn’t make sense
C – YYYY (this would be the same for every key, so it would be difficult to achieve good performance)
D & E – The instance ID would be the same for the first two characters (i-)
Agree!
B
D
D
D
D. Thousands of keys sharing the same “HH-” prefix within one hour does not look like a case optimized for performance.
D
Even if the first couple of characters are “i-”, the first 3-4 characters provide a more random
prefix than HH-DD.
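To put a rough number on that, here is a minimal Python sketch that builds keys for options B through E and counts how many distinct leading strings each one produces. The instance IDs are made up ("i-" plus 8 hex digits), and the helper names `fake_instance_ids`, `make_keys` and `distinct_prefixes` are only for illustration. It measures variety in the leading characters only; it does not model how S3 actually partitions keys.

```python
from datetime import datetime, timedelta

# Key templates for options B-E from the question.
SCHEMES = {
    "B": "{ts:%H-%d-%m-%Y}-log_{iid}",
    "C": "{ts:%Y-%m-%d-%H}-log_{iid}",
    "D": "{iid}_log-{ts:%H-%d-%m-%Y}",
    "E": "{iid}_log-{ts:%Y-%m-%d-%H}",
}


def fake_instance_ids(count):
    # Hypothetical IDs (assumption): "i-" plus 8 hex digits, spread out
    # with a multiplicative hash so that they are unique and look random.
    return [f"i-{(n * 2654435761) % 2**32:08x}" for n in range(1, count + 1)]


def make_keys(scheme, instance_ids, start, hours):
    # One key per instance per hour, starting at `start`.
    template = SCHEMES[scheme]
    return [
        template.format(ts=start + timedelta(hours=h), iid=iid)
        for h in range(hours)
        for iid in instance_ids
    ]


def distinct_prefixes(keys, length):
    # Number of different strings formed by the first `length` characters.
    return len({key[:length] for key in keys})


if __name__ == "__main__":
    ids = fake_instance_ids(1000)
    start = datetime(2016, 1, 1, 0)
    for scheme in SCHEMES:
        keys = make_keys(scheme, ids, start, hours=1)
        print(scheme, distinct_prefixes(keys, 6))
```

Within a single hour, every B or C key starts with the same six characters, while D and E vary as soon as you get past the "i-". Over a full day B only ever cycles through 24 values of HH.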
D. The random hostname prevents hammering a specific partition, and the HH-DD following the hostname is more random than E.
B will hammer a partition once per day at HH-DD.
A changes the I/O pattern, so it does not apply.
C is just as bad as A.
E is almost as good as D, but its YYYY will not be as random as D.
D is the answer.
A,B,C are all sequential.
E is less random than D.
C
D
C is still sequential. The answer is D.
D
I think the answer is C, because you will most likely search for logs by date and time across the various instances, but the word log should be at the end.
The correct answer is B. See the S3 performance optimization chapter: C, D and E are all the bad examples from the original text. Choose B, 100%.
Anyone who understands how S3 stores data knows that B is the option if you want performance. The key thing to remember here is the more random or changing you can get the prefix to be, the more distributed your objects will be across the stack.
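If a random leading prefix is the goal, one way to get it regardless of how the rest of the name is ordered is to put a few characters of a hash in front of the key. This is not one of the listed options, just a minimal Python sketch of the idea; `hashed_key` and the 4-character prefix length are assumptions for illustration.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    # Hash the intended key name and prepend the first few hex characters,
    # so the leading characters vary even when the name itself is sequential.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


if __name__ == "__main__":
    print(hashed_key("2016-01-01-00-log_i-0a1b2c3d"))
```

The trade-off is that listing keys by date no longer works on the raw key order, since the hash prefix comes first.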
I guess D is correct
NONE of these answers is correct. In order to partition data stored on S3, the key needs to use one or more slashes (/), so the best way in this scenario would be to use instanceID_log/YYYY/MM/DD/HH (the order of YYYY, MM, DD, HH essentially doesn’t matter). This would cause the log file from each instance to be written to a different S3 partition, because the instance IDs are unique and would therefore be an effective hash key.
The way these keys (i.e., file names) are written above, they would all be written to the same partition in S3, no matter how the names are jumbled as listed. Effectively there is no difference (performance-wise) among the listed options.
(Had to repost because “instanceID” isn’t displayed.)