You need a persistent and durable storage to trace call activity of an IVR (Interactive Voice Response) system.
Call duration is mostly in the 2-3 minutes timeframe. Each traced call can be either active or terminated. An
external application needs to know each minute the list of currently active calls, which are usually a few
calls/second. Put once per month there is a periodic peak up to 1000 calls/second for a few hours The system
is open 24/7 and any downtime should be avoided. Historical data is periodically archived to files. Cost saving
is a priority for this project.
What database implementation would better fit this scenario, keeping costs as low as possible?
A.
Use RDS Multi-AZ with two tables, one for -Active calls” and one for -Terminated calls”. In this way the
“Active calls_ table is always small and effective to access.
B.
Use DynamoDB with a “Calls” table and a Global Secondary Index on a “IsActive'” attribute that is present
for active calls only In this way the Global Secondary index is sparse and more effective.
C.
Use DynamoDB with a ‘Calls” table and a Global secondary index on a ‘State” attribute that can equal to
“active” or “terminated” in this way the Global Secondary index can be used for all Items in the table.
D.
Use RDS Multi-AZ with a “CALLS” table and an Indexed “STATE* field that can be equal to ‘ACTIVE” or –
TERMINATED” In this way the SOL query Is optimized by the use of the Index.
Hi,
I would go for answer B, because of the 1000 calls/second and most cost efficient choice.
– RDS would cost more than DynamoDB so that rule-out A and D
– B is more cost effective than C, since the Global Secondary Index only contains active calls
Any ideas on this are welcome.
Thanks,
Frank
See also: https://aws.amazon.com/dynamodb/faqs/
Q: Can a global secondary index key be defined on non-unique attributes?
Yes. Unlike the primary key on a table, a GSI index does not require the indexed attributes to be unique.
Q: Are GSI key attributes required in all items of a DynamoDB table?
No. GSIs are sparse indexes. Unlike the requirement of having a primary key, an item in a DynamoDB table does not have to contain any of the GSI keys. If a GSI key has both hash and range elements, and a table item omits either of them, then that item will not be indexed by the corresponding GSI. In such cases, a GSI can be very useful in efficiently locating items that have an uncommon attribute.
Yes Frank it makes sense. Initially, I opted for C option but later realized that effectiveness is higher for option B and even the option conveys the same.
Definitely B, as the question also talks about archiving historical data to files.
Found below statement in the Dynamo DB FAQ section:
“If your table includes historical data that is infrequently accessed, consider archiving the historical data to Amazon S3, Amazon Glacier or another data store.”
Considering all the documents and logic behind the question I believe Frank is right. B is the answer.
RDS can’t be used for this.
answer is b
I agree with every above. B.
The word “durable” was also mentioned, that takes RDS options out
RDS with Multi-AZ is durable
Thanks Frank for the notes.
Can someone explain what makes B a choice over c?
Looking at the iOPS, I do not believe DynamoDB would be cheaper.
MultiAZ deployment providing durablility suggests D to me?
1. Storage should be Durable and low cost
2. Need only active calls details to application.
3. By tracking only active calls in the Table, it make it simple the table with few values.
So I would go with ” B “.
RDS setup that is able to handle 1000 calls/second is quite expensive! And you’ll have to pay for that all the time even though you need it only few hours once a month. For such peak usage DynamoDB provides better cost.
B is the right option.
B
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForGSI.html#GuidelinesForGSI.SparseIndexes
The idea behind sparse indexes is that only items with isActive = “y” will be in the index, so require less storage and processing than your main table
B
go for B too
RDS expensive and it’s not one-click to scale at peak time, so A,D out
C, when GSI state, remember partition,and definitely the table will pass 10G size and spread into partitions. In usual time, there’s not lots of call but partition still there and throughtput to the whole table will be splitted to all partitions, here “active” and “terminated”, if 12000 througputs at peak time, then each partition 6000, even when we decrease the throughtput after peak time, half of throughtput will be wasted
Surely this can’t be a Solutions Architect Associate question???