projects:caaws [2019/12/18 23:54]
~~NOCACHE~~
~~NOTOC~~
====== Data Replication For AWS Spot Market ======
This is a framework for geo-diverse data services hosted on EC2 spot instances. Spot instances implement market-driven
pricing for spare resources within Amazon's data centers. On average, they cost 78% less than instances with fixed,
on-demand pricing. However, serving data from spot instances is challenging because their prices can change every hour,
and instances hosting critical data can be suspended with little warning when spot prices rise above the bid. We studied
a trace of spot prices provided by Amazon and observed that prices change more than 300 times per month. The relative
cost of spot instances in different regions also changes frequently. Naively migrating data to whichever site is
currently cheapest would incur prohibitive bandwidth costs. Consistent hashing, a widely used approach for data
replication, would also incur significant migration costs, and because it places data by hash value rather than by
location, it is not suited to geo-diverse settings where latency-aware placement is needed.

Our cost-aware data replication framework uses online data replication to reduce migration costs and to make sound
placement decisions despite price volatility. The key insight is that price volatility and non-uniform access rates
magnify the cost of poor replication policies for popular data. By predicting these heavy hitters and carefully
allocating resources to them, we can significantly reduce the total cost.

We implemented the framework with novel intra-region and inter-region data management policies. Across regions, the
framework forecasts the price at each site and replicates data to the set of sites that together yield low cost. These
replication decisions are made online, i.e., when data is created (after a short profiling period), which avoids the
overhead of moving data frequently in response to price changes. Within a region, the framework manages replication to
meet a dynamic workload using a RAID 0+1-style scheme: new spot instances are spawned to absorb workload peaks, and a
service mirror is maintained on an on-demand instance in case the spot bid fails.

We evaluated the framework at a small scale, using up to 10 spot instances and 1 on-demand instance. Compared to
consistent hashing, our approach reduces cost by 80% while increasing response time by less than 5%, and all queries
were served with no failures reported.
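The migration-cost argument against consistent hashing can be made concrete with a small experiment. The Python sketch below builds a consistent-hash ring with virtual nodes and removes one node, as happens when a spot instance is reclaimed. `HashRing`, the node names, and the 100-virtual-node setting are illustrative assumptions, not part of the framework described here.

```python
import bisect
import hashlib


def stable_hash(s: str) -> int:
    # 64-bit hash that is stable across runs (built-in hash() is salted).
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove(self, node):
        # e.g. a spot instance reclaimed after a price change
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        # Owner is the first virtual node clockwise from the key's position.
        i = bisect.bisect(self._ring, (stable_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


keys = [f"object-{i}" for i in range(10_000)]
spot_nodes = ["spot-a", "spot-b", "spot-c", "spot-d"]

before = HashRing(spot_nodes)
after = HashRing(spot_nodes)
after.remove("spot-c")  # one spot instance is reclaimed

moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
print(f"{len(moved) / len(keys):.1%} of keys migrate")  # about a quarter
```

Only the reclaimed node's keys move, which is the best case for consistent hashing. The cost still recurs on every membership change, and the new owners are chosen by hash position rather than by price or latency.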
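The online inter-region decision described above (profile briefly, find the heavy hitters, then pick a low-cost set of sites) can be sketched as follows. This is a minimal illustration under stated assumptions: the coverage rule in `heavy_hitters`, the exponentially weighted moving average in `ewma_forecast` (a stand-in, since the page does not say how prices are forecast), the latency weight `ms_cost`, and all prices and latencies are hypothetical.

```python
from collections import Counter
from itertools import combinations


def heavy_hitters(access_log, coverage=0.8):
    """Smallest set of objects covering `coverage` of profiled accesses."""
    counts = Counter(access_log)
    total = sum(counts.values())
    chosen, covered = [], 0
    for obj, n in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(obj)
        covered += n
    return chosen


def ewma_forecast(history, alpha=0.5):
    """Next-period price forecast: exponentially weighted moving average."""
    forecast = history[0]
    for price in history[1:]:
        forecast = alpha * price + (1 - alpha) * forecast
    return forecast


def choose_sites(price_history, latency_ms, rates, replicas=2, ms_cost=1e-5):
    """Pick the replica set with the lowest forecast price plus a latency
    penalty, where each client region reads from its nearest replica."""
    forecast = {site: ewma_forecast(h) for site, h in price_history.items()}

    def cost(sites):
        price = sum(forecast[s] for s in sites)
        latency = sum(rate * min(latency_ms[client][s] for s in sites)
                      for client, rate in rates.items())
        return price + ms_cost * latency

    # Brute force over all replica sets; fine for a handful of regions.
    return min(combinations(sorted(price_history), replicas), key=cost)


log = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 5 + ["e"] * 5
print(heavy_hitters(log))  # ['a', 'b']

price_history = {  # hypothetical hourly spot prices, USD
    "us-east-1": [0.10, 0.12, 0.30],
    "us-west-2": [0.09, 0.09, 0.08],
    "eu-west-1": [0.11, 0.10, 0.10],
}
latency_ms = {  # hypothetical client-region to site latency
    "us": {"us-east-1": 10, "us-west-2": 40, "eu-west-1": 80},
    "eu": {"us-east-1": 80, "us-west-2": 140, "eu-west-1": 10},
}
rates = {"us": 100, "eu": 50}  # requests/s from each client region

print(choose_sites(price_history, latency_ms, rates))                # ('eu-west-1', 'us-west-2')
print(choose_sites(price_history, latency_ms, rates, ms_cost=1e-3))  # ('eu-west-1', 'us-east-1')
```

With the default weight, the forecast price spike in us-east-1 dominates and the sketch avoids it; with a heavier latency weight it keeps us-east-1 despite the spike. Because the choice is made once per object, a later price change does not by itself trigger migration.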
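One possible reading of the intra-region RAID 0+1-style scheme is that spot instances form the striped half and one on-demand instance holds the mirror. The sketch below follows that reading. `RegionPool`, its methods, the per-instance capacity, and the modulo striping are hypothetical choices made for illustration.

```python
import math
import zlib
from dataclasses import dataclass, field


@dataclass
class RegionPool:
    """Hypothetical intra-region pool in the spirit of RAID 0+1: keys are
    striped over spot instances, and one on-demand instance keeps a full
    mirror that keeps serving if the spot bid fails."""

    capacity_rps: float                      # load one spot instance can absorb
    mirror: str = "on-demand-0"
    spot: list = field(default_factory=list)
    _next_id: int = field(default=0, init=False)

    def scale_to(self, load_rps):
        # Spawn spot instances until capacity covers the load (grow-only here).
        needed = math.ceil(load_rps / self.capacity_rps)
        while len(self.spot) < needed:
            self.spot.append(f"spot-{self._next_id}")
            self._next_id += 1

    def reclaim(self, instance):
        # Bid lost: the provider takes the spot instance back.
        self.spot.remove(instance)

    def _stripe(self, key):
        # Deterministic key-to-instance mapping over the current spot set.
        return self.spot[zlib.crc32(key.encode()) % len(self.spot)]

    def route_read(self, key):
        # Prefer spot capacity; fall back to the on-demand mirror.
        return self._stripe(key) if self.spot else self.mirror

    def write_targets(self, key):
        # Mirroring: each write lands on its spot stripe and on the mirror.
        return [self._stripe(key), self.mirror] if self.spot else [self.mirror]


pool = RegionPool(capacity_rps=100)
pool.scale_to(250)                  # workload peak -> 3 spot instances
print(pool.spot)                    # ['spot-0', 'spot-1', 'spot-2']
for inst in list(pool.spot):        # every spot bid fails at once
    pool.reclaim(inst)
print(pool.route_read("object-7"))  # on-demand-0
```

Modulo striping remaps keys whenever the spot set changes; in this sketch the mirror is what keeps reads answerable during such changes, and copying data onto newly spawned instances is left out.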
For more information, please see the [[http://pacs.ece.ohio-state.edu/proposal/aws_proposal.pdf|AWS Proposal]] or contact [[http://www2.ece.ohio-state.edu/~xuz/|Zichen Xu]].