Yahoo! Grid Lab IIT Madras

Research


Generate MapReduce :

Generate MapReduce (GMR) is a new programming framework that extends MapReduce to support iterative and recursive computations. It introduces a new Generate abstraction into the MapReduce framework that enables expressing recursive computations. The runtime also supports a distributed communication model using shared data structures. Thus, GMR allows modeling of complex computations from domains such as artificial intelligence, clustering algorithms, compute-intensive scientific workflows, and optimization algorithms, which were difficult to express using MapReduce.


Cloud Bursting :

Today, batch processing frameworks like Hadoop MapReduce are difficult to scale to multiple clouds due to latencies involved in inter-cloud data transfer. BStream is a new cloud bursting framework that couples stream-processing in the external cloud (EC) with Hadoop in the internal cloud (IC) to realize inter-cloud MapReduce. Stream processing in EC enables pipelined uploading, processing and downloading of data, thereby minimizing network latencies. Thus, BStream scales Hadoop MapReduce to multiple clouds.
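The benefit of pipelining can be illustrated with a simple timing model. The sketch below is not BStream code: the function names, the fixed per-chunk stage times, and the example numbers are all assumptions chosen for illustration. It compares handling each chunk strictly one after another against a three-stage pipeline in which upload, processing and download of different chunks overlap.

```python
def sequential_time(chunks, upload, process, download):
    """Total time when each chunk is uploaded, processed and downloaded
    before the next chunk starts (no overlap between stages)."""
    return chunks * (upload + process + download)


def pipelined_time(chunks, upload, process, download):
    """Total time for a three-stage pipeline where each stage handles one
    chunk at a time and stages work on different chunks concurrently."""
    up_done = proc_done = down_done = 0.0
    for _ in range(chunks):
        # A chunk can be uploaded as soon as the previous upload finishes.
        up_done = up_done + upload
        # Processing needs the chunk uploaded and the processor free.
        proc_done = max(up_done, proc_done) + process
        # Download needs the result ready and the downlink free.
        down_done = max(proc_done, down_done) + download
    return down_done


# Hypothetical per-chunk stage times (arbitrary units), not measurements.
seq = sequential_time(chunks=10, upload=3, process=2, download=1)
pipe = pipelined_time(chunks=10, upload=3, process=2, download=1)
print(seq, pipe)
```

In this model the pipelined total is the time for the first chunk to pass through all stages plus one slowest-stage interval for every further chunk, so the inter-cloud transfer cost is largely hidden behind computation when the stages are reasonably balanced.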


Skew Handling in MapReduce :

Stragglers have become a common phenomenon in distributed programming frameworks like MapReduce. They significantly reduce resource utilization and, more importantly, increase job completion time. Skew within a job is one major contributor to stragglers. Every job in the MapReduce framework consists of several tasks, which run in parallel in the cluster. Completion of a job depends on its slowest running task; these slow-running tasks are designated as straggler tasks. Skew in tasks mainly arises from an imbalance in the amount of input processed, from expensive records processed by some tasks, or both. Other than skew, several external factors, such as heterogeneity in the cluster, issues with a machine, or interference from other tasks, also contribute to the creation of straggler tasks. Work is being done to mitigate skew by dynamically increasing the parallelism of a skewed task, thereby maintaining good load balance across tasks and also increasing resource utilization.
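The effect of a straggler on job completion time, and of splitting a skewed task, can be seen in a small simulation. This is a simplified sketch and not the lab's mitigation technique: the splitting here is static rather than dynamic, and the function names, the threshold, the number of parts and the task costs are all hypothetical.

```python
import heapq


def job_completion_time(task_costs, slots):
    """Greedy schedule: each task runs on whichever slot frees up first.
    The job finishes when its slowest slot finishes."""
    finish = [0.0] * slots
    heapq.heapify(finish)
    for cost in task_costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)


def split_skewed(task_costs, threshold, parts):
    """Replace every task costlier than `threshold` with `parts` equal
    subtasks, i.e. increase the parallelism of skewed tasks only."""
    result = []
    for cost in task_costs:
        if cost > threshold:
            result.extend([cost / parts] * parts)
        else:
            result.append(cost)
    return result


def utilization(task_costs, slots):
    """Fraction of slot time spent doing useful work until the job ends."""
    return sum(task_costs) / (slots * job_completion_time(task_costs, slots))


# Hypothetical job: three balanced tasks and one skewed task, four slots.
costs = [10, 10, 10, 40]
balanced = split_skewed(costs, threshold=20, parts=4)

before = job_completion_time(costs, slots=4)
after = job_completion_time(balanced, slots=4)
print(before, after)
print(utilization(costs, 4), utilization(balanced, 4))
```

With these illustrative numbers the skewed task alone determines the completion time of the original job while three slots sit idle; after splitting, the same total work is spread more evenly, so completion time drops and utilization rises.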


Improving Genetic Algorithms using Generate MapReduce