This test evaluates the candidate's ability to understand technical content, conceptualize it, and write about it. The content writing profile requires reading technical research papers, understanding them, and summarizing them. Five research papers are provided at the end of this page. You need to produce the answers and a summary for these 5 papers as instructed below. In addition to these 5 papers, you need to find 3 more papers on Google that were published within the last 3 years. In total, a summary of 8 research papers within 700-800 words must be submitted, along with answers to the questions below for each paper. Kindly avoid plagiarism.

There are a few rules to make reading easy and fast. You have to take these points into consideration during the test.

  1. While reading the research paper, answer the following questions in 1-2 sentences each:

    • What problem does the paper target?

    • What is the key contribution, in terms of the solution to the problem statement?

    • What is the outcome of the work contributed in the paper?

    • How does the idea relate to other existing works? Note any common attribute, extension of a methodology, common dataset, etc.

  2. Critically summarize the answers to these questions in 2-3 lines and link them with the corresponding summary of the previous paper

  3. At the end, summarize the whole body of work from the perspective of the existing research gaps

Ways to analyze the paper and answer these questions quickly

Normally, a research paper consists of several sections, such as the abstract, introduction, related work/background, proposed work, results, and conclusion. You need to analyze only the following parts to get the answers to the above questions:

  1. Abstract: normally states the problem targeted in the paper

  2. The last two paragraphs of the introduction: highlight the major contributions

  3. Conclusion: states the outcome of the paper or the results achieved


  1. Paper title: A heuristic method toward deadline-aware energy-efficient MapReduce scheduling problems in Hadoop YARN

    • Issue: i) energy consumption optimization is ignored; ii) scheduling was proposed for the slot-based Hadoop framework, not for container-based frameworks like Hadoop YARN

    • Key contribution: i) a deadline-aware, energy-efficient MapReduce (MR) scheduling algorithm for the Hadoop YARN scheduler; ii) the scheduling problem is presented as a nonlinear integer programming problem; iii) scheduling is presented as a binary-indexed problem; iv) the objective is energy efficiency and the constraint is the deadline

    • Outcome: i) compared with the default schedulers; ii) saved 35% more energy than the default Hadoop schedulers

    • Relation with other papers: NA

    • Summary of the paper: the work in [1] focuses on improving the Hadoop YARN scheduler so that energy efficiency improves while the deadline constraints of the tasks are met. It saved 35% more energy than the default Hadoop schedulers.

  2. Paper title: New Scheduling Algorithms for Improving Performance and Resource Utilization in Hadoop YARN Clusters

    • Issue: an inefficient scheduler that ignores task dependencies, which causes problems for heterogeneous tasks in real environments

    • Key contribution: i) the scheduler reduces the total execution time of the tasks; ii) the presented scheduler uses information about the available resources, the requested resources, and task dependencies; iii) heterogeneous jobs are processed

    • Outcome: i) the proposed scheduler, HaSTE-A, presents an efficient solution for iterative tasks

    • Relation with other papers: NA

    • Summary of the paper: the work in [2] presents a scheduler for heterogeneous tasks that efficiently reduces the total execution time of the jobs.
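To make the formulation in the first example more concrete (binary assignment decisions, energy as the objective, the deadline as the constraint), here is a minimal Python sketch of a toy version of such a problem. It is not the algorithm from paper [1]: the function name `schedule`, the inputs `task_work`, `node_speed` and `node_power`, and the simple energy model (power multiplied by busy time) are all assumptions made for illustration, and the exhaustive search only works for very small inputs.

```python
from itertools import product


def schedule(task_work, node_speed, node_power, deadline):
    """Toy deadline-constrained, energy-minimizing assignment (illustrative only).

    task_work[i]  : hypothetical amount of work in task i
    node_speed[j] : hypothetical work units per second processed by node j
    node_power[j] : hypothetical power draw (watts) of node j while busy
    deadline      : latest allowed finish time in seconds

    Returns (energy, assignment), or (None, None) if no assignment
    meets the deadline.
    """
    n_nodes = len(node_speed)
    best_energy, best_assign = None, None

    # Each tuple picks exactly one node per task. This is the same as a
    # binary variable x[i][j] with sum over j of x[i][j] == 1 for every task i.
    for assign in product(range(n_nodes), repeat=len(task_work)):
        busy = [0.0] * n_nodes  # seconds each node spends running its tasks in sequence
        for i, j in enumerate(assign):
            busy[j] += task_work[i] / node_speed[j]

        # Constraint: every node must finish before the deadline.
        if max(busy) > deadline:
            continue

        # Objective: total energy = power * busy time, summed over nodes.
        energy = sum(node_power[j] * busy[j] for j in range(n_nodes))
        if best_energy is None or energy < best_energy:
            best_energy, best_assign = energy, assign

    return best_energy, best_assign
```

With a tight deadline the search is forced to use the faster but more power-hungry node for part of the work; with a loose deadline it can place everything on the low-power node. A summary of a paper like [1] should state this trade-off in words rather than reproduce the model.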

Taxonomy of the Final Summary

After completing the above steps, a brief final summary of the literature review is required, like this:

There are many scheduling algorithms that address the main issues of MapReduce scheduling with different techniques and approaches. As already mentioned, some of these algorithms focus on improving data locality and some aim to provide synchronization processing. Many of them are also designed to decrease the completion time. The LATE, SAMR, CREST, LARTS, Maestro and Matchmaking algorithms have focused on data locality. What follows is a brief description of some of the most important algorithms. In the Longest Approximate Time to End (LATE) scheduler, backup tasks are used for the tasks that have a longer remaining execution time. LATE uses a set of fixed weights to estimate the remaining execution time. This scheduler tries to identify slow-running tasks and, once they are identified, sends them to another node for execution. If that node is able to perform the task faster, system performance increases. The advantage of this method is that it calculates the remaining execution time of the task together with the rate of job progress, which improves the system response rate. One disadvantage of LATE is that tasks are sometimes selected incorrectly for re-execution because the remaining execution time is miscalculated. To address this, Chen et al. recommended the Self-Adaptive MapReduce (SAMR) scheduling algorithm, inspired by the LATE scheduling algorithm. In this algorithm, the history of job executions is used to calculate the remaining execution time more accurately.
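The remaining-time estimate described in the sample summary can be sketched in a few lines of Python. This is a simplified illustration, not Hadoop's implementation: it assumes each task already reports a progress score between 0 and 1 (the fixed phase weights that produce that score are not modeled), and the function names `estimate_time_left` and `pick_backup_candidate` are hypothetical. Other details of the LATE scheduler, such as thresholds on which tasks and nodes are eligible, are left out.

```python
def estimate_time_left(progress_score, elapsed_seconds):
    """Estimate the remaining seconds of a task from its progress so far.

    progress_rate = progress_score / elapsed_seconds
    time_left     = (1 - progress_score) / progress_rate
    """
    if progress_score <= 0 or elapsed_seconds <= 0:
        # No measurable progress yet, so no finite estimate is possible.
        return float("inf")
    progress_rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / progress_rate


def pick_backup_candidate(tasks):
    """Return the name of the running task with the longest estimated time left.

    tasks maps a task name to (progress_score, elapsed_seconds).
    Finished tasks (progress_score >= 1) are ignored.
    Returns None if no task is still running.
    """
    estimates = {
        name: estimate_time_left(progress, elapsed)
        for name, (progress, elapsed) in tasks.items()
        if progress < 1.0
    }
    if not estimates:
        return None
    return max(estimates, key=estimates.get)
```

The sketch also shows where the weakness noted above comes from: the estimate assumes the task keeps progressing at its average rate so far, so a task whose phases run at different speeds can be ranked incorrectly.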

5 Research Papers for Summary

Paper 1

Paper 2

Paper 3

Paper 4

Paper 5

Paper 6

You need to find this paper on Google. The topic is task scheduling in Hadoop YARN.

Paper 7

You need to find this paper on Google. The topic is task scheduling in Hadoop YARN.

Paper 8

You need to find this paper on Google. The topic is task scheduling in Hadoop YARN.