java - How does the reduce phase work after the map phase in Hadoop?
I have been reading about the Hadoop framework for the past few weeks, but I am not able to understand one concept. My question may be foolish; if so, sorry for that. Suppose I have created a word count program on a file that is long and hence distributed across 3 different datanodes. Since the map phase runs on the 3 datanodes, it creates key-value pairs, and after that merging is performed on the map data created on each of the 3 datanodes. I am unable to understand the next phase: how is the merged data distributed to the different reduce phases, how many reduce phases will run, and on how many datanodes will the reduce phase run? Please clear up the above confusions, because without that I am unable to move further in Hadoop. Thank you.
- Each of the map tasks, after processing its share of the input, sorts and merges its data based on the compareTo() method implementation of the map output key class (for example, suppose three different groups are produced: A, B and C).
- When processing reaches a determined phase, each of the reduce tasks, based on the intermediary data produced by the map tasks, transfers the files it is interested in (say it is interested in group A at the moment: it transfers all the files that belong to group A from the machines which produced files of this category).
- The reducer performs its own sorting and merging of the aggregated data transferred from the machines executing the map tasks (i.e. we have files A.1, A.2 and A.3; since each of the map tasks sorts independently, the order of the aggregated data is not guaranteed, so sorting is applied again on the aggregated group of files).
- The reduce task performs the required processing and writes the results to the final location.
- The operation is repeated for each of the result groups.
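To make the steps above concrete, here is a minimal in-memory sketch of word count in plain Java, with no Hadoop dependency. It is only an illustration under stated assumptions: the class `ShuffleSketch` and its methods are hypothetical names, a `TreeMap` stands in for the sort by `compareTo()`, and nested lists stand in for the files that a real cluster spills to disk and transfers over the network. The `numReduceTasks` parameter stands in for the reduce task count configured on the job, and the partition formula is the one used by Hadoop's default `HashPartitioner`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Decides which reduce task owns a key (same formula as Hadoop's default HashPartitioner).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // One map task: emits (word, 1) for every word of its input split and keeps
    // one sorted group of output per reduce task (TreeMap sorts keys via compareTo()).
    static List<TreeMap<String, List<Integer>>> runMapTask(String split, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String word : split.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            partitions.get(partitionFor(word, numReduceTasks))
                      .computeIfAbsent(word, k -> new ArrayList<>())
                      .add(1);
        }
        return partitions;
    }

    // One reduce task: fetches only its own partition from every map output,
    // merges the pieces into a single sorted group, then sums the values per key.
    static TreeMap<String, Integer> runReduceTask(
            int reducerId, List<List<TreeMap<String, List<Integer>>>> mapOutputs) {
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (List<TreeMap<String, List<Integer>>> mapOutput : mapOutputs) {
            for (Map.Entry<String, List<Integer>> e : mapOutput.get(reducerId).entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : merged.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Runs every map task, then every reduce task. A real job writes one output
    // file per reduce task; the results are combined here only for display.
    static TreeMap<String, Integer> wordCount(List<String> splits, int numReduceTasks) {
        List<List<TreeMap<String, List<Integer>>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(runMapTask(split, numReduceTasks));
        }
        TreeMap<String, Integer> all = new TreeMap<>();
        for (int r = 0; r < numReduceTasks; r++) {
            all.putAll(runReduceTask(r, mapOutputs));
        }
        return all;
    }

    public static void main(String[] args) {
        // Three splits, as if the file were stored on 3 datanodes; 2 reduce tasks.
        List<String> splits = Arrays.asList("a b a", "b c", "a c c");
        System.out.println(wordCount(splits, 2)); // prints {a=3, b=2, c=3}
    }
}
```

In this sketch the number of map tasks follows from the number of input splits, while the number of reduce tasks is chosen independently. Every occurrence of a given key ends up in the same reduce task, whichever map task produced it.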