java - How does the reduce phase work after the map phase in Hadoop?


I have been reading about the Hadoop framework for the past few weeks, but I am not able to understand one concept. This may be a foolish question; if so, sorry. Suppose I have to create a word count program on a file that is very large, and hence distributed across 3 different DataNodes. The map phase runs on the 3 DataNodes and creates key-value pairs, and after that merging is performed on the map data created on the 3 DataNodes. I am unable to understand the next phase: how is the merged data distributed to the different reduce tasks, how many reduce tasks run, and on how many DataNodes do they run? Please clear up these confusions, because of them I am unable to move further in Hadoop. Thank you.

  1. Each map task, after processing its share of the input, sorts and merges its output based on the compareTo() implementation of the map output key class, keeping that output split into one partition per reduce task. (For example, three different groups might be produced: a, b, and c.)
  2. When processing reaches a certain point, each reduce task uses the intermediate data produced by the map tasks to transfer only the files it is interested in. For example, if it is responsible for group a, it copies the files belonging to that group from every machine that produced them.
  3. The reducer performs its own sorting and merging of the aggregated data transferred from the machines that executed the map tasks. For example, it may now have files a.1, a.2, and a.3; because each map task sorted its output independently, the order across the combined files is not guaranteed, so the sort is applied to the aggregated group of files.
  4. The reduce task performs the required processing and writes the results to the final output location.
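  5. The operation is repeated for each of the result groups: within a reduce task the reduce function is called once per key, and separate reduce tasks can process their own groups in parallel.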
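
To make these steps concrete, here is a minimal, hypothetical sketch in plain Java (no Hadoop dependency) that simulates word count over three input splits standing in for the three DataNodes. The class and method names (WordCountSimulation, mapTask, reduceTask, run) are illustrative only and are not Hadoop's API. The partition() formula mirrors Hadoop's default HashPartitioner, which is what decides which reduce task receives each key; a plain in-memory sort stands in for Hadoop's merge of sorted runs. In a real job the number of reduce tasks is a job setting (mapreduce.job.reduces, or Job.setNumReduceTasks(int); the default is 1), and the framework schedules those tasks on whichever cluster nodes have capacity, not necessarily the DataNodes that held the input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical, single-process simulation of the MapReduce word-count data flow.
 * It does NOT use the Hadoop API; it only mimics map -> partition/sort ->
 * shuffle -> merge -> reduce so the steps can be traced by hand.
 */
public class WordCountSimulation {

    /** One intermediate (key, value) record emitted by a map task. */
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    private static final Comparator<Pair> BY_KEY = Comparator.comparing((Pair p) -> p.key);

    /** Same formula as Hadoop's default HashPartitioner: picks the reducer that owns a key. */
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** One map task over one input split (e.g. the block stored on one DataNode). */
    static List<List<Pair>> mapTask(String split, int numReducers) {
        List<List<Pair>> partitions = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) partitions.add(new ArrayList<>());
        // Map: emit (word, 1) and route it to the partition of the reducer that owns the word.
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                partitions.get(partition(word, numReducers)).add(new Pair(word, 1));
            }
        }
        // Each map task sorts its own output independently (step 1).
        for (List<Pair> p : partitions) p.sort(BY_KEY);
        return partitions;
    }

    /** One reduce task: shuffle, merge, then reduce once per key group. */
    static SortedMap<String, Integer> reduceTask(List<List<List<Pair>>> mapOutputs, int reducerId) {
        // Shuffle: copy this reducer's partition from every map task's output (step 2).
        List<Pair> fetched = new ArrayList<>();
        for (List<List<Pair>> output : mapOutputs) {
            fetched.addAll(output.get(reducerId));
        }
        // Merge: the fetched runs are only sorted individually, so sort them together (step 3).
        // A plain sort stands in for Hadoop's merge of sorted runs.
        fetched.sort(BY_KEY);

        // Reduce: walk adjacent records with equal keys and sum their values (steps 4-5).
        SortedMap<String, Integer> result = new TreeMap<>();
        int i = 0;
        while (i < fetched.size()) {
            String key = fetched.get(i).key;
            int sum = 0;
            while (i < fetched.size() && fetched.get(i).key.equals(key)) {
                sum += fetched.get(i).value;
                i++;
            }
            result.put(key, sum);
        }
        return result;
    }

    /** Runs one map task per split and numReducers reduce tasks; returns each reducer's output. */
    static List<SortedMap<String, Integer>> run(List<String> splits, int numReducers) {
        List<List<List<Pair>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(mapTask(split, numReducers));
        }
        List<SortedMap<String, Integer>> results = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            results.add(reduceTask(mapOutputs, r));
        }
        return results;
    }

    public static void main(String[] args) {
        // Three splits, standing in for a file spread over three DataNodes.
        List<String> splits = List.of("a b c a", "b c c", "a a b");
        List<SortedMap<String, Integer>> results = run(splits, 2);
        for (int r = 0; r < results.size(); r++) {
            System.out.println("reducer " + r + " -> " + results.get(r));
        }
        // prints:
        // reducer 0 -> {b=3}
        // reducer 1 -> {a=4, c=3}
    }
}
```

With two reducers, every occurrence of a given word lands in the same reduce task no matter which map task emitted it, which is why each reduce task can produce a final count for its own words without coordinating with the others.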
