Finding The Median In Large Sets Of Numbers Split Across N Servers using zeromq and nodejs (experimental)

###Finding The Median In Large Sets Of Numbers Split Across N Servers using zeromq and nodejs (experimental) Build Status Coverage Status

  • It takes a data and distributes the data equally to workers;
  • When StatsCollector’s getMedian is called, sends SORT message to sort data on workers as first step,
  • After sort operation confirmed for all workers, master sends GET_MEDIAN message to get median for each worker and stores median of medians. This value is likely to be the median of our data set.
  • After this step the binary search approach is applied to find exact median.
    • As a first step of this approach, the median estimation which is median of medians which are gathered from workers, will be used as a mid value in binary search. By collecting the values which are upper and lower than the estimated median, I updated the estimated median in order to equalize the counts of upper and lower values.
    • This step works recursively and I converge to the exact median.
    • The recursive step is that the master sends GET_LOWER_UPPER_COUNTS message to get lower and upper counts regarding to estimated median.

Improvements - Could be improve design by decouple from ZeroMQ to provide extensibility (e.g MPI). - Dynamically manage worker size and data distribution to workers and continuous data processing (streaming) - Could be implement multi-core processing using cluster on worker nodes to improve performance

Known issues - It needs refactoring to support duplicate data handling - It needs design refactoring


###Install Dependencies

On Windows

 npm install

On Linux

 sudo npm install

### Commands

Start App

//Start Workers up to size that determined in config file (for example:3)
node main.js --role='WORKER'
node main.js --role='WORKER'
node main.js --role='WORKER'

//Start Master
node main.js --role='MASTER'


 npm test


 npm run test-cov


 npm run lint

Related Repositories



⭐️ 给变量起名的事情上,为你生命省 3s (Save 3 seconds of your life when naming things.) ...



O Cerebro é um projeto com o propósito de disseminar o conhecimento. ...



Central de conteúdo do projeto CEREBRO. ...



Open alerting platform over Graphite (timeseries) and Seyren (scheduling). ...




Top Contributors