performance - Bottleneck in NiFi workflow caused by Kafka


I am creating a data ingest workflow in Apache NiFi, using Kafka as a buffering system. I have a 3-node cluster set up running the same workflow, and each node has 4 cores.

I rely on several instances of moving data to and from different Kafka topics, and this is the slowest part of my workflow. It is also inconsistent in terms of performance: 2 identical tests can differ in duration by 100%.

Our publish and consume Kafka processors are running on all 3 nodes, and our Kafka topics have 3 partitions across 3 brokers.

Does anyone have an idea of what causes this inconsistency, and how to mitigate it and speed up the workflow?

The single biggest performance improvement is to design the flow to have fewer flow files with many messages per flow file, rather than many flow files with 1 message each.

It is hard to say how to do this for your use case, because I don't know your flow, the format of your data, or what you are doing to each message, but let's pretend you have CSV data. The goal would be to have 1 flow file with many lines of CSV, rather than 1 flow file per line of CSV.

On the publishing side, when you send this flow file to PublishKafka_0_10, set the Message Demarcator property to a new-line (using Shift+Enter) and it will stream each line of the CSV to Kafka as a separate message.
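To make the publishing behaviour concrete, here is a minimal Python sketch of what splitting on a demarcator means. It is not NiFi code: `split_on_demarcator` is a hypothetical helper, and dropping empty segments (for example after a trailing new-line) is an assumption of the sketch rather than a statement about how the processor treats them.

```python
def split_on_demarcator(content, demarcator=b"\n"):
    """Split one flow file's content into individual message payloads.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    # Assumption of this sketch: empty segments are skipped, so a
    # trailing new-line does not yield an empty message.
    return [part for part in content.split(demarcator) if part]


# One flow file holding several CSV lines...
flow_file_content = b"id,name\n1,alice\n2,bob\n"

# ...becomes one Kafka message per line.
messages = split_on_demarcator(flow_file_content)
```

Here a single flow file turns into three messages, so the flow handles one object instead of three while Kafka still receives one message per CSV line.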

On the consuming side, if you set the Message Demarcator, it will write many messages to 1 flow file, up to a maximum of Max Poll Records.
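The consuming side can be pictured as the reverse operation. The sketch below is again plain Python, not NiFi code: `bundle_messages` is a hypothetical helper, and it simplifies matters by treating the incoming messages as one list that is cut into groups of at most `max_poll_records`.

```python
def bundle_messages(messages, max_poll_records, demarcator=b"\n"):
    """Join messages into flow-file contents, at most max_poll_records each.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    bundles = []
    for start in range(0, len(messages), max_poll_records):
        batch = messages[start:start + max_poll_records]
        # The demarcator is written between messages in the same flow file.
        bundles.append(demarcator.join(batch))
    return bundles


polled = [b"1,alice", b"2,bob", b"3,carol", b"4,dave", b"5,erin"]

# With a cap of 2 records, five messages end up in three flow files.
flow_files = bundle_messages(polled, max_poll_records=2)
```

With a realistic cap in the thousands, the same five messages would land in a single flow file, which is the situation the flow should aim for.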

In addition, you can try tuning the Concurrent Tasks of each processor (found on the Scheduling tab) in order to do more publishing or consuming in parallel. There is probably no benefit to increasing concurrent tasks on the consuming side, since you have 3 partitions and 3 NiFi nodes, so you already have one thread per partition. If you had 6 partitions and 3 NiFi nodes, you might benefit from having 2 concurrent tasks.
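The arithmetic behind that advice can be sketched as follows. The function name is hypothetical, and the sketch assumes that every concurrent task on every node acts as one consumer in the same Kafka consumer group, where a partition is assigned to at most one consumer at a time.

```python
def consumer_thread_usage(partitions, nodes, concurrent_tasks):
    """Return (busy, idle) consumer threads across the cluster.

    Hypothetical helper; assumes one consumer per concurrent task and
    at most one consumer per partition within the group.
    """
    total_threads = nodes * concurrent_tasks
    busy = min(total_threads, partitions)
    idle = total_threads - busy
    return busy, idle


current = consumer_thread_usage(partitions=3, nodes=3, concurrent_tasks=1)
more_tasks = consumer_thread_usage(partitions=3, nodes=3, concurrent_tasks=2)
more_partitions = consumer_thread_usage(partitions=6, nodes=3, concurrent_tasks=2)
```

Under these assumptions the current setup keeps 3 threads busy with none idle, doubling the concurrent tasks only adds 3 idle threads, and with 6 partitions all 6 threads have a partition to read from.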

