This is a question I constantly ask myself when designing a data-intensive application: when is it appropriate to use stream() over parallelStream()? Would it make sense to use both? How do I quantify the metrics and conditions to intelligently decide which one to use at runtime?

From what I understand, parallelStream() is a great facility for processing entries in parallel, but it all comes down to execution time and overhead. In my particular use case, due to the nature of the application, the velocity and volume of the data I am processing will be all over the place. There will be times when the volume is so large that my application would benefit massively from parallelizing the workload. Then there are times when a single thread will accomplish the task much more efficiently. I have profiled my application a dozen times and have had mixed results. At one point I considered defining boundaries on the data that would allow for alternating between the two, but in the end, not every piece of equipment is designed the same. Some systems may deal with a single-threaded workload much better than others. Is there a way in Java 8 (or later) to switch between stream() and parallelStream() intelligently?

It might be relevant to mention that I am using Apache Kafka, specifically Kafka Streams with Spring Cloud Stream. For the most part, I feel like I have squeezed everything out of Kafka in terms of performance and want to focus internally on optimizing my own service.
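To make the idea of alternating between the two concrete, here is a minimal sketch, assuming that batch size is a usable proxy for "worth parallelizing". The class name `AdaptiveStreams`, the method `adaptive`, and the `parallelThreshold` parameter are hypothetical names for illustration, and the threshold value in `main` is an arbitrary placeholder that would have to come from profiling on the target hardware. It relies only on standard `java.util.stream` calls.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper, not part of the JDK or of Kafka Streams.
final class AdaptiveStreams {
    private AdaptiveStreams() {}

    /**
     * Returns a parallel stream only when the batch is at least
     * parallelThreshold elements and more than one core is available;
     * otherwise returns a sequential stream.
     */
    static <T> Stream<T> adaptive(Collection<T> source, int parallelThreshold) {
        boolean worthParallel = source.size() >= parallelThreshold
                && Runtime.getRuntime().availableProcessors() > 1;
        return worthParallel ? source.parallelStream() : source.stream();
    }

    public static void main(String[] args) {
        // Assumed threshold; tune it per machine by measuring both paths.
        int threshold = 10_000;

        List<Integer> small = IntStream.rangeClosed(1, 10)
                .boxed().collect(Collectors.toList());
        List<Integer> large = IntStream.rangeClosed(1, 100_000)
                .boxed().collect(Collectors.toList());

        System.out.println("small batch parallel? "
                + adaptive(small, threshold).isParallel());
        System.out.println("large batch parallel? "
                + adaptive(large, threshold).isParallel());

        // The pipeline itself is written once and works in either mode.
        long sum = adaptive(large, threshold)
                .mapToLong(Integer::longValue)
                .sum();
        System.out.println("sum = " + sum);
    }
}
```

Because the decision is made once per batch, the rest of the pipeline stays identical in both modes. The threshold could be read from configuration so that each deployment sets its own value. Note that parallel streams run on the shared common `ForkJoinPool` by default, so a size check alone does not account for other work competing for that pool.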