[GitHub] [commons-lang] YuyuZha0 commented on issue #443: Optimize string split methods: 1. Use ThreadLocal to make reuse of th…


GitBox
YuyuZha0 commented on issue #443: Optimize string split methods: 1. Use ThreadLocal to make reuse of th…
URL: https://github.com/apache/commons-lang/pull/443#issuecomment-524597564
 
 
   @kinow Thanks for the careful review! Nice weekend, isn't it? I will edit the code later, following your advice. For now I have been focusing on performance. I tried more cases, and here are the results:
   ```
   Benchmark                                   (arrayLen)  Mode  Cnt     Score    Error  Units
   StringSplitBenchmark.testCommonsLang3Split          10  avgt   25   499.547 ± 10.716  ns/op
   StringSplitBenchmark.testCommonsLang3Split          30  avgt   25  1502.510 ± 16.956  ns/op
   StringSplitBenchmark.testCommonsLang3Split          50  avgt   25  2467.303 ± 18.970  ns/op
   StringSplitBenchmark.testFastSplitUtils             10  avgt   25   396.252 ±  4.653  ns/op
   StringSplitBenchmark.testFastSplitUtils             30  avgt   25  1145.600 ±  5.604  ns/op
   StringSplitBenchmark.testFastSplitUtils             50  avgt   25  1885.414 ±  4.121  ns/op
   StringSplitBenchmark.testGuavaSplit                 10  avgt   25   565.904 ±  5.483  ns/op
   StringSplitBenchmark.testGuavaSplit                 30  avgt   25  1665.049 ± 81.051  ns/op
   StringSplitBenchmark.testGuavaSplit                 50  avgt   25  2758.394 ±  7.684  ns/op
   ```
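
To make the table easier to read, the relative time saved can be worked out from the mean scores alone. The snippet below is a small sketch of that arithmetic; the class name `SplitSpeedup` and the method `timeSaved` are illustrative names, not part of the benchmark, and the error margins from the table are ignored.

```java
public final class SplitSpeedup {

    private SplitSpeedup() {}

    /** Fraction of time saved per operation when moving from baselineNs to candidateNs. */
    public static double timeSaved(double baselineNs, double candidateNs) {
        return 1.0 - candidateNs / baselineNs;
    }

    public static void main(String[] args) {
        // Mean scores (ns/op) copied from the JMH table above.
        int[] arrayLen = {10, 30, 50};
        double[] commonsLang3 = {499.547, 1502.510, 2467.303};
        double[] fastSplit = {396.252, 1145.600, 1885.414};
        double[] guava = {565.904, 1665.049, 2758.394};

        for (int i = 0; i < arrayLen.length; i++) {
            System.out.printf(
                    "arrayLen=%d: %.1f%% less time than commons-lang3, %.1f%% less than Guava%n",
                    arrayLen[i],
                    100 * timeSaved(commonsLang3[i], fastSplit[i]),
                    100 * timeSaved(guava[i], fastSplit[i]));
        }
    }
}
```

On these numbers, `testFastSplitUtils` takes roughly 21% to 24% less time per call than `testCommonsLang3Split`, and roughly 30% to 32% less than `testGuavaSplit`.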
   The benchmark cases are shown below:
   ```
   import com.google.common.base.Splitter;
   import org.apache.commons.lang3.StringUtils;
   import org.openjdk.jmh.annotations.*;
   
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.ThreadLocalRandom;
   import java.util.concurrent.TimeUnit;
   import java.util.function.Supplier;
   
   /**
    * JMH benchmark comparing three ways to split a string on a single character.
    *
    * @author zhaoyuyu
    * @since 2019-08-21
    */
   @OutputTimeUnit(TimeUnit.NANOSECONDS)
   @BenchmarkMode(Mode.AverageTime)
   @Warmup(iterations = 5, time = 5)
   @Measurement(iterations = 5, time = 5)
   public class StringSplitBenchmark {
   
       private static final char separator = '@';
       private static final Splitter splitter = Splitter.on(separator);
   
   
       @Benchmark
       public String[] testCommonsLang3Split(StringSupplier stringSupplier) {
           return StringUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
       }
   
       @Benchmark
       public String[] testFastSplitUtils(StringSupplier stringSupplier) {
           return FastSplitUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
       }
   
       @Benchmark
       public String[] testGuavaSplit(StringSupplier supplier) {
           return splitter.splitToList(supplier.get()).toArray(new String[0]);
       }
   
   
       @State(Scope.Thread)
       public static class StringSupplier implements Supplier<String> {
   
           @Param({"10", "30", "50"})
           private int arrayLen;
   
           private String[] array;
           private int index = 0;
   
           @Setup
           public void setup() {
   
               List<String> list = new ArrayList<>(1000);
               ThreadLocalRandom random = ThreadLocalRandom.current();
               for (int i = 0; i < 1000; i++) {
                   String s = StringUtils.join(
                           random.ints(arrayLen).toArray(),
                           separator
                   );
                   list.add(s);
               }
               this.array = list.toArray(new String[0]);
           }
   
           @Override
           public String get() {
               if (index >= array.length)
                   index = 0;
               return array[index++];
           }
       }
   }
   
   ```
   The reason I propose this optimization is that these methods are sometimes under really heavy use. (In my case, I use splitPreserveAllTokens for log processing, and the method is called **billions** of times every day.) So for me, performance is really important.
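
The source of `FastSplitUtils` is not included in this message, so the following is only a hedged sketch of what per-thread buffer reuse for a split method could look like. The class name `ReusableBufferSplit` and every detail of it are assumptions made for illustration; it is not the code from the pull request. It mimics the single-character `splitPreserveAllTokens` behaviour (empty tokens are kept) using only the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;

/** Hypothetical sketch of ThreadLocal buffer reuse; NOT the FastSplitUtils code from the PR. */
public final class ReusableBufferSplit {

    // One scratch list per thread, so a hot loop does not allocate a new list on every call.
    private static final ThreadLocal<ArrayList<String>> BUFFER =
            ThreadLocal.withInitial(() -> new ArrayList<>(64));

    private ReusableBufferSplit() {}

    /** Splits on a single char and keeps empty tokens, e.g. "a@@b" gives ["a", "", "b"]. */
    public static String[] splitPreserveAllTokens(String str, char separator) {
        if (str == null) {
            return null;
        }
        if (str.isEmpty()) {
            return new String[0];
        }
        ArrayList<String> tokens = BUFFER.get();
        tokens.clear(); // defensive: start from an empty buffer
        try {
            int start = 0;
            for (int i = 0; i < str.length(); i++) {
                if (str.charAt(i) == separator) {
                    tokens.add(str.substring(start, i));
                    start = i + 1;
                }
            }
            tokens.add(str.substring(start)); // last token, possibly empty
            return tokens.toArray(new String[0]);
        } finally {
            tokens.clear(); // drop references so the buffer does not keep tokens alive
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitPreserveAllTokens("a@@b@", '@')));
    }
}
```

The trade-off in a design like this is that each thread keeps its scratch list alive for as long as the thread lives, which is part of why a change of this kind in a widely used utility class deserves careful review.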
   
   StringUtils is widely used, and any change to it must be made cautiously, so I fully understand your worry. In computer science everything is a trade-off; it's really a hard choice.
