
Performance difference between Java Regex and String based replace operations

[Last Updated: Nov 21, 2017]

The following test compares the performance of regex-based String.replaceAll() with a manual replacement built on the String.indexOf() and String.substring() methods.

A utility class to record time

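import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

import static java.util.concurrent.TimeUnit.*;

public class TimerUtil {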

  public static void runTask(String msg, Runnable task) {
      long startTime = getTimeElapsed(0);
      task.run();
      System.out.printf("%s time taken: %s%n", msg, timeToString(getTimeElapsed(startTime)));
  }

  private static long getTimeElapsed(long startTime) {
      return System.nanoTime() - startTime;
  }

  public static String timeToString(long nanos) {
      Optional<TimeUnit> first = Stream.of(DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS,
              MICROSECONDS).filter(u -> u.convert(nanos, NANOSECONDS) > 0)
                                       .findFirst();
      TimeUnit unit = first.isPresent() ? first.get() : NANOSECONDS;
      double value = (double) nanos / NANOSECONDS.convert(1, unit);
      return String.format("%.4g %s", value, unit.name().toLowerCase());
  }
}

Regex Performance Test

We are going to repeat the following test three times (the first run is usually slower because of JVM warm-up):

public class RegexPerformanceTest {

  public static void main(String[] args) {
      String str = getString();
      for (int i = 0; i < 3; i++) {
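          // Regex based: replaces each run of one or more newlines with a space
          TimerUtil.runTask("regex replace",
                  () -> {
                      String result = str.replaceAll("\\n+", " ");
                      // System.out.println(result);
                  });
          // Manual: replaces each newline via indexOf() and substring()
          TimerUtil.runTask("manual replace",
                  () -> {
                      String result = manualReplace(str, "\n", " ");
                      //System.out.println(result);
                  });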
          System.out.println("-----");
      }
  }

  private static String getString() {
      String rv = "";
      for (int i = 0; i < 10000; i++) {
          rv += "test string \n ends.. ";
      }
      return "'" + rv + "'";
  }

  private static String manualReplace(String input, String toReplace, String replaceWith) {
      int i = input.indexOf(toReplace);
      while (i != -1) {
          input = input.substring(0, i) + replaceWith + input.substring(i + toReplace.length());
          i = input.indexOf(toReplace, i + replaceWith.length());
      }
      return input;
  }
}

Output

regex replace time taken: 14.09 milliseconds
manual replace time taken: 2.371 seconds
-----
regex replace time taken: 9.498 milliseconds
manual replace time taken: 2.406 seconds
-----
regex replace time taken: 2.184 milliseconds
manual replace time taken: 2.360 seconds
-----

As seen above, in all three iterations the manual replacement takes seconds, whereas the regex replacement takes milliseconds.
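
One caveat about the comparison: the regex "\\n+" matches a run of newlines, while the manual version replaces each "\n" individually. For the test input, which only contains single newlines, both give the same result, but in general they do not. The following is a minimal sketch of that difference. The class and method names are hypothetical, and String.replace() is used here as a stand-in for the manual literal replacement:

```java
public class ReplaceEquivalenceCheck {

    // Regex replacement as used in the test: a run of newlines becomes one space
    static String regexReplace(String input) {
        return input.replaceAll("\\n+", " ");
    }

    // Literal replacement (stand-in for manualReplace): one space per newline
    static String literalReplace(String input) {
        return input.replace("\n", " ");
    }

    public static void main(String[] args) {
        // Single newlines, as in the test input: both approaches agree
        String single = "test string \n ends.. ";
        System.out.println(regexReplace(single).equals(literalReplace(single))); // prints true

        // Consecutive newlines: the results differ
        String doubled = "a\n\nb";
        System.out.println(regexReplace(doubled).equals(literalReplace(doubled))); // prints false
    }
}
```

For "a\n\nb" the regex version returns "a b" (one space), whereas the literal version returns "a  b" (two spaces).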

System info:

The test above ran on the following system:

OS Name	Microsoft Windows 10 Home
Version	10.0.15063 Build 15063
Other OS Description 	Not Available
OS Manufacturer	Microsoft Corporation
System Name	XXXX
System Manufacturer	Micro-Star International
System Model	GP72 2QE
System Type	x64-based PC
System SKU	To be filled by O.E.M.
Processor	Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz, 2701 Mhz, 4 Core(s), 8 Logical Processor(s)
BIOS Version/Date	American Megatrends Inc. E1793IMS.108, 6/11/2015
SMBIOS Version	2.8
Embedded Controller Version	255.255
BIOS Mode	UEFI
BaseBoard Manufacturer	Micro-Star International Co., Ltd.
BaseBoard Model	Not Available
BaseBoard Name	Base Board
Platform Role	Mobile
Secure Boot State	On
PCR7 Configuration	Binding Not Possible
Windows Directory	C:\WINDOWS
System Directory	C:\WINDOWS\system32
Boot Device	\Device\HarddiskVolume3
Locale	United States
Hardware Abstraction Layer	Version = "10.0.15063.502"
User Name	XXXX
Time Zone	Central Standard Time
Installed Physical Memory (RAM)	16.0 GB
Total Physical Memory	15.9 GB
Available Physical Memory	4.12 GB
Total Virtual Memory	31.8 GB
Available Virtual Memory	10.7 GB
Page File Space	15.9 GB
Page File	C:\pagefile.sys
Device Encryption Support	Reasons for failed automatic device encryption: TPM is not usable, PCR7 binding is not supported, Hardware Security Test Interface failed and device is not InstantGo, Un-allowed DMA capable bus/device(s) detected, TPM is not usable
Hyper-V - VM Monitor Mode Extensions	Yes
Hyper-V - Second Level Address Translation Extensions	Yes
Hyper-V - Virtualization Enabled in Firmware	Yes
Hyper-V - Data Execution Protection	Yes

Conclusion

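In this test, regex based replacement performs far better than the manual String based loop. String.replaceAll() scans the input once and appends the pieces to a single buffer, whereas the manual loop calls String.substring() and concatenates the parts on every match, creating a new copy of nearly the whole String each time. With 10000 matches in a long String, that repeated copying dominates the run time.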
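
To illustrate how much of the cost comes from the repeated copying rather than from String operations as such, here is a minimal sketch of a manual replacement that appends to a single StringBuilder. The class and method names (LinearReplace, linearReplace) are hypothetical and not part of the example project, and this variant was not timed in the test above:

```java
public class LinearReplace {

    // Hypothetical helper: replaces every occurrence of toReplace while
    // copying each character of the input only once.
    static String linearReplace(String input, String toReplace, String replaceWith) {
        if (toReplace.isEmpty()) {
            return input; // nothing sensible to search for
        }
        int from = 0;
        int i = input.indexOf(toReplace, from);
        if (i == -1) {
            return input; // no match, no copy needed
        }
        StringBuilder sb = new StringBuilder(input.length());
        while (i != -1) {
            // Append the text between the previous match and this one
            sb.append(input, from, i).append(replaceWith);
            from = i + toReplace.length();
            i = input.indexOf(toReplace, from);
        }
        // Append whatever follows the last match
        sb.append(input, from, input.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "test string \n ends.. test string \n ends.. ";
        String result = linearReplace(input, "\n", " ");
        // Same result as the JDK's literal (non-regex) String.replace()
        System.out.println(result.equals(input.replace("\n", " "))); // prints true
    }
}
```

For literal (non-regex) replacements, the JDK's own String.replace(CharSequence, CharSequence) is usually the simplest choice.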

Example Project

Dependencies and Technologies Used:

  • JDK 1.8
  • Maven 3.3.9

Java Regex String Replacement Performance Test
  • regex-replace-all-performance
    • src
      • main
        • java
          • com
            • logicbig
              • example
                • RegexPerformanceTest.java
