The following test compares the performance of the regex-based String.replaceAll() with a manual replacement built on the String.indexOf() and String.substring() methods.
A utility class to record time
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

import static java.util.concurrent.TimeUnit.*;

public class TimerUtil {

    // Runs the task once and prints how long it took.
    public static void runTask(String msg, Runnable task) {
        long startTime = System.nanoTime();
        task.run();
        System.out.printf("%s time taken: %s%n", msg, timeToString(getTimeElapsed(startTime)));
    }

    private static long getTimeElapsed(long startTime) {
        return System.nanoTime() - startTime;
    }

    // Formats the duration in the largest unit in which it is at least 1.
    public static String timeToString(long nanos) {
        Optional<TimeUnit> first = Stream.of(DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS,
                MICROSECONDS).filter(u -> u.convert(nanos, NANOSECONDS) > 0)
                .findFirst();
        TimeUnit unit = first.isPresent() ? first.get() : NANOSECONDS;
        double value = (double) nanos / NANOSECONDS.convert(1, unit);
        return String.format("%.4g %s", value, unit.name().toLowerCase());
    }
}
Regex Performance Test
We are going to repeat the following test three times (the first run is usually slower because of JVM warm-up, such as class loading and JIT compilation):
public class RegexPerformanceTest {

    public static void main(String[] args) {
        String str = getString();
        for (int i = 0; i < 3; i++) {
            TimerUtil.runTask("regex replace",
                    () -> {
                        // "\\n+" replaces each run of newlines with one space
                        String result = str.replaceAll("\\n+", " ");
                        // System.out.println(result);
                    });
            TimerUtil.runTask("manual replace",
                    () -> {
                        // replaces every single newline with a space
                        String result = manualReplace(str, "\n", " ");
                        // System.out.println(result);
                    });
            System.out.println("-----");
        }
    }

    // Builds the test input: 10000 repetitions, each containing one newline.
    private static String getString() {
        String rv = "";
        for (int i = 0; i < 10000; i++) {
            rv += "test string \n ends.. ";
        }
        return "'" + rv + "'";
    }

    // Rebuilds the whole string with substring() and concatenation on every match.
    private static String manualReplace(String input, String toReplace, String replaceWith) {
        int i = input.indexOf(toReplace);
        while (i != -1) {
            input = input.substring(0, i) + replaceWith + input.substring(i + toReplace.length());
            i = input.indexOf(toReplace, i + replaceWith.length());
        }
        return input;
    }
}
Output
regex replace time taken: 14.09 milliseconds
manual replace time taken: 2.371 seconds
-----
regex replace time taken: 9.498 milliseconds
manual replace time taken: 2.406 seconds
-----
regex replace time taken: 2.184 milliseconds
manual replace time taken: 2.360 seconds
-----
As seen above, in all three iterations the manual replacement takes seconds, whereas the regex replacement takes milliseconds.
System info:
The following are the details of the system on which the above test ran:
OS Name Microsoft Windows 10 Home
Version 10.0.15063 Build 15063
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name XXXX
System Manufacturer Micro-Star International
System Model GP72 2QE
System Type x64-based PC
System SKU To be filled by O.E.M.
Processor Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz, 2701 Mhz, 4 Core(s), 8 Logical Processor(s)
BIOS Version/Date American Megatrends Inc. E1793IMS.108, 6/11/2015
SMBIOS Version 2.8
Embedded Controller Version 255.255
BIOS Mode UEFI
BaseBoard Manufacturer Micro-Star International Co., Ltd.
BaseBoard Model Not Available
BaseBoard Name Base Board
Platform Role Mobile
Secure Boot State On
PCR7 Configuration Binding Not Possible
Windows Directory C:\WINDOWS
System Directory C:\WINDOWS\system32
Boot Device \Device\HarddiskVolume3
Locale United States
Hardware Abstraction Layer Version = "10.0.15063.502"
User Name XXXX
Time Zone Central Standard Time
Installed Physical Memory (RAM) 16.0 GB
Total Physical Memory 15.9 GB
Available Physical Memory 4.12 GB
Total Virtual Memory 31.8 GB
Available Virtual Memory 10.7 GB
Page File Space 15.9 GB
Page File C:\pagefile.sys
Device Encryption Support Reasons for failed automatic device encryption: TPM is not usable, PCR7 binding is not supported, Hardware Security Test Interface failed and device is not InstantGo, Un-allowed DMA capable bus/device(s) detected, TPM is not usable
Hyper-V - VM Monitor Mode Extensions Yes
Hyper-V - Second Level Address Translation Extensions Yes
Hyper-V - Virtualization Enabled in Firmware Yes
Hyper-V - Data Execution Protection Yes
Conclusion
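In this test, the regex replacement clearly performs better than the manual String-based replacement. String.replaceAll() finds the matches and builds its result in a single pass over the input, whereas manualReplace() calls String.substring() and concatenates on every match. Each of those operations copies characters into a new String, so the cost adds up quickly when the replacement is repeated thousands of times.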
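The copying cost described above comes from rebuilding the whole string on every match rather than from indexOf() itself. As a rough sketch, a manual replacement that appends to a single StringBuilder copies each character only once. The class name LinearReplaceSketch and the method replaceLinear are hypothetical and not part of the original test. No timings are claimed here; it may be worth measuring this variant against the regex version on your own machine before generalizing the result.

```java
// Hypothetical sketch, not part of the original example project.
final class LinearReplaceSketch {

    // Replaces every occurrence of toReplace with replaceWith,
    // appending to one StringBuilder instead of rebuilding the String per match.
    static String replaceLinear(String input, String toReplace, String replaceWith) {
        if (toReplace.isEmpty()) {
            return input; // assumption: an empty search string means "no change"
        }
        int i = input.indexOf(toReplace);
        if (i == -1) {
            return input; // no match, nothing to copy
        }
        StringBuilder sb = new StringBuilder(input.length());
        int from = 0;
        while (i != -1) {
            // copy the text between the previous match and this one, then the replacement
            sb.append(input, from, i).append(replaceWith);
            from = i + toReplace.length();
            i = input.indexOf(toReplace, from);
        }
        sb.append(input, from, input.length()); // remaining tail after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same test data as getString() in the test above, built with a StringBuilder.
        StringBuilder data = new StringBuilder("'");
        for (int i = 0; i < 10000; i++) {
            data.append("test string \n ends.. ");
        }
        String str = data.append("'").toString();

        long start = System.nanoTime();
        String manual = replaceLinear(str, "\n", " ");
        long manualNanos = System.nanoTime() - start;

        start = System.nanoTime();
        String regex = str.replaceAll("\\n+", " ");
        long regexNanos = System.nanoTime() - start;

        // The input has no consecutive newlines, so both approaches give the same text.
        System.out.println("same result: " + manual.equals(regex));
        System.out.println("linear manual replace nanos: " + manualNanos);
        System.out.println("regex replace nanos: " + regexNanos);
    }
}
```

Note that the two approaches are only equivalent for this input: the pattern "\\n+" collapses a run of newlines into one space, while the manual version replaces each newline separately.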