I wrote a log analyzer that currently runs a shell pipeline, which works fine. I now want to replace it with native Swift code, but the result is really slow.
Here is an example with about 100 MB of real log data. The shell pipeline needs about one second, but the Swift code needs more than 80 seconds.
Is there a way to improve the Swift code?
In the shell I use:
% time (cd /regTest/logs;cat access.log access.log.1 | grep '26\/Apr' | egrep -v '(www.apple.com/go/applebot | www.bing.com/bingbot.htm | www.googlebot.com/bot.html | Xing Bot)' | awk '/GET/ {print $1}' | sort -n | uniq 1>/dev/null)
1,09s user 0,05s system 105% cpu 1,081 total
Here is the result of my Swift test:
% time ./regTest
Found 1813 lines.
82,54s user 0,26s system 99% cpu 1:22,83 total
My Swift sample code:
import Foundation
import RegexBuilder

guard let fullText = try? String(contentsOf: URL(filePath: "/regTest/logs/access.log")) + String(contentsOf: URL(filePath: "/regTest/logs/access.log.1")) else {
    print("Cannot read files!")
    exit(1)
}

let yesterdayRegEx = Regex {
    Capture {
        "26/Apr"
    }
}

let botListRegEx = Regex {
    Capture {
        ChoiceOf {
            "www.apple.com/go/applebot"
            "www.bing.com/bingbot.htm"
            "www.googlebot.com/bot.html"
            "Xing Bot"
        }
    }
}

let dateMatch = fullText.split(separator: "\n")
    .filter { $0.firstMatch(of: yesterdayRegEx) != nil }
    .filter { $0.firstMatch(of: botListRegEx) == nil }

print("Found \(dateMatch.count) lines.")
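
Since both patterns are fixed strings rather than real regular expressions, one variant that might be worth trying is plain substring search instead of Regex. The following is only a minimal sketch under that assumption: `filterLines` is a hypothetical helper, the sample lines are invented for illustration, and it has not been timed against the 100 MB data above.

```swift
import Foundation

// Hypothetical helper (not part of the original code): keeps every line that
// contains `date` and none of the `excluded` substrings, using plain
// substring search instead of Regex.
func filterLines(in text: String, date: String, excluding excluded: [String]) -> [Substring] {
    text.split(separator: "\n").filter { line in
        line.contains(date) && !excluded.contains(where: { line.contains($0) })
    }
}

// Invented sample data, only to show the behavior.
let sample = """
1.2.3.4 - - [26/Apr/2023:10:00:00 +0200] "GET / HTTP/1.1" 200 123 "-" "Mozilla/5.0"
5.6.7.8 - - [26/Apr/2023:10:00:01 +0200] "GET / HTTP/1.1" 200 123 "-" "bingbot/2.0; +http://www.bing.com/bingbot.htm"
9.9.9.9 - - [25/Apr/2023:09:00:00 +0200] "GET / HTTP/1.1" 200 123 "-" "Mozilla/5.0"
"""

let bots = [
    "www.apple.com/go/applebot",
    "www.bing.com/bingbot.htm",
    "www.googlebot.com/bot.html",
    "Xing Bot",
]

let kept = filterLines(in: sample, date: "26/Apr", excluding: bots)
print("Found \(kept.count) lines.")  // prints "Found 1 lines."
```

This applies the same two filters as the sample code above, so it should select the same lines; whether it is actually faster on the real logs would have to be measured.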