Apache and Nginx access-log triage for incident response. Combined-log field positions, web-shell URI families, status-code semantics for attacker URIs, scanner User-Agent fingerprints, brute-force success detection patterns. Verify scanner UA defaults against current tool versions before relying on them for attribution - tool defaults shift across major releases.
Apache combined-log format
%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
192.168.1.1 - - [05/Oct/2019:13:17:54 +0100] "GET /scripts/update.php?cmd=ls HTTP/1.1" 200 244 "-" "Mozilla/5.0 ..."
| Field | Position (awk field, whitespace-split) | Forensic value |
|---|---|---|
| %h | 1 | source IP (or hostname if HostnameLookups On) |
| %l | 2 | identd remote user (almost always -) |
| %u | 3 | HTTP-auth username (populated only if Basic/Digest auth) |
| %t | 4-5 | request timestamp ([date:time in 4, timezone offset] in 5) |
| %r | 6-8 (between quotes) | full request line: method (6) + URI (7) + protocol (8) |
| %>s | 9 | final HTTP status (after internal redirects) |
| %O | 10 | bytes sent including headers (needs mod_logio); the stock Apache combined format uses %b (body bytes only) |
| %{Referer}i | 11 (between quotes) | Referer header |
| %{User-Agent}i | 12+ (between quotes) | UA string |
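Whitespace field numbers work for status and size, but the request line and User-Agent contain spaces. A minimal sketch that splits on the double quote instead and emits one tab-separated record per request follows. The function name `split_combined` is hypothetical, and the sketch assumes no escaped quotes (`\"`) inside the request line or headers.

```sh
# split_combined: read combined-format lines on stdin and print
# ip, timestamp, method, uri, status, size, user-agent (tab-separated).
# Assumption: quoted fields contain no escaped double quotes.
split_combined() {
  awk -F'"' '
    NF >= 7 {
      split($1, pre, " ")     # pre[1]=IP, pre[4]="[date:time", pre[5]="zone]"
      split($2, req, " ")     # req[1]=method, req[2]=URI, req[3]=protocol
      split($3, res, " ")     # res[1]=status, res[2]=size
      ts = substr(pre[4], 2)  # drop the leading "["
      printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", pre[1], ts, req[1], req[2], res[1], res[2], $6
    }'
}

# Usage: split_combined < access.log | cut -f1,4,5
```

Lines that do not have the expected three quoted fields are skipped rather than guessed at, so a drop in record count is itself a hint that the log format differs from combined.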
Path locations
| Distro | Access | Error |
|---|---|---|
| Debian / Ubuntu / Kali | /var/log/apache2/access.log | /var/log/apache2/error.log |
| RHEL / CentOS / Fedora | /var/log/httpd/access_log | /var/log/httpd/error_log |
| Nginx (any) | /var/log/nginx/access.log | /var/log/nginx/error.log |
awk -F'"' '{print $1, $2}' access.log # split on quotes - field 1 = IP/date, field 2 = request
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -20 # rank source IPs
awk '$9 == 200 {print}' access.log # filter to 200s
awk '$9 == 404 {print $1}' access.log | sort | uniq -c | sort -nr # 404 volume per IP (enumeration signal)
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -nr # rank User-Agents
Web-shell URI families
[T1505.003] Server Software Component: Web Shell
| Family | Signature | Examples |
|---|---|---|
| Canonical-name shells | known shell filename in URI | r57.php, shell.php. Other named shells exist (search current threat reports for filename catalogues) |
| Command-injection query strings | innocuous filename, telltale parameter | ?cmd=, ?exec=, ?c= (read the value: cmd=ls/cmd=id/cmd=cat /etc/passwd) |
grep -E '\.(php|jsp|asp|aspx)\?(cmd|exec|c|do|x|run)=' access.log
grep -iE '(r57|webshell)\.(php|asp|aspx|jsp)' access.log
Status-code semantics for attacker URIs
| Status | Meaning |
|---|---|
| 200 | URI resolved + response sent. Size disambiguates: 9 KB to r57.php = full shell UI rendered, 220 B = empty handler. A 200 to a ?cmd= payload does NOT mean RCE happened - cross-check response size against the targeted app’s default page size |
| 302 POST to login | Classic successful-login signature (redirect to authenticated landing). For DVWA-style login forms |
| 403 | File exists, server refused. Attackers re-probe with different paths |
| 404 | File does not exist. High-volume 404 from one IP = directory enumeration (T1083) |
| 500 | Server error - usually a payload that broke PHP/handler. The request itself is the IOC even though execution failed |
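Because the same status can mean different things depending on response size, it helps to see every (status, size) pair recorded for one suspect URI. The sketch below does that with a fixed-substring match on the URI field. `status_size_profile` is a hypothetical helper name, and it assumes the whitespace field layout from the table above (URI in $7, status in $9, size in $10).

```sh
# status_size_profile NEEDLE [FILE...]
# Count (status, size) pairs for requests whose URI contains NEEDLE.
# Output columns: count status size, most frequent first.
status_size_profile() {
  needle=$1; shift
  # The needle is passed through the environment so awk does not
  # reinterpret backslashes, as it would with -v.
  NEEDLE="$needle" awk '
    index($7, ENVIRON["NEEDLE"]) { n[$9 " " $10]++ }
    END { for (k in n) print n[k], k }' "$@" |
    LC_ALL=C sort -k1,1nr -k2,2n -k3,3n
}

# Usage: status_size_profile r57.php access.log
```

A single dominant size with a handful of different ones is the pattern to read closely: the odd sizes are the responses where the handler produced different output.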
Scanner User-Agent fingerprints
Defaults are rarely changed in practice. The UA identifies the tool, NOT the operator (the same UA from the same IP could be one or many actors).
| Tool | UA pattern | ATT&CK |
|---|---|---|
| Nikto | Mozilla/5.00 (Nikto/X.X.X) (Evasions:None) (Test:...) | T1595.002 Vulnerability Scanning |
| DirBuster | DirBuster-X.X-RC1 (http://www.owasp.org/...) | T1595.003 Wordlist Scanning |
| Gobuster | gobuster/X.X.X (often left default) | T1595.003 Wordlist Scanning |
| dirsearch | Mozilla/5.0 (...) dirsearch | T1595.003 Wordlist Scanning |
| Hydra (HTTP-form) | Mozilla/5.0 (Hydra) | T1110.001 Password Guessing |
| sqlmap | sqlmap/X.X.X.X#stable (http://sqlmap.org) | T1190 Exploit Public-Facing Application |
| Nmap http-script | Mozilla/5.0 (compatible; Nmap Scripting Engine; ...) | T1595.002 Vulnerability Scanning |
| Masscan | (none) (no UA at all) | T1595.001 Scanning IP Blocks |
| Burp Suite | varies, often spoofed to Mozilla/5.0 | T1595.002 Vulnerability Scanning |
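The fingerprints in the table can be folded into a per-IP tally, which shows which sources ran which tool and how noisily. This is a sketch only: `scanner_tally` and its output labels are hypothetical, the match is a case-insensitive substring test rather than a strict version pattern, and the `no-UA` bucket covers the missing or empty User-Agent case that the table associates with Masscan.

```sh
# scanner_tally [FILE...]
# Print "count ip tool" for requests whose User-Agent matches a known
# scanner default. Labels are illustrative, not an official naming.
scanner_tally() {
  awk -F'"' '
    NF >= 7 {
      ua = tolower($6); tool = ""
      if      (ua ~ /nikto/)                 tool = "Nikto"
      else if (ua ~ /dirbuster/)             tool = "DirBuster"
      else if (ua ~ /gobuster/)              tool = "Gobuster"
      else if (ua ~ /dirsearch/)             tool = "dirsearch"
      else if (ua ~ /hydra/)                 tool = "Hydra"
      else if (ua ~ /sqlmap/)                tool = "sqlmap"
      else if (ua ~ /nmap scripting engine/) tool = "Nmap-NSE"
      else if (ua == "" || ua == "-")        tool = "no-UA"
      if (tool != "") { split($1, pre, " "); n[pre[1] " " tool]++ }
    }
    END { for (k in n) print n[k], k }' "$@" |
    LC_ALL=C sort -k1,1nr -k2,2 -k3,3
}

# Usage: scanner_tally access.log | head -20
```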
grep -iE 'Nikto|DirBuster|Hydra|sqlmap|Nmap Scripting|gobuster|dirsearch' access.log
grep -E '"-?"$' access.log # missing ("-") or empty ("") UA in the last quoted field - Masscan signature
Brute-force success detection
Two distinct patterns depending on endpoint type:
Real login form (returns 302 on success)
awk '$7 ~ /login\.php$/ && $9 == 302 {print $1, $4}' access.log # successful logins
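awk '$7 ~ /login\.php$/ {print $1, $9}' access.log | sort | uniq -c # rate per IP per status
Success = 302 Found redirect to authenticated landing. Failure = 200 re-render with error. Some applications also redirect on failure, so confirm a hit by checking that the same IP then requests an authenticated page.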
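A lone 302 on the login endpoint is usually a normal user. What marks a brute-force success is a run of failures from one IP followed by its first 302. The sketch below reports that combination. `login_bruteforce_hits` is a hypothetical name, and it assumes the log is in chronological order, logins are POSTs to a URI ending in login.php, and a failure is a 200.

```sh
# login_bruteforce_hits MIN_FAILS [FILE...]
# Print "ip failures_before_success timestamp_of_first_302" for IPs whose
# first 302 on login.php came after at least MIN_FAILS failed POSTs (200s).
login_bruteforce_hits() {
  min=$1; shift
  awk -v min="$min" '
    $6 == "\"POST" && $7 ~ /login\.php$/ {
      if ($9 == 200) {
        fails[$1]++
      } else if ($9 == 302 && fails[$1] >= min && !($1 in hit)) {
        hit[$1] = substr($4, 2)   # timestamp without the leading "["
        cnt[$1] = fails[$1]       # failures seen before this success
      }
    }
    END { for (ip in hit) print ip, cnt[ip], hit[ip] }' "$@" |
    LC_ALL=C sort
}

# Usage: login_bruteforce_hits 10 access.log
```

The threshold is a judgment call: set it low enough to catch slow guessing, high enough that a user who mistyped a password twice is not flagged.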
Training endpoint that returns 200 for both (e.g., DVWA brute panel)
awk '$7 ~ /vulnerabilities\/brute\// {print $10}' access.log | sort | uniq -c | sort -nr # response-size cluster
Both success and failure return 200. The discriminator is response size. Outlier sizes vs the dominant cluster = candidate successful auths. Cross-reference with application logs and database state to confirm.
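The size-cluster idea can be taken one step further by printing only the requests whose size differs from the most common one. The following is a sketch under the assumption that failures dominate, so the modal size is the failure page. `size_outliers` is a hypothetical name, and it needs a real file (not a pipe) because it reads the log twice.

```sh
# size_outliers NEEDLE FILE
# Pass 1: find the most common response size for URIs containing NEEDLE.
# Pass 2: print ip, timestamp, URI, status, size for every other size.
size_outliers() {
  needle=$1; file=$2
  NEEDLE="$needle" awk '
    index($7, ENVIRON["NEEDLE"]) == 0 { next }
    NR == FNR {                       # first read of the file
      n[$10]++
      if (n[$10] > best) { best = n[$10]; mode = $10 }
      next
    }
    $10 != mode { print $1, substr($4, 2), $7, $9, $10 }' "$file" "$file"
}

# Usage: size_outliers /vulnerabilities/brute/ access.log
```

For a GET-based form the printed URI carries the guessed credentials in its query string, which is exactly what needs checking against application state.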
Triage workflow
graph TD
    log["access.log"] --> ip["rank source IPs<br/>awk '{print $1}' | sort | uniq -c | sort -nr"]
    ip --> top["top IP by request count<br/>= candidate attacker"]
    top --> ua["read User-Agent for that IP<br/>scanner identification"]
    ua --> uri["read URIs for that IP<br/>web shell? scanner? brute-force?"]
    uri -->|"shell URI hits"| shell["status code disambiguation<br/>response size matters"]
    uri -->|"high 404 volume"| enum["directory enumeration<br/>T1083"]
    uri -->|"login.php repeats"| brute["brute-force pattern check<br/>302 vs 200 size cluster"]
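The first three nodes of the workflow can be chained into one helper, sketched here as a starting point rather than a complete triage tool. `triage_top_ip` is a hypothetical name. It picks the busiest source IP, then summarises that IP's User-Agents and status codes. Note that the busiest IP is only a candidate: a proxy or monitoring probe can easily rank first.

```sh
# triage_top_ip FILE
# Print the busiest source IP, then its status-code and User-Agent counts.
triage_top_ip() {
  log=$1
  top=$(awk '{ n[$1]++ } END { for (ip in n) print n[ip], ip }' "$log" |
        LC_ALL=C sort -k1,1nr -k2,2 | awk 'NR == 1 { print $2 }')
  [ -n "$top" ] || return 1          # empty or unreadable log
  echo "top_ip $top"
  awk -F'"' -v ip="$top" '
    {
      split($1, pre, " ")
      if (pre[1] != ip) next
      split($3, res, " ")            # res[1] = status code
      ua[$6]++; st[res[1]]++
    }
    END {
      for (u in ua) print "ua", ua[u], u
      for (s in st) print "status", st[s], s
    }' "$log" | LC_ALL=C sort -k1,1 -k2,2nr -k3
}

# Usage: triage_top_ip access.log
```

From there the status mix points to the next branch: mostly 404s suggests enumeration, repeated login.php hits suggest brute force, and 200s on script URIs call for the size check.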
Pitfalls
- Filtering only on 200. Recon (404s) and failed-execution (500s) are still attack evidence. Hunt across all status codes.
- 200 to a ?cmd= payload does not always mean RCE. The targeted app may have returned its default page (size signal disambiguates).
- UA spoofing is real but rare. Default UAs are reliable scanner signatures because operators rarely customise them. Custom UAs ARE seen in mature operations, so treat attribution from UA alone as low-confidence.
- ?cmd= in a request line does not always mean a web shell. Some legitimate apps use cmd as a parameter name. Check whether the targeted file is in document root and whether it executes shell-style.
- Logs across rotation. access.log.1, access.log.2.gz carry historical data. Use zcat for compressed rotated logs: `zcat access.log.*.gz | grep <pattern>`.
- Time-window discipline. Attacker activity may span days or weeks. Filter by date early to reduce noise. Timestamps in field 4 are dd/Mon/yyyy, so ISO dates never match; match on the log's own form, e.g. one month: `awk '$4 ~ /\/Apr\/2024:/' access.log`.
- HostnameLookups Off is the default. Field 1 is an IP, not a hostname. Resolve via dig separately if attribution needs DNS.
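The rotation and time-window pitfalls combine naturally: read the live log plus every rotated copy, then narrow to the month under investigation before any other filtering. The sketch below assumes rotated files share the live log's name as a prefix (access.log.1, access.log.2.gz). Both function names are hypothetical, and `gzip -dc` is used in place of zcat because it behaves the same way on more systems.

```sh
# cat_rotated PREFIX
# Stream PREFIX and every PREFIX.* file, decompressing .gz copies.
# Files are emitted in glob order, which is not chronological.
cat_rotated() {
  for f in "$1" "$1".*; do
    [ -f "$f" ] || continue          # skip an unmatched glob
    case $f in
      *.gz) gzip -dc -- "$f" ;;
      *)    cat -- "$f" ;;
    esac
  done
}

# in_month MON YEAR
# Keep lines whose timestamp field falls in that month, e.g. Apr 2024.
in_month() {
  awk -v tag="/$1/$2:" 'index($4, tag) > 0'
}

# Usage: cat_rotated /var/log/apache2/access.log | in_month Apr 2024 | grep -E '\?(cmd|exec)='
```

Since the streams are concatenated out of order, any analysis that depends on sequence (such as failures followed by a success) should be run per file, or sorted by timestamp first.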