Apache and Nginx access-log triage for incident response. Covers combined-log field positions, web-shell URI families, status-code semantics for attacker URIs, scanner User-Agent fingerprints, and brute-force success detection patterns. Verify scanner UA defaults against current tool versions before relying on them for attribution: tool defaults shift across major releases.


Apache combined-log format

%h          %l  %u  %t                           "%r"                                      %>s  %O   "%{Referer}i"  "%{User-Agent}i"
192.168.1.1 -   -   [05/Oct/2019:13:17:54 +0100] "GET /scripts/update.php?cmd=ls HTTP/1.1" 200  244  "-"            "Mozilla/5.0 ..."
Field | Position (whitespace-split, as awk sees it) | Forensic value
%h | 1 | source IP (or hostname if HostnameLookups On)
%l | 2 | identd remote user (almost always -)
%u | 3 | HTTP-auth username (populated only if Basic/Digest auth)
%t | 4-5 | request timestamp ($4) with timezone offset ($5)
%r | 6-8 (between quotes) | full request line: method ($6) + URI ($7) + protocol ($8)
%>s | 9 | final HTTP status (after internal redirects)
%O | 10 | bytes sent, including headers (formats that log %b instead count the body only)
%{Referer}i | 11 (between quotes) | referer header
%{User-Agent}i | 12+ (between quotes) | UA string
Path locations
Distro / server | Access log | Error log
Debian / Ubuntu / Kali | /var/log/apache2/access.log | /var/log/apache2/error.log
RHEL / CentOS / Fedora | /var/log/httpd/access_log | /var/log/httpd/error_log
Nginx (any) | /var/log/nginx/access.log | /var/log/nginx/error.log
awk -F'"' '{print $1, $2}' access.log              # split on quotes - field 1 = IP/date, field 2 = request
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -20   # rank source IPs
awk '$9 == 200 {print}' access.log                 # filter to 200s
awk '$9 == 404 {print $1}' access.log | sort | uniq -c | sort -nr   # 404 volume per IP (enumeration signal)
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -nr       # rank User-Agents
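The whitespace split and the quote split above can be combined so that each record comes out with one column per field. A minimal sketch, assuming the standard combined format and no escaped quotes inside the request line or User-Agent; the function name parse_combined is hypothetical, not part of any tool:

```shell
# Hypothetical helper: print ip, timestamp, method, URI, status, bytes, UA
# as tab-separated columns. Reads the named files, or stdin if none given.
parse_combined() {
  awk -F'"' '{
    split($1, pre, " ")     # pre[1]=ip, pre[4]="[dd/Mon/yyyy:HH:MM:SS", pre[5]="+zzzz]"
    split($2, req, " ")     # req[1]=method, req[2]=URI, req[3]=protocol (may be absent)
    split($3, post, " ")    # post[1]=status, post[2]=bytes
    ts = pre[4] " " pre[5]
    sub(/^\[/, "", ts)      # strip the brackets around the timestamp
    sub(/\]$/, "", ts)
    printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", pre[1], ts, req[1], req[2], post[1], post[2], $6
  }' "$@"
}
```

For example, parse_combined access.log | cut -f1,5 | sort | uniq -c | sort -nr ranks IP/status pairs without depending on whitespace positions inside the request line.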

Web-shell URI families

[T1505.003] Server Software Component: Web Shell
Family | Signature | Examples
Canonical-name shells | known shell filename in URI | r57.php, shell.php. Other named shells exist (search current threat reports for filename catalogues)
Command-injection query strings | innocuous filename, telltale parameter | ?cmd=, ?exec=, ?c= (read the value: cmd=ls/cmd=id/cmd=cat /etc/passwd)
grep -E '\.(php|jsp|asp|aspx)\?(cmd|exec|c|do|x|run)=' access.log
grep -iE '(r57|shell)\.(php|asp|aspx|jsp)' access.log   # matches r57.php, shell.php and webshell.php
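The query-string grep above only fires when the telltale parameter is the first one after the ?. A rough sketch that accepts the parameter in any position and prints the source IP, status and response size next to each hit; shell_hits is a hypothetical name, and the sketch assumes URIs contain no unencoded spaces (otherwise the whitespace field positions shift):

```shell
# Hypothetical helper: count requests carrying a command-style parameter
# anywhere in the query string, grouped by IP, status, size and URI.
shell_hits() {
  awk '$7 ~ /\.(php|jsp|asp|aspx)\?(.*&)?(cmd|exec|c|do|x|run)=/ {
    print $1, $9, $10, $7
  }' "$@" | sort | uniq -c | sort -nr
}
```

Each output row reads count, IP, status, size, URI, so repeated identical requests collapse into one line and the size column is available for the cross-check against the app's default page.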

Status-code semantics for attacker URIs

Status | Meaning
200 | URI resolved and a response was sent. Size disambiguates: 9 KB to r57.php = full shell UI rendered, 220 B = empty handler. A 200 to a ?cmd= payload does NOT mean RCE happened; cross-check response size against the targeted app’s default page size
302 (POST to login) | Classic successful-login signature on DVWA-style login forms (redirect to the authenticated landing page)
403 | Server refused access; the path usually exists. Attackers re-probe with different paths
404 | File does not exist. High-volume 404s from one IP = directory enumeration (T1595.003 Wordlist Scanning)
500 | Server error, usually a payload that broke PHP or the handler. The request itself is the IOC even though execution failed
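Because response size is what separates a rendered shell from an empty handler, it helps to see every status/size combination returned for one suspicious URI. A minimal sketch under the same whitespace-position assumptions as the one-liners above; uri_profile and its first argument (a literal substring of the URI) are hypothetical:

```shell
# Hypothetical helper: for requests whose URI contains the given substring,
# count each (status, size) pair. Usage: uri_profile r57.php access.log
uri_profile() {
  pattern=$1; shift
  awk -v pat="$pattern" 'index($7, pat) { n[$9 " " $10]++ }
    END { for (k in n) print n[k], k }' "$@" | sort -nr
}
```

Rows read count, status, size. A single dominant size suggests the same page every time; several distinct sizes for one URI suggest that different commands produced different output.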

Scanner User-Agent fingerprints

Defaults are rarely changed in practice. A UA identifies the tool, NOT the operator (the same UA from the same IP could be one actor or many).

Tool | UA pattern | ATT&CK
Nikto | Mozilla/5.00 (Nikto/X.X.X) (Evasions:None) (Test:...) | T1595.002 Vulnerability Scanning
DirBuster | DirBuster-X.X-RC1 (http://www.owasp.org/...) | T1595.003 Wordlist Scanning
Gobuster | gobuster/X.X.X (often left default) | T1595.003 Wordlist Scanning
dirsearch | Mozilla/5.0 (...) dirsearch | T1595.003 Wordlist Scanning
Hydra (HTTP-form) | Mozilla/5.0 (Hydra) | T1110.001 Password Guessing
sqlmap | sqlmap/X.X.X.X#stable (http://sqlmap.org) | T1190 Exploit Public-Facing Application
Nmap http-script | Mozilla/5.0 (compatible; Nmap Scripting Engine; ...) | T1595.002 Vulnerability Scanning
Masscan | (none) (no UA at all) | T1595.001 Scanning IP Blocks
Burp Suite | varies, often spoofed to Mozilla/5.0 | T1595.002 Vulnerability Scanning
grep -iE 'Nikto|DirBuster|Hydra|sqlmap|Nmap Scripting|gobuster|dirsearch' access.log
awk -F'"' '$6 == "-" || $6 == ""' access.log  # missing UA is logged as "-", empty UA as "" - Masscan signature
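The fingerprint table can also be applied per source IP, so one pass shows which tool signatures each address presented. A rough sketch; scanner_hits and the tool labels are hypothetical, the substrings come from the table above, and a match names the tool only, not the operator:

```shell
# Hypothetical helper: count requests per (IP, scanner label) based on
# case-insensitive substrings of the User-Agent (quote-split field 6).
scanner_hits() {
  awk -F'"' '
    { ua = tolower($6); split($1, pre, " "); tool = "" }
    ua ~ /nikto/           { tool = "nikto" }
    ua ~ /dirbuster/       { tool = "dirbuster" }
    ua ~ /gobuster/        { tool = "gobuster" }
    ua ~ /dirsearch/       { tool = "dirsearch" }
    ua ~ /hydra/           { tool = "hydra" }
    ua ~ /sqlmap/          { tool = "sqlmap" }
    ua ~ /nmap scripting/  { tool = "nmap-nse" }
    ua == "-" || ua == ""  { tool = "no-ua" }
    tool != ""             { n[pre[1] " " tool]++ }
    END { for (k in n) print n[k], k }
  ' "$@" | sort -nr
}
```

Rows read count, IP, label. An IP that shows two or more labels in sequence (for example a wordlist scanner followed by sqlmap) is a stronger lead than a single hit.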

Brute-force success detection

Two distinct patterns depending on endpoint type:

Real login form (returns 302 on success)
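awk '$6 == "\"POST" && $7 ~ /login\.php$/ && $9 == 302 {print $1, $4}' access.log    # candidate successful logins (POST only)
awk '$7 ~ /login\.php$/ {print $1, $9}' access.log | sort | uniq -c   # count per IP per status

Success = 302 Found redirect to the authenticated landing page. Failure = 200 re-render with an error. Some applications also redirect failed logins (back to the login page), and the access log does not record the redirect target, so treat a 302 as a candidate and confirm with the next request from that IP.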

Training endpoint that returns 200 for both (e.g., DVWA brute panel)
awk '$7 ~ /vulnerabilities\/brute\// {print $10}' access.log | sort | uniq -c | sort -nr   # response-size cluster

Both success and failure return 200, so the discriminator is response size. Sizes that fall outside the dominant cluster are candidate successful auths. Cross-reference with application logs and database state to confirm.
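The size-cluster check can be taken one step further by printing only the requests whose size differs from the most common one. A minimal sketch, assuming failed attempts dominate and all share one response size; size_outliers is a hypothetical name, and if two sizes tie for most common the choice between them is arbitrary:

```shell
# Hypothetical helper: among 200 responses for URIs containing the given
# substring, print the log lines whose size differs from the modal size.
# Usage: size_outliers /vulnerabilities/brute/ access.log
size_outliers() {
  pattern=$1; shift
  awk -v pat="$pattern" '
    index($7, pat) && $9 == 200 { n[$10]++; line[NR] = $0; size[NR] = $10 }
    END {
      for (s in n) if (n[s] > max) { max = n[s]; mode = s }   # most common size
      for (i = 1; i <= NR; i++)
        if ((i in size) && size[i] != mode) print line[i]     # everything else
    }' "$@"
}
```

The printed lines carry the full request, so for GET-based forms the credentials that produced the odd-sized response are visible in the query string. The sketch holds matching lines in memory, which is acceptable for triage-sized logs.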


Triage workflow

graph TD
    log["access.log"] --> ip["rank source IPs<br/>awk '{print $1}' | sort | uniq -c | sort -nr"]
    ip --> top["top IP by request count<br/>= candidate attacker"]
    top --> ua["read User-Agent for that IP<br/>scanner identification"]
    ua --> uri["read URIs for that IP<br/>web shell? scanner? brute-force?"]
    uri -->|"shell URI hits"| shell["status code disambiguation<br/>response size matters"]
    uri -->|"high 404 volume"| enum["directory enumeration<br/>T1595.003"]
    uri -->|"login.php repeats"| brute["brute-force pattern check<br/>302 vs 200 size cluster"]
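
The first steps of the workflow above can be chained into one pass over a single log file. A rough sketch only; triage is a hypothetical name, and "top IP by request count" is a starting point rather than proof of an attacker (a busy proxy or monitoring host can top the list):

```shell
# Hypothetical helper: find the busiest source IP, then summarise its
# User-Agents, status codes and most requested URIs. Usage: triage access.log
triage() {
  log=$1
  top=$(awk '{print $1}' "$log" | sort | uniq -c | sort -nr | awk 'NR == 1 {print $2}')
  echo "== top source IP: $top"
  echo "== User-Agents"
  awk -F'"' -v ip="$top" '{ split($1, pre, " ") } pre[1] == ip {print $6}' "$log" |
    sort | uniq -c | sort -nr | head -5
  echo "== status codes"
  awk -v ip="$top" '$1 == ip {print $9}' "$log" | sort | uniq -c | sort -nr
  echo "== most requested URIs"
  awk -v ip="$top" '$1 == ip {print $7}' "$log" | sort | uniq -c | sort -nr | head -20
}
```

The three summaries map onto the branches of the graph: a scanner UA, a 404-heavy status mix, or repeated hits on a login or shell URI each point to the matching follow-up check.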

Pitfalls

  • Filtering only on 200. Recon (404s) and failed-execution (500s) are still attack evidence. Hunt across all status codes.
  • 200 to a ?cmd= payload does not always mean RCE. The targeted app may have returned its default page (size signal disambiguates).
  • UA spoofing is real but uncommon. Default UAs are useful scanner signatures because operators rarely customise them, but mature operations do use custom UAs, so attribution from UA alone is low-confidence.
  • ?cmd= in a request line does not always mean a web shell. Some legitimate apps use cmd as a parameter name. Check whether the targeted file is in document root and whether it executes shell-style.
  • Logs across rotation. access.log.1, access.log.2.gz carry historical data. Use zcat for compressed rotated logs: zcat access.log.*.gz | grep <pattern>.
  • Time-window discipline. Attacker activity may span days or weeks. Filter by date range early to reduce noise. Timestamps are logged as dd/Mon/yyyy, so ISO dates will not match; match the log's own format, e.g. 1-7 April 2024: awk '$4 ~ /^\[0[1-7]\/Apr\/2024:/' access.log.
  • HostnameLookups Off is the Apache default. Field 1 is an IP, not a hostname. Resolve it separately with dig -x if attribution needs reverse DNS.
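
The rotation and time-window points can be combined so that one command searches current and rotated logs for a date range. A minimal sketch, assuming GNU gzip (where -cdf passes uncompressed files through unchanged) and a date expression written in the log's own dd/Mon/yyyy form without backslashes; in_window is a hypothetical name:

```shell
# Hypothetical helper: print lines from plain and gzip-compressed logs whose
# timestamp date matches the given regex.
# Usage: in_window '0[1-7]/Apr/2024' access.log*
in_window() {
  date_re=$1; shift
  # -c write to stdout, -d decompress, -f copy non-gzip input through as-is
  gzip -cdf -- "$@" | awk -v re="$date_re" '$4 ~ ("^\\[(" re "):")'
}
```

The glob does not yield files in chronological order, so pipe the result through a sort on the timestamp if ordering matters for the timeline.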
