Web Log Triage

Apache and Nginx access-log triage for incident response. Covers combined-log field positions, web-shell URI families, status-code semantics for attacker URIs, scanner User-Agent fingerprints, and brute-force success detection patterns. Verify scanner UA defaults against current tool versions before relying on them for attribution - tool defaults shift across major releases.

Apache combined-log format

%h          %l %u %t                           "%r"                                      %>s %O  "%{Referer}i" "%{User-Agent}i"
192.168.1.1 -  -  [05/Oct/2019:13:17:54 +0100] "GET /scripts/update.php?cmd=ls HTTP/1.1" 200 244 "-"           "Mozilla/5.0 ..."

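| Field | Position (whitespace-split, as awk sees it) | Forensic value |
| --- | --- | --- |
| `%h` | 1 | source IP (or hostname if `HostnameLookups On`) |
| `%l` | 2 | identd remote user (almost always `-`) |
| `%u` | 3 | HTTP-auth username (populated only under Basic/Digest auth) |
| `%t` | 4-5 | request timestamp (4) and timezone offset (5) |
| `%r` | 6-8 (between quotes) | full request line: method (6), URI (7), protocol (8) |
| `%>s` | 9 | final HTTP status (after internal redirects) |
| `%O` | 10 | bytes sent including headers (requires mod_logio); the stock Apache `combined` format logs `%b` here, body bytes only |
| `%{Referer}i` | 11 (between quotes) | referer header |
| `%{User-Agent}i` | 12+ (between quotes) | UA string |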
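
Whitespace positions hold only while `%u` and the request line contain no stray spaces. Below is a minimal sketch of a quote-anchored extractor for the combined format described above. The function name `clf_fields` and its column order are illustrative, not part of any tool, and request lines containing escaped quotes (`\"`) will still confuse it.

```shell
# clf_fields: emit ip, timestamp, method, uri, status, size, user-agent
# as tab-separated columns. Reads the named files, or stdin if none given.
clf_fields() {
    awk -F'"' '{
        # Quote-split layout: $1 = "%h %l %u %t ", $2 = request line,
        # $3 = " status size ", $4 = referer, $6 = user-agent.
        n = split($1, pre, " ")
        split($2, req, " ")
        split($3, res, " ")
        # pre[n-1] is "[dd/Mon/yyyy:HH:MM:SS", pre[n] is "+zzzz]"; strip the brackets.
        ts = substr(pre[n-1], 2) " " substr(pre[n], 1, length(pre[n]) - 1)
        printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", pre[1], ts, req[1], req[2], res[1], res[2], $6
    }' "$@"
}
```

For example, `clf_fields access.log | cut -f1,5 | sort | uniq -c | sort -nr` ranks IP and status pairs without depending on whitespace positions.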

Path locations

| Distro | Access | Error |
| --- | --- | --- |
| Debian / Ubuntu / Kali | `/var/log/apache2/access.log` | `/var/log/apache2/error.log` |
| RHEL / CentOS / Fedora | `/var/log/httpd/access_log` | `/var/log/httpd/error_log` |
| Nginx (any) | `/var/log/nginx/access.log` | `/var/log/nginx/error.log` |

awk -F'"' '{print $1, $2}' access.log              # split on quotes - field 1 = IP/date, field 2 = request
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -20   # rank source IPs
awk '$9 == 200 {print}' access.log                 # filter to 200s
awk '$9 == 404 {print $1}' access.log | sort | uniq -c | sort -nr   # 404 volume per IP (enumeration signal)
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -nr       # rank User-Agents

Web-shell URI families

[T1505.003] Server Software Component: Web Shell

| Family | Signature | Examples |
| --- | --- | --- |
| Canonical-name shells | known shell filename in URI | `r57.php`, `shell.php`. Other named shells exist (search current threat reports for filename catalogues) |
| Command-injection query strings | innocuous filename, telltale parameter | `?cmd=`, `?exec=`, `?c=` (read the value: `cmd=ls`/`cmd=id`/`cmd=cat /etc/passwd`) |

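grep -E '\.(php|jsp|aspx?)\?([^ ]*&)?(cmd|exec|c|do|x|run)=' access.log   # shell-style parameter anywhere in the query string, not only first
grep -iE '(r57|shell)\.(php|jsp|aspx?)' access.log                        # named shells; matches shell.php as well as webshell.php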
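
Raw grep output still has to be read line by line for status and size. A minimal sketch that applies both URI families and prints only the triage columns follows; the function name `webshell_hits` is hypothetical, and it assumes whitespace positions 7, 9 and 10 hold URI, status and size.

```shell
# webshell_hits: print ip, status, size, uri for requests whose URI matches
# either a named shell or a shell-style query parameter.
webshell_hits() {
    awk '{
        uri = tolower($7)
        # Family 1: canonical shell filename in the URI.
        named = (uri ~ /(r57|shell)\.(php|jsp|aspx?)/)
        # Family 2: telltale parameter anywhere in the query string.
        param = (uri ~ /\.(php|jsp|aspx?)\?([^ ]*&)?(cmd|exec|c|do|x|run)=/)
        if (named || param)
            printf "%s\t%s\t%s\t%s\n", $1, $9, $10, $7
    }' "$@"
}
```

Sorting the output on the size column (`sort -t "$(printf '\t')" -k3,3n`) groups identical response sizes, which is usually the quickest way to separate default pages from real shell output.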

Status-code semantics for attacker URIs

| Status | Meaning |
| --- | --- |
| `200` | URI resolved and a response was sent. Size disambiguates: for example, 9 KB from `r57.php` = full shell UI rendered, 220 B = empty handler. A `200` to a `?cmd=` payload does NOT prove RCE happened - cross-check response size against the targeted app's default page size |
| `302` on POST to login | Classic successful-login signature (redirect to the authenticated landing page). Applies to DVWA-style login forms that re-render with `200` on failure |
| `403` | Server refused the request; the path usually exists but is access-controlled. Attackers re-probe with different paths |
| `404` | File does not exist. High-volume `404` from one IP = directory enumeration (T1595.003 Wordlist Scanning) |
| `500` | Server error - usually a payload that broke PHP or the handler. The request itself is the IOC even though execution failed |
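
Because status alone is ambiguous, it helps to tally status and response size together for one suspect URI. The sketch below does this under the same whitespace-position assumptions as the one-liners above; `uri_status_sizes` is an illustrative name, and the match is a plain substring test rather than a regex.

```shell
# uri_status_sizes NEEDLE [file...]: count requests per (status, size) pair
# for URIs containing NEEDLE, most frequent pair first.
uri_status_sizes() {
    needle=$1; shift
    awk -v needle="$needle" '
        index($7, needle) > 0 { tally[$9 "\t" $10]++ }
        END { for (k in tally) printf "%d\t%s\n", tally[k], k }' "$@" | sort -rn
}
```

`uri_status_sizes r57.php access.log` prints lines of count, status and size; a small cluster of large `200` responses next to many tiny ones is the pattern worth reading in full.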

Scanner User-Agent fingerprints

Defaults are rarely changed in practice. A UA identifies the tool, NOT the operator (the same UA from the same IP could be one or many actors).

| Tool | UA pattern | ATT&CK |
| --- | --- | --- |
| Nikto | `Mozilla/5.00 (Nikto/X.X.X) (Evasions:None) (Test:...)` | T1595.002 Vulnerability Scanning |
| DirBuster | `DirBuster-X.X-RC1 (http://www.owasp.org/...)` | T1595.003 Wordlist Scanning |
| Gobuster | `gobuster/X.X.X` (often left default) | T1595.003 Wordlist Scanning |
| dirsearch | `Mozilla/5.0 (...) dirsearch` | T1595.003 Wordlist Scanning |
| Hydra (HTTP-form) | `Mozilla/5.0 (Hydra)` | T1110.001 Password Guessing |
| sqlmap | `sqlmap/X.X.X.X#stable (http://sqlmap.org)` | T1190 Exploit Public-Facing Application |
| Nmap http-script | `Mozilla/5.0 (compatible; Nmap Scripting Engine; ...)` | T1595.002 Vulnerability Scanning |
| Masscan | `(none)` (no UA at all); banner-grab mode may instead send `masscan/X.X (...)`, so verify against the version in use | T1595.001 Scanning IP Blocks |
| Burp Suite | varies, often spoofed to `Mozilla/5.0` | T1595.002 Vulnerability Scanning |

grep -iE 'Nikto|DirBuster|Hydra|sqlmap|Nmap Scripting|gobuster|dirsearch|masscan' access.log
grep -E ' "-?"$' access.log  # UA field logged as "-" (header missing) or "" (header empty) - no-UA probes such as Masscan
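
A grep hit shows that a scanner UA appeared, not how much traffic each tool generated from each source. Here is a minimal sketch of a per-IP, per-tool tally using the fingerprints in the table above. The function name `scanner_ua_tally` and the short tool labels are hypothetical, and the substring tests are deliberately loose.

```shell
# scanner_ua_tally: count requests per (source IP, scanner tool), busiest first.
scanner_ua_tally() {
    awk -F'"' '
        NF < 7 { next }    # skip lines that are not in combined format
        {
            ua = tolower($6)
            tool = ""
            if      (ua ~ /nikto/)                 tool = "nikto"
            else if (ua ~ /dirbuster/)             tool = "dirbuster"
            else if (ua ~ /gobuster/)              tool = "gobuster"
            else if (ua ~ /dirsearch/)             tool = "dirsearch"
            else if (ua ~ /hydra/)                 tool = "hydra"
            else if (ua ~ /sqlmap/)                tool = "sqlmap"
            else if (ua ~ /nmap scripting engine/) tool = "nmap-nse"
            else if (ua == "-" || ua == "")        tool = "no-ua"
            if (tool == "") next
            split($1, pre, " ")                    # pre[1] is the source IP
            hits[pre[1] "\t" tool]++
        }
        END { for (k in hits) printf "%d\t%s\n", hits[k], k }' "$@" | sort -rn
}
```

One IP showing several tools in sequence (for example `nmap-nse`, then `nikto`, then `hydra`) reads as a recon-to-exploitation progression, though the UA still names only the tool, not the operator.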

Brute-force success detection

Two distinct patterns, depending on endpoint type.

Real login form (returns 302 on success)

awk '$6 == "\"POST" && $7 ~ /login\.php$/ && $9 == 302 {print $1, $4}' access.log    # successful logins: source IP + timestamp
awk '$7 ~ /login\.php$/ {print $1, $9}' access.log | sort | uniq -c | sort -nr         # request count per IP per status

Success = `302 Found` redirect to the authenticated landing page. Failure = `200` re-render with an error message.
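
A lone `302` is an ordinary login; the brute-force signature is a run of failures from one IP followed by a `302`. The sketch below flags that sequence, assuming the log is in chronological order and that failures re-render with `200` as described above. The function name `login_bruteforce_hits` and its threshold argument are illustrative.

```shell
# login_bruteforce_hits MIN [file...]: for each IP, report the first 302 on a
# POST to login.php that follows at least MIN failed (200) POSTs.
# Output columns: ip, failures before success, timestamp of the success.
login_bruteforce_hits() {
    min=$1; shift
    awk -v min="$min" '
        $6 == "\"POST" && $7 ~ /login\.php$/ {
            if ($9 == 200) {
                fails[$1]++
            } else if ($9 == 302 && fails[$1] >= min && !($1 in hit)) {
                hit[$1] = 1
                printf "%s\t%d\t%s\n", $1, fails[$1], substr($4, 2)
            }
        }' "$@"
}
```

`login_bruteforce_hits 20 access.log` keeps only IPs with at least 20 failures before the first success; pick the threshold from the failure counts in the per-status tally rather than treating 20 as meaningful.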

Training endpoint that returns 200 for both (e.g., DVWA brute panel)

awk '$7 ~ /vulnerabilities\/brute\// {print $10}' access.log | sort | uniq -c | sort -nr   # response-size cluster

Both success and failure return `200`, so the discriminator is response size. Sizes that fall outside the dominant cluster are candidate successful auths. Cross-reference with application logs and database state to confirm.
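
The size histogram shows that outliers exist; the next step is to pull the requests behind them. This minimal sketch treats the most common response size as the failure page and prints every `200` with a different size. The function name `brute_size_outliers` is hypothetical, and if two sizes tie for most common the choice between them is arbitrary.

```shell
# brute_size_outliers: print ip, timestamp, size for 200 responses on the
# brute endpoint whose size differs from the most common (modal) size.
brute_size_outliers() {
    awk '
        $7 ~ /vulnerabilities\/brute\// && $9 == 200 {
            count[$10]++
            size[NR] = $10
            line[NR] = $1 "\t" substr($4, 2) "\t" $10
        }
        END {
            # The modal size is assumed to be the failed-login page.
            for (s in count) if (count[s] > best) { best = count[s]; mode = s }
            for (i = 1; i <= NR; i++)
                if ((i in size) && size[i] != mode) print line[i]
        }' "$@"
}
```

Dynamic pages can vary by a few bytes between failures (for example when the username is echoed back), so expect some false positives and read the flagged requests before calling them successful logins.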

Triage workflow

graph TD
    log["access.log"] --> ip["rank source IPs<br/>awk '{print $1}' | sort | uniq -c | sort -nr"]
    ip --> top["top IP by request count<br/>= candidate attacker"]
    top --> ua["read User-Agent for that IP<br/>scanner identification"]
    ua --> uri["read URIs for that IP<br/>web shell? scanner? brute-force?"]
    uri -->|"shell URI hits"| shell["status code disambiguation<br/>response size matters"]
    uri -->|"high 404 volume"| enum["directory enumeration<br/>T1595.003"]
    uri -->|"login.php repeats"| brute["brute-force pattern check<br/>302 vs 200 size cluster"]
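
The first three nodes of the workflow can be chained into one pass. This is a minimal sketch only: it takes a single log file, the function name `triage_top_ip` and the output labels are illustrative, and the busiest IP is a candidate rather than a verdict (it may be a proxy, monitor or crawler).

```shell
# triage_top_ip FILE: show the busiest source IP, its top User-Agents,
# and its status-code distribution.
triage_top_ip() {
    log=$1
    # Step 1: rank source IPs and keep the busiest.
    top=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | awk 'NR == 1 {print $2}')
    printf 'top source IP: %s\n' "$top"
    # Step 2: User-Agents for that IP (quote-split, UA is field 6).
    printf 'user-agents:\n'
    awk -F'"' -v ip="$top" 'index($1, ip " ") == 1 {print $6}' "$log" |
        sort | uniq -c | sort -rn | head -5
    # Step 3: status codes for that IP, to pick the branch to follow.
    printf 'status codes:\n'
    awk -v ip="$top" '$1 == ip {print $9}' "$log" | sort | uniq -c | sort -rn
}
```

A status list dominated by `404` points to the enumeration branch; repeated `200` responses on one login or shell URI point to the brute-force or web-shell branch.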

Pitfalls

Filtering only on 200. Recon (404s) and failed-execution (500s) are still attack evidence. Hunt across all status codes.
200 to a ?cmd= payload does not always mean RCE. The targeted app may have returned its default page (size signal disambiguates).
UA spoofing is real but rare. Default UAs are reliable scanner signatures because operators rarely customise them. Custom UAs ARE seen in mature operations, so treat attribution from UA alone as low-confidence.
?cmd= in a request line does not always mean a web shell. Some legitimate apps use cmd as a parameter name. Check whether the targeted file is in document root and whether it executes shell-style.
Logs across rotation. access.log.1, access.log.2.gz carry historical data. Use zcat for compressed rotated logs: zcat access.log.*.gz | grep <pattern>.
Time-window discipline. Attacker activity may span days or weeks. Filter by date early to reduce noise. Timestamps are logged as [dd/Mon/yyyy:HH:MM:SS, which does not sort lexically, so match the day or month literally: awk '$4 ~ /\/Apr\/2024:/' access.log keeps April 2024.
HostnameLookups Off is the default, so field 1 is an IP, not a hostname. Resolve separately with dig -x <ip> if attribution needs reverse DNS.
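
For a true date range, the timestamp has to be turned into a sortable key first. The sketch below does that in POSIX awk; the function name `clf_date_range` is hypothetical, bounds are ISO dates forming a half-open range (start inclusive, end exclusive), and the timezone offset in field 5 is ignored, so bounds are in the server's logged local time.

```shell
# clf_date_range START END [file...]: keep lines with START <= date < END,
# where START and END are yyyy-mm-dd. Reads stdin if no file is given.
clf_date_range() {
    start=$1; end=$2; shift 2
    awk -v s="$start" -v e="$end" '
        BEGIN {
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= 12; i++) month[names[i]] = sprintf("%02d", i)
        }
        {
            # $4 is "[dd/Mon/yyyy:HH:MM:SS"; rebuild it as yyyy-mm-dd,
            # which compares correctly as a plain string.
            day = substr($4, 9, 4) "-" month[substr($4, 5, 3)] "-" substr($4, 2, 2)
            if (day >= s && day < e) print
        }' "$@"
}
```

It composes with the rotated-log pitfall: `zcat access.log.*.gz | clf_date_range 2024-04-01 2024-05-01 | grep <pattern>`.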

links:

Field Manual | Log Triage | Filesystem Metadata | tshark
