How to configure Nginx logs to improve GoAccess precision
June 13, 2021
A year ago I published a post about how to install GoAccess and I stated:
My idea is to have insights (most visited pages, operating systems, browsers and referrals) about the visitors without any client-side code and cookie.
I finally removed the Google Analytics tracking code, and for the past month I have been relying only on GoAccess-generated analytics.
In this post I share the configuration I am currently using to get the most precise data possible.
GoAccess comes with a lot of built-in filters to remove noise from server logs, but it cannot do all the work alone. It works best when the web server logs, in my case Nginx's, are fine-tuned first.
What I did was create an ad-hoc log file for GoAccess using Nginx's conditional logging feature.
After one year of observing raw logs and some trial-and-error sessions, I defined a heuristic based on these four rules:
- My website is generated using Gatsby and all page URLs have the structure https://elia.contini.page/path/to/the-page/. All end with /.
- I am only interested in logged requests that use the GET HTTP method.
- I am only interested in logged requests with status code 200.
- The protocol used is HTTP/2.0.
Implementing this heuristic is very simple. I edited my Nginx configuration, adding this code (the map blocks belong in the http context, next to the server block):
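# Each map hands its result to the next one, so $goAccess is 1
# only when all four rules hold.

# The protocol must be HTTP/2.0
map $server_protocol $goAccess_protocol {
    HTTP/2.0 1;
    default 0;
}
# The status code must be 200
map $status $goAccess_status {
    200 $goAccess_protocol;
    default 0;
}
# The method must be GET
map $request_method $goAccess_method {
    GET $goAccess_status;
    default 0;
}
# The requested URI must end with "/"
map $request_uri $goAccess {
    ~.*/$ $goAccess_method;
    default 0;
}
server {
    # ...
    # Write a log entry only when $goAccess is 1
    access_log /var/log/nginx/elia.contini.page-goaccess.log combined if=$goAccess;
}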
elia.contini.page-goaccess.log is the input of GoAccess.
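Because the four maps only chain one condition into the next, the same heuristic can also be written as a single map over a composite string. This is a minimal sketch of that alternative, not the configuration used above: the ":" separator is an arbitrary choice, and it assumes an Nginx version that accepts several variables in the first parameter of map (0.9.0 or later).

```nginx
# Sketch: one map instead of four. The ":" separator is an arbitrary choice.
map "$server_protocol:$status:$request_method:$request_uri" $goAccess {
    default 0;
    # HTTP/2.0, status 200, GET, and a URI that ends with "/"
    "~^HTTP/2\.0:200:GET:.*/$" 1;
}

server {
    # ...
    access_log /var/log/nginx/elia.contini.page-goaccess.log combined if=$goAccess;
}
```

Note that $request_uri includes the query string, so in both versions a request such as /path/to/the-page/?ref=x does not end with / and is not logged.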
Referral spam
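Unfortunately this configuration does not eliminate all the noise, especially referral spam. Searching on DuckDuckGo, I found this really useful post.
I added this other rule to my heuristic. Requests whose referrer matches a known spam domain get Nginx's non-standard 444 code, which closes the connection without sending a response; since their status is not 200, they also stay out of the GoAccess log: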
# ...
map $http_referer $referral_spam {
    default 0;
    include /etc/nginx/referral_spam.map;
}
server {
    # ...
    if ($referral_spam) {
        return 444;
    }
}
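Closing the connection also cuts off any real visitor whose referrer happens to match an entry in the map. A gentler variant, sketched here under the assumption that serving spam-referred requests is acceptable, drops the if block and only keeps those requests out of the GoAccess log. The variable name $goAccess_clean is made up for this example.

```nginx
# Sketch: log a request only when $referral_spam is 0 and $goAccess is 1
map "$referral_spam:$goAccess" $goAccess_clean {
    "0:1" 1;
    default 0;
}

server {
    # ...
    access_log /var/log/nginx/elia.contini.page-goaccess.log combined if=$goAccess_clean;
}
```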
The file /etc/nginx/referral_spam.map contains lines such as:
"~*aoul.top" 1;
"~*namjv.top" 1;
"~*nicolaonline.top" 1;
"~*qwant.com" 1;
"~*sarahonline.top" 1;
"~*seanonline.top" 1;
"~*sloopyjoes.com" 1;
"~*ucablog.top" 1;
"~*uk-events.com" 1;
"~*wphi.top" 1;
The boring part is that as soon as I see a suspicious domain in my logs, I have to manually add it to the file and restart Nginx.
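In the entries of referral_spam.map, the dot matches any character and the pattern can match anywhere in the Referer header, including its path or query string. If that turns out to be too loose, an entry can be anchored to the host part. A sketch for two of the domains listed, assuming the referrer always starts with http:// or https://:

```nginx
# Sketch: match the domain (and its subdomains) only in the host part of the referrer
"~*^https?://([^/]+\.)?aoul\.top([/:]|$)" 1;
"~*^https?://([^/]+\.)?namjv\.top([/:]|$)" 1;
```

The trade-off is readability: the looser patterns are easier to add by hand, while the anchored ones are less likely to match an unrelated referrer.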