Raven - Blog
October 3, 2022

GoAccess


This blog is my personal digital space of freedom and, when I write articles, I sometimes just want an idea of how many people read them. That's why I wanted simple, privacy-respecting statistics about my readers, in other words what is more commonly called "analytics".

To collect these statistics, there are broadly two approaches:

Cookies are accurate: they can reliably identify a user, but they require my website (or a third party, hello Google Analytics 👋) to store a file (the cookie) on the visitor's computer in order to track their use of my website. Storing this cookie means collecting the visitor's consent, which in practice requires a cookie acceptance/refusal banner: the famous banners that clutter modern web browsing.

Log analysis, on the other hand, is less accurate since it relies on the visitor's IP address. However, an IP address can be shared by several people (NAT), and a single person can use several IP addresses (VPN). It is also harder to distinguish bots from real users. On the plus side, the logs give us the "Referer", a request header that tells us where the visitor came from: whether they reached my site via DuckDuckGo, Google, Twitter or any other site linking to my blog.
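To make the log-analysis approach concrete, here is a minimal Python sketch (not how GoAccess works internally) that extracts the client IP and the Referer from lines in Apache's combined log format, then counts distinct IPs and referring sites. The simplified regular expression, the hypothetical `summarize_log` helper and the sample lines are assumptions for illustration; real logs contain more edge cases.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified pattern for Apache's "combined" format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"$'
)

def summarize_log(lines):
    """Count distinct IPs and referring hosts (hypothetical helper)."""
    ips = set()
    referers = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines that do not fit the simplified pattern
        ips.add(match["ip"])
        referer = match["referer"]
        if referer and referer != "-":  # "-" means no Referer header was sent
            referers[urlparse(referer).hostname] += 1
    return len(ips), referers

sample = [
    '203.0.113.7 - - [03/Oct/2022:10:00:00 +0200] "GET / HTTP/1.1" 200 512 '
    '"https://duckduckgo.com/" "Mozilla/5.0"',
    '203.0.113.7 - - [03/Oct/2022:10:01:00 +0200] "GET /posts/ HTTP/1.1" 200 734 '
    '"https://twitter.com/" "Mozilla/5.0"',
    '198.51.100.4 - - [03/Oct/2022:10:02:00 +0200] "GET / HTTP/1.1" 200 512 '
    '"-" "curl/7.74.0"',
]

visitors, refs = summarize_log(sample)
print(visitors, refs.most_common())
```

Both requests from 203.0.113.7 count as a single visitor, which is exactly the approximation discussed above: people sharing an IP behind NAT are merged, while someone switching IPs through a VPN is counted twice.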

Now, the question I've been asking myself is: do I need statistics accurate to the nearest visitor for my personal blog? The answer is no: I just want a rough idea of the number of visitors and of the referrers (where visitors come from).

In my case, here is what I care about:

Among the free tools for analytics-oriented log analysis, there is of course the good old AWStats, but I find its interface and features a bit dated. That's why today I'm presenting GoAccess! I have used it many times in professional contexts and it has always been very practical.

Installation of GoAccess

In this article, I cover installing GoAccess on a Debian server running the Apache web server.

That said, the software is available for most Linux distributions, and it can read the logs of almost any type of web server: "Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, etc." (source: the GoAccess website).

On Debian 11.5, the repositories provide GoAccess version 1.4, but nothing prevents you from adding the official GoAccess repository to get the latest version.

Finally, GoAccess lets you view statistics through a web interface (dashboard) and/or on the command line.

Installation on Debian:

sudo apt-get install goaccess

For command-line use, you can view the statistics in real time directly in your terminal with:

goaccess /var/log/apache2/access.log -c

Setting up the GoAccess dashboard for Apache

Once the package is installed, you need to configure the /etc/goaccess.conf file. It is organized into sections, and you comment or uncomment lines to disable or enable options.

For the Apache web server, uncomment the following line in the Time Format Options section:

time-format %H:%M:%S

In the Date Format Options section, uncomment the following line:

date-format %d/%b/%Y
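The `time-format` and `date-format` values are strftime-style patterns describing the timestamp Apache writes between square brackets. As a quick sanity check, this small Python sketch shows that the same two patterns, joined by the `:` separator Apache uses, parse such a timestamp. The sample value is made up, and `%b` assumes English month abbreviations (Python's default C locale).

```python
from datetime import datetime

DATE_FORMAT = "%d/%b/%Y"  # same value as date-format in goaccess.conf
TIME_FORMAT = "%H:%M:%S"  # same value as time-format in goaccess.conf

# Timestamp portion of an Apache log line, without the timezone offset
stamp = "03/Oct/2022:14:05:09"
parsed = datetime.strptime(stamp, f"{DATE_FORMAT}:{TIME_FORMAT}")
print(parsed.isoformat())  # 2022-10-03T14:05:09
```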

Finally, in the Log Format Options section of the configuration file: since I'm using Apache with virtual hosts (VHOSTs) and logs in the combined format, I need to uncomment:

log-format %h %^[%d:%t %^] "%r" %s %b

And:

log-format COMBINED
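A `log-format` line is a template: `%h` is the client host, `%d` and `%t` the date and time, `%r` the request line, `%s` the status code, `%b` the response size, `%R` the Referer, `%u` the user agent, and `%^` a field to ignore. To illustrate how such a template lines up with a log line, here is a rough Python sketch that turns the template into a regular expression, letting each specifier consume characters up to the next literal character of the template. The `format_to_regex` helper and its field names are hypothetical; it is not GoAccess's parser, only handles the specifiers listed above, and assumes each specifier is followed by a literal character or the end of the line.

```python
import re

# Subset of GoAccess specifiers (group names chosen for this sketch only)
FIELDS = {"h": "host", "d": "date", "t": "time", "r": "request",
          "s": "status", "b": "size", "R": "referer", "u": "user_agent"}

def format_to_regex(fmt):
    """Translate a log-format template into a regex (hypothetical helper)."""
    parts, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec, i = fmt[i + 1], i + 2
            # A field extends up to the next literal character, or to the end of the line.
            body = f"[^{re.escape(fmt[i])}]*" if i < len(fmt) else ".*"
            parts.append(body if spec == "^" else f"(?P<{FIELDS[spec]}>{body})")
        else:
            parts.append(re.escape(fmt[i]))  # literal character in the template
            i += 1
    return re.compile("^" + "".join(parts) + "$")

pattern = format_to_regex('%h %^[%d:%t %^] "%r" %s %b')
line = '203.0.113.7 - - [03/Oct/2022:14:05:09 +0200] "GET /posts/ HTTP/1.1" 200 734'
print(pattern.match(line).groupdict())
```

Appending `"%R" "%u"` to the template makes the same helper capture the Referer and user agent of combined-format lines.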

You can also parse a log file on demand, as a test or to share an analysis, and get a ready-to-use HTML report. For example:

goaccess -f /Datas/hebergements/albatros-info/logs/access.log --log-format=COMBINED > /folder/r4ven.fr.log.html

Before going further, you can already take a look at your first generated dashboard!

GoAccess dashboard (for privacy reasons, this screenshot is taken from the GoAccess demo).

Automate log analysis and make the dashboard available

For my example in the article, I want to:

  1. In your domain's VHOST, restrict access to the report:
<Files goaccess.html>
   AuthUserFile /foo/bar/domaine.fr/.htpasswd
   AuthName "Username and password required"
   AuthType Basic
   Require valid-user
</Files>

Use the htpasswd utility to create and manage the .htpasswd file that Apache checks for authentication.

  2. Automate the dashboard generation with a cron job (this line uses the system crontab format, where USER is the account that runs the command):
0 */1 * * * USER goaccess /foo/bar/domaine.fr/logs/access.log -o /foo/bar/domaine.fr/goaccess.html --anonymize-ip --anonymize-level=2
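The `<Files>` block in step 1 relies on HTTP Basic authentication (`AuthType Basic`). With Basic authentication, the browser sends the username and password in an `Authorization` header that is only Base64-encoded, not encrypted, so the report is best served over HTTPS. This short Python sketch shows what that header looks like; the `basic_auth_header` helper and the credentials are made up for illustration.

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header value sent for HTTP Basic auth (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

header = basic_auth_header("reader", "s3cret")
print(header)
# Anyone who intercepts the header can decode it, hence the need for HTTPS
print(base64.b64decode(header.split(" ", 1)[1]).decode("utf-8"))
```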

To truncate IP addresses, we use GoAccess's --anonymize-ip option, and --anonymize-level sets how much of each address is masked: level 2 masks the last 16 bits of IPv4 addresses and the last 80 bits of IPv6 addresses.
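To visualize what level 2 does, here is a Python sketch using the standard `ipaddress` module that zeroes the last 16 bits of an IPv4 address and the last 80 bits of an IPv6 address, as described above. It only illustrates the masking idea; it is not GoAccess's code, and the `anonymize` function is hypothetical.

```python
import ipaddress

# Trailing bits masked at level 2, per the description above
MASKED_BITS = {4: 16, 6: 80}

def anonymize(address):
    """Zero the trailing bits of an IP address (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    kept = ip.max_prefixlen - MASKED_BITS[ip.version]
    network = ipaddress.ip_network(f"{ip}/{kept}", strict=False)
    return str(network.network_address)

print(anonymize("203.0.113.42"))                          # 203.0.0.0
print(anonymize("2001:db8:85a3:8d3:1319:8a2e:370:7348"))  # 2001:db8:85a3::
```

With 16 bits masked, every IPv4 visitor in the same /16 block looks identical, which keeps a rough geographic or network picture without storing individual addresses.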

GoAccess dashboard options

A GoAccess report is a simple static HTML file. On the page, a menu on the left lets you change the display settings and jump to the different sections (HTML anchors) of the dashboard. You can also choose to show or hide certain sections.

Options Panel Overview

The last word…

To conclude, GoAccess has many more options, and I encourage you to dig into its documentation if you are interested. For example, there is an option to display your statistics in real time: GoAccess then runs a WebSocket server to refresh the data continuously, perfect 👌.

You can find a live demo of the GoAccess dashboard at rt.goaccess.io.

There will certainly be many more of you reading this new article; at least, I'll soon know thanks to GoAccess 😉.
