Data Journey
At Proxima Analytics, we prioritise transparency and privacy in all of our data processing and storage practices. This includes following all relevant privacy laws and practising data minimisation: collecting and storing only the data that is genuinely necessary.
As part of our commitment to transparency, we have created a detailed data journey page to explain the process of collecting and protecting website visitor data using Proxima Analytics. By understanding exactly how our script operates on your website, you can be confident in our commitment to upholding the privacy of your website visitors.
To illustrate the process, let's consider the hypothetical scenario where you have installed the Proxima Analytics script on your website, or enabled one of our plugins within your CMS. The script is then activated and ready to collect privacy-focused website analytics data.
When a user accesses your website using a browser, the Proxima Analytics script begins to gather data in a way that prioritises the privacy of your website visitors. In this scenario, we will use the websites example.com and w3.org as examples; both belong to our paying customers or run our self-hosted version of Proxima Analytics.
The Journey Begins
Two users, who we will call Drew and Taylor, are visiting both example.com and w3.org from different locations. When they access these websites, our global content delivery network provider (Bunny CDN) serves the Proxima embed script, a small piece of JavaScript code, to their browsers. This ensures that the file is loaded quickly from a server in a city close to the user, with typical load times of around 30 milliseconds.
Once the script is loaded, a pageview request is sent to our EU-hosted servers. This request includes information about the page the user is accessing and the website that referred them. The browser also sends our servers the user's IP address and user-agent, which includes details about the browser and device they are using. Our technology does not use cookies, so Drew and Taylor are not interrupted by any cookie banners.
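To give a rough idea of what this looks like, here is a simplified sketch of a pageview request; the endpoint and field names are illustrative, not our actual API:

```js
// Simplified sketch of the pageview request sent by the embed script.
// The endpoint and field names are illustrative, not our actual API.
fetch('https://api.proxima.example/pageview', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    websiteId: '123456',
    page: location.pathname,
    referrer: document.referrer,
  }),
});
// Note: the IP address and user agent are not part of the payload;
// the browser sends them automatically with every HTTP request.
```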
The Great Firewall of Proxima: Protecting User Data at All Costs
As part of Drew and Taylor's journey, once they make a request to our systems, DDoS protection from the Hetzner and Bunny.net firewalls kicks in to shield our services from potential malicious attacks. This protection is vital to ensure the continued availability and security of our services, but it is equally important to us that we keep no logs of Drew and Taylor's requests or IP addresses.
On the application level, we have implemented measures to automatically identify requests made by bots and crawlers, using heuristics that distinguish between human and non-human traffic. Once a request has been identified as coming from a bot or crawler, it is automatically filtered out and excluded from our analysis.
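To illustrate the idea, a minimal user-agent heuristic could look like the sketch below; the pattern list is purely illustrative, and production heuristics combine several signals:

```js
// Minimal sketch of user-agent-based bot detection. The pattern list
// is illustrative; real heuristics combine several signals.
const BOT_PATTERN = /bot|crawler|spider|crawling|headless|scraper/i;

function isLikelyBot(userAgent) {
  // Treat an empty user agent as automated traffic as well.
  return !userAgent || BOT_PATTERN.test(userAgent);
}

isLikelyBot('Googlebot/2.1 (+http://www.google.com/bot.html)'); // true
isLikelyBot('Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...');   // false
```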
Hiding in Plain Sight: Anonymising Drew and Taylor's Data
Upon receiving a request from Drew and Taylor, our servers will parse the information contained in the request. This will include the website they are visiting, the website's unique identifier, the user agent (which contains information about their browser and device), and their IP address. For example, Drew's request may include the following information:
Referrer: duckduckgo.com
Website: w3.org/home
Website ID: 123456
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36
IP Address: 203.0.113.42
Similarly, Taylor's request may include the following information:
Referrer: w3.org
Website: example.com/page
Website ID: 789123
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36
IP Address: 198.51.100.17
To protect the privacy of our users, we do not store raw data such as IP addresses and user agents alongside browsing activity. Instead, we create a unique "signature" to identify users on future visits. This helps to maintain privacy while still allowing us to track and analyse website usage.
To ensure the privacy of our users, we have implemented a system that uses unique strings, called salts, which change on a daily basis. These salts are combined with the request data and hashed with the SHA-256 algorithm to create a unique identifier for each user. We also store references to the identifiers a user will have on upcoming days, allowing us to recognise visits across different days without linking them to a specific individual.
For example, let's say Drew and Taylor visit these websites during the week. On Monday, we would use that day's salt to create a unique identifier for Drew by hashing his IP address, the website ID, and his user agent together with the salt. We would then store this identifier, along with identifiers precomputed using the salts for Tuesday and the rest of the week. On Tuesday, we would do the same for Taylor, creating a unique identifier for them using that day's salt.
Crucially, these identifiers do not let us follow users around the web: because the website ID is part of the hash, we cannot track individual users across different websites. Nor can we track users across devices, since the hash includes the user agent, which changes when a user switches devices. This ensures that our users' data remains private and secure.
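To make this concrete, here is a minimal Node.js sketch of how such an identifier could be derived; the exact concatenation order of the inputs is an assumption for illustration:

```js
const crypto = require('crypto');

// Sketch of the daily-salted identifier. The concatenation order of
// salt, IP address, website ID and user agent is an assumption.
function dailyIdentifier(salt, ip, websiteId, userAgent) {
  return crypto
    .createHash('sha256')
    .update(`${salt}${ip}${websiteId}${userAgent}`)
    .digest('hex');
}

// A different salt (day) or website ID yields an unrelated identifier.
dailyIdentifier('abc123', '203.0.113.42', '789123', 'Mozilla/5.0 ...');
```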
Here is a table showing the hashes generated for Drew and Taylor's visits to example.com and w3.org across a week:
| User | Website | Date | Salt | User’s Identifier |
|---|---|---|---|---|
| Drew | example.com | Monday | abc123 | bdb965548abfc63f1865 |
| Drew | example.com | Tuesday | jkl012 | d34d6ff59be6c3077a3 |
| Drew | w3.org | Wednesday | stu901 | 4a86a4e4cdaa4e82a8c |
| Taylor | example.com | Thursday | DEF456 | 7f9b04aefb7f2547e0a8a |
| Taylor | w3.org | Friday | MNO345 | a70ad86b0b50c8f37a9d2 |
These are just illustrative examples, not actual hashes; a real SHA-256 digest is 64 hexadecimal characters long.
Following the Trail: A Journey Through Data Extraction
For Drew and Taylor, we then store the pageviews in our ClickHouse cluster, a column-oriented database designed for fast querying and analysis of large datasets. Using MaxMind's GeoIP2 database, we extract the user's country, region, and city, and record the visited page, the timestamp, and details of the user's device, browser, and operating system.
Here is an example entry for Drew's visit to example.com:
```js
{
  userId: "bdb965548abfc63f1865",
  timestamp: "2025-06-15 23:10:00",
  page: "/hello",
  website: "example.com",
  referrer: "duckduckgo.com",
  device: "Desktop",
  browser: "Chrome",
  os: "macOS",
  country: "France",
  region: "Île-de-France",
  city: "Paris"
}
```
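As an illustration, a geolocation lookup with MaxMind's official Node.js reader might look like the following sketch; the database path is illustrative, and this is not our production code:

```js
const { Reader } = require('@maxmind/geoip2-node');

// Sketch of a GeoIP2 city lookup; the database path is illustrative.
async function locate(ip) {
  const reader = await Reader.open('/data/GeoIP2-City.mmdb');
  const result = reader.city(ip);
  return {
    country: result.country?.names.en,
    region: result.subdivisions?.[0]?.names.en,
    city: result.city?.names.en,
  };
}

// e.g. { country: 'France', region: 'Île-de-France', city: 'Paris' }
// for an IP address located in Paris.
```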
We also save a few entries for Drew's upcoming visits, precomputed with the upcoming daily salts:

```js
[
  {
    timestamp: '2025-06-16 00:00:00',
    userId: 'd34d6ff59be6c3077a3',
  },
  {
    timestamp: '2025-06-17 00:00:00',
    userId: 'f2c7a91d04be58c6310',
  },
];
```
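These entries could be precomputed roughly as in the sketch below, reusing the dailyIdentifier function from earlier; the shape of the future-salt list is hypothetical:

```js
// Precompute identifiers for upcoming days from a list of future daily
// salts. The shape of futureSalts is hypothetical.
function upcomingEntries(ip, websiteId, userAgent, futureSalts) {
  return futureSalts.map(({ date, salt }) => ({
    timestamp: `${date} 00:00:00`,
    userId: dailyIdentifier(salt, ip, websiteId, userAgent),
  }));
}

upcomingEntries('203.0.113.42', '789123', 'Mozilla/5.0 ...', [
  { date: '2025-06-16', salt: 'jkl012' },
  { date: '2025-06-17', salt: 'stu901' },
]);
```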
To extract the device, browser, and operating system (OS) information from the user agent, we use a combination of regular expressions and a pre-built library of known user agent strings. This allows us to accurately determine the type of device, browser, and OS used by each visitor without collecting any additional data from them.
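For example, parsing a user agent with the open-source ua-parser-js library (one such pre-built library, shown purely for illustration) looks like this:

```js
const { UAParser } = require('ua-parser-js');

// Parse a raw user agent string into browser, OS and device details.
const { browser, os, device } = new UAParser(
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36'
).getResult();

// e.g. browser.name === 'Chrome', os.name === 'Mac OS';
// device.type is undefined for ordinary desktop browsers.
console.log(browser.name, os.name, device.type || 'Desktop');
```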
One of the key features of Proxima Analytics is that it allows website owners to control which data is collected and stored. This includes the ability to turn geolocation data collection on or off, as well as the ability to collect device, browser, and operating system data.
This flexibility is important because different website owners have different data privacy concerns and regulatory requirements. For example, a website owner may turn off geolocation data collection if they only want to monitor overall traffic to their site and do not need to know where individual users are located. Ultimately, Proxima Analytics gives website owners the control they need to comply with data privacy regulations and protect the personal data of their users.
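Conceptually, these controls boil down to a per-website settings object along these lines; the field names are hypothetical, not our actual configuration schema:

```js
// Hypothetical per-website collection settings; field names are
// illustrative, not our actual configuration schema.
const websiteSettings = {
  websiteId: '789123',
  collectGeolocation: false, // skip country/region/city extraction
  collectDeviceData: true,   // keep device, browser and OS parsing
};
```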
Wrapping up the Data Journey
In this final chapter, we'll explain how we finalise the data journey at Proxima Analytics.
Once we have received a pageview request from a user and extracted the necessary information, we discard the raw data, including the user's IP address and user agent. This is because we believe in the importance of data minimisation and only storing data that is essential and privacy-focused.
To process the stored pageviews, we use cron jobs. A cron job is a task that is automatically executed at a predetermined time or interval. In our case, these cron jobs run regularly to perform various tasks on the stored pageview data.
Using these cron jobs, we correlate the data and calculate important metrics, such as bounce rate (the percentage of visitors who leave a website after viewing only one page), returning visitors, and the entry and exit pages for each website. We also aggregate data on referrers and attribution. This allows us to continuously analyse and improve the data we collect, all while maintaining the privacy of our users.
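As a rough sketch, such a job could be scheduled with a standard cron library; the bounce-rate calculation below is deliberately simplified, and the two helper functions are hypothetical:

```js
const cron = require('node-cron');

// Simplified bounce rate: the share of visitors with exactly one
// pageview. Real sessionisation is more involved.
function computeBounceRate(pageviews) {
  const visits = new Map();
  for (const { userId } of pageviews) {
    visits.set(userId, (visits.get(userId) || 0) + 1);
  }
  const single = [...visits.values()].filter((n) => n === 1).length;
  return visits.size ? (single / visits.size) * 100 : 0;
}

// Run every night at 03:00; both helpers below are hypothetical.
cron.schedule('0 3 * * *', async () => {
  const pageviews = await fetchYesterdaysPageviews();
  await storeMetric('bounce_rate', computeBounceRate(pageviews));
});
```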
After reading about our data journey, you may be wondering how we ensure compliance with privacy laws. At Proxima Analytics, we are committed to protecting the privacy of our users and have a dedicated EU-based privacy officer and a team of experienced lawyers to ensure that we are compliant with relevant laws, including the General Data Protection Regulation (GDPR), the ePrivacy Directive (cookie law), the Privacy and Electronic Communications Regulations (PECR), the Children's Online Privacy Protection Act (COPPA), and the California Consumer Privacy Act (CCPA). If you have any questions about how we process data, don't hesitate to reach out to us.
Thank you for trusting Proxima Analytics. Onwards and upwards!