In the last decade, the cost of storing data on disk has fallen by something like a factor of 10. As a result, the traditional approach of storing pre-defined fields in a database is being supplanted by a “record everything” model.
However, the ability to collect data on such a large scale is a double-edged sword.
As well as there being many visitors to keep track of, the data itself is extremely complex. We record something like 70 data points for every page a visitor sees, and it is common to store several MB of data about a regular user.
For a human, it’s almost impossible to look at that amount of data and see anything meaningful. A list of 300 URLs a visitor has seen is difficult to take in. Many of the behaviors we want to see might only be expressed very subtly. For example, the fact that someone is looking for a gift might show up only as a small change in the type and value of products they consider, and even then you can’t be sure. A lot of the data is only useful in its overall context. Knowing that someone has looked at a t-shirt might indicate they’re interested in buying it. But if that was 3 months ago, and they’ve since gone on to buy a different t-shirt, it’s probably not what they’ve come back for today. Overall it’s a complicated and incomplete picture which is hard to piece back together.
In the end, the person using this data wants to ask quite high-level questions: “What is this visitor interested in?” “Are they planning to buy or just browsing?” The answers are meaningful pieces of information which tell you something about a customer. Consider which of the following descriptions of a visitor to your website you would prefer:
“5:35pm saw homepage, 5:36pm saw black formal shoes, 5:36pm saw brown formal shoes, 5:37pm saw black formal shoes, 5:37pm saw striped socks, 9pm saw black formal shoes, 9:01pm saw brown formal shoes, 9:01pm saw brown formal shoes, 9:02pm saw brown formal shoes”
“This visitor is fairly engaged with your website and is primarily interested in formal shoes, preferably black or brown in a size 8”.
The second one is obviously much more useful. Immediately we can see how to serve this user better, making sure they’re presented with relevant content in a usable way.
The problem is that presenting visitor information in this way means knowing how to compress all of that behavioral information down into one answer, taking into account all the different aspects and context contained in the data. This is a difficult problem, which requires building up an understanding of visitors a layer at a time. We’ve done a lot in this area at Qubit. Our Universal Variable model, for example, brings a lot of structure and context to pages. It allows us to see that most of the products our example user saw were black and brown items in the footwear category.
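To make the idea of compressing a raw page log into one answer a little more concrete, here is a minimal Python sketch that rolls the example visit up into a short summary. It is only an illustration: the record fields (`time`, `page`, `category`, `color`) and the `summarize` function are made up for this example and are not the actual Universal Variable schema or any Qubit API.

```python
from collections import Counter

# Hypothetical page-view records for the example visitor above.
# The field names are illustrative assumptions, not a real schema.
views = [
    {"time": "17:35", "page": "homepage"},
    {"time": "17:36", "page": "product", "category": "formal shoes", "color": "black"},
    {"time": "17:36", "page": "product", "category": "formal shoes", "color": "brown"},
    {"time": "17:37", "page": "product", "category": "formal shoes", "color": "black"},
    {"time": "17:37", "page": "product", "category": "socks", "color": "striped"},
    {"time": "21:00", "page": "product", "category": "formal shoes", "color": "black"},
    {"time": "21:01", "page": "product", "category": "formal shoes", "color": "brown"},
    {"time": "21:01", "page": "product", "category": "formal shoes", "color": "brown"},
    {"time": "21:02", "page": "product", "category": "formal shoes", "color": "brown"},
]


def summarize(views):
    """Reduce a list of page views to the most-viewed category and its colors."""
    # Only product pages carry category information in this sketch.
    products = [v for v in views if "category" in v]
    if not products:
        return {"product_views": 0, "top_category": None, "colors": {}}

    # Count views per category and pick the most frequent one.
    categories = Counter(v["category"] for v in products)
    top_category, _ = categories.most_common(1)[0]

    # Count colors, but only within the top category.
    colors = Counter(v["color"] for v in products if v["category"] == top_category)

    return {
        "product_views": len(products),
        "top_category": top_category,
        "colors": dict(colors),
    }


summary = summarize(views)
print(summary)
```

Even this toy version only works because each page view already carries structured fields such as a category and a color. Handling context like recency, past purchases, or intent would need further layers on top.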
At Qubit we’re focused on understanding customer behavior, and the ability to record huge amounts of data has enabled us to be very flexible in the types of analysis we conduct. It’s enabling us to build products that would be nearly impossible if we had to define a small number of fields in advance. Excitingly, we are inventing a new way to create a comprehensive view of the customer online.